- 
                Notifications
    You must be signed in to change notification settings 
- Fork 214
improve error reporting for failing shell commands (and EasyBuild crashes) #4351
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
…asyBuildError being raised, add crash hook, improve general error reporting
…ll command, log command output before log message that mentions success of failure for command
…ul error reporting incl. step name and easyconfig filename
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I like the new format of the error messages, it's indeed more clear.
        
          
                easybuild/tools/run.py
              
                Outdated
          
        
      | def pad_4_spaces(msg): | ||
| return ' ' * 4 + msg | ||
|  | ||
| cmd_name = err.cmd.split(' ')[0] | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This seems not needed, it's already available in err.cmd_name
        
          
                easybuild/tools/run.py
              
                Outdated
          
        
      | super(RunShellCmdError, self).__init__(msg, *args, **kwargs) | ||
|  | ||
|  | ||
| def print_run_shell_cmd_error(err): | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we could make this part of RunShellCmdError, something like RunShellCmdError.print(). Then we simplify its use to something like
except RunShellCmdError as err:
    err.print()
        
          
                easybuild/tools/run.py
              
                Outdated
          
        
      | tmpdir = tempfile.mkdtemp(prefix='shell-cmd-error-') | ||
| output_fp = os.path.join(tmpdir, f"{cmd_name}.out") | ||
| with open(output_fp, 'w') as fp: | ||
| fp.write(err.output or '') | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This file has the same contents as the log file of the corresponding step, isn't it? For instance, using the example in your description, this creates /tmp/eb-gig4d62n/shell-cmd-error-03bewi0y/make.out while /tmp/eb-gig4d62n/easybuild-run-seeh08kw.log contains the output of that same command. Unless we can strip the actual error message from the output, this seems unnecessary. We can just print the path to the log file of that step.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Well, no, since a step could be running a whole bunch of commands, and also emit way more log messages.
I think it's useful to have a log file that only includes the output of the shell command that failed, so you can start from the top and work your way down the command output without first having to ignore a bunch of other log messages that you don't care much about.
We could also considering logging the full environment in which the command was started in this command log (maybe only if --debug is used), I would definitely not do that in the step log...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My bad, you're right, I thought you were referring to a log for the full step, but run_shell_cmd indeed also logs to easybuild-run*log (when trace is enabled), so we're logging the exact same thing twice.
| @lexming I've tackled most of your remarks in ece71c2 W.r.t. saving command output twice in case of i) running with  | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
First step towards vastly better error reporting for failing shell commands.
Example output:
compared to the more messy (and partially duplicate) output with potentially very long lines and a feeble attempt at reducing the size of the output by cutting it down to just the first 300 characters of command output that EasyBuild 4.x produces:
Next steps (in follow-up PRs) could be:
run_make_cmdandrun_pip_cmdin easyblocks);richto produce colored output by default (ERRORin red, etc.);