Bash fail on error


Possible Duplicate:
Automatic exit from bash shell script on error

How can I have bash stop on the first command failure, without putting stuff like this all through my code?

some_prog || exit 1
some_other_prog || exit 1

asked Aug 13, 2010 at 6:45 by Matt Joiner

Maybe you want set -e:

From www.davidpashley.com/articles/writing-robust-shell-scripts.html#id2382181:

This tells bash that it should exit the script if any statement returns a non-true return value. The benefit of using -e is that it prevents errors snowballing into serious issues when they could have been caught earlier. Again, for readability you may want to use set -o errexit.

answered Aug 13, 2010 at 6:50 by Alok Singhal

One point missed in the existing answers is how to inherit error traps. The bash shell provides one option for that via set:

-E

If set, any trap on ERR is inherited by shell functions, command substitutions, and commands executed in a subshell environment. The ERR trap is normally not inherited in such cases.


Adam Rosenfield’s recommendation to use set -e is right in certain cases, but it has its own potential pitfalls. See GreyCat’s BashFAQ 105: Why doesn’t set -e (or set -o errexit, or trap ERR) do what I expected?

According to the manual, set -e exits

if a simple command exits with a non-zero status. The shell does not exit if the command that fails is part of the command list immediately following a while or until keyword, part of the test in an if statement, part of an && or || list except the command following the final && or ||, any command in a pipeline but the last, or if the command’s return value is being inverted via !.

which means set -e does not help in the following simple cases (detailed explanations can be found on the wiki):

  1. Using the arithmetic operator let or $((..)) (bash 4.1 onwards) to increment a variable value, as in

    #!/usr/bin/env bash
    set -e
    i=0
    let i++                   # or ((i++)) on bash 4.1 or later
    echo "i is $i" 
    
  2. If the offending command is not the last command executed via && or ||. For example, the trap below wouldn’t fire when it’s expected to:

    #!/usr/bin/env bash
    set -e
    test -d nosuchdir && echo no dir
    echo survived
    
  3. When used with an if statement: the exit code of the if statement is the exit code of the last executed command. In the example below, the failing test -d does not fire the trap, because the if statement as a whole returns zero (the condition is simply false, and there is no else branch):

    #!/usr/bin/env bash
    set -e
    f() { if test -d nosuchdir; then echo no dir; fi; }
    f 
    echo survived
    
  4. When used with command substitution, failures inside the substitution are ignored, unless inherit_errexit is set (bash 4.4):

    #!/usr/bin/env bash
    set -e
    foo=$(expr 1 - 1; true)   # expr exits non-zero when its result is 0
    echo survived
    
  5. When you use commands that look like assignments but aren’t, such as export, declare, typeset or local. Here a call to the function f will not exit the script, because local has swept away the error code that the failing command substitution set:

    set -e
    f() { local var=$(somecommand that fails); }       # 'local' succeeds, masking the failure
    g() { local var; var=$(somecommand that fails); }  # separate declaration preserves the status
    
  6. When used in a pipeline, and the offending command is not the last one. For example, the command below would still go through. One option is to enable pipefail, which makes the pipeline return the exit status of the rightmost command that failed:

    set -e
    somecommand that fails | cat -
    echo survived
    

The ideal recommendation is to not use set -e, and to implement your own version of error checking instead. More information on implementing custom error handling can be found in one of my answers to Raise error in a Bash script.
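As a minimal sketch of that approach (illustrative only; the handler name and trap body here are assumptions, not the linked answer’s exact code):

    #!/usr/bin/env bash
    set -Euo pipefail    # -E so the ERR trap is inherited by functions and subshells

    on_error() {
        local exit_code=$1 line_no=$2
        echo "Error: command at line ${line_no} exited with status ${exit_code}" >&2
        exit "$exit_code"
    }
    trap 'on_error $? $LINENO' ERR

    cd /nosuchdir        # fires the ERR trap, which exits with cd's status
    echo "never reached"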

Contents

  1. How to Exit When Errors Occur in Bash Scripts
  2. Exit When Any Command Fails
  3. Exit Only When Specific Commands Fail
  4. How to catch and handle errors in bash
  5. Bash Error Handling Tip #1: Check the Exit Status
  6. Bash Error Handling Tip #2: Exit on Errors in Bash
  7. Bash Error Handling Tip #3: Try and Catch Statements in Bash
  8. Learn Bash error handling by example

How to Exit When Errors Occur in Bash Scripts

By Evan Sangaline | September 11, 2017

It’s a common issue that scripts written and tested on GNU/Linux don’t run correctly on macOS, or vice versa, because of differences between the GNU and BSD versions of the coreutils. Error messages can get drowned in the script output, making it far from obvious that something isn’t executing correctly. There are a couple of easy fixes to avoid problems like this, but they rely on some bash features that you may not be familiar with if you don’t do a ton of scripting. I’ll summarize my two approaches here; hopefully they’re of some use to you if you’re looking for a how-to guide for this specific problem.

Exit When Any Command Fails

This can actually be done with a single line using the set builtin command with the -e option.
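In its simplest form, that is just the following, placed right after the shebang:

    #!/bin/bash
    set -e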

Putting this at the top of a bash script will cause the script to exit if any commands return a non-zero exit code. We can get a little fancier if we use DEBUG and EXIT traps to execute custom commands before each line of the script is run and before the script exits, respectively. Adding the following lines will allow us to print out a nice error message including the previously run command and its exit code.
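The article’s original listing isn’t preserved in this copy; a sketch consistent with that description could be (the variable names are illustrative):

    #!/bin/bash
    set -e
    # Keep track of the last executed command via a DEBUG trap.
    trap 'last_command=$current_command; current_command=$BASH_COMMAND' DEBUG
    # On exit, report the failing command unless the script ended cleanly.
    trap 'rc=$?; if [ $rc -ne 0 ]; then echo "\"${last_command}\" command failed with exit code ${rc}." >&2; fi' EXIT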

For example, if we run ls --fake-option then we’ll see an informative error message; with the sketch above it would be along the lines of: "ls --fake-option" command failed with exit code 2.

Exit Only When Specific Commands Fail

The global solution is often fine, but sometimes you only care if certain commands fail. We can handle this situation by defining a function that needs to be explicitly invoked to check the status code and exit if necessary.
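The original listing is likewise not preserved here; a sketch matching the description (the name exit_on_error comes from the prose below, the body is assumed) might be:

    #!/bin/bash

    exit_on_error() {
        exit_code=$1
        last_command=${@:2}
        if [ $exit_code -ne 0 ]; then
            >&2 echo "\"${last_command}\" command failed with exit code ${exit_code}."
            exit $exit_code
        fi
    }

    # Enable history expansion so that !! works inside the script.
    set -o history -o histexpand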

That bottom line isn’t strictly necessary, but it allows us to use !! and have it expand to the last command executed. For example, we can check explicitly for an error like this:
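    # Reconstructed usage (the exact original listing isn't preserved here):
    ls --fake-option
    exit_on_error $? !!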

This will pass the exit code of the previous command as the first argument to exit_on_error() and then !! will expand to ls --fake-option as the second and third arguments. The second and third arguments, plus any further arguments if they were there, are then recombined by slicing ${@:2}.

This approach will print the same error message as the one in the previous section, but will only check the commands that are explicitly followed by exit_on_error calls.

Conclusion

Hopefully you found one of these two methods helpful! If you’re ever struggling with any devops or infrastructure issues then please reach out about our consulting services in these areas.


How to catch and handle errors in bash

Last updated on March 28, 2021 by Dan Nanni

In an ideal world, things always work as expected, but you know that’s hardly the case. The same goes in the world of bash scripting. Writing a robust, bug-free bash script is always challenging even for a seasoned system administrator. Even if you write a perfect bash script, the script may still go awry due to external factors such as invalid input or network problems. While you cannot prevent all errors in your bash script, at least you should try to handle possible error conditions in a more predictable and controlled fashion.

That is easier said than done, especially since error handling in bash is notoriously difficult. The bash shell does not have any fancy exception handling mechanism like try/catch constructs. Some bash errors may be silently ignored but may have consequences down the line. The bash shell does not even have a proper debugger.

In this tutorial, I’ll introduce basic tips to catch and handle errors in bash. Although the presented error handling techniques are not as fancy as those available in other programming languages, hopefully by adopting the practice, you may be able to handle potential bash errors more gracefully.

Bash Error Handling Tip #1: Check the Exit Status

As the first line of defense, it is always recommended to check the exit status of a command, as a non-zero exit status typically indicates some type of error. For example:
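    # For instance (tar here is just a stand-in for any command whose status you care about):
    tar -cf /tmp/backup.tar /etc/myapp
    if [ $? -ne 0 ]; then
        echo "tar failed" >&2
        exit 1
    fi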

Another (more compact) way to trigger error handling based on an exit status is to use an OR list:
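    <command1> || <command2>    # <command2> runs only when <command1> fails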

With this OR list, <command2> is executed if and only if <command1> returns a non-zero exit status. So you can replace <command2> with your own error handling routine. For example:
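    # A sketch of a custom error handling routine in place of <command2>:
    error_exit() {
        echo "Error: $1" >&2
        exit 1
    }

    cd /nonexistent/directory || error_exit "cannot change directory"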

Bash provides a built-in variable called $?, which tells you the exit status of the last executed command. Note that when a bash function is called, $? reads the exit status of the last command called inside the function. Since some non-zero exit codes have special meanings, you can handle them selectively. For example:
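    # Sketch: handle specific exit codes selectively (some_command is a placeholder).
    some_command
    status=$?
    case $status in
        0)   echo "success" ;;
        127) echo "command not found" >&2; exit $status ;;
        *)   echo "failed with status $status" >&2; exit $status ;;
    esac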

Bash Error Handling Tip #2: Exit on Errors in Bash

When you encounter an error in a bash script, by default, it throws an error message to stderr, but continues its execution in the rest of the script. In fact you see the same behavior in a terminal window; even if you type a wrong command by accident, it will not kill your terminal. You will just see the "command not found" error, but your terminal/bash session will still remain.

This default shell behavior may not be desirable for some bash scripts. For example, if your script contains a critical code block where no error is allowed, you want your script to exit immediately upon encountering any error inside that code block. To activate this "exit-on-error" behavior in bash, you can use the set command as follows.
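    set -e    # turn exit-on-error behavior on
    # ... critical code block ...
    set +e    # turn it back off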

Once called with the -e option, the set command causes the bash shell to exit immediately if any subsequent command exits with a non-zero status (caused by an error condition). The +e option turns the shell back to the default mode. set -e is equivalent to set -o errexit; likewise, set +e is shorthand for set +o errexit.

However, one special error condition not captured by set -e is when an error occurs somewhere inside a pipeline of commands. This is because a pipeline returns a non-zero status only if the last command in the pipeline fails. Any error produced by previous command(s) in the pipeline is not visible outside the pipeline, and so does not kill a bash script. For example:
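    set -e
    cat nosuchfile | wc -l    # cat fails, but wc (the last command) succeeds,
                              # so the pipeline returns 0 and the script keeps going
    echo "still running"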

If you want any failure in pipelines to also exit a bash script, you need to add the -o pipefail option. For example:
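    set -o pipefail
    set -e
    cat nosuchfile | wc -l    # pipefail propagates cat's failure; the script exits here
    echo "never reached"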

Therefore, to protect a critical code block against any type of command errors or pipeline errors, use the following pair of set commands.
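    set -eo pipefail    # start of the protected critical block
    # ... critical code block ...
    set +eo pipefail    # back to the default behavior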

Bash Error Handling Tip #3: Try and Catch Statements in Bash

Although the set command allows you to terminate a bash script upon any error that you deem critical, this mechanism is often not sufficient in more complex bash scripts where different types of errors could happen.

To be able to detect and handle different types of errors/exceptions more flexibly, you will need try/catch statements, which however are missing in bash. At least we can mimic the behaviors of try/catch as shown in this trycatch.sh script:
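The trycatch.sh listing is not preserved in this copy; the sketch below is a reconstruction consistent with the description that follows (the function bodies are assumed):

    # trycatch.sh -- minimal try/catch helpers for bash (reconstruction)
    function try()
    {
        [[ $- = *e* ]]; SAVED_OPT_E=$?    # remember whether errexit was on
        set +e                            # so a throw doesn't kill the script
    }

    function throw()
    {
        exit "$1"                         # raise a custom (non-zero) exception
    }

    function catch()
    {
        export exception_code=$?          # store the raised exception code
        (( SAVED_OPT_E == 0 )) && set -e  # restore errexit if it was on
        return $exception_code
    }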

Here we define several custom bash functions to mimic the semantics of try and catch statements. The throw() function is supposed to raise a custom (non-zero) exception. We need set +e, so that the non-zero status returned by throw() will not terminate the bash script. Inside catch(), we store the value of the exception raised by throw() in a bash variable exception_code, so that we can handle the exception in a user-defined fashion.

Perhaps an example bash script will make it clear how trycatch.sh works. See the example below that utilizes trycatch.sh .
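    #!/usr/bin/env bash
    # Reconstructed example; the exception names and values are illustrative.
    source trycatch.sh

    ERR_BAD=100
    ERR_WORSE=101
    ERR_CRITICAL=102

    try
    (
        echo "start of the try block"
        false || throw $ERR_BAD    # 'false' stands in for a real failing command
        echo "never reached"
    )
    catch || {
        case $exception_code in
            $ERR_BAD)      echo "this error is bad" ;;
            $ERR_WORSE)    echo "this error is worse" ;;
            $ERR_CRITICAL) echo "this error is critical" ;;
            *)             echo "unknown error: $exception_code" ;;
        esac
    }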

In this example script, we define three types of custom exceptions. We can choose to raise any of these exceptions depending on a given error condition. The OR list <command> || throw <exception> allows us to invoke the throw() function with a chosen value as a parameter if <command> returns a non-zero exit status. If <command> completes successfully, the throw() call is ignored. Once an exception is raised, it can be handled accordingly inside the subsequent catch block. As you can see, this provides a more flexible way of handling different types of error conditions.

Granted, this is not a full-blown try/catch construct. One limitation of this approach is that the try block is executed in a sub-shell. As you may know, any variables defined in a sub-shell are not visible to its parent shell. Also, you cannot modify the variables that are defined in the parent shell inside the try block, as the parent shell and the sub-shell have separate scopes for variables.

Conclusion

In this bash tutorial, I presented basic error handling tips that may come in handy when you want to write a more robust bash script. As expected, these tips are not as sophisticated as the error handling constructs available in other programming languages. If the bash script you are writing requires more advanced error handling than this, perhaps bash is not the right language for your task. You probably want to turn to other languages such as Python.

Let me conclude the tutorial by mentioning one essential tool that every shell script writer should be familiar with. ShellCheck is a static analysis tool for shell scripts. It can detect and point out syntax errors, bad coding practice and possible semantic issues in a shell script with much clarity. Definitely check it out if you haven’t tried it.

If you find this tutorial helpful, I recommend you check out the series of bash shell scripting tutorials provided by Xmodulo.


Learn Bash error handling by example

Posted: June 29, 2021

In this article, I present a few tricks for handling error conditions. Some don’t strictly fall under the category of error handling (a reactive way to deal with the unexpected) but are techniques to avoid errors before they happen.

Case study: a simple script that downloads a hardware report from multiple hosts and inserts it into a database.

Say that you have a cron job on each one of your Linux systems, and you have a script to collect the hardware information from each:
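The scripts themselves aren’t reproduced on this page, so the sketches that follow reconstruct them from the prose; host names, paths, and tool choices are assumptions. The per-host cron job might be as simple as:

    #!/bin/bash
    # Dump a hardware report as JSON on each system (path assumed).
    /usr/sbin/lshw -json > /var/tmp/hw_info.json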

If everything goes well, then you collect your files in parallel because you don’t have more than ten systems. You can afford to ssh to all of them at the same time and then show the hardware details of each one.

Here are some possibilities of why things went wrong:

  • Your report didn’t run because the server was down
  • You couldn’t create the directory where the files need to be saved
  • The tools you need to run the script are missing
  • You can’t collect the report because your remote machine crashed
  • One or more of the reports is corrupt

The current version of the script has a problem: it will run from the beginning to the end, errors or not.
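A reconstructed sketch of that version (the server names match the sample run described later; insert_report is a hypothetical placeholder for the database step):

    #!/bin/bash
    # Version one: fetch every report in parallel, no error checks at all.
    servers="macpro mac-pro-1-1 dmaf5 macmini2"
    for server in $servers; do
        scp "${server}:/var/tmp/hw_info.json" "/tmp/hw_info_${server}.json" &
    done
    wait
    for server in $servers; do
        insert_report "/tmp/hw_info_${server}.json"   # runs whether the copy worked or not
    done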

Next, I demonstrate a few things to make your script more robust and, in some cases, recover from failure.

The nuclear option: Failing hard, failing fast

The proper way to handle errors is to check if the program finished successfully or not, using return codes. It sounds obvious, but return codes (an integer number stored in the bash $? variable) sometimes have a broader meaning. The bash man page tells you:

For the shell’s purposes, a command which exits with a zero exit
status has succeeded. An exit status of zero indicates success.
A non-zero exit status indicates failure. When a command
terminates on a fatal signal N, bash uses the value of 128+N as
the exit status.


As usual, you should always read the man page of the scripts you’re calling, to see what the conventions are for each of them. If you’ve programmed with a language like Java or Python, then you’re most likely familiar with their exceptions, different meanings, and how not all of them are handled the same way.

If you add set -o errexit to your script, from that point forward it will abort the execution if any command exits with a code != 0. But errexit isn’t applied when executing functions inside an if condition, so instead of remembering that exception, I’d rather do explicit error handling.

Take a look at version two of the script. It’s slightly better (a condensed sketch follows the list of changes below):

Here’s what changed:

  • On lines 11 and 12, I enable error tracing and add a ‘trap’ to tell the user there was an error and there is turbulence ahead. You may want to kill your script here instead; I’ll show you why that may not be the best idea.
  • Line 20, if the directory doesn’t exist, then try to create it on line 21. If directory creation fails, then exit with an error.
  • On line 27, after running each background job, I capture the PID and associate that with the machine (1:1 relationship).
  • On lines 33-35, I wait for the scp task to finish, get the return code, and if it’s an error, abort.
  • On line 37, I check that the file could be parsed, otherwise, I exit with an error.
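The line numbers above refer to the original listing, which isn’t reproduced here; a condensed sketch of the same changes (names, paths, and the JSON check are assumptions):

    set -o errtrace    # enable error tracing
    trap 'echo "ERROR: there was an error, turbulence ahead" >&2' ERR

    dest_dir=/var/tmp/hw_reports
    if [[ ! -d "$dest_dir" ]]; then
        mkdir -p "$dest_dir" || { echo "cannot create ${dest_dir}" >&2; exit 1; }
    fi

    declare -A pid_of
    for server in $servers; do
        scp "${server}:/var/tmp/hw_info.json" "${dest_dir}/hw_${server}.json" &
        pid_of[$server]=$!    # 1:1 PID-to-machine mapping
    done

    for server in "${!pid_of[@]}"; do
        wait "${pid_of[$server]}" || { echo "copy from ${server} failed" >&2; exit 1; }
        # Parse check: reject reports that aren't valid JSON.
        python3 -m json.tool "${dest_dir}/hw_${server}.json" > /dev/null \
            || { echo "report from ${server} is corrupt" >&2; exit 1; }
    done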

So how does the error handling look now?

As you can see, this version is better at detecting errors but it’s very unforgiving. Also, it doesn’t detect all the errors, does it?

When you get stuck and you wish you had an alarm

The code looks better, except that sometimes the scp could get stuck on a server (while trying to copy a file) because the server is too busy to respond or is just in a bad state.

Another example is to try to access a directory through NFS where $HOME is mounted from an NFS server:
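    # Hypothetical: on a stale NFS mount, even a plain listing can block forever.
    ls -l "$HOME/reports/"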

And you discover hours later that the NFS mount point is stale and your script is stuck.

A timeout is the solution. And, GNU timeout comes to the rescue:
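    # Sketch (variables follow the earlier examples): TERM after 10 seconds;
    # if the process is still running 10 seconds after that (20s total), KILL it.
    timeout --kill-after=10.0 10.0 scp "${server}:/var/tmp/hw_info.json" "${dest_dir}/"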

Here you first try to stop the process nicely (TERM signal) 10.0 seconds after it has started. If it’s still running after 20.0 seconds, then send a KILL signal (kill -9). If in doubt, check which signals are supported in your system (kill -l, for example).

If this isn’t clear from my explanation, then look at the script for more clarity.

Back to the original script to add a few more options, and you have version three (a condensed sketch follows the list of changes):

What are the changes?:

  • Between lines 16-22, check if all the required dependency tools are present. If one cannot execute, then ‘Houston, we have a problem.’
  • Created a remote_copy function, which uses a timeout to make sure the scp finishes no later than 45.0s (line 33).
  • Added a connection timeout of 5 seconds instead of the TCP default (line 37).
  • Added a retry to scp on line 38: 3 attempts that wait 1 second between each.
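A condensed sketch of those additions (the tool list and values are assumed):

    # Fail early if a required dependency is missing.
    for tool in ssh scp timeout; do
        command -v "$tool" > /dev/null \
            || { echo "missing dependency: ${tool}" >&2; exit 1; }
    done

    remote_copy() {
        local server="$1" dest="$2"
        # Hard 45-second cap on the copy, a 5-second connection timeout,
        # and 3 connection attempts (ssh waits 1 second between attempts).
        timeout 45.0 scp -o ConnectTimeout=5 -o ConnectionAttempts=3 \
            "${server}:/var/tmp/hw_info.json" "$dest"
    }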

There are other ways to retry when there’s an error.

Waiting for the end of the world: how and when to retry

You noticed there’s an added retry to the scp command. But that only retries failed connections; what if the command fails during the middle of the copy?

Sometimes you want to just fail because there’s very little chance to recover from an issue: a system that requires hardware fixes, for example. Alternatively, you can fall back to a degraded mode, meaning that your system is able to continue working without the updated data. In those cases, it makes no sense to wait forever, but only for a specific amount of time.

Here are the changes to remote_copy, to keep this brief (version four):
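A sketch of the reworked function (the attempt count and backoff range are assumed from the sample run described below):

    remote_copy() {
        local server="$1" dest="$2" attempt
        for attempt in 1 2 3; do
            if timeout 45.0 scp -o ConnectTimeout=5 \
                   "${server}:/var/tmp/hw_info.json" "$dest"; then
                return 0
            fi
            local pause=$(( (RANDOM % 60) + 1 ))    # random 1-60 second backoff
            echo "copy from ${server} failed; retrying in ${pause}s" >&2
            sleep "$pause"
        done
        return 1
    }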

How does it look now? In this run, I have one system down (mac-pro-1-1) and one system without the file (macmini2). You can see that the copy from server dmaf5 works right away, but for the other two, there’s a retry for a random time between 1 and 60 seconds before exiting.

If I fail, do I have to do this all over again? Using a checkpoint

Suppose that the remote copy is the most expensive operation of this whole script and that you’re willing or able to re-run this script, maybe using cron or doing so by hand two times during the day to ensure you pick up the files if one or more systems are down.

You could, for the day, create a small ‘status cache’, where you record only the successful processing operations per machine. If a system is in there, then don’t bother to check it again for that day.

Some programs, like Ansible, do something similar and allow you to retry a playbook on a limited number of machines after a failure (--limit @/home/user/site.retry).

A new version (version five) of the script has code to record the status of the copy (lines 15-33):
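A sketch of that logic (the exact listing isn’t reproduced here, so the paths and helper names are assumed):

    # Status cache: remember which machines already succeeded today.
    STATUS_DIR="/var/tmp/collect_status_$(date +%Y%m%d)"
    mkdir -p "$STATUS_DIR" || exit 1

    # If the script is interrupted (killed), invalidate the whole cache.
    trap 'rm -rf "$STATUS_DIR"' INT TERM

    mark_done() { touch "${STATUS_DIR}/${1}.done"; }
    is_done()   { [[ -f "${STATUS_DIR}/${1}.done" ]]; }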

Did you notice the trap on line 22? If the script is interrupted (killed), I want to make sure the whole cache is invalidated.

And then, add this new helper logic into the remote_copy function (lines 52-81):
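Again as a sketch, building on the helpers above:

    remote_copy() {
        local server="$1" dest="$2"
        if is_done "$server"; then
            echo "${server}: already copied today, skipping" >&2
            return 0
        fi
        # ... the timeout/retry copy logic from version four ...
        mark_done "$server"
    }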

The first time it runs, a new message for the cache directory is printed out.

If you run it again, then the script knows that dmaf5 is good to go, and there’s no need to retry the copy.

Imagine how this speeds up when you have more machines that should not be revisited.

Leaving crumbs behind: What to log, how to log, and verbose output

If you’re like me, you like a bit of context to correlate with when something goes wrong. The echo statements in the script are nice, but what if you could add a timestamp to them?

If you use logger, you can save the output to the system journal for later review with journalctl (and even aggregate it with other tools out there). The best part is that you get the power of journalctl right away.

So instead of just doing echo, you can also add a call to logger, like this, using a new bash function called message:
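A sketch of such a helper (the field choices are assumptions; the article’s exact listing isn’t preserved here):

    message() {
        local msg="$1" priority="${2:-info}"
        echo "$msg"
        # Tag entries with the script name so journalctl can filter on it.
        logger --priority "user.${priority}" --tag "${0##*/}" "$msg"
    }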

You can see that you can store separate fields as part of the message, like the priority, the script that produced the message, etc.

So how is this useful? Well, you could get the messages between 1:26 PM and 1:27 PM, only errors (priority=0), and only for our script (collect_data_from_servers.v6.sh), like this, with output in JSON format:
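One way to phrase that query (assuming the entries were tagged with the script name and errors were logged at priority 0):

    journalctl --since "13:26" --until "13:27" \
        --priority 0 \
        --identifier collect_data_from_servers.v6.sh \
        --output json-pretty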

Because this is structured data, other log collectors can go through all your machines, aggregate your script logs, and then you have not only data but also information.

You can take a look at the whole version six of the script.

Don’t be so eager to replace your data until you’ve checked it.

If you noticed from the very beginning, I’ve been copying a corrupted JSON file over and over.

That’s easy to prevent. Copy the file into a temporary location, and if the file is corrupted, don’t attempt to replace the previous version (and leave the bad one for inspection; lines 99-107 of version seven of the script):
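A sketch of that check (python3 -m json.tool is one way to validate JSON; the paths are assumed):

    tmp_file=$(mktemp "/tmp/hw_${server}.json.XXXXXX")
    if remote_copy "$server" "$tmp_file"; then
        if python3 -m json.tool "$tmp_file" > /dev/null 2>&1; then
            mv -f "$tmp_file" "/var/tmp/hw_${server}.json"    # replace only when valid
        else
            echo "corrupt report from ${server}; kept at ${tmp_file} for inspection" >&2
        fi
    fi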

Choose the right tools for the task and prep your code from the first line

One very important aspect of error handling is proper coding. If you have bad logic in your code, no amount of error handling will make it better. To keep this short and bash-related, I’ll give you a few hints below.

You should ALWAYS check your scripts for syntax errors before running them:
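    bash -n myscript.sh    # parse only: reports syntax errors without running anything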

Seriously. It should be as automatic as performing any other test.

Read the bash man page and get familiar with must-know options, like:
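    # A few of the usual suspects:
    set -o errexit     # exit as soon as a command fails
    set -o nounset     # treat expansion of unset variables as an error
    set -o pipefail    # a pipeline fails if any of its commands fail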

Use ShellCheck to check your bash scripts

It’s very easy to miss simple issues when your scripts start to grow large. ShellCheck is one of those tools that saves you from making mistakes.

If you’re wondering, the final version of the script, after passing ShellCheck is here. Squeaky clean.

You noticed something with the background scp processes


You probably noticed that if you kill the script, it leaves some forked processes behind. That isn’t good and this is one of the reasons I prefer to use tools like Ansible or Parallel to handle this type of task on multiple hosts, letting the frameworks do the proper cleanup for me. You can, of course, add more code to handle this situation.

This bash script could potentially create a fork bomb. It has no control over how many processes to spawn at the same time, which is a big problem in a real production environment. Also, there is a limit on how many concurrent ssh sessions you can have (let alone the bandwidth they consume). Again, I wrote this fictional example in bash to show you how you can always improve a program to better handle errors.

1. You must check the return code of your commands. That could mean deciding to retry until a transitory condition improves or to short-circuit the whole script.
2. Speaking of transitory conditions, you don’t need to start from scratch. You can save the status of successful tasks and then retry from that point forward.
3. Bash ‘trap’ is your friend. Use it for cleanup and error handling.
4. When downloading data from any source, assume it’s corrupted. Never overwrite your good data set with fresh data until you have done some integrity checks.
5. Take advantage of journalctl and custom fields. You can perform sophisticated searches looking for issues, and even send that data to log aggregators.
6. You can check the status of background tasks (including sub-shells). Just remember to save the PID and wait on it.
7. And finally: Use a Bash lint helper like ShellCheck. You can install it on your favorite editor (like VIM or PyCharm). You will be surprised how many errors go undetected on Bash scripts.


Here it shows how to use || and && in a single line to chain the execution of commands: How can I check for apt-get errors in a bash script?

I am trying to stop a script execution if a certain condition fails,

e.g.

false || echo "Obvious error because its false on left" && exit

Here it prints "Obvious error because its false on left" and exits the console, which is what I wanted.

true || echo "This shouldn't print" && exit

Here, there’s no echo printed, but the exit command runs as well, i.e., the console is closed. Shouldn’t the exit command not run, because the echo command in the middle was not executed?
Or is the left-hand side of an && operator considered false by default?
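(A quick demonstration of what’s actually happening: && and || have equal precedence in the shell and associate left to right, so a || b && c parses as { a || b; } && c, not a || { b && c; }.)

true || echo "This shouldn't print"   # echo is skipped; the group's status is 0, from true
echo $?                               # prints 0, so a following "&& exit" would still run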

Edit:
I should have mentioned it before: my aim was to echo the error and exit when a condition failed.

With respect to my specific case of catching errors when grouping conditions using && and ||, @bodhi.zazen’s answer solves the problem.

@takatakatek’s answer makes the flow control clearer, and the bash guide links are excellent.

@muru’s answer has a good explanation of why not to use set -e if you want custom error messages, with alternatives using perl and trap, which I think is a more robust way; I see myself using it from my second bash script onwards!


The other day I was puzzled by the results of a cron job script. The bash script in question was written in a hurry a while back, and I was under the assumption that if any of its steps failed, the whole script would fail. I was wrong. Some commands were failing, but the script execution continued. It was especially difficult to notice due to a number of unset variables, piped commands, and redirected error output.

Once I realized the problem, I got even more puzzled as to what the best solution was. Sure, you can check an exit code after each command in the script, but that didn’t seem elegant or efficient.

A quick couple of Google searches brought me to this StackOverflow thread (no surprise there), which opened my eyes to a few bash options that can be set at the beginning of the script to stop execution when an error or warning occurs (similar to use strict; use warnings; in Perl). Here’s a test script for you with some test commands, pipes, error redirects, and options to control all that.

#!/bin/bash

# Stop on error
set -e
# Stop on uninitialized variables
set -u
# Stop on failed pipes
set -o pipefail

# Good command
echo "We start here ..."

# Use of non-initialized variable
echo "$FOOBAR"
echo "Still going after uninitialized variable ..."

# Bad command with no STDERR
cd /foobar 2> /dev/null
echo "Still going after a bad command ..."

# Good command into a bad pipe with no STDERR
echo "Good" | /some/bad/script 2> /dev/null
echo "Still going after a bad pipe ..."

# If the options above work, we never reach this line
echo "We should never get here!"

Save it as test.sh, make it executable (chmod +x test.sh), and run it like so:

$ ./test.sh || echo Something went wrong

Then try to comment out some options and some commands to see what happens in different scenarios.

I think, from now on, those three options will be the standard way I start all of my bash scripts.
