Rewrite When More Than N Lines
As described in Running Base in a New Shell, I have found a way to recreate a Bash environment in a new Bash shell that I am somewhat satisfied with. The method is functional, but my biggest dissatisfaction with it is that the script spawns many new processes. In particular, parsing the output of declare -p is complicated by newlines in environment variable values. My implementation parses and escapes each declaration separately, making use of sed and tr, spawning multiple processes per declaration.
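To illustrate the newline problem, here is a minimal sketch that assumes Bash 4 or later (for mapfile). The variable DEMO_MULTILINE is a hypothetical name used only for this illustration, not part of Base.

```shell
#!/usr/bin/env bash

# DEMO_MULTILINE is a hypothetical variable used only for this illustration.
DEMO_MULTILINE=$'first line\nsecond line'

# Capture the declaration exactly as declare -p prints it.
decl="$(declare -p DEMO_MULTILINE)"

# Split the captured text into lines using only builtins.
mapfile -t decl_lines <<< "${decl}"

# A single declaration can span more than one line of output.
printf 'declaration spans %d line(s)\n' "${#decl_lines[@]}"
printf '%s\n' "${decl_lines[@]}"
```

Because one declaration can cover several lines, a parser that reads the output of declare -p line by line has to decide where each declaration ends.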
Shell Scripts
If shell scripts are good at anything, they are good at spawning new processes. Aside from tasks where spawning processes is the goal, it is something that should be minimized. Why use shell scripts at all? Common reasons (justifications) include:
- Shell scripts are portable. When written correctly, shell scripts can be used on any POSIX system without having to install additional dependencies.
- Shell scripts are stable. It is quite rare to need to update a working shell script to work with a new version of the shell language. Many shell scripts are therefore never modified after initial development.
- Shell script is often seen as a convenient “glue language” that can quickly solve a problem with few lines of code.
Shell scripts often become problematic when they require more than a few lines of code. This leads to a common heuristic:
When a shell script is more than N lines of code, it is time to rewrite in a more capable language.
The value of N tends to be inversely proportional to experience. Among senior developers, N=5 seems to be common. Note that the number of lines is not really the point. In fact, good implementations in capable languages are often longer than poor implementations in shell script. The point is that shell scripts are dangerous when code gets complex, and the number of lines is a simple/inaccurate estimate of complexity.
Common problems with non-trivial shell scripts include the following:
- Shell scripts have many limitations. Many tasks that are trivial in other languages can be surprisingly difficult in a shell script. Such limitations often lead to poor implementations. For example, many shell scripts have poor CLI argument parsing because it is a lot of work to implement decent argument parsing. Developers may even decide to not implement useful features just because it is too difficult to implement them well.
- Shell scripts have many pitfalls. There are many ways to implement functionality that seems to work yet fails on edge cases. Static analysis using ShellCheck helps mitigate this issue, but such pitfalls still often result in high development and maintenance costs, as developers spend many hours on issues that do not exist in more capable languages.
- Shell scripts have poor error handling capabilities. Bash “best practices” involve using set -o errexit and set -o pipefail so that the script exits when a command fails! (Running the script again with set -o xtrace can then help you discover what is failing, in the case that the error is easily reproducible.) Managing resources during error conditions must be done using traps.
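The error handling options and traps mentioned in the last item can be sketched as follows. The function demo_strict and the variable names are hypothetical, and the function is run in a child Bash process so that the strict options do not affect the calling shell.

```shell
#!/usr/bin/env bash

# Hypothetical function that uses the strict options and an EXIT trap.
demo_strict () {
  set -o errexit
  set -o pipefail
  workdir="$(mktemp -d)"
  # The EXIT trap runs on normal exit and when errexit aborts the shell.
  trap 'rm -rf "${workdir}"' EXIT
  echo "${workdir}"
  # With pipefail, this pipeline fails even though the last command succeeds.
  false | true
  echo "not reached"
}

# Run the function in a child process and record its exit status.
status=0
output="$(bash -c "$(declare -f demo_strict); demo_strict")" || status=$?

printf 'exit status: %d\n' "${status}"
if [ -e "${output}" ] ; then
  echo "temporary directory still exists"
else
  echo "temporary directory was removed by the trap"
fi
```

The child process is used deliberately: Bash ignores errexit inside commands that are part of a || list, which is itself one of the pitfalls described above.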
Shell Script Alternatives
What are good alternatives to shell scripts? Python is often recommended because it is quite portable. It is easily installable on many systems, and the “batteries included” standard library provides a lot of functionality without having to install additional dependencies. It is a pretty reasonable solution if you want to distribute a “script” instead of a compiled binary.
Frankly, I think that most languages would be a better choice than writing a shell script in many cases. Pick a capable language that you are comfortable with! I prefer Haskell. There are even a number of Haskell libraries available for tasks that are particularly suited to shell scripting, including shelly, turtle, and shh.
If you prefer parentheses over type safety, babashka is another good option!
Base vs. Bash
I avoid shell scripts when at all possible. Base is an exception because it is software specifically for Bash. I have used it daily for well over a decade, so I think it is worth the trouble.
I am currently thinking about ways to resolve the too-many-processes issue with the new shell implementation. If I were using a more capable language, I could process the output of declare -p very easily.
I considered three different ways to approach the problem:
- Perhaps I can implement it all within Bash, using string manipulation.
- Another possible way to resolve the issue is to do the processing with awk. Perhaps I can put the awk code in a function and pass it to awk using process substitution, while piping the output of declare -p to STDIN.
- Querying the environment configuration must be done within Bash because it must be done by the shell process, but perhaps an external program could be used to implement the problematic processing. For example, perhaps the output of declare -p and alias -p could be passed to an external program, and the external program could generate the initialization script to be passed to the new Bash process, including the Base implementation. Alas, I am not keen on adding a separate program for this purpose.
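The mechanics of the awk option can be sketched as follows. This only shows how a function can supply the awk program through process substitution. The function name _demo_awk_program and the trivial awk program are hypothetical, and the program merely counts lines that start with declare instead of doing the real parsing.

```shell
#!/usr/bin/env bash

# Hypothetical function that prints the awk program using only builtins.
_demo_awk_program () {
  printf '%s\n' \
    '/^declare / { count++ }' \
    'END { print count + 0 }'
}

# awk reads its program from the process substitution and its input from
# the pipe, so only one external process is spawned for all declarations.
decl_count="$(declare -p | awk -f <(_demo_awk_program))"

printf 'lines starting with declare: %s\n' "${decl_count}"
```

With this arrangement, the awk code lives in the same script as the Bash code, and the number of processes does not grow with the number of declarations.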
I tried implementing it using Bash string manipulation. A correct solution needs to match quotation marks, which is probably quite difficult to do using just Bash. Instead, I parsed the output by just checking if each line starts with declare syntax. This is not a problem unless an environment variable value has declare syntax after a newline, which is unlikely in practice.
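_demo_load_env () {
  local defcmd defn line var
  while IFS=$'\n' read -r line ; do
    if [[ "${line}" =~ ^declare ]] ; then
      # A line starting with declare begins a new declaration,
      # so print the pending one first.
      if [ -n "${defn}" ] ; then
        echo "${defn}"
        defn=""
      fi
      # Strip the "declare -FLAGS " prefix and extract the variable name.
      defcmd="${line#declare -* }"
      var="${defcmd%%=*}"
      case "${var}" in
        # Skip variables that should not be copied.
        BASH_* | FUNCNAME | GROUPS | cmd | val )
          ;;
        DEMO_ENV | defcmd | defn | line | var )
          ;;
        * )
          # Replace literal tabs with an escaped form.
          defn="${line//$'\t'/\\t}"
          ;;
      esac
    else
      # Continuation of a multi-line value: append with an escaped newline.
      defn="${defn}\\n${line//$'\t'/\\t}"
    fi
  done < <(declare -p)
  test -z "${defn}" || echo "${defn}"
  alias -p
}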
The new version of the demonstration script is available on GitHub:
demo.sh
This is a good example of how shell script limitations can lead to poor implementations.
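The edge case mentioned above, where a value contains declare syntax after a newline, can be reproduced with a small sketch. The variable DEMO_TRICKY and the counter are hypothetical names used only for this illustration. The loop applies the same line-start check as the function above.

```shell
#!/usr/bin/env bash

# Hypothetical value whose second line looks like a declaration.
DEMO_TRICKY=$'first\ndeclare -- DEMO_FAKE="oops"'

# Count the lines that the line-start check treats as new declarations.
starts=0
while IFS= read -r line ; do
  if [[ "${line}" =~ ^declare ]] ; then
    starts=$((starts + 1))
  fi
done < <(declare -p DEMO_TRICKY)

# Only one variable was declared, but the check may report more.
printf 'declarations detected: %d\n' "${starts}"
```

When the count is greater than one, the heuristic has split a single declaration in two, which is why a correct solution needs to match quotation marks.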
Update
This demonstration script has issues! The issues and fixes are discussed in the following blog entries:
- Bash Escaping Issue
- Bash Escaping Issue (Part 2)
- Bash Escaping Issue (Part 3)
- Copying a Bash Environment to a New Shell
The final version of the script can be found in Abort Transformation!