Abort Transformation!
Over the weekend, I had a chance to think about Bash escaping during
a bath. I have been anxious to release Base so that I can move on to
more important (as well as much more enjoyable) tasks, but I am not
satisfied with the current implementation of the environment copying
functionality. The issue with environment variable values that have
declare syntax following a newline is a clear indicator that it is not
implemented well, and I suspect that the transformation between string
types does not cover all possibilities.
After the bath, I decided to try implementing a double-quoted string parser in Bash. I realized that I also need to handle newlines in strings that are stored in arrays, making the code even more complex! I made good progress, but the result was many hundreds of lines of difficult-to-read code! I was considering writing an implementation in Haskell that I could test well, using unit tests as well as QuickCheck, and then translate to Bash…
I then found information about double-quoted strings in the Bash manual, and there are indeed other characters that are treated specially! Here are some cases that I was not handling correctly:
$ declare TEST='`'
$ declare -p TEST
declare -- TEST="\`"
$ declare TEST='$'
$ declare -p TEST
declare -- TEST="\$"
The manual also mentions possible escaping of !, depending on history
expansion settings! Transformation of strings is much more difficult
than I had expected. Thinking about possible ways to get around the
issue, I realized that there is a much better way that does not involve
transformation of escape sequences at all!
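One way to see why no transformation is needed is to let Bash evaluate the output of declare -p itself. The following is only a minimal sketch: the helper name roundtrip and the variable name TEST are hypothetical and not part of Base. It evaluates the declaration in a subshell and compares the restored value with the original.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of Base): succeeds if evaluating the
# output of declare -p recreates the original value exactly.
roundtrip () {
    local original="$1" restored
    restored="$(
        TEST="${original}"
        decl="$(declare -p TEST)"
        unset TEST
        eval "${decl}"
        # The trailing "." protects trailing newlines, which command
        # substitution would otherwise strip.
        printf '%s.' "${TEST}"
    )"
    [ "${restored%.}" = "${original}" ]
}

# Characters that are special inside double quotes, plus "!".
for value in '`' '$' '\' '"' '!' ; do
    if roundtrip "${value}" ; then
        printf 'ok: %s\n' "${value}"
    else
        printf 'MISMATCH: %s\n' "${value}"
    fi
done
```

Because Bash both writes and reads the escape sequences here, the script never needs to know which characters are escaped.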
In Bash, quoted strings that are written together are concatenated. In the following example, the string is written using three quoted parts: two single-quoted parts, with a double-quoted part in the middle.
$ echo 'Alpha is Greek for "doesn'"'"'t work."'
Alpha is Greek for "doesn't work."
The output of declare -p formats all values using double quotes. When
there is a newline, one can simply close the double quote, insert the
newline as $'\n', and reopen the double quote, without having to
transform the rest of the string! It seems obvious in hindsight.
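The joining step can be sketched as follows. The two physical lines are simulated with plain strings rather than taken from real declare -p output, and the variable names first, second, joined, and GREETING are made up for illustration.

```bash
#!/usr/bin/env bash
# First physical line of a (simulated) declaration: the double quote
# is still open at the end of the line.
first='declare -- GREETING="Hello'
# Second physical line: the rest of the value and the closing quote.
second='world"'

# Close the double quote, add an ANSI-C quoted newline, reopen the quote.
joined="${first}\"\$'\\n'\"${second}"

printf '%s\n' "${joined}"
# prints: declare -- GREETING="Hello"$'\n'"world"

# Evaluating the single-line command restores the two-line value.
eval "${joined}"
printf '%s\n' "${GREETING}"
```

The text on either side of the newline is passed through untouched, so any escape sequences that it contains stay exactly as Bash wrote them.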
New Implementation
In order to avoid hundreds of lines of code that parses strings, I
decided to go back to running declare -p per environment variable.
Previously, a major issue in doing this was the use of pipes, grep, and
sed, which spawned many processes. That issue can be avoided by parsing
the variable names in Bash; declare itself is a Bash builtin.
Function _demo_select_env parses environment variable names from the
output of declare -p. Some environment variables that should not be
copied are filtered out.
_demo_select_env () {
  local defn line var
  while IFS=$'\n' read -r line ; do
    # Only lines that start a declaration are considered.
    if [[ "${line}" =~ ^declare\ - ]] ; then
      # Strip "declare" and the attribute flags, then keep the name.
      defn="${line#declare -* }"
      var="${defn%%=*}"
      case "${var}" in
        BASH_* | FUNCNAME | GROUPS | cmd | val | \
        DEMO_ENV | decl | defn | line | var )
          ;;
        * )
          echo "${var}"
          ;;
      esac
    fi
  done < <(declare -p)
}
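The two parameter expansions that extract the name can be tried in isolation. This is a sketch only: parse_name is a hypothetical helper, and the sample lines are written by hand rather than captured from declare -p.

```bash
#!/usr/bin/env bash
# Hypothetical helper: prints the variable name found on one line of
# declare -p output, or fails if the line does not start a declaration.
parse_name () {
    local line="$1" defn
    [[ "${line}" =~ ^declare\ - ]] || return 1
    defn="${line#declare -* }"   # strip "declare" and the attribute flags
    printf '%s\n' "${defn%%=*}"  # keep everything before the first "="
}

parse_name 'declare -x EDITOR="vim"'     # prints: EDITOR
parse_name 'declare -- OPTS="a=1 b=2"'   # prints: OPTS
parse_name 'rest of a multi-line value"' || echo "not a declaration"
```

The # expansion removes the shortest matching prefix, so it stops at the first space after the flags, while %% removes the longest matching suffix, so any = inside the value is dropped along with the value.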
Note that this function also outputs any variable name that occurs in
declare syntax following a newline in the value of an environment
variable. This causes two issues:
- Some variable names that are output may not actually exist. This issue is resolved by simply ignoring the variables that do not exist.
- Some variable names may be output more than once. This issue is resolved by filtering the list through sort -u.
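Both resolutions can be sketched together. The function existing_names and the DEMO_ALPHA, DEMO_BETA, and DEMO_PHANTOM names below are hypothetical; the candidate list is simulated with printf instead of being parsed from a real environment.

```bash
#!/usr/bin/env bash
# Hypothetical helper: reads candidate names from standard input, removes
# duplicates, and prints only the names that are actually set.
existing_names () {
    local name
    sort -u | while IFS= read -r name ; do
        # declare -p fails for names that are not set, so phantom names
        # picked up from inside a value are skipped.
        if declare -p "${name}" >/dev/null 2>&1 ; then
            printf '%s\n' "${name}"
        fi
    done
}

DEMO_ALPHA=1
DEMO_BETA=2
unset DEMO_PHANTOM  # make sure the phantom name really does not exist

# DEMO_BETA appears twice and DEMO_PHANTOM does not exist.
printf '%s\n' DEMO_BETA DEMO_PHANTOM DEMO_ALPHA DEMO_BETA | existing_names
# prints DEMO_ALPHA and DEMO_BETA, one per line
```

Note that sort is an external command, so this costs one extra process per run, not one per variable.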
Function _demo_load_env becomes quite simple! Each variable name is
queried, and multiple lines are joined as described above. Only declare
commands for valid environment variables are output. Finally, alias
commands are also output.
_demo_load_env () {
  local decl line var
  while IFS=$'\n' read -r var ; do
    decl=""
    while IFS=$'\n' read -r line ; do
      if [ -z "${decl}" ] ; then
        decl="${line}"
      else
        # Join lines: close the quote, add $'\n', reopen the quote.
        decl="${decl}\"\$'\\n'\"${line}"
      fi
    done < <(declare -p "${var}" 2>/dev/null)
    # Names that do not exist produce no output and are skipped.
    [ -z "${decl}" ] || echo "${decl}"
  done < <(_demo_select_env | sort -u)
  alias -p
}
The updated demonstration script is available on GitHub: demo.sh