Nix Terminfo and Locale Archive
While trying to use the latest version of Obelisk, I
ran into “invalid character” errors when attempting to use the
--verbose option. I traced the source of the error on the
Haskell side to printing of Unicode characters to the screen, and
replacing those characters with ASCII characters allowed me to get past
the error. The issue
has not been updated yet, but a friend relayed some information about the
cause as well as some workarounds. This blog entry gives my
interpretation of the issue, so any errors are my own.
Unless you are using NixOS, your system contains software from both the host OS and Nix. Nix processes inherit environment variables by default, but these environment variables may reference things that are not configured or installed under Nix. Many issues can be avoided by separating the host OS and Nix, but some things are difficult to separate. In particular, a terminal provided by the host is often used to run Nix commands, and the configuration for that terminal may not match the Nix environment.
I first ran into this kind of issue when I was unable to use the
backspace when using nix-shell. The problem was that Nix
did not have terminfo entries for
the terminal that I was using. I initially got around that issue by
setting TERM=xterm every time I ran
nix-shell. A friend later taught me a different workaround
that is less annoying. The terminfo database is located at
/usr/lib/terminfo, but the software also checks
~/.terminfo! By copying the necessary files from
/usr/lib/terminfo to ~/.terminfo, Nix
processes can find them.
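The copy step can be sketched as a small Haskell program. This is only an illustration under stated assumptions: the host database is at /usr/lib/terminfo as described above, entries are stored in a subdirectory named after the first letter of the terminal name (some systems use a different layout), and entryPath is a hypothetical helper name. It needs the directory and filepath packages.

```haskell
import System.Directory
  (copyFile, createDirectoryIfMissing, doesFileExist, getHomeDirectory)
import System.Environment (lookupEnv)
import System.FilePath (takeDirectory, (</>))

-- | Path of a terminfo entry under a database root.
-- Assumption: entries live in a directory named after their first letter.
entryPath :: FilePath -> String -> FilePath
entryPath root term = root </> take 1 term </> term

main :: IO ()
main = do
  mTerm <- lookupEnv "TERM"
  home <- getHomeDirectory
  case mTerm of
    Nothing -> putStrLn "TERM is not set; nothing to copy."
    Just term -> do
      let src = entryPath hostTerminfo term
          dst = entryPath (home </> ".terminfo") term
      exists <- doesFileExist src
      if exists
        then do
          -- Create ~/.terminfo/<letter>/ before copying the entry into it.
          createDirectoryIfMissing True (takeDirectory dst)
          copyFile src dst
          putStrLn $ "Copied " ++ src ++ " to " ++ dst
        else putStrLn $ "No entry found at " ++ src
  where
    -- Host database location from the text; adjust for your system.
    hostTerminfo = "/usr/lib/terminfo"
```

Run it from the host terminal, outside of nix-shell, so that TERM still names the terminal whose entry is missing.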
Why not install these terminfo files in the Nix
environment? The files that I need are specific to the terminal that I
use, while other developers likely use different terminals. Perhaps
all terminfo files could be installed in the Nix
environment, but that increases the size of the closure. It would also
be unsatisfactory to include OS-level stuff that is unrelated to the
software that a derivation provides.
The “invalid character” errors are caused by a similar issue. My
locale environment variables were set to en_US.UTF-8, but
that locale was not available within the Nix environment. One workaround
is to use the C.UTF-8 locale instead for both the
LANG and LC_CTYPE environment variables. This
is similar to how I set the TERM environment variable to
work around terminal issues. I tested this and confirmed that this does
indeed work around the issue, and I saw Unicode characters in the
output.
Another way to work around the issue is to use the LOCALE_ARCHIVE
environment variable. The documentation indicates that it exists to
provide a way to specify a locale archive when using different versions
of glibc, but it can also be used to point to a locale
archive on the host system, usually located at
/usr/lib/locale/locale-archive. I tested this and confirmed
that it works as well.
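When debugging this kind of issue, it helps to see what a process inside the Nix environment actually inherits. The following is a minimal diagnostic sketch, not part of Obelisk: it prints the locale variables discussed here and checks whether a locale archive file exists. The fallback path is the host location mentioned above and may differ on other systems; describeVar is a hypothetical helper name.

```haskell
import Control.Monad (forM_)
import Data.Maybe (fromMaybe)
import System.Directory (doesFileExist)
import System.Environment (lookupEnv)

-- | Render one environment variable for display.
describeVar :: String -> Maybe String -> String
describeVar name value = name ++ "=" ++ fromMaybe "(unset)" value

main :: IO ()
main = do
  -- Show the variables that this process inherited.
  forM_ ["LANG", "LC_CTYPE", "LOCALE_ARCHIVE"] $ \name ->
    lookupEnv name >>= putStrLn . describeVar name
  -- Check the archive that LOCALE_ARCHIVE points to, or the host default.
  archive <- fromMaybe hostArchive <$> lookupEnv "LOCALE_ARCHIVE"
  exists <- doesFileExist archive
  putStrLn $ archive ++ (if exists then " exists" else " is missing")
  where
    -- Host path from the text; this is an assumption about the host OS.
    hostArchive = "/usr/lib/locale/locale-archive"
```

Running it once on the host and once inside nix-shell shows which variables carry over. It only reports what is set and whether the file exists; it does not check that the archive contains the requested locale.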
What can the Obelisk project do to address the issue? It is a general
Nix issue, so perhaps it is best to just document it in the README or
FAQ. Alternatively, the project could include pkgs.development.libraries.glibc.locales
with allLocales set to true so that all
locales are installed. This is analogous to installing
terminfo, and the documentation indicates that it takes
about 100MB of space.
If locale information is not needed, however, a hack that would be
transparent to users would be to make the ob program set
the locale environment variables to C.UTF-8. This cannot
simply be done when the program starts, because the environment
variables are read before the Haskell code is run. Instead, the program
can set them before “hand off”, or set them and then re-execute itself when
--no-handoff is used.
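The timing problem can be observed with a small standalone program. This is a sketch using only base, not Obelisk code, and report is a hypothetical helper name. It prints the encoding that the runtime selected at startup, changes the locale variables with setEnv, and prints the encodings again. If the variables are only consulted at startup, as described above, the second report should not reflect the change, which is why a re-execution is needed.

```haskell
import GHC.IO.Encoding (getLocaleEncoding)
import System.Environment (lookupEnv, setEnv)
import System.IO (hGetEncoding, stdout)

-- | Label a value for display.
report :: Show a => String -> a -> String
report label value = label ++ ": " ++ show value

main :: IO ()
main = do
  lookupEnv "LANG" >>= putStrLn . report "LANG at startup"
  getLocaleEncoding >>= putStrLn . report "locale encoding at startup"
  -- Change the variables from inside the running process.
  setEnv "LANG" "C.UTF-8"
  setEnv "LC_CTYPE" "C.UTF-8"
  -- Compare these lines with the startup values printed above.
  getLocaleEncoding >>= putStrLn . report "locale encoding after setEnv"
  hGetEncoding stdout >>= putStrLn . report "stdout encoding after setEnv"
```

Try running it with LANG set to a locale that is not available in the Nix environment and compare the lines before and after the setEnv calls.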
This hack is easy to implement, so I tried it out. The commit is on GitHub, but I will not link to it because this blog entry will outlive the repository branch. I shall instead include the code here.
The setLocaleC function checks locale environment
variables and sets them to C.UTF-8 when necessary,
returning True if any were set.
setLocaleC :: MonadObelisk m => m Bool
setLocaleC = fmap or . forM envVars $ \envVar -> do
  alreadySet <- liftIO $ (== Just locale) <$> lookupEnv envVar
  unless alreadySet $ do
    putLog Debug $ T.pack $ unwords ["Setting", envVar, "to", locale]
    liftIO $ setEnv envVar locale
  pure $ not alreadySet
  where
    envVars = ["LANG", "LC_CTYPE"]
    locale  = "C.UTF-8"
The reExecuteOb function re-executes the ob
command.
reExecuteOb :: MonadObelisk m => FilePath -> [String] -> m ()
reExecuteOb obPath myArgs = do
  putLog Debug "Re-executing..."
  void $ liftIO $ rawSystem obPath myArgs
In the main' function, setLocaleC is
called, binding the result to localeSet. In the “hand off”
case, nothing more needs to be done. In the --no-handoff
case, reExecuteOb is called if the locale was set.
if localeSet
  then reExecuteOb obPath myArgs
  else ob $ _args_command args'
It works as expected.