
Nix Terminfo and Locale Archive

While trying to use the latest version of Obelisk, I ran into “invalid character” errors when using the --verbose option. I traced the error on the Haskell side to the printing of Unicode characters to the screen, and replacing those characters with ASCII characters got me past it. The corresponding issue has not been updated yet, but a friend relayed some information about the cause as well as some workarounds. This blog entry gives my interpretation of the problem, so any errors are my own.

Unless you are running NixOS, your system contains software from both the host OS and Nix. Nix processes inherit environment variables by default, but these environment variables may reference things that are not configured or installed under Nix. Many issues can be avoided by keeping the host OS and Nix separate, but some things are difficult to separate. In particular, a terminal provided by the host is often used to run Nix commands, and the configuration for that user interface may not match the Nix environment.

I first ran into this kind of issue when the backspace key did not work inside nix-shell. The problem was that Nix did not have terminfo entries for the terminal that I was using. I initially got around that issue by setting TERM=xterm every time I ran nix-shell. A friend later taught me a different workaround that is less annoying. On my system, the terminfo database is located at /usr/lib/terminfo, but the software also checks ~/.terminfo! By copying the necessary files from /usr/lib/terminfo to ~/.terminfo, Nix processes can find them.
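The copy step can be sketched in Haskell. This is only an illustration, not something from the original workaround: the names entryPath and copyTerminfoEntry are hypothetical, and it assumes the common ncurses layout on Linux in which each entry lives in a subdirectory named after the first character of the terminal name (some systems use a different layout).

```haskell
import Control.Monad (when)
import System.Directory
  ( copyFile, createDirectoryIfMissing, doesFileExist, getHomeDirectory )
import System.FilePath (takeDirectory, (</>))

-- | Relative path of a terminfo entry inside a database directory.
-- Assumption: first-character layout, e.g. "x/xterm-256color".
entryPath :: String -> Maybe FilePath
entryPath ""           = Nothing
entryPath term@(c : _) = Just ([c] </> term)

-- | Copy the entry for the given terminal name from a source database
-- (for example /usr/lib/terminfo) into ~/.terminfo.
-- Returns True if the source file existed and was copied.
copyTerminfoEntry :: FilePath -> String -> IO Bool
copyTerminfoEntry srcDb term = case entryPath term of
  Nothing  -> pure False
  Just rel -> do
    home <- getHomeDirectory
    let src = srcDb </> rel
        dst = home </> ".terminfo" </> rel
    exists <- doesFileExist src
    when exists $ do
      -- Create ~/.terminfo/<first char>/ if it is missing.
      createDirectoryIfMissing True (takeDirectory dst)
      copyFile src dst
    pure exists
```

For example, copyTerminfoEntry "/usr/lib/terminfo" "xterm-256color" would copy a single entry, which keeps ~/.terminfo limited to the terminals actually in use.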

Why not install these terminfo files in the Nix environment? The files that I need are specific to the terminal that I use, while other developers likely use different terminals. Perhaps all terminfo files could be installed in the Nix environment, but that increases the size of the closure. It would also be unsatisfactory to include OS-level stuff that is unrelated to the software that a derivation provides.

The “invalid character” errors are caused by a similar issue. My locale environment variables were set to en_US.UTF-8, but that locale was not available within the Nix environment. One workaround is to use the C.UTF-8 locale instead for both the LANG and LC_CTYPE environment variables. This is similar to how I set the TERM environment variable to work around terminal issues. I tested this and confirmed that this does indeed work around the issue, and I saw Unicode characters in the output.
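To see which encoding a Haskell program actually ends up with, one can inspect the locale encoding and the encoding of stdout. The following is a minimal sketch using functions from base; the name describeEncodings is hypothetical. GHC text handles encode output with the encoding derived from the locale, so when the requested locale cannot be loaded the program may fall back to an ASCII-only encoding, and writing a non-ASCII character then fails.

```haskell
import GHC.IO.Encoding (getLocaleEncoding)
import System.IO (hGetEncoding, stdout)

-- | Report the encoding derived from the locale and the encoding
-- currently attached to stdout (Nothing means binary mode).
describeEncodings :: IO String
describeEncodings = do
  localeEnc <- getLocaleEncoding
  stdoutEnc <- hGetEncoding stdout
  pure $ unlines
    [ "locale encoding: " ++ show localeEnc
    , "stdout encoding: " ++ maybe "binary" show stdoutEnc
    ]
```

Running describeEncodings >>= putStr inside and outside the Nix environment, with and without LANG=C.UTF-8, should make the difference between the two setups visible.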

Another way to work around the issue is to use the LOCALE_ARCHIVE environment variable. The documentation indicates that it exists to provide a way to specify a locale archive when using different versions of glibc, but it can also be used to point to a locale archive on the host system, usually located at /usr/lib/locale/locale-archive. I tested this and confirmed that it works as well.
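As a rough sketch of this workaround, a wrapper could start a command with LOCALE_ARCHIVE pointing at the host archive when that file exists. The names withLocaleArchive and runWithHostLocales are hypothetical, and the archive path is the usual host location mentioned above, which may differ on other systems. The variable is set for a child process because it has to be in place before the program that needs it starts.

```haskell
import System.Directory (doesFileExist)
import System.Environment (getEnvironment)
import System.Exit (ExitCode)
import System.Process
  ( CreateProcess (..), createProcess, proc, waitForProcess )

-- | Add LOCALE_ARCHIVE to an environment list, replacing any old value.
withLocaleArchive :: FilePath -> [(String, String)] -> [(String, String)]
withLocaleArchive archive vars =
  ("LOCALE_ARCHIVE", archive) : filter ((/= "LOCALE_ARCHIVE") . fst) vars

-- | Run a command with LOCALE_ARCHIVE pointing at the host archive,
-- provided that the archive file exists; otherwise run it unchanged.
runWithHostLocales :: FilePath -> [String] -> IO ExitCode
runWithHostLocales cmd args = do
  let archive = "/usr/lib/locale/locale-archive" -- assumed host location
  exists <- doesFileExist archive
  vars   <- getEnvironment
  let vars' = if exists then withLocaleArchive archive vars else vars
  (_, _, _, ph) <- createProcess (proc cmd args) { env = Just vars' }
  waitForProcess ph
```

Keeping withLocaleArchive pure makes it easy to check the resulting environment without starting a process.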

What can the Obelisk project do to address the issue? It is a general Nix issue, so perhaps it is best to just document it in the README or FAQ. Alternatively, the project could include the glibc locales package (glibcLocales in nixpkgs, defined in pkgs/development/libraries/glibc/locales.nix) with allLocales set to true so that all locales are installed. This is analogous to installing all terminfo files, and the documentation indicates that it takes about 100MB of space.

If locale information is not needed, however, a hack that would be transparent to users is to have the ob program set the locale environment variables to C.UTF-8. This cannot simply be done when the program starts, because the environment variables are read before the Haskell code runs. Instead, the program can set them before “hand off” or, when --no-handoff is used, set them and then re-execute itself.

This hack is easy to implement, so I tried it out. The commit is on GitHub, but I will not link to it because this blog entry will outlive the repository branch. I shall instead include the code here.

The setLocaleC function checks locale environment variables and sets them to C.UTF-8 when necessary, returning True if any were set.

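-- Uses forM and unless (Control.Monad), liftIO (Control.Monad.IO.Class),
-- lookupEnv and setEnv (System.Environment), and Data.Text imported as T.
setLocaleC :: MonadObelisk m => m Bool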
setLocaleC = fmap or . forM envVars $ \envVar -> do
  alreadySet <- liftIO $ (== Just locale) <$> lookupEnv envVar
  unless alreadySet $ do
    putLog Debug $ T.pack $ unwords ["Setting", envVar, "to", locale]
    liftIO $ setEnv envVar locale
  pure $ not alreadySet
  where
    envVars = ["LANG", "LC_CTYPE"]
    locale  = "C.UTF-8"

The reExecuteOb function re-executes the ob command.

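-- Runs ob again as a child process via rawSystem (System.Process) and waits
-- for it to finish. Note that void discards the child's exit code.
reExecuteOb :: MonadObelisk m => FilePath -> [String] -> m ()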
reExecuteOb obPath myArgs = do
  putLog Debug "Re-executing..."
  void $ liftIO $ rawSystem obPath myArgs

In the main' function, setLocaleC is called, binding the result to localeSet. In the “hand off” case, nothing more needs to be done. In the --no-handoff case, reExecuteOb is called if the locale was set.

if localeSet
  then reExecuteOb obPath myArgs
  else ob $ _args_command args'

It works as expected.