Obelisk Memory Limiting

As described in the Laptop Issues blog entry, I recently upgraded my laptop to 48GB of RAM because my work requires a lot of memory and my system would often freeze with only 16GB. I figured that I would no longer have memory issues, and I have indeed been enjoying having ample memory, but I ran out of memory this morning! I really do not want to crash my whole session when this happens, so I decided to start limiting RAM usage using cgroups.

When I run out of memory, the culprit is always the same. The project uses Obelisk, part of the Reflex FRP ecosystem. I run ob shell to enter a shell that has the project dependencies, courtesy of Nix. Running ob run within that environment runs the project in development mode. It uses ghcid to build and run the project, so that it automatically reloads whenever there is a change to a source file. It displays warnings and errors, so you can keep an eye on it while you develop and test. Compilation is done using GHC, the Haskell compiler. The ghc process consumes all of the memory. Usually, the kernel “OOM killer” is able to catch the issue, kill all of my processes, and dump me to a console login screen.

The goal is to contain the Obelisk environment using cgroups, setting memory limits so that the OOM killer just kills those processes instead of my whole session.

I use a cgroup named obelisk:

CGROUP_NAME="obelisk"

Write access to the cgroup.procs file of the parent cgroup is required in order to execute the initial cgroup process as a normal user (not root), so I create the new cgroup under my user slice.

USER_SLICE_PATH="user.slice/user-${UID}.slice"
USER_SLICE_DIR="/sys/fs/cgroup/${USER_SLICE_PATH}"
USER_SLICE_PROCS_FILE="${USER_SLICE_DIR}/cgroup.procs"

CGROUP_PATH="${USER_SLICE_PATH}/${CGROUP_NAME}"
CGROUP_DIR="/sys/fs/cgroup/${CGROUP_PATH}"
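The memory.high and memory.max files used later in this entry exist only in cgroup v2, so it may be worth checking for the unified hierarchy before going further. The following is a minimal sketch, not part of my script: the is_cgroup_v2 function name is made up for illustration, and it relies on the cgroup.controllers file being present at the root of a cgroup v2 mount.

```shell
# Hypothetical helper (not part of the original script): succeed if the
# given directory looks like the root of a cgroup v2 hierarchy, which is
# identified here by the presence of a cgroup.controllers file.
is_cgroup_v2() (
  root="${1:-/sys/fs/cgroup}"
  [ -f "${root}/cgroup.controllers" ]
)
```

For example, `is_cgroup_v2 || echo "cgroup v2 not found" >&2` prints a warning when the check fails.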

Administrative access is required to create the new cgroup, but my normal user is granted permissions to administer it.

echo "Creating cgroup ${CGROUP_PATH}..."
sudo cgcreate -a "${USER}" -t "${USER}" -g "memory:${CGROUP_PATH}"

Administrative access is also required to change ownership of the cgroup.procs file of the parent cgroup.

echo "Setting owner of parent cgroup.procs file..."
sudo chown "${USER}" "${USER_SLICE_PROCS_FILE}"

The memory limits are set as follows. The memory.high setting sets the memory usage throttle limit. If/when the cgroup exceeds this threshold, the kernel throttles the processes and puts them under heavy reclaim pressure. The memory.max setting sets the hard limit on memory usage. If/when the cgroup hits this limit and usage cannot be reduced by reclaim, the OOM killer kills processes in the cgroup. The memory.swap.high and memory.swap.max settings apply the same throttle and hard limit scheme to swap usage.

echo -n "Setting memory limit (soft): "
echo "32G" | tee "${CGROUP_DIR}/memory.high"

echo -n "Setting memory limit (hard): "
echo "36G" | tee "${CGROUP_DIR}/memory.max"

echo -n "Setting swap limit (soft): "
echo "128M" | tee "${CGROUP_DIR}/memory.swap.high"

echo -n "Setting swap limit (hard): "
echo "256M" | tee "${CGROUP_DIR}/memory.swap.max"
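To confirm that the limits took effect, the files can be read back. The following is a minimal sketch under that assumption; the show_cgroup_limits function name is hypothetical and not part of my script.

```shell
# Hypothetical helper (not part of the original script): print each memory
# limit file in a cgroup directory together with its current value.
show_cgroup_limits() (
  cgroup_dir="$1"
  for name in memory.high memory.max memory.swap.high memory.swap.max; do
    if [ -r "${cgroup_dir}/${name}" ]; then
      printf '%s: %s\n' "${name}" "$(cat "${cgroup_dir}/${name}")"
    else
      printf '%s: (not readable)\n' "${name}"
    fi
  done
)
```

It would be called as `show_cgroup_limits "${CGROUP_DIR}"`. The kernel reports the values in bytes, so a limit written as 32G reads back as 34359738368.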

In my project directory, I can run a bash shell in the new cgroup, as my normal user.

$ cgexec -g memory:user.slice/user-1331.slice/obelisk bash
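A quick way to check that the new shell really landed in the cgroup is to read /proc/self/cgroup, which on cgroup v2 contains a line of the form 0::/path/to/cgroup. The following is a minimal sketch; the cgroup_v2_path function name is hypothetical.

```shell
# Hypothetical helper (not part of the original script): print the cgroup v2
# path recorded in a /proc/<pid>/cgroup style file. With no argument, it
# reads /proc/self/cgroup, which reflects the cgroup inherited from the
# calling shell.
cgroup_v2_path() (
  cgroup_file="${1:-/proc/self/cgroup}"
  # The cgroup v2 entry has hierarchy ID 0 and an empty controller list.
  sed -n 's/^0:://p' "${cgroup_file}"
)
```

Run inside the shell started above, `cgroup_v2_path` should print the path of the obelisk cgroup under my user slice if everything worked.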

The systemctl status command can be used to show the cgroup hierarchy. Since I run bash as the initial process, my hierarchy looks like the following:

/
├─init.scope
│ └─1 /sbin/init
├─system.slice
│ ├─...
│ ...
└─user.slice
  └─user-1331.slice
    ├─obelisk
    │ └─417624 bash
    ├─...
    ...

Within that shell, I can run ob shell and ob run as usual. With everything running, my hierarchy looks like the following:

/
├─init.scope
│ └─1 /sbin/init
├─system.slice
│ ├─...
│ ...
└─user.slice
  └─user-1331.slice
    ├─obelisk
    │ ├─417624 bash
    │ ├─435360 /home/tcard/.nix-profile/bin/ob shell
    │ ├─443578 ./.obelisk/impl/.attr-cache/command.out/bin/ob --no-handoff shell
    │ ├─443606 bash --rcfile /tmp/nix-shell-443606-0/rc
    │ ├─443651 /home/tcard/.nix-profile/bin/ob run
    │ ├─451869 ./.obelisk/impl/.attr-cache/command.out/bin/ob --no-handoff run
    │ ├─452114 bash /run/user/1331/nix-shell-452114-0/rc
    │ ├─452147 /nix/store/XXXXXXXX-ghcid-0.8/bin/ghcid ...
    │ └─452152 /nix/store/XXXXXXXX-ghc-8.6.5/lib/ghc-8.6.5/bin/ghc ...
    ├─...
    ...

The problematic ghc process is run within the new cgroup, so it should now only crash that cgroup instead of my whole session if/when it consumes excess memory.
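If ghc does get killed, the memory.events file of the cgroup can show whether the OOM killer was responsible, since it keeps counters such as high, max, oom, and oom_kill. The following is a minimal sketch; the cgroup_oom_kills function name is hypothetical and not part of my script.

```shell
# Hypothetical helper (not part of the original script): print the oom_kill
# counter from the memory.events file of a cgroup directory. Each line of
# that file has the form "<key> <count>".
cgroup_oom_kills() (
  events_file="$1/memory.events"
  awk '$1 == "oom_kill" { print $2 }' "${events_file}"
)
```

Calling `cgroup_oom_kills "${CGROUP_DIR}"` prints the number of processes in the cgroup that the OOM killer has killed so far. A value that increases after ghc disappears suggests that the memory.max limit was hit.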

Author

Travis Cardwell

Published

January 30, 2023