Daniel Moerner

When Impossible Day Feels Impossible: /tmp Inode Exhaustion

Impossible Stuff Day is a special sprint at the Recurse Center devoted to working beyond the edge of your abilities. I had an ambitious day planned around MirageOS unikernels, but I made one critical mistake: I did not test my OCaml development environment with MirageOS beforehand.

But that’s no problem: There’s a nice guide to installing MirageOS, and it looks pretty simple. We should be able to install MirageOS and confirm that the Hello World tutorial works. However, it very much does not:

opam-monorepo: [ERROR] Can't find all required versions.
Selected: noop-unix.zdev ocaml-base-compiler&noop-unix
- cmdliner-stdlib -> (problem)
    No known implementations at all
(etc.)

That looks very bad! One thing that immediately came to mind is that most MirageOS documentation covers OCaml 4.x, but I am on OCaml 5.x. Let’s try an older version of OCaml first. Same error. Maybe there’s an issue with my whole switch setup, so let’s try as a clean user with nothing but the default switch.

[ERROR] Failed to extract archive /home/user/.opam/repo/default.tar.gz: "/usr/bin/tar xfz
        /home/user/.opam/repo/default.tar.gz -C /tmp/opam-47813-6b24e2" exited with code 2.
        Run `opam update --repositories default` to fix the issue

That looks even worse! Somehow I can’t even install OCaml at all now. Exit code 2 is tar’s generic fatal error. Let’s try what opam recommends:

$ opam update --repositories default

<><> Updating package repositories ><><><><><><><><><><><><><><><><><><><><><><>
[ERROR] Could not update repository "default": /usr/bin/opam: "mkdir" failed on /tmp/opam-48052-192552: No space left
        on device

Here we go. However, I’m only using 75 MiB out of 1 GiB of space on /tmp. State is hard, so let’s try one more time with a fresh install. This time, opam init does install the compiler! But I still get the same opam-monorepo error as before.

We’ve gotten one “good” error out of this process: The hint that /tmp is out of space. All the other errors were unhelpful: MirageOS’s make depends gave a generic error about no implementations being found, which looks almost like a network failure, and tar gave a generic fatal error. I don’t blame tar: it’s operating under strong requirements for backwards compatibility, and it’s not easy to tweak error reporting. But opam-monorepo is probably failing to propagate an underlying opam error here. Fortunately, the good error about /tmp is all we need.

On Linux, there are actually two ways to run out of “space”: You can run out of blocks, and you can run out of inodes. Each file is associated with an inode, which stores the file’s metadata, and a filesystem only has so many inodes to hand out, no matter how small the files are. If we aren’t out of block space, we must be out of inodes:

$ df -i /tmp
Filesystem     Inodes IUsed IFree IUse% Mounted on
tmpfs           44492 44492     0  100% /tmp
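A check like this can be scripted. Here is a minimal sketch, assuming the GNU coreutils column layout shown above (Inodes, IUsed, IFree, IUse%, mount point last). The function name inodes_exhausted is made up for illustration, and mount points containing spaces are not handled.

```shell
# Read `df -i` output on stdin and print every mount point that has an
# inode limit (Inodes > 0) but no free inodes left (IFree == 0).
# Exits 0 if at least one such filesystem is found, 1 otherwise.
# Assumes GNU df column order; the function name is hypothetical.
inodes_exhausted() {
    awk 'NR > 1 && $2 > 0 && $4 == 0 { print $NF; found = 1 }
         END { exit !found }'
}

# Feeding it the output from this post prints "/tmp":
inodes_exhausted <<'EOF'
Filesystem     Inodes IUsed IFree IUse% Mounted on
tmpfs           44492 44492     0  100% /tmp
EOF
```

On a live system the same function can be fed directly, as in `df -i | inodes_exhausted`. The `$2 > 0` guard skips filesystems that report no inode limit at all.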

That’s the problem! OCaml’s opam uses /tmp as a staging area when performing various operations, and it was exhausting the inodes in /tmp. Unfortunately, there seems to be highly non-deterministic behavior lurking here: Some opam runs would exhaust the inodes, others would not. In fact, I’ve probably run into this error before with opam, but it was always fixed by cleaning up /tmp or rebooting. I only figured it out, and was forced to fix it, when opam-monorepo gave me mysterious but reproducible errors.
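To see what is actually eating the inodes, it helps to count filesystem entries under each top-level entry of /tmp, since every file, directory, and symlink costs one inode. The following is a rough sketch rather than anything opam provides; the function name inode_hogs is hypothetical. I would expect leftover staging directories like the /tmp/opam-47813-6b24e2 from the error above to float to the top, but that is an assumption.

```shell
# List the top-level entries of a directory (default: /tmp) ranked by how
# many filesystem entries, and therefore roughly how many inodes, each holds.
# Usage: inode_hogs [dir] [max_lines]   (function name is hypothetical)
inode_hogs() {
    dir="${1:-/tmp}"
    for entry in "$dir"/* "$dir"/.[!.]*; do
        # Skip glob patterns that matched nothing; keep dangling symlinks.
        [ -e "$entry" ] || [ -L "$entry" ] || continue
        # find prints the entry itself plus everything beneath it,
        # without crossing into other filesystems (-xdev).
        count=$(find "$entry" -xdev 2>/dev/null | wc -l | tr -d ' ')
        printf '%s\t%s\n' "$count" "$entry"
    done | sort -rn | head -n "${2:-10}"
}
```

Running `inode_hogs /tmp` prints one line per entry, with the count, a tab, and the path, largest first. Hard links make the count an overestimate, which is fine for spotting the culprit.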

Now that we know the cause, how can we fix it? There’s another little rabbit hole here which I will skim over: /tmp is stored in RAM. Back in the day I remember manually setting this up in /etc/fstab, but there’s no fstab entry here. (Does anyone else pronounce “fstab” as “f-stab”, or is that just me?) Sounds like systemd shenanigans, and in fact it is.

The root cause is (in my opinion) a bug in my Linux distribution, Qubes OS. When they overrode the systemd unit that mounts /tmp in order to increase its memory allocation, they forgot to also increase the inode allocation. That left /tmp with what appears to be the kernel’s tmpfs default, which on my machine worked out to 44492 inodes: way too low. I fixed the problem locally and submitted a pull request upstream: https://github.com/QubesOS/qubes-core-agent-linux/pull/535
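The number 44492 has a plausible origin. The kernel’s tmpfs documentation gives the default for the nr_inodes mount option as half the number of physical RAM pages (on machines without highmem). The sketch below does that arithmetic. The 355936 KiB figure is back-computed from 44492 with 4 KiB pages and is purely hypothetical: I don’t know how much RAM was visible when /tmp was mounted. The function name is also made up.

```shell
# Estimate the tmpfs default inode limit: half the number of RAM pages.
# $1: total RAM in KiB (e.g. the MemTotal value from /proc/meminfo)
# $2: page size in KiB (assumed to be 4, which is typical on x86-64)
default_tmpfs_inodes() {
    mem_kib="$1"
    page_kib="${2:-4}"
    echo $(( mem_kib / page_kib / 2 ))
}

# Hypothetical input: about 348 MiB of RAM visible at mount time.
default_tmpfs_inodes 355936   # prints 44492

# One way to raise the limit on a live system (an assumption, not
# necessarily the fix from the pull request; needs root; the value is
# only illustrative):
#   mount -o remount,nr_inodes=1m /tmp
```

Because the default scales with RAM, a machine that mounts /tmp while little memory is available ends up with few inodes, even if the size= option allows for plenty of block space.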

As for the rest of Impossible Day: After spending far too much time debugging this, I didn’t get nearly as far as I had hoped. Things seem a lot simpler when writing them out in a blog post than when trying to debug them live. I only had time to play around with using Lwt in MirageOS for async promises. But it was a good experience, and I’m not ready to give up on my broader ambitions yet.

In unrelated news, this blog is now hosted on a different VPS, since my $1/year deal on Gullo from last Black Friday has expired. I’ll be keeping an eye out for new extremely cheap options over the next week.