When Impossible Day Feels Impossible: /tmp Inode Exhaustion
Impossible Stuff Day is a special sprint at the Recurse Center devoted to working beyond the edge of your abilities. I had an ambitious day planned around MirageOS unikernels, but I made one critical mistake: I did not test my OCaml development environment with MirageOS beforehand.
But that’s no problem: There’s a nice guide to installing MirageOS, and it looks pretty simple. We can install MirageOS and confirm that the Hello World tutorial works. However, it very much does not:
opam-monorepo: [ERROR] Can't find all required versions.
Selected: noop-unix.zdev ocaml-base-compiler&noop-unix
- cmdliner-stdlib -> (problem)
No known implementations at all
(etc.)
That looks very bad! One thing that immediately came to mind is that most documents for MirageOS cover OCaml 4.x, but I am on OCaml 5.x. Let’s try the older version of OCaml first. Same error. Maybe there’s an issue with my whole switch setup, so let’s try on a clean user with nothing but the default switch.
[ERROR] Failed to extract archive /home/user/.opam/repo/default.tar.gz: "/usr/bin/tar xfz
/home/user/.opam/repo/default.tar.gz -C /tmp/opam-47813-6b24e2" exited with code 2.
Run `opam update --repositories default` to fix the issue
That looks even worse! Somehow I can’t even install OCaml at all now. Exit code 2 is a fatal tar error. Let’s try what it recommends:
$ opam update --repositories default
<><> Updating package repositories ><><><><><><><><><><><><><><><><><><><><><><>
[ERROR] Could not update repository "default": /usr/bin/opam: "mkdir" failed on /tmp/opam-48052-192552: No space left
on device
Here we go. However, I’m only using 75 MiB out of 1 GiB of space on /tmp.
State is hard, so let’s try one more time with a fresh install. This time, opam init does install the compiler! But I still get the same opam-monorepo error as before.
We’ve gotten one “good” error out of this process: The hint that /tmp is out of space. All the other errors were unhelpful: MirageOS’s make depends gave a generic error about no implementations found, which looks almost like a network failure, and tar gave a generic fatal error. I don’t blame tar: it’s operating under strong requirements for backwards compatibility, and it’s not easy to tweak error reporting. But opam-monorepo is probably failing to propagate an underlying opam error here. Fortunately, the good error about /tmp is all we need.
On Linux, there are actually two ways to run out of “space”: You can run out of blocks, and you can run out of inodes. Each file is associated with an inode, which stores relevant metadata for the file. If we aren’t out of block space, we must be out of inodes:
$ df -i /tmp
Filesystem     Inodes IUsed IFree IUse% Mounted on
tmpfs           44492 44492     0  100% /tmp
That’s the problem! OCaml’s opam uses /tmp as a staging area when performing various operations, and it was exhausting the inodes in /tmp. Unfortunately, there seems to be highly non-deterministic behavior lurking here: Some opam runs would exhaust the inodes, others would not. In fact, I’ve probably run into this error before with opam, but it was always fixed by cleaning up /tmp or rebooting. I only figured it out, and was forced to fix it, when opam-monorepo gave me mysterious but reproducible errors.
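The diagnosis above can be sketched as a few shell helpers: one prints block and inode usage side by side, one names the exhausted resource, and one counts entries per top-level item to show what is eating the inodes. This is only a sketch under assumptions: the helper names (space_report, diagnose, inode_hogs) are made up for illustration, and it assumes GNU coreutils and findutils.

```shell
# Sketch only: helper names are hypothetical; assumes GNU df, find, sort.

# Show block and inode usage for one path side by side.
space_report() {
    df --output=target,size,pcent,itotal,ipcent "${1:-/tmp}"
}

# Say which resource is exhausted, given block-use% and inode-use%.
diagnose() {
    blocks=${1%\%}
    inodes=${2%\%}
    # Some filesystems report "-" instead of a number; treat that as 0.
    case $blocks in ''|*[!0-9]*) blocks=0 ;; esac
    case $inodes in ''|*[!0-9]*) inodes=0 ;; esac
    if [ "$inodes" -ge 100 ]; then
        echo "out of inodes"
    elif [ "$blocks" -ge 100 ]; then
        echo "out of blocks"
    else
        echo "ok"
    fi
}

# Count entries under each top-level item of a directory, largest first.
# Every file, directory, or symlink costs one inode; hard links are counted
# once per name, so the numbers are an approximation.
inode_hogs() {
    dir=${1:-/tmp}
    for entry in "$dir"/* "$dir"/.[!.]*; do
        [ -e "$entry" ] || [ -L "$entry" ] || continue
        n=$(find "$entry" -xdev 2>/dev/null | wc -l)
        printf '%s\t%s\n' "$((n))" "$entry"
    done | sort -rn | head -n "${2:-10}"
}
```

With a nearly empty but inode-starved /tmp, something like diagnose 8% 100% reports "out of inodes", and inode_hogs /tmp would list the leftover staging directories first if they are the culprit.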
Now that we know the cause, how can we fix it? There’s another little rabbit hole here which I will skim over: /tmp is stored in RAM. Back in the day I remember manually setting this up in /etc/fstab, but there’s no fstab entry here. (Does anyone else pronounce “fstab” as “f-stab”, or is that just me?) Sounds like systemd shenanigans, and in fact it is.
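When there is no fstab entry to read, the kernel's own mount table still shows how /tmp is mounted. A minimal sketch, assuming Linux with /proc mounted; mount_opts is a hypothetical helper name, and the optional second argument exists only so the function can be pointed at a different table.

```shell
# Print the filesystem type and mount options for a mount point.
# Reads /proc/mounts by default; fields are: device, mount point, type, options.
mount_opts() {
    awk -v mp="$1" '$2 == mp { print $3, $4 }' "${2:-/proc/mounts}"
}

# Usage: mount_opts /tmp
```

On a systemd machine where a mount unit is responsible, systemctl cat tmp.mount should then show the unit file together with any overrides applied to it.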
The root cause is (in my opinion) a bug in my Linux distribution. In the process of overriding the systemd config file for mounting /tmp to increase the memory allocation, they forgot to also increase the inode allocation. This resulted in the (as far as I can see, undocumented) default of 44492 inodes, which is way too low. Problem fixed locally, and a pull request submitted upstream: https://github.com/QubesOS/qubes-core-agent-linux/pull/535
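For a quick local workaround, tmpfs accepts an nr_inodes mount option that can be changed on remount. The following is a minimal sketch with illustrative values, not the change from the pull request; tmpfs_remount_opts is a hypothetical helper, and the actual mount call is left as a comment because it needs root.

```shell
# Build the option string for remounting a tmpfs with new limits.
# $1 = size (e.g. 1G), $2 = inode limit (e.g. 1m); both values are examples.
tmpfs_remount_opts() {
    printf 'remount,size=%s,nr_inodes=%s\n' "$1" "$2"
}

# Needs root, so it is not executed here:
#   sudo mount -o "$(tmpfs_remount_opts 1G 1m)" /tmp
#   df -i /tmp    # confirm the new inode total
```

A remount only lasts until the next reboot. A persistent fix would put the same options into whatever mounts /tmp at boot, for example the Options= line of a systemd mount unit.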
As for the rest of Impossible Day: After spending far too much time debugging this, I didn’t get nearly as far as I hoped. Things seem a lot simpler when writing them out in a blog post than when trying to debug them live. I only had time to play around with using Lwt in MirageOS for async promises. But it was a good experience, and I’m not ready to give up on my broader ambitions yet.
In unrelated news, this blog is now hosted on a different VPS, since my $1/year deal on Gullo from last Black Friday has expired. I’ll be keeping an eye out for new extremely cheap options over the next week.