A Curious Moon in Podman or Docker
I recently started working through A Curious Moon, a wonderfully clever data science “mystery” in PostgreSQL. The setup in the book uses Postgres on bare metal, but I wanted to use Postgres in Podman, which is like Docker. One interesting suggestion in the book is to use a Makefile to organize ETL. But this needed a little massaging to work with Podman on Fedora, which I want to share here.
Here is my Compose file:
services:
  cassini-pg:
    image: postgres:17
    restart: unless-stopped
    container_name: cassini-pg
    environment:
      - POSTGRES_USER=cassini
      - POSTGRES_PASSWORD=super_secret_password
      - UID=1000
      - GID=1000
    volumes:
      - ./curious_data:/curious_data:z
      - ./scripts:/scripts:z
On a system with enforcing SELinux like Fedora, you’ll need to add the :z to the end of the volume mounts. This will ensure the volume is shared with the correct SELinux labels. Otherwise, you will receive a permissions error when trying to access the directory.
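If you want to confirm that the relabeling happened, you can look at the SELinux context of the host directories after the container has started. The following is a minimal sketch, not something from the book: it assumes GNU coreutils (for `ls -Z`) and assumes that relabeled bind mounts carry the `container_file_t` type, which is what I would expect on Fedora. The helper names `dir_context` and `has_container_label` are made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper: print the SELinux context of a host directory.
# Assumes GNU ls; on a system without SELinux the context shows as "?".
dir_context() {
    ls -Zd "$1" | awk '{print $1}'
}

# Hypothetical helper: succeed if the context string in $1 contains the
# container_file_t type (assumed to be what :z relabeling applies).
has_container_label() {
    case "$1" in
        *:container_file_t:*) return 0 ;;
        *) return 1 ;;
    esac
}

# Example usage on the host, from the directory holding the Compose file:
#   has_container_label "$(dir_context ./curious_data)" && echo "label ok"
```

If the check fails on a mounted directory, a missing :z on that volume line is the first thing to look at.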
And here is my Makefile:
CONTAINER=cassini-pg
USER=cassini
DB=enceladus
# LOCALPATH is a path on the host; DOCKERPATH is a path inside the container
LOCALPATH=${CURDIR}/
DOCKERPATH=/
SCRIPTS=scripts
CSV='/curious_data/data/master_plan.csv'
MASTER=$(SCRIPTS)/import.sql
NORMALIZE=$(SCRIPTS)/normalize.sql
BUILD=$(SCRIPTS)/build.sql

# None of these targets produces a file named after the target
.PHONY: all master import normalize clean createdb psql

# Load the assembled build script, then open a separate interactive session
all: normalize
	podman exec -it $(CONTAINER) psql $(DB) -U $(USER) -f $(DOCKERPATH)$(BUILD) && podman exec -it $(CONTAINER) psql $(DB) -U $(USER)

master:
	@cat $(LOCALPATH)$(MASTER) >> $(LOCALPATH)$(BUILD)

import: master
	@echo "COPY import.master_plan FROM $(CSV) WITH DELIMITER ',' HEADER CSV;" >> $(LOCALPATH)$(BUILD)

normalize: import
	@cat $(LOCALPATH)$(NORMALIZE) >> $(LOCALPATH)$(BUILD)

clean:
	@rm -rf $(LOCALPATH)$(BUILD)

createdb:
	podman exec -it $(CONTAINER) createdb $(DB) -U $(USER)

psql:
	podman exec -it $(CONTAINER) psql $(DB) -U $(USER)
A few things to note here. First, when using containers (at least at the start) you will often want to completely blow up and recreate a container. This means being prepared to re-run createdb. I wanted to make this a prerequisite of the all and psql targets, but it appears that Postgres does not have an easy-to-use command like CREATE DATABASE IF NOT EXISTS. So instead, error messages on a new container should point you to the need to re-run the createdb target.
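One possible workaround, which I have not wired into the Makefile, is to ask the pg_database catalog whether the database exists and only call createdb when it does not. Here is a minimal sketch under a few assumptions: the container, user, and database names are copied from the Makefile, the check connects to the default `postgres` database, and the function names `db_exists` and `ensure_db` are made up for illustration.

```shell
#!/bin/sh
# Assumed values, copied from the Makefile variables.
CONTAINER=cassini-pg
PGUSER_NAME=cassini
DB=enceladus

# Succeed if a database named $1 already exists in the container.
# -t prints tuples only, -A disables alignment, -c runs one command.
db_exists() {
    podman exec "$CONTAINER" psql -U "$PGUSER_NAME" -d postgres -tAc \
        "SELECT 1 FROM pg_database WHERE datname = '$1'" | grep -q 1
}

# Create the database only when the catalog check finds nothing.
ensure_db() {
    if db_exists "$1"; then
        echo "database $1 already exists"
    else
        podman exec "$CONTAINER" createdb -U "$PGUSER_NAME" "$1"
    fi
}

# Example usage against a running container:
#   ensure_db "$DB"
```

Because `ensure_db` is safe to run repeatedly, a script like this could in principle be called before loading the build script, instead of relying on error messages as a reminder.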
Second, the default make target should leave us in an interactive psql session. However, just passing a script to the container drops us into what appears to be a psql session (you can type in it), but it is not usable (“Enter” does not execute a command). Therefore, in the all target we first load the build script and then execute a separate psql interactive session.
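The same two-step pattern can be written as a small shell script, which makes the separation between "load" and "connect" a little easier to see. This is a sketch of a variation, not the Makefile's exact behavior: it assumes the same container, user, and database names, runs the script step without -it since nothing interactive happens there, and adds psql's ON_ERROR_STOP variable so a failing script does not drop you into a session. The function names are made up for illustration.

```shell
#!/bin/sh
# Assumed values, copied from the Makefile variables.
CONTAINER=cassini-pg
PGUSER_NAME=cassini
DB=enceladus

# Step 1: run a SQL script non-interactively. $1 is a path inside the
# container, for example /scripts/build.sql.
run_script() {
    podman exec "$CONTAINER" psql -U "$PGUSER_NAME" -d "$DB" \
        -v ON_ERROR_STOP=1 -f "$1"
}

# Step 2: open a separate interactive session with a terminal attached.
open_psql() {
    podman exec -it "$CONTAINER" psql -U "$PGUSER_NAME" -d "$DB"
}

# Only open the interactive session if the script loaded cleanly.
build_and_connect() {
    run_script "$1" && open_psql
}

# Example usage against a running container:
#   build_and_connect /scripts/build.sql
```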
Third, the paths are a bit messy because we need to use a mixture of paths outside of the container and paths within the container. This could be simplified by using absolute paths and adjusting the container volume mounts to be identical. I did not pursue this, however, because hard-coding absolute paths in the volumes seems fragile and exposes more information than is necessary to the container.
I look forward to sharing more of my journey through this book here on my blog. One more thing: The link to the data included in the book itself seems to be dead, but the data is still available here: https://archive.redfour.io/