My homelab includes a virtual machine, media01, which hosts several critical services. Among these is an Immich instance, a self-hosted photo and video management application. This service is used by my family to access media dating back to 2010.

Recently, an issue arose after I inadvertently removed the media01 VM from the domain using the ipa-client-install --uninstall command. To make matters worse, the most recent backup of the VM dated from April 5, 2024.

As a result, the Podman containers on the VM failed to start. While the data volume containing the actual photos and thumbnails remained unaffected and is backed up separately on a Hetzner Storage Box and a local HDD, the inability to launch the database container, immich_postgres, was a significant problem.

No database container, no Immich.

Investigating the Startup Failure

I started the container with podman start immich_postgres, examined its logs with podman logs -f --tail 30 immich_postgres, and found:

2024-12-27 15:45:09.628 UTC [13903] STATEMENT:  INSERT INTO "assets"("id", "deviceAssetId", "ownerId", "libraryId", "deviceId", "type", "status", "originalPath", "thumbhash", "encodedVideoPath", "createdAt", "updatedAt", "deletedAt", "fileCreatedAt", "localDateTime", "fileModifiedAt", "isFavorite", "isArchived", "isExternal", "isOffline", "checksum", "duration", "isVisible", "livePhotoVideoId", "originalFileName", "sidecarPath", "stackId", "duplicateId") VALUES (DEFAULT, $1, $2, $3, $4, $5, DEFAULT, $6, DEFAULT, DEFAULT, DEFAULT, DEFAULT, DEFAULT, $7, $8, $9, $10, $11, DEFAULT, DEFAULT, $12, $13, $14, DEFAULT, $15, DEFAULT, DEFAULT, DEFAULT) RETURNING "id", "status", "encodedVideoPath", "createdAt", "updatedAt", "deletedAt", "isFavorite", "isArchived", "isExternal", "isOffline", "isVisible"
2025-01-04 07:06:34.716 UTC [1] LOG:  received smart shutdown request
chmod: changing permissions of '/var/lib/postgresql/data': Operation not permitted
chmod: changing permissions of '/var/run/postgresql': Operation not permitted
find: ‘/var/lib/postgresql/data’: Permission denied
chown: changing ownership of '/var/lib/postgresql/data': Operation not permitted
chmod: changing permissions of '/var/lib/postgresql/data': Operation not permitted
chmod: changing permissions of '/var/run/postgresql': Operation not permitted
find: ‘/var/lib/postgresql/data’: Permission denied
chown: changing ownership of '/var/lib/postgresql/data': Operation not permitted

The logs pointed to a permission problem. During startup, the container could not run chmod and chown on the directories in its volume, so database initialization could not proceed.
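When the log is longer than this excerpt, a small filter can pull out just the affected paths. The sketch below assumes a POSIX shell with grep and sort. The function name permission_errors is hypothetical. It keeps only the "Operation not permitted" and "Permission denied" lines and prints the distinct absolute paths they mention.

```shell
# Hypothetical helper: list the distinct paths named on permission-failure
# lines of a log read from stdin (e.g. piped from `podman logs ... 2>&1`).
permission_errors() {
  # Keep only the lines reporting a permission failure...
  grep -E 'Operation not permitted|Permission denied' |
    # ...extract the absolute paths they mention...
    grep -oE '/[[:alnum:]_./-]+' |
    # ...and print each path once.
    sort -u
}
```

Usage: podman logs --tail 30 immich_postgres 2>&1 | permission_errors. For the excerpt above this reports /var/lib/postgresql/data and /var/run/postgresql, both paths inside the container.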

Consequences of Database Reinitialization

While the media files were secure, starting with a new database would result in the loss of face recognition data generated by Immich’s machine learning. Recreating user accounts, shared albums, and other configurations would also be necessary.

Initial Troubleshooting Steps

Attempts to resolve the issue by changing the ownership of the container volume to the expected user and group (container-service-account:container-service-account) were unsuccessful. Similarly, setting world-writable permissions using chmod 777 did not resolve the problem.
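Both attempts changed the host-side owner and mode. What matters to the container, however, is how host UIDs map into it. Under rootless Podman's default mapping, UID 0 inside the container corresponds to the account's own host UID. Container UIDs from 1 upward map into the account's subordinate range. The postgres user in the official postgres images usually has UID 999, and this sketch assumes the tensorchord image keeps that. The function container_to_host_uid is hypothetical. It computes where a container UID lands on the host, given a subuid-format file. If the account has no subordinate range, container UIDs other than 0 have no host counterpart. Files owned by unmapped host UIDs then appear unowned inside the container, and even container root cannot chmod or chown them. That is consistent with the errors above.

```shell
# Hypothetical helper: map a rootless-container UID to its host UID using the
# default rootless Podman layout (assumed):
#   container 0      -> the account's own host UID
#   container n >= 1 -> subuid_start + n - 1   (only while n <= subuid_count)
# Arguments: subuid-file user host-uid container-uid
container_to_host_uid() {
  local subid_file=$1 user=$2 host_uid=$3 cuid=$4
  if [ "$cuid" -eq 0 ]; then
    echo "$host_uid"
    return 0
  fi
  # Each subuid(5) line reads name:start:count; lines keyed by numeric UID
  # are not handled in this sketch. Exit status is 1 when no range matches.
  awk -F: -v u="$user" -v n="$cuid" '
    $1 == u && n <= $3 { printf "%.0f\n", $2 + n - 1; found = 1; exit }
    END { exit !found }
  ' "$subid_file"
}
```

For example, with a line container-service-account:200000:65536, container UID 999 lands on host UID 200998, not on the account's own UID.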

This led me to look into user namespaces and subordinate IDs (subIDs), as documented in the subuid man page (https://man7.org/linux/man-pages/man5/subuid.5.html). The subIDs for container-service-account were managed centrally through IPA and SSSD. Removing the VM from the IPA domain therefore most likely left the account without any subordinate ID ranges.

Resolution Procedure

The following steps were taken to rectify the situation:

1. VM Backup

As a precaution, I created a backup of the VM's current state so that any change could be rolled back while I worked on the fix.

2. Local User Removal

The local user account (container-service-account), used by the container before domain integration, was removed using userdel container-service-account.
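A local account matters because of lookup order. This sketch assumes /etc/nsswitch.conf lists files before sss for passwd on this host. In that case a local entry with the same name shadows the IPA account and its UID. The function has_local_entry is hypothetical. It checks a passwd-format file for an exact user-name match.

```shell
# Hypothetical helper: succeed if the passwd-format file $1 contains a local
# entry whose first field is exactly the user name $2.
has_local_entry() {
  awk -F: -v u="$2" '$1 == u { found = 1; exit } END { exit !found }' "$1"
}
```

Usage: has_local_entry /etc/passwd container-service-account && echo "local entry present". Note that userdel without -r leaves the home directory in place. For a rootless Podman user, that directory normally also holds the container storage under ~/.local/share/containers.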

3. Forced IPA Domain Rejoin

The VM was rejoined to the IPA domain using the following command:

ipa-client-install --domain=xxxxx.apigban.com --server=ipa.xxxxx.apigban.com --realm=xxxxx.APIGBAN.COM -U --force-join

4. Centralized SubID Re-enablement

Centralized management of subIDs was re-enabled by adding sss to the subid line in /etc/nsswitch.conf:

...
subid:   sss
...

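A hand edit is enough here, but a repeatable version might look like the sketch below. The function set_subid_sss is hypothetical. It rewrites an existing subid: line, or appends one if none exists, so the line reads "subid:   sss". It assumes GNU sed for in-place editing, and it is worth trying on a copy of the file first.

```shell
# Hypothetical helper: ensure the nsswitch.conf-format file $1 has exactly
# one active "subid:   sss" line (commented-out lines are left alone).
set_subid_sss() {
  local conf=$1
  if grep -qE '^[[:space:]]*subid:' "$conf"; then
    # Replace the existing subid line in place (GNU sed -i assumed).
    sed -i -E 's/^[[:space:]]*subid:.*/subid:   sss/' "$conf"
  else
    # No subid line yet: append one.
    printf 'subid:   sss\n' >> "$conf"
  fi
}
```

Once the change is applied to /etc/nsswitch.conf, getsubids container-service-account (from shadow-utils) should list the range served through SSSD, and getsubids -g does the same for group IDs.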
5. Verification and Removal of Local SubID Configurations

I made sure that /etc/subuid and /etc/subgid had no entries related to container-service-account.
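To make this check repeatable, the hypothetical check_no_local_subids helper prints any line naming a given user in subuid- or subgid-format files. It returns non-zero if it finds one and skips files that do not exist. It matches only the user name, not a numeric UID, which subuid(5) also allows.

```shell
# Hypothetical helper: print lines for user $1 in the given subuid/subgid-format
# files; succeed only when none are found.
check_no_local_subids() {
  local user=$1 file status=0
  shift
  for file in "$@"; do
    # A host without the file simply has no local entries.
    [ -e "$file" ] || continue
    # Lines read name:start:count; report any whose name field matches.
    if awk -F: -v u="$user" '$1 == u { print FILENAME ": " $0; found = 1 } END { exit !found }' "$file"; then
      status=1
    fi
  done
  return "$status"
}
```

Usage: check_no_local_subids container-service-account /etc/subuid /etc/subgid && echo "no local subID entries".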

6. Automated Backup Configuration

To prevent future data loss, automated backups for the VM were configured.

[Figure: PBS backup configuration for media01]

7. VM Restart and Verification

I restarted the VM, logged in as container-service-account, and ran podman ps to check the status of the containers.

The containers, including immich_postgres, were running:

CONTAINER ID  IMAGE                                                  COMMAND               CREATED       STATUS         PORTS                   NAMES
877af9be8b20  registry.hub.docker.com/tensorchord/pg...              postgres -c share...  9 months ago  Up 53 minutes                          immich_postgres
2d0972e9c5e3  registry.hub.docker.com/library/redis@sha256:51d6c...  redis-server          9 months ago  Up 53 minutes                          immich_redis
b20d9b6d407d  ghcr.io/immich-app/immich-server:v1.118.0              start.sh              2 months ago  Up 52 minutes  0.0.0.0:2283->2283/tcp  immich_server
71b2999c35ee  ghcr.io/immich-app/immich-machine-learning:v1.118.0    ./start.sh            2 months ago

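The status check can also be scripted instead of read by eye. The command podman ps -a --format '{{.Names}} {{.Status}}' prints one container name and status per line, and -a includes stopped containers. The function all_running is hypothetical. It reads that listing and reports every expected container whose status does not begin with "Up".

```shell
# Hypothetical helper: read "<name> <status...>" lines on stdin and fail,
# naming each offender, unless every container given as an argument is "Up".
all_running() {
  local listing name rc=0
  listing=$(cat)
  for name in "$@"; do
    # A container counts as running when its line reads "<name> Up ...".
    if ! printf '%s\n' "$listing" | awk -v n="$name" '$1 == n && $2 == "Up" { ok = 1 } END { exit !ok }'; then
      echo "not running: $name"
      rc=1
    fi
  done
  return "$rc"
}
```

Usage: podman ps -a --format '{{.Names}} {{.Status}}' | all_running immich_postgres immich_redis immich_server.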
8. Web Application Verification

Accessing the Immich web interface confirmed the service was functioning correctly.


Key Takeaways

This incident highlights several important considerations:

  • Domain Integration Impact: When container accounts and their subIDs come from a domain such as IPA, leaving the domain, even by accident, can remove those mappings and surface as permission errors inside the containers.
  • Understanding User Namespaces: When containers hit “Operation not permitted” errors, particularly after domain changes, understanding user namespace remapping is crucial for an effective diagnosis.
  • Importance of Backups: Consistent and automated backups are essential to mitigate data loss.

Future Actions

To improve the management and documentation of this setup, I’m working towards the following actions:

  • Develop an Immich Podman Ansible Role: Automating the deployment and configuration of the Immich containers using Ansible will enhance reproducibility and provide self-documentation.
  • Consider Database Migration to an Ansible-Managed VM: Moving the database to a dedicated VM managed by Ansible could offer greater control over backups and recovery procedures.