# Longhorn notes

## Migration from NFS to Longhorn

1. Delete the workload, and delete the PV and PVC that use NFS.
2. Create Longhorn volumes as described below.
3. Copy the NFS data from lewis.dmz to local disk.
4. Spin up a temporary pod and mount the Longhorn volume(s) in it:
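   ```nix
   {
     # Temporary pod, only used to copy data into the Longhorn volume(s).
     pods.testje.spec = {
       containers.testje = {
         image = "nginx";

         volumeMounts = [
           {
             name = "uploads";
             mountPath = "/hedgedoc/public/uploads";
           }
         ];
       };

       volumes = {
         # "uploads" is referenced by volumeMounts above; claimName is the PVC
         # bound to the Longhorn volume.
         uploads.persistentVolumeClaim.claimName = "hedgedoc-uploads";
       };
     };
   }
   ```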
5. Use `kubectl cp` to copy the data from the local disk to the pod.
6. Delete the temporary pod.
7. Be sure to set the group ownership of the mount to the correct GID.
8. Create the workload with updated volume mounts.
9. Delete the data from the local disk.
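Steps 5 and 6 could be scripted roughly as follows. This is a minimal sketch, not part of our tooling. The namespace `default` and the local directory `./hedgedoc-uploads` are hypothetical placeholders, while the pod name and mount path match the kubenix example in step 4. By default it only prints the `kubectl` commands. Pass `dry_run=False` to actually run them. `kubectl cp` requires a `tar` binary in the container, and its handling of an already existing target directory is worth verifying. For that reason, the sketch lists each mount with numeric UIDs/GIDs before the pod is deleted, which also helps with the GID check.

```python
import shlex
import subprocess

# Hypothetical values: pod name and mount path follow the kubenix example,
# namespace and local directory are placeholders.
POD = "testje"
NAMESPACE = "default"
COPIES = {
    # local directory (copied from NFS) -> mount path inside the pod
    "./hedgedoc-uploads": "/hedgedoc/public/uploads",
}


def migration_commands(pod: str, namespace: str, copies: dict[str, str]) -> list[list[str]]:
    """Build the kubectl invocations for copying data and deleting the temporary pod."""
    cmds: list[list[str]] = []
    for local_dir, mount_path in copies.items():
        cmds.append(["kubectl", "cp", local_dir, f"{namespace}/{pod}:{mount_path}"])
        # Show numeric UID/GID so the layout and group ownership can be checked.
        cmds.append(["kubectl", "exec", "--namespace", namespace, pod, "--", "ls", "-ln", mount_path])
    cmds.append(["kubectl", "delete", "pod", pod, "--namespace", namespace])
    return cmds


def run(cmds: list[list[str]], dry_run: bool = True) -> None:
    """Print each command; execute it only when dry_run is False."""
    for cmd in cmds:
        print(shlex.join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    run(migration_commands(POD, NAMESPACE, COPIES), dry_run=True)
```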

## Creation of new Longhorn volumes

While it seems handy to use a K8s StorageClass for Longhorn, we do *not* want to use that.
If you use a StorageClass, a PV and Longhorn volume will be automatically provisioned.
These will have the name `pvc-<UID of PVC>`, where the UID of the PVC is random.
This makes it hard to restore a backup to a Longhorn volume with the correct name.

Instead, we want to manually create the Longhorn volumes via the web UI.
Then, we can create the PV and PVC as usual using our K8s provisioning tool (e.g. kubectl or Kubenix).

To create a volume, follow these steps:
1. Using the Longhorn web UI, create a new Longhorn volume, keeping the following in mind:
   - The size can be somewhat larger than what we reasonably expect to use. We use storage over-provisioning, so the total size of all volumes can exceed the real disk size.
   - The number of replicas should be 2.
2. Enable the "backup-nfs" recurring job for the Longhorn volume.
3. Disable the "default" recurring job group for the Longhorn volume.
4. Create the PV, PVC and workload as usual.
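A PV/PVC pair that binds statically to a Longhorn volume created in the UI might look like the sketch below. This sketch rests on several assumptions. It uses the Longhorn CSI driver name `driver.longhorn.io` with the Longhorn volume name as `volumeHandle`. Both objects get an empty `storageClassName`, so Kubernetes does not dynamically provision a `pvc-<UID>` volume. The PVC names its PV explicitly through `volumeName`. The reclaim policy `Retain`, `fsType: ext4`, and naming the PV and PVC after the volume are our own choices, and the example values are hypothetical. We normally write these objects in Kubenix. The script instead prints JSON that `kubectl apply -f -` also accepts.

```python
import json

# Hypothetical example values; use the Longhorn volume name chosen in the UI.
VOLUME_NAME = "hedgedoc-uploads"
SIZE = "1Gi"
NAMESPACE = "default"


def static_longhorn_pv_pvc(volume: str, size: str, namespace: str) -> dict:
    """Return a v1 List with a PV and PVC statically bound to an existing Longhorn volume."""
    pv = {
        "apiVersion": "v1",
        "kind": "PersistentVolume",
        "metadata": {"name": volume},
        "spec": {
            "capacity": {"storage": size},
            "volumeMode": "Filesystem",
            "accessModes": ["ReadWriteOnce"],
            # Assumption: keep the Longhorn volume if the PV object is deleted.
            "persistentVolumeReclaimPolicy": "Retain",
            # Empty class: no dynamic provisioning, so no pvc-<UID> volume names.
            "storageClassName": "",
            "csi": {
                "driver": "driver.longhorn.io",
                "fsType": "ext4",
                # Name of the Longhorn volume created (or restored) in the UI.
                "volumeHandle": volume,
            },
        },
    }
    pvc = {
        "apiVersion": "v1",
        "kind": "PersistentVolumeClaim",
        "metadata": {"name": volume, "namespace": namespace},
        "spec": {
            "accessModes": ["ReadWriteOnce"],
            "storageClassName": "",
            # Bind to the PV above instead of letting Kubernetes pick one.
            "volumeName": volume,
            "resources": {"requests": {"storage": size}},
        },
    }
    return {"apiVersion": "v1", "kind": "List", "items": [pv, pvc]}


if __name__ == "__main__":
    # Pipe the output into: kubectl apply -f -
    print(json.dumps(static_longhorn_pv_pvc(VOLUME_NAME, SIZE, NAMESPACE), indent=2))
```

The same definition should also work after a restore from backup, as long as the restored Longhorn volume keeps its original name.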

## Disaster recovery using Longhorn backups

Backing up Longhorn volumes is easy, but restoring them is trickier.
Here we consider the case where all our machines have been wiped, and all we have left are the Longhorn backups.
To restore a backup, perform the following actions:
1. Restore the latest backup of the relevant Longhorn backup volume, keeping the following in mind:
   - The name should remain the same (i.e. the one chosen at Longhorn volume creation).
   - The number of replicas should be 2.
   - Disable recurring jobs.
2. Enable the "backup-nfs" recurring job for the Longhorn volume.
3. Disable the "default" recurring job group for the Longhorn volume.
4. Create the PV, PVC and workload as usual.

## Recovering Longhorn volumes without a Kubernetes cluster

1. Navigate to the Longhorn backupstore location (`/mnt/longhorn/persistent/longhorn-backup/backupstore/volumes` for us).
2. Find the directory for the desired volume: `ls **/**`.
3. Determine the last backup for the volume: `cat volume.cfg | jq '.LastBackupName'`.
4. Find the blocks that form the volume, and their order: `cat backups/<name>.cfg | jq '.Blocks'`.
5. Decompress each block using lz4: `lz4 -d blocks/XX/YY/XXYY.blk block`.
6. Concatenate the blocks, in order, to form the file system image: `cat block1 block2 block3 > volume.img`.
7. Lastly, fix the size of the image. Append zeros to the end (e.g. with `truncate -s`) until the file is long enough that `fsck.ext4` no longer complains.
8. Mount the image: `mount -o loop volume.img /mnt/volume`.
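Steps 3 to 7 can also be scripted. The sketch below relies on several assumptions we have not verified against every Longhorn version. Besides `LastBackupName` and `Blocks`, which are used above, it assumes a `Size` field in `volume.cfg` and `Offset`/`BlockChecksum` fields in each block entry. It also assumes the `blocks/XX/YY/<checksum>.blk` layout from step 5. Rather than concatenating blocks and padding with zeros, it pre-sizes the image to the volume size and writes each decompressed block at its recorded offset. This gives the same image when the blocks are contiguous, and stays correct if a backup omits empty regions. The decompressor is pluggable; the default shells out to the `lz4` CLI.

```python
import glob
import json
import os
import subprocess
import sys
from typing import Callable


def lz4_decompress(path: str) -> bytes:
    """Decompress one block with the lz4 CLI; -c writes the result to stdout."""
    return subprocess.run(
        ["lz4", "-d", "-c", path], check=True, capture_output=True
    ).stdout


def latest_backup_blocks(volume_dir: str) -> tuple[int, list[dict]]:
    """Return (volume size in bytes, block list) of the volume's last backup."""
    with open(os.path.join(volume_dir, "volume.cfg")) as f:
        volume_cfg = json.load(f)
    name = volume_cfg["LastBackupName"]
    # Match backups/*<name>.cfg so no exact file name prefix is assumed;
    # raises ValueError unless exactly one file matches.
    [backup_path] = glob.glob(os.path.join(volume_dir, "backups", f"*{name}.cfg"))
    with open(backup_path) as f:
        backup_cfg = json.load(f)
    # Assumption: Size is a byte count, possibly stored as a JSON string.
    return int(volume_cfg["Size"]), backup_cfg["Blocks"]


def assemble_image(
    volume_dir: str,
    out_path: str,
    decompress: Callable[[str], bytes] = lz4_decompress,
) -> None:
    """Write every block at its own offset into a zero-filled image file."""
    size, blocks = latest_backup_blocks(volume_dir)
    with open(out_path, "wb") as img:
        # Extend the file to the full volume size; unwritten ranges read as zeros.
        img.truncate(size)
        for block in blocks:
            # Assumption: entries look like {"Offset": ..., "BlockChecksum": ...}
            # and live at blocks/<chars 0-1>/<chars 2-3>/<checksum>.blk.
            checksum = block["BlockChecksum"]
            path = os.path.join(
                volume_dir, "blocks", checksum[:2], checksum[2:4], f"{checksum}.blk"
            )
            img.seek(int(block["Offset"]))
            img.write(decompress(path))


if __name__ == "__main__" and len(sys.argv) == 3:
    # Usage: python3 assemble.py <volume dir in backupstore> volume.img
    assemble_image(sys.argv[1], sys.argv[2])
```

Afterwards, check the image with `fsck.ext4` and mount it as in step 8.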