From e242401553971d5903ae4671bcb395185a81bcd2 Mon Sep 17 00:00:00 2001 From: Pim Kunis Date: Tue, 30 Apr 2024 13:13:31 +0200 Subject: [PATCH] Remove jekyll cache --- ...40b622142f1c98125abcfe89a76a661b0e8e343910 | 1 - ...342cacbcd9ada77932aa71a300ff4a742b3a5403a2 | 180 ----------- ...97fa2f92ac4f63f127b7ede00efab21d453837389e | 170 ----------- ...301f8c4137151c8a5fce8bfdc7931f6d3eddecd1d0 | Bin 6156 -> 0 bytes ...3f412280aef126498994f10a177acf84deee8117b9 | 281 ------------------ ...f89e0c9cc99bcf4bcbf90a5e819a6827177e476177 | 66 ---- ...7911a944a2c6cdecdcf1af5627383af00a22c1698a | 49 --- ...555e6f42751b96af7f180d0a086c18cac528a356b7 | 53 ---- ...d71068b25447db3acbd8d5f496b19d502161a999cd | 51 ---- 9 files changed, 851 deletions(-) delete mode 100644 .jekyll-cache/Jekyll/Cache/Jekyll--Cache/b7/9606fb3afea5bd1609ed40b622142f1c98125abcfe89a76a661b0e8e343910 delete mode 100644 .jekyll-cache/Jekyll/Cache/Jekyll--Converters--Markdown/0e/763d8239a78fde13ad37342cacbcd9ada77932aa71a300ff4a742b3a5403a2 delete mode 100644 .jekyll-cache/Jekyll/Cache/Jekyll--Converters--Markdown/1a/68cb66b01bbf383da07f97fa2f92ac4f63f127b7ede00efab21d453837389e delete mode 100644 .jekyll-cache/Jekyll/Cache/Jekyll--Converters--Markdown/42/129c8d8ed3adce9c8fb7301f8c4137151c8a5fce8bfdc7931f6d3eddecd1d0 delete mode 100644 .jekyll-cache/Jekyll/Cache/Jekyll--Converters--Markdown/59/b34569de7b190c8504763f412280aef126498994f10a177acf84deee8117b9 delete mode 100644 .jekyll-cache/Jekyll/Cache/Jekyll--Converters--Markdown/95/3072d9307fff41fb3452f89e0c9cc99bcf4bcbf90a5e819a6827177e476177 delete mode 100644 .jekyll-cache/Jekyll/Cache/Jekyll--Converters--Markdown/b4/97aba66d754cde21ccf57911a944a2c6cdecdcf1af5627383af00a22c1698a delete mode 100644 .jekyll-cache/Jekyll/Cache/Jekyll--Converters--Markdown/cf/a279eddeb4b979b0e9e6555e6f42751b96af7f180d0a086c18cac528a356b7 delete mode 100644 .jekyll-cache/Jekyll/Cache/Jekyll--Converters--Markdown/dd/fa550314d3f6e485fefcd71068b25447db3acbd8d5f496b19d502161a999cd 
diff --git a/.jekyll-cache/Jekyll/Cache/Jekyll--Cache/b7/9606fb3afea5bd1609ed40b622142f1c98125abcfe89a76a661b0e8e343910 b/.jekyll-cache/Jekyll/Cache/Jekyll--Cache/b7/9606fb3afea5bd1609ed40b622142f1c98125abcfe89a76a661b0e8e343910 deleted file mode 100644 index 09128b6..0000000 --- a/.jekyll-cache/Jekyll/Cache/Jekyll--Cache/b7/9606fb3afea5bd1609ed40b622142f1c98125abcfe89a76a661b0e8e343910 +++ /dev/null @@ -1 +0,0 @@ -I"˙{"source"=>"/home/pim/git/blog-pim", "destination"=>"/home/pim/git/blog-pim/_site", "collections_dir"=>"", "cache_dir"=>".jekyll-cache", "plugins_dir"=>"_plugins", "layouts_dir"=>"_layouts", "data_dir"=>"_data", "includes_dir"=>"_includes", "collections"=>{"posts"=>{"output"=>true, "permalink"=>"/:categories/:year/:month/:day/:title:output_ext"}}, "safe"=>false, "include"=>[".htaccess"], "exclude"=>[".sass-cache", ".jekyll-cache", "gemfiles", "Gemfile", "Gemfile.lock", "node_modules", "vendor/bundle/", "vendor/cache/", "vendor/gems/", "vendor/ruby/"], "keep_files"=>[".git", ".svn"], "encoding"=>"utf-8", "markdown_ext"=>"markdown,mkdown,mkdn,mkd,md", "strict_front_matter"=>false, "show_drafts"=>nil, "limit_posts"=>0, "future"=>false, "unpublished"=>false, "whitelist"=>[], "plugins"=>[], "markdown"=>"kramdown", "highlighter"=>"rouge", "lsi"=>false, "excerpt_separator"=>"\n\n", "incremental"=>false, "detach"=>false, "port"=>"4000", "host"=>"127.0.0.1", "baseurl"=>nil, "show_dir_listing"=>false, "permalink"=>"date", "paginate_path"=>"/page:num", "timezone"=>nil, "quiet"=>false, "verbose"=>false, "defaults"=>[], "liquid"=>{"error_mode"=>"warn", "strict_filters"=>false, "strict_variables"=>false}, "kramdown"=>{"auto_ids"=>true, "toc_levels"=>[1, 2, 3, 4, 5, 6], "entity_output"=>"as_char", "smart_quotes"=>"lsquo,rsquo,ldquo,rdquo", "input"=>"GFM", "hard_wrap"=>false, "guess_lang"=>true, "footnote_nr"=>1, "show_warnings"=>false}, "livereload_port"=>35729, "serving"=>true, "watch"=>true, "url"=>"http://localhost:4000"}:ET \ No newline at end of file diff --git 
a/.jekyll-cache/Jekyll/Cache/Jekyll--Converters--Markdown/0e/763d8239a78fde13ad37342cacbcd9ada77932aa71a300ff4a742b3a5403a2 b/.jekyll-cache/Jekyll/Cache/Jekyll--Converters--Markdown/0e/763d8239a78fde13ad37342cacbcd9ada77932aa71a300ff4a742b3a5403a2 deleted file mode 100644 index ea8297a..0000000 --- a/.jekyll-cache/Jekyll/Cache/Jekyll--Converters--Markdown/0e/763d8239a78fde13ad37342cacbcd9ada77932aa71a300ff4a742b3a5403a2 +++ /dev/null @@ -1,180 +0,0 @@ -I"±N

Recently, I deployed Concourse CI because I wanted to get my feet wet with a CI/CD pipeline. -However, I had a practical use case lying around for a long time: automatically compiling my static website and deploying it to my Docker Swarm. -This took some time to get right, but the result works like a charm (source code).

- -

It’s comforting to know I don’t have to lift a finger and my website is automatically deployed. -However, I would still like to receive some indication of what’s happening. -And what better way to do that than using my Apprise service to keep me up to date? -There’s a little snag though: I could not find any Concourse resource that does this. -That’s when I decided to just create it myself.

- -

The Plagiarism Hunt

- -

Like any good computer person, I am lazy. -I’d rather just copy someone’s work, so that’s what I did. -I found this GitHub repository that does the same thing but for Slack notifications. -For some reason it’s archived, but it seemed like it should work. -I actually noticed lots of repositories for Concourse resource types are archived, so not sure what’s going on there.

- -

Getting to know Concourse

- -

Let’s first understand what we need to do to reach our end goal of sending Apprise notifications from Concourse.

- -

A Concourse pipeline takes some inputs and performs some operations on them, which result in some outputs. -These inputs and outputs are called resources in Concourse. -For example, a Git repository could be a resource. -Each resource is an instance of a resource type. -A resource type therefore is simply a blueprint that can create multiple resources. -To continue the example, a resource type could be “Git repository”.

- -

We therefore need to create our own resource type that can send Apprise notifications. -A resource type is simply a container that includes three scripts: check, in and out.

- - -

As Apprise notifications are basically fire-and-forget, we will only implement the out script.

- -

Writing the out script

- -

The whole script can be found here, but I will explain the most important bits of it. -Note that I only use Apprise’s persistent storage solution, and not its stateless solution.

- -

Concourse provides us with the working directory, which we cd to:

-
cd "${1}"
-
- -

We create a timestamp, formatted in JSON, which we will use for the resource’s new version later. -Concourse requires us to set a version for the resource, but since Apprise notifications don’t have that, we use the timestamp:

-
timestamp="$(jq -n "{version:{timestamp:\"$(date +%s)\"}}")"
-
- -

First some black magic Bash to redirect file descriptors. -Not sure why this is needed, but I copied it anyways. -After that, we create a temporary file holding the resource’s parameters.

-
exec 3>&1
-exec 1>&2
-
-payload=$(mktemp /tmp/resource-in.XXXXXX)
-cat > "${payload}" <&0
-
- -
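For what it's worth, the redirection is less magical than it looks. Concourse reads whatever the out script writes to stdout as the JSON describing the new version, and shows stderr as log output. Saving the original stdout as file descriptor 3 and pointing stdout at stderr means stray output from tools like curl cannot corrupt that JSON. Below is a minimal sketch of the idea; the function name `emit_version` and the fixed timestamp are made up for illustration and are not part of the real script.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical stand-in for an out script. The body runs in a subshell so
# the redirections do not leak into the caller.
emit_version() (
  exec 3>&1   # fd 3 keeps a handle on the original stdout
  exec 1>&2   # from here on, plain stdout output lands on stderr

  echo "sending notification..."              # log line, goes to stderr
  echo '{"version":{"timestamp":"0"}}' >&3    # result, goes to the real stdout
)

# Only the JSON is captured; the log line is discarded together with stderr.
result="$(emit_version 2>/dev/null)"
echo "captured: ${result}"
```

This is also why the final `echo "${timestamp}" >&3` in the real script writes to descriptor 3 rather than to plain stdout.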

We then extract the individual parameters. -The source key contains the values specified in the resource’s definition, while the params key holds the parameters passed to this specific put step.

-
apprise_host="$(jq -r '.source.host' < "${payload}")"
-apprise_key="$(jq -r '.source.key' < "${payload}")"
-
-alert_body="$(jq -r '.params.body' < "${payload}")"
-alert_title="$(jq -r '.params.title // null' < "${payload}")"
-alert_type="$(jq -r '.params.type // null' < "${payload}")"
-alert_tag="$(jq -r '.params.tag // null' < "${payload}")"
-alert_format="$(jq -r '.params.format // null' < "${payload}")"
-
- -

We then format the different parameters using JSON:

-
alert_body="$(eval "printf \"${alert_body}\"" | jq -R -s .)"
-[ "${alert_title}" != "null" ] && alert_title="$(eval "printf \"${alert_title}\"" | jq -R -s .)"
-[ "${alert_type}" != "null" ] && alert_type="$(eval "printf \"${alert_type}\"" | jq -R -s .)"
-[ "${alert_tag}" != "null" ] && alert_tag="$(eval "printf \"${alert_tag}\"" | jq -R -s .)"
-[ "${alert_format}" != "null" ] && alert_format="$(eval "printf \"${alert_format}\"" | jq -R -s .)"
-
- -

Next, from the individual parameters we construct the final JSON message body we send to the Apprise endpoint.

-
body="$(cat <<EOF
-{
-  "body": ${alert_body},
-  "title": ${alert_title},
-  "type": ${alert_type},
-  "tag": ${alert_tag},
-  "format": ${alert_format}
-}
-EOF
-)"
-
- -

Before sending it just yet, we compact the JSON and remove any values that are null:

-
compact_body="$(echo "${body}" | jq -c '.')"
-echo "$compact_body" | jq 'del(..|nulls)' > /tmp/compact_body.json
-
- -
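The assemble-then-strip steps above could also be collapsed into a single jq call. The sketch below is an alternative, not what the published script does: the helper name `build_payload` is hypothetical, and unlike the original it does not run the values through `eval`, so something like `$(cat version/version)` would not be expanded. In exchange, jq takes care of all quoting.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: build the Apprise request body with jq.
# Arguments: body [title] [type] [tag] [format]; omitted ones are dropped.
build_payload() {
  jq -c -n \
    --arg body "${1}" \
    --arg title "${2:-}" \
    --arg type "${3:-}" \
    --arg tag "${4:-}" \
    --arg format "${5:-}" \
    '{body: $body, title: $title, type: $type, tag: $tag, format: $format}
     | with_entries(select(.value != ""))'
}

# Example: only body and title are set, so only those keys are emitted.
build_payload "New version: 1.2.3" "Static website deployed!"
```

Passing values with `--arg` means quotes or newlines in a title or body end up as valid JSON strings without any manual escaping.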

Here is the most important line, where we send the payload to the Apprise endpoint. -It’s quite straightforward.

-
curl -v -X POST -T /tmp/compact_body.json -H "Content-Type: application/json" "${apprise_host}/notify/${apprise_key}"
-
- -

Finally, we print the timestamp (fake version) in order to appease the Concourse gods.

-
echo "${timestamp}" >&3
-
- -

Building the Container

- -

As said earlier, to actually use this script, we need to add it to an image. -I won’t be explaining this whole process, but the source can be found here. -The most important takeaways are these:

- - -

Using the Resource Type

- -

Using our newly created resource type is surprisingly simple. -I use it for the blog you are reading right now and the pipeline definition can be found here. -Here we specify the resource type in a Concourse pipeline:

-
resource_types:
-- name: apprise
-  type: registry-image
-  source:
-    repository: git.kun.is/pim/concourse-apprise-notifier
-    tag: "1.1.1"
-
- -

We simply have to tell Concourse where to find the image, and which tag we want. -Next, we instantiate the resource type to create a resource:

-
resources:
-- name: apprise-notification
-  type: apprise
-  source:
-    host: https://apprise.kun.is:444
-    key: concourse
-  icon: bell
-
- -

We simply specify the host to send Apprise notifications to. -Yeah, I even gave it a little bell because it’s cute.

- -

All that’s left to do is actually send the notification. -Let’s see how that is done:

-
- name: deploy-static-website
-  plan:
-    - task: deploy-site
-      config: ...
-
-      on_success:
-        put: apprise-notification
-        params:
-          title: "Static website deployed!"
-          body: "New version: $(cat version/version)"
-        no_get: true
-
- -

As can be seen, the Apprise notification can be triggered when a task is executed successfully. -We do this using the put step, which executes the out script under the hood. -We set the notification’s title and body, and send it! -The result is seen below in my Ntfy app, which Apprise forwards the message to: -picture showing my Ntfy app with the Apprise notification

- -

And to finish this off, here is what it looks like in the Concourse web UI: -the concourse web gui showing the pipeline of my static website including the apprise notification resources

- -

Conclusion

- -

Concourse’s way of representing everything as an image/container is really interesting in my opinion. -A resource type is quite easily implemented as well, although Bash might not be the optimal way to do this. -I’ve seen some people implement it in Rust, which might be a good excuse to finally learn that language :)

- -

Apart from Apprise notifications, I’m planning on eventually creating a resource type to deploy to a Docker Swarm. -This seems a lot harder than simply sending notifications though.

-:ET \ No newline at end of file diff --git a/.jekyll-cache/Jekyll/Cache/Jekyll--Converters--Markdown/1a/68cb66b01bbf383da07f97fa2f92ac4f63f127b7ede00efab21d453837389e b/.jekyll-cache/Jekyll/Cache/Jekyll--Converters--Markdown/1a/68cb66b01bbf383da07f97fa2f92ac4f63f127b7ede00efab21d453837389e deleted file mode 100644 index dcc8b00..0000000 --- a/.jekyll-cache/Jekyll/Cache/Jekyll--Converters--Markdown/1a/68cb66b01bbf383da07f97fa2f92ac4f63f127b7ede00efab21d453837389e +++ /dev/null @@ -1,170 +0,0 @@ -I"µ:

Ever SSH’ed into a freshly installed server and gotten the following annoying message?

-
The authenticity of host 'host.tld (1.2.3.4)' can't be established.
-ED25519 key fingerprint is SHA256:eUXGdm1YdsMAS7vkdx6dOJdOGHdem5gQp4tadCfdLB8.
-Are you sure you want to continue connecting (yes/no)?
-
- -

Or even more annoying:

-
@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
-@    WARNING: REMOTE HOST IDENTIFICATION HAS CHANGED!     @
-@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
-IT IS POSSIBLE THAT SOMEONE IS DOING SOMETHING NASTY!
-Someone could be eavesdropping on you right now (man-in-the-middle attack)!
-It is also possible that a host key has just been changed.
-The fingerprint for the ED25519 key sent by the remote host is
-SHA256:eUXGdm1YdsMAS7vkdx6dOJdOGHdem5gQp4tadCfdLB8.
-Please contact your system administrator.
-Add correct host key in /home/user/.ssh/known_hosts to get rid of this message.
-Offending ED25519 key in /home/user/.ssh/known_hosts:3
-  remove with:
-  ssh-keygen -f "/etc/ssh/ssh_known_hosts" -R "1.2.3.4"
-ED25519 host key for 1.2.3.4 has changed and you have requested strict checking.
-Host key verification failed.
-
- -

Could it be that the programmers at OpenSSH simply like to annoy us with these confusing messages? -Maybe, but these warnings also serve as a way to notify users of a potential Man-in-the-Middle (MITM) attack. -I won’t go into the details of this problem, but I refer you to this excellent blog post. -Instead, I would like to talk about ways to solve these annoying warnings.

- -

One obvious solution is simply to add each host to your known_hosts file. -This works okay when managing a handful of servers, but becomes unbearable when managing many servers. -In my case, I wanted to quickly spin up virtual machines using Duncan Mac-Vicar’s Terraform Libvirt provider, without having to accept their host key before connecting. -The solution? Issuing SSH host certificates using an SSH certificate authority.

- -

SSH Certificate Authorities vs. the Web

- -

The idea of an SSH certificate authority (CA) is quite easy to grasp if you understand the web’s Public Key Infrastructure (PKI). -Just like with the web, a trusted party can issue certificates that are offered when establishing a connection. -The idea is that, just by trusting the trusted party, you trust every certificate they issue. -In the case of the web’s PKI, this trusted party is bundled and trusted by your browser or operating system. -However, in the case of SSH, the trusted party is you! (Okay, you can also trust your own web certificate authority.) -With this great power comes great responsibility, which we will abuse heavily in this article.

- -

SSH Certificate Authority for Terraform

- -

So, let’s start with a plan. -I want to spawn virtual machines with Terraform which are automatically provisioned with an SSH host certificate issued by my CA. -This CA will be another host on my private network, issuing certificates over SSH.

- -

Fetching the SSH Host Certificate

- -

First we generate an SSH key pair in Terraform. -Below is the code for that:

-
resource "tls_private_key" "debian" {
-  algorithm = "ED25519"
-}
-
-data "tls_public_key" "debian" {
-  private_key_pem = tls_private_key.debian.private_key_pem
-}
-
- -

Now that we have an SSH key pair, we need to somehow make Terraform communicate this with the CA. -Lucky for us, there is a way for Terraform to execute an arbitrary command with the external data feature. -We call this script below:

-
data "external" "cert" {
-  program = ["bash", "${path.module}/get_cert.sh"]
-
-  query = {
-    pubkey   = trimspace(data.tls_public_key.debian.public_key_openssh)
-    host     = var.name
-    cahost   = var.ca_host
-    cascript = var.ca_script
-    cakey    = var.ca_key
-  }
-}
-
- -

These query parameters will end up in the script’s stdin in JSON format. -We can then read these parameters, and send them to the CA over SSH. -The result must also be in JSON format.

-
#!/bin/bash
-set -euo pipefail
-IFS=$'\n\t'
-
-# Read the query parameters
-eval "$(jq -r '@sh "PUBKEY=\(.pubkey) HOST=\(.host) CAHOST=\(.cahost) CASCRIPT=\(.cascript) CAKEY=\(.cakey)"')"
-
-# Fetch certificate from the CA
-# Warning: extremely ugly code that I am too lazy to fix
-CERT=$(ssh -o ConnectTimeout=3 -o ConnectionAttempts=1 root@$CAHOST '"'"$CASCRIPT"'" host "'"$CAKEY"'" "'"$PUBKEY"'" "'"$HOST"'".dmz')
-
-jq -n --arg cert "$CERT" '{"cert":$cert}'
-
- -

We see that a script is called on the remote host that issues the certificate. -This is just a simple wrapper around ssh-keygen, which you can see below.

-
#!/bin/bash
-set -euo pipefail
-IFS=$'\n\t'
-
-host() {
-	CAKEY="$2"
-	PUBKEY="$3"
-	HOST="$4"
-
-	echo "$PUBKEY" > /root/ca/"$HOST".pub
-	ssh-keygen -h -s /root/ca/keys/"$CAKEY" -I "$HOST" -n "$HOST" /root/ca/"$HOST".pub
-	cat /root/ca/"$HOST"-cert.pub
-	rm /root/ca/"$HOST"*.pub
-}
-
-"$1" "$@"
-
- -
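To get a feel for what the wrapper produces, the signing step can be tried locally with a throwaway CA. This is only a sketch under assumptions: the key names, the `example.dmz` principal and the temporary directory are made up for illustration and have nothing to do with the real CA host. It signs a host key the same way as above, prints the certificate details, and shows the `@cert-authority` line a client would put in its known_hosts file to trust every certificate issued by that CA.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Scratch directory; left in place afterwards so the files can be inspected.
workdir="$(mktemp -d)"

# Throwaway CA key pair and host key pair, without passphrases.
ssh-keygen -q -t ed25519 -N '' -C 'example-ca' -f "${workdir}/ca"
ssh-keygen -q -t ed25519 -N '' -C 'example-host' -f "${workdir}/host"

# Sign the host public key as a host certificate (-h) for one principal (-n).
# This writes ${workdir}/host-cert.pub next to the public key.
ssh-keygen -q -h -s "${workdir}/ca" -I example.dmz -n example.dmz "${workdir}/host.pub"

# Show what was issued: type, key ID, principals and validity.
ssh-keygen -L -f "${workdir}/host-cert.pub"

# Line for a client's known_hosts: trust this CA for all hosts under .dmz.
printf '@cert-authority *.dmz %s\n' "$(cat "${workdir}/ca.pub")"
```

With that known_hosts line in place, a client accepts any host presenting a certificate signed by the CA, provided the certificate lists the host name as a principal.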

Appeasing the Terraform Gods

- -

So nice, we can fetch the SSH host certificate from the CA. -We should just be able to use it right? -We can, but it brings a big annoyance with it: Terraform will fetch a new certificate every time it is run. -This is because the external feature of Terraform is a data source. -If we were to use this data source for a Terraform resource, it would need to be updated every time we run Terraform. -I have not been able to find a way to avoid fetching the certificate every time, except for writing my own resource provider which I’d rather not. -I have, however, found a way to hack around the issue.

- -

The idea is as follows: we can use Terraform’s ignore_changes to, well, ignore any changes of a resource. -Unfortunately, we cannot use this for a data source, so we must create a glue null_resource that supports ignore_changes. -This is shown in the code snippet below. -We use the triggers property simply to copy the certificate in; we don’t use it for its original purpose.

- -
resource "null_resource" "cert" {
-  triggers = {
-    cert = data.external.cert.result["cert"]
-  }
-
-  lifecycle {
-    ignore_changes = [
-      triggers
-    ]
-  }
-}
-
- -

And voilà, we can now use null_resource.cert.triggers["cert"] as our certificate, that won’t trigger replacements in Terraform.

- -

Setting the Host Certificate with Cloud-Init

- -

Terraform’s Libvirt provider has native support for Cloud-Init, which is very handy. -We can give the host certificate directly to Cloud-Init and place it on the virtual machine. -Inside the Cloud-Init configuration, we can set the ssh_keys property to do this:

-
ssh_keys:
-  ed25519_private: |
-    ${indent(4, private_key)}
-  ed25519_certificate: "${host_cert}"
-
- -

I hardcoded this to ED25519 keys, because this is all I use.

- -

This works perfectly, and I never have to accept host keys from virtual machines again.

- -

Caveats

- -

A sharp eye might have noticed the lifecycle of these host certificates is severely lacking. -Namely, the deployed host certificates have no expiration date, nor is there a revocation function. -There are ways to implement these, but for my home lab I did not deem this necessary at this point. -In a more professional environment, I would suggest using HashiCorp’s Vault.

- -

This project did teach me about the limits and flexibility of Terraform, so all in all a success! -All code can be found on the git repository here.

-:ET \ No newline at end of file diff --git a/.jekyll-cache/Jekyll/Cache/Jekyll--Converters--Markdown/42/129c8d8ed3adce9c8fb7301f8c4137151c8a5fce8bfdc7931f6d3eddecd1d0 b/.jekyll-cache/Jekyll/Cache/Jekyll--Converters--Markdown/42/129c8d8ed3adce9c8fb7301f8c4137151c8a5fce8bfdc7931f6d3eddecd1d0 deleted file mode 100644 index a92a14d2460824f65e2a8a7668797b97bbce6e2f..0000000000000000000000000000000000000000 GIT binary patch literal 0 HcmV?d00001 literal 6156 zcmds5-EP~+6)uWmQRoSJIm^B12CyaD>m-gVbKzzaLtQ5^VjFEy6vl`giBprD>C8~F zx>%sE(bwyf^gD+`DOxE`Y#=wiNTQHDGw0_!-}(8?Z%4_mfAzbeO#d^mJ9yW zG(k8ey{UIrQo?6WsaniUQ|4lV-~Uqm-c_MJj)Pz!tyQ(k@xKkB9U<#n)F#kNSqpU+ zl&fV)>g!-!zc*slZ#6zM6HzMZ=%`>ssbLoDuY-_Yj6_BX)UY>*34JexH)Rtvj)o$- zX024Klv9<;7Z4)H%pvYnX=DylNyj2-Y7D z2IH~sOP3W8ZtnWVO$Rfr=7^Qf`nIs&>-^;4@Y&Jfev(S)g5c!CflP%QQ>@?_7*8sm z;tX+4hu|jHRQ_v3NG(K=RqV1SmnDtB1=T-9b~1Y%m>Ucg~@2ha9);JO)^ zL~;>N{T`qfnpy@`E?9ISWa$mPx5lf7+h<|m{?XCVi|-D;vGE6+sw!)oba0%$Ba1ul z7yAMtNpK73whP^M@j;Sa@&8%Z*CbHvx1l<39+ zAs?I{Ol^bT)j+@4e>%vux20SRG^u!Fz3XS+oF5(}X^RAJ8V5LL2sF8HN5C|pY6)s+ z?TGS}Zc_tZz`s(IX0Dv@0hv2qFiMagNC`5}F9KKvUeI}|#266dr!7IyV`PqR^AKp* zJjsg;EZ|h$KrhP|H6A%geqKOZk*k@;E4~nSAjVzbZOW8`WMb0tbkU>%fYn{5Y=|NO zNqr0J_^?P^Z6nlmD&$ZAmST!~t>bD;Oy0n4-=6>Y<_R&V68J^=`b#NzGJ~PGYe>Np zSt-3gNlIMWz+1ZrI)(f$h(b@n9n%6D>^jp-Rgw}ih69pI`4ZwH{MkU}T;J+na8J;o zRD)LAo26+!Nvoy|+CulaTzDoRm>ZpN+Zwm}e20MnzAlIo@S(~fDzbxSUt8+Ysby+0Ew$HoL=DF!d@Xps+PU<*}- znU;L-4ncy^XOZUqF^)wNu~(NSQM}ovT%jQGef{30sT7ok$4PIL*zR7gStRm^m@;E@ z*Mmz?xlq=)YIziWJ<*iWhP@G8KwlIna(K!JmY}PV@Vc_7MqILUax?TXmOb?dwFqkc zmQaX&PWCz#aVg~tia1r}Qd<^+BbA&Vp6L1g(_QZJvvLGeS1E*cJ`;~Hvv?`lqk zrE`fMU@rKQ1qqN#QGi1teV7~{;p2e{*&v=Vm_narZodWd#4`LCOr5>I{^QNdH$R+@ z-oCoQU)O)R`Qg>;)9drAE{YKtD}>DpiH?Ke;#=hVg1bTsej;ZOoZ1YGtN@Qh-MM!V z6%vAx0dGlB6W1@BICN)a}vU@Ek6t-F>XXgopp zwDuZ3LTvKU2XS{##U2c!1XO|`#suoA+L$p5uH^`l+1yf$$6)SK-%>-2fN2z7DuI<; z;3T}#{`)8Q1Zg3w^2+@UEt9lHYHf>%g%q^qP_0WLRAA123*kk_OjwQoSw5!tFQY_D@*^*)7smhT>C_>!!dLTUd*ow*3mP 
zGAcn2yM+v|o3dQ_bBppV5j^Z*j|RF+tTkl7u8${nC$Zf^H2%vFt(|!fqA>IQew*e? zTG;loo+YW+_5xBZi)?#&pVTJlbXsjUwKg}P8z*lNaFp+D*L~dR&phj)n1! z&57^Y3ff}ve$qUbp2tR%7L488lRmC_>lALA#{#@a1QGS2inqz0|0j&N#phdc^b4_G zY?D6)lCc} z<2>$PVjDE3?Hq5;sCc&(7xOek-d(mEGy)k7j2LTaSvEm0j)o$m&(VNqjiVI_#_G`6 ze($BJFg~>-SnLE)=Pkwt`kvb>8lJg<8}l}hmQZ}#J+OsYU#5;H=}$3Ih~!$Zh}3B0 z*cM3C4$PjL0VbKYb-kd;zGJWTgCX4YYwIs;lKw58y8ePo$|jL+{T7G({SWa?Z_OaQ z`FZ)Gokc&=@Li!%=eb3r?^`I_&Djxi=+v?jy`HI)^yTvRYHKz^tM~D)kJu~DM{B+Q zY6AesuJw^`xZCsSZA6?4umEU$+r^Jnpz)A43m=@kI;>qMFz$Hp=ZOF((cYvs^UgGz zo>IH%b&h3bz?gwyH((6_Ak)@#l^X4KmiMP^iqNEtPfG21AjXJ;K92TOXp3pe@V>#! z!96Sz3CUPBShn&pG{9h53|I$sT{EU_h2_LZRJ4|fnFg7K=(4fBmA!!-G6qNN7~?LQ a9N^R(yG53W>BpvG-N>`dSI1wyy80i_mQG6m diff --git a/.jekyll-cache/Jekyll/Cache/Jekyll--Converters--Markdown/59/b34569de7b190c8504763f412280aef126498994f10a177acf84deee8117b9 b/.jekyll-cache/Jekyll/Cache/Jekyll--Converters--Markdown/59/b34569de7b190c8504763f412280aef126498994f10a177acf84deee8117b9 deleted file mode 100644 index 08c63b3..0000000 --- a/.jekyll-cache/Jekyll/Cache/Jekyll--Converters--Markdown/59/b34569de7b190c8504763f412280aef126498994f10a177acf84deee8117b9 +++ /dev/null @@ -1,281 +0,0 @@ -I"ýG

I have been meaning to write about the current state of my home lab infrastructure for a while now. -Now that the most important parts are quite stable, I think the opportunity is ripe. -I expect this post to get quite long, so I might have to leave out some details along the way.

- -

This post will be a starting point for future infrastructure snapshots which I can hopefully put out periodically. -That is, if there is enough worth talking about.

- -

Keep an eye out for the icon, which links to the source code and configuration of anything mentioned. -Oh yeah, did I mention everything I do is open source?

- -

Networking and Infrastructure Overview

- -

Hardware and Operating Systems

- -

Let’s start with the basics: what kind of hardware do I use for my home lab? -The most important servers are my three Gigabyte Brix GB-BLCE-4105. -Two of them have 16 GB of memory, and one 8 GB. -I named these servers as follows:

-
    -
  • Atlas: because this server was going to “lift” a lot of virtual machines.
  • Lewis: we started out with a “Max” server named after the Formula 1 driver Max Verstappen, but it kind of became an unmanageable behemoth without infrastructure-as-code. Our second server we subsequently named Lewis after his colleague Lewis Hamilton. Note: people around me vetoed these names and I am no F1 fan!
  • Jefke: it’s a funny Belgian name. That’s all.
- -

Here is a picture of them sitting in their cosy closet:

- -

A picture of my servers.

- -

If you look to the left, you will also see a Raspberry Pi 4B. -I use this Pi to do some rudimentary monitoring of whether servers and services are running. -More on this in the relevant section below. -The Pi is called Iris because it’s a messenger for the other servers.

- -

I used to run Ubuntu on these systems, but I have since migrated away to Debian. -The main reasons were Canonical putting advertisements in my terminal and pushing Snap, which has a proprietary backend. -Two of my servers run the newly released Debian Bookworm, while one still runs Debian Bullseye.

- -

Networking

- -

For networking, I wanted hypervisors and virtual machines separated by VLANs for security reasons. -The following picture shows a simplified view of the VLANs present in my home lab:

- -

Picture showing the VLANS in my home lab.

- -

All virtual machines are connected to a virtual bridge which tags network traffic with the DMZ VLAN. -The hypervisors VLAN is used for traffic to and from the hypervisors. -Devices from the hypervisors VLAN are allowed to connect to devices in the DMZ, but not vice versa. -The hypervisors are connected to a switch using a trunk link, which allows both DMZ and hypervisors traffic.

- -

I realised the above design using ifupdown. -Below is the configuration for each hypervisor, which creates a new enp3s0.30 interface with all DMZ traffic from the enp3s0 interface.

- -
auto enp3s0.30
-iface enp3s0.30 inet manual
-iface enp3s0.30 inet6 auto
-	accept_ra 0
-	dhcp 0
-	request_prefix 0
-	privext 0
-	pre-up sysctl -w net/ipv6/conf/enp3s0.30/disable_ipv6=1
-
- -

This configuration seems more complex than it actually is. -Most of it is to make sure the interface is not assigned an IPv4/6 address on the hypervisor host. -The magic .30 at the end of the interface name makes this interface tagged with VLAN ID 30 (DMZ for me).

- -

Now that we have an interface tagged for the DMZ VLAN, we can create a bridge that future virtual machines can connect to:

- -
auto dmzbr
-iface dmzbr inet manual
-	bridge_ports enp3s0.30
-	bridge_stp off
-iface dmzbr inet6 auto
-	accept_ra 0
-	dhcp 0
-	request_prefix 0
-	privext 0
-	pre-up sysctl -w net/ipv6/conf/dmzbr/disable_ipv6=1
-
- -

Just like the previous config, this is quite bloated because I don’t want the interface to be assigned an IP address on the host. -Most importantly, the bridge_ports enp3s0.30 line here makes this interface a virtual bridge for the enp3s0.30 interface.

- -

And voilà, we now have a virtual bridge on each machine, where only DMZ traffic will flow. -Here I verify whether this configuration works:

-
- Show
-
-We can see that the two virtual interfaces are created, and are only assigned a MAC address and not an IP address:
-```text
-root@atlas:~# ip a show enp3s0.30
-4: enp3s0.30@enp3s0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue master dmzbr state UP group default qlen 1000
-    link/ether d8:5e:d3:4c:70:38 brd ff:ff:ff:ff:ff:ff
-5: dmzbr: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
-    link/ether 4e:f7:1f:0f:ad:17 brd ff:ff:ff:ff:ff:ff
-```
-
-Pinging a VM from a hypervisor works:
-```text
-root@atlas:~# ping -c1 maestro.dmz
-PING maestro.dmz (192.168.30.8) 56(84) bytes of data.
-64 bytes from 192.168.30.8 (192.168.30.8): icmp_seq=1 ttl=63 time=0.457 ms
-```
-
-Pinging a hypervisor from a VM does not work:
-```text
-root@maestro:~# ping -c1 atlas.hyp
-PING atlas.hyp (192.168.40.2) 56(84) bytes of data.
-
---- atlas.hyp ping statistics ---
-1 packets transmitted, 0 received, 100% packet loss, time 0ms
-```
-
- -

DNS and DHCP

- -

Now that we have a working DMZ network, let’s build on it to get DNS and DHCP working. -This will enable new virtual machines to obtain a static or dynamic IP address and register their host in DNS. -This has actually been incredibly annoying due to our friend Network address translation (NAT).

-
- NAT recap
-
-Network address translation (NAT) is a function of a router which allows multiple hosts to share a single IP address. -This is needed for IPv4, because IPv4 addresses are scarce and usually one household is only assigned a single IPv4 address. -This is one of the problems IPv6 attempts to solve (mainly by having so many IP addresses that they should never run out). -To solve the problem for IPv4, each host in a network is assigned a private IPv4 address, which can be reused for every network.
-
-Then, the router must perform address translation. -It does this by keeping track of ports opened by hosts in its private network. -If a packet from the internet arrives at the router for such a port, it forwards this packet to the correct host.
-
- -

I would like to host my own DNS on a virtual machine (called hermes, more on VMs later) in the DMZ network. -This basically gives two problems:

- -
    -
  1. The upstream DNS server will refer to the public internet-accessible IP address of our DNS server. -This IP address has no meaning inside the private network due to NAT and the router will reject the packet.
  2. Our DNS resolves hosts to their public internet-accessible IP address. -This is similar to the previous problem as the public IP address has no meaning.
- -

The first problem can be remediated by overriding the location of the DNS server for hosts inside the DMZ network. -This can be achieved on my router, which uses Unbound as its recursive DNS server:

- -

Unbound overrides for kun.is and dmz domains.

- -

Any DNS requests to Unbound for domains in either dmz or kun.is will now be forwarded to 192.168.30.7 (port 5353). -This is the virtual machine hosting my DNS.

- -

The second problem can be solved at the DNS server. -We need to do some magic overriding, which dnsmasq is perfect for:

- -
alias=84.245.14.149,192.168.30.8
-server=/kun.is/192.168.30.7
-
- -

This always overrides the public IPv4 address to the private one. -It also overrides the DNS server for kun.is to 192.168.30.7.

- -

Finally, behind the dnsmasq server, I run PowerDNS as the authoritative DNS server. -I like this DNS server because I can manage it with Terraform.

- -

Here is a small diagram showing my setup (my networking teacher would probably kill me for this): -Shitty diagram showing my DNS setup.

- -

Virtualization

-

https://github.com/containrrr/shepherd -Now that we have laid out the basic networking, let’s talk virtualization. -Each of my servers is configured to run KVM virtual machines, orchestrated using Libvirt. -Configuration of the physical hypervisor servers, including KVM/Libvirt, is done using Ansible. -The VMs are spun up using Terraform and the dmacvicar/libvirt Terraform provider.

- -

This all isn’t too exciting, except that I created a Terraform module that abstracts the Terraform Libvirt provider for my specific scenario:

-
module "maestro" {
-  source          = "git::https://git.kun.is/home/tf-modules.git//debian"
-  name            = "maestro"
-  domain_name     = "tf-maestro"
-  memory          = 10240
-  mac             = "CA:FE:C0:FF:EE:08"
-}
-
- -

This automatically creates a Debian virtual machine with the properties specified. -It also sets up certificate-based SSH authentication which I talked about before.

- -

Clustering

- -

With virtualization explained, let’s move up one level further. -Each of my three physical servers hosts a virtual machine running Docker, which together form a Docker Swarm. -I use Traefik as a reverse proxy which routes requests to the correct container.

- -

All data is hosted on a single machine and made available to containers using NFS. -While this might not be very secure (NFS is not encrypted and has no proper authentication), it is quite fast.

- -

As of today, I host the following services on my Docker Swarm:

- - -

CI / CD

- -

For CI / CD, I run Concourse CI in a separate VM. -This is needed, because Concourse heavily uses containers to create reproducible builds.

- -

Although I should probably use it for more, I currently use my Concourse for three pipelines:

- -
    -
  • A pipeline to build this static website and create a container image of it. -The image is then uploaded to the image registry of my Forgejo instance. -I love it when I can use stuff I previously built :) -The pipeline finally deploys this new image to the Docker Swarm.
  • -
  • A pipeline to create a Concourse resource that sends Apprise alerts (Concourse-ception?)
  • -
  • A pipeline to build a custom Fluentd image with plugins installed
  • -
- -

Backups

- -

To create backups, I use Borg. -As I keep all data on one machine, this backup process is quite simple. -In fact, all this data is stored in a single Libvirt volume. -To configure Borg with a simple declarative script, I use Borgmatic.

- -

In order to back up the data inside the Libvirt volume, I create a snapshot to a file. -Then I can mount this snapshot in my file system. -The files can then be backed up while the system is still running. -It is also possible to simply back up the Libvirt image, but this takes more time and storage.

- -

Monitoring and Alerting

- -

The last topic I would like to talk about is monitoring and alerting. -This is something I’m still actively improving and only just set up properly.

- -

Alerting

- -

For alerting, I wanted something that runs entirely on my own infrastructure. -I settled on Apprise + Ntfy.

- -

Apprise is a server that is able to send notifications to dozens of services. -For application developers, it is thus only necessary to implement the Apprise API to gain access to all these services. -The Apprise API itself is also very simple. -By using Apprise, I can also easily switch to another notification service later. -Ntfy is free software made for mobile push notifications.

- -

I use this alerting system in quite a lot of places in my infrastructure, for example when creating backups.

- -

Uptime Monitoring

- -

The first monitoring setup I created was using Uptime Kuma. -Uptime Kuma periodically pings a service to see whether it is still running. -You can do a literal ping, test HTTP response codes, check database connectivity and much more. -I use it to check whether my services and VMs are online. -And the best part is, Uptime Kuma supports Apprise so I get push notifications on my phone whenever something goes down!

- -

Metrics and Log Monitoring

- -

A new monitoring system I am still in the process of deploying is focused on metrics and logs. -I plan on creating a separate blog post about this, so keep an eye out for that (for example using RSS :)). -Safe to say, it is no basic ELK stack!

- -

Conclusion

- -

That’s it for now! -Hopefully I inspired someone to build something… or how not to :)

-:ET \ No newline at end of file diff --git a/.jekyll-cache/Jekyll/Cache/Jekyll--Converters--Markdown/95/3072d9307fff41fb3452f89e0c9cc99bcf4bcbf90a5e819a6827177e476177 b/.jekyll-cache/Jekyll/Cache/Jekyll--Converters--Markdown/95/3072d9307fff41fb3452f89e0c9cc99bcf4bcbf90a5e819a6827177e476177 deleted file mode 100644 index f206364..0000000 --- a/.jekyll-cache/Jekyll/Cache/Jekyll--Converters--Markdown/95/3072d9307fff41fb3452f89e0c9cc99bcf4bcbf90a5e819a6827177e476177 +++ /dev/null @@ -1,66 +0,0 @@ -I"

Previously, I used Prometheus’ node_exporter to monitor the memory usage of my servers. -However, I am currently in the process of moving away from Prometheus to a new monitoring stack. -While I understand the advantages, I felt like Prometheus’ pull architecture does not scale nicely. -Every time I spin up a new machine, I would have to centrally change Prometheus’ configuration in order for it to query the new server.

- -

In order to collect metrics from my servers, I am now using Fluent Bit. -I love Fluent Bit’s configuration style, which I can easily express as code and automate, its focus on efficiency, and its vendor-agnostic design. -However, I have stumbled upon one, in my opinion, big issue with Fluent Bit: its mem plugin to monitor memory usage is completely useless. -In this post I will go over the problem and my temporary solution.

- -

The Problem with Fluent Bit’s mem Plugin

- -

As can be seen in the documentation, Fluent Bit’s mem input plugin exposes a few metrics regarding memory usage which should be self-explanatory: Mem.total, Mem.used, Mem.free, Swap.total, Swap.used and Swap.free. -The problem is that Mem.used and Mem.free do not accurately reflect the machine’s actual memory usage. -This is because these metrics include caches and buffers, which can be reclaimed by other processes if needed. -Most tools reporting memory usage therefore include an additional metric that specifies the memory available on the system. -For example, the command free -m reports the following data on my laptop:

-
               total        used        free      shared  buff/cache   available
-Mem:           15864        3728        7334         518        5647       12136
-Swap:           2383         663        1720
-
- -

Notice that the available memory is more than free memory.
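The difference is easy to quantify with the numbers from the output above. The following sketch (plain Python; the helper name `usage_percent` is my own) compares a usage figure derived from the free column with one derived from the available column.

```python
def usage_percent(total_mib: int, unused_mib: int) -> float:
    """Percentage of memory in use, given the total and the amount considered unused."""
    return 100.0 * (total_mib - unused_mib) / total_mib


# Values taken from the `free -m` output above (MiB).
total, free, available = 15864, 7334, 12136

naive = usage_percent(total, free)           # counts caches and buffers as "in use"
realistic = usage_percent(total, available)  # counts reclaimable memory as unused

print(f"based on free:      {naive:.1f}%")
print(f"based on available: {realistic:.1f}%")
```

With these numbers the two figures come out at roughly 54% and 23.5%, so a dashboard built on the free column would report more than twice the real memory pressure.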

- -

While the issue is known (see this and this link), it is unfortunately not yet fixed.

- -

A Temporary Solution

- -

The issues I linked previously provide stand-alone plugins that fix the problem, which will hopefully be merged in the official project at some point. -However, I didn’t want to install another plugin so I used Fluent Bit’s exec input plugin and the free Linux command to query memory usage like so:

-
[INPUT]
-    Name exec
-    Tag memory
-    Command free -m | tail -2 | tr '\n' ' '
-    Interval_Sec 1
-
- -

To interpret the command’s output, I created the following filter:

-
[FILTER]
-    Name parser
-    Match memory
-    Key_Name exec
-    Parser free
-
- -

Lastly, I created the following parser (warning: regex shitcode incoming):

-
[PARSER]
-    Name free
-    Format regex
-    Regex ^Mem:\s+(?<mem_total>\d+)\s+(?<mem_used>\d+)\s+(?<mem_free>\d+)\s+(?<mem_shared>\d+)\s+(?<mem_buff_cache>\d+)\s+(?<mem_available>\d+) Swap:\s+(?<swap_total>\d+)\s+(?<swap_used>\d+)\s+(?<swap_free>\d+)
-    Types mem_total:integer mem_used:integer mem_free:integer mem_shared:integer mem_buff_cache:integer mem_available:integer swap_total:integer swap_used:integer swap_free:integer
-
- -

With this configuration, you can use the mem_available metric to get accurate memory usage in Fluent Bit.
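For readers who want to check the regex outside Fluent Bit, here is a rough Python equivalent applied to the flattened free -m output from earlier. Python spells named groups `(?P<name>...)` rather than `(?<name>...)`. The variable names are my own, and this is only a sketch of what the parser extracts, not of how Fluent Bit implements it.

```python
import re

# Same pattern as the [PARSER] section, translated to Python's named-group syntax.
FREE_RE = re.compile(
    r"^Mem:\s+(?P<mem_total>\d+)\s+(?P<mem_used>\d+)\s+(?P<mem_free>\d+)"
    r"\s+(?P<mem_shared>\d+)\s+(?P<mem_buff_cache>\d+)\s+(?P<mem_available>\d+)"
    r" Swap:\s+(?P<swap_total>\d+)\s+(?P<swap_used>\d+)\s+(?P<swap_free>\d+)"
)

# What `free -m | tail -2 | tr '\n' ' '` produces for the example output shown earlier.
line = (
    "Mem:           15864        3728        7334         518        5647       12136 "
    "Swap:           2383         663        1720 "
)

match = FREE_RE.match(line)
if match is None:
    raise ValueError("unexpected free output")

# Convert every captured field to an integer, like the Types line does.
record = {name: int(value) for name, value in match.groupdict().items()}
print(record["mem_available"])
```

The single literal space before `Swap:` in the pattern is the newline that `tr` replaced, which is why the command and the parser have to be changed together.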

- -

Conclusion

- -

Let’s hope Fluent Bit’s mem input plugin is improved upon soon so this hacky solution is not needed. -I also intend to document my new monitoring pipeline, which at the moment consists of:

-
    -
  • Fluent Bit
  • -
  • Fluentd
  • -
  • Elasticsearch
  • -
  • Grafana
  • -
-:ET \ No newline at end of file diff --git a/.jekyll-cache/Jekyll/Cache/Jekyll--Converters--Markdown/b4/97aba66d754cde21ccf57911a944a2c6cdecdcf1af5627383af00a22c1698a b/.jekyll-cache/Jekyll/Cache/Jekyll--Converters--Markdown/b4/97aba66d754cde21ccf57911a944a2c6cdecdcf1af5627383af00a22c1698a deleted file mode 100644 index fc8d47a..0000000 --- a/.jekyll-cache/Jekyll/Cache/Jekyll--Converters--Markdown/b4/97aba66d754cde21ccf57911a944a2c6cdecdcf1af5627383af00a22c1698a +++ /dev/null @@ -1,49 +0,0 @@ -I"Đ

See the Update at the end of the article.

- -

Already a week ago, Hashicorp announced it would change the license on almost all its projects. -Unlike their previous license, which was the Mozilla Public License 2.0, their new license is no longer truly open source. -It is called the Business Source License™ and restricts use of their software by competitors. -In their own words:

-
-

Vendors who provide competitive services built on our community products will no longer be able to incorporate future releases, bug fixes, or security patches contributed to our products.

-
- -

I found a great article by MeshedInsights that names this behaviour the “rights ratchet model”. -They describe a script that start-ups use to garner the interest of open source enthusiasts, only to eventually turn their backs on them for profit. -Hashicorp can do this because contributors signed a contributor license agreement (CLA). -This agreement transfers the copyright of contributors’ code to Hashicorp, allowing them to change the license if they want to.

- -

I find this action really regrettable because I like their products. -This sort of action was also why I wanted to avoid using an Elastic stack, which also had its license changed.1 -These companies do not respect their contributors or the software stack they built their products on, which is actually open source (Golang, Linux, etc.).

- -

Impact on my Home Lab

- -

I am using Terraform in my home lab to manage several important things:

-
    -
  • Libvirt virtual machines
  • -
  • PowerDNS records
  • -
  • Elasticsearch configuration
  • -
- -

With Hashicorp’s anti open source move, I intend to move away from Terraform in the future. -While I will not use Hashicorp’s products for new personal projects, I will leave my current setup as-is for some time because there is no real need to quickly migrate.

- -

I might also investigate some of Terraform’s competitors, like Pulumi. -Hopefully there is a project that respects open source which I can use in the future.

- -

Update

- -

A promising fork of Terraform has been announced called OpenTF. -They intend to become part of the Cloud Native Computing Foundation, which I think is a good effort because Terraform is so important for modern cloud infrastructures.

- -

Footnotes

- -
-
    -
  1. -

    While I am still using Elasticsearch, I don’t use the rest of the Elastic stack in order to prevent a vendor lock-in. 

    -
  2. -
-
-:ET \ No newline at end of file diff --git a/.jekyll-cache/Jekyll/Cache/Jekyll--Converters--Markdown/cf/a279eddeb4b979b0e9e6555e6f42751b96af7f180d0a086c18cac528a356b7 b/.jekyll-cache/Jekyll/Cache/Jekyll--Converters--Markdown/cf/a279eddeb4b979b0e9e6555e6f42751b96af7f180d0a086c18cac528a356b7 deleted file mode 100644 index 1f67617..0000000 --- a/.jekyll-cache/Jekyll/Cache/Jekyll--Converters--Markdown/cf/a279eddeb4b979b0e9e6555e6f42751b96af7f180d0a086c18cac528a356b7 +++ /dev/null @@ -1,53 +0,0 @@ -I"ĺ

When I was scaling up my home lab, I started thinking more about data management. -I hadn’t (and still haven’t) set up any form of network storage. -I have, however, set up a backup mechanism using Borg. -Still, I want to operate lots of virtual machines, and backing up each one of them separately seemed excessive. -So I started thinking, what if I just let the host machines back up the data? -After all, the number of physical hosts I have in my home lab is unlikely to increase drastically.

- -

The Use Case for Sharing Directories

- -

I started working out this idea further. -Without network storage, I needed a way for guest VMs to access the host’s disks. -There are two possibilities: expose either a block device or a file system. -Creating a whole virtual disk for just the data of some VMs seemed wasteful, and in my experience also increases backup times dramatically. -I therefore searched for a way to mount a directory from the host OS on the guest VM. -This is when I stumbled upon this blog post talking about sharing directories with virtual machines.

- -

Sharing Directories with virtio-9p

- -

virtio-9p is a way to map a directory on the host OS to a special device on the virtual machine. -In virt-manager, it looks like the following: -picture showing virt-manager configuration to map a directory to a VM -Under the hood, virtio-9p uses the 9P protocol (implemented in Linux by the 9pnet kernel module). -Originally developed at Bell Labs, support for this is available in all modern Linux kernels. -If you share a directory with a VM, you can then mount it. -Below is an extract of my /etc/fstab to automatically mount the directory:

-
data	/mnt/data	9p	trans=virtio,rw	0	0
-
- -

The first argument (data) refers to the name you gave this share on the host. -With the trans option we specify that this is a virtio share.

- -

Problems with virtio-9p

- -

At first I had no problems with my setup, but I am now contemplating just moving to a network storage based setup because of two problems.

- -

The first problem is that some files have suddenly changed ownership from libvirt-qemu to root. -If the file is owned by root, the guest OS can still see it, but cannot access it. -I am not entirely sure the problem lies with virtio, but I suspect it is. -For anyone experiencing this problem, I wrote a small shell script to revert ownership to the libvirt-qemu user:

-
# Hand every root-owned file below the current directory back to libvirt-qemu
find . -user root -exec chown libvirt-qemu:libvirt-qemu {} +
-
- -
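The same clean-up can be sketched in Python, which avoids parsing the output of find altogether. This is only an illustration: the function name `files_owned_by` is my own, and actually changing ownership still requires root.

```python
import os
from typing import Iterator


def files_owned_by(root: str, uid: int) -> Iterator[str]:
    """Yield every path under root (directories included) whose owner is uid."""
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            path = os.path.join(dirpath, name)
            # lstat inspects a symlink itself rather than the file it points to
            if os.lstat(path).st_uid == uid:
                yield path
```

Calling `files_owned_by(share, 0)` on the shared directory lists everything that has flipped to root (uid 0). Each path can then be handed to `shutil.chown(path, user="libvirt-qemu", group="libvirt-qemu")`.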

Another problem that I have experienced is guests being unable to mount the directory at all. -I have only experienced this problem once, but it was highly annoying. -To fix it, I had to reboot the whole physical machine.

- -

Alternatives

- -

virtio-9p seemed like a good idea, but as discussed, I had some problems with it. -It seems virtioFS might be an interesting alternative as it is designed specifically for sharing directories with VMs.

- -

As for me, I will probably finally look into deploying network storage either with NFS or SSHFS.

-:ET \ No newline at end of file diff --git a/.jekyll-cache/Jekyll/Cache/Jekyll--Converters--Markdown/dd/fa550314d3f6e485fefcd71068b25447db3acbd8d5f496b19d502161a999cd b/.jekyll-cache/Jekyll/Cache/Jekyll--Converters--Markdown/dd/fa550314d3f6e485fefcd71068b25447db3acbd8d5f496b19d502161a999cd deleted file mode 100644 index 0e44a03..0000000 --- a/.jekyll-cache/Jekyll/Cache/Jekyll--Converters--Markdown/dd/fa550314d3f6e485fefcd71068b25447db3acbd8d5f496b19d502161a999cd +++ /dev/null @@ -1,51 +0,0 @@ -I"ë

BorgBackup and Borgmatic have been my go-to tools to create backups for my home lab since I started creating backups. -Using Systemd Timers, I regularly create a backup every night. -I also monitor successful execution of the backup process, in case some error occurs. -However, the way I set this up resulted in not receiving notifications. -Even though it boils down to RTFM, I’d like to explain my error and how to handle errors correctly.

- -

I was using the on_error option to handle errors, like so:

- -
on_error:
-  - 'apprise --body="Error while performing backup" <URL> || true'
-
- -

However, on_error does not handle errors from the execution of before_everything and after_everything hooks. -My solution to this was moving the error handling up to the Systemd service that calls Borgmatic. -This results in the following Systemd service:

- -
[Unit]
-Description=Backup data using Borgmatic
-# Added
-OnFailure=backup-failure.service
-
-[Service]
-ExecStart=/usr/bin/borgmatic --config /root/backup.yml
-Type=oneshot
-
- -

This handles any error, be it from Borgmatic’s hooks or itself. -The backup-failure service is very simple, and just calls Apprise to send a notification:

- -
[Unit]
-Description=Send backup failure notification
-
-[Service]
-Type=oneshot
-ExecStart=apprise --body="Failed to create backup!" <URL>
-
-[Install]
-WantedBy=multi-user.target
-
- -
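The idea behind OnFailure= is that the outermost layer, which sees the final exit status, is the one that raises the alarm. Below is a minimal Python sketch of that pattern. The names `run_with_failure_hook` and `notify` are hypothetical, and this is not how Systemd or Borgmatic are implemented.

```python
import subprocess
import sys
from typing import Callable, Sequence


def run_with_failure_hook(cmd: Sequence[str], notify: Callable[[str], None]) -> int:
    """Run cmd and call notify if it exits with a non-zero status."""
    result = subprocess.run(cmd)
    if result.returncode != 0:
        notify(f"{cmd[0]} failed with exit code {result.returncode}")
    return result.returncode


if __name__ == "__main__":
    # Stand-in for the real backup command: a child process that exits with status 3.
    failing = [sys.executable, "-c", "import sys; sys.exit(3)"]
    run_with_failure_hook(failing, notify=print)
```

Because the check sits outside the command, it does not matter whether the failure came from a hook, from the backup itself, or from something else entirely. That is the property the on_error option lacked.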

The Aftermath (or what I learned)

- -

Because the error handling and alerting weren’t working properly, my backups didn’t succeed for two weeks straight. -And, of course, you only notice your backups aren’t working when you actually need them. -This is exactly what happened: my disk was full and a MariaDB database crashed as a result of that. -Actually, the whole database seemed to be corrupt and I find it worrying MariaDB does not seem to be very resilient to failures (in comparison, a PostgreSQL database was able to recover automatically). -I then tried to recover the data using last night’s backup, only to find out there was no such backup. -Fortunately, I had other means to recover the data so I incurred no data loss.

- -

I already knew it is important to test backups, but I learned it is also important to test failures during backups!

-:ET \ No newline at end of file