Recently, I deployed Concourse CI because I wanted to get my feet wet with a CI/CD pipeline.
However, I had a practical use case lying around for a long time: automatically compiling my static website and deploying it to my Docker Swarm.
This took some time getting right, but the result works like a charm (source code).
It's comforting to know I don't have to move a finger and my website is automatically deployed.
However, I would still like to receive some indication of what's happening.
And what better way to do that than using my Apprise service to keep me up to date?
There's a little snag though: I could not find any Concourse resource that does this.
That's when I decided to just create it myself.
The Plagiarism Hunt
As any good computer person, I am lazy.
I'd rather just copy someone's work, so that's what I did.
I found this GitHub repository that does the same thing, but for Slack notifications.
For some reason it's archived, but it seemed like it should work.
I actually noticed lots of repositories for Concourse resource types are archived, so I'm not sure what's going on there.
Getting to know Concourse

Let's first understand what we need to do to reach our end goal of sending Apprise notifications from Concourse.
A Concourse pipeline takes some inputs and performs some operations on them, which results in some outputs.
These inputs and outputs are called resources in Concourse.
For example, a Git repository could be a resource.
Each resource is an instance of a resource type.
A resource type is therefore simply a blueprint from which multiple resources can be created.
To continue the example, a resource type could be "Git repository".
We therefore need to create our own resource type that can send Apprise notifications.
A resource type is simply a container that includes three scripts:

- check: check for a new version of a resource
- in: retrieve a version of the resource
- out: create a version of the resource
As Apprise notifications are basically fire-and-forget, we will only implement the out script.

Writing the out script
The whole script can be found here, but I will explain the most important bits of it.
Note that I only use Apprise's persistent storage solution, and not its stateless solution.
Concourse provides us with the working directory, which we cd to:

```bash
cd "${1}"
```
We create a timestamp, formatted as JSON, which we will use for the resource's new version later.
Concourse requires us to set a version for the resource, but since Apprise notifications don't have one, we use the timestamp:
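The version is just the current time wrapped in the JSON structure Concourse expects; a minimal sketch of what that line likely looks like (the exact field name is an assumption):

```bash
# Build a JSON version object like {"version":{"timestamp":"1700000000"}}
timestamp="$(jq -n "{version: {timestamp: \"$(date +%s)\"}}")"
```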
First, some black magic Bash to redirect file descriptors.
I'm not sure why this is needed, but I copied it anyway.
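This redirection is the usual pattern in Concourse resource scripts; a sketch of what it presumably looks like:

```bash
exec 3>&1  # save the original stdout on fd 3; the final version JSON must go there
exec 1>&2  # send everything else we print to stderr, which Concourse shows as build log
```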
After that, we create a temporary file holding the resource's parameters:

```bash
payload=$(mktemp /tmp/resource-in.XXXXXX)
cat > "${payload}" <&0
```
We then extract the individual parameters.
The source key contains values describing how the resource type was specified, while the params key specifies parameters for this specific resource.

```bash
apprise_host="$(jq -r '.source.host' < "${payload}")"
apprise_key="$(jq -r '.source.key' < "${payload}")"

# ... (the body, title and remaining parameters are extracted the same way)
alert_tag="$(jq -r '.params.tag // null' < "${payload}")"
alert_format="$(jq -r '.params.format // null' < "${payload}")"
```
We then format the different parameters as JSON strings:

```bash
alert_body="$(eval "printf \"${alert_body}\"" | jq -R -s .)"
[ "${alert_title}" != "null" ] && alert_title="$(eval "printf \"${alert_title}\"" | jq -R -s .)"
[ "${alert_tag}" != "null" ] && alert_tag="$(eval "printf \"${alert_tag}\"" | jq -R -s .)"
[ "${alert_format}" != "null" ] && alert_format="$(eval "printf \"${alert_format}\"" | jq -R -s .)"
```
Next, from the individual parameters we construct the final JSON message body we send to the Apprise endpoint.

```bash
body="$(cat <<EOF
{
  ...
}
EOF
)"
```
Before sending it just yet, we compact the JSON and remove any values that are null:
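With jq, that boils down to something along these lines (a sketch; only the output file name is taken from the curl command below):

```bash
jq -c 'with_entries(select(.value != null))' <<< "${body}" > /tmp/compact_body.json
```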
Here is the most important line, where we send the payload to the Apprise endpoint.
It's quite straightforward:

```bash
curl -v -X POST -T /tmp/compact_body.json \
  -H "Content-Type: application/json" \
  "${apprise_host}/notify/${apprise_key}"
```
Finally, we print the timestamp (the fake version) in order to appease the Concourse gods:

```bash
echo "${timestamp}" >&3
```
Building the Container

As said earlier, to actually use this script, we need to add it to an image.
I won't be explaining this whole process, but the source can be found here.
The most important takeaways are these:
- Use concourse/oci-build-task to build an image from a Dockerfile.
- Use registry-image to push the image to an image registry.
Using the Resource Type

Using our newly created resource type is surprisingly simple.
I use it for the blog you are reading right now, and the pipeline definition can be found here.
Here we specify the resource type in a Concourse pipeline:

```yaml
repository: git.kun.is/pim/concourse-apprise-notifier
tag: "1.1.1"
```
We simply have to tell Concourse where to find the image and which tag we want.
Next, we instantiate the resource type to create a resource:

```yaml
resources:
  ...
  key: concourse
  icon: bell
```
We simply specify the host to send Apprise notifications to.
Yeah, I even gave it a little bell because it's cute.

All that's left to do is actually send the notification.
Let's see how that is done:

```yaml
- name: deploy-static-website
  ...
    body: "New version: $(cat version/version)"
    no_get: true
```
As can be seen, the Apprise notification can be triggered when a task has executed successfully.
We do this using the put step, which executes the out script under the hood.
We set the notification's title and body, and send it!
The result is seen below in my Ntfy app, which Apprise forwards the message to:
And to finish this off, here is what it looks like in the Concourse web UI:

Conclusion
Concourse's way of representing everything as an image/container is really interesting in my opinion.
A resource type is quite easily implemented as well, although Bash might not be the optimal language for it.
I've seen some people implement resource types in Rust, which might be a good excuse to finally learn that language :)
Apart from Apprise notifications, I'm planning on creating a resource type to deploy to a Docker Swarm eventually.
That seems a lot harder than simply sending notifications, though.
Ever SSH'ed into a freshly installed server and gotten the following annoying message?

```text
The authenticity of host 'host.tld (1.2.3.4)' can't be established.
ED25519 key fingerprint is SHA256:eUXGdm1YdsMAS7vkdx6dOJdOGHdem5gQp4tadCfdLB8.
Are you sure you want to continue connecting (yes/no)?
```
Or even more annoying:

```text
@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
@    WARNING: REMOTE HOST IDENTIFICATION HAS CHANGED!     @
@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
Offending ED25519 key in /home/user/.ssh/known_hosts:3
ED25519 host key for 1.2.3.4 has changed and you have requested strict checking.
Host key verification failed.
```
Could it be that the programmers at OpenSSH simply like to annoy us with these confusing messages?
Maybe, but these warnings also serve as a way to notify users of a potential man-in-the-middle (MITM) attack.
I won't go into the details of this problem, but I refer you to this excellent blog post.
Instead, I would like to talk about ways to get rid of these annoying warnings.
One obvious solution is simply to add each host to your known_hosts file.
This works okay when managing a handful of servers, but becomes unbearable when managing many.
In my case, I wanted to quickly spin up virtual machines using Duncan Mac-Vicar's Terraform Libvirt provider, without having to accept their host keys before connecting.
The solution? Issuing SSH host certificates using an SSH certificate authority.
SSH Certificate Authorities vs. the Web

The idea of an SSH certificate authority (CA) is quite easy to grasp if you understand the web's Public Key Infrastructure (PKI).
Just like with the web, a trusted party can issue certificates that are offered when establishing a connection.
The idea is that just by trusting the trusted party, you trust every certificate they issue.
In the case of the web's PKI, this trusted party is bundled with and trusted by your browser or operating system.
However, in the case of SSH, the trusted party is you! (Okay, you can also trust your own web certificate authority.)
With this great power comes great responsibility, which we will abuse heavily in this article.
SSH Certificate Authority for Terraform

So, let's start with a plan.
I want to spawn virtual machines with Terraform which are automatically provisioned with an SSH host certificate issued by my CA.
This CA will be another host on my private network, issuing certificates over SSH.
Fetching the SSH Host Certificate

First we generate an SSH key pair in Terraform.
Below is the code for that:
```hcl
resource "tls_private_key" "debian" {
  ...
  private_key_pem = tls_private_key.debian.private_key_pem
}
```
Now that we have an SSH key pair, we need to somehow make Terraform communicate it to the CA.
Lucky for us, there is a way for Terraform to execute an arbitrary command with the external data feature.
We call the script below:

```hcl
data "external" "cert" {
  program = ["bash", "${path.module}/get_cert.sh"]

  query = {
    ...
  }
}
```
These query parameters will end up on the script's stdin in JSON format.
We can then read these parameters and send them to the CA over SSH.
The result must be in JSON format as well.
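Reading the query and calling the CA follows the usual pattern for a Terraform external data source; a rough sketch of how get_cert.sh likely starts (the parameter names, CA host name and remote command are assumptions):

```bash
# Parse the JSON query from stdin into shell variables
eval "$(jq -r '@sh "HOST=\(.host) PUBKEY=\(.pubkey)"')"

# Ask the CA to sign the public key over SSH
CERT="$(echo "${PUBKEY}" | ssh ca.hyp sign_cert.sh "${HOST}")"
```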
```bash
jq -n --arg cert "$CERT" '{"cert":$cert}'
```
We see that a script is called on the remote host that issues the certificate.
This is just a simple wrapper around ssh-keygen, which you can see below.
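A wrapper like that boils down to a few lines around ssh-keygen; a sketch under assumed paths and arguments:

```bash
#!/usr/bin/env bash
# Sign the host public key read from stdin and print the resulting certificate.
set -euo pipefail

host_name="$1"
pubkey_file="$(mktemp)"
cat > "${pubkey_file}"

# -s: the CA's private key (assumed path), -I: certificate identity, -h: issue a host certificate
ssh-keygen -s /etc/ssh/ca -I "${host_name}" -h -n "${host_name}" "${pubkey_file}"

cat "${pubkey_file}-cert.pub"
```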
So nice, we can fetch the SSH host certificate from the CA.
We should just be able to use it, right?
We can, but it brings a big annoyance with it: Terraform will fetch a new certificate every time it is run.
This is because the external feature of Terraform is a data source.
If we were to use this data source for a Terraform resource, the resource would need to be updated every time we run Terraform.
I have not been able to find a way to avoid fetching the certificate every time, except for writing my own resource provider, which I'd rather not do.
I have, however, found a way to hack around the issue.
The idea is as follows: we can use Terraform's ignore_changes to, well, ignore any changes to a resource.
Unfortunately, we cannot use this for a data source, so we must create a glue null_resource that supports ignore_changes.
This is shown in the code snippet below.
We use the triggers property simply to copy the certificate in; we don't use it for its original purpose.
```hcl
resource "null_resource" "cert" {
  triggers = {
    cert = data.external.cert.result["cert"]
  }

  ...
}
```
And voilà, we can now use null_resource.cert.triggers["cert"] as our certificate, and it won't trigger replacements in Terraform.
Setting the Host Certificate with Cloud-Init

Terraform's Libvirt provider has native support for Cloud-Init, which is very handy.
We can give the host certificate directly to Cloud-Init and place it on the virtual machine.
Inside the Cloud-Init configuration, we can set the ssh_keys property to do this:
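A sketch of what that part of the user-data looks like (the interpolation variable names are assumptions):

```yaml
ssh_keys:
  ed25519_private: |
    ${private_key}
  ed25519_certificate: "${host_cert}"
```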
I hardcoded this to ED25519 keys, because this is all I use.

This works perfectly, and I never have to accept host certificates from virtual machines again.
Caveats

A sharp eye might have noticed that the lifecycle of these host certificates is severely lacking.
Namely, the deployed host certificates have no expiration date, nor is there a revocation function.
There are ways to implement these, but for my home lab I did not deem this necessary at this point.
In a more professional environment, I would suggest using Hashicorp's Vault.
This project did teach me about the limits and flexibility of Terraform, so all in all a success!
All code can be found in the git repository here.
I have been meaning to write about the current state of my home lab infrastructure for a while now.
Now that the most important parts are quite stable, I think the opportunity is ripe.
I expect this post to get quite long, so I might have to leave out some details along the way.
This post will be a starting point for future infrastructure snapshots, which I can hopefully put out periodically.
That is, if there is enough worth talking about.

Keep an eye out for the icon, which links to the source code and configuration of anything mentioned.
Oh yeah, did I mention everything I do is open source?
Networking and Infrastructure Overview

Hardware and Operating Systems

Let's start with the basics: what kind of hardware do I use for my home lab?
The most important servers are my three Gigabyte Brix GB-BLCE-4105.
Two of them have 16 GB of memory, and one has 8 GB.
I named these servers as follows:
- Atlas: because this server was going to "lift" a lot of virtual machines.
- Lewis: we started out with a "Max" server named after the Formula 1 driver Max Verstappen, but it kind of became an unmanageable behemoth without infrastructure-as-code. Our second server we subsequently named Lewis after his colleague Lewis Hamilton. Note: people around me vetoed these names and I am no F1 fan!
- Jefke: it's a funny Belgian name. That's all.
Here is a picture of them sitting in their cosy closet:

If you look to the left, you will also see a Raspberry Pi 4B.
I use this Pi to do some rudimentary monitoring of whether servers and services are running.
More on this in the relevant section below.
The Pi is called Iris, because it's a messenger for the other servers.
I used to run Ubuntu on these systems, but I have since migrated to Debian.
The main reasons were Canonical putting advertisements in my terminal and pushing Snap, which has a proprietary backend.
Two of my servers run the newly released Debian Bookworm, while one still runs Debian Bullseye.
Networking

For networking, I wanted hypervisors and virtual machines separated by VLANs for security reasons.
The following picture shows a simplified view of the VLANs present in my home lab:

All virtual machines are connected to a virtual bridge which tags network traffic with the DMZ VLAN.
The hypervisors VLAN is used for traffic to and from the hypervisors.
Devices in the hypervisors VLAN are allowed to connect to devices in the DMZ, but not vice versa.
The hypervisors are connected to a switch using a trunk link, which allows both DMZ and hypervisors traffic.
I realised the above design using ifupdown.
Below is the configuration for each hypervisor, which creates a new enp3s0.30 interface carrying all DMZ traffic from the enp3s0 interface.

```text
auto enp3s0.30
iface enp3s0.30 inet manual
iface enp3s0.30 inet6 auto
        ...
        privext 0
        pre-up sysctl -w net/ipv6/conf/enp3s0.30/disable_ipv6=1
```
This configuration seems more complex than it actually is.
Most of it is to make sure the interface is not assigned an IPv4/IPv6 address on the hypervisor host.
The magic .30 at the end of the interface name makes this interface tagged with VLAN ID 30 (the DMZ, in my case).
Now that we have an interface tagged with the DMZ VLAN, we can create a bridge that future virtual machines can connect to:
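A bridge stanza in the same style would look roughly like this (a sketch: bridge_ports is taken from the explanation below, the bridge name dmzbr from the ip output further down, and the IPv6 suppression lines mirror the previous snippet):

```text
auto dmzbr
iface dmzbr inet manual
        bridge_ports enp3s0.30
        bridge_stp off
iface dmzbr inet6 auto
        accept_ra 0
        privext 0
        pre-up sysctl -w net/ipv6/conf/dmzbr/disable_ipv6=1
```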
Just like the previous config, this is quite bloated because I don't want the interface to be assigned an IP address on the host.
Most importantly, the bridge_ports enp3s0.30 line makes this interface a virtual bridge for the enp3s0.30 interface.
And voilà, we now have a virtual bridge on each machine, where only DMZ traffic will flow.
Here I verify whether this configuration works:

We can see that the two virtual interfaces are created, and are only assigned a MAC address and not an IP address:
```text
root@atlas:~# ip a show enp3s0.30
4: enp3s0.30@enp3s0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue master dmzbr state UP group default qlen 1000
link/ether d8:5e:d3:4c:70:38 brd ff:ff:ff:ff:ff:ff
5: dmzbr: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
link/ether 4e:f7:1f:0f:ad:17 brd ff:ff:ff:ff:ff:ff
```

Pinging a VM from a hypervisor works:
```text
root@atlas:~# ping -c1 maestro.dmz
PING maestro.dmz (192.168.30.8) 56(84) bytes of data.
64 bytes from 192.168.30.8 (192.168.30.8): icmp_seq=1 ttl=63 time=0.457 ms
```

Pinging a hypervisor from a VM does not work:

```text
root@maestro:~# ping -c1 atlas.hyp
PING atlas.hyp (192.168.40.2) 56(84) bytes of data.
--- atlas.hyp ping statistics ---
1 packets transmitted, 0 received, 100% packet loss, time 0ms
```
DNS and DHCP

Now that we have a working DMZ network, let's build on it to get DNS and DHCP working.
This will enable new virtual machines to obtain a static or dynamic IP address and register their host names in DNS.
This has actually been incredibly annoying, due to our friend Network Address Translation (NAT).

NAT recap
Network address translation (NAT) is a function of a router which allows multiple hosts to share a single IP address.
This is needed for IPv4, because IPv4 addresses are scarce and usually one household is only assigned a single IPv4 address.
This is one of the problems IPv6 attempts to solve (mainly by having so many IP addresses that they should never run out).
To solve the problem for IPv4, each host in a network is assigned a private IPv4 address, which can be reused in every network.
The router must then perform address translation.
It does this by keeping track of ports opened by hosts in its private network.
If a packet from the internet arrives at the router for such a port, it forwards this packet to the correct host.
I would like to host my own DNS on a virtual machine (called hermes, more on VMs later) in the DMZ network.
This basically gives two problems:
- The upstream DNS server will refer to the public internet-accessible IP address of our DNS server. This IP address has no meaning inside the private network due to NAT, and the router will reject the packet.
- Our DNS resolves hosts to their public internet-accessible IP addresses. This is similar to the previous problem, as the public IP address has no meaning inside the private network.
The first problem can be remediated by overriding the location of the DNS server for hosts inside the DMZ network.
This can be achieved on my router, which uses Unbound as its recursive DNS server:
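In Unbound, such an override amounts to a pair of forward zones along these lines (a sketch; the zone names, address and port come from the next sentence):

```text
forward-zone:
        name: "dmz"
        forward-addr: 192.168.30.7@5353

forward-zone:
        name: "kun.is"
        forward-addr: 192.168.30.7@5353
```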
Any DNS requests to Unbound for domains in either dmz or kun.is will now be forwarded to 192.168.30.7 (port 5353).
This is the virtual machine hosting my DNS.
The second problem can be solved at the DNS server.
We need to do some magic overriding, which dnsmasq is perfect for:
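One way dnsmasq can do this kind of override is its alias option, which rewrites addresses in upstream answers; a sketch with placeholder addresses:

```text
# Rewrite the public A record (placeholder) to the internal DMZ address (placeholder)
alias=203.0.113.10,192.168.30.8
```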
Now that we have laid out the basic networking, let's talk virtualization.
Each of my servers is configured to run KVM virtual machines, orchestrated using Libvirt.
Configuration of the physical hypervisor servers, including KVM/Libvirt, is done using Ansible.
The VMs are spun up using Terraform and the dmacvicar/libvirt Terraform provider.

This all isn't too exciting, except that I created a Terraform module that abstracts the Terraform Libvirt provider for my specific scenario:

```hcl
module "maestro" {
  source = "git::https://git.kun.is/home/tf-modules.git//debian"
  ...
  mac    = "CA:FE:C0:FF:EE:08"
}
```
This automatically creates a Debian virtual machine with the properties specified.
It also sets up certificate-based SSH authentication, which I talked about before.
Clustering

With virtualization explained, let's move up one level further.
Each of my three physical servers hosts a virtual machine running Docker, which together form a Docker Swarm.
I use Traefik as a reverse proxy which routes requests to the correct container.

All data is hosted on a single machine and made available to containers using NFS.
While this might not be very secure (NFS is not encrypted and has no proper authentication), it is quite fast.
As of today, I host the following services on my Docker Swarm:
For CI/CD, I run Concourse CI in a separate VM.
This is needed because Concourse heavily uses containers to create reproducible builds.

Although I should probably use it for more, I currently use Concourse for three pipelines:
- A pipeline to build this static website and create a container image of it. The image is then uploaded to the image registry of my Forgejo instance. I love it when I can use stuff I previously built :) The pipeline finally deploys this new image to the Docker Swarm.
- A pipeline to create a Concourse resource that sends Apprise alerts (Concourse-ception?)
- A pipeline to build a custom Fluentd image with plugins installed
Backups

To create backups, I use Borg.
As I keep all data on one machine, this backup process is quite simple.
In fact, all this data is stored in a single Libvirt volume.
To configure Borg with a simple declarative script, I use Borgmatic.
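A Borgmatic configuration doesn't need much; a minimal sketch for a recent Borgmatic version (paths and retention are placeholders, not my actual settings):

```yaml
source_directories:
    - /mnt/data
repositories:
    - path: /mnt/backups/borg
keep_daily: 7
```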
In order to back up the data inside the Libvirt volume, I create a snapshot to a file.
Then I can mount this snapshot in my file system.
The files can then be backed up while the system is still running.
It is also possible to simply back up the Libvirt image, but this takes more time and storage.
Monitoring and Alerting

The last topic I would like to talk about is monitoring and alerting.
This is something I'm still actively improving and only just set up properly.
Alerting

For alerting, I wanted something that runs entirely on my own infrastructure.
I settled on Apprise + Ntfy.

Apprise is a server that is able to send notifications to dozens of services.
For application developers, it is thus only necessary to implement the Apprise API to gain access to all these services.
The Apprise API itself is also very simple.
By using Apprise, I can also easily switch to another notification service later.
Ntfy is free software made for mobile push notifications.

I use this alerting system in quite a lot of places in my infrastructure, for example when creating backups.
Uptime Monitoring

The first monitoring setup I created used Uptime Kuma.
Uptime Kuma periodically pings a service to see whether it is still running.
You can do a literal ping, test HTTP response codes, check database connectivity and much more.
I use it to check whether my services and VMs are online.
And the best part is, Uptime Kuma supports Apprise, so I get push notifications on my phone whenever something goes down!
Metrics and Log Monitoring

A new monitoring system I am still in the process of deploying is focused on metrics and logs.
I plan on creating a separate blog post about this, so keep an eye out for that (for example using RSS :)).
Safe to say, it is no basic ELK stack!

Conclusion

That's it for now!
Hopefully I inspired someone to build something… or how not to :)
Previously, I have used Prometheus' node_exporter to monitor the memory usage of my servers.
However, I am currently in the process of moving away from Prometheus to a new monitoring stack.
While I understand the advantages, I felt like Prometheus' pull architecture does not scale nicely.
Every time I spin up a new machine, I would have to centrally change Prometheus' configuration in order for it to query the new server.
In order to collect metrics from my servers, I am now using Fluent Bit.
I love Fluent Bit's way of configuration, which I can easily express as code and automate, its focus on efficiency, and it being vendor agnostic.
However, I have stumbled upon one, in my opinion, big issue with Fluent Bit: its mem plugin to monitor memory usage is completely useless.
In this post I will go over the problem and my temporary solution.
The Problem with Fluent Bit's mem Plugin

As can be seen in the documentation, Fluent Bit's mem input plugin exposes a few metrics regarding memory usage which should be self-explanatory: Mem.total, Mem.used, Mem.free, Swap.total, Swap.used and Swap.free.
The problem is that Mem.used and Mem.free do not accurately reflect the machine's actual memory usage.
This is because these metrics include caches and buffers, which can be reclaimed by other processes if needed.
Most tools reporting memory usage therefore include an additional metric that specifies the memory available on the system.
For example, the command free -m reports the following data on my laptop:

```text
               total        used        free      shared  buff/cache   available
Mem:           15864        3728        7334         518        5647       12136
Swap:           2383         663        1720
```
Notice that the available memory is more than the free memory.

While the issue is known (see this and this link), it is unfortunately not yet fixed.
A Temporary Solution

The issues I linked previously provide stand-alone plugins that fix the problem, which will hopefully be merged into the official project at some point.
However, I didn't want to install another plugin, so I used Fluent Bit's exec input plugin and the free Linux command to query memory usage like so:
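A minimal version of that exec input looks roughly like this (the tag and interval are assumptions):

```text
[INPUT]
    Name          exec
    Tag           memory
    Command       free -m
    Interval_Sec  1
```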
To interpret the command's output, I created the following filter:

```text
[FILTER]
    Name      parser
    Key_Name  exec
    Parser    free
```
Lastly, I created the following parser (warning: regex shitcode incoming):

```text
[PARSER]
    Name    free
    Regex   ^Mem:\s+(?<mem_total>\d+)\s+(?<mem_used>\d+)\s+(?<mem_free>\d+)\s+(?<mem_shared>\d+)\s+(?<mem_buff_cache>\d+)\s+(?<mem_available>\d+) Swap:\s+(?<swap_total>\d+)\s+(?<swap_used>\d+)\s+(?<swap_free>\d+)
    Types   mem_total:integer mem_used:integer mem_free:integer mem_shared:integer mem_buff_cache:integer mem_available:integer swap_total:integer swap_used:integer
```
With this configuration, you can use the mem_available metric to get accurate memory usage in Fluent Bit.
Conclusion

Let's hope Fluent Bit's mem input plugin is improved soon so this hacky solution is no longer needed.
I also intend to document my new monitoring pipeline, which at the moment consists of:
- Fluent Bit
- Fluentd
- Elasticsearch
- Grafana
Already a week ago, Hashicorp announced it would change the license on almost all its projects.
Unlike their previous license, which was the Mozilla Public License 2.0, their new license is no longer truly open source.
It is called the Business Source License™ and restricts use of their software by competitors.
In their own words:

"Vendors who provide competitive services built on our community products will no longer be able to incorporate future releases, bug fixes, or security patches contributed to our products."
I found a great article by MeshedInsights that names this behaviour the "rights ratchet model".
They define a script start-ups use to garner the interest of open source enthusiasts but eventually turn their back on them for profit.
The reason why Hashicorp can do this is because contributors signed a copyright license agreement (CLA).
This agreement transfers the copyright of contributors' code to Hashicorp, allowing them to change the license if they want to.

I find this action really regrettable because I like their products.
This sort of action was also why I wanted to avoid using an Elastic stack, which also had its license changed [1].
These companies do not respect their contributors and the actually open source software stack beneath their products (Golang, Linux, etc.).
Impact on my Home Lab

I am using Terraform in my home lab to manage several important things:

- Libvirt virtual machines
- PowerDNS records
- Elasticsearch configuration
With Hashicorp's anti-open-source move, I intend to move away from Terraform in the future.
While I will not use Hashicorp's products for new personal projects, I will leave my current setup as-is for some time, because there is no real need to quickly migrate.

I might also investigate some of Terraform's competitors, like Pulumi.
Hopefully there is a project that respects open source which I can use in the future.
Update

A promising fork of Terraform has been announced, called OpenTF.
They intend to become part of the Cloud Native Computing Foundation, which I think is a good effort because Terraform is so important for modern cloud infrastructures.

[1] While I am still using Elasticsearch, I don't use the rest of the Elastic stack in order to prevent a vendor lock-in.
When I was scaling up my home lab, I started thinking more about data management.
I hadn't (and still haven't) set up any form of network storage.
I have, however, set up a backup mechanism using Borg.
Still, I want to operate lots of virtual machines, and backing up each one of them separately seemed excessive.
So I started thinking: what if I just let the host machines back up the data?
After all, the number of physical hosts in my home lab is unlikely to increase drastically.
The Use Case for Sharing Directories

I started working out this idea further.
Without network storage, I needed a way for guest VMs to access the host's disks.
There are two possibilities here: either expose a block device or a file system.
Creating a whole virtual disk just for the data of some VMs seemed wasteful, and from my experience it also increases backup times dramatically.
I therefore searched for a way to mount a directory from the host OS on the guest VM.
This is when I stumbled upon this blog post talking about sharing directories with virtual machines.
Sharing Directories with virtio-9p

virtio-9p is a way to map a directory on the host OS to a special device on the virtual machine.
In virt-manager, it looks like the following:

Under the hood, virtio-9p uses the 9pnet protocol.
The protocol was originally developed at Bell Labs, and support for it is available in all modern Linux kernels.
If you share a directory with a VM, you can then mount it.
Below is an extract of my /etc/fstab to automatically mount the directory:

```text
data /mnt/data 9p trans=virtio,rw 0 0
```
The first argument (data) refers to the name you gave this share on the host.
With the trans option, we specify that this is a virtio share.
Problems with virtio-9p

At first I had no problems with my setup, but I am now contemplating moving to a network-storage-based setup because of two problems.
The first problem is that some files have suddenly changed ownership from libvirt-qemu to root.
If a file is owned by root, the guest OS can still see it, but cannot access it.
I am not entirely sure the problem lies with virtio, but I suspect it does.
For anyone experiencing this problem, I wrote a small shell script to revert ownership to the libvirt-qemu user:
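The script boils down to a single find/chown invocation; a sketch, with the shared directory path as a placeholder:

```bash
#!/usr/bin/env bash
# Restore ownership of shared files that have fallen back to root,
# so the guest can access them again. /mnt/data is a placeholder path.
find /mnt/data ! -user libvirt-qemu -exec chown libvirt-qemu:libvirt-qemu {} +
```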
Another problem I have experienced is guests being unable to mount the directory at all.
I have only experienced this problem once, but it was highly annoying.
To fix it, I had to reboot the whole physical machine.
Alternatives

virtio-9p seemed like a good idea but, as discussed, I had some problems with it.
It seems virtioFS might be an interesting alternative, as it is designed specifically for sharing directories with VMs.

As for me, I will probably finally look into deploying network storage, either with NFS or SSHFS.
BorgBackup and Borgmatic have been my go-to tools to create backups for my home lab since I started creating backups.
Using Systemd timers, I regularly create a backup every night.
I also monitor successful execution of the backup process, in case some error occurs.
However, the way I set this up resulted in not receiving notifications.
Even though it boils down to RTFM, I'd like to explain my error and how to handle errors correctly.
I was using the on_error option to handle errors, like so:
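Roughly, the configuration contained an on_error hook along these lines (a sketch; the notification command is a placeholder):

```yaml
hooks:
    on_error:
        - /root/send-backup-failure-notification.sh
```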
However, on_error does not handle errors from the execution of the before_everything and after_everything hooks.
My solution was to move the error handling up to the Systemd service that calls Borgmatic.
This results in the following Systemd service:

```text
[Unit]
Description=Backup data using Borgmatic
# Added
OnFailure=backup-failure.service

[Service]
ExecStart=/usr/bin/borgmatic --config /root/backup.yml
Type=oneshot
```
This handles any error, be it from Borgmatic's hooks or from Borgmatic itself.
The backup-failure service is very simple, and just calls Apprise to send a notification:

```text
[Unit]
Description=Send backup failure notification

...

[Install]
WantedBy=multi-user.target
```
The Aftermath (or what I learned)

Because the error handling and alerting weren't working properly, my backups didn't succeed for two weeks straight.
And, of course, you only notice your backups aren't working when you actually need them.
This is exactly what happened: my disk was full and a MariaDB database crashed as a result.
Actually, the whole database seemed to be corrupt, and I find it worrying that MariaDB does not seem to be very resilient to failures (in comparison, a PostgreSQL database was able to recover automatically).
I then tried to recover the data using last night's backup, only to find out there was no such backup.
Fortunately, I had other means to recover the data, so I incurred no data loss.
I already knew it is important to test backups, but I learned it is also important to test failures during backups!
Finally, after several months, this website is up and running again!

My homelab has completely changed, but the reason it initially went offline is my failing CI installation.
I was using Concourse CI, which I was initially interested in due to the reproducible nature of its builds using containers.
However, for some reason pipelines were sporadically getting stuck when I rebooted the virtual machine it was running on.
The fix was very annoying: I had to re-create the pipelines manually (which feels very backwards for a CI/CD system!).
Additionally, my virtual machine setup back then was also quite fragile, and I decided to get rid of that as well.

I have learned that having an escape hatch to deploy something is probably a good idea.
Expect a new overview of my homelab soon, in the same vein as this post from last year!
Using Ansible to alter Kernel Parameters

For months, I've had a peculiar problem with my laptop: once in a while, seemingly without reason, my laptop screen would freeze. This only happened on my laptop screen, and not on an external monitor. I had kind of learned to live with it as I couldn't find a solution online. The only remedy I had was reloading my window manager, which would often unfreeze the screen.
Yesterday I tried Googling once more and I actually found a thread about it on the Arch Linux forums! They talk about the same laptop model, the Lenovo ThinkPad x260, having the problem. Fortunately, they also propose a temporary fix.
Trying the Fix
Apparently, a problem with the Panel Self Refresh (PSR) feature of Intel iGPUs is the culprit. According to the Linux source code, PSR enables the display to go into a lower standby mode when the system is idle but the screen is in use. These lower standby modes can reduce power usage of your device when idling.
This all seems useful, except when it makes your screen freeze! The proposed fix disables the PSR feature entirely. To do this, we need to change a parameter to the Intel Graphics Linux Kernel Module (LKM). The LKM for Intel Graphics is called i915. There are multiple ways to change kernel parameters, but I chose to edit my Grub configuration.
First, I wanted to test whether it actually works. When booting into my Linux partition via Grub, you can press e to edit the Grub definition. Somewhere there, you can find the linux command which specifies to boot Linux and how to do that. I simply appended the option i915.enable_psr=0 to this line. After rebooting, I noticed my screen no longer freezes! Success!
Persisting the Fix
To make the change permanent, we need to permanently change Grub's configuration. One way to do this is by changing Grub's defaults in /etc/default/grub. Namely, the GRUB_CMDLINE_LINUX_DEFAULT option specifies what options Grub should pass to the Linux kernel by default. For me, this is a nice solution as the problem exists for both Linux OSes I have installed. I changed this option to:
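The actual value did not survive in this extract; as a rough sketch (the pre-existing "quiet" default is an assumption, only the i915 parameter is taken from the text), the relevant line in /etc/default/grub and the command to apply it on a Debian-style system would look like:
# /etc/default/grub: append the i915 parameter to whatever defaults were already there
GRUB_CMDLINE_LINUX_DEFAULT="quiet i915.enable_psr=0"
# Regenerate the Grub configuration so the parameter is passed to the kernel on the next boot
update-grub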
Next, I wanted to automate this solution using Ansible. This turned out to be quite easy, as the Grub configuration looks a bit like an ini file (maybe it is?):
Lately, I have been learning a bit of NixOS with the intention of replacing my current setup. Compared to Ansible, applying this fix is a breeze on NixOS:
{
- boot.kernelParams=["i915.enable_psr=0"];
-}
-
That's it, yep.
Conclusion
It turned out to be quite easy to change Linux kernel parameters using Ansible. Maybe some kernel gurus have better ways to change parameters, but this works for me for now.
As a sidenote, I started reading a bit more about NixOS and realised that it can solve issues like these much more nicely than Ansible does. I might replace my OS with NixOS some day, if I manage to rewrite my Ansible for it.
BorgBackup and Borgmatic have been my go-to tools to create backups for my home lab since I started creating backups. Using Systemd Timers, I regularly create a backup every night. I also monitor successful execution of the backup process, in case some error occurs. However, the way I set this up resulted in not receiving notifications. Even though it boils down to RTFM, I'd like to explain my error and how to handle errors correctly.
BorgBackup and Borgmatic have been my go-to tools to create backups for my home lab since I started creating backups. Using Systemd Timers, I regularly create a backup every night. I also monitor successful execution of the backup process, in case some error occurs. However, the way I set this up resulted in not receiving notifications. Even though it boils down to RTFM, I'd like to explain my error and how to handle errors correctly.
I was using the on_error option to handle errors, like so:
However, on_error does not handle errors from the execution of before_everything and after_everything hooks. My solution to this was moving the error handling up to the Systemd service that calls Borgmatic. This results in the following Systemd service:
[Unit]
-Description=Backup data using Borgmatic
-# Added
-OnFailure=backup-failure.service
-
-[Service]
-ExecStart=/usr/bin/borgmatic --config /root/backup.yml
-Type=oneshot
-
This handles any error, be it from Borgmatic's hooks or itself. The backup-failure service is very simple, and just calls Apprise to send a notification:
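The unit itself is not included in this extract; a minimal sketch of the kind of Apprise call such a backup-failure service could wrap (the endpoint URL and notification key are placeholders, not the real values):
#!/usr/bin/env bash
# Hypothetical script run by backup-failure.service; the Apprise host and key are made up
curl -X POST -H "Content-Type: application/json" \
  --data '{"title": "Backup failed", "body": "Borgmatic exited with an error"}' \
  "https://apprise.example.com/notify/backups"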
Because the error handling and alerting weren't working properly, my backups didn't succeed for two weeks straight. And, of course, you only notice your backups aren't working when you actually need them. This is exactly what happened: my disk was full and a MariaDB database crashed as a result of that. Actually, the whole database seemed to be corrupt and I find it worrying MariaDB does not seem to be very resilient to failures (in comparison, a PostgreSQL database was able to recover automatically). I then tried to recover the data using last night's backup, only to find out there was no such backup. Fortunately, I had other means to recover the data so I incurred no data loss.
I already knew it is important to test backups, but I learned it is also important to test failures during backups!
Recently, I deployed Concourse CI because I wanted to get my feet wet with a CI/CD pipeline. However, I had a practical use case lying around for a long time: automatically compiling my static website and deploying it to my docker Swarm. This took some time getting right, but the result works like a charm (source code).
Recently, I deployed Concourse CI because I wanted to get my feet wet with a CI/CD pipeline. However, I had a practical use case lying around for a long time: automatically compiling my static website and deploying it to my docker Swarm. This took some time getting right, but the result works like a charm (source code).
Itâs comforting to know I donât have move a finger and my website is automatically deployed. However, I would still like to receive some indication of whatâs happening. And whatâs a better way to do that, than using my Apprise service to keep me up to date. Thereâs a little snag though: I could not find any Concourse resource that does this. Thatâs when I decided to just create it myself.
The Plagiarism Hunt
As any good computer person, I am lazy. Iâd rather just copy someoneâs work, so thatâs what I did. I found this GitHub repository that does the same thing but for Slack notifications. For some reason itâs archived, but it seemed like it should work. I actually noticed lots of repositories for Concourse resource types are archived, so not sure whatâs going on there.
Getting to know Concourse
Letâs first understand what we need to do reach our end goal of sending Apprise notifications from Concourse.
A Concourse pipeline takes some inputs, performs some operations on them which result in some outputs. These inputs and outputs are called resources in Concourse. For example, a Git repository could be a resource. Each resource is an instance of a resource type. A resource type therefore is simply a blueprint that can create multiple resources. To continue the example, a resource type could be âGit repositoryâ.
We therefore need to create our own resource type that can send Apprise notifications. A resource type is simply a container that includes three scripts:
check: check for a new version of a resource
in: retrieve a version of the resource
out: create a version of the resource
As Apprise notifications are basically fire-and-forget, we will only implement the out script.
Writing the out script
The whole script can be found here, but I will explain the most important bits of it. Note that I only use Appriseâs persistent storage solution, and not its stateless solution.
Concourse provides us with the working directory, which we cd to:
cd"${1}"
-
We create a timestamp, formatted in JSON, which we will use for the resourceâs new version later. Concourse requires us to set a version for the resource, but since Apprise notifications donât have that, we use the timestamp:
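The snippet itself is not reproduced here; one way such a version stamp could be built with date and jq (a sketch, and the exact JSON shape is an assumption rather than the author's code) is:
# Build something like {"version":{"timestamp":"1686782345"}} for Concourse to use as the version
timestamp="$(jq -n --arg t "$(date +%s)" '{version: {timestamp: $t}}')"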
First some black magic Bash to redirect file descriptors. Not sure why this is needed, but I copied it anyways. After that, we create a temporary file holding resourceâs parameters.
We then extract the individual parameters. The source key contains values how the resource type was specified, while the params key specifies parameters for this specific resource.
Here is the most important line, where we send the payload to the Apprise endpoint. It's quite straightforward.
curl -v -X POST -T /tmp/compact_body.json -H "Content-Type: application/json" "${apprise_host}/notify/${apprise_key}"
-
Finally, we print the timestamp (fake version) in order to appease the Concourse gods.
echo "${timestamp}" >&3
-
Building the Container
As said earlier, to actually use this script, we need to add it to an image. I won't be explaining this whole process, but the source can be found here. The most important takeaways are these:
Use concourse/oci-build-task to build an image from a Dockerfile.
Use registry-image to push the image to an image registry.
Using the Resource Type
Using our newly created resource type is surprisingly simple. I use it for the blog you are reading right now and the pipeline definition can be found here. Here we specify the resource type in a Concourse pipeline:
As can be seen, the Apprise notification can be triggered when a task is executed successfully. We do this using the put command, which executes the out script under the hood. We set the notification's title and body, and send it! The result is seen below in my Ntfy app, which Apprise forwards the message to:
And to finish this off, here is what it looks like in the Concourse web UI:
Conclusion
Concourse's way of representing everything as an image/container is really interesting in my opinion. A resource type is quite easily implemented as well, although Bash might not be the optimal way to do this. I've seen some people implement it in Rust, which might be a good excuse to finally learn that language :)
Apart from Apprise notifications, I'm planning on creating a resource type to deploy to a Docker swarm eventually. This seems a lot harder than simply sending notifications though.
diff --git a/src/_site/concourse-apprise-notifier/ntfy.png b/src/_site/concourse-apprise-notifier/ntfy.png
deleted file mode 100644
index 3b47f51..0000000
Binary files a/src/_site/concourse-apprise-notifier/ntfy.png and /dev/null differ
diff --git a/src/_site/concourse-apprise-notifier/pipeline.png b/src/_site/concourse-apprise-notifier/pipeline.png
deleted file mode 100644
index 68d0d14..0000000
Binary files a/src/_site/concourse-apprise-notifier/pipeline.png and /dev/null differ
diff --git a/src/_site/feed.xml b/src/_site/feed.xml
deleted file mode 100644
index 3c3ea5d..0000000
--- a/src/_site/feed.xml
+++ /dev/null
@@ -1,738 +0,0 @@
-Jekyll2024-04-26T10:58:13+02:00http://localhost:4000/feed.xmlPim KunisA pig's gotta flyPim KunisItâs alive!2024-04-21T10:02:00+02:002024-04-21T10:02:00+02:00http://localhost:4000/its-aliveFinally, after several months this website is up and running again!
-
My homelab has completely changed, but the reason why it initially went offline is because of my failing CI installation.
-I was using Concourse CI which I was initially interested in due to the reproducible nature of its builds using containers.
-However, for some reason pipelines were sporadically getting stuck when I reboot the virtual machine it was running on.
-The fix was very annoying: I had to re-create the pipelines manually (which feels very backwards for a CI/CD system!)
-Additionally, my virtual machine setup back then was also quite fragile and I decided to get rid of that as well.
-
I have learned that having an escape hatch to deploy something is probably a good idea.
-Expect a new overview of my homelab soon, in the same vein as this post from last year!
]]>Pim KunisHome Lab Infrastructure Snapshot August 20232023-08-27T22:23:00+02:002023-08-27T22:23:00+02:00http://localhost:4000/infrastructure-snapshotI have been meaning to write about the current state of my home lab infrastructure for a while now.
-Now that the most important parts are quite stable, I think the opportunity is ripe.
-I expect this post to get quite long, so I might have to leave out some details along the way.
-
This post will be a starting point for future infrastructure snapshots which I can hopefully put out periodically.
-That is, if there is enough worth talking about.
-
Keep an eye out for the icon, which links to the source code and configuration of anything mentioned.
-Oh yeah, did I mention everything I do is open source?
-
Networking and Infrastructure Overview
-
Hardware and Operating Systems
-
Let's start with the basics: what kind of hardware do I use for my home lab?
-The most important servers are my three Gigabyte Brix GB-BLCE-4105.
-Two of them have 16 GB of memory, and one 8 GB.
-I named these servers as follows:
-
-
Atlas: because this server was going to "lift" a lot of virtual machines.
-
Lewis: we started out with a "Max" server named after the Formula 1 driver Max Verstappen, but it kind of became an unmanageable behemoth without infrastructure-as-code. Our second server we subsequently named Lewis after his colleague Lewis Hamilton. Note: people around me vetoed these names and I am no F1 fan!
-
Jefke: it's a funny Belgian name. That's all.
-
-
Here is a picture of them sitting in their cosy closet:
-
-
If you look to the left, you will also see a Raspberry Pi 4B.
-I use this Pi to do some rudimentary monitoring whether servers and services are running.
-More on this in the relevant section below.
-The Pi is called Iris because it's a messenger for the other servers.
-
I used to run Ubuntu on these systems, but I have since migrated away to Debian.
-The main reasons were Canonical putting advertisements in my terminal and pushing Snap, which has a proprietary backend.
-Two of my servers run the newly released Debian Bookworm, while one still runs Debian Bullseye.
-
Networking
-
For networking, I wanted hypervisors and virtual machines separated by VLANs for security reasons.
-The following picture shows a simplified view of the VLANs present in my home lab:
-
-
All virtual machines are connected to a virtual bridge which tags network traffic with the DMZ VLAN.
-The hypervisors VLAN is used for traffic to and from the hypervisors.
-Devices from the hypervisors VLAN are allowed to connect to devices in the DMZ, but not vice versa.
-The hypervisors are connected to a switch using a trunk link, which allows both DMZ and hypervisors traffic.
-
I realised the above design using ifupdown.
-Below is the configuration for each hypervisor, which creates a new enp3s0.30 interface with all DMZ traffic from the enp3s0 interface .
This configuration seems more complex than it actually is.
-Most of it is to make sure the interface is not assigned an IPv4/6 address on the hypervisor host.
-The magic .30 at the end of the interface name makes this interface tagged with VLAN ID 30 (DMZ for me).
-
Now that we have an interface tagged for the DMZ VLAN, we can create a bridge where future virtual machines can connect to:
Just like the previous config, this is quite bloated because I don't want the interface to be assigned an IP address on the host.
-Most importantly, the bridge_ports enp3s0.30 line here makes this interface a virtual bridge for the enp3s0.30 interface.
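The ifupdown snippets themselves are not reproduced in this extract; roughly the same result can be sketched by hand with plain iproute2 commands (a manual approximation of what the configuration achieves, not the actual /etc/network/interfaces contents):
# Create a VLAN 30 sub-interface on enp3s0 and a bridge that only carries that tagged traffic
ip link add link enp3s0 name enp3s0.30 type vlan id 30
ip link add name dmzbr type bridge
ip link set enp3s0.30 master dmzbr
ip link set enp3s0.30 up
ip link set dmzbr up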
-
And voilà, we now have a virtual bridge on each machine, where only DMZ traffic will flow.
-Here I verify whether this configuration works:
-
- Show
-
We can see that the two virtual interfaces are created, and are only assigned a MAC address and not an IP address:
-
root@atlas:~# ip a show enp3s0.30
-4: enp3s0.30@enp3s0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue master dmzbr state UP group default qlen 1000
- link/ether d8:5e:d3:4c:70:38 brd ff:ff:ff:ff:ff:ff
-5: dmzbr: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
- link/ether 4e:f7:1f:0f:ad:17 brd ff:ff:ff:ff:ff:ff
-
-
Pinging a VM from a hypervisor works:
-
root@atlas:~# ping -c1 maestro.dmz
-PING maestro.dmz (192.168.30.8) 56(84) bytes of data.
-64 bytes from 192.168.30.8 (192.168.30.8): icmp_seq=1 ttl=63 time=0.457 ms
-
Now that we have a working DMZ network, letâs build on it to get DNS and DHCP working.
-This will enable new virtual machines to obtain a static or dynamic IP address and register their host in DNS.
-This has actually been incredibly annoying due to our friend Network address translation (NAT).
-
- NAT recap
-
Network address translation (NAT) is a function of a router which allows multiple hosts to share a single IP address.
-This is needed for IPv4, because IPv4 addresses are scarce and usually one household is only assigned a single IPv4 address.
-This is one of the problems IPv6 attempts to solve (mainly by having so many IP addresses that they should never run out).
-To solve the problem for IPv4, each host in a network is assigned a private IPv4 address, which can be reused for every network.
-
Then, the router must perform address translation.
-It does this by keeping track of ports opened by hosts in its private network.
-If a packet from the internet arrives at the router for such a port, it forwards this packet to the correct host.
-
-
I would like to host my own DNS on a virtual machine (called hermes, more on VMs later) in the DMZ network.
-This basically gives two problems:
-
-
The upstream DNS server will refer to the public internet-accessible IP address of our DNS server.
-This IP-address has no meaning inside the private network due to NAT and the router will reject the packet.
-
Our DNS resolves hosts to their public internet-accessible IP address.
-This is similar to the previous problem as the public IP address has no meaning.
-
-
The first problem can be remediated by overriding the location of the DNS server for hosts inside the DMZ network.
-This can be achieved on my router, which uses Unbound as its recursive DNS server:
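The Unbound snippet is missing from this extract; based on the zones and port mentioned in the surrounding text, the forwarding it describes could be sketched like this (the include path is an assumption about how Unbound is set up on the router):
# Forward the internal zones to the DNS VM at 192.168.30.7 on port 5353
cat > /etc/unbound/unbound.conf.d/dmz-forward.conf <<'EOF'
forward-zone:
  name: "dmz"
  forward-addr: 192.168.30.7@5353
forward-zone:
  name: "kun.is"
  forward-addr: 192.168.30.7@5353
EOF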
-
-
Any DNS requests to Unbound for domains in either dmz or kun.is will now be forwarded to 192.168.30.7 (port 5353).
-This is the virtual machine hosting my DNS.
-
The second problem can be solved at the DNS server.
-We need to do some magic overriding, which dnsmasq is perfect for :
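The dnsmasq configuration is not included here; judging from the description that follows, it presumably boils down to something along these lines (198.51.100.10 and 192.168.30.10 are made-up placeholder addresses, and the PowerDNS port may differ):
cat > /etc/dnsmasq.d/dmz-overrides.conf <<'EOF'
# Rewrite the public IPv4 address to its private counterpart in DNS answers
alias=198.51.100.10,192.168.30.10
# Forward queries for kun.is to the authoritative server
server=/kun.is/192.168.30.7
EOF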
This always overrides the public IPv4 address to the private one.
-It also overrides the DNS server for kun.is to 192.168.30.7.
-
Finally, behind the dnsmasq server, I run PowerDNS as the authoritative DNS server.
-I like this DNS server because I can manage it with Terraform.
-
Here is a small diagram showing my setup (my networking teacher would probably kill me for this):
-
-
Virtualization
-
-Now that we have laid out the basic networking, letâs talk virtualization.
-Each of my servers are configured to run KVM virtual machines, orchestrated using Libvirt.
-Configuration of the physical hypervisor servers, including KVM/Libvirt is done using Ansible.
-The VMs are spun up using Terraform and the dmacvicar/libvirt Terraform provider.
-
This all isn't too exciting, except that I created a Terraform module that abstracts the Terraform Libvirt provider for my specific scenario:
This automatically creates a Debian virtual machine with the properties specified.
-It also sets up certificate-based SSH authentication which I talked about before.
-
Clustering
-
With virtualization explained, letâs move up one level further.
-Each of my three physical servers hosts a virtual machine running Docker, which together form a Docker Swarm.
-I use Traefik as a reverse proxy which routes requests to the correct container.
-
All data is hosted on a single machine and made available to containers using NFS.
-While this might not be very secure (NFS is not encrypted and there is no proper authentication), it is quite fast.
-
As of today, I host the following services on my Docker Swarm :
For CI / CD, I run Concourse CI in a separate VM.
-This is needed, because Concourse heavily uses containers to create reproducible builds.
-
Although I should probably use it for more, I currently use my Concourse for three pipelines:
-
-
A pipeline to build this static website and create a container image of it.
-The image is then uploaded to the image registry of my Forgejo instance.
-I love it when I can use stuff I previously built :)
-The pipeline finally deploys this new image to the Docker Swarm .
-
A pipeline to create a Concourse resource that sends Apprise alerts (Concourse-ception?)
-
A pipeline to build a custom Fluentd image with plugins installed
-
-
Backups
-
To create backups, I use Borg.
-As I keep all data on one machine, this backup process is quite simple.
-In fact, all this data is stored in a single Libvirt volume.
-To configure Borg with a simple declarative script, I use Borgmatic.
-
In order to back up the data inside the Libvirt volume, I create a snapshot to a file.
-Then I can mount this snapshot in my file system.
-The files can then be backed up while the system is still running.
-It is also possible to simply back up the Libvirt image, but this takes more time and storage .
-
Monitoring and Alerting
-
The last topic I would like to talk about is monitoring and alerting.
-This is something I'm still actively improving and only just set up properly.
-
Alerting
-
For alerting, I wanted something that runs entirely on my own infrastructure.
-I settled for Apprise + Ntfy.
-
Apprise is a server that is able to send notifications to dozens of services.
-For application developers, it is thus only necessary to implement the Apprise API to gain access to all these services.
-The Apprise API itself is also very simple.
-By using Apprise, I can also easily switch to another notification service later.
-Ntfy is free software made for mobile push notifications.
-
I use this alerting system in quite a lot of places in my infrastructure, for example when creating backups.
-
Uptime Monitoring
-
The first monitoring setup I created, was using Uptime Kuma.
-Uptime Kuma periodically pings a service to see whether it is still running.
-You can do a literal ping, test HTTP response codes, check database connectivity and much more.
-I use it to check whether my services and VMs are online.
-And the best part is, Uptime Kuma supports Apprise so I get push notifications on my phone whenever something goes down!
-
Metrics and Log Monitoring
-
A new monitoring system I am still in the process of deploying is focused on metrics and logs.
-I plan on creating a separate blog post about this, so keep an eye out on that (for example using RSS :)).
-Safe to say, it is no basic ELK stack!
-
Conclusion
-
Thatâs it for now!
-Hopefully I inspired someone to build something… or how not to :)
]]>Pim KunisHashicorpâs License Change and my Home Lab - Update2023-08-17T18:15:00+02:002023-08-17T18:15:00+02:00http://localhost:4000/hashicorp-license-changeSee the Update at the end of the article.
-
Already a week ago, Hashicorp announced it would change the license on almost all its projects.
-Unlike their previous license, which was the Mozilla Public License 2.0, their new license is no longer truly open source.
-It is called the Business Source License™ and restricts use of their software for competitors.
-In their own words:
-
-
Vendors who provide competitive services built on our community products will no longer be able to incorporate future releases, bug fixes, or security patches contributed to our products.
-
-
I found a great article by MeshedInsights that names this behaviour the "rights ratchet model".
-They define a script start-ups use to garner the interest of open source enthusiasts but eventually turn their back on them for profit.
-The reason why Hashicorp can do this is because contributors signed a copyright license agreement (CLA).
-This agreement transfers the copyright of contributors' code to Hashicorp, allowing them to change the license if they want to.
-
I find this action really regrettable because I like their products.
-This sort of action was also why I wanted to avoid using an Elastic stack, which also had their license changed.1
-These companies do not respect their contributors, nor the software stack they built their product on, which is actually open source (Golang, Linux, etc.).
-
Impact on my Home Lab
-
I am using Terraform in my home lab to manage several important things:
-
-
Libvirt virtual machines
-
PowerDNS records
-
Elasticsearch configuration
-
-
With Hashicorp's anti-open-source move, I intend to move away from Terraform in the future.
-While I will not use Hashicorp's products for new personal projects, I will leave my current setup as-is for some time because there is no real need to quickly migrate.
-
I might also investigate some of Terraform's competitors, like Pulumi.
-Hopefully there is a project that respects open source which I can use in the future.
-
Update
-
A promising fork of Terraform has been announced called OpenTF.
-They intend to take part of the Cloud Native Computing Foundation, which I think is a good effort because Terraform is so important for modern cloud infrastructures.
-
-]]>Pim KunisMonitoring Correct Memory Usage in Fluent Bit2023-08-09T16:19:00+02:002023-08-09T16:19:00+02:00http://localhost:4000/fluent-bit-memoryPreviously, I have used Prometheusâ node_exporter to monitor the memory usage of my servers.
-However, I am currently in the process of moving away from Prometheus to a new monitoring stack.
-While I understand the advantages, I felt like Prometheus' pull architecture does not scale nicely.
-Every time I spin up a new machine, I would have to centrally change Prometheus' configuration in order for it to query the new server.
-
In order to collect metrics from my servers, I am now using Fluent Bit.
-I love Fluent Bit's way of configuration, which I can easily express as code and automate, its focus on efficiency, and it being vendor agnostic.
-However, I have stumbled upon one, in my opinion, big issue with Fluent Bit: its mem plugin to monitor memory usage is completely useless.
-In this post I will go over the problem and my temporary solution.
-
The Problem with Fluent Bit's mem Plugin
-
As can be seen in the documentation, Fluent Bit's mem input plugin exposes a few metrics regarding memory usage which should be self-explanatory: Mem.total, Mem.used, Mem.free, Swap.total, Swap.used and Swap.free.
-The problem is that Mem.used and Mem.free do not accurately reflect the machine's actual memory usage.
-This is because these metrics include caches and buffers, which can be reclaimed by other processes if needed.
-Most tools reporting memory usage therefore include an additional metric that specifies the memory available on the system.
-For example, the command free -m reports the following data on my laptop:
-
total used free shared buff/cache available
-Mem: 15864 3728 7334 518 5647 12136
-Swap: 2383 663 1720
-
-
Notice that the available memory is more than free memory.
-
While the issue is known (see this and this link), it is unfortunately not yet fixed.
-
A Temporary Solution
-
The issues I linked previously provide stand-alone plugins that fix the problem, which will hopefully be merged in the official project at some point.
-However, I didn't want to install another plugin, so I used Fluent Bit's exec input plugin and the free Linux command to query memory usage like so:
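The Fluent Bit snippet itself is not part of this extract; the shell side of such an exec input is essentially a one-liner along these lines (a sketch of the idea, not the author's exact command):
# Print the 'available' column of free -m for the Mem row
free -m | awk '/^Mem:/ {print $7}'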
With this configuration, you can use the mem_available metric to get accurate memory usage in Fluent Bit.
-
Conclusion
-
Let's hope Fluent Bit's mem input plugin is improved upon soon so this hacky solution is not needed.
-I also intend to document my new monitoring pipeline, which at the moment consists of:
-
-
Fluent Bit
-
Fluentd
-
Elasticsearch
-
Grafana
-
]]>Pim KunisError Handling in Borgmatic2023-08-08T11:51:00+02:002023-08-08T11:51:00+02:00http://localhost:4000/backup-failureBorgBackup and Borgmatic have been my go-to tools to create backups for my home lab since I started creating backups.
-Using Systemd Timers, I regularly create a backup every night.
-I also monitor successful execution of the backup process, in case some error occurs.
-However, the way I set this up resulted in not receiving notifications.
-Even though it boils down to RTFM, I'd like to explain my error and how to handle errors correctly.
-
I was using the on_error option to handle errors, like so:
However, on_error does not handle errors from the execution of before_everything and after_everything hooks.
-My solution to this was moving the error handling up to the Systemd service that calls Borgmatic.
-This results in the following Systemd service:
-
[Unit]
-Description=Backup data using Borgmatic
-# Added
-OnFailure=backup-failure.service
-
-[Service]
-ExecStart=/usr/bin/borgmatic --config /root/backup.yml
-Type=oneshot
-
-
This handles any error, be it from Borgmaticâs hooks or itself.
-The backup-failure service is very simple, and just calls Apprise to send a notification:
Because the error handling and alerting weren't working properly, my backups didn't succeed for two weeks straight.
-And, of course, you only notice your backups aren't working when you actually need them.
-This is exactly what happened: my disk was full and a MariaDB database crashed as a result of that.
-Actually, the whole database seemed to be corrupt and I find it worrying MariaDB does not seem to be very resilient to failures (in comparison, a PostgreSQL database was able to recover automatically).
-I then tried to recover the data using last night's backup, only to find out there was no such backup.
-Fortunately, I had other means to recover the data so I incurred no data loss.
-
I already knew it is important to test backups, but I learned it is also important to test failures during backups!
]]>Pim KunisUsing Ansible to alter Kernel Parameters2023-06-19T09:31:00+02:002023-06-19T09:31:00+02:00http://localhost:4000/ansible-edit-grubFor months, Iâve had a peculiar problem with my laptop: once in a while, seemingly without reason, my laptop screen would freeze.
-This only happened on my laptop screen, and not on an external monitor.
-I had kind of learned to live with it as I couldn't find a solution online.
-The only remedy I had was reloading my window manager, which would often unfreeze the screen.
-
Yesterday I tried Googling once more and I actually found a thread about it on the Arch Linux forums!
-They talk about the same laptop model, the Lenovo ThinkPad x260, having the problem.
-Fortunately, they also propose a temporary fix.
-
Trying the Fix
-
Apparently, a problem with the Panel Self Refresh (PSR) feature of Intel iGPUs is the culprit.
-According to the Linux source code, PSR enables the display to go into a lower standby mode when the system is idle but the screen is in use.
-These lower standby modes can reduce power usage of your device when idling.
-
This all seems useful, except when it makes your screen freeze!
-The proposed fix disables the PSR feature entirely.
-To do this, we need to change a parameter to the Intel Graphics Linux Kernel Module (LKM).
-The LKM for Intel Graphics is called i915.
-There are multiple ways to change kernel parameters, but I chose to edit my Grub configuration.
-
First, I wanted to test whether it actually works.
-When booting into my Linux partition via Grub, you can press e to edit the Grub definition.
-Somewhere there, you can find the linux command which specifies to boot Linux and how to do that.
-I simply appended the option i915.enable_psr=0 to this line.
-After rebooting, I noticed my screen no longer freezes!
-Success!
-
Persisting the Fix
-
To make the change permanent, we need to permanently change Grub's configuration.
-One way to do this is by changing Grub's defaults in /etc/default/grub.
-Namely, the GRUB_CMDLINE_LINUX_DEFAULT option specifies what options Grub should pass to the Linux kernel by default.
-For me, this is a nice solution as the problem exists for both Linux OSes I have installed.
-I changed this option to:
Next, I wanted to automate this solution using Ansible.
-This turned out to be quite easy, as the Grub configuration looks a bit like an ini file (maybe it is?):
Lately, I have been learning a bit of NixOS with the intention of replacing my current setup.
-Compared to Ansible, applying this fix is a breeze on NixOS:
-
{
- boot.kernelParams=["i915.enable_psr=0"];
-}
-
-
-That's it, yep.
-
Conclusion
-
It turned out to be quite easy to change Linux kernel parameters using Ansible.
-Maybe some kernel gurus have better ways to change parameters, but this works for me for now.
-
As a sidenote, I started reading a bit more about NixOS and realised that it can solve issues like these much more nicely than Ansible does.
-I might replace my OS with NixOS some day, if I manage to rewrite my Ansible for it.
]]>Pim KunisSending Apprise Notifications from Concourse CI2023-06-14T23:39:00+02:002023-06-14T23:39:00+02:00http://localhost:4000/concourse-apprise-notifierRecently, I deployed Concourse CI because I wanted to get my feet wet with a CI/CD pipeline.
-However, I had a practical use case lying around for a long time: automatically compiling my static website and deploying it to my docker Swarm.
-This took some time getting right, but the result works like a charm (source code).
-
Itâs comforting to know I donât have move a finger and my website is automatically deployed.
-However, I would still like to receive some indication of whatâs happening.
-And whatâs a better way to do that, than using my Apprise service to keep me up to date.
-Thereâs a little snag though: I could not find any Concourse resource that does this.
-Thatâs when I decided to just create it myself.
-
The Plagiarism Hunt
-
As any good computer person, I am lazy.
-Iâd rather just copy someoneâs work, so thatâs what I did.
-I found this GitHub repository that does the same thing but for Slack notifications.
-For some reason itâs archived, but it seemed like it should work.
-I actually noticed lots of repositories for Concourse resource types are archived, so not sure whatâs going on there.
-
Getting to know Concourse
-
Letâs first understand what we need to do reach our end goal of sending Apprise notifications from Concourse.
-
A Concourse pipeline takes some inputs, performs some operations on them which result in some outputs.
-These inputs and outputs are called resources in Concourse.
-For example, a Git repository could be a resource.
-Each resource is an instance of a resource type.
-A resource type therefore is simply a blueprint that can create multiple resources.
-To continue the example, a resource type could be âGit repositoryâ.
-
We therefore need to create our own resource type that can send Apprise notifications.
-A resource type is simply a container that includes three scripts:
-
-
check: check for a new version of a resource
-
in: retrieve a version of the resource
-
out: create a version of the resource
-
-
As Apprise notifications are basically fire-and-forget, we will only implement the out script.
-
Writing the out script
-
The whole script can be found here, but I will explain the most important bits of it.
-Note that I only use Appriseâs persistent storage solution, and not its stateless solution.
-
Concourse provides us with the working directory, which we cd to:
-
cd"${1}"
-
-
We create a timestamp, formatted in JSON, which we will use for the resourceâs new version later.
-Concourse requires us to set a version for the resource, but since Apprise notifications donât have that, we use the timestamp:
First some black magic Bash to redirect file descriptors.
-Not sure why this is needed, but I copied it anyways.
-After that, we create a temporary file holding resourceâs parameters.
We then extract the individual parameters.
-The source key contains values how the resource type was specified, while the params key specifies parameters for this specific resource.
Here is the most important line, where we send the payload to the Apprise endpoint.
-It's quite straightforward.
-
curl -v -X POST -T /tmp/compact_body.json -H "Content-Type: application/json" "${apprise_host}/notify/${apprise_key}"
-
-
Finally, we print the timestamp (fake version) in order to appease the Concourse gods.
-
echo "${timestamp}" >&3
-
-
Building the Container
-
As said earlier, to actually use this script, we need to add it to an image.
-I won't be explaining this whole process, but the source can be found here.
-The most important takeaways are these:
-
-
Use concourse/oci-build-task to build an image from a Dockerfile.
-
Use registry-image to push the image to an image registry.
-
-
Using the Resource Type
-
Using our newly created resource type is surprisingly simple.
-I use it for the blog you are reading right now and the pipeline definition can be found here.
-Here we specify the resource type in a Concourse pipeline:
As can be seen, the Apprise notification can be triggered when a task is executed successfully.
-We do this using the put command, which executes the out script under the hood.
-We set the notification's title and body, and send it!
-The result is seen below in my Ntfy app, which Apprise forwards the message to:
-
-
And to finish this off, here is what it looks like in the Concourse web UI:
-
-
Conclusion
-
Concourse's way of representing everything as an image/container is really interesting in my opinion.
-A resource type is quite easily implemented as well, although Bash might not be the optimal way to do this.
-I've seen some people implement it in Rust, which might be a good excuse to finally learn that language :)
-
Apart from Apprise notifications, I'm planning on creating a resource type to deploy to a Docker swarm eventually.
-This seems like a lot harder than simply sending notifications though.
]]>Pim KunisMy Experiences with virtio-9p2023-05-31T14:18:00+02:002023-05-31T14:18:00+02:00http://localhost:4000/virtio-9p-experiencesWhen I was scaling up my home lab, I started thinking more about data management.
-I hadn't (and still haven't) set up any form of network storage.
-I have, however, set up a backup mechanism using Borg.
-Still, I want to operate lots of virtual machines, and backing up each one of them separately seemed excessive.
-So I started thinking, what if I just let the host machines back up the data?
-After all, the amount of physical hosts I have in my home lab is unlikely to increase drastically.
-
The Use Case for Sharing Directories
-
I started working out this idea further.
-Without network storage, I needed a way for guest VMs to access the hostâs disks.
-Here there are two possibilities, either expose some block device or a file system.
-Creating a whole virtual disk for just the data of some VMs seemed wasteful, and from my experiences also increases backup times dramatically.
-I therefore searched for a way to mount a directory from the host OS on the guest VM.
-This is when I stumbled upon this blog post talking about sharing directories with virtual machines.
-
Sharing Directories with virtio-9p
-
virtio-9p is a way to map a directory on the host OS to a special device on the virtual machine.
-In virt-manager, it looks like the following:
-
-Under the hood, virtio-9p uses the 9pnet protocol.
-Originally developed at Bell Labs, support for this is available in all modern Linux kernels.
-If you share a directory with a VM, you can then mount it.
-Below is an extract of my /etc/fstab to automatically mount the directory:
-
data /mnt/data 9p trans=virtio,rw 0 0
-
-
-The first argument (data) refers to the name you gave this share on the host.
-With the trans option, we specify that this is a virtio share.
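If you want to try it by hand before touching fstab, the entry above corresponds to a one-off mount command like this (share name and mount point taken from the entry above):
# Mount the 9p share named 'data' over virtio at /mnt/data
mount -t 9p -o trans=virtio,rw data /mnt/data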
-
Problems with virtio-9p
-
At first I had no problems with my setup, but I am now contemplating just moving to a network storage based setup because of two problems.
-
The first problem is that some files have suddenly changed ownership from libvirt-qemu to root.
-If the file is owned by root, the guest OS can still see it, but cannot access it.
-I am not entirely sure the problem lies with virtio, but I suspect it is.
-For anyone experiencing this problem, I wrote a small shell script to revert ownership to the libvirt-qemu user:
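The script itself did not survive in this extract; it presumably amounts to something like the following sketch (the shared directory path is a placeholder, and the exact user/group names may differ per distribution):
#!/usr/bin/env bash
# Sketch: give files that fell back to root ownership back to the libvirt-qemu user
share_dir="/path/to/shared/directory"
find "$share_dir" ! -user libvirt-qemu -exec chown libvirt-qemu:libvirt-qemu {} +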
Another problem that I have experienced, is guests being unable to mount the directory at all.
-I have only experienced this problem once, but it was highly annoying.
-To fix it, I had to reboot the whole physical machine.
-
Alternatives
-
virtio-9p seemed like a good idea, but as discussed, I had some problems with it.
-It seems virtioFS might be an interesting alternative, as it is designed specifically for sharing directories with VMs.
-
As for me, I will probably finally look into deploying network storage either with NFS or SSHFS.
]]>Pim KunisHomebrew SSH Certificate Authority for the Terraform Libvirt Provider2023-05-23T11:14:00+02:002023-05-23T11:14:00+02:00http://localhost:4000/homebrew-ssh-caEver SSHâed into a freshly installed server and gotten the following annoying message?
-
The authenticity of host 'host.tld (1.2.3.4)' can't be established.
-ED25519 key fingerprint is SHA256:eUXGdm1YdsMAS7vkdx6dOJdOGHdem5gQp4tadCfdLB8.
-Are you sure you want to continue connecting (yes/no)?
-
-
Or even more annoying:
-
@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
-@ WARNING: REMOTE HOST IDENTIFICATION HAS CHANGED! @
-@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
-IT IS POSSIBLE THAT SOMEONE IS DOING SOMETHING NASTY!
-Someone could be eavesdropping on you right now (man-in-the-middle attack)!
-It is also possible that a host key has just been changed.
-The fingerprint for the ED25519 key sent by the remote host is
-SHA256:eUXGdm1YdsMAS7vkdx6dOJdOGHdem5gQp4tadCfdLB8.
-Please contact your system administrator.
-Add correct host key in /home/user/.ssh/known_hosts to get rid of this message.
-Offending ED25519 key in /home/user/.ssh/known_hosts:3
- remove with:
- ssh-keygen -f "/etc/ssh/ssh_known_hosts" -R "1.2.3.4"
-ED25519 host key for 1.2.3.4 has changed and you have requested strict checking.
-Host key verification failed.
-
-
Could it be that the programmers at OpenSSH simply like to annoy us with these confusing messages?
-Maybe, but these warnings also serve as a way to notify users of a potential Man-in-the-Middle (MITM) attack.
-I won't go into the details of this problem, but I refer you to this excellent blog post.
-Instead, I would like to talk about ways to solve these annoying warnings.
-
One obvious solution is simply to add each host to your known_hosts file.
-This works okay when managing a handful of servers, but becomes unbearable when managing many servers.
-In my case, I wanted to quickly spin up virtual machines using Duncan Mac-Vicar's Terraform Libvirt provider, without having to accept their host key before connecting.
-The solution? Issuing SSH host certificates using an SSH certificate authority.
-
SSH Certificate Authorities vs. the Web
-
The idea of an SSH certificate authority (CA) is quite easy to grasp, if you understand the web's Public Key Infrastructure (PKI).
-Just like with the web, a trusted party can issue certificates that are offered when establishing a connection.
-The idea is, just by trusting the trusted party, you trust every certificate they issue.
-In the case of the web's PKI, this trusted party is bundled and trusted by your browser or operating system.
-However, in the case of SSH, the trusted party is you! (Okay you can also trust your own web certificate authority)
-With this great power, comes great responsibility which we will abuse heavily in this article.
-
SSH Certificate Authority for Terraform
-
So, letâs start with a plan.
-I want to spawn virtual machines with Terraform which are automatically provisioned with an SSH host certificate issued by my CA.
-This CA will be another host on my private network, issuing certificates over SSH.
-
Fetching the SSH Host Certificate
-
First we generate an SSH key pair in Terraform.
-Below is the code for that:
Now that we have an SSH key pair, we need to somehow make Terraform communicate this with the CA.
-Lucky for us, there is a way for Terraform to execute an arbitrary command with the external data feature.
-We call this script below:
These query parameters will end up in the script's stdin in JSON format.
-We can then read these parameters, and send them to the CA over SSH.
-The result must as well be in JSON format.
-
#!/bin/bash
-set-euo pipefail
-IFS=$'\n\t'
-
-# Read the query parameters
-eval"$(jq -r'@sh "PUBKEY=\(.pubkey) HOST=\(.host) CAHOST=\(.cahost) CASCRIPT=\(.cascript) CAKEY=\(.cakey)"')"
-
-# Fetch certificate from the CA
-# Warning: extremely ugly code that I am too lazy to fix
-CERT=$(ssh -oConnectTimeout=3 -oConnectionAttempts=1 root@$CAHOST'"'"$CASCRIPT"'" host "'"$CAKEY"'" "'"$PUBKEY"'" "'"$HOST"'".dmz')
-
-jq -n--arg cert "$CERT"'{"cert":$cert}'
-
-
We see that a script is called on the remote host that issues the certificate.
-This is just a simple wrapper around ssh-keygen, which you can see below.
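The wrapper itself is not included in this extract; its core is a single ssh-keygen invocation along these lines (paths, variable names and argument handling are assumptions, not the author's actual script):
#!/usr/bin/env bash
# Sketch of such a wrapper: sign a host public key with the CA key
host="$1"                      # hypothetical: the name the certificate is issued for
pubkey="$2"                    # hypothetical: the host public key handed over by Terraform
echo "${pubkey}" > /tmp/host_key.pub
ssh-keygen -s /etc/ssh/ca -I "${host}" -h -n "${host}" /tmp/host_key.pub
cat /tmp/host_key-cert.pub     # ssh-keygen writes the certificate next to the input key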
So nice, we can fetch the SSH host certificate from the CA.
-We should just be able to use it right?
-We can, but it brings a big annoyance with it: Terraform will fetch a new certificate every time it is run.
-This is because the external feature of Terraform is a data source.
-If we were to use this data source for a Terraform resource, it would need to be updated every time we run Terraform.
-I have not been able to find a way to avoid fetching the certificate every time, except for writing my own resource provider, which I'd rather not.
-I have, however, found a way to hack around the issue.
-
The idea is as follows: we can use Terraform's ignore_changes to, well, ignore any changes of a resource.
-Unfortunately, we cannot use this for a data source, so we must create a glue null_resource that supports ignore_changes.
-This is shown in the code snippet below.
-We use the triggers property simply to copy the certificate in; we don't use it for its original purpose.
And voilà, we can now use null_resource.cert.triggers["cert"] as our certificate, which won't trigger replacements in Terraform.
-
Setting the Host Certificate with Cloud-Init
-
Terraform's Libvirt provider has native support for Cloud-Init, which is very handy.
-We can give the host certificate directly to Cloud-Init and place it on the virtual machine.
-Inside the Cloud-Init configuration, we can set the ssh_keys property to do this:
I hardcoded this to ED25519 keys, because this is all I use.
-
This works perfectly, and I never have to accept host certificates from virtual machines again.
-
Caveats
-
A sharp eye might have noticed the lifecycle of these host certificates is severely lacking.
-Namely, the deployed host certificates have no expiration date, nor is there a revocation function.
-There are ways to implement these, but for my home lab I did not deem this necessary at this point.
-In a more professional environment, I would suggest using Hashicorp's Vault.
-
This project did teach me about the limits and flexibility of Terraform, so all in all a success!
-All code can be found on the git repository here.
]]>Pim Kunis
\ No newline at end of file
diff --git a/src/_site/fluent-bit-memory/index.html b/src/_site/fluent-bit-memory/index.html
deleted file mode 100644
index b1eecfd..0000000
--- a/src/_site/fluent-bit-memory/index.html
+++ /dev/null
@@ -1,19 +0,0 @@
- Monitoring Correct Memory Usage in Fluent Bit - Pim Kunis
Monitoring Correct Memory Usage in Fluent Bit
Pim KunisPim Kunis
Previously, I have used Prometheus' node_exporter to monitor the memory usage of my servers. However, I am currently in the process of moving away from Prometheus to a new monitoring stack. While I understand the advantages, I felt like Prometheus' pull architecture does not scale nicely. Every time I spin up a new machine, I would have to centrally change Prometheus' configuration in order for it to query the new server.
Previously, I have used Prometheus' node_exporter to monitor the memory usage of my servers. However, I am currently in the process of moving away from Prometheus to a new monitoring stack. While I understand the advantages, I felt like Prometheus' pull architecture does not scale nicely. Every time I spin up a new machine, I would have to centrally change Prometheus' configuration in order for it to query the new server.
In order to collect metrics from my servers, I am now using Fluent Bit. I love Fluent Bit's way of configuration, which I can easily express as code and automate, its focus on efficiency, and it being vendor agnostic. However, I have stumbled upon one, in my opinion, big issue with Fluent Bit: its mem plugin to monitor memory usage is completely useless. In this post I will go over the problem and my temporary solution.
The Problem with Fluent Bit's mem Plugin
As can be seen in the documentation, Fluent Bit's mem input plugin exposes a few metrics regarding memory usage which should be self-explanatory: Mem.total, Mem.used, Mem.free, Swap.total, Swap.used and Swap.free. The problem is that Mem.used and Mem.free do not accurately reflect the machine's actual memory usage. This is because these metrics include caches and buffers, which can be reclaimed by other processes if needed. Most tools reporting memory usage therefore include an additional metric that specifies the memory available on the system. For example, the command free -m reports the following data on my laptop:
total used free shared buff/cache available
-Mem: 15864 3728 7334 518 5647 12136
-Swap: 2383 663 1720
-
Notice that the available memory is more than free memory.
While the issue is known (see this and this link), it is unfortunately not yet fixed.
A Temporary Solution
The issues I linked previously provide stand-alone plugins that fix the problem, which will hopefully be merged in the official project at some point. However, I didn't want to install another plugin, so I used Fluent Bit's exec input plugin and the free Linux command to query memory usage like so:
With this configuration, you can use the mem_available metric to get accurate memory usage in Fluent Bit.
Conclusion
Let's hope Fluent Bit's mem input plugin is improved upon soon so this hacky solution is not needed. I also intend to document my new monitoring pipeline, which at the moment consists of:
Fluent Bit
Fluentd
Elasticsearch
Grafana
diff --git a/src/_site/hashicorp-license-change/index.html b/src/_site/hashicorp-license-change/index.html
deleted file mode 100644
index f0452f0..0000000
--- a/src/_site/hashicorp-license-change/index.html
+++ /dev/null
@@ -1 +0,0 @@
- Hashicorp's License Change and my Home Lab - Update - Pim Kunis
Hashicorp's License Change and my Home Lab - Update
Already a week ago, Hashicorp announced it would change the license on almost all its projects. Unlike their previous license, which was the Mozilla Public License 2.0, their new license is no longer truly open source. It is called the Business Source License™ and restricts use of their software for competitors. In their own words:
Vendors who provide competitive services built on our community products will no longer be able to incorporate future releases, bug fixes, or security patches contributed to our products.
I found a great article by MeshedInsights that names this behaviour the "rights ratchet model". They define a script start-ups use to garner the interest of open source enthusiasts but eventually turn their back on them for profit. The reason why Hashicorp can do this is because contributors signed a copyright license agreement (CLA). This agreement transfers the copyright of contributors' code to Hashicorp, allowing them to change the license if they want to.
I find this action really regrettable because I like their products. This sort of action was also why I wanted to avoid using an Elastic stack, which also had their license changed.1 These companies do not respect their contributors, nor the software stack they built their product on, which is actually open source (Golang, Linux, etc.).
Impact on my Home Lab
I am using Terraform in my home lab to manage several important things:
Libvirt virtual machines
PowerDNS records
Elasticsearch configuration
With Hashicorp's anti-open-source move, I intend to move away from Terraform in the future. While I will not use Hashicorp's products for new personal projects, I will leave my current setup as-is for some time because there is no real need to quickly migrate.
I might also investigate some of Terraform's competitors, like Pulumi. Hopefully there is a project that respects open source which I can use in the future.
Update
A promising fork of Terraform has been announced called OpenTF. They intend to take part of the Cloud Native Computing Foundation, which I think is a good effort because Terraform is so important for modern cloud infrastructures.
Homebrew SSH Certificate Authority for the Terraform Libvirt Provider
Pim KunisPim Kunis
Ever SSH'ed into a freshly installed server and gotten the following annoying message?
The authenticity of host 'host.tld (1.2.3.4)' can't be established.
-ED25519 key fingerprint is SHA256:eUXGdm1YdsMAS7vkdx6dOJdOGHdem5gQp4tadCfdLB8.
-Are you sure you want to continue connecting (yes/no)?
-
Ever SSH'ed into a freshly installed server and gotten the following annoying message?
The authenticity of host 'host.tld (1.2.3.4)' can't be established.
-ED25519 key fingerprint is SHA256:eUXGdm1YdsMAS7vkdx6dOJdOGHdem5gQp4tadCfdLB8.
-Are you sure you want to continue connecting (yes/no)?
-
Or even more annoying:
@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
-@ WARNING: REMOTE HOST IDENTIFICATION HAS CHANGED! @
-@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
-IT IS POSSIBLE THAT SOMEONE IS DOING SOMETHING NASTY!
-Someone could be eavesdropping on you right now (man-in-the-middle attack)!
-It is also possible that a host key has just been changed.
-The fingerprint for the ED25519 key sent by the remote host is
-SHA256:eUXGdm1YdsMAS7vkdx6dOJdOGHdem5gQp4tadCfdLB8.
-Please contact your system administrator.
-Add correct host key in /home/user/.ssh/known_hosts to get rid of this message.
-Offending ED25519 key in /home/user/.ssh/known_hosts:3
- remove with:
- ssh-keygen -f "/etc/ssh/ssh_known_hosts" -R "1.2.3.4"
-ED25519 host key for 1.2.3.4 has changed and you have requested strict checking.
-Host key verification failed.
-
Could it be that the programmers at OpenSSH simply like to annoy us with these confusing messages? Maybe, but these warnings also serve as a way to notify users of a potential Man-in-the-Middle (MITM) attack. I won't go into the details of this problem, but I refer you to this excellent blog post. Instead, I would like to talk about ways to solve these annoying warnings.
One obvious solution is simply to add each host to your known_hosts file. This works okay when managing a handful of servers, but becomes unbearable when managing many servers. In my case, I wanted to quickly spin up virtual machines using Duncan Mac-Vicar's Terraform Libvirt provider, without having to accept their host key before connecting. The solution? Issuing SSH host certificates using an SSH certificate authority.
SSH Certificate Authorities vs. the Web
The idea of an SSH certificate authority (CA) is quite easy to grasp, if you understand the web's Public Key Infrastructure (PKI). Just like with the web, a trusted party can issue certificates that are offered when establishing a connection. The idea is, just by trusting the trusted party, you trust every certificate they issue. In the case of the web's PKI, this trusted party is bundled and trusted by your browser or operating system. However, in the case of SSH, the trusted party is you! (Okay, you can also trust your own web certificate authority.) With this great power comes great responsibility, which we will abuse heavily in this article.
SSH Certificate Authority for Terraform
So, let's start with a plan. I want to spawn virtual machines with Terraform which are automatically provisioned with an SSH host certificate issued by my CA. This CA will be another host on my private network, issuing certificates over SSH.
Fetching the SSH Host Certificate
First we generate an SSH key pair in Terraform. Below is the code for that:
Now that we have an SSH key pair, we need to somehow make Terraform communicate this with the CA. Lucky for us, there is a way for Terraform to execute an arbitrary command with the external data feature. We call this script below:
These query parameters will end up in the script's stdin in JSON format. We can then read these parameters, and send them to the CA over SSH. The result must also be in JSON format.
#!/bin/bash
-set -euo pipefail
-IFS=$'\n\t'
-
-# Read the query parameters
-eval "$(jq -r '@sh "PUBKEY=\(.pubkey) HOST=\(.host) CAHOST=\(.cahost) CASCRIPT=\(.cascript) CAKEY=\(.cakey)"')"
-
-# Fetch certificate from the CA
-# Warning: extremely ugly code that I am too lazy to fix
-CERT=$(ssh -oConnectTimeout=3 -oConnectionAttempts=1 root@$CAHOST '"'"$CASCRIPT"'" host "'"$CAKEY"'" "'"$PUBKEY"'" "'"$HOST"'".dmz')
-
-# Return the certificate as JSON for Terraform to consume
-jq -n --arg cert "$CERT" '{"cert":$cert}'
-
We see that a script is called on the remote host that issues the certificate. This is just a simple wrapper around ssh-keygen, which you can see below.
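Roughly, such a wrapper could look like this sketch (the argument order follows the call above; the file handling is an assumption):

#!/bin/bash
# Sign a public key with the CA key and print the resulting certificate.
set -euo pipefail

TYPE="${1}"    # "host" in our case
CAKEY="${2}"   # path to the CA private key
PUBKEY="${3}"  # public key to be signed
NAME="${4}"    # hostname, used as identity and principal

tmpdir=$(mktemp -d)
echo "${PUBKEY}" > "${tmpdir}/key.pub"

# -h issues a host certificate; -I sets the identity and -n the principal.
ssh-keygen -s "${CAKEY}" -h -I "${NAME}" -n "${NAME}" "${tmpdir}/key.pub"

cat "${tmpdir}/key-cert.pub"
rm -r "${tmpdir}"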
So nice, we can fetch the SSH host certificate from the CA. We should just be able to use it, right? We can, but it brings a big annoyance with it: Terraform will fetch a new certificate every time it is run. This is because the external feature of Terraform is a data source. If we were to use this data source for a Terraform resource, that resource would need to be updated every time we run Terraform. I have not been able to find a way to avoid fetching the certificate every time, short of writing my own resource provider, which I'd rather not do. I have, however, found a way to hack around the issue.
The idea is as follows: we can use Terraform's ignore_changes to, well, ignore any changes to a resource. Unfortunately, we cannot use this for a data source, so we must create a glue null_resource that supports ignore_changes. This is shown in the code snippet below. We use the triggers property simply to copy the certificate in; we don't use it for its original purpose.
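A sketch of what this glue resource could look like, matching the reference below (the data source name cert is an assumption):

resource "null_resource" "cert" {
  # Abuse triggers purely as a place to store the certificate.
  triggers = {
    cert = data.external.cert.result["cert"]
  }

  # Never consider this resource changed, even if the CA issues a new certificate.
  lifecycle {
    ignore_changes = [triggers]
  }
}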
And voilà, we can now use null_resource.cert.triggers["cert"] as our certificate, which won't trigger replacements in Terraform.
Setting the Host Certificate with Cloud-Init
Terraform's Libvirt provider has native support for Cloud-Init, which is very handy. We can give the host certificate directly to Cloud-Init and place it on the virtual machine. Inside the Cloud-Init configuration, we can set the ssh_keys property to do this:
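A sketch of the relevant Cloud-Init fragment, with the two values templated in by Terraform (the template variable names are assumptions):

# Cloud-Init installs these as the host's SSH key material under /etc/ssh.
ssh_keys:
  ed25519_private: |
    ${host_private_key}
  ed25519_certificate: ${host_certificate}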
I hardcoded this to ED25519 keys, because this is all I use.
This works perfectly, and I never have to accept host certificates from virtual machines again.
Caveats
A sharp eye might have noticed that the lifecycle of these host certificates is severely lacking. Namely, the deployed host certificates have no expiration date, nor is there any revocation mechanism. There are ways to implement these, but for my home lab I did not deem this necessary at this point. In a more professional environment, I would suggest using HashiCorp's Vault.
This project did teach me about the limits and flexibility of Terraform, so all in all a success! All code can be found in the Git repository here.
diff --git a/src/_site/infrastructure-snapshot/index.html b/src/_site/infrastructure-snapshot/index.html
deleted file mode 100644
index 9187eca..0000000
--- a/src/_site/infrastructure-snapshot/index.html
+++ /dev/null
@@ -1,41 +0,0 @@
- Home Lab Infrastructure Snapshot August 2023 - Pim Kunis
Home Lab Infrastructure Snapshot August 2023
Pim Kunis
I have been meaning to write about the current state of my home lab infrastructure for a while now. Now that the most important parts are quite stable, I think the opportunity is ripe. I expect this post to get quite long, so I might have to leave out some details along the way.
This post will be a starting point for future infrastructure snapshots which I can hopefully put out periodically. That is, if there is enough worth talking about.
Keep an eye out for the icon, which links to the source code and configuration of anything mentioned. Oh yeah, did I mention everything I do is open source?
Networking and Infrastructure Overview
Hardware and Operating Systems
Let's start with the basics: what kind of hardware do I use for my home lab? The most important servers are my three Gigabyte Brix GB-BLCE-4105. Two of them have 16 GB of memory, and one has 8 GB. I named these servers as follows:
Atlas: because this server was going to "lift" a lot of virtual machines.
Lewis: we started out with a server named "Max" after the Formula 1 driver Max Verstappen, but it kind of became an unmanageable behemoth without infrastructure-as-code. We subsequently named our second server Lewis, after his colleague Lewis Hamilton. Note: people around me vetoed these names and I am no F1 fan!
Jefke: it's a funny Belgian name. That's all.
Here is a picture of them sitting in their cosy closet:
If you look to the left, you will also see a Raspberry Pi 4B. I use this Pi to do some rudimentary monitoring of whether servers and services are running; more on this in the relevant section below. The Pi is called Iris because it's a messenger for the other servers.
I used to run Ubuntu on these systems, but I have since migrated to Debian. The main reasons were Canonical putting advertisements in my terminal and pushing Snap, which has a proprietary backend. Two of my servers run the newly released Debian Bookworm, while one still runs Debian Bullseye.
Networking
For networking, I wanted hypervisors and virtual machines separated by VLANs for security reasons. The following picture shows a simplified view of the VLANs present in my home lab:
All virtual machines are connected to a virtual bridge which tags network traffic with the DMZ VLAN. The hypervisors VLAN is used for traffic to and from the hypervisors. Devices in the hypervisors VLAN are allowed to connect to devices in the DMZ, but not vice versa. The hypervisors are connected to a switch using a trunk link, which allows both DMZ and hypervisors traffic.
I realised the above design using ifupdown. Below is the configuration for each hypervisor, which creates a new enp3s0.30 interface carrying all DMZ traffic from the enp3s0 interface.
This configuration seems more complex than it actually is. Most of it is to make sure the interface is not assigned an IPv4/6 address on the hypervisor host. The magic .30 at the end of the interface name makes this interface tagged with VLAN ID 30 (DMZ for me).
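Stripped of those address-suppression options, a minimal sketch of the stanza could look like this (on Debian, the vlan package typically provides the hook that creates the tagged interface):

# Create an 802.1Q subinterface on enp3s0 tagged with VLAN ID 30,
# without assigning any address on the hypervisor itself.
auto enp3s0.30
iface enp3s0.30 inet manual
iface enp3s0.30 inet6 manual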
Now that we have an interface tagged for the DMZ VLAN, we can create a bridge that future virtual machines can connect to:
Just like the previous config, this is quite bloated because I don't want the interface to be assigned an IP address on the host. Most importantly, the bridge_ports enp3s0.30 line here makes this interface a virtual bridge for the enp3s0.30 interface.
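A minimal sketch of such a bridge stanza (the bridge_* options come from the bridge-utils integration with ifupdown):

# Bridge that carries only DMZ-tagged traffic; VM interfaces attach here.
auto dmzbr
iface dmzbr inet manual
    bridge_ports enp3s0.30
    bridge_stp off
iface dmzbr inet6 manual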
And voilà, we now have a virtual bridge on each machine, where only DMZ traffic will flow. Here I verify whether this configuration works:
We can see that the two virtual interfaces are created, and are only assigned a MAC address and not an IP address:
root@atlas:~# ip a show enp3s0.30
-4: enp3s0.30@enp3s0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue master dmzbr state UP group default qlen 1000
- link/ether d8:5e:d3:4c:70:38 brd ff:ff:ff:ff:ff:ff
-5: dmzbr: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
- link/ether 4e:f7:1f:0f:ad:17 brd ff:ff:ff:ff:ff:ff
-
Pinging a VM from a hypervisor works:
root@atlas:~# ping -c1 maestro.dmz
-PING maestro.dmz (192.168.30.8) 56(84) bytes of data.
-64 bytes from 192.168.30.8 (192.168.30.8): icmp_seq=1 ttl=63 time=0.457 ms
-
Now that we have a working DMZ network, let's build on it to get DNS and DHCP working. This will enable new virtual machines to obtain a static or dynamic IP address and register their hostname in DNS. This has actually been incredibly annoying, due to our friend Network Address Translation (NAT).
NAT recap
Network address translation (NAT) is a function of a router which allows multiple hosts to share a single IP address. This is needed for IPv4, because IPv4 addresses are scarce and usually one household is only assigned a single IPv4 address. This is one of the problems IPv6 attempts to solve (mainly by having so many IP addresses that they should never run out). To solve the problem for IPv4, each host in a network is assigned a private IPv4 address, which can be reused for every network.
Then, the router must perform address translation. It does this by keeping track of ports opened by hosts in its private network. If a packet from the internet arrives at the router for such a port, it forwards this packet to the correct host.
I would like to host my own DNS on a virtual machine (called hermes, more on VMs later) in the DMZ network. This poses two problems:
The upstream DNS server will refer to the public internet-accessible IP address of our DNS server. This IP address has no meaning inside the private network due to NAT, and the router will reject the packet.
Our DNS server resolves hosts to their public internet-accessible IP address. This is similar to the previous problem, as the public IP address has no meaning inside the private network.
The first problem can be remediated by overriding the location of the DNS server for hosts inside the DMZ network. This can be achieved on my router, which uses Unbound as its recursive DNS server:
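Expressed in plain unbound.conf syntax (rather than the router's web interface), the override amounts to roughly the following:

# Forward queries for the internal zones to the internal DNS server.
forward-zone:
    name: "dmz"
    forward-addr: 192.168.30.7@5353

forward-zone:
    name: "kun.is"
    forward-addr: 192.168.30.7@5353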
Any DNS requests to Unbound for domains in either dmz or kun.is will now be forwarded to 192.168.30.7 (port 5353). This is the virtual machine hosting my DNS.
The second problem can be solved at the DNS server. We need to do some magic overriding, which dnsmasq is perfect for:
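A sketch of what these overrides could look like in dnsmasq.conf (198.51.100.10 is a documentation placeholder for the real public address):

# Rewrite the public IPv4 address in upstream answers to the internal one.
alias=198.51.100.10,192.168.30.7

# Send queries for kun.is to the internal authoritative server.
server=/kun.is/192.168.30.7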
This always rewrites the public IPv4 address to the private one. It also overrides the DNS server for kun.is to 192.168.30.7.
Finally, behind the dnsmasq server, I run PowerDNS as an authoritative DNS server. I like this DNS server because I can manage it with Terraform.
Here is a small diagram showing my setup (my networking teacher would probably kill me for this):
Virtualization
Now that we have laid out the basic networking, let's talk virtualization. Each of my servers is configured to run KVM virtual machines, orchestrated using Libvirt. Configuration of the physical hypervisor servers, including KVM/Libvirt, is done using Ansible. The VMs are spun up using Terraform and the dmacvicar/libvirt Terraform provider.
This all isn't too exciting, except that I created a Terraform module that abstracts the Terraform Libvirt provider for my specific scenario:
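Invoking such a module could look roughly like this sketch (the module path and variable names are assumptions; only the VM name maestro appears elsewhere in this post):

module "maestro" {
  source = "./modules/debian-vm"

  # Hypothetical knobs the module might expose.
  name   = "maestro"
  memory = 2048
  vcpus  = 2
}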
This automatically creates a Debian virtual machine with the specified properties. It also sets up the certificate-based SSH authentication which I talked about before.
Clustering
With virtualization explained, let's move one level up. Each of my three physical servers hosts a virtual machine running Docker, and together these form a Docker Swarm. I use Traefik as a reverse proxy which routes requests to the correct container.
All data is hosted on a single machine and made available to containers using NFS. While this might not be very secure (NFS traffic is not encrypted and there is no proper authentication), it is quite fast.
As of today, I host the following services on my Docker Swarm:
For CI/CD, I run Concourse CI in a separate VM. This is needed because Concourse heavily uses containers to create reproducible builds.
Although I should probably use it for more, I currently use my Concourse for three pipelines:
A pipeline to build this static website and create a container image of it. The image is then uploaded to the image registry of my Forgejo instance. I love it when I can use stuff I previously built :). The pipeline finally deploys this new image to the Docker Swarm.
A pipeline to create a Concourse resource that sends Apprise alerts (Concourse-ception?)
A pipeline to build a custom Fluentd image with plugins installed
Backups
To create backups, I use Borg. As I keep all data on one machine, this backup process is quite simple. In fact, all this data is stored in a single Libvirt volume. To configure Borg with a simple declarative script, I use Borgmatic.
In order to back up the data inside the Libvirt volume, I create a snapshot to a file. I can then mount this snapshot on the host's file system, and the files can be backed up while the system is still running. It is also possible to simply back up the whole Libvirt image, but this takes more time and storage.
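One way to realise this, sketched with made-up domain, disk and path names: take a disk-only external snapshot so the base image stops changing, mount that image read-only, run borgmatic, and merge the overlay back afterwards:

# Redirect new writes into an overlay file, freezing the original image.
virsh snapshot-create-as data-vm backup \
    --diskspec vda,file=/var/lib/libvirt/images/data-overlay.qcow2 \
    --disk-only --atomic

# Mount the now read-only image and let borgmatic back it up.
# (/dev/sda1 is whatever partition holds the data inside the volume.)
guestmount -a /var/lib/libvirt/images/data.qcow2 -m /dev/sda1 --ro /mnt/backup
borgmatic
guestunmount /mnt/backup

# Merge the overlay back into the base image and clean up the snapshot metadata.
virsh blockcommit data-vm vda --active --pivot
virsh snapshot-delete data-vm backup --metadata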
Monitoring and Alerting
The last topic I would like to talk about is monitoring and alerting. This is something I'm still actively improving and only just set up properly.
Alerting
For alerting, I wanted something that runs entirely on my own infrastructure. I settled for Apprise + Ntfy.
Apprise is a server that is able to send notifications to dozens of services. For application developers, it is thus only necessary to implement the Apprise API to gain access to all these services. The Apprise API itself is also very simple. By using Apprise, I can also easily switch to another notification service later. Ntfy is free software made for mobile push notifications.
I use this alerting system in quite a lot of places in my infrastructure, for example when creating backups.
Uptime Monitoring
The first monitoring setup I created was using Uptime Kuma. Uptime Kuma periodically pings a service to see whether it is still running. You can do a literal ping, test HTTP response codes, check database connectivity and much more. I use it to check whether my services and VMs are online. And the best part is, Uptime Kuma supports Apprise, so I get push notifications on my phone whenever something goes down!
Metrics and Log Monitoring
A new monitoring system I am still in the process of deploying is focused on metrics and logs. I plan on creating a separate blog post about this, so keep an eye out for that (for example using RSS :)). Suffice it to say, it is no basic ELK stack!
Conclusion
That's it for now! Hopefully I inspired someone to build something… or showed how not to :)
diff --git a/src/_site/infrastructure-snapshot/nat.png b/src/_site/infrastructure-snapshot/nat.png
deleted file mode 100644
index 0d5f72c..0000000
Binary files a/src/_site/infrastructure-snapshot/nat.png and /dev/null differ
diff --git a/src/_site/infrastructure-snapshot/servers.jpeg b/src/_site/infrastructure-snapshot/servers.jpeg
deleted file mode 100644
index b269484..0000000
Binary files a/src/_site/infrastructure-snapshot/servers.jpeg and /dev/null differ
diff --git a/src/_site/infrastructure-snapshot/unbound_overrides.png b/src/_site/infrastructure-snapshot/unbound_overrides.png
deleted file mode 100644
index f94394f..0000000
Binary files a/src/_site/infrastructure-snapshot/unbound_overrides.png and /dev/null differ
diff --git a/src/_site/infrastructure-snapshot/vlans.png b/src/_site/infrastructure-snapshot/vlans.png
deleted file mode 100644
index 7bf4add..0000000
Binary files a/src/_site/infrastructure-snapshot/vlans.png and /dev/null differ
diff --git a/src/_site/its-alive/index.html b/src/_site/its-alive/index.html
deleted file mode 100644
index ed3f035..0000000
--- a/src/_site/its-alive/index.html
+++ /dev/null
@@ -1 +0,0 @@
- It's alive! - Pim Kunis
It's alive!
Pim Kunis
Finally, after several months this website is up and running again!
My homelab has completely changed, but the reason it initially went offline is my failing CI installation. I was using Concourse CI, which I was initially interested in due to the reproducible nature of its container-based builds. However, for some reason pipelines were sporadically getting stuck whenever I rebooted the virtual machine it was running on. The fix was very annoying: I had to re-create the pipelines manually (which feels very backwards for a CI/CD system!). Additionally, my virtual machine setup back then was quite fragile, and I decided to get rid of that as well.
I have learned that having an escape hatch to deploy something is probably a good idea. Expect a new overview of my homelab soon, in the same vein as this post from last year!
diff --git a/src/_site/virtio-9p-experiences/index.html b/src/_site/virtio-9p-experiences/index.html
deleted file mode 100644
index d37388a..0000000
--- a/src/_site/virtio-9p-experiences/index.html
+++ /dev/null
@@ -1,3 +0,0 @@
- My Experiences with virtio-9p - Pim Kunis
My Experiences with virtio-9p
Pim Kunis
When I was scaling up my home lab, I started thinking more about data management. I hadn't (and still haven't) set up any form of network storage. I have, however, set up a backup mechanism using Borg. Still, I want to operate lots of virtual machines, and backing up each one of them separately seemed excessive. So I started thinking, what if I just let the host machines back up the data? After all, the amount of physical hosts I have in my home lab is unlikely to increase drastically.
The Use Case for Sharing Directories
I started working out this idea further. Without network storage, I needed a way for guest VMs to access the host's disks. There are two possibilities here: either expose a block device, or expose a file system. Creating a whole virtual disk just for the data of some VMs seemed wasteful, and from my experience it also increases backup times dramatically. I therefore searched for a way to mount a directory from the host OS on the guest VM. This is when I stumbled upon this blog post talking about sharing directories with virtual machines.
Sharing Directories with virtio-9p
virtio-9p is a way to map a directory on the host OS to a special device on the virtual machine; in virt-manager, it is added as a filesystem passthrough device. Under the hood, virtio-9p uses the 9pnet protocol, originally developed at Bell Labs, for which support is available in all modern Linux kernels. If you share a directory with a VM, you can then mount it. Below is an extract of my /etc/fstab to automatically mount the directory:
data /mnt/data 9p trans=virtio,rw 0 0
-
The first argument (data) refers to the name you gave the share on the host. With the trans option, we specify that this is a virtio share.
Problems with virtio-9p
At first I had no problems with my setup, but I am now contemplating moving to a network-storage-based setup because of two problems.
The first problem is that some files suddenly changed ownership from libvirt-qemu to root. If a file is owned by root, the guest OS can still see it, but cannot access it. I am not entirely sure the problem lies with virtio, but I suspect it does. For anyone experiencing this problem, I wrote a small shell script to revert ownership to the libvirt-qemu user:
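A sketch of what such a script could boil down to (the shared directory path is an assumption):

#!/bin/bash
# Hand files that ended up owned by root back to the libvirt-qemu user.
set -euo pipefail

find /mnt/data -user root -exec chown libvirt-qemu:libvirt-qemu {} +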
Another problem I have experienced is guests being unable to mount the directory at all. This has only happened once, but it was highly annoying: to fix it, I had to reboot the whole physical machine.
Alternatives
virtio-9p seemed like a good idea, but as discussed, I had some problems with it. It seems virtiofs might be an interesting alternative, as it is designed specifically for sharing directories with VMs.
As for me, I will probably finally look into deploying network storage either with NFS or SSHFS.
diff --git a/src/_site/virtio-9p-experiences/virt-manager.png b/src/_site/virtio-9p-experiences/virt-manager.png
deleted file mode 100644
index 8567efb..0000000
Binary files a/src/_site/virtio-9p-experiences/virt-manager.png and /dev/null differ
diff --git a/src/about.md b/src/about.md
index 292ca93..603c174 100644
--- a/src/about.md
+++ b/src/about.md
@@ -2,8 +2,24 @@
title: Me
permalink: /about/
layout: page
-excerpt: Free PIIs
+excerpt: About me
comments: false
---
-Here I might post some personally identifiable information.
+Welcome to my humble blog!
+I write technical posts with the intention of either documenting problems I have solved or showing off stuff I built.
+
+My passion is self-hosting in my home lab with these important goals:
+- **Data sovereignty, privacy and autonomy**: Nowadays, our data is increasingly in the hands of companies. This is problematic because these companies have but one goal: to make money, oftentimes using the data you entrust to them. Worse still, these companies are not afraid of barring you from your own data (see [this link](https://archive.is/saQXe) for example). These facts have made it abundantly clear that we need to have full control over our own data.
+- **Expanding knowledge for my professional life**: Diving into new technologies without the risk of breaking important systems is in my opinion one of the best methods to learn. Actually, breaking things is the best way to learn! Stuff breaks (usually) because you don't fully understand it, and these failures are therefore very valuable.
+- **Fun!**
+
+Infrastructure as Code (IaC) is the most important principle I adhere to when building my home lab.
+With IaC, all (digital) infrastructure and systems are defined in code that can be automatically rolled out.
+Ansible is probably the most used IaC tool out there, but it has a huge problem: it suffers from configuration drift.
+You can create a task in Ansible to install a package, but if you remove this task, the package remains.
+At this point, your configuration does not reflect reality anymore.
+
+What is the solution to this configuration drift? Nix and NixOS!
+NixOS will always make sure your machine is in the exact state you define in your configuration.
+My current Linux systems now all run NixOS and I have no intention of ever going back!
diff --git a/src/ideas.md b/src/ideas.md
new file mode 100644
index 0000000..52b76a5
--- /dev/null
+++ b/src/ideas.md
@@ -0,0 +1,9 @@
+---
+title: Ideas
+permalink: /ideas/
+layout: page
+excerpt: Plans for the future
+comments: false
+---
+
+🚧 Under construction 🚧
diff --git a/src/now.md b/src/now.md
new file mode 100644
index 0000000..182348e
--- /dev/null
+++ b/src/now.md
@@ -0,0 +1,9 @@
+---
+title: Now
+permalink: /now/
+layout: page
+excerpt: Things I am working on now
+comments: false
+---
+
+🚧 Under construction 🚧