Nicolas Le Manchet

What I've Learned Today: an Engineering Logbook

Jan 9 2023 - MySQL got a packet bigger than max_allowed_packet

MySQL server has a setting called max_allowed_packet that effectively limits the size of each query. The default value varies but can be as low as 4 MB, meaning that trying to fill a mediumtext column, which can hold up to 16 MB, will fail.

The current value in bytes can be retrieved by the client by executing:

SHOW VARIABLES WHERE Variable_name = 'max_allowed_packet';

Unfortunately, when receiving a query that is too big, the MySQL server will randomly either:

  • reject the query with error 1153, or
  • abruptly close the connection, which the client sees as a broken pipe.

This means that in Python, the client application needs to correctly handle both:

pymysql.OperationalError((1153, "Got a packet bigger than 'max_allowed_packet' bytes"))
pymysql.OperationalError((2006, "MySQL server has gone away (BrokenPipeError(32, 'Broken pipe'))"))

This is problematic because a BrokenPipeError can happen for reasons other than an oversized query, so the application cannot immediately identify the issue.
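
A minimal sketch of what the client-side handling can look like with PyMySQL (the table and helper below are made up for illustration; error codes 1153 and 2006 match the messages above):

import pymysql

def insert_large_payload(connection, data: bytes):
    try:
        with connection.cursor() as cursor:
            # Hypothetical table, used only for this example
            cursor.execute('INSERT INTO blobs (payload) VALUES (%s)', (data,))
        connection.commit()
    except pymysql.OperationalError as e:
        code = e.args[0]
        if code == 1153:
            # The server explicitly rejected the oversized packet
            raise ValueError('payload exceeds max_allowed_packet') from e
        if code == 2006:
            # The server dropped the connection; maybe because of the
            # packet size, maybe not
            raise ConnectionError('MySQL server has gone away') from e
        raise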

The documentation suggests that there is no real downside to setting max_allowed_packet to its maximum value of 1 GB on the server. Some clients also need to be aware of this change, so changing it in both places is recommended; pymysql, however, does not seem to care.

Oct 1 2021 - Adding asyncio support to a threaded task queue

I maintain Spinach, a Python library for distributing tasks to a cluster of machines. Since its inception it only supported threads as the unit of concurrency.

At the time this choice made sense since other available task queues had poor support for threads, good support for processes and the asynchronous landscape was still constantly changing. Fast forward a few years and Python now has async/await as part of the language and has largely settled on the asyncio framework. It now makes sense to support running tasks as asyncio coroutines.

The existing threaded implementation

Spinach has two levels of queues where jobs wait to be picked up by a worker: a global queue that lives in Redis and a local one in each process running Spinach. When a process runs out of work, it fetches a handful of jobs from Redis and puts them in the local in_queue consumed by a pool of worker threads managed by the Workers class:

import queue
import threading

class Workers:

    def __init__(self, num_workers: int):
        self._num_workers = num_workers

        # Local queue consumed by worker threads
        self.in_queue = queue.Queue(maxsize=num_workers)

        # Start the thread pool
        for _ in range(self._num_workers):
            thread = threading.Thread(target=self._worker_func)
            thread.start()

    def _worker_func(self):
        while True:
            # Each worker thread blocks until a job is ready
            job = self.in_queue.get()

            if job is None:
                break

            # Actually execute the job
            job.task_func(*job.task_args, **job.task_kwargs)
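
Shutdown is handled by the None sentinel seen above: pushing one None per worker makes each thread break out of its loop. A hypothetical stop method (not Spinach's exact code) would look like:

    def stop(self):
        # One sentinel per worker thread; each worker exits its loop on None
        for _ in range(self._num_workers):
            self.in_queue.put(None)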

Adding asyncio support

Creating a new kind of workers class running coroutines is straightforward. Instead of launching a pool of threads, a single thread running the asyncio event loop is started, and that loop runs one coroutine per worker.

import asyncio
import queue
import threading

class AsyncioWorkers:

    def __init__(self, num_workers: int):
        self.num_workers = num_workers
        self.in_queue = queue.Queue(maxsize=num_workers)

        # Create the thread managing the event loop. Note that asyncio.run
        # expects a coroutine object, hence the call.
        thread = threading.Thread(
            target=asyncio.run,
            args=(self._async_interface_func(),)
        )
        thread.start()

    async def _async_interface_func(self):
        # Create a future for each worker
        worker_futures = list()
        for _ in range(self.num_workers):
            worker_futures.append(
                asyncio.ensure_future(self._worker_func())
            )

        # Run the futures and wait for them to terminate
        await asyncio.gather(*worker_futures)

    async def _worker_func(self):
        while True:
            job = self.in_queue.get()  # Blocking!

            if job is None:
                break

            # Execute the asynchronous job
            await job.task_func(*job.task_args, **job.task_kwargs)

The _worker_func as implemented here has a problem: the queue blocks! This means that the whole event loop gets stuck when one worker waits for jobs, preventing other workers from making progress.

To solve this problem, a new kind of queue is needed: one that synchronous code can push jobs into, and that asynchronous code can wait on without blocking the event loop.

The asyncio ecosystem has such a queue, however a few of its design decisions make it a poor fit for this use case. After shelving the feature for a few months while waiting for a better design, I stumbled upon a great talk by David Beazley, "The Other Async", where he creates the exact queue needed from scratch.

Using his queue, workers can wait for jobs without blocking:

job = await self.in_queue.get_async()  # Not blocking :)
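
For reference, here is a minimal sketch of the idea, heavily simplified and not Spinach's or Beazley's actual code: put() is called from regular threads, get_async() from coroutines, and waiting coroutines are woken up through call_soon_threadsafe:

import asyncio
import collections
import threading

class SyncAsyncQueue:

    def __init__(self):
        self._items = collections.deque()
        self._waiters = collections.deque()  # (loop, future) of coroutines waiting for a job
        self._lock = threading.Lock()

    def put(self, item):
        # Called from synchronous code running in any thread
        with self._lock:
            if self._waiters:
                loop, future = self._waiters.popleft()
                # Wake up one waiting coroutine from outside its event loop
                loop.call_soon_threadsafe(future.set_result, item)
            else:
                self._items.append(item)

    async def get_async(self):
        # Called from coroutines, never blocks the event loop
        with self._lock:
            if self._items:
                return self._items.popleft()
            loop = asyncio.get_running_loop()
            future = loop.create_future()
            self._waiters.append((loop, future))
        return await future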

Wrapping it all together

With this change, Spinach can run asyncio tasks:

import asyncio

from spinach import Engine, MemoryBroker, AsyncioWorkers

spin = Engine(MemoryBroker())

@spin.task(name='sleep')
async def sleep(duration: int):
    await asyncio.sleep(duration)

spin.schedule(sleep, 1)
spin.start_workers(workers_class=AsyncioWorkers)

Aug 25 2021 - Using Python to check for APT updates

The python3-apt library that comes with Debian can be used to do most of what apt does from Python. Read-only operations can even be performed by unprivileged users. For example, to check whether there are pending updates:

import apt
with apt.Cache() as cache:
    cache.upgrade()  # Marks upgradable packages in the in-memory cache, installs nothing
    packages_to_upgrade = cache.get_changes()
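
From there, listing the pending upgrades is a matter of iterating over the result (name and candidate.version are standard python-apt attributes):

    for package in packages_to_upgrade:
        print(package.name, package.candidate.version)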

Note that the library is not available on PyPI. To use it in a virtual environment, install it with apt and create the venv with:

python3 -m venv --system-site-packages venv

Feb 23 2020 - FreeNAS backup with Restic

FreeNAS 11.3 natively supports Backblaze B2 for backup. Unfortunately it is done via rclone, a tool that doesn't use object storage very efficiently. For example, if a single byte changes in a 700 MB file rclone will need to reupload the whole file to B2.

I much prefer restic, a backup tool that works well with object storage and that supports encryption and snapshots. Since FreeNAS does not support restic out of the box, a bit of manual setup is required.

A jail that will contain restic needs to be created; the default values of the FreeNAS wizard are good enough. By default jails don't have access to any pool, so each path to back up needs to be whitelisted by adding a mount point from the system to the jail, for instance a read-only mount from /mnt/main/pictures to /mnt/main/iocage/jails/Restic/root/mnt/pictures. All of this can be done through the web GUI.

In this jail, install restic and ca_root_nss:

pkg install restic ca_root_nss

Create a directory to hold the configuration:

mkdir /root/restic

Create /root/restic/backup.conf with the configuration and chmod it to 640:

setenv BACKUP_PATHS "/mnt"
setenv RETENTION_LAST 4
setenv RETENTION_DAYS 7
setenv RETENTION_WEEKS 5
setenv RETENTION_MONTHS 12
setenv RETENTION_YEARS 3

setenv B2_ACCOUNT_ID <account ID>
setenv B2_ACCOUNT_KEY <account key>

setenv RESTIC_REPOSITORY b2:<bucket name>:<directory in bucket>
setenv RESTIC_PASSWORD <random long encryption key>

Create /root/restic/backup.sh, the script that will run restic, and chmod it to 750:

#!/bin/csh

source /root/restic/backup.conf

restic backup --verbose --tag periodic $BACKUP_PATHS
restic forget --verbose --tag periodic --group-by "paths,tags" --keep-last $RETENTION_LAST --keep-daily $RETENTION_DAYS --keep-weekly $RETENTION_WEEKS --keep-monthly $RETENTION_MONTHS --keep-yearly $RETENTION_YEARS

Initialize the repository and start the first backup:

source /root/restic/backup.conf
restic init
/root/restic/backup.sh

Add an entry in the crontab to run the backup every night:

crontab -e
25 3 * * * /root/restic/backup.sh

Sep 11 2019 - Raspberry Pi network scanner

Transforming an old all-in-one HP printer/scanner into a networked scanner is quite easy on Debian Buster. The idea is to connect the scanner to a Raspberry Pi via USB, configure the SANE scanning daemon and allow remote clients to use it.

Server (Raspberry Pi)

Install the packages and configure the printer on the server:

apt install --no-install-recommends sane-utils hplip
hp-setup -i

This creates a saned user and a scanner group. Unfortunately, a bug in Buster requires adding a custom udev rule to give the saned user permission to use the scanner[0]:

/etc/udev/rules.d/70-libsane-group.rules

ENV{libsane_matched}=="yes", RUN+="/bin/setfacl -m g:scanner:rw $env{DEVNAME}"

Apply the rule with udevadm trigger or reboot the Pi.

At this point the saned user should be able to detect the scanner and use it[1]:

sudo -u saned sane-find-scanner
sudo -u saned scanimage -L
sudo -u saned scanimage -T

Once all these commands terminate without errors, the network part can be configured. The SANE daemon is a bit special for two reasons:

  • It needs to be started by inetd (or its systemd replacement) that will launch it when a connection on port 6566 is made, a bit like a CGI script.
  • Once it is started, it will instruct the client to make a second connection to another port, a bit like in FTP.

With that in mind, let's configure it:

/etc/sane.d/saned.conf

data_portrange = 10000 - 10100  # You can leave this commented if you don't have a firewall
192.168.0.0/24

Start the inetd that will launch saned:

systemctl enable saned.socket
systemctl start saned.socket

If you use a netfilter based firewall on the Pi (iptables or ufw), you should be able to use nf_conntrack_sane to automatically track and open the ports specified in data_portrange, but I couldn't get it to work. Instead I manually open all of them:

ufw allow from 192.168.0.0/24 to any port 6566 proto tcp
ufw allow from 192.168.0.0/24 to any port 10000:10100 proto tcp

If, when you try to scan something, it briefly works but hangs after a second, it is a sign that the first connection to port 6566 worked but the second connection to a port in the range 10000:10100 did not.

Client

Setting up the client is straightforward, see instructions on Debian Wiki.

Alternatives

  • Using SSH port forwarding instead of the SANE daemon, more secure but requires connecting to the server before scanning.
  • Pretending that the scanner is connected directly to the client with USBIP.

Aug 10 2019 - Resizing swap on Debian Buster

The Debian installer allocates a swap partition matching the amount of RAM the machine has. This is done mostly to support hibernation which dumps the memory to the swap partition and powers off the computer.

For those who prefer sleep over hibernation, having a gigantic swap partition makes little sense. By removing the swap partition and creating a swap file instead, it becomes simple to grow or shrink the swap later on.

Removing the swap partition and resizing the root file system [0]:

swapoff -a
lvremove /dev/debian-vg/swap_1  # Replace debian with the name of your machine
lvextend -l +100%FREE /dev/debian-vg/root
resize2fs /dev/debian-vg/root
sed -i -e 's/^/# /' /etc/initramfs-tools/conf.d/resume  # Disable resume from the removed swap

Creating a swap file [1]:

dd if=/dev/zero of=/swapfile bs=1024 count=1048576  # 1 GB swap file
chmod 600 /swapfile
mkswap /swapfile
swapon /swapfile

Finally, replace the swap partition entry in /etc/fstab with /swapfile and run:

update-initramfs -u

Jul 16 2019 - Checking that arguments are compatible with a function without running it

Python makes it possible to check that a set of arguments is compatible with a function without actually running it:

>>> from inspect import signature
>>>
>>> def my_function(a, b, foo=None):
...     pass
...
>>> signature(my_function).bind(1, 2, foo='bar')
<BoundArguments (a=1, b=2, foo='bar')>  # No exception

>>> signature(my_function).bind(1, foo='bar')
TypeError: missing a required argument: 'b'

This is very useful for distributed task queues because the machine that schedules the background job is often not the one that executes it. I use it in Spinach to prevent scheduling jobs that are sure to fail because workers would not be able to call them.
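
A hypothetical schedule-time check built on this (check_arguments and its error message are mine, not Spinach's actual API):

from inspect import signature

def check_arguments(task_func, *args, **kwargs):
    # Fail fast on the machine that schedules, not on the worker
    try:
        signature(task_func).bind(*args, **kwargs)
    except TypeError as e:
        raise ValueError(f'Cannot schedule job: {e}') from e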

Jul 13 2019 - PostgreSQL autovacuum not running

In PostgreSQL, whenever a row is updated a new tuple is inserted and the old one is marked as dead. Likewise, when a row is deleted the underlying tuple is only marked as dead. VACUUM removes dead tuples, reclaiming space and speeding up queries.

This operation can be done manually using the command VACUUM (FULL) but it's much better to let autovacuum clean up dead rows automatically when needed.

Autovacuum is enabled by default on PostgreSQL but the default configuration is conservative as it waits for a table to have 20% of dead tuples before triggering a vacuum on it.

Checking the last manual and auto vacuum on the tables of a database:

SELECT relname, last_vacuum, last_autovacuum FROM pg_stat_user_tables;

          relname           |          last_vacuum          |        last_autovacuum
----------------------------+-------------------------------+-------------------------------
 django_site                | 2018-12-05 08:15:19.946175+00 |
 django_content_type        | 2018-12-05 08:15:19.926017+00 |
 reader_attachment          | 2018-12-05 08:15:21.058618+00 | 2019-01-31 14:56:29.451346+00

django_site and django_content_type are small read-only tables, so they contain very few dead rows, which explains why they have never been vacuumed automatically.

reader_attachment however sees many updates and at the time of writing the last auto vacuum was run about six months ago.

Current autovacuum settings can be checked with:

SELECT name, setting FROM pg_settings WHERE category LIKE 'Autovacuum';

                name                 |  setting
-------------------------------------+-----------
 autovacuum                          | on
 autovacuum_analyze_scale_factor     | 0.1
 autovacuum_analyze_threshold        | 50
 autovacuum_vacuum_scale_factor      | 0.2
 autovacuum_vacuum_threshold         | 50

Here we can confirm that autovacuum is actually turned on (autovacuum = on). The next interesting setting is autovacuum_vacuum_scale_factor, the fraction of dead rows a table must reach before a vacuum is triggered.

Checking the percentage of dead rows per table:

SELECT relname as table, n_live_tup as live, n_dead_tup as dead, ROUND(CAST((n_dead_tup * 100.0 / (n_live_tup+1)) AS FLOAT)) AS percent_dead FROM pg_stat_user_tables;

           table            |  live   | dead  | percent_dead
----------------------------+---------+-------+--------------
 django_site                |       1 |     0 |            0
 django_content_type        |      18 |     0 |            0
 reader_attachment          |  575984 | 65414 |           11

Here, lowering autovacuum_vacuum_scale_factor to 0.05 would trigger autovacuum on reader_attachment since it contains more than 5% dead tuples.

Jul 8 2019 - Pagination in MySQL

Every once in a while I need to implement pagination in an application. The approach using OFFSET/LIMIT is straightforward, getting the third page of a product list with 20 items per page is as simple as:

SELECT *
FROM products
ORDER BY id
LIMIT 20 OFFSET 40;

This works great for the first handful of pages, but as the page number grows the query gets slower and slower. This is because the SQL engine cannot land directly on the first row of the page: it needs to go through all rows from the very first one just to count them. Even with a proper index, the engine still has to read many pages of the index to skip over the offset.

The alternative, often called keyset pagination, does not count pages but remembers the ID of the last row seen in the previous page.

SELECT *
FROM products
WHERE id > 560
ORDER BY id
LIMIT 20;

Now the SQL engine can use an index to land directly after the row with id 560 and retrieve the next 20 rows. It's much faster but comes with a big caveat: it is not possible to easily jump to a page that does not immediately follow the current one.

More information: Use The Index, Luke!

Jul 7 2019 - PostgreSQL index not being used

I was trying to speed up a query by adding a composite index to a table:

CREATE INDEX ON reader_article (feed_id, published_at);

I was sure the index would improve things, but somehow I didn't notice any difference after creating it. Running EXPLAIN ANALYZE indeed showed that another index was used, not the newly created one:

EXPLAIN ANALYZE SELECT COUNT(*) AS "__count" FROM "reader_article" WHERE ("reader_article"."feed_id" = 1890 AND "reader_article"."published_at" > '2018-07-07T16:08:09.562165+00:00'::timestamptz);
Aggregate  (cost=25857.39..25857.40 rows=1 width=8) (actual time=3444.532..3444.532 rows=1 loops=1)
    ->  Index Scan using reader_article_feed_id_a710e3e1 on reader_article  (cost=0.43..25790.23 rows=26865 width=0) (actual time=1.216..3436.333 rows=26985 loops=1)
          Index Cond: (feed_id = 1890)
          Filter: (published_at > '2018-07-07 16:08:09.562165+00'::timestamp with time zone)
Planning Time: 0.476 ms
Execution Time: 3444.624 ms

It turns out PostgreSQL needs to update its statistics to better know the cost associated with using (or not) each index. This is done with the help of VACUUM ANALYZE:

VACUUM ANALYZE reader_article;

Sure enough, once the stats are updated the query runs in 14 ms using the composite index.

Jul 6 2019 - Timezone abbreviations in dateutil

In Atoma, I use dateutil to parse dates found in RSS and Atom feeds. Some of these dates follow RFC 822, which allows timezone abbreviations like EDT or PST.

Because these abbreviations are ambiguous, dateutil does not parse them by default and emits an UnknownTimezoneWarning instead. However, the few aliases used by RFC 822 are well defined and can be provided manually:

import dateutil.parser
from dateutil.tz import gettz

tzinfos = {
    'UT': gettz('GMT'),
    'EST': -18000,
    'EDT': -14400,
    'CST': -21600,
    'CDT': -18000,
    'MST': -25200,
    'MDT': -21600,
    'PST': -28800,
    'PDT': -25200
}
dateutil.parser.parse(date_str, tzinfos=tzinfos)

Oct 12 2018 - Django database connections in background workers

Executing a task in a background worker is similar to handling a request in a WSGI server: at the beginning the task or request must fetch a database connection, and at the end it must clean it up.

Django uses signals sent at the beginning and end of a request to manage database connections. This means that task queues can do the same and be confident that background tasks get the same isolation as HTTP requests.

In Spinach this translates to:

from django.db import reset_queries, close_old_connections
from spinach import signals

@signals.job_started.connect
def job_started(*args, job=None, **kwargs):
    reset_queries()
    close_old_connections()

@signals.job_finished.connect
def job_finished(*args, job=None, **kwargs):
    close_old_connections()

Oct 10 2018 - Testing reusable Django apps

I added a Django integration to Spinach, my Redis task queue for Python. It comes in the form of a small Django application. Naturally I wanted to write automated tests to make sure I don't break it by mistake in the future.

Since the code lives in the main Spinach repository, I didn't want to set up a full Django project just for the sake of testing. It turns out this is not needed: Django can be bootstrapped right from the test suite:

import django
import django.conf

def test_django_app():
    django.conf.settings.configure(
        INSTALLED_APPS=('spinach.contrib.spinachd',),
    )
    django.setup()

    # Start testing here

Oct 2 2018 - Disallowing access to Django admin for non-staff users

Websites built with Django often have their own login page, making the admin login page unnecessary. Django does not provide a setting to disable the admin login page while keeping the rest of the dashboard.

A simple middleware can prevent anonymous and regular users from seeing the admin dashboard:

from django.http import Http404
from django.urls import reverse


def hide_django_admin_middleware(get_response):

    def middleware(request):
        if request.path.startswith(reverse('admin:index')):
            if not request.user.is_staff:
                raise Http404()

        return get_response(request)

    return middleware
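
To activate it, the function goes into the MIDDLEWARE setting after AuthenticationMiddleware so that request.user is populated; the dotted path below assumes the code lives in a module called myproject.middleware:

MIDDLEWARE = [
    # ... Django's default middleware, ending with
    'django.contrib.auth.middleware.AuthenticationMiddleware',
    'myproject.middleware.hide_django_admin_middleware',
]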

Sep 27 2018 - Optimizing Caddy logs for Goaccess

I recently discovered Goaccess, a simple tool that generates analytics reports from web server logs.

The more information the log file contains, the better the report. The default log format of Caddy does not record virtual hosts, user agent, referer or latency, so they need to be added explicitly:

log / /var/log/caddy/access.log "{hostonly} {combined} {latency_ms}"

Goaccess can then use this extra information with:

goaccess access.log -o goaccess.html --log-format='%v %h %^[%d:%t %^] "%r" %s %b "%R" "%u" %L' --time-format='%H:%M:%S' --date-format='%d/%b/%Y'

Mar 13 2018 - Postgres on ZFS

I always thought that copy-on-write filesystems like ZFS or Btrfs were a poor fit for databases because of the amount of in-place updates a database does.

This article from 2ndQuadrant actually shows that ZFS can work well with Postgres. The key seems to be keeping the default ZFS block size of 128k (unlike the usually recommended 8k) and using a fast disk for the ZFS SLOG and cache.

The big advantage of running a database on a COW filesystem is the ability to take an atomic snapshot of the database at any moment easily.

Mar 11 2018 - Docker userland proxy

I was trying to make my Redis-based task queue more resilient to network errors. Both my application and Redis use TCP keepalive to detect disconnects on long-lived connections.
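
For context, enabling keepalive on a raw socket in Python looks roughly like this (illustrative only; the Redis client library exposes similar options, and the host name and timings are made up):

import socket

sock = socket.create_connection(('redis.example.com', 6379))

# Send probes on idle connections so dead peers are detected within a minute or so
# (the TCP_KEEP* options are Linux-specific)
sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPIDLE, 30)   # idle seconds before the first probe
sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPINTVL, 10)  # seconds between probes
sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPCNT, 3)     # failed probes before the connection is reset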

To test this, I simulated network partitions by dropping traffic to and from the container Redis runs in. This is easily done by entering the Redis process's network namespace and adding an iptables rule:

nsenter -t <Redis PID> --net iptables -A INPUT -j DROP

This worked: my application and Redis became unable to talk to each other. However, neither side detected the disconnect when no traffic was flowing, which is exactly what TCP keepalive is supposed to catch. The reason is related to how Docker publishes ports to the host.

When publishing a port from a Docker container using the default bridge network, the containerized process does not actually bind on the host. The bind is done by a Docker process called the userland proxy, which runs on the host as root.

When an application connects to the containerized Redis, it in fact connects to another process which forwards packets to Redis. My application did not detect the disconnect because the TCP keepalive probes between it and the userland proxy were successful; only the traffic between the proxy and Redis was dropped.

Mar 8 2018 - Glibc getpid syscall caching

Glibc used to cache the result of the getpid syscall. Because the cache caused subtle bugs, it was removed from newer versions of glibc. As a result the call got more expensive, and the difference is visible even from Python.

Debian Jessie shipped the old glibc while Stretch has the new one without cache. Alpine uses musl libc, which apparently does not cache the syscall:

$ docker run -ti --rm python:3.6-jessie python -m timeit -s "import os" "os.getpid()"
10000000 loops, best of 3: 0.0523 usec per loop

$ docker run -ti --rm python:3.6-stretch python -m timeit -s "import os" "os.getpid()"
10000000 loops, best of 3: 0.193 usec per loop

$ docker run -ti --rm python:3.6-alpine python -m timeit -s "import os" "os.getpid()"
1000000 loops, best of 3: 0.203 usec per loop

Mar 7 2018 - Oven clocks getting slowed down in Europe

I noticed that the clock of my kitchen oven was running late: it lost about three minutes in just two weeks. That surprised me since this clock had been very accurate for two years.

I got the answer from a Hacker News thread: oven clocks often don't have a quartz crystal but instead count how many times the mains voltage crosses zero, assuming that at 50 Hz it does so 100 times per second.

This works well when the frequency is stable, but Europe currently suffers from a deficit in energy production that is slowing the grid down, hence the lagging clocks.
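
A quick back-of-the-envelope check in Python (my own numbers, just to see that the order of magnitude fits):

# A clock counting zero crossings loses time in proportion to the frequency deficit
seconds_in_two_weeks = 14 * 24 * 3600   # 1,209,600 s
seconds_lost = 3 * 60                   # the clock lost about 3 minutes

average_frequency = 50 * (1 - seconds_lost / seconds_in_two_weeks)
print(round(average_frequency, 4))      # ~49.9926 Hz instead of 50 Hz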

Feb 15 2018 - Docker and writing to stdout

In a Docker container, a process writing to stdout can get blocked. The Docker daemon is responsible for consuming the stream, and if it cannot consume the data fast enough (during a restart for instance), calls to print or even logger.info can pause execution until the buffer is drained. By default the buffer size is 64 KB.

Feb 6 2018 - Cleaning untrusted HTML markup

RSS and Atom feeds embed HTML markup that readers should display correctly. For web-based readers, just including the HTML is very dangerous since a malicious feed could easily embed JavaScript code.

Mozilla made bleach, a nice library that has a strict whitelist of HTML tags and attributes that can safely be used within a web page.
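
Typical usage looks like this (the tag and attribute whitelists are illustrative, not the ones my reader uses):

import bleach

untrusted_html = '<p onclick="alert(1)">Hello <script>evil()</script><em>world</em></p>'

# Tags outside the whitelist are escaped, disallowed attributes are dropped
safe_html = bleach.clean(
    untrusted_html,
    tags=['a', 'p', 'em', 'strong'],
    attributes={'a': ['href', 'title']},
)
print(safe_html)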

Feb 4 2018 - Safely loading XML files in Python

XML is a tricky format full of unexpected behavior. The only safe way to load untrusted XML files in Python is to do it via defusedxml.

This is what I use in Atoma to safely load RSS and Atom feeds.
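
defusedxml mirrors the standard library API, so loading a feed is mostly a drop-in replacement (the XML string here is just a placeholder):

from defusedxml.ElementTree import fromstring

untrusted_xml = '<rss version="2.0"><channel><title>Example</title></channel></rss>'

# Raises an exception on entity bombs, external entities and other XML tricks
root = fromstring(untrusted_xml)
print(root.find('channel/title').text)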

Feb 1 2018 - Gracefully stop Python Docker container

Some Python libraries like Waitress or my own Spinach expect to be stopped by raising a KeyboardInterrupt. This exception is raised automatically by Python when the interpreter receives a SIGINT.

On the other hand, Docker by default sends a SIGTERM to a container and waits 10 seconds for it to stop before killing it with a SIGKILL. To gracefully stop a Python process running in Docker, KeyboardInterrupt can be raised when receiving a SIGTERM:

import signal

def handle_sigterm(*args):
    raise KeyboardInterrupt()

signal.signal(signal.SIGTERM, handle_sigterm)

Jan 25 2018 - New Python 3 email API

In the latest releases of Python the email package was redesigned. It is now more user-friendly, making the complicated topic of email encodings easier to approach.

For instance the old API required internationalized headers to be decoded manually:

>>> import email
>>> data = b'Subject: Asperges pour le =?utf-8?q?d=C3=A9jeuner?=\n\n'
>>> m = email.message_from_bytes(data)
>>> m['Subject']
'Asperges pour le =?utf-8?q?d=C3=A9jeuner?='

The new API takes care of this automatically:

>>> import email
>>> from email import policy
>>> data = b'Subject: Asperges pour le =?utf-8?q?d=C3=A9jeuner?=\n\n'
>>> # For backward compatibility reasons, the policy needs
>>> # to be passed explicitly to use the new API
>>> m = email.message_from_bytes(data, policy=policy.default)
>>> m['Subject']
'Asperges pour le déjeuner'

Jan 24 2018 - CPU microcode update

The microcode running on the CPU can be updated by either flashing the BIOS or injecting the code very early on boot.

The second approach is easy on Linux: the microcode gets copied into the initramfs, which the kernel processes early enough during boot for the update to be applied.

The Debian wiki explains how to do that; it boils down to:

apt install intel-microcode

After a reboot the update can be confirmed in kernel logs:

CPU1 microcode updated early to revision 0x20, date = 2017-01-27