TTimo

@TTimo@mastodon.social

Software freelancer. Previously id Software. Valve contractor: Linux, Steam Deck, Steam Frame. Other tech startups. Opinions are my own.

Recent Posts

Certificate verification of sites backed by letsencrypt in python

A follow-up to my post a few months back about setting up letsencrypt certificates for appengine sites .. I found that when accessing the site with the python requests or urllib.request modules, I was still getting an SSL certificate verification failure: "unable to get local issuer certificate". Browsers, however, have no problem with the site, so it didn't seem like a general problem with my setup.

I found a somewhat old SO issue about this but none of the solutions really worked out. That did put me on the right track to produce this fix though:

# appengine backed by a letsencrypt certificate
TEST_URL='https://core-drones.corecomplex.cc/testSSL'

import os
import ssl
R10_PEM=os.path.abspath('letsencrypt-r10.pem')
context = ssl.create_default_context()
context.load_verify_locations(cafile=R10_PEM)

import urllib.request
response = urllib.request.urlopen(TEST_URL, context=context)
print('urllib.request success')

# verify takes the path to a CA bundle, we want the certifi bundle + our extra R10
import shutil
import certifi
ca_bundle_path = os.path.abspath('ca_bundle.pem')
shutil.copyfile(certifi.where(), ca_bundle_path)
# append the R10 certificate to our copy of the certifi bundle
with open(ca_bundle_path, 'at') as ca_bundle, open(R10_PEM, 'rt') as r10:
    ca_bundle.write(r10.read())
print(f'prepared {ca_bundle_path} from R10 cert and {certifi.where()}')

import requests
response = requests.get(TEST_URL, verify=ca_bundle_path)
print('requests.get success')

And the accompanying letsencrypt-r10.pem issuer certificate. I extracted this by following the certificate information in my browser and downloading the R10 PEM file.

This url on my test site gives a good overview of what is going on and reports "certificate chain is incomplete". My understanding is that browsers don't carry the R10 certificate, but they are smart enough to download it on the fly to verify the chain. Python needs a little help. requests is the more annoying module as it doesn't support adding certificates, so you need to pull the current set from certifi and append to it yourself. Phew!

Update 1/20 - I had a very instructive follow up conversation with another Mastodon user who obviously understands the intricacies of certificate verification better than I do, concluding that this is likely a bug in google's appengine setup ..

Improving Steam Client stability on Linux: setenv and multithreaded environments

The Steam client update on November 5th mentions "Fixed some miscellaneous common crashes." in the Linux notes, which I wanted to give a bit of background on. There's more than one fix that made it in under the somewhat generic header, but the one change that made the most significant impact to Steam client stability on Linux has been a revamping of how we are approaching the setenv and getenv functions.

One of my colleagues rightly dubbed setenv "the worst Linux API". It's such a simple, common API, available on all platforms, that it was a little difficult to convince ourselves just how bad it is. I highly encourage anyone who writes software that will run on Linux at some point to read through "RachelByTheBay"'s very engaging post on the subject.

Being the consummate Linux developers that we are, of course we already knew that setenv and getenv aren't safe to use in multithreaded environments. Our policy up to now had been to minimize usage of these functions, and hope for the best.

The Steam client collects basic crash information. The reports have a backtrace of where the crash happened, and what the other threads were doing. On Linux this data is very noisy: there is so much variation across distributions, driver versions, window manager choices, extensive user customization etc., that the reports do not bucket as nicely as they do on Windows.

After a concerted effort to improve our grouping, a pattern emerged. It turns out that if you call setenv in a multithreaded program, sometimes you will crash outright inside setenv itself. That does happen, but the volumes we saw were always low.

We found that other threads would blow up though, usually with a SIGABRT, shortly after calling getenv themselves. The backtraces for such crashes were all over the place and could not be easily tied to a single cause, but there were several orders of magnitude more of those than we had direct setenv crashes.

There is no silver bullet to address this. These APIs are thread safe on Windows and Mac, so developers use them. Mac opted to leak the strings rather than crash (see BUGS in the Mac OS X manual page for getenv).

In the latest release of the Steam client we changed several things:

  • We removed the majority of setenv calls. It was mostly used when spawning processes, and refactoring to use execvpe to pass down a prepared environment is an all-around improvement.
  • We reduced how much we rely on getenv, mostly by caching the calls. There is still an uncomfortable amount of it, but it's in OS libraries at this point (x11, xcb, dbus etc.) and we continue reducing its usage.
  • For the few remaining setenv use cases that could not be easily refactored, we introduced an 'environment manager' that pre-allocates large enough value buffers at startup for fixed environment variable names, before any threading has started.
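
To illustrate the first bullet, here is a minimal Python sketch of the "prepare the environment, then spawn" pattern. It is an analogy only, not the Steam client's code: the helper name spawn_with_env and the variable SKETCH_EXAMPLE_VAR are made up for this example. The point is that the parent builds a private copy of its environment and hands it to the child, so the parent's own environment is never modified.

```python
import os
import subprocess
import sys

def spawn_with_env(argv, overrides):
    """Run argv with a copy of the current environment plus overrides.

    The copy is a plain dict, so the parent's process environment is
    left untouched and no setenv-style call happens in the parent.
    """
    child_env = dict(os.environ)   # snapshot, not the live environment
    child_env.update(overrides)
    return subprocess.run(
        argv, env=child_env, capture_output=True, text=True, check=True
    )

# SKETCH_EXAMPLE_VAR is a hypothetical variable used only for this demo.
child_code = "import os; print(os.environ['SKETCH_EXAMPLE_VAR'])"
result = spawn_with_env(
    [sys.executable, "-c", child_code],
    {"SKETCH_EXAMPLE_VAR": "hello"},
)
print(result.stdout.strip())
```

In C or C++ the equivalent would be building an envp array and passing it to execvpe after fork, rather than calling setenv before spawning.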

This last change is what really made a difference for us. Large enough buffers are preallocated at startup, and the targeted environment variables exist for the whole process lifetime but start as an empty string. Wherever a setenv would previously happen, we call getenv first to make sure the buffer hasn't moved, and use a direct string copy to update the value. That's not a completely reliable fix: a third party library could still call setenv and trigger crashes, and there's still a risk of data races. But we've observed a significant reduction in SIGABRT volumes.

If this can be addressed in glibc, it may involve a tradeoff on features, maybe an opt-in mechanism with a slight departure from the "impossible" POSIX spec. That's something we may pursue in the long term if we can propose something sensible.

Using a SSL certificate from letsencrypt with google app engine

I'm reviving some antique code of mine that uses google app engine. The SSL certificate was self-signed and long expired so this time around I tried to use letsencrypt.

If you search you'll likely come across this post from 2015 which is mostly correct but has a few gotchas. I'm taking a few notes for next time I need to renew. I was doing this on Windows for a change.

  • Install certbot.exe via pip install certbot

  • Request a certificate from letsencrypt:

certbot.exe certonly --manual --preferred-challenges=dns --email <your email> --agree-tos --no-eff-email --key-type rsa -d <your hostname>

I prefer the DNS challenge; I find adding a TXT record is easier than uploading some custom request handler.

Make sure to set --key-type rsa! This is the important bit: certbot has switched to ECDSA keys by default and app engine only supports RSA.

  • Convert the private key from "OpenSSH key" to "PEM encoded RSA key":

Make a copy of the private key that certbot downloaded (privkey2-rsa.pem below) and convert it: ssh-keygen.exe -p -N "" -m pem -t rsa -f privkey2-rsa.pem. See this SO post for more details.

  • You can now upload the fullchain file and the key file to app engine. If you are getting errors such as "The private key you've selected does not appear to be valid." or "the certificate data is invalid", you either didn't request an RSA key or didn't convert the key correctly.
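
A quick sanity check before uploading can save a round trip. The sketch below only looks at the PEM header line of the key file. It assumes that a traditional PEM encoded RSA key starts with "-----BEGIN RSA PRIVATE KEY-----", while other encodings carry a different label (for example "BEGIN PRIVATE KEY" or "BEGIN OPENSSH PRIVATE KEY"). The function names are made up for this example, and it does not validate the key itself.

```python
import re

def pem_label(pem_text):
    """Return the label of the first '-----BEGIN <label>-----' line, or None."""
    match = re.search(r"^-----BEGIN ([A-Z0-9 ]+)-----\s*$", pem_text, re.MULTILINE)
    return match.group(1) if match else None

def looks_like_pem_rsa_key(pem_text):
    """True when the first PEM block is labelled as a traditional RSA key."""
    return pem_label(pem_text) == "RSA PRIVATE KEY"

# Truncated placeholder bodies, only the header lines matter here.
converted = "-----BEGIN RSA PRIVATE KEY-----\nMIIE...\n-----END RSA PRIVATE KEY-----\n"
unconverted = "-----BEGIN PRIVATE KEY-----\nMIIE...\n-----END PRIVATE KEY-----\n"

print(looks_like_pem_rsa_key(converted))
print(looks_like_pem_rsa_key(unconverted))
```

To check a real file, read it with open('privkey2-rsa.pem', 'rt') and pass the contents to looks_like_pem_rsa_key.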

gdb the hard way: using add-symbol-file

gdb's add-symbol-file command is versatile and powerful, but it has some gotchas that will catch the unaware. I haven't found good examples showing how to use it, so I thought I'd write down a few notes:

First, you can read the official documentation, to get a general idea: Commands to specify files.

There is a variety of situations where you'll have gdb attached to a process and it won't find the dynamic modules and debug symbols. In the example below we have a game running under proton that spits out an error message and crashes: *** bit out of range 0 - FD_SETSIZE on fd_set ***: terminated.

A bit of googling and you'll know this crashes in glibc. We would like to obtain a backtrace and investigate, but here is what we have as a starting point:

Thread 248 "IPC:CSteamEngin" received signal SIGABRT, Aborted.   
[Switching to LWP 1059407]   
0x00007b0c1edd683c in ?? ()   
(gdb) bt   
#0 0x00007b0c1edd683c in ?? ()   
#1 0x0000000000000000 in ?? ()

We know we should be in glibc, we can verify this by looking at the maps:

root@vanguard ~# cat /proc/1059034/maps | grep libc   
7b0c1ed48000-7b0c1ed6e000 r--p 00000000 00:19 43674755 /run/host/usr/lib/libc.so.6   
7b0c1ed6e000-7b0c1eec8000 r-xp 00026000 00:19 43674755 /run/host/usr/lib/libc.so.6   
7b0c1eec8000-7b0c1ef1c000 r--p 00180000 00:19 43674755 /run/host/usr/lib/libc.so.6

We are interested in the executable section (r-xp), also known as the .text section. 7b0c1edd683c is within 7b0c1ed6e000-7b0c1eec8000, that checks out.
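
The same check can be scripted. This is a small sketch that parses the maps lines shown above and reports which mapping contains the crash address. The function name find_mapping is made up for this example, and the maps text is pasted in as a string rather than read from /proc.

```python
MAPS_SAMPLE = """\
7b0c1ed48000-7b0c1ed6e000 r--p 00000000 00:19 43674755 /run/host/usr/lib/libc.so.6
7b0c1ed6e000-7b0c1eec8000 r-xp 00026000 00:19 43674755 /run/host/usr/lib/libc.so.6
7b0c1eec8000-7b0c1ef1c000 r--p 00180000 00:19 43674755 /run/host/usr/lib/libc.so.6
"""

def find_mapping(maps_text, address):
    """Return the /proc/<pid>/maps entry containing address, or None."""
    for line in maps_text.splitlines():
        # Fields: range, perms, file offset, device, inode, optional path
        fields = line.split(None, 5)
        if len(fields) < 5:
            continue
        start_hex, end_hex = fields[0].split("-")
        start, end = int(start_hex, 16), int(end_hex, 16)
        if start <= address < end:
            return {
                "start": start,
                "end": end,
                "perms": fields[1],
                "offset": int(fields[2], 16),
                "path": fields[5].strip() if len(fields) > 5 else "",
            }
    return None

hit = find_mapping(MAPS_SAMPLE, 0x7B0C1EDD683C)
print(hit["perms"], hex(hit["start"]), hex(hit["offset"]), hit["path"])
```

The start address and file offset of the matching r-xp mapping are the two values needed later to place the symbols correctly.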

But gdb didn't load the symbols for glibc, can we fix that?

This is where the fun begins! Based on the documentation, one might be tempted to type:

add-symbol-file /usr/lib/libc.so.6 0x7b0c1ed6e000

gdb happily eats that up, finds the glibc debug symbols on Arch's debuginfod servers, and shows you this:

(gdb) bt   
#0 0x00007b0c1edd683c in __pthread_mutex_cond_lock_full (mutex=0x102a4f) at ../nptl/pthread_mutex_lock.c:514   
#1 0x0000000000000000 in ?? ()

Still no backtrace. Mutexes when we should be in FD_SET and friends? What gives?

It turns out we gave gdb the wrong address, and since it doesn't know any better, it did what we asked for: it loaded the symbols at the wrong location and now they are off. We need to figure out the correct address for the .text section.

Use readelf to find the offset of the .text section: readelf -S /usr/lib/libc.so.6

The offset is 263c0

And the /proc/1059034/maps above reports that offset 26000 in libc.so.6 was loaded at address 7b0c1ed6e000

So that's the correction we need. The .text section is actually at 7b0c1ed6e000 + 3c0: add-symbol-file /usr/lib/libc.so.6 -s .text 0x7b0c1ed6e3c0
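
The arithmetic is easy to get wrong by hand, so here it is as a short sketch. The function name section_load_address is made up for this example. The inputs are the mapping's start address and file offset from /proc/<pid>/maps, and the section's file offset from readelf -S.

```python
def section_load_address(map_start, map_file_offset, section_file_offset):
    """Runtime address of a section, given the mapping that contains it.

    map_start           -- start address of the mapping (from maps)
    map_file_offset     -- file offset that mapping starts at (from maps)
    section_file_offset -- file offset of the section (from readelf -S)
    """
    delta = section_file_offset - map_file_offset
    if delta < 0:
        raise ValueError("section starts before this mapping")
    return map_start + delta

# Values taken from the maps and readelf output discussed above.
text_addr = section_load_address(0x7B0C1ED6E000, 0x26000, 0x263C0)
print(f"add-symbol-file /usr/lib/libc.so.6 -s .text {text_addr:#x}")
```

With the values from this session it prints the add-symbol-file command using 0x7b0c1ed6e3c0.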

Now we get the beginnings of a valid backtrace:

(gdb) bt   
#0 __pthread_kill_implementation (threadid=<optimized out>, signo=signo@entry=6, no_tid=no_tid@entry=0) at pthread_kill.c:44   
#1 0x00007b0c1edd68a3 in __pthread_kill_internal (signo=6, threadid=<optimized out>) at pthread_kill.c:78   
#2 0x00007b0c1ed86668 in __GI_raise (sig=sig@entry=6) at ../sysdeps/posix/raise.c:26   
#3 0x00007b0c1ed6e4b8 in __GI_abort () at abort.c:79   
#4 0x00007b0c1ed6f390 in __libc_message (fmt=fmt@entry=0x7b0c1eee62fc "*** %s ***: terminated\n") at ../sysdeps/posix/libc_fatal.c:150   
#5 0x00007b0c1ee66b4b in __GI___fortify_fail (msg=msg@entry=0x7b0c1eeeb418 "bit out of range 0 - FD_SETSIZE on fd_set") at fortify_fail.c:24   
#6 0x00007b0c1ee66642 in __GI___fdelt_chk (d=<optimized out>) at fdelt_chk.c:26  
#7 0x00007b0c1a067f26 in ?? ()

Frame #7 is in steamclient.so, which I can't show you here, but the same process applies to load the symbols in and get the complete trace (if you have them that is).

I've been bitten by this on several occasions. Now I have some notes I can go back to! I hope it'll be useful to someone else.

GtkRadiant is now available on Flathub

For roughly 25 years I've told folks looking to run GtkRadiant on Linux to download the source and build it themselves. Last week this finally changed, and users now have the option to install an official GtkRadiant release from Flathub.

The source code for GtkRadiant is very old and hardly evolves anymore. It depends on gtk2 and gtkglext, libraries that are no longer in mainline distributions and require patching in order to build on modern systems. Flatpak offers a stable SDK and runtime, and I'm hoping this shields us from distribution and compiler issues in the future.

Doing feature development iteration for an app via Flatpak introduces some annoying friction. It's possible to work with a local checkout of the source at least, but I have not found a way to do incremental rebuilds .. please comment on this post if you have a solution!

Looking at development workflows based on distrobox for instance could be a great direction to follow in order to improve this. But for now, the insulation we get from distro variations is well worth it.