Wesley Moore

Somewhere to write things down

Progress on ULD has been slow of late. I haven't been feeling especially motivated to code outside of work, but I have continued to think about this project. I feel like I'm a bit off in the weeds building the ports tree build tool before I've even validated whether the particular set of tools I've selected for the base of the system will work together.

So with that in mind I think I'll take a step back to try to build a minimal bootable system image more or less manually to see if the selected components are capable of producing a working system.

If that works then I think the next step would be to try to get the system to compile itself. From there it should be self-hosting and further incremental improvements can be made.

In the previous post I described the small configuration language that will be used to define the ULD ports tree. In this post I outline the tool that I'm building that will take those definitions and hopefully turn them into built software.

The build tool is responsible for taking a description of how a particular piece of software is built and what it depends on, and then building it. I have a few goals in mind for the build tool:

  • Isolated builds
  • Don't build as root
  • Fast and reliable

Isolated Builds

A common issue with creating new packages is correctly determining the dependencies of the software being packaged. It's not uncommon to see comments on AUR packages pointing out dependencies that were missed because they were picked up from the host system. xbps-src goes some way towards addressing this by performing builds in a chroot isolated from the host. It calls this the masterdir. However, it appears that it still installs all built packages into the masterdir, so it's still possible to have undeclared dependencies.

My idea is to construct the chroot from the base system plus the declared dependencies. Each package being built is staged into an isolated file tree that mirrors the structure of the system it will be installed in. For example, the built version of the pastel tool looks like this:

pastel/pkg
└── pastel
   └── usr
      ├── bin
      │  └── pastel
      └── share
         ├── bash-completion
         │  └── completions
         │     └── pastel
         ├── fish
         │  └── vendor_completions.d
         │     └── pastel.fish
         ├── licenses
         │  └── pastel
         │     ├── LICENSE-APACHE
         │     └── LICENSE-MIT
         └── zsh
            └── site-functions
               └── _pastel

Given that all built packages will have a (versioned) directory like that, those file trees can be overlaid with a base tree to construct the chroot. Additionally, it can hopefully all be read-only. The overlay file system in the Linux kernel will be used to create this union view.
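As a rough sketch of how that union view might be described to the kernel, the function below assembles the option string for an overlay mount from a base tree and a list of package trees. The directory paths and the function name are made up for illustration and are not part of the actual tool. overlayfs treats the leftmost lowerdir entry as the top layer, and a mount with only lowerdir entries (no upperdir or workdir) is read-only.

```rust
use std::path::{Path, PathBuf};

/// Build the option string for a read-only overlay mount.
/// Package trees are listed first (upper layers) and the base tree last
/// (bottom layer). Paths containing ':' are not handled in this sketch.
fn overlay_options(base: &Path, package_trees: &[PathBuf]) -> String {
    let mut layers: Vec<String> = package_trees
        .iter()
        .map(|p| p.display().to_string())
        .collect();
    layers.push(base.display().to_string());
    format!("lowerdir={}", layers.join(":"))
}

fn main() {
    // Hypothetical locations for the base tree and two staged dependencies.
    let base = Path::new("/var/uld/base");
    let deps = vec![
        PathBuf::from("/var/uld/pkg/zlib-devel"),
        PathBuf::from("/var/uld/pkg/readline-devel"),
    ];
    let opts = overlay_options(base, &deps);
    // The string would be handed to something equivalent to:
    //   mount -t overlay overlay -o <opts> /path/to/chroot
    println!("{}", opts);
}
```

Only the dependencies a port declares would end up in the list, so anything undeclared is simply absent from the chroot.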

When an individual package is built and staged into its tree xbps-create will take care of creating the package from it. xbps is the package manager I've chosen for ULD, so it will take care of managing installation, updating, and deletion of packages on a ULD system.

Don't Build as Root

It's preferable to build software as an unprivileged user. Sometimes there is a need to be root at some stage though, such as to set file owners and permissions. One option is something like fakeroot, which uses some LD_PRELOAD hackery to patch libc functions:

fakeroot works by replacing the file manipulation library functions (chmod(2), stat(2) etc.) by ones that simulate the effect the real library functions would have had, had the user really been root. These wrapper functions are in a shared library /usr/lib//libfakeroot-.so or similar location on your platform. The shared object is loaded through the LD_PRELOAD mechanism of the dynamic loader. (See ld.so(8))

I'd prefer to avoid such hacks though, especially since it can get in the way of other sensible things. The option I'm exploring is inspired by the OpenBSD project where often their tools will be started as root then drop privileges (and often additionally limit functionality with unveil and pledge). The build tool will be started as root but it will spawn builders that run with regular/reduced privileges. The master process can still potentially do things that require root though, if needed.
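A minimal sketch of the "start as root, spawn unprivileged builders" idea using only the Rust standard library is shown below. The function name, the shell-script builder, and the uid/gid values are assumptions for illustration; the real tool may structure this quite differently.

```rust
use std::os::unix::process::CommandExt;
use std::path::Path;
use std::process::Command;

/// Prepare a build command that will run as an unprivileged user.
/// `uid` and `gid` are placeholders for a dedicated build account.
fn builder_command(script: &Path, uid: u32, gid: u32) -> Command {
    let mut cmd = Command::new("/bin/sh");
    cmd.arg(script)
        .env_clear() // don't leak the parent's environment into the build
        .env("PATH", "/usr/bin:/bin")
        .uid(uid) // the child switches to this user before exec
        .gid(gid);
    cmd
}

fn main() {
    let cmd = builder_command(Path::new("build.sh"), 1000, 1000);
    // Spawning is left out here: switching to another user only succeeds
    // when the parent process has the privilege to do so (e.g. is root).
    println!("{:?}", cmd);
}
```

The parent keeps its privileges, so it can still do things like fix up file ownership after the builder exits.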

Fast and Reliable

A system tool requires a systems language, so along with the rest of the ULD tools that are written from scratch the build tool is written in Rust. This means it is able to readily interact at a system level (constructing and mounting the union filesystem, chroot, multi-process builder, etc.) and be fast and correct about it.

The builder will aim to make the most of the resources available to it. That means using all available cores where possible, downloading distfiles in parallel, and removing the need to install any extra runtime (such as a scripting language) to use it.

What's Been Done

So far the build tool can read the full ports tree and then come up with a build plan for how to build either the full set of packages, or a named subset. The ports tree is represented as a graph using the petgraph crate. This crate and data structure make traversing the hierarchy to construct a build plan simple and well defined. Using this library means all dependency queries can be done rapidly, in memory, without the need to repeatedly go to the file system or write out state into the ports tree.
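To illustrate what a build plan is, here is a small standard-library-only sketch that orders packages so each one comes after its dependencies (Kahn's algorithm). It is a stand-in rather than the actual implementation: the real tool uses petgraph, which provides petgraph::algo::toposort for this kind of ordering. The package names and dependency edges below are invented for the example.

```rust
use std::collections::{BTreeMap, VecDeque};

/// Return package names ordered so that every package appears after all of
/// its dependencies, or None if the dependencies contain a cycle.
fn build_order<'a>(deps: &BTreeMap<&'a str, Vec<&'a str>>) -> Option<Vec<String>> {
    // Number of not-yet-built dependencies per package, plus reverse edges.
    let mut unmet: BTreeMap<&str, usize> = BTreeMap::new();
    let mut dependents: BTreeMap<&str, Vec<&str>> = BTreeMap::new();
    for (&pkg, pkg_deps) in deps {
        unmet.entry(pkg).or_insert(0);
        for &dep in pkg_deps {
            unmet.entry(dep).or_insert(0);
            *unmet.get_mut(pkg).unwrap() += 1;
            dependents.entry(dep).or_default().push(pkg);
        }
    }
    // Packages with no dependencies can be built straight away.
    let mut ready: VecDeque<&str> = unmet
        .iter()
        .filter(|(_, n)| **n == 0)
        .map(|(p, _)| *p)
        .collect();
    let mut order = Vec::new();
    while let Some(pkg) = ready.pop_front() {
        order.push(pkg.to_string());
        if let Some(ds) = dependents.get(pkg) {
            for &d in ds {
                let n = unmet.get_mut(d).unwrap();
                *n -= 1;
                if *n == 0 {
                    ready.push_back(d);
                }
            }
        }
    }
    // Anything left unbuilt is part of a dependency cycle.
    if order.len() == unmet.len() { Some(order) } else { None }
}

fn main() {
    let mut deps = BTreeMap::new();
    deps.insert("pastel", vec!["rust"]);
    deps.insert("ripgrep", vec!["rust", "asciidoc"]);
    deps.insert("rust", vec![]);
    println!("{:?}", build_order(&deps));
}
```

Building a named subset would amount to running the same ordering over only the packages reachable from the requested names.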

I'm currently working on the worker pool that will be sent build jobs to complete. Once that's in place I'll need to start implementing the various build-styles (cargo, GNU configure, etc.).
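One possible shape for such a worker pool, sketched with standard library threads and channels, is below. The job type is just a package name and the "build" is a placeholder string; the actual pool spawns separate builder processes and will look different.

```rust
use std::sync::{mpsc, Arc, Mutex};
use std::thread;

/// Send each job to a pool of worker threads and collect their results.
fn run_jobs(jobs: Vec<String>, workers: usize) -> Vec<String> {
    let (job_tx, job_rx) = mpsc::channel::<String>();
    let job_rx = Arc::new(Mutex::new(job_rx));
    let (done_tx, done_rx) = mpsc::channel::<String>();

    let handles: Vec<_> = (0..workers.max(1))
        .map(|_| {
            let job_rx = Arc::clone(&job_rx);
            let done_tx = done_tx.clone();
            thread::spawn(move || loop {
                // The lock is held only while waiting for the next job.
                let job = match job_rx.lock().unwrap().recv() {
                    Ok(job) => job,
                    Err(_) => break, // queue closed and drained
                };
                // Placeholder: a real worker would run the build here.
                done_tx.send(format!("built {}", job)).unwrap();
            })
        })
        .collect();

    for job in jobs {
        job_tx.send(job).unwrap();
    }
    drop(job_tx); // closing the queue lets idle workers exit
    drop(done_tx); // only the workers' clones keep the result channel open
    for handle in handles {
        handle.join().unwrap();
    }
    let mut results: Vec<String> = done_rx.iter().collect();
    results.sort(); // completion order is not deterministic
    results
}

fn main() {
    // Use one worker per available core, falling back to a single worker.
    let workers = thread::available_parallelism().map(|n| n.get()).unwrap_or(1);
    let jobs = vec!["pastel".to_string(), "ripgrep".to_string()];
    println!("{:?}", run_jobs(jobs, workers));
}
```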

Having decided on a set of software that I’d like to compose into a Linux distribution I need a way to repeatedly build and package that software. This is where the ports tree comes in. The ports tree defines recipes for how software is built. Typically these are defined using Makefiles (BSDs) or scripting languages: bash (Arch and Void Linux), Tcl (MacPorts), Python (Gentoo), Ruby (Homebrew), etc.

The issue with general purpose scripting languages is that they can do anything they want, including accessing and changing your system and the network. This is fine if the ports are from trusted sources, but if they aren’t, as in the Arch User Repository (AUR), then extracting metadata requires executing arbitrary user submitted code, which is not good. The AUR solution is to have the user evaluate the script and supply an inert .SRCINFO file alongside the PKGBUILD. Alternatively you could run the script in a container or sandbox.

Another issue with general purpose scripting languages is they can be slow. For example, both Portage from Gentoo (Python) and MacPorts feature multiple second “calculating...” delays for some operations. I'd like something that is fast, even on modest hardware.

Something the scripting languages do have going for them is things like variable interpolation and a nice readable package definition. With these things in mind I set about trying to find a configuration language with these attributes:

  • The readability of bash based solutions
  • Fast
  • Variable interpolation
  • Configuration language, not general purpose language
  • Sandboxed/non-Turing complete

I explored lots of possibilities. Most were eliminated because they didn't support interpolation. I initially landed on Dhall and proceeded with defining a couple of packages with it. The main issue was that idiomatic formatting was really unpleasant to read. For example:

let default = ../../lib/defaults/Package.dhall

let register = ../../lib/register.dhall

let BuildStyle = ../../lib/types/BuildStyle.dhall

let pkgname = "ripgrep"

let version = "11.0.2"

in  [ register
        "ripgrep"
        (   default
          ⫽ { pkgname =
                pkgname
            , version =
                version
            , revision =
                1
            , build-style =
                BuildStyle.Cargo
            , hostmakedepends =
                [ "asciidoc" ]
            , short-desc =
                "Fast search tool inspired by ag and grep"
            , maintainer =
                "Wesley Moore <wes@wezm.net>"
            , license =
                "Public Domain OR MIT"
            , homepage =
                "https://github.com/BurntSushi/ripgrep/"
            , distfiles =
                [ { url =
                      "https://github.com/BurntSushi/${pkgname}/archive/${version}.tar.gz"
                  , checksum =
                      "0983861279936ada8bc7a6d5d663d590ad34eb44a44c75c2d6ccd0ab33490055"
                  }
                ]
            }
        )
    ]

Further research led me to Tcl, which MacPorts also uses. The language is super simple and its command oriented nature leads to very readable code. It’s still a general purpose scripting language though, with all the system access that allows. So at this point I decided to implement my own Tcl inspired configuration language. I’m still unsure what to call it. The crate is currently called tcl but I’ve been using rcl as the file extension. In this context tcl stands for tiny configuration language, and rcl for Rust command language. I’ll stick with rcl for the rest of the post.

A port defined in rcl looks like this:

set name ruby
set version 2.6.3
set ruby_abiver 2.6.0
set subdir 2.6

pkgname $name
version $version
revision 2
build-style gnu-configure {
    configure-args --enable-shared --disable-rpath DOXYGEN=/usr/bin/doxygen DOT=/usr/bin/dot PKG_CONFIG=/usr/bin/pkg-config
    make-build-args all capi
}
hostmakedepends pkg-config bison groff
makedepends {
  zlib-devel
  readline-devel
  libffi-devel
  libressl-devel
  gdbm-devel
  libyaml-devel
  pango-devel
}
testdepends tzdata
short-desc "Ruby programming language"
homepage http://www.ruby-lang.org/en/
maintainer "Wesley Moore <wes@wezm.net>"
license "Ruby BSD-2-Clause"
distfile https://cache.ruby-lang.org/pub/ruby/$subdir/$name-$version.tar.bz2 {
  checksum dd638bf42059182c1d04af0d5577131d4ce70b79105231c4cc0a60de77b14f2e
}

Things to note:

  • Clean syntax
  • Variables and interpolation with $
  • Multi-line commands with {}
  • Quoted strings only when the string contains white space
  • White space separated command arguments, no need to deal with commas (trailing or otherwise)

Not shown is command substitution with []. This might be useful in the future for deriving version numbers like 2.6 from 2.6.3, amongst other things. I also still need to add comments. And that’s the whole language.
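To make the interpolation step concrete, here is a small sketch of expanding $variables in a string. It is not the actual rcl interpreter: the function name, the rule that variable names consist of ASCII letters, digits and underscores, and the choice to treat an undefined variable as an error are all assumptions made for the example.

```rust
use std::collections::HashMap;

/// Expand `$name` references in `input` using the values in `vars`.
/// Assumed rules: names are ASCII letters, digits and underscores;
/// an undefined variable is an error; a lone `$` is kept literally.
fn interpolate(input: &str, vars: &HashMap<&str, &str>) -> Result<String, String> {
    let mut out = String::with_capacity(input.len());
    let mut rest = input;
    while let Some(pos) = rest.find('$') {
        out.push_str(&rest[..pos]);
        let after = &rest[pos + 1..];
        // Length of the variable name that follows the `$`.
        let len = after
            .find(|c: char| !(c.is_ascii_alphanumeric() || c == '_'))
            .unwrap_or(after.len());
        if len == 0 {
            out.push('$');
        } else {
            let name = &after[..len];
            let value = vars
                .get(name)
                .ok_or_else(|| format!("undefined variable: {}", name))?;
            out.push_str(value);
        }
        rest = &after[len..];
    }
    out.push_str(rest);
    Ok(out)
}

fn main() {
    // The variables set at the top of the ruby port above.
    let mut vars = HashMap::new();
    vars.insert("name", "ruby");
    vars.insert("version", "2.6.3");
    vars.insert("subdir", "2.6");
    let url = "https://cache.ruby-lang.org/pub/ruby/$subdir/$name-$version.tar.bz2";
    println!("{:?}", interpolate(url, &vars));
}
```

With those three variables, the distfile line from the ruby port expands to https://cache.ruby-lang.org/pub/ruby/2.6/ruby-2.6.3.tar.bz2.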

The parser is written in a zero-copy fashion with nom and can parse a file like the one above 100,000 times in a second on my i7-6700K desktop. Things slow down a bit when actually interpreting the file, performing variable substitution and building Rust struct instances like this from the definition:

pub struct Package {
    pkgname: String,
    version: String,
    revision: u16,
    short_desc: String,
    homepage: String,
    license: String,
    maintainer: String,
    build_style: Box<dyn Strategy>,
    distfiles: Vec<DistFile>,
    hostmakedepends: Vec<String>,
    makedepends: Vec<String>,
    testdepends: Vec<String>,
    depends: Vec<String>,
}

I can currently parse and interpret a package file like that into the above struct, with validation, about 10,000 times in a second, including reading it from disk each time (although it would be in the file system cache after the first time). I'd actually like this to be a bit faster. Repology shows that 20k+ packages is not unreasonable and I'd like to be able to handle numbers like that in less than a second. There's bound to be some optimisation possible, as well as traversing and interpreting the tree in parallel. That's a future problem though.

So now that I can define packages the next step is building them. I had some goals in mind for this too, which I'll cover in the next post.

It was bound to happen eventually. After years of jotting down thoughts on building my own OS I’ve started tinkering with the idea. Not a from scratch OS, a Linux distribution, cobbled together from existing tools along with a bit of custom tooling. The current working title (inspired by Untitled Goose Game) is: Untitled Linux Distribution.

Ultimately I’d like to use Redox OS as my primary OS. It’s mostly written in Rust and permissively licensed (MIT) — two things that I value. Building an entire OS from scratch is a bunch of work though. Until such time as Redox is ready for use as a daily OS I’m exploring the idea of constructing a Linux distribution that can share some of the same ideals:

  • Use memory safe tools with a preference for Rust where possible.
  • BSD/similarly permissive licensed software in the base where possible.
  • Stable base + rolling packages like FreeBSD.

The initial target is my own desktop computing needs (where I previously used FreeBSD and currently use Arch Linux).

This is a pragmatic system. At the very least the Linux kernel is GPL and millions of lines of C. The goal is to get something working and then iterate. The reason for using Linux is that it has the best performance and hardware support for desktop use as far as open source operating systems go.

My currently untested grab bag of components for the base system is:

This isn't everything that the distro will need, just the things I've researched so far.

I’m currently deep in yak shaving territory writing a ports tree and build tool in Rust that will feed into xbps. More on that in the next post. Development will happen on GitHub, as I want contributions to be easy and have a low barrier to entry. Nothing is public yet. I'll open things up when I'm a bit further along.