Defining a Ports Tree

August 29, 2019

Having decided on a set of software that I’d like to compose into a Linux distribution, I need a way to repeatedly build and package that software. This is where the ports tree comes in: it defines recipes for how software is built. Typically these are written as Makefiles (BSDs) or in scripting languages: bash (Arch, Gentoo, and Void Linux), Tcl (MacPorts), Ruby (Homebrew), etc.

The issue with general-purpose scripting languages is that they can do anything they want, including accessing and changing your system and the network. This is fine if the ports come from trusted sources, but if they don’t, as with the Arch User Repository (AUR), then extracting metadata requires executing arbitrary user-submitted code, which is not good. The AUR solution is to have the package submitter evaluate the script and supply an inert .SRCINFO file alongside the PKGBUILD. Alternatively, you could run the script in a container or sandbox.

Another issue with general-purpose scripting languages is that they can be slow. For example, both Gentoo’s Portage (Python) and MacPorts feature multi-second “calculating...” delays for some operations. I'd like something that is fast, even on modest hardware.

What the scripting languages do have going for them is features like variable interpolation and nice, readable package definitions. With these things in mind, I set about trying to find a configuration language with these attributes:

The readability of bash-based solutions
Fast
Variable interpolation
Configuration language, not general-purpose language
Sandboxed / not Turing-complete

I explored lots of possibilities. Most were eliminated because they didn't support interpolation. I initially landed on Dhall and went on to define a couple of packages with it. The main issue was that idiomatically formatted Dhall was really unpleasant to read. For example:

let default = ../../lib/defaults/Package.dhall

let register = ../../lib/register.dhall

let BuildStyle = ../../lib/types/BuildStyle.dhall

let pkgname = "ripgrep"

let version = "11.0.2"

in  [ register
        "ripgrep"
        (   default
          ⫽ { pkgname =
                pkgname
            , version =
                version
            , revision =
                1
            , build-style =
                BuildStyle.Cargo
            , hostmakedepends =
                [ "asciidoc" ]
            , short-desc =
                "Fast search tool inspired by ag and grep"
            , maintainer =
                "Wesley Moore <wes@wezm.net>"
            , license =
                "Public Domain OR MIT"
            , homepage =
                "https://github.com/BurntSushi/ripgrep/"
            , distfiles =
                [ { url =
                      "https://github.com/BurntSushi/${pkgname}/archive/${version}.tar.gz"
                  , checksum =
                      "0983861279936ada8bc7a6d5d663d590ad34eb44a44c75c2d6ccd0ab33490055"
                  }
                ]
            }
        )
    ]

Further research led me to Tcl, which MacPorts also uses. The language is super simple, and its command-oriented nature leads to very readable code. It’s still a general-purpose scripting language though, with all the system access that entails. So at this point I decided to implement my own Tcl-inspired configuration language. I’m still unsure what to call it. The crate is currently called tcl, but I’ve been using rcl as the file extension. In this context tcl stands for tiny configuration language, and rcl for Rust command language. I’ll stick with rcl for the rest of the post.

A port defined in rcl looks like this:

set name ruby
set version 2.6.3
set ruby_abiver 2.6.0
set subdir 2.6

pkgname $name
version $version
revision 2
build-style gnu-configure {
    configure-args --enable-shared --disable-rpath DOXYGEN=/usr/bin/doxygen DOT=/usr/bin/dot PKG_CONFIG=/usr/bin/pkg-config
    make-build-args all capi
}
hostmakedepends pkg-config bison groff
makedepends {
  zlib-devel
  readline-devel
  libffi-devel
  libressl-devel
  gdbm-devel
  libyaml-devel
  pango-devel
}
testdepends tzdata
short-desc "Ruby programming language"
homepage http://www.ruby-lang.org/en/
maintainer "Wesley Moore <wes@wezm.net>"
license "Ruby BSD-2-Clause"
distfile https://cache.ruby-lang.org/pub/ruby/$subdir/$name-$version.tar.bz2 {
  checksum dd638bf42059182c1d04af0d5577131d4ce70b79105231c4cc0a60de77b14f2e
}

Things to note:

Clean syntax
Variables and interpolation with $
Multi-line commands with {}
Quotes needed only when a string contains whitespace
Whitespace-separated command arguments, so there are no commas to deal with (trailing or otherwise)
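To make the word-splitting rules above concrete, here is a minimal zero-copy sketch of splitting a single command line into words. It is not the actual rcl parser (which uses nom); `split_words` is a hypothetical name, and it assumes double-quoted strings have no escape sequences. It deliberately ignores `{}` blocks, `$` interpolation, and `[]` substitution.

```rust
/// Hypothetical sketch: split one rcl-style command line into words.
/// Words are separated by whitespace; a double-quoted string (no escapes
/// assumed) forms a single word with its quotes stripped. Every returned
/// word borrows from `line`, so nothing is copied.
fn split_words(line: &str) -> Result<Vec<&str>, String> {
    let mut words = Vec::new();
    let mut rest = line.trim_start();
    while !rest.is_empty() {
        if let Some(after_quote) = rest.strip_prefix('"') {
            // Quoted word: everything up to the closing quote.
            let end = after_quote
                .find('"')
                .ok_or_else(|| "unterminated string".to_string())?;
            words.push(&after_quote[..end]);
            rest = &after_quote[end + 1..];
        } else {
            // Bare word: everything up to the next whitespace character.
            let end = rest.find(char::is_whitespace).unwrap_or(rest.len());
            words.push(&rest[..end]);
            rest = &rest[end..];
        }
        rest = rest.trim_start();
    }
    Ok(words)
}
```

For example, `split_words("hostmakedepends pkg-config bison groff")` yields the command name followed by three arguments, while the quoted maintainer string from the ruby port comes back as one word.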

Not shown is command substitution with []. This might be useful in the future for deriving version numbers like 2.6 from 2.6.3, amongst other things. I also still need to add comments. And that’s the whole language.
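Since command substitution doesn't exist yet, the following is only a sketch of how deriving 2.6 from 2.6.3 might work. The builtin name `major-minor`, the `Builtin` type, and the `substitute` function are all hypothetical. It handles only non-nested `[...]` and splits the command's words on plain whitespace, with no quoting or `$` interpolation.

```rust
use std::collections::HashMap;

/// Hypothetical builtin signature: takes the command's arguments and
/// returns the value that replaces the whole `[...]` expression.
type Builtin = fn(&[&str]) -> Result<String, String>;

/// Hypothetical builtin: keep the first two dot-separated components,
/// e.g. "2.6.3" -> "2.6".
fn major_minor(args: &[&str]) -> Result<String, String> {
    let version = args.first().ok_or("major-minor: missing argument")?;
    let parts: Vec<&str> = version.splitn(3, '.').collect();
    if parts.len() < 2 {
        return Err(format!("major-minor: no minor version in {version}"));
    }
    Ok(format!("{}.{}", parts[0], parts[1]))
}

/// Replace each non-nested `[command args...]` in `input` with the output
/// of the named builtin. Errors on unknown commands or a missing `]`.
fn substitute(input: &str, builtins: &HashMap<&str, Builtin>) -> Result<String, String> {
    let mut out = String::new();
    let mut rest = input;
    while let Some(open) = rest.find('[') {
        out.push_str(&rest[..open]);
        let close = rest[open..]
            .find(']')
            .map(|i| open + i)
            .ok_or("unterminated command substitution")?;
        let mut words = rest[open + 1..close].split_whitespace();
        let name = words.next().ok_or("empty command substitution")?;
        let args: Vec<&str> = words.collect();
        let builtin = builtins
            .get(name)
            .ok_or_else(|| format!("unknown command: {name}"))?;
        out.push_str(&builtin(args.as_slice())?);
        rest = &rest[close + 1..];
    }
    out.push_str(rest);
    Ok(out)
}
```

With something along these lines, the ruby port could write `set subdir [major-minor $version]` instead of hard-coding 2.6, assuming `$version` is expanded before the command runs.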

The parser is written in a zero-copy fashion with nom and can parse a file like the one above 100,000 times per second on my i7-6700K desktop. Things slow down a bit when actually interpreting the file, performing variable substitution, and building Rust struct instances like this from the definition:

pub struct Package {
    pkgname: String,
    version: String,
    revision: u16,
    short_desc: String,
    homepage: String,
    license: String,
    maintainer: String,
    build_style: Box<dyn Strategy>,
    distfiles: Vec<DistFile>,
    hostmakedepends: Vec<String>,
    makedepends: Vec<String>,
    testdepends: Vec<String>,
    depends: Vec<String>,
}
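The variable substitution step can be illustrated with a small sketch of expanding `$name` references. `interpolate` is a hypothetical function, and the rule that a variable name is a run of ASCII letters, digits, and underscores is an assumption. Under that rule, `$name-$version` ends the first name at the hyphen, as the ruby distfile URL requires.

```rust
use std::collections::HashMap;

/// Hypothetical sketch: expand `$name` references in `input` using `vars`.
/// Assumed rule: a name is a run of ASCII letters, digits and underscores
/// following `$`. Undefined variables and a bare `$` are errors.
fn interpolate(input: &str, vars: &HashMap<String, String>) -> Result<String, String> {
    let mut out = String::with_capacity(input.len());
    let mut rest = input;
    while let Some(dollar) = rest.find('$') {
        out.push_str(&rest[..dollar]);
        let after = &rest[dollar + 1..];
        // The name ends at the first character that can't be part of it.
        let len = after
            .find(|c: char| !(c.is_ascii_alphanumeric() || c == '_'))
            .unwrap_or(after.len());
        let name = &after[..len];
        if name.is_empty() {
            return Err("`$` not followed by a variable name".to_string());
        }
        let value = vars
            .get(name)
            .ok_or_else(|| format!("undefined variable: {name}"))?;
        out.push_str(value);
        rest = &after[len..];
    }
    out.push_str(rest);
    Ok(out)
}
```

Applied to the ruby port's distfile with `name`, `version`, and `subdir` set as in the example, this produces `https://cache.ruby-lang.org/pub/ruby/2.6/ruby-2.6.3.tar.bz2`.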

I can currently parse, interpret, and validate a package file like that into the above struct about 10,000 times per second, including reading it from disk each time (although it would be in the file system cache after the first read). I'd actually like this to be a bit faster. Repology shows that a tree of 20k+ packages is not unreasonable, and at 10,000 per second that would take around two seconds; I'd like to be able to handle numbers like that in less than a second. There's bound to be some optimisation possible, as well as traversing and interpreting the tree in parallel. That's a future problem though.
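Interpreting the tree in parallel could look something like the following minimal sketch, which uses only the standard library's scoped threads (Rust 1.63+). It assumes each package file's contents have already been read into a `String`; `parse_all` and its parameters are hypothetical names, and a thread-pool library such as rayon would be a natural alternative.

```rust
use std::thread;

/// Hypothetical sketch: apply `parse` to every input using up to
/// `threads` worker threads, returning results in the original order.
fn parse_all<T, E, F>(inputs: &[String], threads: usize, parse: F) -> Vec<Result<T, E>>
where
    T: Send,
    E: Send,
    F: Fn(&str) -> Result<T, E> + Sync,
{
    if inputs.is_empty() {
        return Vec::new();
    }
    let threads = threads.max(1);
    // Ceiling division so every input lands in exactly one chunk.
    let chunk_size = (inputs.len() + threads - 1) / threads;
    let parse = &parse;
    thread::scope(|s| {
        // One scoped thread per chunk; each parses its chunk sequentially.
        let handles: Vec<_> = inputs
            .chunks(chunk_size)
            .map(|chunk| {
                s.spawn(move || chunk.iter().map(|src| parse(src.as_str())).collect::<Vec<_>>())
            })
            .collect();
        // Joining in spawn order keeps results aligned with `inputs`.
        handles
            .into_iter()
            .flat_map(|h| h.join().expect("worker thread panicked"))
            .collect()
    })
}
```

Splitting into contiguous chunks keeps the sketch simple and preserves order; with many small files of uneven size, a work-stealing pool would likely balance the load better.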

So now that I can define packages, the next step is building them. I had some goals in mind for this too, which I'll cover in the next post.