14 June 2017

Packaging Python programs - DEB (and RPM) packages

The native way to distribute Python code is through PyPI, as I explained in a previous article. But to be honest, PyPI has some drawbacks that limit its use to development environments.

The problem with PyPI is that it does not implement proper package management: pip uninstall does not remove the dependencies that were pulled in at install time, and there is no way to roll back to a previous state. Besides, pip often builds packages from source during installation, which can be painfully slow for a complete virtualenv.

Deploying a Python application to production environments should be fast, and it should be done in a way that always leaves a clean uninstall available.

Debian has a really solid and stable package management system. It can check dependencies both at installation and at uninstallation, and it can run pre- and post-installation scripts. That's why it is one of the most widely used package management systems in the Linux ecosystem.

This article covers some ways to package our Python applications as Debian packages that are easy to install on Debian/Ubuntu Linux distros.

Debian package format

Although other Linux packaging systems rely on binary formats, a Debian package is a simple ar archive that groups a pair of tar archives, compressed with gzip, bzip2 or xz, into a single file, next to a small debian-binary member that holds the format version. One of those two archives (data.tar) contains a folder tree with all the application files, and the other one (control.tar) contains the package configuration files.
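To make that layout concrete, here is a minimal sketch that walks the ar container of a .deb and lists its members using only the Python standard library. The function name list_ar_members is my own invention; it only reads the fixed 60-byte member headers of the common ar format and does not unpack the inner tar archives.

```python
AR_MAGIC = b"!<arch>\n"   # global header of every ar archive
HEADER_SIZE = 60          # fixed size of each member header


def list_ar_members(data):
    """Return a list of (name, size) tuples for an ar archive held in memory."""
    if not data.startswith(AR_MAGIC):
        raise ValueError("not an ar archive")
    members = []
    offset = len(AR_MAGIC)
    while offset + HEADER_SIZE <= len(data):
        header = data[offset:offset + HEADER_SIZE]
        # Every member header ends with the two bytes "`\n".
        if header[58:60] != b"`\n":
            raise ValueError("corrupt member header")
        # Name is bytes 0-15 (space padded, GNU ar adds a trailing "/").
        name = header[0:16].decode("ascii").rstrip(" /")
        # Size is a decimal number stored in bytes 48-57.
        size = int(header[48:58].decode("ascii").strip())
        members.append((name, size))
        # Member data is padded to an even number of bytes.
        offset += HEADER_SIZE + size + (size % 2)
    return members
```

Reading a real package with open(path, "rb").read() and passing the bytes to this function would typically list debian-binary, a control.tar.* member and a data.tar.* member.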

Package configuration files are plain text files, so the classic way of creating Debian packages involves only a bit of folder creation, some editing of configuration files and, in its simplest form, a single command:

dante@Camelot:~/project-directory$ debuild -us -uc


You can find a good tutorial about the topic here.

None of this is complex, but it can be tricky, and you have to create and edit many text files. It's not hard to see why developers have created so many tools to automate the task. We are going to look at some of them that specialise in packaging Python applications.

STDEB

If you are already used to Python packaging for PyPI, it shouldn't be hard for you to grasp stdeb concepts. If you don't know what I'm talking about, you should read the article I linked at the beginning of this one.

As stdeb calls some Debian/Ubuntu tools under the hood, you have to use one of those distros or any of their derivatives. If you are on Debian/Ubuntu, you can install stdeb from the standard repositories:

dante@Camelot:~$ sudo aptitude search stdeb
p   python-stdeb    - Python to Debian source package conversion utility
p   python3-stdeb   - Python to Debian source package conversion plugins for distutils
dante@Camelot:~$ sudo aptitude install python-stdeb python3-stdeb

But if you want to get a newer version you can also install stdeb from PyPI as a Python package.

The best workflow with stdeb starts by creating the files needed for a PyPI package. That way stdeb can use your setup.py file to get the information it needs to create the Debian package configuration files.

So, let's suppose you have your application ready to be packaged for PyPI with your setup.py already done. For this example, I've cloned this git repository.

To make stdeb generate a source debian package you just have to do:

dante@Camelot:~/geolocate$ python3 setup.py --command-packages=stdeb.command sdist_dsc

The --command-packages option can be cumbersome, but the command doesn't work without it, so trust me and include it.

After some verbose output you'll notice that some folders have been added to your working directory. Focus on the one called deb_dist. That folder contains three main files: a .dsc file, a .orig.tar.gz one and a .diff.gz one. Together, those three files are what we call a Debian source package.

Inside the deb_dist folder you'll find another one named after your project with the current version appended. That folder contains all the generated data needed to build a binary Debian package, so move there and run the next command:

dante@Camelot:~/geolocate/deb_dist/glocate-1.3-0$ dpkg-buildpackage -rfakeroot -uc -us

The -rfakeroot flag makes the build run under fakeroot, so you can build a Debian package without being logged in as root. That is the command recommended in the stdeb documentation, but I've found that you can also use the debuild command seen before:

dante@Camelot:~/geolocate/deb_dist/glocate-1.3-0$ debuild -uc -us

In the end, debuild is a wrapper around dpkg-buildpackage that automates some steps you would otherwise have to do manually.

After either of those commands you should find a binary Debian package in your deb_dist folder:

dante@Camelot:~/geolocate/deb_dist$ ls *.deb
python3-glocate_1.3.0-1_all.deb

That is the "two step" method. You can reduce it to a "just one step" method by running the next command:

dante@Camelot:~/geolocate$ python3 setup.py --command-packages=stdeb.command bdist_deb

Although both methods end with a .deb package in the deb_dist folder, you should be aware that the generated Debian package can be architecture dependent (although its file name includes an "_all" tag). That means that if you generate the package on an amd64 Ubuntu box, chances are that you will face problems if you try to install the resulting package on another Ubuntu box with a different architecture. To avoid this problem you have two options:
  • Use virtual machines to build a Debian package on every target Debian architecture.
  • Use a PPA repository: Ubuntu offers personal hosting space for packaging projects. You just upload your source package there (the content of the deb_dist folder after running "python3 setup.py --command-packages=stdeb.command sdist_dsc"), and the Ubuntu servers build it for each Ubuntu target architecture (mainly x86 and amd64). After the build, the created packages are available in your personal PPA until you remove them or replace them with newer versions.
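
The architecture tag mentioned above is the last underscore-separated field of the package file name (name_version_architecture.deb). As a small sketch, with a helper name of my own choosing, a build script could split the file name to check which architecture tag it actually produced:

```python
from collections import namedtuple

DebFileName = namedtuple("DebFileName", "package version architecture")


def parse_deb_filename(filename):
    """Split 'name_version_arch.deb' into its three fields."""
    if not filename.endswith(".deb"):
        raise ValueError("not a .deb file name: " + filename)
    # Neither Debian package names nor versions may contain underscores,
    # so a well-formed file name splits into exactly three fields.
    fields = filename[:-len(".deb")].split("_")
    if len(fields) != 3:
        raise ValueError("unexpected .deb file name: " + filename)
    return DebFileName(*fields)


parsed = parse_deb_filename("python3-glocate_1.3.0-1_all.deb")
print(parsed.architecture)
```

Keep in mind that this only inspects the name; the authoritative value is the Architecture field reported by dpkg -I.
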
If your Python application just uses built-in packages (the standard library), then your packaging trip ends here. But if you use additional packages, for instance ones downloaded from PyPI, chances are that your built package doesn't declare them properly.

Following our example, if you check geolocate's setup.py you will see that it depends on these additional packages:

install_requires=["geoip2>=2.1.0", "maxminddb>=1.1.1", "requests>=2.5.0", "wget>=2.2"]

Let's see if these dependencies have been included in the generated Debian package metadata:

dante@Camelot:~/geolocate/deb_dist$ dpkg -I python3-glocate_1.3.0-1_all.deb
[...]
Depends: python3, python3-requests, python3:any (>= 3.3.2-2~)
[...]


Obviously, most of them have not been included: only requests was resolved. In fact, our build command warned us that the dependency check failed at build time. If we check the build output we'll find these messages:
[...]
I: dh_python3 pydist:184: Cannot find package that provides geoip2. Please add package that provides it to Build-Depends or add "geoip2 python3-geoip2-fixme" line to debian/py3dist-overrides or add proper  dependency to Depends by hand and ignore this info.
I: dh_python3 pydist:184: Cannot find package that provides maxminddb. Please add package that provides it to Build-Depends or add "maxminddb python3-maxminddb-fixme" line to debian/py3dist-overrides or add proper  dependency to Depends by hand and ignore this info.
I: dh_python3 pydist:184: Cannot find package that provides wget. Please add package that provides it to Build-Depends or add "wget python3-wget-fixme" line to debian/py3dist-overrides or add proper  dependency to Depends by hand and ignore this info.
[...]
The problem is that stdeb could not identify which Linux packages provide those PyPI packages, so we have to set them manually.
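
If you would rather catch this in a script than by reading the build log, a rough check is to compare install_requires with the Depends line reported by dpkg -I. The sketch below is only a heuristic of mine: it assumes each PyPI project maps to a Debian package named python3-<project>, which is a naming guess and not a guarantee, and all function names are hypothetical.

```python
import re


def requirement_names(install_requires):
    """Extract bare project names from strings like 'geoip2>=2.1.0'."""
    names = []
    for requirement in install_requires:
        match = re.match(r"\s*([A-Za-z0-9][A-Za-z0-9._-]*)", requirement)
        if match:
            names.append(match.group(1).lower())
    return names


def debian_package_names(depends_field):
    """Extract package names from a Depends field, dropping version
    constraints and ':any' style qualifiers."""
    names = set()
    for entry in depends_field.split(","):
        # '|' separates alternatives; record every alternative.
        for alternative in entry.split("|"):
            name = alternative.strip().split(" ")[0].split("(")[0].split(":")[0]
            if name:
                names.add(name)
    return names


def missing_dependencies(install_requires, depends_field, prefix="python3-"):
    """Return the PyPI names with no '<prefix><name>' entry in Depends."""
    declared = debian_package_names(depends_field)
    return [name for name in requirement_names(install_requires)
            if prefix + name not in declared]


install_requires = ["geoip2>=2.1.0", "maxminddb>=1.1.1",
                    "requests>=2.5.0", "wget>=2.2"]
depends = "python3, python3-requests, python3:any (>= 3.3.2-2~)"
print(missing_dependencies(install_requires, depends))
```

With the values from this example, the three names flagged are the same ones dh_python3 complained about.
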

To configure stdeb manually you have to create a file called stdeb.cfg in the same folder as setup.py. You will usually create a [DEFAULT] section to hold your configuration, but you can also create [package_name] sections, where package_name is the value passed as the name argument to the setup() call.

For instance, if we find out that the geoip2 PyPI library is provided by the python3-geoip package in the Ubuntu repository, and the requests PyPI library is provided by python3-requests, we could create a stdeb.cfg with this content:
[DEFAULT]
X-Python3-Version: >= 3.4
Depends3: python3-geoip (>=1.3.1), python3-requests
All tags follow the same format they would have if they were written in debian/control. For the Depends tag, the format specification is here.
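
If the list of dependencies grows, you may prefer to keep the PyPI-to-Debian mapping in one place and generate the file from it. This is just a sketch under my own assumptions: the mapping is maintained by hand, and render_stdeb_cfg is a hypothetical helper, not part of stdeb.

```python
def render_stdeb_cfg(debian_dependencies, python3_version=">= 3.4",
                     section="DEFAULT"):
    """Build the text of a stdeb.cfg with the two tags used in this article."""
    lines = [
        "[{}]".format(section),
        "X-Python3-Version: {}".format(python3_version),
        "Depends3: {}".format(", ".join(debian_dependencies)),
    ]
    return "\n".join(lines) + "\n"


# Hand-maintained mapping: PyPI project -> Debian dependency entry.
# Only list projects for which a Debian package really exists.
PYPI_TO_DEBIAN = {
    "geoip2": "python3-geoip (>=1.3.1)",
    "requests": "python3-requests",
}

cfg_text = render_stdeb_cfg(PYPI_TO_DEBIAN.values())
print(cfg_text)
```

Writing cfg_text to a file called stdeb.cfg next to setup.py gives the same content shown above.
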

With that configuration, the given dependencies are included in generated debian package:

dante@Camelot:~/geolocate/deb_dist$ dpkg -I python3-glocate_1.3.0-1_all.deb
[...]
Depends: python3, python3-requests, python3:any (>= 3.3.2-2~), python3-geoip (>= 1.3.1)
[...]

What I haven't found out yet is a way to change the architecture tag of the generated package, so that it is no longer generated with "Architecture: all".

At first glance stdeb looks great, and indeed it is, but it also has some serious drawbacks.

The problem is that stdeb limits you to the libraries and Python packages available in your standard Linux repository. If you develop your application using PyPI libraries, chances are that when you look for the Linux package that provides a given PyPI library, you'll find that it only contains an older version than the one you downloaded from PyPI. Even worse, many PyPI libraries have not been ported to the standard Linux repositories, so you won't find any package to match your dependency. For instance, geolocate needs geoip2 (v.2.1.0), which is easily downloadable from PyPI, but only Ubuntu 15.04 has an available package, called python3-geoip, and it ships version 1.3.2 of geoip. Will geolocate work with the geoip version provided by the python3-geoip package? Probably not. Other geolocate dependencies, like the PyPI wget library, are missing from the standard repository altogether.
 
It's clear that if you like to use PyPI libraries, stdeb may not be your best option. But if you develop using only libraries available through your standard package manager, then stdeb will probably save your day.

FPM

FPM is a Ruby tool similar to stdeb. Its main advantage is that you can use FPM to create many kinds of package installers, not only Debian ones; for example, it can also build RPM packages (for Red Hat distros). Even more interesting, FPM allows package conversions, for example from RPM to DEB.

FPM has no package in the Ubuntu main repository, so you have to download it from RubyGems, Ruby's equivalent of PyPI. To do that, you should first install the Ruby packages:


dante@Camelot:~/geolocate$ sudo aptitude install ruby-dev gcc make

Afterwards you can install FPM from Ruby repositories:


dante@Camelot:~/geolocate$ gem install fpm

Creating a DEB package is pretty simple: just set the FPM source type to python, its target to deb, and give it the path to your package's setup.py file:


dante@Camelot:~/geolocate$ fpm -s python -t deb ./setup.py

FPM takes dependency names from setup.py and prepends them with the prefix you set with the --python-package-name-prefix flag (if it is not set, the python prefix is used):


dante@Camelot:~/geolocate$ dpkg -I python-glocate_1.3.0_all.deb
[...]
Depends: python-geoip2 (>= 2.1.0), python-maxminddb (>= 1.1.1), python-requests (>= 2.5.0), python-wget (>= 2.2)
[...]
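
To see what that renaming amounts to, here is a small Python sketch that reproduces the Depends line above from the install_requires list. It is not FPM's code (FPM is written in Ruby), it handles only a single version constraint per requirement, and the function name is my own.

```python
import re

# A project name followed by an optional single "operator version" pair.
_REQUIREMENT = re.compile(
    r"^\s*([A-Za-z0-9][A-Za-z0-9._-]*)\s*"
    r"(?:(>=|<=|==|>|<)\s*([A-Za-z0-9.+~-]+))?\s*$"
)
# pip-style operators translated to Debian relationship operators.
_DEBIAN_OPERATOR = {">=": ">=", "<=": "<=", "==": "=", ">": ">>", "<": "<<"}


def to_debian_dependency(requirement, prefix="python"):
    """Turn 'geoip2>=2.1.0' into 'python-geoip2 (>= 2.1.0)'."""
    match = _REQUIREMENT.match(requirement)
    if match is None:
        raise ValueError("unsupported requirement: " + requirement)
    name, operator, version = match.groups()
    package = "{}-{}".format(prefix, name.lower())
    if operator is None:
        return package
    return "{} ({} {})".format(package, _DEBIAN_OPERATOR[operator], version)


install_requires = ["geoip2>=2.1.0", "maxminddb>=1.1.1",
                    "requests>=2.5.0", "wget>=2.2"]
depends_line = ", ".join(to_debian_dependency(r) for r in install_requires)
print("Depends: " + depends_line)
```

Passing prefix="python3" mimics what you would get with --python-package-name-prefix python3.
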

The problem here is similar to the one in stdeb: those dependencies don't exist in the standard Ubuntu repository. If the dependencies did exist, but under different names than those autogenerated by FPM, you could set them manually:


dante@Camelot:~/geolocate$ fpm -s python -t deb --no-auto-depends -d "python3-geoip>=1.3.1, python3-wget" ./setup.py
[...]
dante@Camelot:~/geolocate$ dpkg -I python-glocate_1.3.0_all.deb 
[...]
Depends: python3-geoip>=1.3.1, python3-wget
[...] 


Another useful flag is "-a native". This flag sets the package architecture to the one of your system, so it is no longer set to "_all" in your package name.
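
If you call FPM from a build script, it helps to assemble the command line in one place. The sketch below only uses the flags shown in this section; the function itself is hypothetical, and passing one -d per dependency (instead of a single comma-separated string) is my own choice, relying on FPM accepting the -d flag several times.

```python
import shlex


def fpm_command(setup_py, target="deb", depends=None, native_arch=False):
    """Build the argument list for an 'fpm -s python' invocation."""
    command = ["fpm", "-s", "python", "-t", target]
    if depends is not None:
        # Replace the dependencies guessed from setup.py with our own list.
        command.append("--no-auto-depends")
        for dependency in depends:
            command += ["-d", dependency]
    if native_arch:
        command += ["-a", "native"]
    command.append(setup_py)
    return command


command = fpm_command("./setup.py",
                      depends=["python3-geoip >= 1.3.1", "python3-wget"],
                      native_arch=True)
# Show the command as it would be typed in a shell.
print(" ".join(shlex.quote(part) for part in command))
```

The resulting list can be handed to subprocess.run(command, check=True) on a machine where fpm is installed; changing target to "rpm" asks for an RPM package instead.
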

FPM is a great tool. It can create RPM packages and it is very configurable, but in my opinion it has one serious drawback: as happened with stdeb, it is useless if your application imports a library that is available in PyPI but not in the standard operating system repositories.

DH-VIRTUALENV

Up to this point it should be clear that the main problem when packaging a Python application is making sure its dependencies are met at installation, because the developer may have used PyPI libraries that are not available through the standard Linux repositories at the user end.

The guys at Spotify developed a packager to address this problem: dh-virtualenv.

It assumes that if you are using PyPI to develop, then you are likely using virtualenvs. So dh-virtualenv includes the entire virtualenv in the package, and you don't have to install its libraries at the user end.

Nevertheless, in my humble opinion dh-virtualenv has a serious drawback: it is not as clean to use as stdeb or fpm (because you have to manually create a debian folder and a rules file), and you end up using debuild as at the beginning of the article.

VDIST

The main concepts of vdist are similar to those seen in dh-virtualenv, but vdist uses a combination of Docker and fpm to create standard operating system packages. This tool lets you build Linux packages from your Python applications while aiming to build an isolated environment for your Python project using virtualenv. At first glance vdist may look complex, but its documentation is really clear and helpful, and it is actually quite simple to use and automate.

If your main problem while packaging a Python application is ensuring that dependencies are present at the user end, vdist solves it by making your application self-contained and self-sufficient, so it does not depend on OS-provided packages of Python modules. This means that packages generated by vdist contain your application, all the Python dependencies it needs, and a Python interpreter. That bundled interpreter lets you run your application with the interpreter of your choice, not with the one shipped with the OS you're deploying on.

To ensure the host used to build the package keeps its system packages intact, vdist uses Docker to create a clean OS image at build time and installs the needed dependencies there before your application is packaged on top of it. Thanks to this, your build machine always returns to its original state. To load your application into the Docker image, vdist downloads the application source code from a git repository, so having your application in Bitbucket or GitHub is a good idea. The downloaded source code is placed in a virtualenv created inside your Docker image, and the PyPI dependencies are installed inside that virtualenv.

The main dependency for using vdist is having Docker installed and its daemon running. To install it in Ubuntu you just need to do the following:


dante@Camelot:~/geolocate$ sudo aptitude install docker.io python-docker python3-docker


After installing Docker, remember to add your user to the docker group:


dante@Camelot:~/geolocate$ sudo usermod -a -G docker dante


You may need to restart your system to be sure the group is really updated.

The easiest way to install vdist is with your standard package tools. Vdist packages are hosted at Bintray, so to install them from there you should add Bintray to your system repositories before anything else. To do it in Ubuntu just type:


dante@Camelot:~$ sudo apt-get update
dante@Camelot:~$ sudo apt-get install apt-transport-https
dante@Camelot:~$ echo "deb [trusted=yes] https://dl.bintray.com/dante-signal31/deb generic main" | sudo tee -a /etc/apt/sources.list
dante@Camelot:~$ sudo apt-key adv --keyserver pgp.mit.edu --recv-keys 379CE192D401AB61

Once Bintray has been added to your repositories you can install and update vdist like any other system package. For instance, in Ubuntu:


dante@Camelot:~$ sudo apt-get update
dante@Camelot:~$ sudo apt-get install vdist


If you are on a system where you don't have permission to install system packages, you may prefer to install it from PyPI inside a virtualenv created ad hoc for packaging your application:


(env) dante@Camelot:~/geolocate$ pip install vdist


After installing vdist you are provided with a console command called... vdist. Just be aware that if you have installed vdist inside a virtualenv, that console command will only be available inside that virtualenv.

There are many ways to use vdist. I think the easiest one is to create a configuration file and make vdist read it. Vdist is used to package itself, so its configuration file is a good example:

[DEFAULT]
app = vdist
version = 1.1.0
source_git = https://github.com/dante-signal31/${app}, master
fpm_args = --maintainer dante.signal31@gmail.com -a native --url
    https://github.com/dante-signal31/${app} --description
    "vdist (Virtualenv Distribute) is a tool that lets you build OS packages
     from your Python applications, while aiming to build an
     isolated environment for your Python project by utilizing virtualenv. This
     means that your application will not depend on OS provided packages of
     Python modules, including their versions."
    --license MIT --category net
requirements_path = /REQUIREMENTS.txt
compile_python = True
python_version = 3.5.3
output_folder = ./package_dist/
after_install = packaging/postinst.sh
after_remove = packaging/postuninst.sh

[Ubuntu-package]
profile = ubuntu-trusty
runtime_deps = libssl1.0.0, docker.io
build_deps =

[Centos7-package]
profile = centos7
runtime_deps = openssl, docker-ce

The vdist documentation explains well what each parameter is for. Just note that a single configuration file can hold the parameters for every package you want to build. Keep common parameters in the [DEFAULT] section and put distribution-dependent parameters in separate sections (they can be called whatever you want, but you'd better use expressive names).
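
To see how the [DEFAULT] section and the per-distribution sections combine, you can load a trimmed copy of the file with Python's configparser. This is only an illustration: I am assuming that the ${app} placeholders behave like configparser's ExtendedInterpolation, and load_build_settings is a hypothetical helper rather than anything from vdist's API.

```python
import configparser

# A shortened version of the configuration file shown above.
CONFIG_TEXT = """\
[DEFAULT]
app = vdist
version = 1.1.0
source_git = https://github.com/dante-signal31/${app}, master
output_folder = ./package_dist/

[Ubuntu-package]
profile = ubuntu-trusty
runtime_deps = libssl1.0.0, docker.io

[Centos7-package]
profile = centos7
runtime_deps = openssl, docker-ce
"""


def load_build_settings(text):
    """Return one dict per package section, with [DEFAULT] values merged in
    and ${...} placeholders resolved."""
    parser = configparser.ConfigParser(
        interpolation=configparser.ExtendedInterpolation())
    parser.read_string(text)
    # sections() leaves out DEFAULT, but each section proxy still exposes
    # the inherited default options.
    return {section: dict(parser[section]) for section in parser.sections()}


settings = load_build_settings(CONFIG_TEXT)
for name, values in settings.items():
    print(name, values["profile"], values["source_git"])
```

Each package section ends up with its own profile and runtime_deps plus every common value, which is why the common parameters only need to be written once.
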

Once you have your configuration file you can launch vdist like this (assuming the configuration file is called configuration_file):


dante@Camelot:~/$ vdist batch configuration_file

Then you'll see a bunch of screen output while vdist builds your packages. Generated packages will be placed in the folder set in the output_folder configuration parameter.

The smallest package produced by vdist is about 50 MB, because an entire Python distribution has to be included in the package. That size is the price you pay for your application's self-containment. Apart from those 50 MB, the rest is due to your application and its dependencies. At first glance it may seem big, but nowadays it is quite a usual size for any compiled application you may find out there.

I think vdist is the most complete packaging solution available for deploying Python apps on Linux boxes. With it you can even deploy to Linux boxes with no Python installed at all, which gives you valuable isolation at the client end and makes it easier for your final users to install your app.

Disclaimer: I started using vdist to write this article and I've ended up being the current main contributor to its development, so feel free to suggest any improvement you think could be interesting.