11 July 2017

Publishing your documentation


Documenting your application is paramount. You cannot expect many people to use your application if they don't know how.

There are many ways to document your application. You could write a text file and include it among your sources, but that is its very problem: it sits among many other files, so it can easily be overlooked. Besides, it has to be plainly formatted unless you want to force your users to install a specific reader: Acrobat, Word, etc.

Another way, and the one we're going to cover here, is using a markup language to write a text file that a builder then uses as source to create richly formatted, web based documentation. There are many markup options, but the two main ones are Markdown and reStructuredText (RST): the latter is more extensible and the de facto standard in the Python world, but the former is simpler, easier to use and the default option on Github.

To host the generated web documentation you could use any web server, but there are many free services out there for open source projects. Here we are going to cover Github wikis and the ReadTheDocs service.

Github wikis


The easiest option of all is Github, which allows documentation hosting through its wiki pages. Github wikis let you use both Markdown and RST, editing pages directly on Github through a browser or cloning the wiki pages as a git repository, separate from the one holding your source code. Just keep in mind that although you can create branches in your wiki git repository, only pushes to the master branch will be rendered and displayed on Github. The URL to clone your wiki repository can be found in the Github wiki tab. Besides, be aware that if you push your page files you are supposed to give them a standard extension: .md for Markdown or .rst for reStructuredText.
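
For instance, working with the wiki repository from the command line looks like this (user and project names here are hypothetical):

dante@Camelot:~$ git clone https://github.com/youruser/yourproject.wiki.git
dante@Camelot:~$ cd yourproject.wiki
dante@Camelot:~/yourproject.wiki$ nano How-to-install.md
dante@Camelot:~/yourproject.wiki$ git add How-to-install.md
dante@Camelot:~/yourproject.wiki$ git commit -m "Add installation page"
dante@Camelot:~/yourproject.wiki$ git push origin master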

Besides, documentation editing can be performed collaboratively by assigning permissions to different users.

To create a new wiki in your Github repository just click on the wiki tab inside your project page and add a new page. You can include images already stored in any folder of your repository, and add links, sidebars and even footers.

The page called Home is going to be your default landing page. I haven't found a way to order pages in the default sidebar so, to keep a kind of index, I usually include in my Home page a table of contents with links to the other pages. Of course you could create a custom sidebar too, with the pages in the correct order, but it would appear under the default Pages sidebar, which I don't know how to hide yet.
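
To give an idea of what such an index can look like, here is a minimal Home page in Markdown (the page names are hypothetical; note that Github wiki page URLs replace spaces with hyphens):

# Welcome to the myproject wiki

## Table of contents

* [How to install](How-to-install)
* [How to use](How-to-use)
* [How to contribute](How-to-contribute)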

ReadTheDocs


Github wikis are easy to use and possibly powerful enough for most projects, but sometimes you need nicer formatting or a documentation site not tied to your code repository on Github; that is where ReadTheDocs comes into play. ReadTheDocs can be connected through webhooks to a Github repository, so any push to that repository will fire the documentation build process automatically. Besides, it can keep multiple versions of your documentation, which is useful if your application's usage differs depending on the version the user runs, and it can build PDF or ePUB files of your documentation to make it available for download.

Once you register at ReadTheDocs you should go to Settings --> Connected Services to link your ReadTheDocs account to the one you use on Github or Bitbucket. After that you should go to My Projects --> Import a Project to create the connection with the repository where your documentation is. You are going to be asked which name to give the documentation project. That name is going to be used to assign a public URL to your project.

Nevertheless, be aware that you cannot change your project name later (you can only remove the project and recreate it with another name). Why would you want to change your project name? Maybe you realize another name fits better, or one of its characters happens to be troublesome. For instance, I created a project called "vdist-" (note the hyphen); after some time I realized that the generated URL (http://vdist-.readthedocs.io/en/latest/) didn't render at all in some browsers and from some locations (the Chrome browser on my PC got the page right, but Chrome on my smartphone gave me a DNS error). I recreated the project using "vdistdocs" as project name and the problem was solved.

I don't know what happens in Bitbucket, but on Github you must allow webhook updates to ReadTheDocs through your Settings --> Webhooks Github page.

When your project appears in your projects page you can click on it to enter its settings. There you can set the type of markup used in your documentation: Markdown is specifically identified, and RST is any of the Sphinx-like options. In Advanced Settings you can choose to create a PDF or ePUB version of your documentation with each build, and you can set whether your project should be publicly available. Apart from that, ReadTheDocs has a lot of black magic features to be configured through its settings, but I think the simplest configuration you would need includes the parameters I've explained.

For me, the hardest part of configuring ReadTheDocs was figuring out which document file structure was needed to let ReadTheDocs render the files right. There are two possible setups depending on whether your documentation is Markdown or RST based.

For Markdown documentation you should include a mkdocs.yml file in your repository. That file contains the very basic configuration needed by ReadTheDocs to render your documentation. An example of this file could be the one from the vdist project:

site_name: vdist
theme: readthedocs
pages:
- [index.md, Home]
- [whatisvdist.md, What is vdist]
- [usecases.md, Use cases]
- [howtoinstall.md, How to install]
- [howtouse.md, How to use]
- [howtocustomize.md, How to customize]
- [buildenvironment.md, Optimizing your build environment]
- [qanda.md, Questions and Answers]
- [howtocontribute.md, How to contribute]
- [releasenotes.md, Release notes]
- [roadmap.md, Development roadmap]


As you can see, mkdocs.yml is simple. It only contains your project name (displayed as the title of your documentation site), the visual theme to apply to the rendered documentation, and a table of contents linking each Markdown page to its section. That table of contents is the one that will appear as a left sidebar when the documentation is rendered.

Using the mkdocs file as a guideline, ReadTheDocs will search for a doc or docs folder inside your repository and render its contents; if none of those folders is found, ReadTheDocs will search in the top level folder of your project.

For RST documentation the process is similar, but what is used as a guideline is a conf.py file generated with Sphinx. Here is a nice tutorial about how to set up Sphinx to create files to be imported into ReadTheDocs.
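
Just to give a rough idea of what that guideline file contains, here is a minimal conf.py sketch (the project name and theme are hypothetical; sphinx-quickstart generates a much more complete file for you):

# Minimal Sphinx configuration sketch (conf.py), placed in your docs folder.
project = 'myproject'
version = '1.0'
release = '1.0.0'
master_doc = 'index'  # Top level document: index.rst.
extensions = []  # Add any Sphinx extensions you need here.
html_theme = 'sphinx_rtd_theme'  # ReadTheDocs theme, assuming it is installed.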

The only problem I've found with ReadTheDocs is not exactly related to it, but to its compatibility with RST documentation written for Github wikis. The problem is that Github wikis expect your RST links to be formatted differently than ReadTheDocs does. There is a good Stack Overflow discussion about the topic. You should be aware of it and structure your documentation in advance with the workaround explained there if you want to publish on both Github and ReadTheDocs; the comparison below gives a rough idea of the difference.
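
Roughly, the two link styles look like this (page and file names here are hypothetical):

.. Github wiki style: a plain hyperlink to the wiki page name.
`How to use <How-to-use>`_

.. Sphinx/ReadTheDocs style: the :doc: role pointing to the .rst file name.
:doc:`How to use <howtouse>`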

14 June 2017

Packaging Python programs - DEB (and RPM) packages

The native way to distribute Python code is through PyPI, as I explained in a previous article. But to be honest, PyPI has some drawbacks that limit its use to development environments.

The problem with PyPI is that it does not implement proper package management, so pip uninstall doesn't work properly and there's no way to roll back to a previous state. Besides, pip installation procedures often build packages from source, which can be painfully slow for a complete virtualenv.

Deploying Python applications on production environments should be fast and should be done in a manner where a clean uninstall is always available.

Debian has a really solid and stable package management system. It can check dependencies on both installation and uninstallation, and run pre and post installation scripts. That's why it is one of the most used package management systems in the Linux ecosystem.

This article is going to cover some ways to package our Python applications as Debian packages, easily installable on Debian/Ubuntu Linux distros.

Debian package format

Although other Linux packaging systems rely on binary formats, Debian packages use a simple ar archive to group in a single file a pair of tar archives, compressed with gzip or bzip2. One of those two archives contains a folder tree with all the application files, and the other one the package configuration files.
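
You can check this structure yourself with the standard ar tool (the package name here is just an example, and the compression extensions may vary):

dante@Camelot:~$ ar t python3-glocate_1.3.0-1_all.deb
debian-binary
control.tar.gz
data.tar.xz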

Package configuration files are bare text files, so the classic way of creating Debian packages involves only a bit of folder creation, text editing of some configuration files and, in its simplest form, just one command to be run:

dante@Camelot:~/project-directory$ debuild -us -uc


You can find a good tutorial about the topic here.

The thing is not complex, but it can be tricky, and you have to create and edit many text files. It's not hard to see why developers have created so many tools to automate the task. We are going to see some of them, specialised in packaging Python applications.

STDEB

If you are already used to Python packaging for PyPI, it shouldn't be hard for you to grasp stdeb concepts. If you don't know what I'm talking about, then you should read the article I linked at the beginning of this one.

As stdeb calls some Debian/Ubuntu tools under the hood, you have to use one of those distros or any of their derivatives. If you are on Debian/Ubuntu, you can download stdeb from the standard Linux repositories:

dante@Camelot:~$ sudo aptitude search stdeb
p   python-stdeb    - Python to Debian source package conversion utility
p   python3-stdeb   - Python to Debian source package conversion plugins for distutils
dante@Camelot:~$ sudo aptitude install python-stdeb python3-stdeb

But if you want a newer version, you can also install stdeb from PyPI as a Python package.

The best workflow for stdeb involves first creating the files needed for a PyPI package; that way stdeb can use your setup.py file to get the info to create the Debian package configuration files.

So, let's suppose you have your application ready to be packaged for PyPI, with your setup.py already done. For this example, I've cloned this git repository.
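
As a reminder, a setup.py carrying the metadata stdeb needs looks roughly like this (a minimal sketch with hypothetical values, not the real geolocate file):

# Minimal setup.py sketch; stdeb reads this metadata to build the package.
from setuptools import setup, find_packages

setup(
    name="glocate",
    version="1.3.0",
    description="IP geolocation command line tool",
    packages=find_packages(),
    install_requires=["geoip2>=2.1.0", "requests>=2.5.0"],
)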

To make stdeb generate a Debian source package you just have to do:

dante@Camelot:~/geolocate$ python3 setup.py --command-packages=stdeb.command sdist_dsc

The --command-packages stuff can seem cumbersome, but the thing doesn't work without it, so trust me and include it.

After some verbose output you'll notice some folders have been added to your working directory. You have to focus on the one called deb_dist. That folder mainly contains three files: a .dsc file, an .orig.tar.gz one and a .diff.gz one. Those three files together are what we call a Debian source package.

Inside the deb_dist folder you'll find another one named after your project with the current version appended. That folder contains all the data generated to compile a binary Debian package, so move there and run the next command:

dante@Camelot:~/geolocate/deb_dist/glocate-1.3-0$ dpkg-buildpackage -rfakeroot -uc -us

The fakeroot thing is just a flag that lets you build a Debian package without being logged in as the root user. That is the command recommended in the stdeb documentation, but I've realized that you can also use the debuild command seen before:

dante@Camelot:~/geolocate/deb_dist/glocate-1.3-0$ debuild -uc -us

In the end, debuild is a wrapper for dpkg-buildpackage that automates some things you would otherwise do manually.

After either of those commands you should find a binary Debian package in your deb_dist folder:

dante@Camelot:~/geolocate/deb_dist/glocate-1.3-0$ ls *.deb
python3-glocate_1.3.0-1_all.deb

That was the "two step" method; you can reduce it to a "one step" method running the next command:

dante@Camelot:~/geolocate$ python3 setup.py --command-packages=stdeb.command bdist_deb

Although both methods end with a .deb package in the deb_dist folder, you should be aware that the generated Debian package is architecture dependent (although its package name includes an "_all" tag). That means that if you generate the package on an amd64 Ubuntu box, chances are you will face problems if you try to install the resulting package on another Ubuntu box with a different architecture. To avoid this problem you have two options:
  • Use virtual machines to compile a Debian package in every target Debian architecture.
  • Use a PPA repository: Ubuntu offers personal hosting space for packaging projects. You just upload your source package there (the contents of the deb_dist folder after running "python3 setup.py --command-packages=stdeb.command sdist_dsc"), and the Ubuntu servers compile it for each Ubuntu target architecture (mainly x86 and amd64). After compilation, the created packages stay available in your personal PPA until you remove them or replace them with newer versions. The upload is sketched just after this list.
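
Uploading to a PPA is usually done with the dput tool; a minimal sketch, assuming a hypothetical PPA name and a signed source .changes file (which you may need to generate first, e.g. with debuild -S):

dante@Camelot:~/geolocate/deb_dist$ dput ppa:dante-signal31/testing glocate_1.3.0-1_source.changes
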
If your Python application only uses built-in packages, then your packaging trip ends here; but if you use additional packages, for instance downloaded from PyPI, chances are that your compiled package doesn't include them properly.

Following our example, if you check geolocate's setup.py, you should see that it depends on these additional packages:

install_requires=["geoip2>=2.1.0", "maxminddb>=1.1.1", "requests>=2.5.0", "wget>=2.2"]

Let's see if these dependencies have been included in the generated Debian package metadata:

dante@Camelot:~/geolocate/deb_dist$ dpkg -I python3-glocate_1.3.0-1_all.deb
[...]
Depends: python3, python3-requests, python3:any (>= 3.3.2-2~)
[...]


Obviously they have not. In fact, our build command warned us that the dependency check failed at build time. If we check the build output we'll find this:

[...]
I: dh_python3 pydist:184: Cannot find package that provides geoip2. Please add package that provides it to Build-Depends or add "geoip2 python3-geoip2-fixme" line to debian/py3dist-overrides or add proper dependency to Depends by hand and ignore this info.
I: dh_python3 pydist:184: Cannot find package that provides maxminddb. Please add package that provides it to Build-Depends or add "maxminddb python3-maxminddb-fixme" line to debian/py3dist-overrides or add proper dependency to Depends by hand and ignore this info.
I: dh_python3 pydist:184: Cannot find package that provides wget. Please add package that provides it to Build-Depends or add "wget python3-wget-fixme" line to debian/py3dist-overrides or add proper dependency to Depends by hand and ignore this info.
[...]
The problem is that stdeb didn't identify which Linux packages include those PyPI packages, so we have to set them manually.

To manually configure stdeb you have to create a file called stdeb.cfg in the same folder as setup.py. You will usually create a [DEFAULT] section where you'll put your configuration, but you can also create [package_name] sections, where package_name is the name argument given to the setup() command.

For instance, if we find out that the geoip2 PyPI library is included in the python3-geoip package of Ubuntu's repository, and the requests PyPI library is included in python3-requests, we could create a stdeb.cfg with this content:
[DEFAULT]
X-Python3-Version: >= 3.4
Depends3: python3-geoip (>=1.3.1), python3-requests
All tags follow the same format they would have if they were inserted in debian/control. For the Depends tag, the format specification is here.

With that configuration, the given dependencies are included in the generated Debian package:

dante@Camelot:~/geolocate/deb_dist$ dpkg -I python3-glocate_1.3.0-1_all.deb
[...]
Depends: python3, python3-requests, python3:any (>= 3.3.2-2~), python3-geoip (>= 1.3.1)
[...]

What I haven't found out yet is a way to change the architecture tag of the generated package, which keeps being generated with "Architecture: all".

At first glance stdeb looks great, and indeed it is, but it actually has some serious drawbacks too.

The problem is that stdeb limits you to using only libraries and Python packages from your standard Linux repository. If you develop your application using PyPI libraries, chances are that when you try to find which Linux package includes your PyPI library, you'll find that those packages contain only older versions than the ones you downloaded from PyPI. Even worse, many PyPI libraries have not been ported to the standard Linux repositories, so you won't find any package to match your dependency. For instance, geolocate needs geoip2 (v2.1.0), which is easily downloadable from PyPI, but only Ubuntu 15.04 has an available package called python3-geoip, and that one ships version 1.3.2 of geoip. Will geolocate work with the geoip version provided by python3-geoip? Probably not. Other geolocate dependencies are completely missing from the standard repository, like the PyPI wget Python library.
 
It's clear that if you like to use PyPI libraries, stdeb may not be your best option. But if you develop using only libraries available through your standard package manager, then stdeb will probably save your day.

FPM

FPM is a Ruby tool similar to stdeb. Its main advantage is that FPM can create many kinds of packages, not only Debian ones; currently it also builds RPM packages (for Red Hat distros). And, what is even more interesting, FPM allows package conversions, for example from RPM to DEB.

FPM has no package in the main Ubuntu repository, so you have to download it from Ruby's equivalent of PyPI. To do that you should first install the Ruby packages:


dante@Camelot:~/geolocate$ sudo aptitude install ruby-dev gcc make

Afterwards you can install FPM from Ruby repositories:


dante@Camelot:~/geolocate$ gem install fpm

Creating a DEB package is pretty simple: just set FPM's source as python and its target as deb, and give it the path to your package's setup.py file:


dante@Camelot:~/geolocate$ fpm -s python -t deb ./setup.py

FPM takes dependency names from setup.py and prepends them with the prefix you set with the --python-package-name-prefix flag (if not set, the python prefix is used):


dante@Camelot:~/geolocate$ dpkg -I python-glocate_1.3.0_all.deb
[...]
Depends: python-geoip2 (>= 2.1.0), python-maxminddb (>= 1.1.1), python-requests (>= 2.5.0), python-wget (>= 2.2)
[...]

The problem here is similar to stdeb's: those dependencies don't exist in the standard Ubuntu repository. In case the dependencies existed but with different names than those autogenerated by FPM, you could set them manually:


dante@Camelot:~/geolocate$ fpm -s python -t deb --no-auto-depends -d "python3-geoip>=1.3.1, python3-wget" ./setup.py
[...]
dante@Camelot:~/geolocate$ dpkg -I python-glocate_1.3.0_all.deb 
[...]
Depends: python3-geoip>=1.3.1, python3-wget
[...] 


Another useful flag is "-a native". This flag sets the package architecture to the one of your system, so it is no longer set to "_all" in your package name.
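
For instance, a build using that flag would look like this:

dante@Camelot:~/geolocate$ fpm -s python -t deb -a native ./setup.py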

FPM is a great tool: it can create RPM packages and it is very configurable. But in my opinion it has one serious drawback: as happened with stdeb, it is useless if your application imports a library available in PyPI but not in the standard operating system repositories.

DH-VIRTUALENV

Up to this point it should be clear that the main problem when packaging a Python application is ensuring its dependencies are met at installation, because the developer may have used PyPI libraries not available through the Linux standard repositories at the user end.

The guys at Spotify developed a packager to address this problem: dh-virtualenv.

dh-virtualenv assumes that if you are using PyPI to develop, then you are likely using virtualenvs, so it includes the entire virtualenv in the package; that way nothing has to be installed at the user end.

Nevertheless, in my humble opinion dh-virtualenv has a serious drawback: it is not as clean to use as stdeb or fpm (because you have to manually create a debian folder and a rules file, like the one sketched below), and you end up using debuild as at the beginning of the article.
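
For reference, the debian/rules file dh-virtualenv expects is a tiny makefile like this one (a typical minimal example; your package may need extra options, and note the tab indentation required by make):

#!/usr/bin/make -f

# Delegate every build step to debhelper, enabling the dh-virtualenv addon.
%:
	dh $@ --with python-virtualenv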

VDIST

The main concepts of vdist are similar to those seen in dh-virtualenv, but vdist uses a combination of Docker and fpm to create operating system standard packages. This tool lets you build Linux packages from your Python applications while aiming to build an isolated environment for your Python project using virtualenv. At first glance vdist may look complex, but its documentation is really clear and helpful, and it is actually quite simple to use and automate.

If your main problem while packaging a Python application is ensuring dependencies are present at the user end, vdist solves this by making your application self contained and self sufficient, so it does not depend on OS provided packages of Python modules. This means that packages generated by vdist contain your application, all the Python dependencies your application needs, and a Python interpreter. That Python interpreter lets you run your application with the interpreter of your choice, not the one shipped with the OS you're deploying on.

To ensure the host used to build the package keeps its system packages intact, vdist uses Docker to create a clean OS image at build time and installs the needed dependencies there before your application is packaged on top of it. Thanks to this, your build machine will always be reverted to its original state. To load your application into the Docker image, vdist downloads the application source code from a git repository, so having your application on Bitbucket or Github is a good idea. The downloaded source code is placed in a virtualenv created inside the Docker image, and PyPI dependencies are installed inside that virtualenv.

The main dependency for using vdist is having Docker installed and its daemon running. To install it in Ubuntu you just need to do the following:


dante@Camelot:~/geolocate$ sudo aptitude install docker.io python-docker python3-docker


After installing docker, remember to add your user to the docker group:


dante@Camelot:~/geolocate$ sudo usermod -a -G docker dante


You may need to restart your system to be sure the group is really updated.

The easiest way to install vdist is with your standard package tools. Vdist packages are hosted at Bintray, so to install them from there you should first include Bintray in your system repositories. To do it in Ubuntu just type:


dante@Camelot:~$ sudo apt-get update
dante@Camelot:~$ sudo apt-get install apt-transport-https
dante@Camelot:~$ echo "deb [trusted=yes] https://dl.bintray.com/dante-signal31/deb generic main" | sudo tee -a /etc/apt/sources.list
dante@Camelot:~$ sudo apt-key adv --keyserver pgp.mit.edu --recv-keys 379CE192D401AB61

Once Bintray is added to your repositories, you can install and update vdist like any other system package. For instance, in Ubuntu:


dante@Camelot:~$ sudo apt-get update
dante@Camelot:~$ sudo apt-get install vdist


If you are on a system where you don't have permission to install system packages, you may find it interesting to install vdist from the PyPI repository inside a virtualenv created ad hoc for packaging your application:


(env) dante@Camelot:~/geolocate$ pip install vdist


After installing vdist you are provided with a console command called... vdist. Just be aware that if you have installed vdist inside a virtualenv, that console command will only be available inside that virtualenv.

There are many ways to use vdist; I think the easiest one is creating a configuration file and making vdist read it. Vdist is used to package itself, so its own configuration file is a good example:

[DEFAULT]
app = vdist
version = 1.1.0
source_git = https://github.com/dante-signal31/${app}, master
fpm_args = --maintainer dante.signal31@gmail.com -a native --url
    https://github.com/dante-signal31/${app} --description
    "vdist (Virtualenv Distribute) is a tool that lets you build OS packages
     from your Python applications, while aiming to build an
     isolated environment for your Python project by utilizing virtualenv. This
     means that your application will not depend on OS provided packages of
     Python modules, including their versions."
    --license MIT --category net
requirements_path = /REQUIREMENTS.txt
compile_python = True
python_version = 3.5.3
output_folder = ./package_dist/
after_install = packaging/postinst.sh
after_remove = packaging/postuninst.sh

[Ubuntu-package]
profile = ubuntu-trusty
runtime_deps = libssl1.0.0, docker.io
build_deps =

[Centos7-package]
profile = centos7
runtime_deps = openssl, docker-ce

The vdist documentation is good enough to learn what each parameter is useful for. Just note that a single configuration file can hold the parameters for every package you want to build: keep the common parameters in the [DEFAULT] section and put the distribution dependent parameters in separate sections (they can be named as you want, but you'd better use expressive names).

Once you have your configuration file you can launch vdist like this (supposing the configuration file is called configuration_file):


dante@Camelot:~/$ vdist batch configuration_file

Then you'll start to see a bunch of screen output while vdist builds your packages. The generated packages will be placed in the folder set in the output_folder configuration parameter.

The smallest package size produced by vdist is about 50 MB, because an entire Python distribution has to be included in the package. That size is what you pay for your application's self-containment. Discounting those 50 MB, all the rest is due to your application and its dependencies. At first glance it may seem big, but nowadays it is quite a usual size for any compiled application you may find out there.

I think vdist is the most complete packaging solution available to deploy Python apps on Linux boxes. With it you can deploy even to Linux boxes with no Python installed at all, giving you valuable isolation at the client end and making your final users' lives easier when they install your app.

Disclaimer: I started using vdist to write this article and I've ended up being the current main contributor to its development, so feel free to comment on any further improvement you feel could be interesting.