Showing posts with label programming. Show all posts
Showing posts with label programming. Show all posts

17 October 2021

How to write manpages with Markdown and Pandoc


No decent console application is released without a man page documenting how to use it. However, man page inners are rather arcane, but being a 1971 file format it has held up quite well.

Nowadays you have two standard ways to learn how to use a console command (google apart ;-) ): typing application command followed by "--help" to get a quick glance of application usage, or typing "man" followed by application name to get a detailed information about application usage. 

To implement "--help" approach in your application you can manually include "--help" parsing and output or you'd better use an argument parsing library like Python's ArgParse.

The "man" approach needs you to write a man page for your application.

Standard way to produce man pages is using troff formating commands. Linux has it's own troff implementation called groff. You have a feel about how is standard man pages format you can type next source inside a file called, for instance, corrupt.1:

.TH CORRUPT 1 .SH NAME corrupt \- modify files by randomly changing bits .SH SYNOPSIS .B corrupt [\fB\-n\fR \fIBITS\fR] [\fB\-\-bits\fR \fIBITS\fR] .IR file ... .SH DESCRIPTION .B corrupt modifies files by toggling a randomly chosen bit. .SH OPTIONS .TP .BR \-n ", " \-\-bits =\fIBITS\fR Set the number of bits to modify. Default is one bit.

Once saved, that file can be displayed by man. Assuming you are in the same folder than corrupt.1 you can type:

dante@Camelot:~/$ man -l corrupt.1

Output is: 

CORRUPT(1)                                                 General Commands Manual 

NAME
corrupt - modify files by randomly changing bits

SYNOPSIS
corrupt [-n BITS] [--bits BITS] file...

DESCRIPTION
corrupt modifies files by toggling a randomly chosen bit.

OPTIONS
-n, --bits=BITS
Set the number of bits to modify. Default is one bit.

CORRUPT(1)


 

You can opt to write specific man pages for your application following for instance this small cheat sheet

Nevertheless, troff/groff format is rather cumbersome and I feel keeping two sources of usage documentation (your README.md and your man page) is prone to errors and an effort waste. So, I follow a different approach: I write and keep updated my README.md and afterwards I convert it to a man page. Sure you need to keep a quite standard format for your README, and it won't be the fanciest one, but at least you will avoid to type the same things twice, in two different files, with different formatting tags.

The key stone to make that conversion is a tool called Pandoc. That tool is the Swiss army knife of document conversion. You can use it to convert between many documents format like word (docx), openoffice (odt), epub... or markdown (md) and groff man. Pandoc uses to be available at your distribution standard package manager repositories, so at an ubuntu distribution you just need:

dante@Camelot:~/$ sudo apt install pandoc

Once Pandoc is installed, you have to observe some conventions with your README.md to make further conversion easier. To illustrate this article explanations, we will follow as an example the file README.md for my project Cifra.

As you can see, linked README.md contains GitHub badges at the very beginning but those are going to be removed in conversion process we are going to explain later in this article. Everything else is structured to comply with contents a man page is supposed to have.

Maybe, most suspecting line is the first one. It's not usual to find a line like this in GitHub README:

Actually, that line is metadata for man page reader. After "%" contains document title (usually application name), manual section, a version, a "|" separator and finally a header. Manual section is 1 for user commands, 2 for system calls and 3 for C library functions. Your applications will fit in section 1 99% of times. I've not included a version for Cifra in that line, but you could have done. Besides, header indicates what set of documentation this manual page belongs to.

After that line, every section use to be included in man pages, but the only ones that should be included at least are:

  • Name: The name of the command.
  • Synopsis: A one-liner summarizing command line arguments and options.
  • Description: Describes in detail how to use your application command.

Other sections you can add are:

  • Options: Command line options.
  • Examples: About command usage.
  • Files: Useful if your application includes configuration files.
  • Environment: Here you explain if your application uses any environment variable.
  • Bugs: Detected bugs should be reported? There you can link your GitHub issues page.
  • Authors: Who is this masterpiece of code author?
  • See also: References to other man pages.
  • Copyright | License: A good place to include your application license text.

Once you have decided which sections include in manpage, you should write them following a format easily convertible by pandoc in a manpage with an standard structure. That's why main sections are always marked at level one of indentation ("#"). A problem with indentation is that while you can get further subheaders in markdown using "#" character  ("#" for main titles, "##" for subheaders, "###" for sections, and so on), those subheaders are only recognized by Pandoc up to sublevel 2. To get more sublevels I've opted for "pandoc lists", a format you can use in your markdown that is recognized afterwards by pandoc:

 

In lines 42 and 48, you have what pandoc call are "line blocks". Those are lines beginning with a vertical bar ( | ) followed by a space. Those spaces between vertical bar and command will be keep in man page converted text by pandoc.

Everything else in that README.md is good classic markdown format.

Let guess we have written our entire README.md and we want to perform conversion. To do that you can use an script like this from a temporal folder:

 

Lines 3 and 4 clean any file used in a previous conversion, while line 5 actually copies README.md from source folder.

Line 6 removes every GitHub badge you could have in your README.md. 

Line 7 is where we call pandoc to perform conversion.

In that point you could have a file that can be opened with man:

dante@Camelot:~/$ man -l man/cifra.1

But man pages use to be compressed in gzip format, as you can see with your system man pages:

dante@Camelot:~/$ ls /usr/share/man/man1

That's why script compress at line 8 your generated man page. And while generated man page is now a compressed gzip file, it is still readable by man:

dante@Camelot:~/$ man -l man/cifra.1.gz

Although those are some steps, you can see they are easily scriptable both as a local script, as an step of your continuous integration workflow.

As you can see, with these easy steps you only need to keep updated your README.md file because man page can be generated from it.


18 September 2021

How to parse console arguments in your Python application with ArgParse

There is one common pattern for every console application: it has to manage user arguments. Few console applications runs with no user arguments, instead most applications needs user provided arguments to run properly. Think of ping, it needs at least one argument: IP address or URL to be pinged:

dante@Camelot:~/$ ping 8.8.8.8
PING 8.8.8.8 (8.8.8.8) 56(84) bytes of data.
64 bytes from 8.8.8.8: icmp_seq=1 ttl=113 time=3.73 ms
64 bytes from 8.8.8.8: icmp_seq=2 ttl=113 time=3.83 ms
64 bytes from 8.8.8.8: icmp_seq=3 ttl=113 time=3.92 ms
^C
--- 8.8.8.8 ping statistics ---
3 packets transmitted, 3 received, 0% packet loss, time 2005ms
rtt min/avg/max/mdev = 3.733/3.828/3.920/0.076 ms

dante@Camelot:~$

When you run a python application you get in sys.argv list every argument user entered when calling your application from console. Console command is split using whitespaces as separator a each resulting token is a sys.argv list element. First element of that list is your application name. So, if we implemented our own python version of ping sys.argv 0 index element would be "ping" and 1 index element would be "8.8.8.8".

You could do your own argument parsing just assessing sys.argv content by yourself. Most developers do it that way... at the begining. If you go that way you'll soon realize that it is not so easy to perform a clean management of user argument and that you have to repeat a lot of boilerplate in your applications. so, there are some libraries to easy argument parsing for you. One I like a lot is argparse.

Argparse is a built-in module of standard python distribution so you don't have to download it from Pypi. Once you understand its concepts it's easy, very flexible and it manages for you many edge use cases. It one of those modules you really miss when developing in other languages.

First concept you have to understand to use argparse is Argument Parser concept. For this module, an Argument Parser is a fixed word that is followed by positional or optional user provided arguments. Default Argument Parser is the precisely the application name. In our ping example, "ping" command is an Argument Parser. Every Argument Parser can be followed by Arguments, those can be of two types:

  • Positional arguments: They cannot be avoided. User need to enter them or command is assumed as wrong. They should be entered in an specific order. In our ping example "8.8.8.8" is a positional argument. We can meet many other examples, "cp" command for instance needs to positional arguments source file to copy and destination for copied file.
  • Optional arguments: They can be entered or not. They are marked by tags. Abbreviated tags use an hyphen and a char, long tags use double hyphens and a word. Some optional arguments admit a value and others not (they are boolean, true if they are used or false if not).

Here you can see some of the arguments cat command accepts:

dante@Camelot:~/$ cat --help
Usage: cat [OPTION]... [FILE]...
Concatenate FILE(s) to standard output.

With no FILE, or when FILE is -, read standard input.

-A, --show-all equivalent to -vET
-b, --number-nonblank number nonempty output lines, overrides -n
-e equivalent to -vE
-E, --show-ends display $ at end of each line
-n, --number number all output lines
-s, --squeeze-blank suppress repeated empty output lines
-t equivalent to -vT
-T, --show-tabs display TAB characters as ^I
-u (ignored)
-v, --show-nonprinting use ^ and M- notation, except for LFD and TAB
--help display this help and exit
--version output version information and exit

Examples:
cat f - g Output f's contents, then standard input, then g's contents.
cat Copy standard input to standard output.

GNU coreutils online help: <https://www.gnu.org/software/coreutils/>
Full documentation at: <https://www.gnu.org/software/coreutils/cat>
or available locally via: info '(coreutils) cat invocation'

dante@Camelot:~$


There are more complex commands that include what is called Subparsers. Subparsers appear when your command have other "fixed" words, like verbs. For instance, "docker" command have many subparsers: "docker build", "docker run", "docker ps", etc. Every subparser can accept its own set of arguments. A subparser can accept other subparsers too, so you can get pretty complex command trees.

To show how to use argparse module works, I'm going to use my own configuration for my application Cifra. Take a look to it's cifra_launcher.py file at GitHub.

In its __main__ section you can see how arguments are processed at high level of abstraction:

You can see we are using sys.argv to get arguments provided by user but discarding first one because is the root command itself: "cifra".

I like using sys.argv as a default parameter of main() call because that way I can call main() from my functional tests using an specific list of arguments.

Arguments are passed to parse_arguments, that is a my function to do all argparse magic. There you can find the root configuration for argparse:

There you configure root parser, the one linked to your base command, defining a description of you command and a final note (epilog) for your command. Those texts will appear when your user call your command with --help argument.

My application Cifra has some subparsers, like docker command has. To allow a parser to have subparsers you must do:

Once you have allowed your parser to have subparser you can start to create those subparsers. Cifra has 4 subparsers at this level:

 

 

We'll see that argparse returns a dict-like object after its parsing. If user called "cifra dictionary" then "mode" key from that dict-like object will have a "dictionary" value.

Every subparser can have its own subparser, actually "dictionary" subparser has more subparser adding more branches to the command tree. So you can call "cifra dictionary create", "cifra dictionary update", "cifra dictionary delete", etc.

Once you feel a parser does not need more subparsers you can add its respective arguments. For "cipher" subparser we have these ones:

In these arguments, "algorithm" and "key" are positional so they are compulsory. Arguments placed at that position will populate values for "algorithm" and "key" keys in dict-like object returned by argparse.

Metavar parameter is the string we want to be used to represent this parameter when help is called with --help argument. Help parameter as you can guess is the tooltip used to explain this parameter when --helps argument is used.

Type parameter is used to preprocess argument given by user. By default arguments are interpreted like strings, but if you use type=int argument will be converted to an int (or throw an error if it can't be done). Actually str or int are functions, you can use your own ones. For file_to_cipher parameter I used _check_is_file function:

As you can see, this function check if provided arguments points to a valid path name and if that happens returns provided string or raises an argparse error otherwise.

Remember parser optional parameters are always those preceded by hyphens. If those parameters are followed by an argument provided by users, that argument will be the value for dict-like object returned by arparser key called like optional parameter long name. In this example line 175 means that when user types "--ciphered_file myfile.txt", argparser will return a key called "ciphered_file" with value "myfile.txt".

Sometimes, your optional parameters won't need values. For example, they could be a boolean parameter to do verbose output:

parser.add_argument("--verbose", help="increase output verbosity", action="store_true", default=False)

With action="store_true" the dict-like object will have a "verbose" key with a boolean value: true if --verbose was called or false otherwise.

Before returning from parse_arguments I like to filter returned dict-like object content:


In this section parsed_arguments is the dict-like object that argparse returns after parsing provided user arguments with arg_parser.parse_args(args). This object includes those uncalled parameters with None value. I could leave it that way and check afterwards if values are None before using them, but a I fell somewhat cleaner remove those None'd values keys and just check if those keys are present or not. That's why I filter dict-like object at line 235.

Back to main() function, after parsing provided arguments you can base your further logic depending on parsed arguments dict contents. For example, for cifra:

This way, argparser lets you deal with provided user arguments in a clean way giving you for free help messages and error messages, depending on whether user called --help or entered a wrong or incomplete command. Actually generated help messages are so clean that I use them for my readme.md repository file and as a base point for my man pages (what that is an story for another article).

10 August 2021

How to use JFrog Artifactory to distribute your packages


In a previous article I reviewed some ways to generate a debian or rpm packages for your developed applications. But after packaging your applications you must find an easy way to let your users get your package. You can allways leave then at your releases section at Github to let users download them but downside of that is that any update will be cumbersome to be announced and installed. It's far better to use a package repository to let user install your package using a package manager like apt or yum.

You can try to get your package into official distro repository but chances are you won't fulfill their requirements, so a personal repository is your most likely way to go. 

For long time I used Bintray to host my deb and rpm packages, but Bintray ended it's service as of march of 2021. I've had to look for an alternative. Finally I've found JFrog Artifactory, the official heir of Bintray.

JFrog Artifactory has a free tier for open source projects. If your project is not so popular to exceed 50 GB of monthly download Artifactory should be more than enough for your personal projects.

The only downside is Artifactory is a more complex (and complete) service than Bintray, so it's harder to get it running if you're just a hobbyist. I'm going to explain what I've learnt so far so you have an easier time than me with Artifactory.

Once registered on the platform, you enter the quick setup menu:

There, you can create a repository of any of supported packages. For this article I'm going for Debian. Click on Debian icon and select to create a new repository.

In next window you are asked to give a name (a prefix) for repository:

 


In the screenshot I'm calling vdist my repository. Artifactory creates a virtual, a remote and a local repositories. As far as I know the only repository useful at this point is the local one, so when following this article be sure to always select "debian-local" option.

Next window is deceptively simple as it can make you think you are ready to upload packages following instructions given in Deploy and Resolve tags:

 
Problem is that you need to configure some things before your repository is fully operational, as I've learnt the hard way.

First, you need to allow anonymous access to your repository to allow people download your packages. What is most confusing here is that anonymous access is configured (you can see that permission in Administration > Identity and Access > Permissions) but apparently it does not seem to work at all, so when you try to access to your repository using apt you only get a unauthorized error. The gotcha here is that first you need to enable globally anonymous access at Administration > Security > Settings:

 


 

Only after checking that option you will end getting unauthorized error.

To configure client linuxboxes to use your repository, just include in your /etc/sources.list file:

 deb https://dlabninja.jfrog.io/artifactory/<REPOSITORY_NAME>-debian-local <DISTRIBUTION> <COMPONENT>

In my example REPOSITORY_NAME is vdist, DISTRIBUTION is the name of distribution you are targeting (for example, in Ubuntu, it could be trusty) and por COMPONENT I use main. By the way, dlabninja is the tenant name I gave myself when I registered at Artifactory, yours is going to be different.

You may think you're ready to start uploading packages, but I'm afraid you're not yet. If you try to use apt to access your repository at this point you're going to get an error saying your repository is not signed so accessing to it is forbidden. To fix that you must create a pair of GPG keys to sign your packages and upload them to Artifactory.

To create a GPG key you can type at your console:



Type an identification name, an email and a password when asked. For name I use the repository one. Take note of the password you used If you forget it there is no way to recover it. Big string beginning with "F4F316" and ending with "010E55" is my key id, yours will be similar. That string will be useful to identify your key in gpg commands.

You can list your keys:


To upload your generated key, you first need to export it to a file. That export need to be split in two files: first you export your public key and afterwards your private key:

 

With first command I exported vdist public key and with second the private one. Note I've given an explanatory extension to exported files. This is a good moment to store these keys in a safe place.

To upload those files to Artifactory you need to go to Administration > Artifactory > Security > Keys management:

 

There, select "+ Add keys" at "Signing keys" tab. In the opening window enter the name for this signing key (in this case "vdist"), drag and drop over it exported key files and enter private key password. When done you'll have your key properly imported in Artifactory and ready to be used.

To configure imported GPG keys with a repository go to Administration > Repositories, select you repository and "Advanced" tab. There you have a "Primary key name" combo where you can select your key. Don't forget to click "Save & Finish" before leaving or any change will be lost:

 

Once done, you won't get an unsigned repository error with apt, but you'll still get an error:

(Click to enlarge)

Package manager complains because although repository is GPG signed it does not recognize its public key. To solve it we must upload our public key to one of the free PGP registries so our users can download and import it. For this matter I send my public keys to ubuntu.keyserver.com:


Once a public registry has a public key they sync with others to share them. Our user must import that public key and tell her package manager that public key is trusted. To do it we must be sudo:

Obviously we must do the same thing If we want to try to install our own package.

After that, sudo apt update will work smoothly:


Finally we are really ready to upload our first package to our repository. You have two way to do it: manually and programmatically.

You can upload packages manually through Artifactory web interface going to Artifactory > Artifacts > Selecting repository (in my example vdist-debian-local) > Deploy (button at the upper right corner). It opens a pop up window where you can drag an drop your package. Make sure that "Target repository" field is properly set to your repository (it is easy to send your package to the wrong repository).

Besides, Artifactory let you upload packages from command line what makes it perfect to do it programmatically in continuous integration workflow. You can see needed command in Artifactory > Artifacts > Set me up (button at the upper right corner). It opens a pop-up window with a tab called "Deploy" where you can find commands needed to deploy packages to a given repository:

As you can see commands has place holders for many fields. If you are not sure what to place at USERNAME and PASSWORD fields, go to configure tab, type your administrative password there and return to Deploy tab to see how those fields have been completed to you.

07 August 2021

How to use Travis continuous integration with your python projects

When your project begin to grow testing it, building it and deploying it becomes more complex.

You usually start doing everything manually but at some point you realize its better to keep everything in scripts. Self made building scripts may be enough for many personal projects but sometimes your functional test last longer than you want to stay in front of your computer. Leaving functional test to others is the solution for that problem and then is when Travis-CI appears.

Travis-CI provides an automation environment to run your functional tests and any script you want afterwards depending on your tests success or failure. You may know Jenkins, actually Travis-CI is similar but simpler and easier. If you are in an enterprise production environment you probably will use Jenkins for continuous integration but if you're with a personal project Travis-CI may be more than enough. Nowadays Travis-CI is free for personal open source projects.

When you register in Travis-CI you are asked to connect to a GitHub account, so you cannot use Travis without one. From there on you will always be asked to log in GitHub to enter your Travis account.

Once inside your Travis account you can click the "+" icon to link any of your GitHub repositories that you want to link to Travis. After you switch every Github repository you want to build from Travis you must include in their root a file called .travis.yml (notice dot at the very beginning), its content tell Travis what to do when he detects a git push over repository.

To have something as an example let assess travis configuration for vdist. You have many other examples in Travis documentation but I think vdist building process covers enough Travis features to give you a good taste of what can you achieve with it. Namely, we are going to study an steady snapshot of vdist's .travis.yml file. So, please, keep an open window with that .travis.yml code while you read this article. I'm explaining an high level overview of workflow and aferwards we're going to explain in deep every particular step of that workflow.

Workflow that travis.yml describes is this:

vdist build workflow (click on it to see it bigger)

When I push to staging branch at Github, Travis is notified by Github through a webhook and then it downloads latest version of your source code and looks for a .travis.yml in its root. With that file Travis can know which workflow to follow.

Namely with vdist, Travis looks for my functional test and run them using a Python 3.6 interpreter and the one marked as nightly in Python repositories. That way I can check my code runs fine in my target Python version (3.6) and I can find in advance any trouble that I can have with next planned release of Python. If any Python 3.6 test fails building process stops and I'm emailed with a warning. If any nightly Python version fails I'm emailed with a warning but building process continues because I only support stable Python releases. That way I can know if I have to workout any future problem with next python release but I let build proces continues if tests succeed with current stable Python version.

If tests suceed staging branch is merged with master branch at Github. Then Github activates two webhooks to next sites:
Those webhooks are one of nicest Github features because they let you integrate many services, from different third party vendors, with your Github workflow.

While Github merges branches and activates webhooks, Travis starts packaging process and deploys generated packages to some public repositories. Packages are generated in three main flavours: wheel package, deb package and rpm package. Wheel pakages are deployed to Pypi, while deb and rpm one are deployed to my Bintray repository and to vdist Github releases page.

That is the overall process. Lets see how all of this is actually implemented in Travis using vdist's .travis.yml.


Travis modes


When Travis is activated by a push in your repository it begins what is called as a build.

A build generates one or more jobs. Every job clones your repository into a virtual environment and then carries out a series of phases. A job finishes when it accomplishes all of its phases. A build finishes when all of its jobs are finished.

Travis default mode involves a lifecycle with these main phases for its jobs:

  1. before_install
  2. install
  3. before-script
  4. script
  5. after_sucess or after_failure
  6. before_deploy
  7. deploy
  8. after_deploy
  9. after_script
Actually there are more optional phases and you don't have even to implement everyone I listed. Actually only script phase is really compulsory. Script phase is where you usually run your functional tests. If you are successful with your tests, phases from after_success to after_script are run but if you are unsuccessful only after_failure is run. Install phase is where you install your dependencies to be ready to run your tests. Deploy phase is where you upload your generated packages to your different repositories so you usually use before_deploy phase to run commands needed to generate those packages.


 

 

Why do we say that a build can have one or more jobs? Because you can set what is called a build matrix. A build matrix is generated when you set you want to test your code with multiple runtimes and or multiple environment variables. For instance, you could set that you want your code tested against python 2.7, 3.6 and a development version of 3.7, so in that case a build matrix with three jobs are generated.

Problem with this mode is that build matrix generates complete jobs, so each one runs an script (test) and a deploy phases. But the thing is that sometimes you just want to run multiples test but just one a deploy phase. For example, guess we are building a python project whose code is prepared to be run both in python 2.7 and 3.6, in that case we would like to test or code against python 2.7 and 3.6 but, on success, generate just one package to upload it to pypi. Oddly, that kind of workflow seems not to be natively supported by Travis. If you try to use its default mode to test against python 2.7 and 3.6 you may find that you generate and deploy your resulting package twice.

 
 
 

 
Thankfully, Travis has what they call stage mode that, although is still officially in beta, works really well and solves the problem I described with default mode. 

In stage mode Travis introduces the stage concept. A stage is formally a group of jobs that run in parallel as part of a sequencial build process composed of multiple stages. Whereas default mode runs jobs parallely from start to end, stage mode organizes work across sequencial stages and inside those stages is where parallel jobs can be run.
 
 

 

In our example and stage can be created to run parallely two jobs to test both python 2.7 and 3.6 and later, in case of success, another stage can be launched to create and deploy a single package.

As this is exactly what I needed for vdist, this mode (stage mode) is the one I'm going to show in this article.


Initial configuration

Take a look to our .travis.yaml example. From lines 1 to 11 you have:



In those lines you set Travis general setup.

You first set your project native language using tag "language" so Travis can provide a virtual environment with proper dependencies installed.

Travis provides two kinds of virtual environments: default virtual environment is a docker linux container, that is lightweight and so it is very quick to be launched; second virtual environment is a full weight virtualized linux image, that takes longer to be launched but sometimes allow you things that containers don't. For instance, vdist uses docker for its building process (that's why I use docker "services" tag), so I have to use Travis full weight virtual environment. Otherwise, if you try running docker inside a docker container you're going to realize it does not work. So, to launch a full weight virtual environment you should set a "sudo: enabled" tag.

By default Travis uses a rather old linux version (Ubuntu Trusty). By the time I wrote this article there were promises about a near availability of a newer version. They say keeping environment at the latest ubuntu release takes too much resources so they update them less frequently. When update arrives you can ask to use it changing "dist: trusty" for whatever version they make available.

Sometimes you will find that using an old linux environment does not provide you with dependencies you actually need. To help with that Travis team try to maintain a customizedly updated Trusty image available. To use that specially updated version you should use "group: travis_latest" tag.


Test matrix

From lines 14 to 33 you have:

 


There, under "python:" tag, you set under which versions of python interpreter you want to run your tests.

You might need to run test depending not only on python interpreter versions but depending of multiples environment variables. Setting them under "matrix:" is your way to go.

You can set some conditions to be tested and be warned if they fail but not to make end the entire job. Those conditions use "allow_failures" tag. In this case I let my build continue if test with a under development (nightly) version of python fails, that way I'm warned that my application can fail with a future release of python but I let it be packaged while tests with official release of python work.

You can set global environment variables to be used by your building scripts using "global:" tag.

If any of those variables have values dangerous to be seen in a public repository you can cypher them using travis tools. First make sure you have travis tools installed. As it is a Ruby client you first have to install that interpreter:

dante@Camelot:~/$ sudo apt-get install python-software-properties dante@Camelot:~/$ sudo apt-add-repository ppa:brightbox/ruby-ng dante@Camelot:~/$ sudo apt-get update dante@Camelot:~/$ sudo apt-get install ruby2.1 ruby-switch dante@Camelot:~/$ sudo ruby-switch --set ruby2.1


Then you can use Ruby package manager to install Travis tool.

dante@Camelot:~/$ sudo gem install travis


With travis tool installed you can now ask it to cypher whatever value you want.

dante@Camelot:~/$ sudo travis encrypt MY_PASSWORD=my_super_secret_password


It will output a "secure:" tag followed by a apparently random string. You can now copy "secure:" tag and cyphered string to your travis.yaml. What we've done here is using travis public key to cypher our string. When Travis is reading our travis.yaml file it will use its private key to decypher every "secure:" tag it finds.


Branch filtering

 


This code comes from lines 36 to 41 of our .travis.yaml example. By default Travis activates for every push in every branch in your repository but usually you want to restrict activation to feature branches only. In my case I activate builds in pushes just over "staging" branch.


Notifications


 

As you can see at lines 44 to 51, you can setup which email recipient should be notified either or both in success or failure of tests:



Testing

From line 54 to 67 we get to testing, the real core of our job:




As you can see, actually it comprises three phases: "before_install", "install" and "script".

You can use those phases in the way more comfortable for you. I've used "before_install" to install all system packages my tests need, while I've used "install" to install all python dependencies.

You launch your tests at "script" phase. In my case I use pytest to run my tests. Be aware  that Travis waits for 10 minutes to receive any screen output from your test. If none is received then Travis thinks that test got stuck and cancel it. This behavior can be a problem if is normal for your test to stay silent for longer than 10 minutes. In that case you should launch your tests using "travis_wait N" command where N is the number of minutes we want our waiting time to extend by. In my case my tests are really long so I ask travis to wait 30 minutes before giving up.


Stages

In our example files stages definitions are from line 71 to 130.

Actually configuration so far is not so different than it would be if we were using Travis default mode. Where big differences really begin is when we find a "jobs:" tag because it marks the beginning of stages definition. From there every "stage" tag marks the start of an individual stage definition.

As we said, stages are executed sequentially following the same order they have in travis.yaml configuration. Be aware that if any of those stages fails the job is ended at that point.

You may ask yourself why testing is not defined as a stage, actually it could be and you should if you wanted alter the usual order and not execute tests at the very beginning. If test are not explicitly defined as a stage then they are executed at the begining, as the first stage.

Let assess vdist stages.

Stage to merge staging branch into master one

From line 77 to 81:

 


If tests have been successful we are pretty sure code is clean so we merge it into master. Merging into master has the side effect of launching ReadTheDocs and Docker Hub webhooks.

To keep my main travis.yaml clean i've taken all the code needed to do the merge to an outside script. Actually that script runs automatically the same console commands we would run to perform the merge manually.


Stage to build and deploy to an artifact service

So far you've checked your code is OK, now it's time to bundle it in a package and upload it to wherever your user are going to download it.

There are many ways to build and package your code (as many as programming languages) and many services to host your packages. For the usual packages hosting services Travis has predefined deployment jobs that automated much of the work. You have an example of that in lines 83-94:

 



There you can se a predefined deployment job for python packages consisting in two steps: packaging code to a wheel package and uploading it to Pypi service.

Arguments to provide to a predefined deployment jobs varies for each of them. Read Travis instructions for each one to learn how to configure the one you need.

But there are times the package hosting service is not included in the Travis supported list linked before. When that happens things get more manual, as you can ever use an script tag to run custom shell commands. You can see it in lines 95-102:


In those lines you can see how I call some scripts defined in folders of my source code. Those scripts use my own tools to package my code in rpm an deb packages. Although in following lines I've used Travis predefined deployment jobs to upload generated packages to hosting services, I could have scripted that too instead. All package hosting services provide a way to upload packages using console commands, so you always have that way if Travis does not provides a predefined job to do it.


 Debugging


Often your Travis setup won't work on the first run so you'll need debug it.

To debug your Travis build your first step is reading your build output log. If log is so big that Travis does not show it entirely in browser then you can download raw log and display it in your text editor. Search in your log for any unexpected error. Later try to run your script in a virtual machine with the same version Travis uses. If error found in Travis build log repeats in your local virtual machine then you have all you need to find out what does not work.

Things get complicated when error does not happen in your local virtual machine. Then error resides in any particularity in Travis environment that cannot be replicated in your virtual machine, so you must enter in Travis environment while building and debug there.

Travis enables by default debugging for private repositories, if you have one then you'll find a debug button just below your restart build one:


If you push that button build will start but, after the very initial steps, it will pause and open a listening ssh port for you to connect. Here is an example output of what you see after using that button:

Debug build initiated by BanzaiMan
Setting up debug tools.
Preparing debug sessions.
Use the following SSH command to access the interactive debugging environment:
ssh DwBhYvwgoBQ2dr7iQ5ZH34wGt@ny2.tmate.io
This build is running in quiet mode. No session output will be displayed.
This debug build will stay alive for 30 minutes.


In last example you would connect with ssh to DwBhYvwgoBQ2dr7iQ5ZH34wGt@ny2.tmate.io to get access to a console in Travis build virtual machine. Once inside your build virtual machine you can run every build step calling next commands:

travis_run_before_install
travis_run_install
travis_run_before_script
travis_run_script
travis_run_after_success
travis_run_after_failure
travis_run_after_script

Those command will activate respective build steps. When expected error appear at last you can debug environment to find out what is wrong.

Problem is that debug button is not available for public repositories, but don't panic you still can use that feature but you'll need to do some extra steps. To enable debug feature you should ask for it to Travis support through email to support@travis.ci.com. They will grant you in just a few hours.

Once you receive confirmation from Travis support about debug is enabled, i'm  afraid you won't see debug button yet. The point is that although debug feature is enabled, for public repositories you can call it through an api call only. You can launch that api call from console with this command:

dante@Camelot:~/project-directory$ curl -s -X POST \ -H "Content-Type: application/json" \ -H "Accept: application/json" \ -H "Travis-API-Version: 3" \ -H "Authorization: token ********************" \ -d "{\"quiet\": true}" \ https://api.travis-ci.org/job/{id}/debug

To get your token you should run this commands before:

dante@Camelot:~/project-directory$ travis login We need your GitHub login to identify you. This information will not be sent to Travis CI, only to api.github.com. The password will not be displayed. Try running with --github-token or --auto if you don't want to enter your password anyway. Username: dante-signal31 Password for dante-signal31: ***************************************** Successfully logged in as dante-signal31! dante@Camelot:~/project-directory$ travis token Your access token is **********************


In order to get a valid number for {id} you need to enter to the last log of the job you want to repeat and expand "Build system information" section. There, you can find the number you need at "Job id:" line.

Just after launching api call you can ssh to your Travis build virtual machine and start your debugging.

To end debugging session you can either close all your ssh windows or cancel build from web user interface.

But you should be aware about a potential danger. Why this debug feature is not available by default for public repositories? because ssh listening server has no authentication so anyone who knows where to connect will get console access to the same virtual machine. How an attacker would know where to connect? watching your build logs, if your repository is not private your travis logs are public by default at realtime, did you remember?. If and attacker is watching your logs and in that very moment you start a debug session she will see the same ssh connection string than you and will be able to connect to virtual machine. Inside Travis virtual machine the attacker can echo your environment variables and any secrets you have in them. Good news are that any new ssh connection will be attached to the same console session so if an intruder sneaks you will see her commands echoing in your session, giving you a chance to cancel debug session before she gets anything critical.

11 July 2017

Publishing your documentation


Documenting your application is paramount. You cannot expect many people will use your application if they don't know how.

There are many ways to document your application. You could write a text file to be included among your sources, but that is its very problem: it must be found among many other files so they can be easily overlooked. Besides, it has to be simply formatted unless you want your users to have a specific reader: acrobat, word, etc.

Another way, and the one we're going to cover here, is using a markup language to write a textual file to be used as source by a builder to create a richly formated web based documentation. Here there're many markup options but the two main ones are Markdown and ReStructuredTex (RST): the later is more extensible and the de facto standard in Python world, but the former is simpler, easy to use and the default option in Github.

To host generated web documentation you could use any web server but there are many free services out there for open source projects. Here we are going to cover Github wikis and ReadTheDocs service.

Github wikis


Easiest of all is Github that allows documentation hosting throught its wiki pages. Github wikis allows you to use both markdown and RST, editing page directly on Github through a browser or downloading wiki pages as a git repository, apart from the one of your source code. Just keep in mind that although you can make branches, in your wiki git repository, only
pushes to master branch will be rendered and displayed on Github. URL to clone your wiki repository can be found in Github wiki tab. Besides, be aware that if you push your pages files you are supossed to give page files a standard extension: .md for Markdown or .rst for reStructuredText.

Besides documentation editing can be perfomed collaboratively asigning permissions to different users.

To create a new wiki in your Github repository just click on wiki tab inside your project page and add a new page. You can include images already included in any folder inside your repository, add linkssidebars and even footers.

The page called Home is going to be your default landing page. I haven't found a way to order pages in default toolbar so, to keep a kind of index, I use to include in my home page a table of contents with links to pages. Of course you could create a custom sidebar too with correct order pages but it will be under pages sidebar which I don't know yet how to hide it.

ReadTheDocs


Github wikis are easy to use and possibly powerful enough for most projects, but sometimes you need a nicer formatting or a documentation site not related to your code repository on Github, there is where ReadTheDocs comes into play. ReadTheDocs can be connected throught webhooks to Github repository, so any pushes on that repository is going to fire documentation build process automatically. Besides it can keep multiple versions of your documentation, which is useful if your
application usage differs depending on the version the user is using, and it can build a PDF or ePUB files of your documentation to make it available to be downloaded.

Once you register in ReadTheDocs you should go to Settings --> Connected Services to link ReadTheDocs account to the one you use in Github or Bitbucket. After that you should go to My Projects --> Import a Project to create the connection with the repository where your documentation is. You are going to be asked about which name to put for documentation project. That name is going to be used to assign a public URL to your project.

Nevertheless, be aware that you cannot change your project name later (you only could remove project and recreate later with another name). Why you would want to change your project name? maybe you realize another name better or any of it's character happens to be troublesome, for instance I created a project called "vdist-" (note hyphen), after some time I realized that generated url (http://vdist-.readthedocs.io/en/latest/) didn't rendered at all in some browsers and from some locations (my PC chrome browser got page right but my smartphone chrome gave me a DNS error). I recreated project using "vdistdocs" as project name and problem was solved.

I don't know what happens in Bitbucket, but in Github you must allow webhooks updates to ReadTheDocs through your Settings --> Webhooks Github page.

When your project appears in your project pages you can click on them to enter its settings. There you can set type of markup used in your documentation. Markdown is specifically identified and RST is any of the Sphink-like options. In Advanced Settings you can set to create a PDF or ePUB version of your documentation with each build and you can set if your project should be publicly available. Apart from that, ReadTheDocs has a lot of black magic features to be configured through its settings, but I think the simplest configuration you would need includes the parameters I've explained.

For me, the hardest part to configure ReadTheDocs was to figure out which document file structure was needed to let ReadTheDocs render them right. There are two possible setups depending whether your documentation is Markdown or RST based.

For Markdown documentation you should include a mkdocs.yml file in your repository. That file contains the very basic configuration needed by ReadTheDocs to render our documentation. An example of this file could be the one from vdist project:

site_name: vdist
theme: readthedocs
pages:
- [index.md, Home]
- [whatisvdist.md, What is vdist]
- [usecases.md, Use cases]
- [howtoinstall.md, How to install]
- [howtouse.md, How to use]
- [howtocustomize.md, How to customize]
- [buildenvironment.md, Optimizing your build environment]
- [qanda.md, Questions and Answers]
- [howtocontribute.md, How to contribute]
- [releasenotes.md, Release notes]
- [roadmap.md, Development roadmap]


As you can see, mkdocs.yml is simple. It only contains your project name, to be displayed as the title of your documentation site, visual theme to apply to rendered documentation and a table of content linking your markdown pages with each section. That table of contents is the one that will appear as a left sidebar when documentation is rendered.

Using mkdocs file as a guideline, ReadTheDocs will search for a doc o docs folder inside your repository, rendering its contents, if none of those folders is found ReadTheDocs will search in top level forlder of your project.

For RST documentation, process is similar but what it is used as a guideline is a conf.py file generated with Sphinx. Here is a nice tutorial about how to setup sphinx to create files to be imported in ReadTheDocs.

The only problem I've found about ReadTheDocs is not exactly related to them but with their compatibility with RST documentation generated for Github wikis. Problem is that Github wikis expects your RST links to be formatted diferently as ReadTheDocs does. There is a good stackoverflow discussion about the topic. You should aware of it to structure in advance your documentation with the workaround explained there if you want to publish on both Github and ReadTheDocs.

14 June 2017

Packaging Python programs - DEB (and RPM) packages

Native way to distribute Python code is through PyPI, as I explained in a previous article. But to be honest PyPI has some drawbacks that limit its use to developing environments.

Problem with PyPI is that it does not implement a proper package management so pip uninstall doesn't work properly, and there's no way to rollback to a previous state. Besides pip installation procedures often build packages from source which can be painfully slow for a complete virtualenv.

Deploying Python application on production environments should be fast and should be done in a manner that clean uninstalling would be always available.

Debian has a really solid and stable package managing system. It can check dependencies either in installation and unistallation, and run pre and post installation scripts. That's why is one of the most used package managing system in Linux ecosystem.

This article is going to cover some ways to package our Python applications in Debian packages easily installable in Debian/Ubuntu Linux distros.

Debian package format

Although other linux packaging systems rely on binary formats, Debian packages use a simply ar archive to group in a single file a pair of tar archives, compressed with gzip or bzip. One of those two archives contains a folder tree with all application files and the another one the package configuration files.

Package configuration files are bare text files so the classic way of creating Debian packages involves only a bit of folder creation, textual editing of some configuration files and in it's simplest form just a command to be run:

dante@Camelot:~/project-directory$ debuild -us -uc


You can fin a good tutorial about the topic here.

The thing is not complex but can be tricky and you have to create and edit many textual files. It's not hard to see why developers have created so many tools to automate the task. We are going to see some of them specialised in packaging Python applications.

STDEB

If you are already used to python packaging for PyPI it shouldn't be hard for you to grasp stdeb concepts. If you don't know what I'm speaking about then you should read article I linked at the beggining of this one.

As stdeb call some Debian/Ubuntu tools under the hood you have to use one of those distros or any of their derivatives. If you are in a Debian/Ubuntu, you can download stdeb from standard linux repositories:

dante@Camelot:~$ sudo aptitude search stdeb
p   python-stdeb    - Python to Debian source package conversion utility                                 p   python3-stdeb   - Python to Debian source package conversion plugins for distutils                  
dante@Camelot:~$ sudo aptitude install python-stdeb python3-stdeb

But if you want to get a newer version you can also install stdeb from PyPI as a Python package.

The best workflow to use stdeb involves creating the necessary files to create a PyPI package, that way stdeb can use your setup.py file to get the info for creating debian package configuration files.

So, let's suppose you have your application ready to be packaged for PyPI with your setup.py already done. For this example, I've cloned this git repository.

To make stdeb generate a source debian package you just have to do:

dante@Camelot:~/geolocate$ python3 setup.py --command-packages=stdeb.command sdist_dsc

The --command-packages stuff can be cumbersome but the thing doesn't work without it, so rely on me and include it.

After some verbosy output you'll realize some folders are added to your working directory. You have to focus on the one called deb_dist. That folder contains mainly three files: a .dsc file, a .orig.tar.gz one and a .diff.gz. Those three files together are what we call a debian source package.

Inside deb_dist folder it you'll find another one called as your project with current version appended. That folder contains all generated data to compile a binary debian package, so move there and run next command:

dante@Camelot:~/geolocate/deb_dist/glocate-1.3-0$ dpkg-buildpackage -rfakeroot -uc -us

The fakeroot thing is just a flag to be able to build a debian package withount being logged as root user. That is the command recommended in stdeb documentation, but I've realized that you can use debuild command seen before:

dante@Camelot:~/geolocate/deb_dist/glocate-1.3-0$ debuild -uc -us

In the end debuild is a wrapper for dpkg-buildpackage that automates some things you should do manually otherwise.

After either of those commands you should find a source debian package in your deb_dist folder:

dante@Camelot:~/geolocate/deb_dist/glocate-1.3-0$ ls *.deb python3-glocate_1.3.0-1_all.deb

Those are the "two step" method, you can reduce it to "just one step" method running next command:

dante@Camelot:~/geolocate$ python3 setup.py --command-packages=stdeb.command bdist_deb

Although both methods ends with a .deb package in debian_dist folder, you should be aware that generated debian package is architecture dependent (althought it package names include a "_all" tag), that means that if you generate the package in an amd64 Ubuntu box chances are that you will face problems if you try to install resulting package in another Ubuntu box with a different architecture. To avoid this problem you have two options:
  • Use virtual machines to compile a debian package in every target debian architecture.
  • Use a PPA repository: Ubuntu offers personal hosting space for packaging project. You just upload there your source package (the content of deb_dist folder after running "python3 setup.py --command-packages=stdeb.command sdist_dsc"), and Ubuntu server compile it to each Ubuntu target architecture (mainly x86 and amd64). After compilation, created packages are available in your personal PPA until you remove them or replace them with newer versions.
If your Python applications just use built-in packages then your packaging trip ends here but if you use additional packages, for instance downloaded from PyPI, chances are that your compiled package doesn't include them properly.

Following our example, if you check geolocate's setup.py you should see that it depends of these additional packages:

install_requires=["geoip2>=2.1.0", "maxminddb>=1.1.1", "requests>=2.5.0", "wget>=2.2"]

Lets see if these dependencies have been included in generated debian package metadata:

dante@Camelot:~/geolocate/deb_dist$ dpkg -I python3-glocate_1.3.0-1_all.deb [...] Depends: python3, python3-requests, python3:any (>= 3.3.2-2~) [...]


Obviously they have not been included. In fact our compiling command warned us that dependency check failed at building time. If we check building output we'll find this output:
[...]
I: dh_python3 pydist:184: Cannot find package that provides geoip2. Please add package that provides it to Build-Depends or add "geoip2 python3-geoip2-fixme"
line to debian/py3dist-overrides or add proper  dependency to Depends by hand and ignore this info.
I: dh_python3 pydist:184: Cannot find package that provides maxminddb. Please add package that provides it to Build-Depends or add "maxminddb python3-maxmindd
b-fixme" line to debian/py3dist-overrides or add proper  dependency to Depends by hand and ignore this info.
I: dh_python3 pydist:184: Cannot find package that provides wget. Please add package that provides it to Build-Depends or add "wget python3-wget-fixme" line t
o debian/py3dist-overrides or add proper  dependency to Depends by hand and ignore this info.
[...]
Problem is that stdeb didn't identify which linux packages include those PyPI packages. So we have to set them manually.

To manually configure stdeb you have to create a file called stdeb.cfg in the same folder than setup.py. You usually will create a [DEFAULT] section where you'll put your configuration but you can create too [package_name] sections, where package_name is specified as the name argument to the setup() command.

For instance, if we find out that geoip2 PyPI library is included inside python3-geoip Ubuntu repository's package, and requests PyPI library is included inside python3-requests we could create a stdeb.cfg with this content:
[DEFAULT]
X-Python3-Version: >= 3.4
Depends3: python3-geoip (>=1.3.1), python3-requests
All tags follow the same format as they would have if they were inserted in debian/control. For depends tag, format specification is here.

With that configuration, the given dependencies are included in generated debian package:

dante@Camelot:~/geolocate/deb_dist$ dpkg -I python3-glocate_1.3.0-1_all.deb
[...]
Depends: python3, python3-requests, python3:any (>= 3.3.2-2~), python3-geoip (>= 1.3.1)
[...]

What I haven't found out yet is a way to change the architecture tag of generated package so it is no longer generated with "Architecture: all".

At first glance, stdeb looks great and indeed it is but actually it has too some serious drawbacks.

Problem is that stdeb limits you to use only libraries and python packages from your standard linux repository. If you develop your application using PyPI libraries, chances are that when you try to find which linux package includes your PyPI library you'll find that those packages contain only older versions than those you downloaded from PyPI. Worse even, many PyPI libraries have not been ported to standard linux repositories so you won't find any package to match your dependency. For instance, geolocate needs to use geoip2 (v.2.1.0) which is easily downloadable from PyPI but only Ubuntu 15.04 has an available package called python3-geoip, but this one comes with version 1.3.2 of geoip. Will geolocate work with geoip version provided by python3-geoip package? probably won't. Other geolocate dependencies are even missing in standard repository, like PyPI wget python library.
 
It's clear that if you like to use PyPI libraries stdeb may not be your best option. But if you develop using only libraries available through your standard package manager then stdeb will probably save you the day.

FPM

FPM is a Ruby tool similar to stdeb. It's main advantage is that you can use FPM to create many kinds of packages installer, not only Debian ones, currently RPM packages (for Red Hat distros). And what is even more interesting FPM allows package conversions, for example from RPM to DEB.

FPM has no package in Ubuntu main repository, so you have to download it from Ruby's equivalent of PyPI. To get that you should install first Ruby packages:


dante@Camelot:~/geolocate$ sudo aptitude install ruby-dev gcc make

Afterwards you can install FPM from Ruby repositories:


dante@Camelot:~/geolocate$ gem install fpm

Creating a DEB package is pretty simple, just set FPM source as python, its target as deb and give it your package setup.py file path:


dante@Camelot:~/geolocate$ fpm -s python -t deb ./setup.py

FPM takes dependency names from setup.py and prepend them with the tag you set with --python-package-name-prefix flag (if not set then python prefix is used):


dante@Camelot:~/geolocate$ dpkg -I python-glocate_1.3.0_all.deb
[...]
Depends: python-geoip2 (>= 2.1.0), python-maxminddb (>= 1.1.1), python-requests (>= 2.5.0), python-wget (>= 2.2)
[...]

Problem here is similar than in stdeb: those dependencies doesn't exists in standard Ubuntu repository. In case dependencies would exists but with different names than those autogenerated by FPM, then you could set them manually:


dante@Camelot:~/geolocate$ fpm -s python -t deb --no-auto-depends -d "python3-geoip>=1.3.1, python3-wget" ./setup.py
[...]>
dante@Camelot:~/geolocate$ dpkg -I python-glocate_1.3.0_all.deb 
[...]
Depends: python3-geoip>=1.3.1, python3-wget
[...] 


Another useful flag is "-a native". This flag sets package architecture to the one of your system, so it is not set any longer to "_all" in your package name

FPM is a great tool. It allows to create RPM packages and it is very configurable but in my opinion it has one serious drawbacks: as happened with stdeb, it is useless if your applications imports a library available in PyPI but not in standard operating system repositories.

DH-VIRTUALENV

Up to this point it should be clear the main problem to package a Python a application is ensuring to meet its dependencies at installation, because developer may have used PyPI libraries not available through Linux standard repositories at user end.

The guys at Spotify developed a packager to address this problem: dh-virtualenv.

This assumes that if you are using PyPI to develop then you are likely using virtualenvs. So dh-virtualenv includes the entire virtualenv into the package so you have not to install them in user end.

Nevertheless, in my humble opinion dh-virtualenv has a serious drawback: it is not as cleaner to use as stdeb or fpm (because you have to create manually a debian folder and a rules file) and you end using debuild as in the beginning of the article.

VDIST

Main concepts of vdist are similar to those seen in dh-virtualenv but vdist uses a combination of docker and fpm to create operating system standard package. This tool lets you build linux packages from your Python applications while aiming to build an isolated environment for your Python project using virtualenv. At first glance vdist may looks complex but its documentation it really clear and helpful, and actually is quite simple to use and automate.

If your main problem while packaging python application is to ensure dependencies are present at user end, vdist solves this making your application self contained and self sufficient so it does not depend on OS provided packages of Python modules. This means that packages generated by vdist contain your application, all python dependencies needed by your application, and a Python interpreter. That python interpreter allows to run your application with the interpreter of your choice not with the one shipped with the OS you're deploying on.

To ensure the host used to build the package keeps its system packages intact, vdist uses docker to create a clean OS image at build time and install there needed dependencies before your application is being packaged on top of it. Thanks to this your build machines will always be reverted to it's original state. To load your application into docker image, vdist downloads application source code from a git repository, so having your application in Bitbucket or Github is a good idea. Downloaded source code is placed in a virtualenv created inside your docker image. Pypy dependencies will be installed inside virtualenv

Main dependency for using vdist is having docker installed and its daemon running. To install it in Ubuntu you just need to do the following:


dante@Camelot:~/geolocate$ sudo aptitude install docker.io python-docker python3-docker


After installing docker, remember to add your user to docker group:


dante@Camelot:~/geolocate$ sudo usermod -a -G docker dante


You may need to restart your system to be sure the group is really updated.

Easiest way to install vdist is to install it using your standard package tools. Vdist packages are hosted at Bintray, so to install them from there you should include Bintray in your system repositories before anything else. To do it in Ubuntu just type:


dante@Camelot:~$ sudo apt-get update
dante@Camelot:~$ sudo apt-get install apt-transport-https
dante@Camelot:~$ sudo echo "deb [trusted=yes] https://dl.bintray.com/dante-signal31/deb generic main" | tee -a /etc/apt/sources.list
dante@Camelot:~$ sudo apt-key adv --keyserver pgp.mit.edu --recv-keys 379CE192D401AB61

Once added bintray in your repositories you can install and update vdist like any other system package. For instance, in Ubuntu:


dante@Camelot:~$ sudo apt-get update
dante@Camelot:~$ sudo apt-get install vdist


If you are in a system where you don't have permission to install system packages you may find interesting installing it from PyPI repository inside a virtualenv created ad-hoc for packaging your application:


(env) dante@Camelot:~/geolocate$ pip install vdist


After installing vdist you are provided with a console command called... vdist. Just be aware that if you have installed vdist inside a virtualenv, that console command only will be available inside that virtualenv.

You have many ways to use vdist, I think the easiest way to use it is creating a configuration file and making vdist read it. Vdist is used to package itself so its configuration file is a good example:

[DEFAULT]
app = vdist
version = 1.1.0
source_git = https://github.com/dante-signal31/${app}, master
fpm_args = --maintainer dante.signal31@gmail.com -a native --url
    https://github.com/dante-signal31/${app} --description
    "vdist (Virtualenv Distribute) is a tool that lets you build OS packages
     from your Python applications, while aiming to build an
     isolated environment for your Python project by utilizing virtualenv. This
     means that your application will not depend on OS provided packages of
     Python modules, including their versions."
    --license MIT --category net
requirements_path = /REQUIREMENTS.txt
compile_python = True
python_version = 3.5.3
output_folder = ./package_dist/
after_install = packaging/postinst.sh
after_remove = packaging/postuninst.sh

[Ubuntu-package]
profile = ubuntu-trusty
runtime_deps = libssl1.0.0, docker.io
build_deps =

[Centos7-package]
profile = centos7
runtime_deps = openssl, docker-ce

Vdist documentation is good enough to know what each parameter is useful for. Just note that you can have just in one configuration file parameters for every package you want to build . Just keep common parameters in [DEFAULT] section and put distribution dependent parameters in separate sections (they can be called as you want but you'd better use expressive names).

Once you have your configuration file you can launch vdist like this (guess configuration file is called configuration_file):


dante@Camelot:~/$ vdist batch configuration_file

Then you'll start to see a bunch of screen output while vdists builds your packages. Generated packages will be placed in folder set in output_folder configuration file parameter.

The smallest package size produced by vdist is about 50 MB because an entire python distribucion has to be included in that package. That size is what you pay for your application self-containment. Discounted those 50 MB, all the rest is due your application and it dependencies. At first glance it may seem big but nowadays is quite usual size for any compiled application you may find out there.

I think vdist is the most complete packaging solution available to deploy python apps in Linux boxes. With it you can deploy even in linux boxes with no python installed at all, giving you a valuable isolation in client end and easying your final user life to install your app.

Disclaimer: I started using vdist to write this article and I've ended being the current main contributor to its development, so feel free to comment any further improvement you feel could be interesting.