18 September 2021

How to parse console arguments in your Python application

There is one common pattern for every console application: it has to manage user arguments. Few console applications runs with no user arguments, instead most applications needs user provided arguments to run properly. Think of ping, it needs at least one argument: IP address or URL to be pinged:

dante@Camelot:~/$ ping 8.8.8.8
PING 8.8.8.8 (8.8.8.8) 56(84) bytes of data.
64 bytes from 8.8.8.8: icmp_seq=1 ttl=113 time=3.73 ms
64 bytes from 8.8.8.8: icmp_seq=2 ttl=113 time=3.83 ms
64 bytes from 8.8.8.8: icmp_seq=3 ttl=113 time=3.92 ms
^C
--- 8.8.8.8 ping statistics ---
3 packets transmitted, 3 received, 0% packet loss, time 2005ms
rtt min/avg/max/mdev = 3.733/3.828/3.920/0.076 ms

dante@Camelot:~$

When you run a python application you in sys.argv list every argument user entered when calling your application from console. Console command is split using whitespaces as separator a each resulting token is a sys.argv list element. First element of that list is your application name. So, if we implemented our own python version of ping sys.argv 0 index element would be "ping" and 1 index element would be "8.8.8.8".

You could do your own argument parsing just assessing sys.argv content by yourself. Most developers do it that way... at the begining. If you go that way you'll soon realize that it is not so easy to perform a clean management of user argument and that you have to repeat a lot of boilerplate in your applications. so, there are some libraries to easy argument parsing for you. One I like a lot is argparse.

Argparse is a built-in module of standard python distribution so you don't have to download it from Pypi. Once you understand its concepts it's easy, very flexible and it manages for you many edge use cases. It one of those modules you really miss when developing in other languages.

First concept you have to understand to use argparse is Argument Parser concept. For this module, an Argument Parser is a fixed word that is followed by positional or optional user provided arguments. Default Argument Parser is the precisely the application name. In our ping example, "ping" command is an Argument Parser. Every Argument Parser can be followed by Arguments, those can be of two types:

  • Positional arguments: They cannot be avoided. User need to enter them or command is assumed as wrong. They should be entered in an specific order. In our ping example "8.8.8.8" is a positional argument. We can meet many other examples, "cp" command for instance needs to positional arguments source file to copy and destination for copied file.
  • Optional arguments: They can be entered or not. They are marked by tags. Abbreviated tags use an hyphen and a char, long tags use double hyphens and a word. Some optional arguments admit a value and others not (they are boolean, true if they are used or false if not).

Here you can see some of the arguments cat command accepts:

dante@Camelot:~/$ cat --help
Usage: cat [OPTION]... [FILE]...
Concatenate FILE(s) to standard output.

With no FILE, or when FILE is -, read standard input.

-A, --show-all equivalent to -vET
-b, --number-nonblank number nonempty output lines, overrides -n
-e equivalent to -vE
-E, --show-ends display $ at end of each line
-n, --number number all output lines
-s, --squeeze-blank suppress repeated empty output lines
-t equivalent to -vT
-T, --show-tabs display TAB characters as ^I
-u (ignored)
-v, --show-nonprinting use ^ and M- notation, except for LFD and TAB
--help display this help and exit
--version output version information and exit

Examples:
cat f - g Output f's contents, then standard input, then g's contents.
cat Copy standard input to standard output.

GNU coreutils online help: <https://www.gnu.org/software/coreutils/>
Full documentation at: <https://www.gnu.org/software/coreutils/cat>
or available locally via: info '(coreutils) cat invocation'

dante@Camelot:~$


There are more complex commands that include what is called Subparsers. Subparsers appear when your command have other "fixed" words, like verbs. For instance, "docker" command have many subparsers: "docker build", "docker run", "docker ps", etc. Every subparser can accept its own set of arguments. A subparser can accept other subparsers too, so you can get pretty complex command trees.

To show how to use argparse module works, I'm going to use my own configuration for my application Cifra. Take a look to it's cifra_launcher.py file at GitHub.

In its __main__ section you can see how arguments are processed at high level of abstraction:

You can see we are using sys.argv to get arguments provided by user but discarding first one because is the root command itself: "cifra".

I like using sys.argv as a default parameter of main() call because that way I can call main() from my functional tests using an specific list of arguments.

Arguments are passed to parse_arguments, that is a my function to do all argparse magic. There you can find the root configuration for argparse:

There you configure root parser, the one linked to your base command, defining a description of you command and a final note (epilog) for your command. Those texts will appear when your user call your command with --help argument.

My application Cifra has some subparsers, like docker command has. To allow a parser to have subparsers you must do:

Once you have allowed your parser to have subparser you can start to create those subparsers. Cifra has 4 subparsers at this level:

 

 

We'll see that argparse returns a dict-like object after its parsing. If user called "cifra dictionary" then "mode" key from that dict-like object will have a "dictionary" value.

Every subparser can have its own subparser, actually "dictionary" subparser has more subparser adding more branches to the command tree. So you can call "cifra dictionary create", "cifra dictionary update", "cifra dictionary delete", etc.

Once you feel a parser does not need more subparsers you can add its respective arguments. For "cipher" subparser we have these ones:

In these arguments, "algorithm" and "key" are positional so they are compulsory. Arguments placed at that position will populate values for "algorithm" and "key" keys in dict-like object returned by argparse.

Metavar parameter is the string we want to be used to represent this parameter when help is called with --help argument. Help parameter as you can guess is the tooltip used to explain this parameter when --helps argument is used.

Type parameter is used to preprocess argument given by user. By default arguments are interpreted like strings, but if you use type=int argument will be converted to an int (or throw an error if it can't be done). Actually str or int are functions, you can use your own ones. For file_to_cipher parameter I used _check_is_file function:

As you can see, this function check if provided arguments points to a valid path name and if that happens returns provided string or raises an argparse error otherwise.

Remember parser optional parameters are always those preceded by hyphens. If those parameters are followed by an argument provided by users, that argument will be the value for dict-like object returned by arparser key called like optional parameter long name. In this example line 175 means that when user types "--ciphered_file myfile.txt", argparser will return a key called "ciphered_file" with value "myfile.txt".

Sometimes, your optional parameters won't need values. For example, they could be a boolean parameter to do verbose output:

parser.add_argument("--verbose", help="increase output verbosity", action="store_true", default=False)

With action="store_true" the dict-like object will have a "verbose" key with a boolean value: true if --verbose was called or false otherwise.

Before returning from parse_arguments I like to filter returned dict-like object content:


In this section parsed_arguments is the dict-like object that argparse returns after parsing provided user arguments with arg_parser.parse_args(args). This object includes those uncalled parameters with None value. I clould leave it that way and check afterwars if values are None before using them, but a I fell somewhat cleaner remove those None'd values keys and just check if those keys are present or not. That's why I filter dict-like object at line 235.

Back to main() function, after parsing provided arguments you can base your further logic depending on parsed arguments dict contents. For example, for cifra:

This way, argparser lets you deal with provided user arguments in a clean way giving you for free help messages and error messages, depending on whether user called --help or entered a wrong or incomplete command. Actually generated help messages are so clean that I use them for my readme.md repository file and as a base point for my man pages (what that is an story for another article).

29 August 2021

How to use Docker containers


Virtualization is all about deception.

With heavy virtualization (i.e. VMware, Virtualbox, Xen) a guest operating system is deceived to think it is running in a dedicated hardware, while it is actually shared. 

With light virtualization (i.e Docker) an application is deceived to think it is using a dedicated operating system kernel, while it is actually shared too. It happens that in Linux everything but the kernel is considered an application so with Docker you can make multiple linux distribution share the same kernel (the one from host system). As this is a higher abstraction level of deception than the kind of VMware it consumes less resources, that's why is called light virtualization. It's so light that many applications are distributed in docker packages (called containers) so application is bundled along and operating system and its dependencies to be run all at once in another system and don't mess with its respective dependencies.

Sure there are things light virtualization cannot do but they are not many. For standard "level-7" application development you won't find any limitation using Docker virtualization.

Installation

In the past you could install docker from your distribution package repository. Nowadays you may find docker package in your usual package repository. But that no longer works. If you want to install docker in your computer you should ignore those packages available in your standard repositories and use official Docker repositories.

Docker provides package repositories for many distributions. For instance, here you can find instructions to install docker in an ubuntu distribution. Just be aware that if you are using an Ubuntu derivative, like Linux Mint, you're going to need to customize those instructions to set your version in apt sources list.

Once you have installed docker in your computer, you can run a Hello World app, bundled in a container, to check everything works:

dante@Camelot:~/$ sudo docker run hello-world
[sudo] password for dante:
Unable to find image 'hello-world:latest' locally
latest: Pulling from library/hello-world
b8dfde127a29: Pull complete
Digest: sha256:7d91b69e04a9029b99f3585aaaccae2baa80bcf318f4a5d2165a9898cd2dc0a1
Status: Downloaded newer image for hello-world:latest

Hello from Docker!
This message shows that your installation appears to be working correctly.

To generate this message, Docker took the following steps:
1. The Docker client contacted the Docker daemon.
2. The Docker daemon pulled the "hello-world" image from the Docker Hub.
(amd64)
3. The Docker daemon created a new container from that image which runs the
executable that produces the output you are currently reading.
4. The Docker daemon streamed that output to the Docker client, which sent it
to your terminal.

To try something more ambitious, you can run an Ubuntu container with:
$ docker run -it ubuntu bash

Share images, automate workflows, and more with a free Docker ID:
https://hub.docker.com/

For more examples and ideas, visit:
https://docs.docker.com/get-started/

dante@Camelot:~$

Be aware that although you might we able to run docker containers you may not be able to download and install them without sudo. If you are not comfortable working like that you can add yourself to docker group:

dante@Camelot:~/$ sudo usermod -aG docker $USER
[sudo] password for dante:

dante@Camelot:~$

You may need to log out and re log to activate change. This way you can run docker commands as an unprivileged user.

Usage 

You can make your own custom containers, but that is a topic for another article. In this article we are going to use containers customized by others.

To find available containers, head to Docker Hub and type any application or Linux distribution in its search field. Output will show you many options. If you're looking for a raw linux distribution you'd better use those tagged as "Official image". 

Guess you want to try an app in an Ubuntu Xenial, then select Ubuntu is search output and take a look to "Supported tags..." sections. There you can find how different versions are named to be downloaded. In our case we would take note of "xenial" or "16.04" tags.

Now that you know what to download, let's do it with docker pull:

dante@Camelot:~/$ docker pull ubuntu:xenial
xenial: Pulling from library/ubuntu
528184910841: Pull complete
8a9df81d603d: Pull complete
636d9303bf66: Pull complete
672b5bdcef61: Pull complete
Digest: sha256:6a3ac136b6ca623d6a6fa20a7622f098b2fae1ac05f0114386ef439d8ca89a4a
Status: Downloaded newer image for ubuntu:xenial
docker.io/library/ubuntu:xenial


dante@Camelot:~$

What you've done is download what is called an image. An image is a base package with an specific linux distribution and applications. From that base package you can derive your own custom packages or run instances of that base packages, those instances are what we call containers.

If you want to check how many images you have locally available just run docker images:

dante@Camelot:~/$ docker images
REPOSITORY TAG IMAGE ID CREATED SIZE
ubuntu xenial 38b3fa4640d4 4 weeks ago 135MB
hello-world latest d1165f221234 5 months ago 13.3kB



dante@Camelot:~$

You can start instances (aka containers) from those images using docker run:

dante@Camelot:~/$ docker run --name ubuntu_container_1 ubuntu:xenial
dante@Camelot:~$

Using --name you can assign an specific name to your container to identify it from other containers started from the same image.

Problem starting containers this way is that they close inmediately. If you check your containers status using docker ps

dante@Camelot:~/$ docker ps -a
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
c7baad2f7e56 ubuntu:xenial "/bin/bash" 12 seconds ago Exited (0) 11 seconds ago ubuntu_container_1
3ba89f1f37c6 hello-world "/hello" 9 hours ago Exited (0) 9 hours ago focused_zhukovsky

dante@Camelot:~$

I've used an -a flag to show every container, not only the active ones. That way you can see that ubuntu_container_1 ended its activity almost at once since start. That happens because docker containers are designed to run an specific application and close themselves when that application ends. We did not said which application to run in the container so it just closed.

Before trying anything else let's delete previous container, using docker rm, to start from scratch:

dante@Camelot:~/$ docker rm ubuntu_container_1
ubuntu_container_1

dante@Camelot:~$ docker ps -a
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
3ba89f1f37c6 hello-world "/hello" 10 hours ago Exited (0) 10 hours ago focused_zhukovsky
dante@Camelot:~$

Now we want to keep our container alive to access its console. One way is this:

dante@Camelot:~/$ docker run -d -ti --name ubuntu_container_1 ubuntu:xenial
a14e6bcac57dd04cc777a4eac787a8465acd5b8d379591976c56de1d0acc2798

dante@Camelot:~$ docker ps -a
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
a14e6bcac57d ubuntu:xenial "/bin/bash" 8 seconds ago Up 7 seconds ubuntu_container_1
3ba89f1f37c6 hello-world "/hello" 10 hours ago Exited (0) 10 hours ago focused_zhukovsky
dante@Camelot:~$

We've used -d flag to run container in the background and -ti to start and interactive shell and keep it open. Doing so we can see that this time container stays up. But we are not yet in container console, to access it we must do docker attach to connect with that container shell:

dante@Camelot:~/$ dante@Camelot:~$ docker attach ubuntu_container_1
root@a14e6bcac57d:/#

Now you can see that shell has changed its left identifier. You can leave container console using exit, but that stops container. To leave container keeping it active use Ctrl+p followed by Ctrl+q instead.

You can pause an idle container to save resources and resume it later using docker stop and docker start:

dante@Camelot:~/$ docker ps -a
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
9727742a40bf ubuntu:xenial "/bin/bash" 4 minutes ago Up 4 minutes ubuntu_container_1
3ba89f1f37c6 hello-world "/hello" 10 hours ago Exited (0) 10 hours ago focused_zhukovsky
dante@Camelot:~$ docker stop ubuntu_container_1
ubuntu_container_1
dante@Camelot:~$ docker ps -a
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
9727742a40bf ubuntu:xenial "/bin/bash" 7 minutes ago Exited (127) 6 seconds ago ubuntu_container_1
3ba89f1f37c6 hello-world "/hello" 10 hours ago Exited (0) 10 hours ago focused_zhukovsky
dante@Camelot:~$ docker start ubuntu_container_1
ubuntu_container_1
dante@Camelot:~$ docker ps -a
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
9727742a40bf ubuntu:xenial "/bin/bash" 8 minutes ago Up 4 seconds ubuntu_container_1
3ba89f1f37c6 hello-world "/hello" 10 hours ago Exited (0) 10 hours ago focused_zhukovsky
dante@Camelot:~$

Saving your changes

Chances are that you want export changes made to an existing container so you can start new container with those changes already applied.

Best way is using dockerfiles, but I'll leave that for a further article. A quick and dirty way is to commit your changes to a new image with docker commit:

dante@Camelot:~/$ docker commit ubuntu_container_1 custom_ubuntu
sha256:e2005a0ec8302f8948958a90e2abc1e8957a15155c6e6bbf7300eb1709d4ae70
dante@Camelot:~$ docker images
REPOSITORY TAG IMAGE ID CREATED SIZE
custom_ubuntu latest e2005a0ec830 5 seconds ago 331MB
ubuntu xenial 38b3fa4640d4 4 weeks ago 135MB
ubuntu latest 1318b700e415 4 weeks ago 72.8MB
hello-world latest d1165f221234 5 months ago 13.3kB

dante@Camelot:~$

In this example we created a new image called custom_ubuntu. Using that new image we can create new instances with the changes made so far to ubuntu_container_1.

We aware that commiting changes from a running container stops it to avoid data corruption. After commiting is ended container is resumed.


Running services from containers

So far you have a lightweight virtual machine and you have access to its console, but that is not enough as you'll want offer services from that container.

Guess you have configured a SSH server in your container and you want to access it from your LAN. In that case you need container ports to be mapped by its host and offered to LAN.

An important gotcha here is that port mapping should be configured when a container is first time started with docker run using -p flag:

dante@Camelot:~/$ docker run -d -ti -p 8888:22 --name custom_ubuntu_container custom_ubuntu
5ea2604da8c696af397cd1b1f98d7426dd68182e5edd81a56a6b739849ba1e84

dante@Camelot:~$

Now you can access to the container SSH service through host 8888 port.


Sharing files with containers

Our containers won't be isolated islands. They may need files from us or the may retrieve files to us.

One way to do that file sharing is starting containers mounting a host folder as a shared folder:

dante@Camelot:~/$ docker run -d -ti -v $(pwd)/docker_share:/root/shared ubuntu:focal
ee64e7392a77b4d0b35adadad467fe5d6a105b84290945e346f82199116b7c56

dante@Camelot:~$

Here, host folder docker_share will be accessible from container at /root/shared container path. Be aware you should enter absolute paths. I use $(pwd) as a shortcut to enter host current working folder.

Once your container is started, every file placed at its /root/shared folder will be visible from host, even after container is stopped. The other way round, that is placing a file from host to be seen at container, is possible but you will need to do sudo:

dante@Camelot:~/$ cp docker.png docker_share/.
cp: cannot create regular file 'docker_share/./docker.png': Permission denied

dante@Camelot:~/$ sudo cp docker.png docker_share/.
[sudo] password for dante:

dante@Camelot:~$

Another way of sharing is using built-in copy command:

dante@Camelot:~/$ docker cp docker.png vigorous_kowalevski:/root/shared 
dante@Camelot:~$

Here, we have copied docker.png host file to vigorous_kowalevski container /root/shared folder. Note that we didn't need sudo to run docker cp.

Other way round is also possible, just change argument order:

dante@Camelot:~/$ docker cp vigorous_kowalevski:/etc/apt/sources.list sources.list 
dante@Camelot:~$

Here we copied container sources.list to host.


From here

So far you know how to deal with docker containers. Next step is creating your own custom images using dockerfiles and sharing them through Docker Hub. I'm going to explain those topics in a further article.

 

10 August 2021

How to use JFrog Artifactory to distribute your packages


In a previous article I reviewed some ways to generate a debian or rpm packages for your developed applications. But after packaging your applications you must find an easy way to let your users get your package. You can allways leave then at your releases section at Github to let users download them but downside of that is that any update will be cumbersome to be announced and installed. It's far better to use a package repository to let user install your package using a package manager like apt or yum.

You can try to get your package into official distro repository but chances are you won't fulfill their requirements, so a personal repository is your most likely way to go. 

For long time I used Bintray to host my deb and rpm packages, but Bintray ended it's service as of march of 2021. I've had to look for an alternative. Finally I've found JFrog Artifactory, the official heir of Bintray.

JFrog Artifactory has a free tier for open source projects. If your project is not so popular to exceed 50 GB of monthly download Artifactory should be more than enough for your personal projects.

The only downside is Artifactory is a more complex (and complete) service than Bintray, so it's harder to get it running if you're just a hobbyist. I'm going to explain what I've learnt so far so you have an easier time than me with Artifactory.

Once registered on the platform, you enter the quick setup menu:

There, you can create a repository of any of supported packages. For this article I'm going for Debian. Click on Debian icon and select to create a new repository.

In next window you are asked to give a name (a prefix) for repository:

 


In the screenshot I'm calling vdist my repository. Artifactory creates a virtual, a remote and a local repositories. As far as I know the only repository useful at this point is the local one, so when following this article be sure to always select "debian-local" option.

Next window is deceptively simple as it can make you think you are ready to upload packages following instructions given in Deploy and Resolve tags:

 
Problem is that you need to configure some things before your repository is fully operational, as I've learnt the hard way.

First, you need to allow anonymous access to your repository to allow people download your packages. What is most confusing here is that anonymous access is configured (you can see that permission in Administration > Identity and Access > Permissions) but apparently it does not seem to work at all, so when you try to access to your repository using apt you only get a unauthorized error. The gotcha here is that first you need to enable globally anonymous access at Administration > Security > Settings:

 


 

Only after checking that option you will end getting unauthorized error.

To configure client linuxboxes to use your repository, just include in your /etc/sources.list file:

 deb https://dlabninja.jfrog.io/artifactory/<REPOSITORY_NAME>-debian-local <DISTRIBUTION> <COMPONENT>

In my example REPOSITORY_NAME is vdist, DISTRIBUTION is the name of distribution you are targeting (for example, in Ubuntu, it could be trusty) and por COMPONENT I use main. By the way, dlabninja is the tenant name I gave myself when I registered at Artifactory, yours is going to be different.

You may think you're ready to start uploading packages, but I'm afraid you're not yet. If you try to use apt to access your repository at this point you're going to get an error saying your repository is not signed so accessing to it is forbidden. To fix that you must create a pair of GPG keys to sign your packages and upload them to Artifactory.

To create a GPG key you can type at your console:



Type an identification name, an email and a password when asked. For name I use the repository one. Take note of the password you used If you forget it there is no way to recover it. Big string beginning with "F4F316" and ending with "010E55" is my key id, yours will be similar. That string will be useful to identify your key in gpg commands.

You can list your keys:


To upload your generated key, you first need to export it to a file. That export need to be split in two files: first you export your public key and afterwards your private key:

 

With first command I exported vdist public key and with second the private one. Note I've given an explanatory extension to exported files. This is a good moment to store these keys in a safe place.

To upload those files to Artifactory you need to go to Administration > Artifactory > Security > Keys management:

 

There, select "+ Add keys" at "Signing keys" tab. In the opening window enter the name for this signing key (in this case "vdist"), drag and drop over it exported key files and enter private key password. When done you'll have your key properly imported in Artifactory and ready to be used.

To configure imported GPG keys with a repository go to Administration > Repositories, select you repository and "Advanced" tab. There you have a "Primary key name" combo where you can select your key. Don't forget to click "Save & Finish" before leaving or any change will be lost:

 

Once done, you won't get an unsigned repository error with apt, but you'll still get an error:

(Click to enlarge)

Package manager complains because although repository is GPG signed it does not recognize its public key. To solve it we must upload our public key to one of the free PGP registries so our users can download and import it. For this matter I send my public keys to ubuntu.keyserver.com:


Once a public registry has a public key they sync with others to share them. Our user must import that public key and tell her package manager that public key is trusted. To do it we must be sudo:

Obviously we must do the same thing If we want to try to install our own package.

After that, sudo apt update will work smoothly:


Finally we are really ready to upload our first package to our repository. You have two way to do it: manually and programmatically.

You can upload packages manually through Artifactory web interface going to Artifactory > Artifacts > Selecting repository (in my example vdist-debian-local) > Deploy (button at the upper right corner). It opens a pop up window where you can drag an drop your package. Make sure that "Target repository" field is properly set to your repository (it is easy to send your package to the wrong repository).

Besides, Artifactory let you upload packages from command line what makes it perfect to do it programmatically in continuous integration workflow. You can see needed command in Artifactory > Artifacts > Set me up (button at the upper right corner). It opens a pop-up window with a tab called "Deploy" where you can find commands needed to deploy packages to a given repository:

As you can see commands has place holders for many fields. If you are not sure what to place at USERNAME and PASSWORD fields, go to configure tab, type your administrative password there and return to Deploy tab to see how those fields have been completed to you.

07 August 2021

How to use Travis continuous integration with your python projects

When your project begin to grow testing it, building it and deploying it becomes more complex.

You usually start doing everything manually but at some point you realize its better to keep everything in scripts. Self made building scripts may be enough for many personal projects but sometimes your functional test last longer than you want to stay in front of your computer. Leaving functional test to others is the solution for that problem and then is when Travis-CI appears.

Travis-CI provides an automation environment to run your functional tests and any script you want afterwards depending on your tests success or failure. You may know Jenkins, actually Travis-CI is similar but simpler and easier. If you are in an enterprise production environment you probably will use Jenkins for continuous integration but if you're with a personal project Travis-CI may be more than enough. Nowadays Travis-CI is free for personal open source projects.

When you register in Travis-CI you are asked to connect to a GitHub account, so you cannot use Travis without one. From there on you will always be asked to log in GitHub to enter your Travis account.

Once inside your Travis account you can click the "+" icon to link any of your GitHub repositories that you want to link to Travis. After you switch every Github repository you want to build from Travis you must include in their root a file called .travis.yml (notice dot at the very beginning), its content tell Travis what to do when he detects a git push over repository.

To have something as an example let assess travis configuration for vdist. You have many other examples in Travis documentation but I think vdist building process covers enough Travis features to give you a good taste of what can you achieve with it. Namely, we are going to study an steady snapshot of vdist's .travis.yml file. So, please, keep an open window with that .travis.yml code while you read this article. I'm explaining an high level overview of workflow and aferwards we're going to explain in deep every particular step of that workflow.

Workflow that travis.yml describes is this:

vdist build workflow (click on it to see it bigger)

When I push to staging branch at Github, Travis is notified by Github through a webhook and then it downloads latest version of your source code and looks for a .travis.yml in its root. With that file Travis can know which workflow to follow.

Namely with vdist, Travis looks for my functional test and run them using a Python 3.6 interpreter and the one marked as nightly in Python repositories. That way I can check my code runs fine in my target Python version (3.6) and I can find in advance any trouble that I can have with next planned release of Python. If any Python 3.6 test fails building process stops and I'm emailed with a warning. If any nightly Python version fails I'm emailed with a warning but building process continues because I only support stable Python releases. That way I can know if I have to workout any future problem with next python release but I let build proces continues if tests succeed with current stable Python version.

If tests suceed staging branch is merged with master branch at Github. Then Github activates two webhooks to next sites:
Those webhooks are one of nicest Github features because they let you integrate many services, from different third party vendors, with your Github workflow.

While Github merges branches and activates webhooks, Travis starts packaging process and deploys generated packages to some public repositories. Packages are generated in three main flavours: wheel package, deb package and rpm package. Wheel pakages are deployed to Pypi, while deb and rpm one are deployed to my Bintray repository and to vdist Github releases page.

That is the overall process. Lets see how all of this is actually implemented in Travis using vdist's .travis.yml.


Travis modes


When Travis is activated by a push in your repository it begins what is called as a build.

A build generates one or more jobs. Every job clones your repository into a virtual environment and then carries out a series of phases. A job finishes when it accomplishes all of its phases. A build finishes when all of its jobs are finished.

Travis default mode involves a lifecycle with these main phases for its jobs:

  1. before_install
  2. install
  3. before-script
  4. script
  5. after_sucess or after_failure
  6. before_deploy
  7. deploy
  8. after_deploy
  9. after_script
Actually there are more optional phases and you don't have even to implement everyone I listed. Actually only script phase is really compulsory. Script phase is where you usually run your functional tests. If you are successful with your tests, phases from after_success to after_script are run but if you are unsuccessful only after_failure is run. Install phase is where you install your dependencies to be ready to run your tests. Deploy phase is where you upload your generated packages to your different repositories so you usually use before_deploy phase to run commands needed to generate those packages.


 

 

Why do we say that a build can have one or more jobs? Because you can set what is called a build matrix. A build matrix is generated when you set you want to test your code with multiple runtimes and or multiple environment variables. For instance, you could set that you want your code tested against python 2.7, 3.6 and a development version of 3.7, so in that case a build matrix with three jobs are generated.

Problem with this mode is that build matrix generates complete jobs, so each one runs an script (test) and a deploy phases. But the thing is that sometimes you just want to run multiples test but just one a deploy phase. For example, guess we are building a python project whose code is prepared to be run both in python 2.7 and 3.6, in that case we would like to test or code against python 2.7 and 3.6 but, on success, generate just one package to upload it to pypi. Oddly, that kind of workflow seems not to be natively supported by Travis. If you try to use its default mode to test against python 2.7 and 3.6 you may find that you generate and deploy your resulting package twice.

 
 
 

 
Thankfully, Travis has what they call stage mode that, although is still officially in beta, works really well and solves the problem I described with default mode. 

In stage mode Travis introduces the stage concept. A stage is formally a group of jobs that run in parallel as part of a sequencial build process composed of multiple stages. Whereas default mode runs jobs parallely from start to end, stage mode organizes work across sequencial stages and inside those stages is where parallel jobs can be run.
 
 

 

In our example and stage can be created to run parallely two jobs to test both python 2.7 and 3.6 and later, in case of success, another stage can be launched to create and deploy a single package.

As this is exactly what I needed for vdist, this mode (stage mode) is the one I'm going to show in this article.


Initial configuration

Take a look to our .travis.yaml example. From lines 1 to 11 you have:



In those lines you set Travis general setup.

You first set your project native language using tag "language" so Travis can provide a virtual environment with proper dependencies installed.

Travis provides two kinds of virtual environments: default virtual environment is a docker linux container, that is lightweight and so it is very quick to be launched; second virtual environment is a full weight virtualized linux image, that takes longer to be launched but sometimes allow you things that containers don't. For instance, vdist uses docker for its building process (that's why I use docker "services" tag), so I have to use Travis full weight virtual environment. Otherwise, if you try running docker inside a docker container you're going to realize it does not work. So, to launch a full weight virtual environment you should set a "sudo: enabled" tag.

By default Travis uses a rather old linux version (Ubuntu Trusty). By the time I wrote this article there were promises about a near availability of a newer version. They say keeping environment at the latest ubuntu release takes too much resources so they update them less frequently. When update arrives you can ask to use it changing "dist: trusty" for whatever version they make available.

Sometimes you will find that using an old linux environment does not provide you with dependencies you actually need. To help with that Travis team try to maintain a customizedly updated Trusty image available. To use that specially updated version you should use "group: travis_latest" tag.


Test matrix

From lines 14 to 33 you have:

 


There, under "python:" tag, you set under which versions of python interpreter you want to run your tests.

You might need to run test depending not only on python interpreter versions but depending of multiples environment variables. Setting them under "matrix:" is your way to go.

You can set some conditions to be tested and be warned if they fail but not to make end the entire job. Those conditions use "allow_failures" tag. In this case I let my build continue if test with a under development (nightly) version of python fails, that way I'm warned that my application can fail with a future release of python but I let it be packaged while tests with official release of python work.

You can set global environment variables to be used by your building scripts using "global:" tag.

If any of those variables have values dangerous to be seen in a public repository you can cypher them using travis tools. First make sure you have travis tools installed. As it is a Ruby client you first have to install that interpreter:

dante@Camelot:~/$ sudo apt-get install python-software-properties dante@Camelot:~/$ sudo apt-add-repository ppa:brightbox/ruby-ng dante@Camelot:~/$ sudo apt-get update dante@Camelot:~/$ sudo apt-get install ruby2.1 ruby-switch dante@Camelot:~/$ sudo ruby-switch --set ruby2.1


Then you can use Ruby package manager to install Travis tool.

dante@Camelot:~/$ sudo gem install travis


With travis tool installed you can now ask it to cypher whatever value you want.

dante@Camelot:~/$ sudo travis encrypt MY_PASSWORD=my_super_secret_password


It will output a "secure:" tag followed by a apparently random string. You can now copy "secure:" tag and cyphered string to your travis.yaml. What we've done here is using travis public key to cypher our string. When Travis is reading our travis.yaml file it will use its private key to decypher every "secure:" tag it finds.


Branch filtering

 


This code comes from lines 36 to 41 of our .travis.yaml example. By default Travis activates for every push in every branch in your repository but usually you want to restrict activation to feature branches only. In my case I activate builds in pushes just over "staging" branch.


Notifications


 

As you can see at lines 44 to 51, you can setup which email recipient should be notified either or both in success or failure of tests:



Testing

From line 54 to 67 we get to testing, the real core of our job:




As you can see, actually it comprises three phases: "before_install", "install" and "script".

You can use those phases in the way more comfortable for you. I've used "before_install" to install all system packages my tests need, while I've used "install" to install all python dependencies.

You launch your tests at "script" phase. In my case I use pytest to run my tests. Be aware  that Travis waits for 10 minutes to receive any screen output from your test. If none is received then Travis thinks that test got stuck and cancel it. This behavior can be a problem if is normal for your test to stay silent for longer than 10 minutes. In that case you should launch your tests using "travis_wait N" command where N is the number of minutes we want our waiting time to extend by. In my case my tests are really long so I ask travis to wait 30 minutes before giving up.


Stages

In our example files stages definitions are from line 71 to 130.

Actually configuration so far is not so different than it would be if we were using Travis default mode. Where big differences really begin is when we find a "jobs:" tag because it marks the beginning of stages definition. From there every "stage" tag marks the start of an individual stage definition.

As we said, stages are executed sequentially following the same order they have in travis.yaml configuration. Be aware that if any of those stages fails the job is ended at that point.

You may ask yourself why testing is not defined as a stage, actually it could be and you should if you wanted alter the usual order and not execute tests at the very beginning. If test are not explicitly defined as a stage then they are executed at the begining, as the first stage.

Let assess vdist stages.

Stage to merge staging branch into master one

From line 77 to 81:

 


If tests have been successful we are pretty sure code is clean so we merge it into master. Merging into master has the side effect of launching ReadTheDocs and Docker Hub webhooks.

To keep my main travis.yaml clean i've taken all the code needed to do the merge to an outside script. Actually that script runs automatically the same console commands we would run to perform the merge manually.


Stage to build and deploy to an artifact service

So far you've checked your code is OK, now it's time to bundle it in a package and upload it to wherever your user are going to download it.

There are many ways to build and package your code (as many as programming languages) and many services to host your packages. For the usual packages hosting services Travis has predefined deployment jobs that automated much of the work. You have an example of that in lines 83-94:

 



There you can se a predefined deployment job for python packages consisting in two steps: packaging code to a wheel package and uploading it to Pypi service.

Arguments to provide to a predefined deployment jobs varies for each of them. Read Travis instructions for each one to learn how to configure the one you need.

But there are times the package hosting service is not included in the Travis supported list linked before. When that happens things get more manual, as you can ever use an script tag to run custom shell commands. You can see it in lines 95-102:


In those lines you can see how I call some scripts defined in folders of my source code. Those scripts use my own tools to package my code in rpm an deb packages. Although in following lines I've used Travis predefined deployment jobs to upload generated packages to hosting services, I could have scripted that too instead. All package hosting services provide a way to upload packages using console commands, so you always have that way if Travis does not provides a predefined job to do it.


 Debugging


Often your Travis setup won't work on the first run so you'll need debug it.

To debug your Travis build your first step is reading your build output log. If log is so big that Travis does not show it entirely in browser then you can download raw log and display it in your text editor. Search in your log for any unexpected error. Later try to run your script in a virtual machine with the same version Travis uses. If error found in Travis build log repeats in your local virtual machine then you have all you need to find out what does not work.

Things get complicated when error does not happen in your local virtual machine. Then error resides in any particularity in Travis environment that cannot be replicated in your virtual machine, so you must enter in Travis environment while building and debug there.

Travis enables by default debugging for private repositories, if you have one then you'll find a debug button just below your restart build one:


If you push that button build will start but, after the very initial steps, it will pause and open a listening ssh port for you to connect. Here is an example output of what you see after using that button:

Debug build initiated by BanzaiMan
Setting up debug tools.
Preparing debug sessions.
Use the following SSH command to access the interactive debugging environment:
ssh DwBhYvwgoBQ2dr7iQ5ZH34wGt@ny2.tmate.io
This build is running in quiet mode. No session output will be displayed.
This debug build will stay alive for 30 minutes.


In last example you would connect with ssh to DwBhYvwgoBQ2dr7iQ5ZH34wGt@ny2.tmate.io to get access to a console in Travis build virtual machine. Once inside your build virtual machine you can run every build step calling next commands:

travis_run_before_install
travis_run_install
travis_run_before_script
travis_run_script
travis_run_after_success
travis_run_after_failure
travis_run_after_script

Those command will activate respective build steps. When expected error appear at last you can debug environment to find out what is wrong.

Problem is that debug button is not available for public repositories, but don't panic you still can use that feature but you'll need to do some extra steps. To enable debug feature you should ask for it to Travis support through email to support@travis.ci.com. They will grant you in just a few hours.

Once you receive confirmation from Travis support about debug is enabled, i'm  afraid you won't see debug button yet. The point is that although debug feature is enabled, for public repositories you can call it through an api call only. You can launch that api call from console with this command:

dante@Camelot:~/project-directory$ curl -s -X POST \ -H "Content-Type: application/json" \ -H "Accept: application/json" \ -H "Travis-API-Version: 3" \ -H "Authorization: token ********************" \ -d "{\"quiet\": true}" \ https://api.travis-ci.org/job/{id}/debug

To get your token you should run this commands before:

dante@Camelot:~/project-directory$ travis login We need your GitHub login to identify you. This information will not be sent to Travis CI, only to api.github.com. The password will not be displayed. Try running with --github-token or --auto if you don't want to enter your password anyway. Username: dante-signal31 Password for dante-signal31: ***************************************** Successfully logged in as dante-signal31! dante@Camelot:~/project-directory$ travis token Your access token is **********************


In order to get a valid number for {id} you need to enter to the last log of the job you want to repeat and expand "Build system information" section. There, you can find the number you need at "Job id:" line.

Just after launching api call you can ssh to your Travis build virtual machine and start your debugging.

To end debugging session you can either close all your ssh windows or cancel build from web user interface.

But you should be aware about a potential danger. Why this debug feature is not available by default for public repositories? because ssh listening server has no authentication so anyone who knows where to connect will get console access to the same virtual machine. How an attacker would know where to connect? watching your build logs, if your repository is not private your travis logs are public by default at realtime, did you remember?. If and attacker is watching your logs and in that very moment you start a debug session she will see the same ssh connection string than you and will be able to connect to virtual machine. Inside Travis virtual machine the attacker can echo your environment variables and any secrets you have in them. Good news are that any new ssh connection will be attached to the same console session so if an intruder sneaks you will see her commands echoing in your session, giving you a chance to cancel debug session before she gets anything critical.