Showing posts with label python. Show all posts
Showing posts with label python. Show all posts

17 December 2021

How to create your own custom Actions for GitHub Actions

In my article about GitHub Actions we reviewed how you can build entire workflows just using premade bricks, called Actions, that you can find at GitHub Marketplace

That marketplace has many ready to use Actions but chances are that sooner than later you'll need to do something that has no action at markeplace. Sure, you can still use your own scripts. In that article I used a custom script (./ci_scripts/ at section "Sharing data between steps and jobs". Problem with that approach is that if you want to do the same task in another project you need to copy your scripts between project repositories and adapt them. You'd better transform your scripts to Actions to be easily reusable not only in your projects but publicly with other people projects.

Every Action you can find at marketplace is made in one of the ways I'm going to explain here. Actually if you enter to any Action marketplace page you will find a link, at right hand side, to that Action repository so you can asses it and learn how it works.

There are 3 main methods to build your own GitHub Actions:

  • Composite Actions: They are the simpler, and quicker, but a condition to use this way is that your Action should be based in a self-sufficient script that needs no additional dependency to be installed. It should run only with an standard linux distribution offers.
  • Docker Actions: If you need any dependency to make your script work then you'll need t follow this way.
  • Javascript Actions: Well... you can write your own Actions with javascript, but I don't like that language so I'm not going to include it in this article.
The problem with your Actions dependencies is that they can pollute the workflow environment where your Action is going to be used. Your action dependencies can even collide with those of the app being built. That's why we need to encapsulate our Action and its dependencies to be independent of the environment of the app being built. Of course, this problem does not apply if your Action is intended to setup the workflow environment installing something. There are Actions to, for example, installing and setup Pandoc to be used by the workflow. Problem arises when your Action is intended to do one specific task not related to installing something (for example copying files) and it does install something under the table, as that con compromise the workflow environment. So, best option if you need to install anything to make your Action work is installing it in a docker container and make your Action script run from inside that container, entirely independent of workflow environment. 

Composite Actions

If your Action just needs a bunch of bash commands or a python script exclusively using its built-in standard library then composite Actions is your way to go.

As an example of a composite action, we are going to review how my Action rust-app-version works. That Action looks for a rust Cargo.toml configuration file and read which version is set there for the rust app. That version string is offered as the Action output and you can use that output in your workflow, for instance, to tag a new release at GitHub. This action only uses modules available at standard python distribution. It does not need to install anything at user runner. It's true that there is a requirements.txt at rust-app-version repository but those are only dependencies for unit testing.

To have your own composite Action you first need a GitHub repository to host it. There you can place the few files really needed for your action.

At the very root of your repository you need a file called "action.yml". This file is really important as it models your Action. Your users should be able to think about your action as a black box. Something where you enter some inputs and you receive any output. Those inputs and outputs are defined in action.yml.

If we read the action.yml file at rust-app-version we can see that this action only needs an input called "cargo_toml_folder" and actually that input is optional as it can receive a value of "." if it is omitted when this action is called:

Outputs are somewhat different as the must refer to the output of an specific step in your action:

In last section we specify that this action is going to have just one output called "app_version" and that output is going to be the output called "version" of an step with an id value of "get-version".

Those inputs and outputs define what your action consumes and offers, i.e. what your action does. How your action does it is defined under "runs:" tag. There you set that your Action is a composite one and you call a sequence of steps. This particular example only has one step but you can have as many steps as you need:

Take note of line 22 where that steps receives a name: "get-version". That name is important to refer to this step from outputs configuration.

Line 24 is where your command is run. I only executed one command. If you needed multiple commands to be executed inside the same step, then you should use a bar after run: "run: |". With that bar you mark that next few lines (indented under "run:" tag) are lines separated commands to be executed sequentially.

Command at line 24 is interesting because of 3 points:
  • It calls an script located at our Action repository. To refer to our Action repository root use the github.action_path environment variable. The great thing is that although our script is hosted at its repository, GitHub runs it so that it can view the repository of the workflow from where it is called. Our script will see the workflow repository files as it was run from its root.
  • At the end of the line you may see how inputs are used through inputs context.
  • The weirdest thing of that line is how you setup that step output. You set a bash step output doing an echo "::set-output name=<ouput_name>::<output_value>". In this case name is version and its value is what prints to console. Be aware that output_name is used to retrieve that output after step ends through ${{ steps.<id>.outputs.<output_name> }}, in this case ${{ steps.get_version.outputs.version }}
Apart from that, you only need to setup your Action metadata. That is done in the first few lines:

Be aware that "name:" is the name your action will have at GitHub Marketplace. The another parameter, "description:", its the short explanation that will be shown along name in the search results at Markeplace. And "branding:" is only the icon (from Feather icon suite) and color that will represent your action at Markeplace.

With those 24 lines at action.yml and your script at its respective path (here at rust_app_version/ subfolder), you can use your action. You just need to push the button that will appear in your repository to publish your action at Marketplace. Nevertheless, you'd better read this article to the end because I have some recommendations that may be helpful for you.

Once published, it becomes visible for other GitHub users and a Marketplace page is created for your action. To use an Action like this you only need to include in your workflow a configuration like this:

Docker actions

If your Action needs to install any dependency then you should package that Action inside a docker container. That way your Action dependencies won't mess with your user workflow dependencies.

As an example of a docker action, we are going to review how my Action markdown2man works. That action takes a file and converts it to a man page. Using it you don't have to keep two sources to document your console application usage. Instead of that you may document your app usage only with and convert that file to a man page.

To do that conversion markdown2man needs Pandoc package installed. But Pandoc has its respective dependencies, so installing them at user runner may break his workflow. Instead of it, we are going to install those dependencies in a docker image and run our script from that image. Remember that docker lets you execute scripts from container interacting with host files.

As with composite Actions, we need to create an action.yml at Action repository root. There we set our metadata, input and outputs like we do with composite actions. The difference here is that this specific markdown2man Action does not emit any output, so that section is omitted. Section for "runs:" is different too:

In that section we specify this Action is a docker one (at "using:"). There are two ways use a docker image in your action: 
  • Generate an specific image for that action and store it at GitHub docker registry. In that case you use the "image: Dockerfile" tag.
  • Use a prebuilt image from DockerHub registry. To do that you use the "image: <dockerhub_user>:<docker-image-tag>" tag.
If the image you are going to build is exclusively intended to be used at GitHub Action I would follow Dockerfile option. Here, with markdown2man we follow the Dockerfile approach so a docker image is build any time Action is run after a Dockerfile update. Generated image is cached at GitHub registry to be offered quicker to further Actions. Remember a Dockerfile is a kind of a recipe to "cook" an image, so commands that file contains are only executed when the image is built ("cooked"). Once build, the only command that is run is the one you set at entrypoint tag, passing in arguments set at "docker run".                                                                                                                                                                          The "args:" tag has every parameter to be passed to our script at the container. You will probably use your input here to be passed to our script. Be aware that as it happened in composite action, here user repository files are visible to our container.

As you may suspect by now, docker actions are more involved than composite Actions because of the added complexity of creating the Dockerfile. The Dockerfile for markdown2man is pretty simple. As markdown2man script is a python one, we make our image derive from the official docker image for version 3.8:

Afterwards, we set image metadata:

To configure your image, for example installing things, you use RUN commands.

ENV command generates environment variables to be used in your Dockerfile commands:

You use COPY command to copy your requirements.txt from your repository and include it in your generated image. Your scripts are copied fro your Action repository to container following the same approach:

After script files are copied, I like to make then executable and link them from /usr/bin/ folder to include it at the system path:

After that, you set your script as the image entrypoint so this script is run once image is started and that script is provided with arguments you set at the "args:" tag at action.yml file.

You can try that image at your computer building that image from the Dockerfile and running that image as a container:

dante@Camelot:~/$ docker run -ti -v ~/your_project/:/work/ dantesignal31/markdown2man:latest /work/ mancifra


For local testing you need to mount your project folder as volume (-v flag) if your scripts to process any file form that repository. Last two argument in the example (work/ and mancifra) are the arguments that must be passed to entrypoint.

And that's all. Once you have tested everything you can publish your Action and use it in your workflows:

With a call like that a man file called cifra.2.gz should be created at man folder. If manpage_folder does not exist then markdown2man creates it for you.

Your Actions are first class code

Although your Action will likely be small sized, you should take of them as you would with your full-blown apps. Be aware that many people will find and your Actions through Marketplace in their workflows. An error in your Action can brreak many workflows so be diligent and test your Action as you would with any other app.

So, with my Actions I follow the same approach as in other projects and I set up a GitHub workflow to run tests against any pushes in a staging branch. Only once those tests succeed I merge staging with main branch and generate a new release for the Action.

Lets use the markdown2man workflow as example. There you can read that we have two test types:

  • Unit tests: They check the python script markdown2man is based on.

  • Integration tests: They check markdown2man behaviour as a GitHub Action. Although your Action was not published yet you can install it from a workflow in the same repository (lines 42-48). So, what I do is calling the Action from the very same staging branch we are testing and I use that Action with a markdown I have ready at test folder. If a proper man page file is generated then integration test is passed (line 53). Having the chance to test an Action against its own repository is great as it lets you test your Action as people would use it without needing to publish it.

In addition to testing it, you should write a for your action in order to explain in detail how to use your Action. In that document you should include at least this information:

  • A description of what the action does.
  • Required input and output arguments.
  • Optional input and output arguments.
  • Any secret your action needs.
  • Any environment variable your action uses.
  • An example of how to use your action in a workflow.

And you should add too a LICENSE file explaining the usage terms for your Action.


The strong point of GitHub Action is the high degree of reusability and sharing it promotes. Every time you find yourself repeating the same bunch of commands you are encouraged to make and Action with those commands and share it through Marketplace. Doing that way you get a piece of functionality easier to use throughout your workflows than copy-pasting commands and you contribute to improve the Marketplace so that others can benefit too from that Action. 

Thanks to this philosophy GitHub Marketplace has grown to a huge amount of Actions, ready to use and to help you to save you from implementing that functionality by your own.

25 September 2021

How to create Python executables with PyOxidizer

Python is a great development language. But it lacks of a proper distribution and packaging support if you want end user get your application. Pypi and wheel packages are more intendend for developers to install their own apps dependencies, but and end user will feel as painful to use pip and virtualenv to install and run a python application.

There are some projects trying to solve that problem as PyInstaller, py2exe or cx_Freeze. I maintain vdist, that its closely related with this problem and tries to solve it creating debian, rpm and archlinux packages from python applications. In this article I'm going to analyze PyOxidizer. This tool is written in a language I'm really loving (Rust), and follows and approach somewhat similar to vdist as it bundles your application along a python distribution but besides compiles the entire bundle into an executable binary.

To structure this tutorial, I'm going to build and compile my Cifra project. You can clone it at this point to follow this tutorial step by step.

First thing to be aware is that PyOxidizer bundles your application with a customized Python 3.8 or 3.9 distribution, so your app should be compatible with one of those. You will need a C compiler/toolchain to build with PyOxidizer, If you don't have one PyOxidizer outputs instructions to install one. PyOxidizer uses Rust toolchain too, but it downloads it in the background for you so it's not a dependency you should worry about.

PyOxidizer installation

You have some ways to install PyOxidizer (downloading a compiled release from its GitHub Page, compiling from source) but I feel straighter and cleaner to install into your project's virtualenv using pip.

Guess you have Cifra project cloned (in my case at path ~/Project/cifra), inside it you created a virtualenv inside a venv folder and you activated that virtualenv, then you can install PyOxidizer inside that virtualenv doing:

(venv)dante@Camelot:~/Projects/cifra$ pip install pyoxidizer
Collecting pyoxidizer
Downloading pyoxidizer-0.17.0-py3-none-manylinux2010_x86_64.whl (9.9 MB)
|████████████████████████████████| 9.9 MB 11.5 MB/s
Installing collected packages: pyoxidizer
Successfully installed pyoxidizer-0.17.0


You can check you have pyoxidizer properly installed doing:

(venv)dante@Camelot:~/Projects/cifra$ pyoxidizer --help
PyOxidizer 0.17.0
Gregory Szorc <>
Build and distribute Python applications

pyoxidizer [FLAGS] [SUBCOMMAND]

-h, --help
Prints help information

Use a system install of Rust instead of a self-managed Rust installation

-V, --version
Prints version information

Enable verbose output

add Add PyOxidizer to an existing Rust project. (EXPERIMENTAL)
analyze Analyze a built binary
build Build a PyOxidizer enabled project
cache-clear Clear PyOxidizer's user-specific cache
find-resources Find resources in a file or directory
help Prints this message or the help of the given subcommand(s)
init-config-file Create a new PyOxidizer configuration file.
init-rust-project Create a new Rust project embedding a Python interpreter
list-targets List targets available to resolve in a configuration file
python-distribution-extract Extract a Python distribution archive to a directory
python-distribution-info Show information about a Python distribution archive
python-distribution-licenses Show licenses for a given Python distribution
run Run a target in a PyOxidizer configuration file
run-build-script Run functionality that a build script would perform


PyOxidizer configuration

Now create an initial PyOxidizer configuration file at your project root folder:

(venv)dante@Camelot:~/Projects/cifra$ cd ..
(venv) dante@Camelot:~/Projects$ pyoxidizer init-config-file cifra
writing cifra/pyoxidizer.bzl

A new PyOxidizer configuration file has been created.
This configuration file can be used by various `pyoxidizer`

For example, to build and run the default Python application:

$ cd cifra
$ pyoxidizer run

The default configuration is to invoke a Python REPL. You can
edit the configuration file to change behavior.


Generated pyoxidizer.bzl is written in Starlark, a python dialect for configuration files, so we'll feel at home there. Nevertheless, that configuration file is rather long and although it is fully commented at first it is not clear how to align all the moving parts. PyOxidizer is pretty complete too but it can be rather overwhelming for anyone that only wants to get started. Filtering the entire PyOxidizer documentation site, I've found this section as the most helpful to customize PyOxidizer configuration file.

There are many approaches to build binaries using PyOxidizer. Let's assume you have a setuptools file for your project, then we can ask PyOxidizer run that inside it's customized python. To do that add next line before return line of pyoxidizer.bzl make_exe() function:

# Run cifra's own and include installed files in binary bundle.

In package_path parameter you have to provide your path. I've provided a CWD argument because I'm assuming that pyoxidizer.bzl and are at the same folder.

Next thing to customize at pyoxidizer.bzl is that it defaults to build a Windows MSI executable. At Linux we want an ELF executable output. So, first comment line near the end that registers a target to build a msi installer:

#register_target("msi_installer", make_msi, depends=["exe"])

We need also an entry point for our application. It would be nice if PyOxidicer would take entry_points parameter configuration but it doesn't. Instead we have to provide it manually through pyoxidizer.bzl configuration file. In our example just find the line at make_exe() function where python_config variable is created and place after:

python_config.run_command = "from cifra.cifra_launcher import main; main()"

Building executable binaries

Just now, you can run "pyoxidizer build" and pyoxidizer will begin to bundle our application.

But Cifra has a very specific problem at this point. If you try to run build over cifra with configuration so far, you will get next error:

(venv)dante@Camelot:~/Projects/cifra$ pyoxidizer build
error[PYOXIDIZER_PYTHON_EXECUTABLE]: adding PythonExtensionModule<name=sqlalchemy.cimmutabledict>

Caused by:
extension module sqlalchemy.cimmutabledict cannot be loaded from memory but memory loading required
--> ./pyoxidizer.bzl:272:5
272 | exe.add_python_resources(exe.setup_py_install(package_path=CWD))
| ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ add_python_resources()

error: adding PythonExtensionModule<name=sqlalchemy.cimmutabledict>

Caused by:
extension module sqlalchemy.cimmutabledict cannot be loaded from memory but memory loading required


PyOxidizer tries to embed every dependency of your application inside produced binary (in-memory mode). This is nice because you end with an unique distributable binary file and performance to load those dependencies is improved. Problem is that not every package out there admits to be embedded that way. Here SQLAlchemy fails to be embedded. 

In that case, those dependencies should be stored next to produced binary (filesystem-relative mode). PyOxidizer will link those dependencies inside binary using their places relative to produced binary, so when distributing we must pack produced binary and its dependencies files keeping their relative places.

One way to deal with this problem is asking PyOxidizer to keep things in-memory whenever it can and fallback to filesystem-relative when not. To do that uncomment next two lines from make_exe() function at configuration file:

# Use in-memory location for adding resources by default.
policy.resources_location = "in-memory"

# Use filesystem-relative location for adding resources by default.
# policy.resources_location = "filesystem-relative:prefix"

# Attempt to add resources relative to the built binary when
# `resources_location` fails.
policy.resources_location_fallback = "filesystem-relative:prefix"

Doing this you may make things work in your application, but for Cifra things keep failing despite build go further:

(venv)dante@Camelot:~/Projects/cifra$ pyoxidizer build
adding extra file prefix/sqlalchemy/ to .
installing files to /home/dante/Projects/cifra/./build/x86_64-unknown-linux-gnu/debug/install
Traceback (most recent call last):
File "<string>", line 1, in <module>
File "cifra.cifra_launcher", line 31, in <module>
File "cifra.attack.vigenere", line 21, in <module>
File "cifra.tests.test_simple_attacks", line 2, in <module>
File "pytest", line 5, in <module>
File "_pytest.assertion", line 9, in <module>
File "_pytest.assertion.rewrite", line 34, in <module>
File "_pytest.assertion.util", line 13, in <module>
File "_pytest._code", line 2, in <module>
File "_pytest._code.code", line 1223, in <module>
AttributeError: module 'pluggy' has no attribute '__file__'
error: cargo run failed


This error seems somewhat related to this PyOxidizer nuance

To deal with this problem I must keep everything filesystem related:

# Use in-memory location for adding resources by default.
# policy.resources_location = "in-memory"

# Use filesystem-relative location for adding resources by default.
policy.resources_location = "filesystem-relative:prefix"

# Attempt to add resources relative to the built binary when
# `resources_location` fails.
# policy.resources_location_fallback = "filesystem-relative:prefix"

Now build go smoothly:

(venv)dante@Camelot:~/Projects/cifra$ pyoxidizer build
installing files to /home/dante/Projects/cifra/./build/x86_64-unknown-linux-gnu/debug/install
(venv)dante@Camelot:~/Projects/cifra$ls build/x86_64-unknown-linux-gnu/debug/install/
cifra prefix

As you can see, PyOxidizer generated an ELF binary for our application and stored all of its dependencies in prefix folder:

(venv)dante@Camelot:~/Projects/cifra$ ls build/x86_64-unknown-linux-gnu/debug/install/prefix concurrent config-3 py toml multiprocessing __pycache__ sqlalchemy sqlite3
asyncio greenlet pydoc_data turtledemo
attr ctypes pyparsing curses _pytest unittest html pytest urllib http dbm idlelib packaging venv importlib __phello__ distutils _distutils_hack iniconfig pip
cifra email wsgiref encodings pkg_resources ensurepip json setuptools xml xmlrpc lib2to3 test_common
collections pluggy logging zoneinfo tkinter


If you want to name that folder with a more self-explanatory name, just change "prefix" for whatever you want in configuration file. For instance, to name it "lib":

# Use in-memory location for adding resources by default.
# policy.resources_location = "in-memory"

# Use filesystem-relative location for adding resources by default.
policy.resources_location = "filesystem-relative:lib"

# Attempt to add resources relative to the built binary when
# `resources_location` fails.
# policy.resources_location_fallback = "filesystem-relative:prefix"

Let's see how our dependencies folder changed:

(venv)dante@Camelot:~/Projects/cifra$ pyoxidizer build
installing files to /home/dante/Projects/cifra/./build/x86_64-unknown-linux-gnu/debug/install
(venv)dante@Camelot:~/Projects/cifra$ls build/x86_64-unknown-linux-gnu/debug/install/
cifra lib

Building for many distributions

So far, you have get a binary that can be run in any Linux distribution like the one you used to build that binary. Problem is that our binary can fail if its run in other distributions:

(venv)dante@Camelot:~/Projects/cifra$ ls build/x86_64-unknown-linux-gnu/debug/install/
cifra lib
(venv) dante@Camelot:~/Projects/cifra$ docker run -d -ti -v $(pwd)/build/x86_64-unknown-linux-gnu/debug/install/:/work ubuntu
(venv) dante@Camelot:~/Projects/cifra$ docker attach 36ad
root@36ad7e78c00f:/# ls /work
cifra lib
root@36ad7e78c00f:/# /work/cifra --help
usage: cifra [-h] {dictionary,cipher,decipher,attack} ...

Console command to crypt and decrypt texts using classic methods. It also performs crypto attacks against those methods.

positional arguments:
Available modes
dictionary Manage dictionaries to perform crypto attacks.
cipher Cipher a text using a key.
decipher Decipher a text using a key.
attack Attack a ciphered text to get its plain text

optional arguments:
-h, --help show this help message and exit

Follow cifra development at: <>
root@36ad7e78c00f:/# exit


Here our built binary runs in another ubuntu because my personal box (Camelot) is an ubuntu (actually Linux Mint). Our generated binary will run right in other machines with the same distribution like the one I used to build binary. 

But let's see what happens if we run our binary in a different distribution:

(venv)dante@Camelot:~/Projects/cifra$ docker run -d -ti -v $(pwd)/build/x86_64-unknown-linux-gnu/debug/install/:/work centos
Unable to find image 'centos:latest' locally
latest: Pulling from library/centos
a1d0c7532777: Pull complete
Digest: sha256:a27fd8080b517143cbbbab9dfb7c8571c40d67d534bbdee55bd6c473f432b177
Status: Downloaded newer image for centos:latest
(venv) dante@Camelot:~/Projects/cifra$ docker attach 376
[root@376c6f0d085d /]# ls /work
cifra lib
[root@376c6f0d085d /]# /work/cifra --help
/work/cifra: /lib64/ version `GLIBC_2.29' not found (required by /work/cifra)
[root@376c6f0d085d /]# exit


As you can see, our binary fails on Centos. That happens because libraries are not named the same across distributions so our compiled binary fails to load shared dependencies it needs to run.

To solve this you need to build a fully statically linked binary, so it has no external dependencies at all. 

To build that kind of binaries we need to update Rust toolchain to build for that kind of targets:

(venv)dante@Camelot:~/Projects/cifra$ rustup target add x86_64-unknown-linux-musl
info: downloading component 'rust-std' for 'x86_64-unknown-linux-musl'
info: installing component 'rust-std' for 'x86_64-unknown-linux-musl'
30.4 MiB / 30.4 MiB (100 %) 13.9 MiB/s in 2s ETA: 0s


To make PyOxidizer build a binary like that you are supossed to do:

(venv)dante@Camelot:~/Projects/cifra$ pyoxidizer build --target-triple x86_64-unknown-linux-musl
Processing greenlet-1.1.1.tar.gz
Writing /tmp/easy_install-futc9qp4/greenlet-1.1.1/setup.cfg
Running greenlet-1.1.1/ -q bdist_egg --dist-dir /tmp/easy_install-futc9qp4/greenlet-1.1.1/egg-dist-tmp-zga041e7
no previously-included directories found matching 'docs/_build'
warning: no files found matching '*.py' under directory 'appveyor'
warning: no previously-included files matching '*.pyc' found anywhere in distribution
warning: no previously-included files matching '*.pyd' found anywhere in distribution
warning: no previously-included files matching '*.so' found anywhere in distribution
warning: no previously-included files matching '.coverage' found anywhere in distribution
error: Setup script exited with error: command 'musl-clang' failed: No such file or directory
thread 'main' panicked at 'called `Result::unwrap()` on an `Err` value: Custom { kind: Other, error: "command [\"/home/dante/.cache/pyoxidizer/python_distributions/python.70974f0c6874/python/install/bin/python3.9\", \"\", \"install\", \"--prefix\", \"/tmp/pyoxidizer-setup-py-installvKsIUS/install\", \"--no-compile\"] exited with code 1" }', pyoxidizer/src/py_packaging/
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace


As you can see, I get an error that I've been unable to solve. Because of that I've filled an issue at PyOxidizer GitHub page. PyOxidizer author kindly answered my issue pointing I needed musl-clang command in my system. I've not found any package with musl-clang, so I guess it's something you have to compile from source. I've tried to google it but I haven't found a clear way to get that command up and running (I have to admit I'm not a C/C++ ninja). So, I guess I'll have to use pyoxidizer without static compiling until musl-clang dependency dissapears ot pyoxidizer documentation explains how to get that command.


Althought I haven't got it work to create statically linked binaries, PyOxidizer feels like a promising tool. I think I can use at vdist to speed up packaging process. As in vdist I use specific containers for every type of package inability to create statically linked binaries doesn't seem a blocker. It seems actively developed and has a lot of contributors, so it seems a good chance to help to simplify python packaging and deployment of python applications.

18 September 2021

How to parse console arguments in your Python application with ArgParse

There is one common pattern for every console application: it has to manage user arguments. Few console applications runs with no user arguments, instead most applications needs user provided arguments to run properly. Think of ping, it needs at least one argument: IP address or URL to be pinged:

dante@Camelot:~/$ ping
PING ( 56(84) bytes of data.
64 bytes from icmp_seq=1 ttl=113 time=3.73 ms
64 bytes from icmp_seq=2 ttl=113 time=3.83 ms
64 bytes from icmp_seq=3 ttl=113 time=3.92 ms
--- ping statistics ---
3 packets transmitted, 3 received, 0% packet loss, time 2005ms
rtt min/avg/max/mdev = 3.733/3.828/3.920/0.076 ms


When you run a python application you get in sys.argv list every argument user entered when calling your application from console. Console command is split using whitespaces as separator a each resulting token is a sys.argv list element. First element of that list is your application name. So, if we implemented our own python version of ping sys.argv 0 index element would be "ping" and 1 index element would be "".

You could do your own argument parsing just assessing sys.argv content by yourself. Most developers do it that way... at the begining. If you go that way you'll soon realize that it is not so easy to perform a clean management of user argument and that you have to repeat a lot of boilerplate in your applications. so, there are some libraries to easy argument parsing for you. One I like a lot is argparse.

Argparse is a built-in module of standard python distribution so you don't have to download it from Pypi. Once you understand its concepts it's easy, very flexible and it manages for you many edge use cases. It one of those modules you really miss when developing in other languages.

First concept you have to understand to use argparse is Argument Parser concept. For this module, an Argument Parser is a fixed word that is followed by positional or optional user provided arguments. Default Argument Parser is the precisely the application name. In our ping example, "ping" command is an Argument Parser. Every Argument Parser can be followed by Arguments, those can be of two types:

  • Positional arguments: They cannot be avoided. User need to enter them or command is assumed as wrong. They should be entered in an specific order. In our ping example "" is a positional argument. We can meet many other examples, "cp" command for instance needs to positional arguments source file to copy and destination for copied file.
  • Optional arguments: They can be entered or not. They are marked by tags. Abbreviated tags use an hyphen and a char, long tags use double hyphens and a word. Some optional arguments admit a value and others not (they are boolean, true if they are used or false if not).

Here you can see some of the arguments cat command accepts:

dante@Camelot:~/$ cat --help
Usage: cat [OPTION]... [FILE]...
Concatenate FILE(s) to standard output.

With no FILE, or when FILE is -, read standard input.

-A, --show-all equivalent to -vET
-b, --number-nonblank number nonempty output lines, overrides -n
-e equivalent to -vE
-E, --show-ends display $ at end of each line
-n, --number number all output lines
-s, --squeeze-blank suppress repeated empty output lines
-t equivalent to -vT
-T, --show-tabs display TAB characters as ^I
-u (ignored)
-v, --show-nonprinting use ^ and M- notation, except for LFD and TAB
--help display this help and exit
--version output version information and exit

cat f - g Output f's contents, then standard input, then g's contents.
cat Copy standard input to standard output.

GNU coreutils online help: <>
Full documentation at: <>
or available locally via: info '(coreutils) cat invocation'


There are more complex commands that include what is called Subparsers. Subparsers appear when your command have other "fixed" words, like verbs. For instance, "docker" command have many subparsers: "docker build", "docker run", "docker ps", etc. Every subparser can accept its own set of arguments. A subparser can accept other subparsers too, so you can get pretty complex command trees.

To show how to use argparse module works, I'm going to use my own configuration for my application Cifra. Take a look to it's file at GitHub.

In its __main__ section you can see how arguments are processed at high level of abstraction:

You can see we are using sys.argv to get arguments provided by user but discarding first one because is the root command itself: "cifra".

I like using sys.argv as a default parameter of main() call because that way I can call main() from my functional tests using an specific list of arguments.

Arguments are passed to parse_arguments, that is a my function to do all argparse magic. There you can find the root configuration for argparse:

There you configure root parser, the one linked to your base command, defining a description of you command and a final note (epilog) for your command. Those texts will appear when your user call your command with --help argument.

My application Cifra has some subparsers, like docker command has. To allow a parser to have subparsers you must do:

Once you have allowed your parser to have subparser you can start to create those subparsers. Cifra has 4 subparsers at this level:



We'll see that argparse returns a dict-like object after its parsing. If user called "cifra dictionary" then "mode" key from that dict-like object will have a "dictionary" value.

Every subparser can have its own subparser, actually "dictionary" subparser has more subparser adding more branches to the command tree. So you can call "cifra dictionary create", "cifra dictionary update", "cifra dictionary delete", etc.

Once you feel a parser does not need more subparsers you can add its respective arguments. For "cipher" subparser we have these ones:

In these arguments, "algorithm" and "key" are positional so they are compulsory. Arguments placed at that position will populate values for "algorithm" and "key" keys in dict-like object returned by argparse.

Metavar parameter is the string we want to be used to represent this parameter when help is called with --help argument. Help parameter as you can guess is the tooltip used to explain this parameter when --helps argument is used.

Type parameter is used to preprocess argument given by user. By default arguments are interpreted like strings, but if you use type=int argument will be converted to an int (or throw an error if it can't be done). Actually str or int are functions, you can use your own ones. For file_to_cipher parameter I used _check_is_file function:

As you can see, this function check if provided arguments points to a valid path name and if that happens returns provided string or raises an argparse error otherwise.

Remember parser optional parameters are always those preceded by hyphens. If those parameters are followed by an argument provided by users, that argument will be the value for dict-like object returned by arparser key called like optional parameter long name. In this example line 175 means that when user types "--ciphered_file myfile.txt", argparser will return a key called "ciphered_file" with value "myfile.txt".

Sometimes, your optional parameters won't need values. For example, they could be a boolean parameter to do verbose output:

parser.add_argument("--verbose", help="increase output verbosity", action="store_true", default=False)

With action="store_true" the dict-like object will have a "verbose" key with a boolean value: true if --verbose was called or false otherwise.

Before returning from parse_arguments I like to filter returned dict-like object content:

In this section parsed_arguments is the dict-like object that argparse returns after parsing provided user arguments with arg_parser.parse_args(args). This object includes those uncalled parameters with None value. I could leave it that way and check afterwards if values are None before using them, but a I fell somewhat cleaner remove those None'd values keys and just check if those keys are present or not. That's why I filter dict-like object at line 235.

Back to main() function, after parsing provided arguments you can base your further logic depending on parsed arguments dict contents. For example, for cifra:

This way, argparser lets you deal with provided user arguments in a clean way giving you for free help messages and error messages, depending on whether user called --help or entered a wrong or incomplete command. Actually generated help messages are so clean that I use them for my repository file and as a base point for my man pages (what that is an story for another article).

15 April 2020

Programming in Python 3 by Mark Summerfield

Python is an extremely powerful but easy to learn programming language. If you have prior knowledge of any programming language you can learn Python in just few hours and you can be a proficient developers in just some days.

However this simplicity can be dangerous too if you stay at the basics and don´t go further into this wonderful language. It's easy to keep using it just as an script language and think that that is all with Python. But actually Python is complete development language that you can use with almost anything you would need with an ease and expressiveness not easy to find in any other language.

This books focus in Python 3, the new generation of this language that now is its standard. Such an an evolution lap was not able to offer full backwards compatibility with former Python 2.7 branch. But actually this books gives a good guidance to promote your Python code to 3 branch. Book's content is extremely complete covering from basics (flow control, strings, files) to advanced topics (decorator, context manager, functors, abstract classes and metaclasses and a huge etc). Those topics will prove to be very useful to help you to translate easily mental concepts to code. All those concepts are explained with clean code, well commented and easy to understand.

As a summary, this is a good book both for newbies who want to start from the begining and for those with good expertise that want to get full advantage from Python. Besides, once finished this book is a good language reference to keep at hand on your bookshelf.

14 June 2017

Packaging Python programs - DEB (and RPM) packages

Native way to distribute Python code is through PyPI, as I explained in a previous article. But to be honest PyPI has some drawbacks that limit its use to developing environments.

Problem with PyPI is that it does not implement a proper package management so pip uninstall doesn't work properly, and there's no way to rollback to a previous state. Besides pip installation procedures often build packages from source which can be painfully slow for a complete virtualenv.

Deploying Python application on production environments should be fast and should be done in a manner that clean uninstalling would be always available.

Debian has a really solid and stable package managing system. It can check dependencies either in installation and unistallation, and run pre and post installation scripts. That's why is one of the most used package managing system in Linux ecosystem.

This article is going to cover some ways to package our Python applications in Debian packages easily installable in Debian/Ubuntu Linux distros.

Debian package format

Although other linux packaging systems rely on binary formats, Debian packages use a simply ar archive to group in a single file a pair of tar archives, compressed with gzip or bzip. One of those two archives contains a folder tree with all application files and the another one the package configuration files.

Package configuration files are bare text files so the classic way of creating Debian packages involves only a bit of folder creation, textual editing of some configuration files and in it's simplest form just a command to be run:

dante@Camelot:~/project-directory$ debuild -us -uc

You can fin a good tutorial about the topic here.

The thing is not complex but can be tricky and you have to create and edit many textual files. It's not hard to see why developers have created so many tools to automate the task. We are going to see some of them specialised in packaging Python applications.


If you are already used to python packaging for PyPI it shouldn't be hard for you to grasp stdeb concepts. If you don't know what I'm speaking about then you should read article I linked at the beggining of this one.

As stdeb call some Debian/Ubuntu tools under the hood you have to use one of those distros or any of their derivatives. If you are in a Debian/Ubuntu, you can download stdeb from standard linux repositories:

dante@Camelot:~$ sudo aptitude search stdeb
p   python-stdeb    - Python to Debian source package conversion utility                                 p   python3-stdeb   - Python to Debian source package conversion plugins for distutils                  
dante@Camelot:~$ sudo aptitude install python-stdeb python3-stdeb

But if you want to get a newer version you can also install stdeb from PyPI as a Python package.

The best workflow to use stdeb involves creating the necessary files to create a PyPI package, that way stdeb can use your file to get the info for creating debian package configuration files.

So, let's suppose you have your application ready to be packaged for PyPI with your already done. For this example, I've cloned this git repository.

To make stdeb generate a source debian package you just have to do:

dante@Camelot:~/geolocate$ python3 --command-packages=stdeb.command sdist_dsc

The --command-packages stuff can be cumbersome but the thing doesn't work without it, so rely on me and include it.

After some verbosy output you'll realize some folders are added to your working directory. You have to focus on the one called deb_dist. That folder contains mainly three files: a .dsc file, a .orig.tar.gz one and a .diff.gz. Those three files together are what we call a debian source package.

Inside deb_dist folder it you'll find another one called as your project with current version appended. That folder contains all generated data to compile a binary debian package, so move there and run next command:

dante@Camelot:~/geolocate/deb_dist/glocate-1.3-0$ dpkg-buildpackage -rfakeroot -uc -us

The fakeroot thing is just a flag to be able to build a debian package withount being logged as root user. That is the command recommended in stdeb documentation, but I've realized that you can use debuild command seen before:

dante@Camelot:~/geolocate/deb_dist/glocate-1.3-0$ debuild -uc -us

In the end debuild is a wrapper for dpkg-buildpackage that automates some things you should do manually otherwise.

After either of those commands you should find a source debian package in your deb_dist folder:

dante@Camelot:~/geolocate/deb_dist/glocate-1.3-0$ ls *.deb python3-glocate_1.3.0-1_all.deb

Those are the "two step" method, you can reduce it to "just one step" method running next command:

dante@Camelot:~/geolocate$ python3 --command-packages=stdeb.command bdist_deb

Although both methods ends with a .deb package in debian_dist folder, you should be aware that generated debian package is architecture dependent (althought it package names include a "_all" tag), that means that if you generate the package in an amd64 Ubuntu box chances are that you will face problems if you try to install resulting package in another Ubuntu box with a different architecture. To avoid this problem you have two options:
  • Use virtual machines to compile a debian package in every target debian architecture.
  • Use a PPA repository: Ubuntu offers personal hosting space for packaging project. You just upload there your source package (the content of deb_dist folder after running "python3 --command-packages=stdeb.command sdist_dsc"), and Ubuntu server compile it to each Ubuntu target architecture (mainly x86 and amd64). After compilation, created packages are available in your personal PPA until you remove them or replace them with newer versions.
If your Python applications just use built-in packages then your packaging trip ends here but if you use additional packages, for instance downloaded from PyPI, chances are that your compiled package doesn't include them properly.

Following our example, if you check geolocate's you should see that it depends of these additional packages:

install_requires=["geoip2>=2.1.0", "maxminddb>=1.1.1", "requests>=2.5.0", "wget>=2.2"]

Lets see if these dependencies have been included in generated debian package metadata:

dante@Camelot:~/geolocate/deb_dist$ dpkg -I python3-glocate_1.3.0-1_all.deb [...] Depends: python3, python3-requests, python3:any (>= 3.3.2-2~) [...]

Obviously they have not been included. In fact our compiling command warned us that dependency check failed at building time. If we check building output we'll find this output:
I: dh_python3 pydist:184: Cannot find package that provides geoip2. Please add package that provides it to Build-Depends or add "geoip2 python3-geoip2-fixme"
line to debian/py3dist-overrides or add proper  dependency to Depends by hand and ignore this info.
I: dh_python3 pydist:184: Cannot find package that provides maxminddb. Please add package that provides it to Build-Depends or add "maxminddb python3-maxmindd
b-fixme" line to debian/py3dist-overrides or add proper  dependency to Depends by hand and ignore this info.
I: dh_python3 pydist:184: Cannot find package that provides wget. Please add package that provides it to Build-Depends or add "wget python3-wget-fixme" line t
o debian/py3dist-overrides or add proper  dependency to Depends by hand and ignore this info.
Problem is that stdeb didn't identify which linux packages include those PyPI packages. So we have to set them manually.

To manually configure stdeb you have to create a file called stdeb.cfg in the same folder than You usually will create a [DEFAULT] section where you'll put your configuration but you can create too [package_name] sections, where package_name is specified as the name argument to the setup() command.

For instance, if we find out that geoip2 PyPI library is included inside python3-geoip Ubuntu repository's package, and requests PyPI library is included inside python3-requests we could create a stdeb.cfg with this content:
X-Python3-Version: >= 3.4
Depends3: python3-geoip (>=1.3.1), python3-requests
All tags follow the same format as they would have if they were inserted in debian/control. For depends tag, format specification is here.

With that configuration, the given dependencies are included in generated debian package:

dante@Camelot:~/geolocate/deb_dist$ dpkg -I python3-glocate_1.3.0-1_all.deb
Depends: python3, python3-requests, python3:any (>= 3.3.2-2~), python3-geoip (>= 1.3.1)

What I haven't found out yet is a way to change the architecture tag of generated package so it is no longer generated with "Architecture: all".

At first glance, stdeb looks great and indeed it is but actually it has too some serious drawbacks.

Problem is that stdeb limits you to use only libraries and python packages from your standard linux repository. If you develop your application using PyPI libraries, chances are that when you try to find which linux package includes your PyPI library you'll find that those packages contain only older versions than those you downloaded from PyPI. Worse even, many PyPI libraries have not been ported to standard linux repositories so you won't find any package to match your dependency. For instance, geolocate needs to use geoip2 (v.2.1.0) which is easily downloadable from PyPI but only Ubuntu 15.04 has an available package called python3-geoip, but this one comes with version 1.3.2 of geoip. Will geolocate work with geoip version provided by python3-geoip package? probably won't. Other geolocate dependencies are even missing in standard repository, like PyPI wget python library.
It's clear that if you like to use PyPI libraries stdeb may not be your best option. But if you develop using only libraries available through your standard package manager then stdeb will probably save you the day.


FPM is a Ruby tool similar to stdeb. It's main advantage is that you can use FPM to create many kinds of packages installer, not only Debian ones, currently RPM packages (for Red Hat distros). And what is even more interesting FPM allows package conversions, for example from RPM to DEB.

FPM has no package in Ubuntu main repository, so you have to download it from Ruby's equivalent of PyPI. To get that you should install first Ruby packages:

dante@Camelot:~/geolocate$ sudo aptitude install ruby-dev gcc make

Afterwards you can install FPM from Ruby repositories:

dante@Camelot:~/geolocate$ gem install fpm

Creating a DEB package is pretty simple, just set FPM source as python, its target as deb and give it your package file path:

dante@Camelot:~/geolocate$ fpm -s python -t deb ./

FPM takes dependency names from and prepend them with the tag you set with --python-package-name-prefix flag (if not set then python prefix is used):

dante@Camelot:~/geolocate$ dpkg -I python-glocate_1.3.0_all.deb
Depends: python-geoip2 (>= 2.1.0), python-maxminddb (>= 1.1.1), python-requests (>= 2.5.0), python-wget (>= 2.2)

Problem here is similar than in stdeb: those dependencies doesn't exists in standard Ubuntu repository. In case dependencies would exists but with different names than those autogenerated by FPM, then you could set them manually:

dante@Camelot:~/geolocate$ fpm -s python -t deb --no-auto-depends -d "python3-geoip>=1.3.1, python3-wget" ./
dante@Camelot:~/geolocate$ dpkg -I python-glocate_1.3.0_all.deb 
Depends: python3-geoip>=1.3.1, python3-wget

Another useful flag is "-a native". This flag sets package architecture to the one of your system, so it is not set any longer to "_all" in your package name

FPM is a great tool. It allows to create RPM packages and it is very configurable but in my opinion it has one serious drawbacks: as happened with stdeb, it is useless if your applications imports a library available in PyPI but not in standard operating system repositories.


Up to this point it should be clear the main problem to package a Python a application is ensuring to meet its dependencies at installation, because developer may have used PyPI libraries not available through Linux standard repositories at user end.

The guys at Spotify developed a packager to address this problem: dh-virtualenv.

This assumes that if you are using PyPI to develop then you are likely using virtualenvs. So dh-virtualenv includes the entire virtualenv into the package so you have not to install them in user end.

Nevertheless, in my humble opinion dh-virtualenv has a serious drawback: it is not as cleaner to use as stdeb or fpm (because you have to create manually a debian folder and a rules file) and you end using debuild as in the beginning of the article.


Main concepts of vdist are similar to those seen in dh-virtualenv but vdist uses a combination of docker and fpm to create operating system standard package. This tool lets you build linux packages from your Python applications while aiming to build an isolated environment for your Python project using virtualenv. At first glance vdist may looks complex but its documentation it really clear and helpful, and actually is quite simple to use and automate.

If your main problem while packaging python application is to ensure dependencies are present at user end, vdist solves this making your application self contained and self sufficient so it does not depend on OS provided packages of Python modules. This means that packages generated by vdist contain your application, all python dependencies needed by your application, and a Python interpreter. That python interpreter allows to run your application with the interpreter of your choice not with the one shipped with the OS you're deploying on.

To ensure the host used to build the package keeps its system packages intact, vdist uses docker to create a clean OS image at build time and install there needed dependencies before your application is being packaged on top of it. Thanks to this your build machines will always be reverted to it's original state. To load your application into docker image, vdist downloads application source code from a git repository, so having your application in Bitbucket or Github is a good idea. Downloaded source code is placed in a virtualenv created inside your docker image. Pypy dependencies will be installed inside virtualenv

Main dependency for using vdist is having docker installed and its daemon running. To install it in Ubuntu you just need to do the following:

dante@Camelot:~/geolocate$ sudo aptitude install python-docker python3-docker

After installing docker, remember to add your user to docker group:

dante@Camelot:~/geolocate$ sudo usermod -a -G docker dante

You may need to restart your system to be sure the group is really updated.

Easiest way to install vdist is to install it using your standard package tools. Vdist packages are hosted at Bintray, so to install them from there you should include Bintray in your system repositories before anything else. To do it in Ubuntu just type:

dante@Camelot:~$ sudo apt-get update
dante@Camelot:~$ sudo apt-get install apt-transport-https
dante@Camelot:~$ sudo echo "deb [trusted=yes] generic main" | tee -a /etc/apt/sources.list
dante@Camelot:~$ sudo apt-key adv --keyserver --recv-keys 379CE192D401AB61

Once added bintray in your repositories you can install and update vdist like any other system package. For instance, in Ubuntu:

dante@Camelot:~$ sudo apt-get update
dante@Camelot:~$ sudo apt-get install vdist

If you are in a system where you don't have permission to install system packages you may find interesting installing it from PyPI repository inside a virtualenv created ad-hoc for packaging your application:

(env) dante@Camelot:~/geolocate$ pip install vdist

After installing vdist you are provided with a console command called... vdist. Just be aware that if you have installed vdist inside a virtualenv, that console command only will be available inside that virtualenv.

You have many ways to use vdist, I think the easiest way to use it is creating a configuration file and making vdist read it. Vdist is used to package itself so its configuration file is a good example:

app = vdist
version = 1.1.0
source_git =${app}, master
fpm_args = --maintainer -a native --url${app} --description
    "vdist (Virtualenv Distribute) is a tool that lets you build OS packages
     from your Python applications, while aiming to build an
     isolated environment for your Python project by utilizing virtualenv. This
     means that your application will not depend on OS provided packages of
     Python modules, including their versions."
    --license MIT --category net
requirements_path = /REQUIREMENTS.txt
compile_python = True
python_version = 3.5.3
output_folder = ./package_dist/
after_install = packaging/
after_remove = packaging/

profile = ubuntu-trusty
runtime_deps = libssl1.0.0,
build_deps =

profile = centos7
runtime_deps = openssl, docker-ce

Vdist documentation is good enough to know what each parameter is useful for. Just note that you can have just in one configuration file parameters for every package you want to build . Just keep common parameters in [DEFAULT] section and put distribution dependent parameters in separate sections (they can be called as you want but you'd better use expressive names).

Once you have your configuration file you can launch vdist like this (guess configuration file is called configuration_file):

dante@Camelot:~/$ vdist batch configuration_file

Then you'll start to see a bunch of screen output while vdists builds your packages. Generated packages will be placed in folder set in output_folder configuration file parameter.

The smallest package size produced by vdist is about 50 MB because an entire python distribucion has to be included in that package. That size is what you pay for your application self-containment. Discounted those 50 MB, all the rest is due your application and it dependencies. At first glance it may seem big but nowadays is quite usual size for any compiled application you may find out there.

I think vdist is the most complete packaging solution available to deploy python apps in Linux boxes. With it you can deploy even in linux boxes with no python installed at all, giving you a valuable isolation in client end and easying your final user life to install your app.

Disclaimer: I started using vdist to write this article and I've ended being the current main contributor to its development, so feel free to comment any further improvement you feel could be interesting.