25 September 2021

How to create Python executables with PyOxidizer

Python is a great development language, but it lacks proper distribution and packaging support when you want to get your application to end users. PyPI and wheel packages are intended mostly for developers installing their own applications' dependencies; an end user will find it painful to use pip and virtualenv just to install and run a Python application.

Some projects try to solve that problem, like PyInstaller, py2exe or cx_Freeze. I maintain vdist, which is closely related to this problem: it tries to solve it by creating Debian, RPM and Arch Linux packages from Python applications. In this article I'm going to analyze PyOxidizer. This tool is written in a language I'm really loving (Rust), and it follows an approach somewhat similar to vdist's, as it bundles your application along with a Python distribution, but it also compiles the entire bundle into an executable binary.

To structure this tutorial, I'm going to build and compile my Cifra project. You can clone it now to follow this tutorial step by step.

The first thing to be aware of is that PyOxidizer bundles your application with a customized Python 3.8 or 3.9 distribution, so your app should be compatible with one of those. You will need a C compiler/toolchain to build with PyOxidizer; if you don't have one, PyOxidizer outputs instructions to install it. PyOxidizer uses the Rust toolchain too, but it downloads it in the background for you, so it's not a dependency you need to worry about.

PyOxidizer installation

There are several ways to install PyOxidizer (downloading a compiled release from its GitHub page, compiling from source), but I find it simpler and cleaner to install it into your project's virtualenv using pip.

Assuming you have cloned the Cifra project (in my case at path ~/Projects/cifra), created a virtualenv inside a venv folder and activated it, you can install PyOxidizer inside that virtualenv by doing:

(venv)dante@Camelot:~/Projects/cifra$ pip install pyoxidizer
Collecting pyoxidizer
Downloading pyoxidizer-0.17.0-py3-none-manylinux2010_x86_64.whl (9.9 MB)
|████████████████████████████████| 9.9 MB 11.5 MB/s
Installing collected packages: pyoxidizer
Successfully installed pyoxidizer-0.17.0


You can check that PyOxidizer is properly installed by doing:

(venv)dante@Camelot:~/Projects/cifra$ pyoxidizer --help
PyOxidizer 0.17.0
Gregory Szorc <gregory.szorc@gmail.com>
Build and distribute Python applications

pyoxidizer [FLAGS] [SUBCOMMAND]

-h, --help
Prints help information

Use a system install of Rust instead of a self-managed Rust installation

-V, --version
Prints version information

Enable verbose output

add Add PyOxidizer to an existing Rust project. (EXPERIMENTAL)
analyze Analyze a built binary
build Build a PyOxidizer enabled project
cache-clear Clear PyOxidizer's user-specific cache
find-resources Find resources in a file or directory
help Prints this message or the help of the given subcommand(s)
init-config-file Create a new PyOxidizer configuration file.
init-rust-project Create a new Rust project embedding a Python interpreter
list-targets List targets available to resolve in a configuration file
python-distribution-extract Extract a Python distribution archive to a directory
python-distribution-info Show information about a Python distribution archive
python-distribution-licenses Show licenses for a given Python distribution
run Run a target in a PyOxidizer configuration file
run-build-script Run functionality that a build script would perform


PyOxidizer configuration

Now create an initial PyOxidizer configuration file at your project root folder:

(venv)dante@Camelot:~/Projects/cifra$ cd ..
(venv) dante@Camelot:~/Projects$ pyoxidizer init-config-file cifra
writing cifra/pyoxidizer.bzl

A new PyOxidizer configuration file has been created.
This configuration file can be used by various `pyoxidizer`

For example, to build and run the default Python application:

$ cd cifra
$ pyoxidizer run

The default configuration is to invoke a Python REPL. You can
edit the configuration file to change behavior.


The generated pyoxidizer.bzl is written in Starlark, a Python dialect for configuration files, so we'll feel at home there. Nevertheless, that configuration file is rather long, and although it is fully commented, at first it is not clear how to align all the moving parts. PyOxidizer's documentation is pretty complete too, but it can be rather overwhelming for anyone who only wants to get started. After going through the entire PyOxidizer documentation site, I found this section the most helpful to customize the PyOxidizer configuration file.

There are many approaches to building binaries with PyOxidizer. Let's assume you have a setuptools setup.py file for your project; then we can ask PyOxidizer to run that setup.py inside its customized Python. To do that, add the following line before the return line of the make_exe() function in pyoxidizer.bzl:

# Run cifra's own setup.py and include installed files in binary bundle.
exe.add_python_resources(exe.setup_py_install(package_path=CWD))

In the package_path parameter you have to provide the path of the folder containing your setup.py. I've passed the CWD variable (the folder containing the configuration file) because I'm assuming that pyoxidizer.bzl and setup.py are in the same folder.

The next thing to customize in pyoxidizer.bzl is that it defaults to building a Windows MSI installer. On Linux we want an ELF executable as output, so first comment out the line near the end that registers the target to build an MSI installer:

#register_target("msi_installer", make_msi, depends=["exe"])

We also need an entry point for our application. It would be nice if PyOxidizer took the setup.py entry_points configuration, but it doesn't. Instead, we have to provide it manually in the pyoxidizer.bzl configuration file. In our example, just find the line in the make_exe() function where the python_config variable is created and place this after it:

python_config.run_command = "from cifra.cifra_launcher import main; main()"

Building executable binaries

At this point you can run "pyoxidizer build" and PyOxidizer will begin to bundle our application.

But Cifra has a very specific problem at this point. If you try to build Cifra with the configuration so far, you will get the following error:

(venv)dante@Camelot:~/Projects/cifra$ pyoxidizer build
error[PYOXIDIZER_PYTHON_EXECUTABLE]: adding PythonExtensionModule<name=sqlalchemy.cimmutabledict>

Caused by:
extension module sqlalchemy.cimmutabledict cannot be loaded from memory but memory loading required
--> ./pyoxidizer.bzl:272:5
272 | exe.add_python_resources(exe.setup_py_install(package_path=CWD))
| ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ add_python_resources()

error: adding PythonExtensionModule<name=sqlalchemy.cimmutabledict>

Caused by:
extension module sqlalchemy.cimmutabledict cannot be loaded from memory but memory loading required


PyOxidizer tries to embed every dependency of your application inside the produced binary (in-memory mode). This is nice because you end up with a single distributable binary file, and those dependencies load faster. The problem is that not every package out there can be embedded that way. Here, SQLAlchemy's compiled extension module (sqlalchemy.cimmutabledict) can't be loaded from memory.

In that case, those dependencies have to be stored next to the produced binary (filesystem-relative mode). The binary then loads those dependencies from paths relative to its own location, so when distributing, we must pack the produced binary and its dependency files together, keeping their relative locations.

One way to deal with this problem is to ask PyOxidizer to keep things in memory whenever it can and fall back to filesystem-relative when it can't. To do that, uncomment the following two lines in the make_exe() function of the configuration file:

# Use in-memory location for adding resources by default.
policy.resources_location = "in-memory"

# Use filesystem-relative location for adding resources by default.
# policy.resources_location = "filesystem-relative:prefix"

# Attempt to add resources relative to the built binary when
# `resources_location` fails.
policy.resources_location_fallback = "filesystem-relative:prefix"

Doing this may make things work for your application, but for Cifra things keep failing, although the build gets further:

(venv)dante@Camelot:~/Projects/cifra$ pyoxidizer build
adding extra file prefix/sqlalchemy/cresultproxy.cpython-39-x86_64-linux-gnu.so to .
installing files to /home/dante/Projects/cifra/./build/x86_64-unknown-linux-gnu/debug/install
Traceback (most recent call last):
File "<string>", line 1, in <module>
File "cifra.cifra_launcher", line 31, in <module>
File "cifra.attack.vigenere", line 21, in <module>
File "cifra.tests.test_simple_attacks", line 2, in <module>
File "pytest", line 5, in <module>
File "_pytest.assertion", line 9, in <module>
File "_pytest.assertion.rewrite", line 34, in <module>
File "_pytest.assertion.util", line 13, in <module>
File "_pytest._code", line 2, in <module>
File "_pytest._code.code", line 1223, in <module>
AttributeError: module 'pluggy' has no attribute '__file__'
error: cargo run failed


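This error seems somewhat related to this PyOxidizer nuance: modules imported from memory have no __file__ attribute, and here pytest code (_pytest._code.code) tries to read pluggy.__file__.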
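The pitfall is easy to reproduce outside PyOxidizer. The sketch below is plain Python, not PyOxidizer code: a module object created with types.ModuleType has no __file__, much like a module imported from memory, so reading module.__file__ directly raises the same kind of AttributeError seen in the traceback. The locate_module() helper is hypothetical; it only illustrates the defensive getattr() pattern.

```python
import types


def locate_module(module):
    """Return the module's file path, or None when it has no __file__.

    Modules imported from memory may lack __file__, so read it
    defensively instead of assuming it exists.
    """
    return getattr(module, "__file__", None)


# A bare module object has no __file__, similar to an in-memory module.
in_memory_like = types.ModuleType("pluggy_like")

try:
    in_memory_like.__file__
except AttributeError as error:
    # Same kind of error as in the traceback above.
    print(error)

print(locate_module(in_memory_like))  # None
print(locate_module(types) is not None)  # True: types was loaded from disk
```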

To deal with this problem, I had to keep everything filesystem-relative:

# Use in-memory location for adding resources by default.
# policy.resources_location = "in-memory"

# Use filesystem-relative location for adding resources by default.
policy.resources_location = "filesystem-relative:prefix"

# Attempt to add resources relative to the built binary when
# `resources_location` fails.
# policy.resources_location_fallback = "filesystem-relative:prefix"

Now the build goes smoothly:

(venv)dante@Camelot:~/Projects/cifra$ pyoxidizer build
installing files to /home/dante/Projects/cifra/./build/x86_64-unknown-linux-gnu/debug/install
(venv)dante@Camelot:~/Projects/cifra$ls build/x86_64-unknown-linux-gnu/debug/install/
cifra prefix

As you can see, PyOxidizer generated an ELF binary for our application and stored all of its dependencies in the prefix folder:

(venv)dante@Camelot:~/Projects/cifra$ ls build/x86_64-unknown-linux-gnu/debug/install/prefix
abc.py concurrent __future__.py _markupbase.py pty.py sndhdr.py tokenize.py
aifc.py config-3 genericpath.py mimetypes.py py socket.py token.py
_aix_support.py configparser.py getopt.py modulefinder.py _py_abc.py socketserver.py toml
antigravity.py contextlib.py getpass.py multiprocessing __pycache__ sqlalchemy traceback.py
argparse.py contextvars.py gettext.py netrc.py pyclbr.py sqlite3 tracemalloc.py
ast.py copy.py glob.py nntplib.py py_compile.py sre_compile.py trace.py
asynchat.py copyreg.py graphlib.py ntpath.py _pydecimal.py sre_constants.py tty.py
asyncio cProfile.py greenlet nturl2path.py pydoc_data sre_parse.py turtledemo
asyncore.py crypt.py gzip.py numbers.py pydoc.py ssl.py turtle.py
attr csv.py hashlib.py opcode.py _pyio.py statistics.py types.py
base64.py ctypes heapq.py operator.py pyparsing stat.py typing.py
bdb.py curses hmac.py optparse.py _pytest stringprep.py unittest
binhex.py dataclasses.py html os.py pytest string.py urllib
bisect.py datetime.py http _osx_support.py queue.py _strptime.py uuid.py
_bootlocale.py dbm idlelib packaging quopri.py struct.py uu.py
_bootsubprocess.py decimal.py imaplib.py pathlib.py random.py subprocess.py venv
bz2.py difflib.py imghdr.py pdb.py reprlib.py sunau.py warnings.py
calendar.py dis.py importlib __phello__ re.py symbol.py wave.py
cgi.py distutils imp.py pickle.py rlcompleter.py symtable.py weakref.py
cgitb.py _distutils_hack iniconfig pickletools.py runpy.py _sysconfigdata__linux_x86_64-linux-gnu.py _weakrefset.py
chunk.py doctest.py inspect.py pip sched.py sysconfig.py webbrowser.py
cifra email io.py pipes.py secrets.py tabnanny.py wsgiref
cmd.py encodings ipaddress.py pkg_resources selectors.py tarfile.py xdrlib.py
codecs.py ensurepip json pkgutil.py setuptools telnetlib.py xml
codeop.py enum.py keyword.py platform.py shelve.py tempfile.py xmlrpc
code.py filecmp.py lib2to3 plistlib.py shlex.py test_common zipapp.py
collections fileinput.py linecache.py pluggy shutil.py textwrap.py zipfile.py
_collections_abc.py fnmatch.py locale.py poplib.py signal.py this.py zipimport.py
colorsys.py formatter.py logging posixpath.py _sitebuiltins.py _threading_local.py zoneinfo
_compat_pickle.py fractions.py lzma.py pprint.py site.py threading.py
compileall.py ftplib.py mailbox.py profile.py smtpd.py timeit.py
_compression.py functools.py mailcap.py pstats.py smtplib.py tkinter


If you want to give that folder a more self-explanatory name, just change "prefix" to whatever you want in the configuration file. For instance, to name it "lib":

# Use in-memory location for adding resources by default.
# policy.resources_location = "in-memory"

# Use filesystem-relative location for adding resources by default.
policy.resources_location = "filesystem-relative:lib"

# Attempt to add resources relative to the built binary when
# `resources_location` fails.
# policy.resources_location_fallback = "filesystem-relative:prefix"

Let's see how our dependencies folder changed:

(venv)dante@Camelot:~/Projects/cifra$ pyoxidizer build
installing files to /home/dante/Projects/cifra/./build/x86_64-unknown-linux-gnu/debug/install
(venv)dante@Camelot:~/Projects/cifra$ls build/x86_64-unknown-linux-gnu/debug/install/
cifra lib

Building for many distributions

So far, you have a binary that can run on Linux distributions like the one you used to build it. The problem is that it can fail if it's run on other distributions:

(venv)dante@Camelot:~/Projects/cifra$ ls build/x86_64-unknown-linux-gnu/debug/install/
cifra lib
(venv) dante@Camelot:~/Projects/cifra$ docker run -d -ti -v $(pwd)/build/x86_64-unknown-linux-gnu/debug/install/:/work ubuntu
(venv) dante@Camelot:~/Projects/cifra$ docker attach 36ad
root@36ad7e78c00f:/# ls /work
cifra lib
root@36ad7e78c00f:/# /work/cifra --help
usage: cifra [-h] {dictionary,cipher,decipher,attack} ...

Console command to crypt and decrypt texts using classic methods. It also performs crypto attacks against those methods.

positional arguments:
Available modes
dictionary Manage dictionaries to perform crypto attacks.
cipher Cipher a text using a key.
decipher Decipher a text using a key.
attack Attack a ciphered text to get its plain text

optional arguments:
-h, --help show this help message and exit

Follow cifra development at: <https://github.com/dante-signal31/cifra>
root@36ad7e78c00f:/# exit


Here our binary runs on another Ubuntu because my personal box (Camelot) is Ubuntu based (actually Linux Mint). Our generated binary will run fine on other machines with the same distribution as the one I used to build it.

But let's see what happens if we run our binary in a different distribution:

(venv)dante@Camelot:~/Projects/cifra$ docker run -d -ti -v $(pwd)/build/x86_64-unknown-linux-gnu/debug/install/:/work centos
Unable to find image 'centos:latest' locally
latest: Pulling from library/centos
a1d0c7532777: Pull complete
Digest: sha256:a27fd8080b517143cbbbab9dfb7c8571c40d67d534bbdee55bd6c473f432b177
Status: Downloaded newer image for centos:latest
(venv) dante@Camelot:~/Projects/cifra$ docker attach 376
[root@376c6f0d085d /]# ls /work
cifra lib
[root@376c6f0d085d /]# /work/cifra --help
/work/cifra: /lib64/libm.so.6: version `GLIBC_2.29' not found (required by /work/cifra)
[root@376c6f0d085d /]# exit


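As you can see, our binary fails on CentOS. That happens because our binary is dynamically linked against system libraries, and it requires a newer glibc version (GLIBC_2.29) than the one this CentOS image ships. Shared library versions differ across distributions, so a compiled binary can fail to load the shared dependencies it needs to run.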
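If the target machine has Python available, a quick way to compare its glibc with the GLIBC_2.29 requirement from the error above is platform.libc_ver(). This is a minimal sketch; parse_version() and glibc_satisfies() are hypothetical helpers, and versions are compared as integer tuples because plain string comparison would wrongly rank "2.3" above "2.29".

```python
import platform


def parse_version(version):
    """Turn a dotted version string like '2.29' into a comparable tuple."""
    return tuple(int(part) for part in version.split("."))


def glibc_satisfies(available, required):
    """True when the available glibc version is at least the required one."""
    return parse_version(available) >= parse_version(required)


if __name__ == "__main__":
    # platform.libc_ver() returns e.g. ('glibc', '2.31') on glibc systems,
    # or empty strings when it cannot detect the C library.
    libc, version = platform.libc_ver()
    if libc == "glibc" and version:
        print(f"glibc {version}, GLIBC_2.29 satisfied:",
              glibc_satisfies(version, "2.29"))
    else:
        print("Could not detect a glibc version on this system")
```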

To solve this you need to build a fully statically linked binary, so it has no external dependencies at all. 

To build that kind of binary, we need to add the musl target to the Rust toolchain:

(venv)dante@Camelot:~/Projects/cifra$ rustup target add x86_64-unknown-linux-musl
info: downloading component 'rust-std' for 'x86_64-unknown-linux-musl'
info: installing component 'rust-std' for 'x86_64-unknown-linux-musl'
30.4 MiB / 30.4 MiB (100 %) 13.9 MiB/s in 2s ETA: 0s


To make PyOxidizer build a binary like that, you are supposed to run:

(venv)dante@Camelot:~/Projects/cifra$ pyoxidizer build --target-triple x86_64-unknown-linux-musl
Processing greenlet-1.1.1.tar.gz
Writing /tmp/easy_install-futc9qp4/greenlet-1.1.1/setup.cfg
Running greenlet-1.1.1/setup.py -q bdist_egg --dist-dir /tmp/easy_install-futc9qp4/greenlet-1.1.1/egg-dist-tmp-zga041e7
no previously-included directories found matching 'docs/_build'
warning: no files found matching '*.py' under directory 'appveyor'
warning: no previously-included files matching '*.pyc' found anywhere in distribution
warning: no previously-included files matching '*.pyd' found anywhere in distribution
warning: no previously-included files matching '*.so' found anywhere in distribution
warning: no previously-included files matching '.coverage' found anywhere in distribution
error: Setup script exited with error: command 'musl-clang' failed: No such file or directory
thread 'main' panicked at 'called `Result::unwrap()` on an `Err` value: Custom { kind: Other, error: "command [\"/home/dante/.cache/pyoxidizer/python_distributions/python.70974f0c6874/python/install/bin/python3.9\", \"setup.py\", \"install\", \"--prefix\", \"/tmp/pyoxidizer-setup-py-installvKsIUS/install\", \"--no-compile\"] exited with code 1" }', pyoxidizer/src/py_packaging/packaging_tool.rs:336:38
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace


As you can see, I got an error that I've been unable to solve. Because of that, I filed an issue on PyOxidizer's GitHub page. The PyOxidizer author kindly answered, pointing out that I needed the musl-clang command on my system. I haven't found any package providing musl-clang, so I guess it's something you have to compile from source. I've tried to google it, but I haven't found a clear way to get that command up and running (I have to admit I'm not a C/C++ ninja). So I guess I'll have to use PyOxidizer without static compiling until the musl-clang dependency disappears or the PyOxidizer documentation explains how to get that command.


Although I haven't managed to create statically linked binaries, PyOxidizer feels like a promising tool. I think I could use it in vdist to speed up the packaging process. As vdist uses specific containers for every type of package, the inability to create statically linked binaries doesn't seem a blocker. PyOxidizer seems actively developed and has a lot of contributors, so it has a good chance of helping to simplify packaging and deployment of Python applications.

18 September 2021

How to parse console arguments in your Python application with ArgParse

There is one common pattern for every console application: it has to manage user arguments. Few console applications run with no user arguments; instead, most applications need user-provided arguments to run properly. Think of ping: it needs at least one argument, the IP address or URL to be pinged:

dante@Camelot:~/$ ping
PING ( 56(84) bytes of data.
64 bytes from icmp_seq=1 ttl=113 time=3.73 ms
64 bytes from icmp_seq=2 ttl=113 time=3.83 ms
64 bytes from icmp_seq=3 ttl=113 time=3.92 ms
--- ping statistics ---
3 packets transmitted, 3 received, 0% packet loss, time 2005ms
rtt min/avg/max/mdev = 3.733/3.828/3.920/0.076 ms


When you run a Python application, the sys.argv list contains every argument the user entered when calling your application from the console. The console command is split using whitespace as separator (the shell does that split, so quoted text keeps its spaces), and each resulting token is a sys.argv list element. The first element of that list is your application name. So, if we implemented our own Python version of ping, the sys.argv element at index 0 would be "ping" and the element at index 1 would be the address to be pinged.
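A minimal sketch of what a script sees in sys.argv; split_argv() is a hypothetical helper that separates the program name from the user arguments.

```python
import sys


def split_argv(argv):
    """Separate the program name (argv[0]) from the user-provided arguments."""
    program, *user_arguments = argv
    return program, user_arguments


if __name__ == "__main__":
    program, user_arguments = split_argv(sys.argv)
    print("program:", program)
    print("arguments:", user_arguments)
```

Saved as a hypothetical ping.py and run as `python ping.py example.com -c 3`, it should print `program: ping.py` and `arguments: ['example.com', '-c', '3']`.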

You could do your own argument parsing just by inspecting sys.argv content yourself. Most developers do it that way... at the beginning. If you go that way, you'll soon realize that it is not so easy to manage user arguments cleanly and that you have to repeat a lot of boilerplate in your applications. So there are some libraries to make argument parsing easier for you. One I like a lot is argparse.

Argparse is a built-in module of the standard Python distribution, so you don't have to download it from PyPI. Once you understand its concepts, it's easy and very flexible, and it manages many edge cases for you. It's one of those modules you really miss when developing in other languages.

The first concept you have to understand to use argparse is the Argument Parser. For this module, an Argument Parser is a fixed word that is followed by positional or optional user-provided arguments. The default Argument Parser is precisely the application name. In our ping example, the "ping" command is an Argument Parser. Every Argument Parser can be followed by Arguments, which can be of two types:

  • Positional arguments: They cannot be omitted. The user needs to enter them or the command is considered wrong. They must be entered in a specific order. In our ping example, the address to be pinged is a positional argument. There are many other examples: the "cp" command, for instance, needs two positional arguments, the source file to copy and the destination for the copied file.
  • Optional arguments: They may be entered or not. They are marked by tags. Abbreviated tags use a hyphen and a character; long tags use double hyphens and a word. Some optional arguments take a value and others don't (those are boolean: true if they are used, false if not).
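Both kinds of arguments map directly onto argparse. The following sketch builds a hypothetical ping-like parser (the destination, --count and --quiet names are illustrative, not ping's real option set) with one positional argument, one optional argument that takes a value and one boolean flag.

```python
import argparse


def build_ping_parser():
    """Hypothetical ping-like parser: one positional and two optional arguments."""
    parser = argparse.ArgumentParser(prog="ping")
    # Positional: compulsory, identified by its position in the command line.
    parser.add_argument("destination", help="IP address or URL to be pinged")
    # Optional with a value: short (-c) and long (--count) tags.
    parser.add_argument("-c", "--count", type=int, help="number of packets to send")
    # Optional boolean flag: True if present, False otherwise.
    parser.add_argument("-q", "--quiet", action="store_true", help="quiet output")
    return parser


if __name__ == "__main__":
    arguments = build_ping_parser().parse_args(["example.com", "-c", "3"])
    print(arguments.destination, arguments.count, arguments.quiet)  # example.com 3 False
```

If the positional argument is missing, parse_args() prints a usage message and exits with status 2 instead of returning.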

Here you can see some of the arguments the cat command accepts:

dante@Camelot:~/$ cat --help
Usage: cat [OPTION]... [FILE]...
Concatenate FILE(s) to standard output.

With no FILE, or when FILE is -, read standard input.

-A, --show-all equivalent to -vET
-b, --number-nonblank number nonempty output lines, overrides -n
-e equivalent to -vE
-E, --show-ends display $ at end of each line
-n, --number number all output lines
-s, --squeeze-blank suppress repeated empty output lines
-t equivalent to -vT
-T, --show-tabs display TAB characters as ^I
-u (ignored)
-v, --show-nonprinting use ^ and M- notation, except for LFD and TAB
--help display this help and exit
--version output version information and exit

cat f - g Output f's contents, then standard input, then g's contents.
cat Copy standard input to standard output.

GNU coreutils online help: <https://www.gnu.org/software/coreutils/>
Full documentation at: <https://www.gnu.org/software/coreutils/cat>
or available locally via: info '(coreutils) cat invocation'


There are more complex commands that include what are called subparsers. Subparsers appear when your command has other "fixed" words, like verbs. For instance, the "docker" command has many subparsers: "docker build", "docker run", "docker ps", etc. Every subparser can accept its own set of arguments. A subparser can contain other subparsers too, so you can get pretty complex command trees.

To show how the argparse module works, I'm going to use the configuration of my own application, Cifra. Take a look at its cifra_launcher.py file on GitHub.

In its __main__ section you can see how arguments are processed at a high level of abstraction:

You can see we are using sys.argv to get the arguments provided by the user, but discarding the first one because it is the root command itself: "cifra".

I like using sys.argv as the default parameter of the main() call because that way I can call main() from my functional tests using a specific list of arguments.
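Cifra's actual __main__ code is on GitHub; the block below is only a minimal sketch of the pattern, with a hypothetical parse_arguments() that handles nothing but --verbose. One detail worth knowing: a default like `args=sys.argv[1:]` is evaluated once, when the def statement runs, so this sketch uses None as the default and reads sys.argv at call time instead. Tests can still pass their own list.

```python
import argparse
import sys


def parse_arguments(args):
    """Hypothetical stand-in for the real parser configuration."""
    parser = argparse.ArgumentParser(prog="cifra")
    parser.add_argument("--verbose", action="store_true")
    return parser.parse_args(args)


def main(args=None):
    # Read sys.argv when main() is called, skipping argv[0]
    # (the root command itself).
    if args is None:
        args = sys.argv[1:]
    parsed = parse_arguments(args)
    return "verbose mode" if parsed.verbose else "normal mode"


if __name__ == "__main__":
    print(main())
```

A functional test can then call main(["--verbose"]) directly without touching sys.argv.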

Arguments are passed to parse_arguments, my function that does all the argparse magic. There you can find the root configuration for argparse:

There you configure the root parser, the one linked to your base command, defining a description of your command and a final note (epilog). Those texts will appear when your user calls your command with the --help argument.
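A sketch of that root configuration, reconstructed from the description and epilog texts visible in the cifra --help output shown earlier; the real cifra_launcher.py may set them up differently.

```python
import argparse

DESCRIPTION = ("Console command to crypt and decrypt texts using classic methods. "
               "It also performs crypto attacks against those methods.")
EPILOG = "Follow cifra development at: <https://github.com/dante-signal31/cifra>"


def build_root_parser():
    # description is printed after the usage line, epilog at the end of --help.
    return argparse.ArgumentParser(prog="cifra", description=DESCRIPTION, epilog=EPILOG)


if __name__ == "__main__":
    build_root_parser().print_help()
```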

Like the docker command, my application Cifra has subparsers. To allow a parser to have subparsers you must do:

Once you have allowed your parser to have subparsers, you can start creating them. Cifra has 4 subparsers at this level:
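A sketch of that level, assuming dest="mode" (which matches the "mode" key discussed next) and reusing the help texts visible in cifra's --help output; the real code may differ in details.

```python
import argparse


def build_parser():
    parser = argparse.ArgumentParser(prog="cifra")
    # dest="mode" stores the chosen subcommand name under the "mode" key.
    subparsers = parser.add_subparsers(help="Available modes", dest="mode")
    subparsers.add_parser("dictionary", help="Manage dictionaries to perform crypto attacks.")
    subparsers.add_parser("cipher", help="Cipher a text using a key.")
    subparsers.add_parser("decipher", help="Decipher a text using a key.")
    subparsers.add_parser("attack", help="Attack a ciphered text to get its plain text")
    return parser


if __name__ == "__main__":
    arguments = build_parser().parse_args(["dictionary"])
    print(arguments.mode)  # dictionary
```

parse_args() returns an argparse.Namespace; vars() turns it into a plain dict, here `{'mode': 'dictionary'}`.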



We'll see that argparse returns a dict-like object after parsing (an argparse.Namespace, which vars() converts into a plain dict). If the user called "cifra dictionary", then the "mode" key of that dict-like object will have a "dictionary" value.

Every subparser can have its own subparsers; actually, the "dictionary" subparser has more subparsers, adding more branches to the command tree. So you can call "cifra dictionary create", "cifra dictionary update", "cifra dictionary delete", etc.
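A sketch of a second level of subparsers under "dictionary". The "action" key and the dictionary_name argument are hypothetical names chosen for illustration, not necessarily what Cifra uses.

```python
import argparse


def build_parser():
    parser = argparse.ArgumentParser(prog="cifra")
    modes = parser.add_subparsers(dest="mode")
    dictionary = modes.add_parser("dictionary")
    # A second level of subparsers hangs from the "dictionary" subparser.
    actions = dictionary.add_subparsers(dest="action")  # hypothetical key name
    for action in ("create", "update", "delete"):
        # Each leaf subparser gets its own (hypothetical) positional argument.
        actions.add_parser(action).add_argument(
            "dictionary_name", help="Name of the dictionary to manage.")
    return parser


if __name__ == "__main__":
    arguments = build_parser().parse_args(["dictionary", "create", "english"])
    print(arguments.mode, arguments.action, arguments.dictionary_name)
```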

Once you feel a parser does not need more subparsers, you can add its respective arguments. For the "cipher" subparser we have these:

Among these arguments, "algorithm" and "key" are positional, so they are compulsory. The arguments placed at those positions will populate the values of the "algorithm" and "key" keys in the dict-like object returned by argparse.

The metavar parameter is the string used to represent this argument in the help shown with the --help argument. The help parameter, as you can guess, is the text used to explain this argument when --help is used.
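A sketch of positional arguments with metavar and help; the metavar and help strings here are hypothetical. Note that metavar only changes how the argument is displayed in help, while the key in the parsed result is still the argument name.

```python
import argparse


def build_parser():
    parser = argparse.ArgumentParser(prog="cifra")
    modes = parser.add_subparsers(dest="mode")
    cipher = modes.add_parser("cipher", help="Cipher a text using a key.")
    # Positional arguments are compulsory and filled in by position.
    # metavar only changes the name shown in --help output.
    cipher.add_argument("algorithm", metavar="algorithm_name",
                        help="Algorithm to use to cipher.")
    cipher.add_argument("key", metavar="ciphering_key",
                        help="Key to use to cipher.")
    return parser


if __name__ == "__main__":
    arguments = build_parser().parse_args(["cipher", "caesar", "3"])
    print(arguments.algorithm, arguments.key)  # caesar 3
```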

The type parameter is used to preprocess the argument given by the user. By default arguments are interpreted as strings, but if you use type=int the argument will be converted to an int (or argparse reports an error if it can't be). Actually, str and int are functions, so you can use your own ones. For the file_to_cipher parameter I used the _check_is_file function:

As you can see, this function checks whether the provided argument points to a valid path name; if so, it returns the provided string, otherwise it raises an argparse error.
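A sketch of a type function in the same spirit; check_is_file() is a hypothetical name, not Cifra's actual _check_is_file. Raising argparse.ArgumentTypeError from a type function makes argparse report a clean usage error that includes your message, instead of a traceback.

```python
import argparse
import os


def check_is_file(path):
    """argparse type function: return the path if it names an existing file."""
    if os.path.isfile(path):
        return path
    raise argparse.ArgumentTypeError(f"{path} is not a valid file")


parser = argparse.ArgumentParser(prog="cifra cipher")
# argparse calls check_is_file() on the raw string before storing it.
parser.add_argument("file_to_cipher", type=check_is_file)
```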

Remember that optional parameters are always those preceded by hyphens. If one of those parameters is followed by an argument provided by the user, that argument becomes the value of the key named after the parameter's long name in the dict-like object returned by argparse. In Cifra's configuration, this means that when a user types "--ciphered_file myfile.txt", argparse will return a key called "ciphered_file" with value "myfile.txt".
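A minimal sketch of an optional argument that takes a value; the metavar and help text are hypothetical. When the option is not used, its value defaults to None.

```python
import argparse

parser = argparse.ArgumentParser(prog="cifra cipher")
# Optional argument: preceded by hyphens, takes the next token as its value.
parser.add_argument("--ciphered_file", metavar="output_file",
                    help="Store ciphered text in this file.")

arguments = parser.parse_args(["--ciphered_file", "myfile.txt"])
print(arguments.ciphered_file)  # myfile.txt
print(parser.parse_args([]).ciphered_file)  # None
```

If the long name used hyphens instead, as in --ciphered-file, argparse would still store the value under ciphered_file, because it replaces hyphens with underscores when deriving the key.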

Sometimes your optional parameters won't need values. For example, they could be a boolean parameter to enable verbose output:

parser.add_argument("--verbose", help="increase output verbosity", action="store_true", default=False)

With action="store_true", the dict-like object will have a "verbose" key with a boolean value: True if --verbose was used, False otherwise (store_true already defaults to False, so default=False just makes that explicit).

Before returning from parse_arguments, I like to filter the content of the returned dict-like object:

In this section, parsed_arguments is the dict-like object that argparse returns after parsing the user-provided arguments with arg_parser.parse_args(args). This object includes the parameters the user didn't use, with a None value. I could leave it that way and check afterwards whether values are None before using them, but I feel it's somewhat cleaner to remove those None-valued keys and just check whether keys are present or not. That's why I filter the dict-like object before returning it.
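A sketch of that filtering step, assuming it is a dict comprehension over vars() (which turns argparse's Namespace into a plain dict); filter_unused() is a hypothetical name.

```python
import argparse


def filter_unused(parsed_arguments):
    """Convert the Namespace to a dict, dropping arguments the user did not give."""
    return {key: value for key, value in vars(parsed_arguments).items()
            if value is not None}


parser = argparse.ArgumentParser(prog="cifra cipher")
parser.add_argument("algorithm")
parser.add_argument("--ciphered_file")
parser.add_argument("--verbose", action="store_true")

print(filter_unused(parser.parse_args(["caesar"])))
# {'algorithm': 'caesar', 'verbose': False}
```

Filtering on `is not None` rather than on truthiness keeps store_true flags whose value is False.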

Back in the main() function, after parsing the provided arguments, you can base your further logic on the contents of the parsed arguments dict. For example, for Cifra:
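A sketch of that kind of dispatch on the filtered dict; process() and the strings it returns are hypothetical placeholders for the real ciphering logic.

```python
def process(arguments):
    """Decide what to do from a filtered parsed-arguments dict."""
    mode = arguments.get("mode")
    if mode == "cipher":
        result = f"cipher with {arguments['algorithm']} and key {arguments['key']}"
    elif mode == "decipher":
        result = f"decipher with {arguments['algorithm']} and key {arguments['key']}"
    else:
        raise ValueError(f"mode not handled in this sketch: {mode}")
    # None values were filtered out, so key presence is the only check needed.
    if "ciphered_file" in arguments:
        return f"{result} -> saved to {arguments['ciphered_file']}"
    return result


if __name__ == "__main__":
    print(process({"mode": "cipher", "algorithm": "caesar", "key": "3"}))
```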

This way, argparse lets you deal with user-provided arguments cleanly, giving you help and error messages for free, depending on whether the user called --help or entered a wrong or incomplete command. Actually, the generated help messages are so clean that I use them for my README.md repository file and as a starting point for my man pages (but that's a story for another article).