17 October 2021

How to write manpages with Markdown and Pandoc


No decent console application is released without a man page documenting how to use it. However, man page inners are rather arcane, but being a 1971 file format it has held up quite well.

Nowadays you have two standard ways to learn how to use a console command (google apart ;-) ): typing application command followed by "--help" to get a quick glance of application usage, or typing "man" followed by application name to get a detailed information about application usage. 

To implement "--help" approach in your application you can manually include "--help" parsing and output or you'd better use an argument parsing library like Python's ArgParse.

The "man" approach needs you to write a man page for your application.

Standard way to produce man pages is using troff formating commands. Linux has it's own troff implementation called groff. You have a feel about how is standard man pages format you can type next source inside a file called, for instance, corrupt.1:

.TH CORRUPT 1 .SH NAME corrupt \- modify files by randomly changing bits .SH SYNOPSIS .B corrupt [\fB\-n\fR \fIBITS\fR] [\fB\-\-bits\fR \fIBITS\fR] .IR file ... .SH DESCRIPTION .B corrupt modifies files by toggling a randomly chosen bit. .SH OPTIONS .TP .BR \-n ", " \-\-bits =\fIBITS\fR Set the number of bits to modify. Default is one bit.

Once saved, that file can be displayed by man. Assuming you are in the same folder than corrupt.1 you can type:

dante@Camelot:~/$ man -l corrupt.1

Output is: 

CORRUPT(1)                                                 General Commands Manual 

NAME
corrupt - modify files by randomly changing bits

SYNOPSIS
corrupt [-n BITS] [--bits BITS] file...

DESCRIPTION
corrupt modifies files by toggling a randomly chosen bit.

OPTIONS
-n, --bits=BITS
Set the number of bits to modify. Default is one bit.

CORRUPT(1)


 

You can opt to write specific man pages for your application following for instance this small cheat sheet

Nevertheless, troff/groff format is rather cumbersome and I feel keeping two sources of usage documentation (your README.md and your man page) is prone to errors and an effort waste. So, I follow a different approach: I write and keep updated my README.md and afterwards I convert it to a man page. Sure you need to keep a quite standard format for your README, and it won't be the fanciest one, but at least you will avoid to type the same things twice, in two different files, with different formatting tags.

The key stone to make that conversion is a tool called Pandoc. That tool is the Swiss army knife of document conversion. You can use it to convert between many documents format like word (docx), openoffice (odt), epub... or markdown (md) and groff man. Pandoc uses to be available at your distribution standard package manager repositories, so at an ubuntu distribution you just need:

dante@Camelot:~/$ sudo apt install pandoc

Once Pandoc is installed, you have to observe some conventions with your README.md to make further conversion easier. To illustrate this article explanations, we will follow as an example the file README.md for my project Cifra.

As you can see, linked README.md contains GitHub badges at the very beginning but those are going to be removed in conversion process we are going to explain later in this article. Everything else is structured to comply with contents a man page is supposed to have.

Maybe, most suspecting line is the first one. It's not usual to find a line like this in GitHub README:

Actually, that line is metadata for man page reader. After "%" contains document title (usually application name), manual section, a version, a "|" separator and finally a header. Manual section is 1 for user commands, 2 for system calls and 3 for C library functions. Your applications will fit in section 1 99% of times. I've not included a version for Cifra in that line, but you could have done. Besides, header indicates what set of documentation this manual page belongs to.

After that line, every section use to be included in man pages, but the only ones that should be included at least are:

  • Name: The name of the command.
  • Synopsis: A one-liner summarizing command line arguments and options.
  • Description: Describes in detail how to use your application command.

Other sections you can add are:

  • Options: Command line options.
  • Examples: About command usage.
  • Files: Useful if your application includes configuration files.
  • Environment: Here you explain if your application uses any environment variable.
  • Bugs: Where detected bugs should be reported? There you can link your GitHub issues page.
  • Authors: Who is this masterpiece of code author?
  • See also: References to other man pages.
  • Copyright | License: A good place to include your application license text.

Once you have decided which sections include in manpage, you should write them following a format easily convertible by pandoc in a manpage with an standard structure. That's why main sections are always marked at level one of indentation ("#"). A problem with indentation is that while you can get further subheaders in markdown using "#" character  ("#" for main titles, "##" for subheaders, "###" for sections, and so on), those subheaders are only recognized by Pandoc up to sublevel 2. To get more sublevels I've opted for "pandoc lists", a format you can use in your markdown that is recognized afterwards by pandoc:

 

In lines 42 and 48, you have what pandoc call are "line blocks". Those are lines beginning with a vertical bar ( | ) followed by a space. Those spaces between vertical var and command will be keep in man page converted text by pandoc.

Everything else in that README.md is good classic markdown format.

Let guess we have written our entire README.md and we want to perform conversion. To do that you can use an script like this from a temporal folder:

 

Lines 3 and 4 clean any file used in a previous conversion, while line 5 actually copies README.md from source folder.

Line 6 removes every GitHub badget you could have in your README.md. 

Line 7 is where we call pandoc to perform conversion.

In that point you could have a file that can be opened with man:

dante@Camelot:~/$ man -l man/cifra.1

But man pages use to be compressed in gzip format, as you can see with your system man pages:

dante@Camelot:~/$ ls /usr/share/man/man1

That's why script compress at line 8 your generated man page. And while generated man page is now a compressed gzip file, it is still readable by man:

dante@Camelot:~/$ man -l man/cifra.1.gz

Although those are some steps, you can see they are easily scriptable both as a local script, as an step of your continuous integration workflow.

As you can see, with these easy steps you only need to keep updated your README.md file because man page can be generated from it.


25 September 2021

How to create Python executables with PyOxidizer


Python is a great development language. But it lacks of a proper distribution and packaging support if you want end user get your application. Pypi and wheel packages are more intendend for developers to install their own apps dependencies, but and end user will feel as painful to use pip and virtualenv to install and run a python application.

There are some projects trying to solve that problem as PyInstaller, py2exe or cx_Freeze. I maintain vdist, that its closely related with this problem and tries to solve it creating debian, rpm and archlinux packages from python applications. In this article I'm going to analyze PyOxidizer. This tool is written in a language I'm really loving (Rust), and follows and approach somewhat similar to vdist as it bundles your application along a python distribution but besides compiles the entire bundle into an executable binary.

To structure this tutorial, I'm going to build and for my Cifra project. You can clone it at this point to follow this tutorial step by step.

First thing to be aware is that PyOxidizer bundles your application with a customized Python 3.8 or 3.9 distribution, so your app should be compatible with one of those. You will need a C compiler/toolchain to build with PyOxidizer, If you don't have one PyOxidizer outputs instructions to install one. PyOxidizer uses Rust toolchain too, but it downloads it in the background for you so it's not a dependency you should worry about.

PyOxidizer installation

You have some ways to install PyOxidizer (downloading a compiled release from its GitHub Page, compiling from source) but I feel straighter and cleaner to install into your project's virtualenv using pip.

Guess you have Cifra project cloned (in may case at path ~/Project/cifra), inside it you created a virtualenv inside a venv folder and you activated that virtualenv, then you can install PyOxidizer inside that virtualenv doing:

(venv)dante@Camelot:~/Projects/cifra$ pip install pyoxidizer
Collecting pyoxidizer
Downloading pyoxidizer-0.17.0-py3-none-manylinux2010_x86_64.whl (9.9 MB)
|████████████████████████████████| 9.9 MB 11.5 MB/s
Installing collected packages: pyoxidizer
Successfully installed pyoxidizer-0.17.0

(venv)dante@Camelot:~/Projects/cifra$

You can check you have pyoxidizer properly installed doing:

(venv)dante@Camelot:~/Projects/cifra$ pyoxidizer --help
PyOxidizer 0.17.0
Gregory Szorc <gregory.szorc@gmail.com>
Build and distribute Python applications

USAGE:
pyoxidizer [FLAGS] [SUBCOMMAND]

FLAGS:
-h, --help
Prints help information

--system-rust
Use a system install of Rust instead of a self-managed Rust installation

-V, --version
Prints version information

--verbose
Enable verbose output


SUBCOMMANDS:
add Add PyOxidizer to an existing Rust project. (EXPERIMENTAL)
analyze Analyze a built binary
build Build a PyOxidizer enabled project
cache-clear Clear PyOxidizer's user-specific cache
find-resources Find resources in a file or directory
help Prints this message or the help of the given subcommand(s)
init-config-file Create a new PyOxidizer configuration file.
init-rust-project Create a new Rust project embedding a Python interpreter
list-targets List targets available to resolve in a configuration file
python-distribution-extract Extract a Python distribution archive to a directory
python-distribution-info Show information about a Python distribution archive
python-distribution-licenses Show licenses for a given Python distribution
run Run a target in a PyOxidizer configuration file
run-build-script Run functionality that a build script would perform


(venv)dante@Camelot:~/Projects/cifra$


PyOxidizer configuration

Now create an initial PyOxidizer configuration file at your project root folder:

(venv)dante@Camelot:~/Projects/cifra$ cd ..
(venv) dante@Camelot:~/Projects$ pyoxidizer init-config-file cifra
writing cifra/pyoxidizer.bzl

A new PyOxidizer configuration file has been created.
This configuration file can be used by various `pyoxidizer`
commands

For example, to build and run the default Python application:

$ cd cifra
$ pyoxidizer run

The default configuration is to invoke a Python REPL. You can
edit the configuration file to change behavior.

(venv)dante@Camelot:~/Projects$

Generated pyoxidizer.bzl is written in Starlark, a python dialect for configuration files, so we'll feel at home there. Nevertheless, that configuration file is rather long and although it is fully commented at first it is not clear how to align all the moving parts. PyOxidizer is pretty complete too but it can be rather overwhelming for anyone that only wants to get started. Filtering the entire PyOxidizer documentation site, I've found this section as the most helpful to customize PyOxidizer configuration file.

There are many approaches to build binaries using PyOxidizer. Let's assume you have a setuptools setup.py file for your project, then we can ask PyOxidizer run that setup.py inside it's customized python. To do that add next line before return line of pyoxidizer.bzl make_exe() function:

# Run cifra's own setup.py and include installed files in binary bundle.
exe.add_python_resources(exe.setup_py_install(package_path=CWD))

In package_path parameter you have to provide your setup.py path. I've provided a CWD argument because I'm assuming that pyoxidizer.bzl and setup.py are at the same folder.

Next thing to customize at pyoxidizer.bzl is that it defaults to build a Windows MSI executable. At Linux we want an ELF executable output. So, first comment line near the end that registers a target to build a msi installer:

#register_target("msi_installer", make_msi, depends=["exe"])

We need also an entry point for our application. It would be nice if PyOxidicer would take setup.py entry_points parameter configuration but it doesn't. Instead we have to provide it manually through pyoxidizer.bzl configuration file. In our example just find the line at make_exe() function where python_config variable is created and place after:

python_config.run_command = "from cifra.cifra_launcher import main; main()"


Building executable binaries

Just now, you can run "pyoxidizer build" and pyoxidizer will begin to bundle our application.

But Cifra has a very specific problem at this point. If you try to run build over cifra with configuration so far, you will get next error:

(venv)dante@Camelot:~/Projects/cifra$ pyoxidizer build
[...]
error[PYOXIDIZER_PYTHON_EXECUTABLE]: adding PythonExtensionModule<name=sqlalchemy.cimmutabledict>

Caused by:
extension module sqlalchemy.cimmutabledict cannot be loaded from memory but memory loading required
--> ./pyoxidizer.bzl:272:5
|
272 | exe.add_python_resources(exe.setup_py_install(package_path=CWD))
| ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ add_python_resources()


error: adding PythonExtensionModule<name=sqlalchemy.cimmutabledict>

Caused by:
extension module sqlalchemy.cimmutabledict cannot be loaded from memory but memory loading required

(venv)dante@Camelot:~/Projects$

PyOxidizer tries to embed every dependency of your application inside produced binary (in-memory mode). This is nice because you end with an unique distributable binary file and performance to load those dependencies is improved. Problem is that not every package out there admits to be embedded that way. Here SQLAlchemy fails to be embedded. 

In that case, those dependencies should be stored next to produced binary (filesystem-relative mode). PyOxidizer will link those dependencies inside binary using their places relative to produced binary, so when distributing we must pack produced binary and its dependencies files keeping their relative places.

One way to deal with this problem is asking PyOxidizer to keep things in-memory whenever it can and fallback to filesystem-relative when not. To do that uncomment next two lines from make_exe() function at configuration file:

# Use in-memory location for adding resources by default.
policy.resources_location = "in-memory"

# Use filesystem-relative location for adding resources by default.
# policy.resources_location = "filesystem-relative:prefix"

# Attempt to add resources relative to the built binary when
# `resources_location` fails.
policy.resources_location_fallback = "filesystem-relative:prefix"

Doing this you may make things work in your application, but for Cifra things keep failing despite build go further:

(venv)dante@Camelot:~/Projects/cifra$ pyoxidizer build
[...]
adding extra file prefix/sqlalchemy/cresultproxy.cpython-39-x86_64-linux-gnu.so to .
installing files to /home/dante/Projects/cifra/./build/x86_64-unknown-linux-gnu/debug/install
Traceback (most recent call last):
File "<string>", line 1, in <module>
File "cifra.cifra_launcher", line 31, in <module>
File "cifra.attack.vigenere", line 21, in <module>
File "cifra.tests.test_simple_attacks", line 2, in <module>
File "pytest", line 5, in <module>
File "_pytest.assertion", line 9, in <module>
File "_pytest.assertion.rewrite", line 34, in <module>
File "_pytest.assertion.util", line 13, in <module>
File "_pytest._code", line 2, in <module>
File "_pytest._code.code", line 1223, in <module>
AttributeError: module 'pluggy' has no attribute '__file__'
error: cargo run failed

(venv)dante@Camelot:~/Projects$

This error seems somewhat related to this PyOxidizer nuance

To deal with this problem I must keep everything filesystem related:

# Use in-memory location for adding resources by default.
# policy.resources_location = "in-memory"

# Use filesystem-relative location for adding resources by default.
policy.resources_location = "filesystem-relative:prefix"

# Attempt to add resources relative to the built binary when
# `resources_location` fails.
# policy.resources_location_fallback = "filesystem-relative:prefix"

Now build go smoothly:

(venv)dante@Camelot:~/Projects/cifra$ pyoxidizer build
[...]
installing files to /home/dante/Projects/cifra/./build/x86_64-unknown-linux-gnu/debug/install
(venv)dante@Camelot:~/Projects/cifra$ls build/x86_64-unknown-linux-gnu/debug/install/
cifra prefix
(venv)dante@Camelot:~/Projects$

As you can see, PyOxidizer generated an ELF binary for our application and stored all of its dependencies in prefix folder:

(venv)dante@Camelot:~/Projects/cifra$ ls build/x86_64-unknown-linux-gnu/debug/install/prefix
abc.py concurrent __future__.py _markupbase.py pty.py sndhdr.py tokenize.py
aifc.py config-3 genericpath.py mimetypes.py py socket.py token.py
_aix_support.py configparser.py getopt.py modulefinder.py _py_abc.py socketserver.py toml
antigravity.py contextlib.py getpass.py multiprocessing __pycache__ sqlalchemy traceback.py
argparse.py contextvars.py gettext.py netrc.py pyclbr.py sqlite3 tracemalloc.py
ast.py copy.py glob.py nntplib.py py_compile.py sre_compile.py trace.py
asynchat.py copyreg.py graphlib.py ntpath.py _pydecimal.py sre_constants.py tty.py
asyncio cProfile.py greenlet nturl2path.py pydoc_data sre_parse.py turtledemo
asyncore.py crypt.py gzip.py numbers.py pydoc.py ssl.py turtle.py
attr csv.py hashlib.py opcode.py _pyio.py statistics.py types.py
base64.py ctypes heapq.py operator.py pyparsing stat.py typing.py
bdb.py curses hmac.py optparse.py _pytest stringprep.py unittest
binhex.py dataclasses.py html os.py pytest string.py urllib
bisect.py datetime.py http _osx_support.py queue.py _strptime.py uuid.py
_bootlocale.py dbm idlelib packaging quopri.py struct.py uu.py
_bootsubprocess.py decimal.py imaplib.py pathlib.py random.py subprocess.py venv
bz2.py difflib.py imghdr.py pdb.py reprlib.py sunau.py warnings.py
calendar.py dis.py importlib __phello__ re.py symbol.py wave.py
cgi.py distutils imp.py pickle.py rlcompleter.py symtable.py weakref.py
cgitb.py _distutils_hack iniconfig pickletools.py runpy.py _sysconfigdata__linux_x86_64-linux-gnu.py _weakrefset.py
chunk.py doctest.py inspect.py pip sched.py sysconfig.py webbrowser.py
cifra email io.py pipes.py secrets.py tabnanny.py wsgiref
cmd.py encodings ipaddress.py pkg_resources selectors.py tarfile.py xdrlib.py
codecs.py ensurepip json pkgutil.py setuptools telnetlib.py xml
codeop.py enum.py keyword.py platform.py shelve.py tempfile.py xmlrpc
code.py filecmp.py lib2to3 plistlib.py shlex.py test_common zipapp.py
collections fileinput.py linecache.py pluggy shutil.py textwrap.py zipfile.py
_collections_abc.py fnmatch.py locale.py poplib.py signal.py this.py zipimport.py
colorsys.py formatter.py logging posixpath.py _sitebuiltins.py _threading_local.py zoneinfo
_compat_pickle.py fractions.py lzma.py pprint.py site.py threading.py
compileall.py ftplib.py mailbox.py profile.py smtpd.py timeit.py
_compression.py functools.py mailcap.py pstats.py smtplib.py tkinter

(venv)dante@Camelot:~/Projects$

If you want to name that folder with a moser self-explanatory name, just change "prefix" for whatever you want in configuration file. For instance, to name it "lib":

# Use in-memory location for adding resources by default.
# policy.resources_location = "in-memory"

# Use filesystem-relative location for adding resources by default.
policy.resources_location = "filesystem-relative:lib"

# Attempt to add resources relative to the built binary when
# `resources_location` fails.
# policy.resources_location_fallback = "filesystem-relative:prefix"

Let's see how our dependencies folder changed:

(venv)dante@Camelot:~/Projects/cifra$ pyoxidizer build
[...]
installing files to /home/dante/Projects/cifra/./build/x86_64-unknown-linux-gnu/debug/install
(venv)dante@Camelot:~/Projects/cifra$ls build/x86_64-unknown-linux-gnu/debug/install/
cifra lib
(venv)dante@Camelot:~/Projects$


Building for many distributions

So far, you have get a binary that can be run in any Linux distribution like the one you used to build that binary. Problem is that our binary can fail if its run in other distributions:

(venv)dante@Camelot:~/Projects/cifra$ ls build/x86_64-unknown-linux-gnu/debug/install/
cifra lib
(venv) dante@Camelot:~/Projects/cifra$ docker run -d -ti -v $(pwd)/build/x86_64-unknown-linux-gnu/debug/install/:/work ubuntu
36ad7e78c00f90172bd664f9a045f14acc2138b2ee5b8ac38e7158aa145d4965
(venv) dante@Camelot:~/Projects/cifra$ docker attach 36ad
root@36ad7e78c00f:/# ls /work
cifra lib
root@36ad7e78c00f:/# /work/cifra --help
usage: cifra [-h] {dictionary,cipher,decipher,attack} ...

Console command to crypt and decrypt texts using classic methods. It also performs crypto attacks against those methods.

positional arguments:
{dictionary,cipher,decipher,attack}
Available modes
dictionary Manage dictionaries to perform crypto attacks.
cipher Cipher a text using a key.
decipher Decipher a text using a key.
attack Attack a ciphered text to get its plain text

optional arguments:
-h, --help show this help message and exit

Follow cifra development at: <https://github.com/dante-signal31/cifra>
root@36ad7e78c00f:/# exit
exit

(venv)dante@Camelot:~/Projects$

Here our built binary runs in another ubuntu because my personal box (Camelot) is an ubuntu (actually Linux Mint). Our generated binary will run right in other machines with the same distribution like the one I used to build binary. 

But let's see what happens if we run ourbinary in a different distribution:

(venv)dante@Camelot:~/Projects/cifra$ docker run -d -ti -v $(pwd)/build/x86_64-unknown-linux-gnu/debug/install/:/work centos
Unable to find image 'centos:latest' locally
latest: Pulling from library/centos
a1d0c7532777: Pull complete
Digest: sha256:a27fd8080b517143cbbbab9dfb7c8571c40d67d534bbdee55bd6c473f432b177
Status: Downloaded newer image for centos:latest
376c6f0d085d9947453a2786a9a712146c471dfbeeb4c452280bd028a5f5126f
(venv) dante@Camelot:~/Projects/cifra$ docker attach 376
[root@376c6f0d085d /]# ls /work
cifra lib
[root@376c6f0d085d /]# /work/cifra --help
/work/cifra: /lib64/libm.so.6: version `GLIBC_2.29' not found (required by /work/cifra)
[root@376c6f0d085d /]# exit
exit

(venv)dante@Camelot:~/Projects$

As you can see, our binary fails on Centos. That happens because libraries are not named the same across distributions so our compiled binary fails to load shared dependencies it needs to run.

To solve this you need to build a fully statically linked binary, so it has no external dependencies at all. 

To build that kind of binaries we need to update Rust toolchain to build for that kind of targets:

(venv)dante@Camelot:~/Projects/cifra$ rustup target add x86_64-unknown-linux-musl
info: downloading component 'rust-std' for 'x86_64-unknown-linux-musl'
info: installing component 'rust-std' for 'x86_64-unknown-linux-musl'
30.4 MiB / 30.4 MiB (100 %) 13.9 MiB/s in 2s ETA: 0s

(venv)dante@Camelot:~/Projects$

To make PyOxidizer build a binary like that you are supossed to do:

(venv)dante@Camelot:~/Projects/cifra$ pyoxidizer build --target-triple x86_64-unknown-linux-musl
[...]
Processing greenlet-1.1.1.tar.gz
Writing /tmp/easy_install-futc9qp4/greenlet-1.1.1/setup.cfg
Running greenlet-1.1.1/setup.py -q bdist_egg --dist-dir /tmp/easy_install-futc9qp4/greenlet-1.1.1/egg-dist-tmp-zga041e7
no previously-included directories found matching 'docs/_build'
warning: no files found matching '*.py' under directory 'appveyor'
warning: no previously-included files matching '*.pyc' found anywhere in distribution
warning: no previously-included files matching '*.pyd' found anywhere in distribution
warning: no previously-included files matching '*.so' found anywhere in distribution
warning: no previously-included files matching '.coverage' found anywhere in distribution
error: Setup script exited with error: command 'musl-clang' failed: No such file or directory
thread 'main' panicked at 'called `Result::unwrap()` on an `Err` value: Custom { kind: Other, error: "command [\"/home/dante/.cache/pyoxidizer/python_distributions/python.70974f0c6874/python/install/bin/python3.9\", \"setup.py\", \"install\", \"--prefix\", \"/tmp/pyoxidizer-setup-py-installvKsIUS/install\", \"--no-compile\"] exited with code 1" }', pyoxidizer/src/py_packaging/packaging_tool.rs:336:38
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace

(venv)dante@Camelot:~/Projects$

As you can see, I get an error that I've been unable to solve. Because of that I've filled an issue at PyOxidizer GitHub page. PyOxidizer author kindly answered my issue pointing I needed musl-clang command in my system. I've not found any package with musl-clang, so I guess it's something you have to compile from source. I've tried to google it but I haven't found a clear way to get that command up and running (I have to admit I'm not a C/C++ ninja). So, I guess I'll have to use pyoxidizer without static compiling until musl-clang dependency dissapears ot pyoxidizer documentation explains how to get that command.


Conclusion

Althought I haven't got it work to create statically linked binaries, PyOxidizer feels like a promising tool. I think I can use at vdist to speed up packaging process. As in vdist I use specific containers for every type of package inability to create statically linked binaries doesn't seem a blocker. It seems actively developed and has a lot of contributors, so it seems a good chance to help to simplify python packaging and deployment of python applications.