
07 November 2021

How to use GitHub Actions for continuous integration and deployment


In a previous article I explained how to use Travis CI to get continuous integration in your project. The problem is that Travis changed its usage terms last year and now it's not so comfortable for open source projects. They keep saying they are still free for open source projects, but actually you have to beg for free credits every time you spend them, and they make you prove you still comply with what they define as an open source project (I've heard about cases where developers were denied free credits just because they had GitHub sponsors).

So, although I've kept my vdist project at Travis (for the moment), I've been searching for alternatives for my projects. As I use GitHub for my open source repositories, it's natural to try its continuous integration and deployment framework: GitHub Actions.

Conceptually, using GitHub Actions is pretty much the same as using Travis CI, so I'm not going to repeat myself. If you want to learn why to use a continuous integration and deployment framework, you can read the article linked at the start of this one.

So, let's focus on how to use GitHub Actions.

As with Travis CI, what you do with GitHub Actions is centered on yaml files you place in the .github/workflows folder of your repository. GitHub looks into that folder to execute CI/CD workflows. In that folder you can have many workflows, each executed depending on different GitHub events (pushes to branches, pull requests, issue filing and a long etcetera).

But I find GitHub Actions way better than Travis CI. GitHub promotes massive reuse. Every single step can be encapsulated and shared with your other workflows and with other people on GitHub. With so many people developing and sharing on GitHub, you'll find yourself reusing other people's tasks (a.k.a. actions) rather than implementing them yourself. Unless you are automating something really weird, chances are somebody else has already implemented it and shared it. In this article we'll use other people's actions and implement our own custom steps. Besides, you can reuse your own workflows (or share them with others), so if you have a working workflow for one project you can reuse it in a similar project and don't need to reimplement its workflow from scratch.

For this article we'll focus on a typical "test -> package -> deploy" workflow which we're going to call test_and_deploy.yaml (if you feel creative you can call it whatever you like, but try to be expressive). If you want this article's full code you can find it in this commit of my Cifra-rust project at GitHub.

To create that file you have two options: create it in your IDE and push it like any other file, or write it using the built-in GitHub web editor. For your first time my advice is to use the web editor, as it guides you better to get your first yaml up and working. So go to your GitHub repo page and click on the Actions tab:


There, you are prompted to create your first workflow. When you choose to create a workflow you're offered either to use a predefined template (GitHub has many for different tasks and languages) or to set up a workflow yourself. For the sake of this article choose "set up a workflow yourself". You will then enter the web editor, where an initial example is already loaded to give you some scaffolding.

Now let's check the Cifra-rust yaml file to learn what we can do with GitHub Actions.


Header

In the very first few lines (from line 1 to 12) you can see we name this workflow ("name" tag). Use an expressive name to quickly identify what this workflow does. You will use this name to reuse this workflow from other repositories.

The "on" tag defines which events trigger this workflow. In my case, this workflow is triggered by pushes and pull requests over staging branch. There're many more events you can use.

The "workflow_dispatch" tag allows you to trigger this workflow manually from GitHub web interface. I use to set it, it doesn't harm to have that option.



Jobs

Next is the "jobs" tag (line 15) and there is where the "nuts and guts" of workflow begins. A workflow is composed of jobs. Every job is runned in a separate virtual machine (the runner) so each job has its dependencies encapsulated. That encapsulation is good to avoid jobs messing dependencies and filesystems of others jobs. Try to focus every job in just one task. 

Jobs are run in parallel by default unless you explicitly set dependencies between them. If you need a job B to run after a job A has completed successfully, you use the "needs" tag to state that B needs A to be completed before starting. In the Cifra-rust example, the jobs "merge_staging_and_master", "deploy_crates_io" and "generate_deb_package" need the "tests" job to finish successfully before they start. You can see an example of "needs" tag usage at line 53:
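To give you an idea of the syntax, a dependency like that could be declared something like this (job names taken from this article, the rest is illustrative):

jobs:
  tests:
    runs-on: ubuntu-latest
    steps:
      # ... test steps ...

  generate_deb_package:
    needs: tests          # this job waits for "tests" to finish successfully
    runs-on: ubuntu-latest
    steps:
      # ... packaging steps ...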

As "deploy_debian_package" respectively needs "generate_deb_package" to be finished before, you end with an execution tree like this:


Actions

Every job is composed of one or multiple steps. A step is a sequence of shell commands: you can run native shell commands or scripts you have included in your repository. From line 112 to 115 we have one of those steps:

There we are calling a script stored in a folder called ci_scripts in my repository. Note the pipe ("|") next to the "run" tag. Without that pipe you can include just one command in the run tag, but the pipe allows you to include multiple commands to be executed, separated on different lines (like the step in lines 44 to 46).
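As an illustration (the script path and step names below are made up, not the actual Cifra-rust steps), both variants look like this:

# A single command calling a script stored in the repository:
- name: Run custom script
  run: bash ci_scripts/some_script.sh

# The pipe after "run" lets you execute several commands in one step:
- name: Run several commands
  run: |
    echo "First command"
    echo "Second command"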

If you find yourself repeating the same commands across multiple workflows, then you have a good candidate for encapsulating those commands in an action. As you can guess, actions are the keystones of GitHub Actions. An action is a set of commands packaged to be shared and reused in many workflows (yours or other users'). An action has inputs and outputs, and what happens inside is not your problem as long as it works as intended. You can develop your own actions and share them with others, but that deserves an article of its own. In the next article I will convert that man page generation step into a reusable action.

On the right-hand side, the GitHub Actions web editor has a search box to find actions suitable for the task you want to perform. Say you want to install a Rust toolchain; you can do this search:


Although the GitHub web editor is really complete and useful for finding mistakes in your yaml files, its search box lacks a way to filter or reorder its results. Nevertheless, that search is your best option to find shared actions for your workflows.

Once you click on any search result, you are shown a summary of the text to include in your yaml file to use that action. You can get more detailed instructions by clicking the "View full Marketplace listing" link.

Like any other step, an action uses a "name" tag to document which task it is intended to do, and an "id" if that step must be referenced from other places in the workflow (see an example at line 42). What makes an action different from a command step is the "uses" tag. That tag links to the action we want to use. The text to use in that tag differs for every action, but you can find what to write there in the search result instructions. Those instructions usually describe which inputs the action accepts; those inputs go in the "with" tag. For instance, in lines 23 to 27 I used an action to install the Rust building framework:
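As an illustrative sketch, such a step could be written this way (actions-rs/toolchain is a popular action for this; I'm not claiming it's the exact one used in the original file):

- name: Install Rust toolchain
  uses: actions-rs/toolchain@v1
  with:
    toolchain: stable
    override: true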

As you can see, in "uses" tag you can include which version of given action to use. There are wildcards to use latest version but you'd better set an specific version to avoid your workflows get broken by actions updates.

You build a job by chaining actions as steps. Every step in a job is executed sequentially. If any of them fails, the entire workflow fails.


Sharing data between steps and jobs

Although the steps of a job are executed in the same virtual machine, they cannot share bash environment variables because every step spawns a different bash process. If you set an environment variable in one step that needs to be used in another step of the same job, you have two options (a combined sketch of both approaches follows this list):

  • Quick and dirty: Append that environment variable to $GITHUB_ENV so it can be accessed later using the env context. For instance, at line 141 we create the DEB_PACKAGE environment variable:

That environment variable is accessed at line 146, in the next step:


  • Set step outputs: The problem with the last method is that although you can share data between steps of the same job, you cannot do it across different jobs. Setting step outputs is a bit slower but leaves your step ready to share data not only with other steps of the same job but with any step of the workflow. To set an environment variable as a step output you need to echo it to ::set-output, giving the variable a name followed by its value after a double colon ("::"). You have an example at line 46:

Note that the step must be identified with an "id" tag to later retrieve the shared variable. As that step is identified as "version_tag", the created "package_tag" variable can later be retrieved from another step of the same job using:

${{ steps.version_tag.outputs.package_tag }}

Actually, that method is used at line 48 to prepare that variable to be recovered from another job. Remember, so far this method only helps you pass data to steps in the same job. To export data from a job to be used by another job, you have to declare it first as a job output (lines 47-48):


Note that in the last screenshot the "outputs" block should be indented at the same level as the "steps" tag to properly set package_tag as a job output.

To retrieve that output from another job, the consuming job must declare the producing job as needed in its "needs" tag. After that, the shared value can be retrieved using the following format:

${{ needs.<needed_job>.outputs.<output_variable_name> }}

In our example, "deploy_debian_package" needs the value exported at line 48 so it declares its job (tests) as needed in line 131:

After that, it can get and use that value at line 157:
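Putting the pieces together, the whole pattern could look something like this (job, step and variable names are taken from this article; values and commands are illustrative):

jobs:
  tests:
    runs-on: ubuntu-latest
    outputs:
      # Expose the step output at job level so other jobs can read it.
      package_tag: ${{ steps.version_tag.outputs.package_tag }}
    steps:
      - name: Get version tag
        id: version_tag
        run: |
          VERSION="1.0.0"    # illustrative value
          # Quick and dirty sharing with later steps of the same job:
          echo "DEB_PACKAGE=cifra_${VERSION}_amd64.deb" >> $GITHUB_ENV
          # Step output, which can also be exposed to other jobs:
          echo "::set-output name=package_tag::${VERSION}"

  deploy_debian_package:
    needs: tests
    runs-on: ubuntu-latest
    steps:
      - name: Use the shared value
        run: echo "Package tag is ${{ needs.tests.outputs.package_tag }}"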



Passing files between jobs

Sometimes passing a variable is not enough because you need to produce files in a job to be consumed in another job.

You can share files between steps of the same job because those steps share the same virtual machine. But between jobs you need to transfer files from one virtual machine to another.

When you generate a file (an artifact) in a job, you can upload it to temporary shared storage to allow other jobs in the same workflow to get that artifact. To upload artifacts to and download them from that temporary storage you have two predefined actions: upload-artifact and download-artifact.

In our example, "generate_deb_package" job generates a debian package that is needed by "deploy_debian_package". So, in lines 122 to 126 "generated_deb_package" uploads that package:

On the other side, "deploy_debian_package" downloads the saved artifact in lines 132 to 136:
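As a sketch of both steps using the official artifact actions (artifact name and paths are illustrative):

# In generate_deb_package:
- name: Upload debian package
  uses: actions/upload-artifact@v2
  with:
    name: debian_package
    path: package/

# In deploy_debian_package:
- name: Download debian package
  uses: actions/download-artifact@v2
  with:
    name: debian_package
    path: package/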



Using your repository source code

By default you start every job with a clean virtual machine. To have your source code downloaded to that virtual machine you use an action called checkout. In our example it is used as the first step of the "tests" job (line 21) so the code can be built and tested:
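That step is as simple as something like:

- name: Checkout source code
  uses: actions/checkout@v2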

You can set any branch to be downloaded, but if you don't set one, the branch related to the triggering event is used. In our example, that's the staging branch.


Running your workflow

You have two options to trigger your workflow to try it: you can perform the triggering event (in our example, pushing code to the staging branch) or you can launch the workflow manually from the GitHub web interface (assuming you set the "workflow_dispatch" tag in your yml file as I advised).

To trigger it manually, go to the repository Actions tab and select the workflow you want to launch. Then you'll see the "Run workflow" button on the right-hand side:


Once you push that button, the workflow will start and the tab's list will show that workflow as active. Selecting that workflow in the list shows an execution tree like the one I showed earlier. Clicking on any of the boxes shows the running logs of every step of that job. It is extremely easy to read through the generated logs.


Conclusion

I've found GitHub Actions really enjoyable. Its focus on reusability and sharing makes creating complex workflows really easy and fast. Unless you're working on a really weird workflow, chances are that most of your workflow components (if not all) are already implemented and shared as actions, so designing a complex workflow becomes an easy task of joining already available pieces.

GitHub Actions documentation is great, and its popularity makes it easy to find answers online to any problem you meet.

Besides that, I've found the yml file structure coherent and logical, so it's easy to grasp the concepts and gain a good level really quickly.

Being free and unlimited for open source repositories, I guess I'm going to migrate all my CI/CD workflows from Travis CI to GitHub Actions.

23 October 2021

How to use PackageCloud to distribute your packages


Recently I wrote an article explaining how to set up a JFrog Artifactory account to host Debian and RPM repositories.

Since then, my interest in Artifactory has weakened because they require continuous activity to keep your account alive. If you have periods with no package uploads/downloads, they suspend your account and you have to reactivate it manually. That is extremely unpleasant and uncomfortable, and it happens frequently (I've been receiving suspensions every few weeks) when you're like me, a hobbyist developer working in his spare time who can't keep up a continuous pace in his projects. That, and the extremely complex setup needed to run a package repository, made me look for another option.

Searching the web I found PackageCloud, and so far it looks like an appealing alternative to Artifactory. PackageCloud has a free tier with 2 GB of storage and 10 GB of monthly transfer. For a hobbyist like me I think that is enough.

Creating a repository

Once you register at PackageCloud you get access to an extremely clean dashboard. Compared to Artifactory, everything is simple and easy. At the top-right corner of the "Home" page you have a big "Create a repository" button. You can use it to create a repository for every application you have.

 


A wonderful feature of PackageCloud is that a single repository can host many package types simultaneously. So you don't have to create a repository for every package type you want to host for your application; instead you have a single repository for your application and you upload there the deb, rpm, gem, etc. packages you build.

Uploading packages

As you use PackageCloud you realize its developers have made a great effort to guide you at every step. Once you create a new repository you are given immediate guidance on how to upload packages through the console (although you also have a nice blue button to upload them through the web interface if you like):

As a first approach we will try the web interface to upload a package, and afterwards we will test the console approach.

If you select your recently created repository on the Home page, you will enter the same screen I posted before. To use the web interface to upload packages, push the "Upload a package" button. That button will open the following pop-up window:

As you can see in the last screenshot, I've pushed the "Select a package" button and selected vdist_2.1.0:amd64.deb. After that you have to select your target distribution in the given combo box. I'd suggest selecting a generally compatible distribution to widen your audience. For example, I develop on a Linux Mint box, and although Linux Mint is present in the combo box I prefer to select Ubuntu Focal as a wider equivalent (being aware that my Linux Mint Uma is based on Ubuntu Focal). After selecting the package and target distribution, the "Upload" button will be enabled. The upload will start when you push that button. You will be informed that the upload has ended with a green boxed "Upload Successful!".

If you prefer the console you can use the PackageCloud command line application. That application is developed in Ruby, so you have to be sure you have Ruby installed:

dante@Camelot:~/$ sudo apt install ruby ruby-dev g++
[sudo] password for dante:
[...]

dante@Camelot:~$

The Ruby package gives you the "gem" command to install the PackageCloud application:

dante@Camelot:~/$ sudo gem install package_cloud
Building native extensions. This could take a while...
Successfully installed unf_ext-0.0.8
Successfully installed unf-0.1.4
Successfully installed domain_name-0.5.20190701
Successfully installed http-cookie-1.0.4
Successfully installed mime-types-data-3.2021.0901
Successfully installed mime-types-3.3.1
Successfully installed netrc-0.11.0
Successfully installed rest-client-2.1.0
Successfully installed json_pure-1.8.1
Building native extensions. This could take a while...
Successfully installed rainbow-2.2.2
Successfully installed package_cloud-0.3.08
Parsing documentation for unf_ext-0.0.8
Installing ri documentation for unf_ext-0.0.8
Parsing documentation for unf-0.1.4
Installing ri documentation for unf-0.1.4
Parsing documentation for domain_name-0.5.20190701
Installing ri documentation for domain_name-0.5.20190701
Parsing documentation for http-cookie-1.0.4
Installing ri documentation for http-cookie-1.0.4
Parsing documentation for mime-types-data-3.2021.0901
Installing ri documentation for mime-types-data-3.2021.0901
Parsing documentation for mime-types-3.3.1
Installing ri documentation for mime-types-3.3.1
Parsing documentation for netrc-0.11.0
Installing ri documentation for netrc-0.11.0
Parsing documentation for rest-client-2.1.0
Installing ri documentation for rest-client-2.1.0
Parsing documentation for json_pure-1.8.1
Installing ri documentation for json_pure-1.8.1
Parsing documentation for rainbow-2.2.2
Installing ri documentation for rainbow-2.2.2
Parsing documentation for package_cloud-0.3.08
Installing ri documentation for package_cloud-0.3.08
Done installing documentation for unf_ext, unf, domain_name, http-cookie, mime-types-data, mime-types, netrc, rest-client, json_pure, rainbow, package_cloud after 8 seconds
11 gems installed


dante@Camelot:~$

The package_cloud command lets you do many things, even create repositories from the console, but let's focus on package uploading. To upload a package to an existing repository just use the "push" verb:

dante@Camelot:~/$ package_cloud push dante-signal31/vdist/ubuntu/focal vdist_2.2.0post1_amd64.deb 
Email:
dante.signal31@gmail.com
Password:

/var/lib/gems/2.7.0/gems/json_pure-1.8.1/lib/json/common.rb:155: warning: Using the last argument as keyword parameters is deprecated
Got your token. Writing a config file to /home/dante/.packagecloud... success!
/var/lib/gems/2.7.0/gems/json_pure-1.8.1/lib/json/common.rb:155: warning: Using the last argument as keyword parameters is deprecated
Looking for repository at dante-signal31/vdist... /var/lib/gems/2.7.0/gems/json_pure-1.8.1/lib/json/common.rb:155: warning: Using the last argument as keyword parameters is deprecated
success!
Pushing vdist_2.2.0post1_amd64.deb... success!

dante@Camelot:~$

The URL you pass to "package_cloud push" is always of the form <username>/<repository>/<distribution>/<distribution version>.

After that you can see the new package registered at web interface:

Installing packages

OK, so far you know everything you need from the publisher side, but now you need to learn what your users have to do to install your packages.

It couldn't be simpler. Every repository has a button in its packages section called "Quick install instructions for:". Push that button and you will get a pop-up window like this:


Just copy the given command text (or use the copy button) and have your users paste that command in their consoles (i.e. include that command in your installation instructions documentation):

dante@Camelot:~/$ curl -s https://packagecloud.io/install/repositories/dante-signal31/vdist/script.deb.sh | sudo bash
[sudo] password for dante:
Detected operating system as LinuxMint/uma.
Checking for curl...
Detected curl...
Checking for gpg...
Detected gpg...
Running apt-get update... done.
Installing apt-transport-https... done.
Installing /etc/apt/sources.list.d/dante-signal31_vdist.list...done.
Importing packagecloud gpg key... done.
Running apt-get update... done.

The repository is setup! You can now install packages.

dante@Camelot:~$

That command registers your PackageCloud repository as one of the system's authorised package sources. In theory, your user could now do a "sudo apt update" and get your package listed, but here comes the only gotcha of this process. Remember when I said that I develop on Linux Mint but set the repository to Ubuntu/focal? The point is that the last command detected my system and set my source as if it were for Linux Mint:

dante@Camelot:~/$ cat /etc/apt/sources.list.d/dante-signal31_vdist.list 
# this file was generated by packagecloud.io for
# the repository at https://packagecloud.io/dante-signal31/vdist

deb https://packagecloud.io/dante-signal31/vdist/linuxmint/ uma main
deb-src https://packagecloud.io/dante-signal31/vdist/linuxmint/ uma main


dante@Camelot:~$

The incorrect part is the linuxmint/uma path. If you insist on updating apt at this point you get the following error:

dante@Camelot:~/$ sudo apt update
Hit:1 http://archive.canonical.com/ubuntu focal InRelease
Hit:2 https://download.docker.com/linux/ubuntu focal InRelease
Hit:3 http://archive.ubuntu.com/ubuntu focal InRelease
Get:4 http://archive.ubuntu.com/ubuntu focal-updates InRelease [114 kB]
Ign:5 http://packages.linuxmint.com uma InRelease
Get:6 http://security.ubuntu.com/ubuntu focal-security InRelease [114 kB]
Get:7 http://archive.ubuntu.com/ubuntu focal-backports InRelease [101 kB]
Hit:8 http://packages.linuxmint.com uma Release
Ign:11 https://packagecloud.io/dante-signal31/vdist/linuxmint uma InRelease
Hit:10 https://packagecloud.io/dante-signal31/cifra-rust/ubuntu focal InRelease
Err:12 https://packagecloud.io/dante-signal31/vdist/linuxmint uma Release
404 Not Found [IP: 52.52.239.191 443]
Reading package lists... Done
E: The repository 'https://packagecloud.io/dante-signal31/vdist/linuxmint uma Release' does not have a Release file.
N: Updating from such a repository can't be done securely, and is therefore disabled by default.
N: See apt-secure(8) manpage for repository creation and user configuration details.


dante@Camelot:~$

So, in my case I have to correct it manually to leave it this way:

dante@Camelot:~/$ cat /etc/apt/sources.list.d/dante-signal31_vdist.list 
# this file was generated by packagecloud.io for
# the repository at https://packagecloud.io/dante-signal31/vdist

deb https://packagecloud.io/dante-signal31/vdist/ubuntu/ focal main
deb-src https://packagecloud.io/dante-signal31/vdist/ubuntu/ focal main

dante@Camelot:~$

Now "apt update" will work and you'll find our package:

dante@Camelot:~/$ sudo apt update
Hit:1 https://download.docker.com/linux/ubuntu focal InRelease
Hit:2 http://archive.ubuntu.com/ubuntu focal InRelease
Hit:3 http://archive.canonical.com/ubuntu focal InRelease
Get:4 http://archive.ubuntu.com/ubuntu focal-updates InRelease [114 kB]
Ign:5 http://packages.linuxmint.com uma InRelease
Get:6 http://security.ubuntu.com/ubuntu focal-security InRelease [114 kB]
Get:7 http://archive.ubuntu.com/ubuntu focal-backports InRelease [101 kB]
Hit:8 http://packages.linuxmint.com uma Release
Hit:10 https://packagecloud.io/dante-signal31/cifra-rust/ubuntu focal InRelease
Get:11 https://packagecloud.io/dante-signal31/vdist/ubuntu focal InRelease [24,4 kB]
Get:12 https://packagecloud.io/dante-signal31/vdist/ubuntu focal/main amd64 Packages [963 B]
Fetched 353 kB in 3s (109 kB/s)
Reading package lists... Done
Building dependency tree
Reading state information... Done
All packages are up to date.
dante@Camelot:~/Downloads$ sudo apt show vdist
Package: vdist
Version: 2.2.0post1
Priority: extra
Section: net
Maintainer: dante.signal31@gmail.com
Installed-Size: 214 MB
Depends: libssl1.0.0, docker-ce
Homepage: https://github.com/dante-signal31/vdist
Download-Size: 59,9 MB
APT-Sources: https://packagecloud.io/dante-signal31/vdist/ubuntu focal/main amd64 Packages
Description: vdist (Virtualenv Distribute) is a tool that lets you build OS packages from your Python applications, while aiming to build an isolated environment for your Python project by utilizing virtualenv. This means that your application will not depend on OS provided packages of Python modules, including their versions.

N: There is 1 additional record. Please use the '-a' switch to see it

dante@Camelot:~$

Obviously, if your user's system matches your repository target there is nothing to fix, but chances are some of your users run derivative distributions, so they'll need to apply this fix; make sure to include it in your documentation.
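A quick non-interactive way to apply that fix, assuming the same file path and distributions as in my example, is a sed one-liner:

dante@Camelot:~/$ sudo sed -i 's|/linuxmint/ uma|/ubuntu/ focal|g' /etc/apt/sources.list.d/dante-signal31_vdist.list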

At this point, your users will be able to install your package like any other:

dante@Camelot:~/$ sudo apt install vdist
[...]

dante@Camelot:~$

Conclusion

PackageCloud makes deploying your packages extremely easy. Compared with Bintray or Artifactory, its setup is a charm. I have to check how things go in the long term, but at first glance it seems a promising service.


How to package Rust applications - DEB packages


The Rust language itself is harsh. Your first contact with the compiler and the borrow checker tends to be traumatic until you realize they are actually there to protect you from yourself. Once you understand that, you begin to love the language.

But everything else apart from the language is friendly, really comfortable I'd say. With cargo, compiling, testing, documenting, profiling and even publishing to crates.io (the PyPI of Rust) is a charm. Packaging is no exception, as it is integrated with cargo given some configuration we're going to explain here.

To package my Rust applications into debian packages, I use cargo-deb. To install it just type:

dante@Camelot:~/Projects/cifra-rust/$ cargo install cargo-deb
Updating crates.io index
Downloaded cargo-deb v1.32.0
Downloaded 1 crate (63.2 KB) in 0.36s
Installing cargo-deb v1.32.0
Downloaded crc v1.8.1
Downloaded build_const v0.2.2
[...]
Compiling crossbeam-deque v0.8.1
Compiling xz2 v0.1.6
Compiling toml v0.5.8
Compiling cargo_toml v0.10.1
Finished release [optimized] target(s) in 53.21s
Installing /home/dante/.cargo/bin/cargo-deb
Installed package `cargo-deb v1.32.0` (executable `cargo-deb`)


dante@Camelot:~$

With that done, you can start packaging simple applications. By default cargo-deb obtains basic information from your Cargo.toml file. It loads the following fields from there:

  • name
  • version
  • license
  • license-file
  • description
  • readme
  • homepage
  • repository

But it seldom happens that your application has no dependencies at all. To configure more advanced use cases, create a [package.metadata.deb] section in your Cargo.toml. In that section you can configure the following fields:

  • maintainer
  • copyright
  • changelog
  • depends
  • recommends
  • enhances
  • conflicts
  • breaks
  • replaces
  • provides
  • extended-description
  • extended-description-file
  • section
  • priority
  • assets
  • maintainer-scripts

As a working example you can read this Cargo.toml version of my application Cifra.

There you can read the general section from which cargo-deb loads its basic information:
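To give you an idea (the values below are illustrative, not copied verbatim from the real Cifra file), that general [package] section looks something like this:

[package]
name = "cifra"
version = "0.9.1"
authors = ["dante-signal31 <dante.signal31@gmail.com>"]
edition = "2018"
description = "Console command to crypt and decrypt texts using classic methods."
license = "MIT"
readme = "README.md"
homepage = "https://github.com/dante-signal31/cifra-rust"
repository = "https://github.com/dante-signal31/cifra-rust"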

 

 

Cargo has great documentation where you can find every section and tag explained.

Be aware that every file path you include in Cargo.toml is relative to the Cargo.toml file itself.

The cargo-deb specific section doesn't need to be long to get a working package:
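As an illustrative sketch (paths and values are made up; check the real Cifra Cargo.toml for the actual ones), such a section could be:

[package.metadata.deb]
maintainer = "dante.signal31@gmail.com"
copyright = "2021, dante-signal31"
section = "net"
priority = "extra"
extended-description = "Longer description shown by apt for the package."
assets = [
    # [relative source path, absolute destination path, permissions]
    ["target/release/cifra", "/usr/bin/cifra", "755"],
    ["README.md", "/usr/share/doc/cifra/README.md", "644"],
]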


 

Tags for this section are documented at cargo-deb homepage.

The section and priority tags are used to classify your application in the Debian hierarchy. Although I've set them, I think they are rather useless because official Debian repositories have higher requirements than cargo-deb can meet at the moment, so any debian package produced with cargo-deb will end up in a personal repository where the Debian application hierarchy is not present.

Actually, the most important tag is assets. That tag lets you set which files should be included in the package and where they should be placed at installation. The format of that tag's contents is straightforward: it is a list of tuples of three elements:

  • Relative path to the file to be included in the package: that path is relative to the Cargo.toml location.
  • Absolute path where that file should be placed on the user's computer.
  • Permissions for that file on the user's computer.

I could have included a "depends" tag to add my package dependencies. Cifra depends on SQLite3, and that is not a Rust crate but a system package, so it is a dependency of the Cifra debian package. If you want to use the "depends" tag you must use the debian dependency format, but actually it is not necessary, because cargo-deb can calculate your dependencies automatically if you don't use the "depends" tag. It does that by running ldd against your compiled artifact and searching with dpkg which system packages provide the libraries detected by ldd.

Once you have your cargo-deb configuration in your Cargo.toml, building your debian package is as simple as:

dante@Camelot:~/Projects/cifra-rust/$ cargo deb
[...]
Finished release [optimized] target(s) in 0.17s
/home/dante/Projects/cifra-rust/target/debian/cifra_0.9.1_amd64.deb

dante@Camelot:~$

As you can see in the output, you will find your generated package in a new folder inside your project called target/debian/.

Cargo-deb is a wonderful tool whose only downside is not being able to meet the Debian packaging policy needed to build packages suitable for inclusion in official repositories.

25 September 2021

How to create Python executables with PyOxidizer


Python is a great development language. But it lacks proper distribution and packaging support if you want end users to get your application. PyPI and wheel packages are intended more for developers installing their own apps' dependencies; an end user will find it painful to use pip and virtualenv to install and run a python application.

There are some projects trying to solve that problem, such as PyInstaller, py2exe or cx_Freeze. I maintain vdist, which is closely related to this problem and tries to solve it by creating debian, rpm and archlinux packages from python applications. In this article I'm going to analyze PyOxidizer. This tool is written in a language I'm really loving (Rust), and follows an approach somewhat similar to vdist's, as it bundles your application along with a python distribution, but it additionally compiles the entire bundle into an executable binary.

To structure this tutorial, I'm going to build and compile my Cifra project. You can clone it at this point to follow this tutorial step by step.

The first thing to be aware of is that PyOxidizer bundles your application with a customized Python 3.8 or 3.9 distribution, so your app should be compatible with one of those. You will need a C compiler/toolchain to build with PyOxidizer; if you don't have one, PyOxidizer outputs instructions to install one. PyOxidizer uses the Rust toolchain too, but it downloads it in the background for you, so it's not a dependency you should worry about.

PyOxidizer installation

There are several ways to install PyOxidizer (downloading a compiled release from its GitHub page, compiling from source), but I find it more straightforward and cleaner to install it into your project's virtualenv using pip.

Suppose you have the Cifra project cloned (in my case at ~/Projects/cifra), you created a virtualenv in a venv folder inside it, and you activated that virtualenv; then you can install PyOxidizer inside that virtualenv by doing:

(venv)dante@Camelot:~/Projects/cifra$ pip install pyoxidizer
Collecting pyoxidizer
Downloading pyoxidizer-0.17.0-py3-none-manylinux2010_x86_64.whl (9.9 MB)
|████████████████████████████████| 9.9 MB 11.5 MB/s
Installing collected packages: pyoxidizer
Successfully installed pyoxidizer-0.17.0

(venv)dante@Camelot:~/Projects/cifra$

You can check that you have pyoxidizer properly installed by doing:

(venv)dante@Camelot:~/Projects/cifra$ pyoxidizer --help
PyOxidizer 0.17.0
Gregory Szorc <gregory.szorc@gmail.com>
Build and distribute Python applications

USAGE:
pyoxidizer [FLAGS] [SUBCOMMAND]

FLAGS:
-h, --help
Prints help information

--system-rust
Use a system install of Rust instead of a self-managed Rust installation

-V, --version
Prints version information

--verbose
Enable verbose output


SUBCOMMANDS:
add Add PyOxidizer to an existing Rust project. (EXPERIMENTAL)
analyze Analyze a built binary
build Build a PyOxidizer enabled project
cache-clear Clear PyOxidizer's user-specific cache
find-resources Find resources in a file or directory
help Prints this message or the help of the given subcommand(s)
init-config-file Create a new PyOxidizer configuration file.
init-rust-project Create a new Rust project embedding a Python interpreter
list-targets List targets available to resolve in a configuration file
python-distribution-extract Extract a Python distribution archive to a directory
python-distribution-info Show information about a Python distribution archive
python-distribution-licenses Show licenses for a given Python distribution
run Run a target in a PyOxidizer configuration file
run-build-script Run functionality that a build script would perform


(venv)dante@Camelot:~/Projects/cifra$


PyOxidizer configuration

Now create an initial PyOxidizer configuration file at your project root folder:

(venv)dante@Camelot:~/Projects/cifra$ cd ..
(venv) dante@Camelot:~/Projects$ pyoxidizer init-config-file cifra
writing cifra/pyoxidizer.bzl

A new PyOxidizer configuration file has been created.
This configuration file can be used by various `pyoxidizer`
commands

For example, to build and run the default Python application:

$ cd cifra
$ pyoxidizer run

The default configuration is to invoke a Python REPL. You can
edit the configuration file to change behavior.

(venv)dante@Camelot:~/Projects$

The generated pyoxidizer.bzl is written in Starlark, a python dialect for configuration files, so we'll feel at home there. Nevertheless, that configuration file is rather long, and although it is fully commented, at first it is not clear how to align all the moving parts. PyOxidizer's documentation is pretty complete too, but it can be rather overwhelming for anyone who only wants to get started. Filtering through the entire PyOxidizer documentation site, I've found this section the most helpful for customizing the PyOxidizer configuration file.

There are many approaches to building binaries using PyOxidizer. Let's assume you have a setuptools setup.py file for your project; then we can ask PyOxidizer to run that setup.py inside its customized python. To do that, add the following line before the return line of the make_exe() function in pyoxidizer.bzl:

# Run cifra's own setup.py and include installed files in binary bundle.
exe.add_python_resources(exe.setup_py_install(package_path=CWD))

In the package_path parameter you have to provide your setup.py path. I've provided a CWD argument because I'm assuming that pyoxidizer.bzl and setup.py are in the same folder.

The next thing to customize in pyoxidizer.bzl is that it defaults to building a Windows MSI installer. On Linux we want an ELF executable as output. So, first comment out the line near the end that registers a target to build an MSI installer:

#register_target("msi_installer", make_msi, depends=["exe"])

We also need an entry point for our application. It would be nice if PyOxidizer took the entry_points parameter configuration from setup.py, but it doesn't. Instead we have to provide it manually through the pyoxidizer.bzl configuration file. In our example, just find the line in the make_exe() function where the python_config variable is created and place this after it:

python_config.run_command = "from cifra.cifra_launcher import main; main()"


Building executable binaries

At this point, you can run "pyoxidizer build" and pyoxidizer will begin to bundle your application.

But Cifra has a very specific problem at this point. If you try to run a build over cifra with the configuration so far, you will get the following error:

(venv)dante@Camelot:~/Projects/cifra$ pyoxidizer build
[...]
error[PYOXIDIZER_PYTHON_EXECUTABLE]: adding PythonExtensionModule<name=sqlalchemy.cimmutabledict>

Caused by:
extension module sqlalchemy.cimmutabledict cannot be loaded from memory but memory loading required
--> ./pyoxidizer.bzl:272:5
|
272 | exe.add_python_resources(exe.setup_py_install(package_path=CWD))
| ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ add_python_resources()


error: adding PythonExtensionModule<name=sqlalchemy.cimmutabledict>

Caused by:
extension module sqlalchemy.cimmutabledict cannot be loaded from memory but memory loading required

(venv)dante@Camelot:~/Projects$

PyOxidizer tries to embed every dependency of your application inside the produced binary (in-memory mode). This is nice because you end up with a single distributable binary file, and loading those dependencies is faster. The problem is that not every package out there can be embedded that way. Here SQLAlchemy fails to be embedded.

In that case, those dependencies should be stored next to the produced binary (filesystem-relative mode). PyOxidizer will reference those dependencies inside the binary using their locations relative to it, so when distributing we must pack the produced binary and its dependency files keeping their relative locations.

One way to deal with this problem is asking PyOxidizer to keep things in-memory whenever it can and fall back to filesystem-relative when it can't. To do that, set the following lines in the make_exe() function of the configuration file:

# Use in-memory location for adding resources by default.
policy.resources_location = "in-memory"

# Use filesystem-relative location for adding resources by default.
# policy.resources_location = "filesystem-relative:prefix"

# Attempt to add resources relative to the built binary when
# `resources_location` fails.
policy.resources_location_fallback = "filesystem-relative:prefix"

Doing this may make things work for your application, but for Cifra things keep failing, although the build goes further:

(venv)dante@Camelot:~/Projects/cifra$ pyoxidizer build
[...]
adding extra file prefix/sqlalchemy/cresultproxy.cpython-39-x86_64-linux-gnu.so to .
installing files to /home/dante/Projects/cifra/./build/x86_64-unknown-linux-gnu/debug/install
Traceback (most recent call last):
File "<string>", line 1, in <module>
File "cifra.cifra_launcher", line 31, in <module>
File "cifra.attack.vigenere", line 21, in <module>
File "cifra.tests.test_simple_attacks", line 2, in <module>
File "pytest", line 5, in <module>
File "_pytest.assertion", line 9, in <module>
File "_pytest.assertion.rewrite", line 34, in <module>
File "_pytest.assertion.util", line 13, in <module>
File "_pytest._code", line 2, in <module>
File "_pytest._code.code", line 1223, in <module>
AttributeError: module 'pluggy' has no attribute '__file__'
error: cargo run failed

(venv)dante@Camelot:~/Projects$

This error seems somewhat related to this PyOxidizer nuance.

To deal with this problem I had to keep everything filesystem-relative:

# Use in-memory location for adding resources by default.
# policy.resources_location = "in-memory"

# Use filesystem-relative location for adding resources by default.
policy.resources_location = "filesystem-relative:prefix"

# Attempt to add resources relative to the built binary when
# `resources_location` fails.
# policy.resources_location_fallback = "filesystem-relative:prefix"

Now the build goes smoothly:

(venv)dante@Camelot:~/Projects/cifra$ pyoxidizer build
[...]
installing files to /home/dante/Projects/cifra/./build/x86_64-unknown-linux-gnu/debug/install
(venv)dante@Camelot:~/Projects/cifra$ ls build/x86_64-unknown-linux-gnu/debug/install/
cifra prefix
(venv)dante@Camelot:~/Projects$

As you can see, PyOxidizer generated an ELF binary for our application and stored all of its dependencies in prefix folder:

(venv)dante@Camelot:~/Projects/cifra$ ls build/x86_64-unknown-linux-gnu/debug/install/prefix
abc.py concurrent __future__.py _markupbase.py pty.py sndhdr.py tokenize.py
aifc.py config-3 genericpath.py mimetypes.py py socket.py token.py
_aix_support.py configparser.py getopt.py modulefinder.py _py_abc.py socketserver.py toml
antigravity.py contextlib.py getpass.py multiprocessing __pycache__ sqlalchemy traceback.py
argparse.py contextvars.py gettext.py netrc.py pyclbr.py sqlite3 tracemalloc.py
ast.py copy.py glob.py nntplib.py py_compile.py sre_compile.py trace.py
asynchat.py copyreg.py graphlib.py ntpath.py _pydecimal.py sre_constants.py tty.py
asyncio cProfile.py greenlet nturl2path.py pydoc_data sre_parse.py turtledemo
asyncore.py crypt.py gzip.py numbers.py pydoc.py ssl.py turtle.py
attr csv.py hashlib.py opcode.py _pyio.py statistics.py types.py
base64.py ctypes heapq.py operator.py pyparsing stat.py typing.py
bdb.py curses hmac.py optparse.py _pytest stringprep.py unittest
binhex.py dataclasses.py html os.py pytest string.py urllib
bisect.py datetime.py http _osx_support.py queue.py _strptime.py uuid.py
_bootlocale.py dbm idlelib packaging quopri.py struct.py uu.py
_bootsubprocess.py decimal.py imaplib.py pathlib.py random.py subprocess.py venv
bz2.py difflib.py imghdr.py pdb.py reprlib.py sunau.py warnings.py
calendar.py dis.py importlib __phello__ re.py symbol.py wave.py
cgi.py distutils imp.py pickle.py rlcompleter.py symtable.py weakref.py
cgitb.py _distutils_hack iniconfig pickletools.py runpy.py _sysconfigdata__linux_x86_64-linux-gnu.py _weakrefset.py
chunk.py doctest.py inspect.py pip sched.py sysconfig.py webbrowser.py
cifra email io.py pipes.py secrets.py tabnanny.py wsgiref
cmd.py encodings ipaddress.py pkg_resources selectors.py tarfile.py xdrlib.py
codecs.py ensurepip json pkgutil.py setuptools telnetlib.py xml
codeop.py enum.py keyword.py platform.py shelve.py tempfile.py xmlrpc
code.py filecmp.py lib2to3 plistlib.py shlex.py test_common zipapp.py
collections fileinput.py linecache.py pluggy shutil.py textwrap.py zipfile.py
_collections_abc.py fnmatch.py locale.py poplib.py signal.py this.py zipimport.py
colorsys.py formatter.py logging posixpath.py _sitebuiltins.py _threading_local.py zoneinfo
_compat_pickle.py fractions.py lzma.py pprint.py site.py threading.py
compileall.py ftplib.py mailbox.py profile.py smtpd.py timeit.py
_compression.py functools.py mailcap.py pstats.py smtplib.py tkinter

(venv)dante@Camelot:~/Projects$

If you want to give that folder a more self-explanatory name, just change "prefix" to whatever you want in the configuration file. For instance, to name it "lib":

# Use in-memory location for adding resources by default.
# policy.resources_location = "in-memory"

# Use filesystem-relative location for adding resources by default.
policy.resources_location = "filesystem-relative:lib"

# Attempt to add resources relative to the built binary when
# `resources_location` fails.
# policy.resources_location_fallback = "filesystem-relative:prefix"

Let's see how our dependencies folder changed:

(venv)dante@Camelot:~/Projects/cifra$ pyoxidizer build
[...]
installing files to /home/dante/Projects/cifra/./build/x86_64-unknown-linux-gnu/debug/install
(venv)dante@Camelot:~/Projects/cifra$ ls build/x86_64-unknown-linux-gnu/debug/install/
cifra lib
(venv)dante@Camelot:~/Projects$


Building for many distributions

So far, you have got a binary that can be run on any Linux distribution like the one you used to build it. The problem is that the binary can fail if it's run on other distributions:

(venv)dante@Camelot:~/Projects/cifra$ ls build/x86_64-unknown-linux-gnu/debug/install/
cifra lib
(venv) dante@Camelot:~/Projects/cifra$ docker run -d -ti -v $(pwd)/build/x86_64-unknown-linux-gnu/debug/install/:/work ubuntu
36ad7e78c00f90172bd664f9a045f14acc2138b2ee5b8ac38e7158aa145d4965
(venv) dante@Camelot:~/Projects/cifra$ docker attach 36ad
root@36ad7e78c00f:/# ls /work
cifra lib
root@36ad7e78c00f:/# /work/cifra --help
usage: cifra [-h] {dictionary,cipher,decipher,attack} ...

Console command to crypt and decrypt texts using classic methods. It also performs crypto attacks against those methods.

positional arguments:
{dictionary,cipher,decipher,attack}
Available modes
dictionary Manage dictionaries to perform crypto attacks.
cipher Cipher a text using a key.
decipher Decipher a text using a key.
attack Attack a ciphered text to get its plain text

optional arguments:
-h, --help show this help message and exit

Follow cifra development at: <https://github.com/dante-signal31/cifra>
root@36ad7e78c00f:/# exit
exit

(venv)dante@Camelot:~/Projects$

Here our built binary runs in another Ubuntu container because my personal box (Camelot) is an Ubuntu derivative (actually Linux Mint). Our generated binary will run fine on other machines with the same distribution as the one I used to build it.

But let's see what happens if we run our binary in a different distribution:

(venv)dante@Camelot:~/Projects/cifra$ docker run -d -ti -v $(pwd)/build/x86_64-unknown-linux-gnu/debug/install/:/work centos
Unable to find image 'centos:latest' locally
latest: Pulling from library/centos
a1d0c7532777: Pull complete
Digest: sha256:a27fd8080b517143cbbbab9dfb7c8571c40d67d534bbdee55bd6c473f432b177
Status: Downloaded newer image for centos:latest
376c6f0d085d9947453a2786a9a712146c471dfbeeb4c452280bd028a5f5126f
(venv) dante@Camelot:~/Projects/cifra$ docker attach 376
[root@376c6f0d085d /]# ls /work
cifra lib
[root@376c6f0d085d /]# /work/cifra --help
/work/cifra: /lib64/libm.so.6: version `GLIBC_2.29' not found (required by /work/cifra)
[root@376c6f0d085d /]# exit
exit

(venv)dante@Camelot:~/Projects$

As you can see, our binary fails on CentOS. That happens because libraries are not named the same across distributions, so our compiled binary fails to load the shared dependencies it needs to run.

To solve this you need to build a fully statically linked binary, so it has no external dependencies at all. 

To build that kind of binary we need to update the Rust toolchain to build for that kind of target:

(venv)dante@Camelot:~/Projects/cifra$ rustup target add x86_64-unknown-linux-musl
info: downloading component 'rust-std' for 'x86_64-unknown-linux-musl'
info: installing component 'rust-std' for 'x86_64-unknown-linux-musl'
30.4 MiB / 30.4 MiB (100 %) 13.9 MiB/s in 2s ETA: 0s

(venv)dante@Camelot:~/Projects$

To make PyOxidizer build a binary like that you are supposed to do:

(venv)dante@Camelot:~/Projects/cifra$ pyoxidizer build --target-triple x86_64-unknown-linux-musl
[...]
Processing greenlet-1.1.1.tar.gz
Writing /tmp/easy_install-futc9qp4/greenlet-1.1.1/setup.cfg
Running greenlet-1.1.1/setup.py -q bdist_egg --dist-dir /tmp/easy_install-futc9qp4/greenlet-1.1.1/egg-dist-tmp-zga041e7
no previously-included directories found matching 'docs/_build'
warning: no files found matching '*.py' under directory 'appveyor'
warning: no previously-included files matching '*.pyc' found anywhere in distribution
warning: no previously-included files matching '*.pyd' found anywhere in distribution
warning: no previously-included files matching '*.so' found anywhere in distribution
warning: no previously-included files matching '.coverage' found anywhere in distribution
error: Setup script exited with error: command 'musl-clang' failed: No such file or directory
thread 'main' panicked at 'called `Result::unwrap()` on an `Err` value: Custom { kind: Other, error: "command [\"/home/dante/.cache/pyoxidizer/python_distributions/python.70974f0c6874/python/install/bin/python3.9\", \"setup.py\", \"install\", \"--prefix\", \"/tmp/pyoxidizer-setup-py-installvKsIUS/install\", \"--no-compile\"] exited with code 1" }', pyoxidizer/src/py_packaging/packaging_tool.rs:336:38
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace

(venv)dante@Camelot:~/Projects$

As you can see, I get an error that I've been unable to solve. Because of that, I've filed an issue at the PyOxidizer GitHub page. The PyOxidizer author kindly answered my issue pointing out that I needed the musl-clang command on my system. I've not found any package providing musl-clang, so I guess it's something you have to compile from source. I've tried to google it but I haven't found a clear way to get that command up and running (I have to admit I'm not a C/C++ ninja). So, I guess I'll have to use pyoxidizer without static compiling until the musl-clang dependency disappears or the pyoxidizer documentation explains how to get that command.


Conclusion

Although I haven't got it to create statically linked binaries, PyOxidizer feels like a promising tool. I think I can use it in vdist to speed up the packaging process. As vdist uses specific containers for every type of package, the inability to create statically linked binaries doesn't seem a blocker. It seems actively developed and has a lot of contributors, so it has a good chance of helping simplify the packaging and deployment of python applications.

10 August 2021

How to use JFrog Artifactory to distribute your packages


In a previous article I reviewed some ways to generate debian or rpm packages for your applications. But after packaging your applications you must find an easy way to let your users get your packages. You can always leave them in the releases section at GitHub for users to download, but the downside of that is that any update will be cumbersome to announce and install. It's far better to use a package repository so users can install your package using a package manager like apt or yum.

You can try to get your package into an official distro repository, but chances are you won't fulfill their requirements, so a personal repository is your most likely way to go.

For a long time I used Bintray to host my deb and rpm packages, but Bintray ended its service as of March 2021. I've had to look for an alternative. Finally I found JFrog Artifactory, the official heir of Bintray.

JFrog Artifactory has a free tier for open source projects. If your project is not popular enough to exceed 50 GB of monthly downloads, Artifactory should be more than enough for your personal projects.

The only downside is that Artifactory is a more complex (and complete) service than Bintray, so it's harder to get running if you're just a hobbyist. I'm going to explain what I've learnt so far, so you have an easier time with Artifactory than I did.

Once registered on the platform, you enter the quick setup menu:

There, you can create a repository for any of the supported package types. For this article I'm going for Debian. Click on the Debian icon and select creating a new repository.

In the next window you are asked to give a name (a prefix) for the repository:

 


In the screenshot I'm calling my repository vdist. Artifactory creates a virtual, a remote and a local repository. As far as I know the only one useful at this point is the local one, so when following this article be sure to always select the "debian-local" option.

The next window is deceptively simple, as it can make you think you are ready to upload packages following the instructions given in the Deploy and Resolve tabs:

 
The problem is that you need to configure some things before your repository is fully operational, as I've learnt the hard way.

First, you need to allow anonymous access to your repository so people can download your packages. What is most confusing here is that anonymous access appears to be configured (you can see that permission in Administration > Identity and Access > Permissions) but apparently it does not work at all, so when you try to access your repository using apt you only get an unauthorized error. The gotcha here is that you first need to enable anonymous access globally at Administration > Security > Settings:

 


 

Only after checking that option will you stop getting the unauthorized error.

To configure client Linux boxes to use your repository, just include this line in your /etc/apt/sources.list file:

 deb https://dlabninja.jfrog.io/artifactory/<REPOSITORY_NAME>-debian-local <DISTRIBUTION> <COMPONENT>

In my example REPOSITORY_NAME is vdist, DISTRIBUTION is the name of the distribution you are targeting (for example, in Ubuntu, it could be trusty) and for COMPONENT I use main. By the way, dlabninja is the tenant name I gave myself when I registered at Artifactory; yours is going to be different.
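Filling those placeholders with the example values above, the line would look like this (trusty is just the illustrative distribution mentioned before):

 deb https://dlabninja.jfrog.io/artifactory/vdist-debian-local trusty main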

You may think you're ready to start uploading packages, but I'm afraid you're not yet. If you try to use apt to access your repository at this point, you're going to get an error saying your repository is not signed, so access to it is forbidden. To fix that you must create a GPG key pair to sign your packages and upload it to Artifactory.

To create a GPG key you can type at your console:
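The original screenshot isn't available here; a standard way to do it (assuming gpg is installed) is:

dante@Camelot:~/$ gpg --full-generate-key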



Type an identification name, an email and a password when asked. For the name I use the repository's. Take note of the password you used: if you forget it there is no way to recover it. The big string beginning with "F4F316" and ending with "010E55" is my key id; yours will be similar. That string will be useful to identify your key in gpg commands.

You can list your keys:
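The usual commands for that are:

dante@Camelot:~/$ gpg --list-keys
dante@Camelot:~/$ gpg --list-secret-keys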


To upload your generated key, you first need to export it to files. The export needs to be split into two files: first you export your public key and afterwards your private key:
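With gpg that export could be done along these lines (use your own key id instead of <KEY_ID>; the file names are just explanatory, not the exact ones from the original):

dante@Camelot:~/$ gpg --armor --export <KEY_ID> > vdist.public.key
dante@Camelot:~/$ gpg --armor --export-secret-keys <KEY_ID> > vdist.private.key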

 

With the first command I exported the vdist public key and with the second the private one. Note I've given an explanatory extension to the exported files. This is a good moment to store these keys in a safe place.

To upload those files to Artifactory you need to go to Administration > Artifactory > Security > Keys management:

 

There, select "+ Add keys" at "Signing keys" tab. In the opening window enter the name for this signing key (in this case "vdist"), drag and drop over it exported key files and enter private key password. When done you'll have your key properly imported in Artifactory and ready to be used.

To associate imported GPG keys with a repository go to Administration > Repositories, select your repository and the "Advanced" tab. There you have a "Primary key name" combo where you can select your key. Don't forget to click "Save & Finish" before leaving or any changes will be lost:

 

Once done, you won't get an unsigned repository error with apt, but you'll still get an error:


The package manager complains because, although the repository is GPG signed, it does not recognize its public key. To solve it we must upload our public key to one of the free PGP keyservers so our users can download and import it. For this matter I send my public key to keyserver.ubuntu.com:
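The command isn't reproduced in the original, but sending a key to that keyserver is typically done like this (again, <KEY_ID> is your own key id):

dante@Camelot:~/$ gpg --keyserver keyserver.ubuntu.com --send-keys <KEY_ID>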


Once a public keyserver has a public key, it syncs with the others to share it. Our users must import that public key and tell their package manager that it is trusted. To do that they must use sudo:
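The original command isn't shown; a common way to do it at the time (apt-key has since been deprecated) was:

dante@Camelot:~/$ gpg --keyserver keyserver.ubuntu.com --recv-keys <KEY_ID>
dante@Camelot:~/$ gpg --armor --export <KEY_ID> | sudo apt-key add -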

Obviously we must do the same thing if we want to try installing our own package.

After that, sudo apt update will work smoothly:


Finally we are really ready to upload our first package to our repository. You have two ways to do it: manually and programmatically.

You can upload packages manually through the Artifactory web interface by going to Artifactory > Artifacts > selecting your repository (in my example vdist-debian-local) > Deploy (button at the upper right corner). It opens a pop-up window where you can drag and drop your package. Make sure the "Target repository" field is properly set to your repository (it is easy to send your package to the wrong one).

Besides, Artifactory lets you upload packages from the command line, which makes it perfect for doing it programmatically in a continuous integration workflow. You can see the needed command in Artifactory > Artifacts > Set me up (button at the upper right corner). It opens a pop-up window with a tab called "Deploy" where you can find the commands needed to deploy packages to a given repository:

As you can see, the commands have placeholders for many fields. If you are not sure what to place in the USERNAME and PASSWORD fields, go to the Configure tab, type your administrative password there and return to the Deploy tab to see how those fields have been completed for you.
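As a hedged sketch, the deploy command for Debian repositories usually follows this pattern (the placeholders and the distribution/component values are assumptions; check your own "Set me up" window for the exact command):

dante@Camelot:~/$ curl -u "<USERNAME>:<PASSWORD>" -X PUT \
  "https://dlabninja.jfrog.io/artifactory/vdist-debian-local/<PACKAGE_FILE>.deb;deb.distribution=trusty;deb.component=main;deb.architecture=amd64" \
  -T <PACKAGE_FILE>.deb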