Showing posts with label rust. Show all posts
Showing posts with label rust. Show all posts

20 January 2022

How to parse console arguments in your Rust application with Clap

In a previous article I explained how to use ArgParse module to read console arguments in  your Python applications. Rust has it's own crates to parse consoles arguments too. In this article I'm going to explain how to use clap crate for that.

One wonderful thing about Rust is that inside its hard rustacean shell it uses to have a pythonista heart. Using clap you'll find many of ArgParse features. Actually concept are quite similar: while you called verbs as subparsers in ArgParser, here in clap you're going to call them subcommands, and ArgParse arguments are simply named at clap like arg. So if you're used to ArgParser you'll likely feel at home using clap. I'm going to assume you read my ArgParse article to not to repeat myself explaining the same concepts. 

Like any other Rust crate you need to include clap in your Cargo.toml file:

After that you can use clap in your source code. To illustrate explanations, I'm going to use as an example the command parsing I use in my project cifra-rust

As you can see there, you can use clap directly in you main() function but I like to abstract it in a generic function that returns my own defined Configuration struct type. That way if I switch from clap to any other parser crate change will be smoother for my app (reduce coupling). So, as I do at python I define a parse_arguments() function that returns a Configuration type.

There you can see that root parser is defined at clap using App::new(). As other rust crates, clap makes heavy use of builder pattern to configure its parsing. That way you can configure command version, author or long description ("long_about"), between other options, as you can see from lines 277 to 280:

Clap behaviour can be customized with setting() call. A typical parameter for that call is AppSettings::ArgRequiredElseHelp, to show help if command is called with no arguments:

A subparser is created calling subcommand() and passing it a new App instance:

With call about() you can define a short description, about the command or arguments, that will appear when --help is called.

Usually parser (and subparsers) will need arguments. Arguments are defined calling arg() in the parser they belong to. Here you have an example:

In last example you can see that subparser create (line 285) has two arguments: dictionary_name (line 287) and initial_words_file (line 292). Note that every arg() call uses as a parameter and Arg instance. Argument configuration is done using builder pattern over Arg instances.

Argument dictionary_name is required because it is configured as required(true) at line 288. Be aware that although you can use a flag here you are discouraged to do so. By definition, all required arguments should be positional (i.e. require no flag), the only time a flag could be used in required arguments is when called operation is destructive and user is required to prove he knows what he is doing providing that extra flag. When a positional argument is used you may call index() to specify the position of this argument relative to other positional arguments. Actually, I've found out later that you can leave off index() and in that case index will be assigned in order of evaluation. What index() allows you is setting indexes out of order.

When you call takes_value(true) on an argument, the value provided by user is stored in a key called like the argument. For instance, the takes_value(true) at line 290 makes the provided value to be stored in a key called dictionary_name. If your are using an optional flag with no value (i.e. a boolean flag) you could call takes_value(false) or just omit it.

The call to value_name() is equivalent to metavar parameter at python's argparse. It lets you define the string you want to be used to represent this parameter when help is called with --help argument.

Oddly, arguments don't use about() to define their help strings but help() instead.

You can find an optional argument definition from line 292 to 298. There, long version of flag is defined with long() and short one with short() call. In this example, this argument could be called both "--initial_words_file <PATH_TO_FILE_WITH_WORDS>" and "-i <PATH_TO_FILE_WITH_WORDS>".

It can be useful call validator(), as it let you define a function to be called over provided argument to assert it is what you expect to receive. Provided function should receive an string parameter with provided argument and it should return a Result<(), String>. If you make your checks and find correct the argument the function should return an Ok(()), or an Err("Here you write your error message") instead.

Chaining methods on nested arguments and subcommands you can define an entire command tree. Once you finish you have to make one final call to get_matches_from() to the root parser. You pass a vector of strings to that method, with every individual command argument:

Usually you'll pass a vector of strings from args() which returns a vector with every command argument given by user when application was called from console.

Note that my main() function is almost empty. That is because that way I can call _main() (note the underscore) from my integration tests, entering my own argument vector, to simulate a user calling the application from console.

What get_matches_from() returns is a ArgMatches type. That type retuns every argument value if be search by its key name. From line 149 to 245 we implement a method to create a Configuration type using an ArgMatches contents.

Be aware that when you only have a parser you can get values directly using value_of() method. But if you have subcommands you have first to get the specific ArgMatches for that subcommand branch doing a call to subcommand_matches():

In last example you can see the usual workflow:
  • You go deep getting the ArgMatches of the branch you are interested in. (lines 160-161)
  • Once you have the ArgMatches you want, you use value_of() to get an specific argument value. (line 164)
  • For optionals parameters you can make a is_present() call to check if that one was provided or not. (line 165). 
Using those methods, you can retrieve every provided value and build up your configuration to run your application.

As you can see you can make really powerful command parser with clap getting every functionality you are used at ArgParse but in a rustacean environment.

07 November 2021

How to use GitHub Actions for continuous integration and deployment

In a previous article I explained how to use Travis CI to get continuous integration in your project. Problem is that Travis has changed its usage terms in the last year and now its not so comfortable for open source projects. The keep saying they are still free for open source project but actually you have to beg for free credits every time you expend them and they make you to prove you still comply with what they define as an open source project (I've heard about cases where developers where discarded for free credits just because they had GitHub sponsors).

So, although I've kept my vdist project at Travis (at the moment) I've been searching alternatives for my projects. As I use GitHub for my open source repositories its natural to try its continuous integration and deployment framework: GitHub Actions.

Conceptually, using GitHub Actions is pretty much the same as Travis CI, so I'm not going to repeat myself. If you want to learn why to use a continuous integration and deployment framework you can read the article I linked at this one start.

So, lets focus at the point of how to use GitHub Actions.

As Travis CI what you do with GitHub Actions is centered in yaml files you place at .github/workflows folder in your repository. GitHub will look at that folder to execute CI/CD workflows. In that folder you can have many workflows to be executed depending on different GitHub events (pushes over branches, pull requests, issues filling and a long etcetera). 

But I find GitHub Actions way better than Travis CI. GitHub promotes massive reusing. Every single step can be encapsulated and shared with your other workflows and with other people at GitHub. With so many people developing and sharing at GitHub you'll find yourself reusing others tasks (a.k.a actions) than implementing by yourself. Unless your are automatizing something really weird, chances are that some other has implemented it and shared. In this article we'll use others actions and implement our own custom steps. Besides you can reuse your own workflows (or share with others) so if you have a working workflow for your project you can reuse it in a similar project so you don't need to reimplement its workflow from scratch.

For this article we'll focus in a typical "test -> package -> deploy" workflow which we're going to call test_and_deploy.yaml (if you feel creative you can call it as you like, but try to be expressive). If you want this article full code you can find it in this commit of my Cifra-rust project at GitHub.

To create that file you have two options: create it in your IDE and push like any other file or write it using built-in GitHub web editor. For the first time my advice is to use web editor as it guides you better to get your first yaml up and working. So go to your GitHub repo page and click over Actions tab:

There, your are prompted to create your first workflow. When you choose to create a workflow you're offered to use a predefined template (GitHub has many for different tasks and languages) or set up a workflow yourself. For the sake of this article choose "set up a workflow yourself". Then you will enter to web editor. An initial example will be already loaded to let you have and initial scaffolding.

Now lets check Cifra-rust yaml file to learn what we can do with GitHub actions.


At very first few lines (from line 1 to 12) you can see we name this workflow ("name" tag). Use an expressive name, to identify quickly what this workflow does. You will use this name to reuse this workflow from other repositories.

The "on" tag defines which events trigger this workflow. In my case, this workflow is triggered by pushes and pull requests over staging branch. There're many more events you can use.

The "workflow_dispatch" tag allows you to trigger this workflow manually from GitHub web interface. I use to set it, it doesn't harm to have that option.


Next is the "jobs" tag (line 15) and there is where the "nuts and guts" of workflow begins. A workflow is composed of jobs. Every job is runned in a separate virtual machine (the runner) so each job has its dependencies encapsulated. That encapsulation is good to avoid jobs messing dependencies and filesystems of others jobs. Try to focus every job in just one task. 

Jobs are run in parallel by default unless you explicitly set dependencies between them. If you need a job B be run after a job A is completed successfully you need to use tag "needs" to set B needs A to be completed to start. In Cifra-rust example jobs "merge_staging_and_master", "deploy_crates_io" and "generate_deb_package" need "tests" job to be successfully finished to start. You can see an example of "needs" tag usage at line 53:

As "deploy_debian_package" respectively needs "generate_deb_package" to be finished before, you end with an execution tree like this:


Every job is composed of one or multiple steps. One step is a sequence of shell commands You can run native shell commands or scripts you had included in your repository. From line 112 to 115 we have one of those steps:

There we are calling a script stored in a folder called ci_scripts from my repository. Note the pipe (" | ") next to "run" tag. Without that pipe you can include just one command in run tag but that pipe allows you to include multiple command to executed separated in different lines (like step in lines 44 to 46).

If you find yourself repeating the same commands across multiple workflows them you have a good candidate to encapsulate those commands in an action. As you can guess Actions are the key stones of GitHub Actions. An action is a set of commands packaged to be shared and reused in many workflows (yours or of others users). An action has inputs and outputs and what happens inside is not your problem while it works as intended. You can develop your own actions and share with others but it deserves an article on its own. In next article I will convert that man page generation step in a reusable action.

At right hand side, GitHub Actions web editor has a searcher to find actions suitable for the task you want to perform. Guess you want to install a Rust toolchain, you can do this search:

Although GitHub web editor is really complete and useful to find mistakes in your yaml files, its searcher lacks a way to filter or reorder its results. Nevertheless that searcher is your best option to find shared actions for your workflows.

Once you click in any searcher result you are shown a summary about which text to include in your yaml file to use that action. You can get more detailed instructions clicking in "View full Marketplace listing" link.

As any other step, and action uses a "name" tag to document which task is intended to do and an "id" if that step must be referenced from other workflow places (see an example at line 42). What makes different an action from a command step is the "uses" tag. That tag links to the action we want to use. the text to use in that tag differs for every action but you can find what to write there at searcher results instructions. In those instructions the use to describe which inputs the action accepts. Those inputs are included in the "with" tag. For instance, in lines 23 to 27 I used an action to install the Rust building framework:

As you can see, in "uses" tag you can include which version of given action to use. There are wildcards to use latest version but you'd better set an specific version to avoid your workflows get broken by actions updates.

You make a job chaining actions as steps. Every step in a job is executed sequentially. If any of then fails the entire workflow fails. 

Sharing data between steps and jobs

Although the steps of a workflow are executed in the same virtual machine they cannot share bash environment variables because every step spawns a different bash process. If you set an environment variable in a step that needs to be used in another step of the same job you have two options:

  • Quick and dirty: Append that environment variable to $GITHUB_ENV so that variable can be accessed later using env context. For instance, at line 141 we create DEB_PACKAGE environment variable:

That environment variable is accessed at line 146, in the next step:

  • Set step outputs: Problem with last method is that although you can share data between steps of the same job you cannot do it across different jobs. Setting steps outputs is a bit slower but leaves your step ready to share data not only with other steps of the same job but with any step of the workflow. To set a variable environment as an step output you need to echo that variable to ::set-output and give a name to that variable followed by its value after a double colon ("::"). You have an example at line 46:

Note that step must be identified with an "id" tags to further retrieve the shared variable. As that step is identified as "version_tag" the created "package_tag" variable can be later retrieved from another step of the same job using: 

${{ steps.version_tag.outputs.package_tag }}

Actually that method is used at line 48 to prepare that variable to be recovered from another job. Remember that method so far helps you to pass data to steps in the same job. To export data from a job to be used from another job you have to declare it first as a job output (lines 47-48):

Note that in last screenshot indentation level should be at the same level as "steps" tag to properly set package_tag as a job output.

To retrieve that output from another job, the catching job must declare giving job as needed in its "needs" tag. After that, the shared value can be retrieved using next format:

${{ needs.<needed_job>.outputs.<output_varieble_name> }} 

In our example, "deploy_debian_package" needs the value exported at line 48 so it declares its job (tests) as needed in line 131:

After that, it can get and use that value at line 157:

Passing files between jobs

Sometimes passing a variable is not enough because you need to produce files in a job to be consumed in another job.

You can share files between steps in the same job because those steps share the same virtual machine. But between jobs you need to transfer files from a virtual machine to another.

When you generate a file (an artifact) in a job, you can upload it to a temporal shared storage to allow other jobs in the same workflow get that artifact. To upload and download artifact to that temporal storage you have two predefined actions: upload-artifact and download-artifact.

In our example, "generate_deb_package" job generates a debian package that is needed by "deploy_debian_package". So, in lines 122 to 126 "generated_deb_package" uploads that package:

On the other side "deploy_debian_package" downloads saved artifact in lines 132 to 136:

Using your repository source code

By default you start every job with a clean virtual machine. To have your source code downloaded to that virtual machine you use an action called checkout. In our example it is used as first step of "tests" job (line 21) to allow code to be built and tested:

You can set any branch to be downloaded, but if you don't set anyone the one related to triggering event is used. In our example, staging branch is the used one.

Running your workflow

You have two options to trigger your workflow to try it: you can do the triggering event (in our example pushing code to staging branch) or you can launch workflow manually from GitHub web interface (assuming you set "workflow_dispatch" tag in your yml file as I advised).

To do a manual triggering, go to repository Actions tab and select the workflow you want to launch. Then you'll see "Run workflow button" at right hand side:

Once pushed that button, workflow will start and that tab list will show that workflow as active. Selecting that workflow in the list shows an execution tree like the one I showed earlier. Clicking in any of the boxes shows running logs of every step of that job. It is extremely easy read through generated logs.


I've found GitHub Actions really enjoyable. Its focus in reusability and sharing makes really easy and fast creating complex workflow. Unless you're working in a really weird workflow, chances are that most of your workflow components (if not all) are already implemented and shared as actions, so designing a complex workflow becomes an easy task of joining already available pieces. 

GitHub Actions documentation is great, and it popularity makes easy find answers online for any problem you meet.

Besides that, I've felt yml files structure coherent and logic so its easy to grasp concepts an gain a good level really quickly. 

Being free and unlimited for open source repositories I guess I'm going to migrate all my CI/CD workflows to GitHub Actions from Travis CI.

23 October 2021

How to package Rust applications - DEB packages

Rust language itself is harsh. Your first contact with compiler and borrow checker uses to be traumatic until you realize they are actually to protect you against yourself. Once you understand that you begin to love that language.

But everything else apart of the language is kind, really comfortable I'd say. With cargo, compiling, testing, documenting, profiling and even publishing to (the Pypi of Rust) is a charm. Packaging is no exception of that as it is integrated with cargo assuming some configuration we're going to explain here.

To package my Rust applications into debian packages, I use cargo-deb. To install it just type:

dante@Camelot:~~/Projects/cifra-rust/$ cargo install cargo-deb
Updating index
Downloaded cargo-deb v1.32.0
Downloaded 1 crate (63.2 KB) in 0.36s
Installing cargo-deb v1.32.0
Downloaded crc v1.8.1
Downloaded build_const v0.2.2
Compiling crossbeam-deque v0.8.1
Compiling xz2 v0.1.6
Compiling toml v0.5.8
Compiling cargo_toml v0.10.1
Finished release [optimized] target(s) in 53.21s
Installing /home/dante/.cargo/bin/cargo-deb
Installed package `cargo-deb v1.32.0` (executable `cargo-deb`)


Done that, you could start packaging simple applications. By default cargo deb obtains basic information from your Cargo.toml file. That way it loads next fields:

  • name
  • version
  • license
  • license-file
  • description
  • readme
  • homepage
  • repository

But seldom happens your application has no dependencies at all. To configure more advanced use cases, create a [package.metadata.deb] section in your Cargo.toml. In that section you can configure next fields:

  • maintainer
  • copyright
  • changelog
  • depends
  • recommends
  • enhances
  • conflicts
  • breaks
  • replaces
  • provides
  • extended-description
  • extended-description-file
  • section
  • priority
  • assets
  • maintainer-scripts

As a working example of this you can read this Cargo.toml version of my application Cifra.

There you can read general section from where cargo deb loads its basic information:



Cargo has a great documentation where you can find every section and tag explained.

Be aware that every file path you include in Cargo.toml is relative to Cargo.toml file.

Specific section for cargo-deb must not be long to have a working package:


Tags for this section are documented at cargo-deb homepage.

Section and priority tags are used to classify your application in Debian hierarchy. Although I've set them I think they are rather useless because official Debian repositories have higher requirement than cargo-deb can meet at the moment, so any debian package produced with cargo-deb will end in a personal repository were debian hierarchy for applications is not present.

Actually, most important tag is assets. That tag lets you set which files should be included in package and where they should be placed at installation. Format of that tag contents is straightforward. It is a list of tuples of three elements:

  • Relative path to file to be included in package: That path is relative to Cargo.toml location.
  • Absolute path to place that file in user computer.
  • Permissions for that file at user computer.

I should have included a "depends" tag to add my package dependencies. Cifra depends on SQLite3 and that is not a Rust crate but a system package, so it is a dependency of Cifra debian package. If you want to use "depends" tag you must use debian dependency format, but actually is not necessary because cargo-deb can calculate your dependencies automatically if you don't use "depends" tag. It does it using ldd against your compiled artifact and searching with dpkg  which system packages provides libraries detected by ldd.

Once you have your cargo-deb configuration in your Cargo.toml, building your debian package is as simple as:

dante@Camelot:~/Projects/cifra-rust/$ cargo deb
Finished release [optimized] target(s) in 0.17s


As you can see in the output, you will find your generated package in a new folder inside your project's called target/debian/.

Cargo deb is a wonderful tool which only downside is not being capable to meet Debian packaging policy to build packages suitable to be included in official repositories.