17 October 2021

How to write manpages with Markdown and Pandoc


No decent console application is released without a man page documenting how to use it. However, man page inners are rather arcane, but being a 1971 file format it has held up quite well.

Nowadays you have two standard ways to learn how to use a console command (google apart ;-) ): typing application command followed by "--help" to get a quick glance of application usage, or typing "man" followed by application name to get a detailed information about application usage. 

To implement "--help" approach in your application you can manually include "--help" parsing and output or you'd better use an argument parsing library like Python's ArgParse.

The "man" approach needs you to write a man page for your application.

Standard way to produce man pages is using troff formating commands. Linux has it's own troff implementation called groff. You have a feel about how is standard man pages format you can type next source inside a file called, for instance, corrupt.1:

.TH CORRUPT 1 .SH NAME corrupt \- modify files by randomly changing bits .SH SYNOPSIS .B corrupt [\fB\-n\fR \fIBITS\fR] [\fB\-\-bits\fR \fIBITS\fR] .IR file ... .SH DESCRIPTION .B corrupt modifies files by toggling a randomly chosen bit. .SH OPTIONS .TP .BR \-n ", " \-\-bits =\fIBITS\fR Set the number of bits to modify. Default is one bit.

Once saved, that file can be displayed by man. Assuming you are in the same folder than corrupt.1 you can type:

dante@Camelot:~/$ man -l corrupt.1

Output is: 

CORRUPT(1)                                                 General Commands Manual 

NAME
corrupt - modify files by randomly changing bits

SYNOPSIS
corrupt [-n BITS] [--bits BITS] file...

DESCRIPTION
corrupt modifies files by toggling a randomly chosen bit.

OPTIONS
-n, --bits=BITS
Set the number of bits to modify. Default is one bit.

CORRUPT(1)


 

You can opt to write specific man pages for your application following for instance this small cheat sheet

Nevertheless, troff/groff format is rather cumbersome and I feel keeping two sources of usage documentation (your README.md and your man page) is prone to errors and an effort waste. So, I follow a different approach: I write and keep updated my README.md and afterwards I convert it to a man page. Sure you need to keep a quite standard format for your README, and it won't be the fanciest one, but at least you will avoid to type the same things twice, in two different files, with different formatting tags.

The key stone to make that conversion is a tool called Pandoc. That tool is the Swiss army knife of document conversion. You can use it to convert between many documents format like word (docx), openoffice (odt), epub... or markdown (md) and groff man. Pandoc uses to be available at your distribution standard package manager repositories, so at an ubuntu distribution you just need:

dante@Camelot:~/$ sudo apt install pandoc

Once Pandoc is installed, you have to observe some conventions with your README.md to make further conversion easier. To illustrate this article explanations, we will follow as an example the file README.md for my project Cifra.

As you can see, linked README.md contains GitHub badges at the very beginning but those are going to be removed in conversion process we are going to explain later in this article. Everything else is structured to comply with contents a man page is supposed to have.

Maybe, most suspecting line is the first one. It's not usual to find a line like this in GitHub README:

Actually, that line is metadata for man page reader. After "%" contains document title (usually application name), manual section, a version, a "|" separator and finally a header. Manual section is 1 for user commands, 2 for system calls and 3 for C library functions. Your applications will fit in section 1 99% of times. I've not included a version for Cifra in that line, but you could have done. Besides, header indicates what set of documentation this manual page belongs to.

After that line, every section use to be included in man pages, but the only ones that should be included at least are:

  • Name: The name of the command.
  • Synopsis: A one-liner summarizing command line arguments and options.
  • Description: Describes in detail how to use your application command.

Other sections you can add are:

  • Options: Command line options.
  • Examples: About command usage.
  • Files: Useful if your application includes configuration files.
  • Environment: Here you explain if your application uses any environment variable.
  • Bugs: Detected bugs should be reported? There you can link your GitHub issues page.
  • Authors: Who is this masterpiece of code author?
  • See also: References to other man pages.
  • Copyright | License: A good place to include your application license text.

Once you have decided which sections include in manpage, you should write them following a format easily convertible by pandoc in a manpage with an standard structure. That's why main sections are always marked at level one of indentation ("#"). A problem with indentation is that while you can get further subheaders in markdown using "#" character  ("#" for main titles, "##" for subheaders, "###" for sections, and so on), those subheaders are only recognized by Pandoc up to sublevel 2. To get more sublevels I've opted for "pandoc lists", a format you can use in your markdown that is recognized afterwards by pandoc:

 

In lines 42 and 48, you have what pandoc call are "line blocks". Those are lines beginning with a vertical bar ( | ) followed by a space. Those spaces between vertical bar and command will be keep in man page converted text by pandoc.

Everything else in that README.md is good classic markdown format.

Let guess we have written our entire README.md and we want to perform conversion. To do that you can use an script like this from a temporal folder:

 

Lines 3 and 4 clean any file used in a previous conversion, while line 5 actually copies README.md from source folder.

Line 6 removes every GitHub badge you could have in your README.md. 

Line 7 is where we call pandoc to perform conversion.

In that point you could have a file that can be opened with man:

dante@Camelot:~/$ man -l man/cifra.1

But man pages use to be compressed in gzip format, as you can see with your system man pages:

dante@Camelot:~/$ ls /usr/share/man/man1

That's why script compress at line 8 your generated man page. And while generated man page is now a compressed gzip file, it is still readable by man:

dante@Camelot:~/$ man -l man/cifra.1.gz

Although those are some steps, you can see they are easily scriptable both as a local script, as an step of your continuous integration workflow.

As you can see, with these easy steps you only need to keep updated your README.md file because man page can be generated from it.