28 December 2013

Documenting code with Sphinx

When writing code that will be used by others, it is critical to document it properly. This helps users get the most out of our code and understand the design choices behind it.

However, documenting code can be a tedious task. It is hard enough to get programmers to comment their source code properly, so it is almost impossible to get them to keep both the code and a separate document about its functions and objects up to date. That is why there are many tools that generate documentation from comments embedded in the source code itself. Programmers then only have to update information in one place (the source), which makes it much easier to keep the documentation current.

In the Python world, one of the most widely used tools for this purpose is Sphinx, which generates documentation in a format similar to the official Python documentation (in fact, the Python docs themselves are generated with Sphinx). To build the documentation, Sphinx inspects the source code directory, detects all the packages, modules, classes and functions found there, and associates each of them with its respective docstring.

A docstring is a comment placed in the first line of the module, function or class it documents. It is enclosed between two groups of three double quotes ("""...""") and, unlike "programmer" comments (which start with a hash # and are simply ignored by the Python interpreter), docstrings are stored in the __doc__ attribute of the commented element (remember that even Python functions are objects, which may have their own attributes). While developer comments are intended for those who will directly modify the code, docstrings are geared towards those who will use it. So while the focus of developer comments is to explain why the code was written in a certain way, docstrings are usually used to explain how to use each module, function or class. PEP 8 describes good practices for writing Python code and points to PEP 257 for the specific case of docstrings.
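
For instance, a minimal sketch (the function name is just an example):

def greet(name):
    """Return a greeting for the given name."""
    return "Hello, %s!" % name

# The docstring is stored as an attribute of the function object:
print(greet.__doc__)  # Return a greeting for the given name.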

Actually, there is no single way to write docstrings, and even following good practices the topic is wide open. Google recommends a docstring format in which clarity for direct reading prevails; however, if you want to generate documentation automatically with Sphinx, you must follow a format that is perhaps not so clear at first glance, but that uses tags Sphinx can identify to build the documentation from the docstrings.
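
For comparison, a function documented in the Google style could look like this (a sketch; not the format we will use with Sphinx below):

def divide(a, b):
    """Divide two numbers.

    Args:
        a: Dividend.
        b: Divisor, must not be zero.

    Returns:
        The quotient of a divided by b.
    """
    return a / b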

Let's begin by explaining how to install Sphinx. The source code of the program can be obtained from its repository at Bitbucket, where you can download it as a single compressed package. It is also in the official Ubuntu repositories under the name "python-sphinx", so installing it is as simple as:
dante@Camelot:~$ sudo aptitude install python-sphinx

On Windows, it is best to use pip or easy_install (included in setuptools), since Sphinx is in the official PyPI repository.
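
For example, assuming pip is already installed and on your PATH:

C:\> pip install Sphinx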

Once installed, Sphinx is used through a few commands. To begin, we must generate the working directory for Sphinx. Let's follow an example where our source files are in the folder /pyalgorithm; there I've stored a library called PyAlgorithm that is being developed by a friend, so the contents of the folder are as follows:
dante@Camelot:~/pyalgorithm$ ls
__init__.py  __init__.pyc  __pycache__  checkers  exceptions.py  exceptions.pyc  multiprocessing  stats  time

All items except those with a file extension are directories containing source code.

From there we will launch sphinx-quickstart, the wizard that performs the initial configuration of our Sphinx project:
dante@Camelot:~/pyalgorithm$ sphinx-quickstart
Welcome to the Sphinx 1.1.3 quickstart utility.

Please enter values for the following settings (just press Enter to
accept a default value, if one is given in brackets).

Enter the root path for documentation.
> Root path for the documentation [.]: docs

You have two options for placing the build directory for Sphinx output.
Either, you use a directory "_build" within the root path, or you separate
"source" and "build" directories within the root path.
> Separate source and build directories (y/N) [n]:

Inside the root directory, two more directories will be created; "_templates"
for custom HTML templates and "_static" for custom stylesheets and other static
files. You can enter another prefix (such as ".") to replace the underscore.
> Name prefix for templates and static dir [_]:

The project name will occur in several places in the built documentation.
> Project name: PyAlgorithm
> Author name(s): Nobody

Sphinx has the notion of a "version" and a "release" for the software.
Each version can have multiple releases. For example, for Python the version
is something like 2.5 or 3.0, while the release is something like 2.5.1 or
3.0a1.  If you don't need this dual structure, just set both to the same value.
> Project version: 1
> Project release [1]:

The file name suffix for source files. Commonly, this is either ".txt"
or ".rst".  Only files with this suffix are considered documents.
> Source file suffix [.rst]:

One document is special in that it is considered the top node of the
"contents tree", that is, it is the root of the hierarchical structure
of the documents. Normally, this is "index", but if your "index" document
is a custom template, you can also set this to another filename.
> Name of your master document (without suffix) [index]:

Sphinx can also add configuration for epub output:
> Do you want to use the epub builder (y/N) [n]:

Please indicate if you want to use one of the following Sphinx extensions:
> autodoc: automatically insert docstrings from modules (y/N) [n]: y
> doctest: automatically test code snippets in doctest blocks (y/N) [n]:
> intersphinx: link between Sphinx documentation of different projects (y/N) [n]:
> todo: write "todo" entries that can be shown or hidden on build (y/N) [n]:
> coverage: checks for documentation coverage (y/N) [n]:
> pngmath: include math, rendered as PNG images (y/N) [n]:
> mathjax: include math, rendered in the browser by MathJax (y/N) [n]:
> ifconfig: conditional inclusion of content based on config values (y/N) [n]:
> viewcode: include links to the source code of documented Python objects (y/N) [n]: y

A Makefile and a Windows command file can be generated for you so that you
only have to run e.g. `make html' instead of invoking sphinx-build directly.
> Create Makefile? (Y/n) [y]:
> Create Windows command file? (Y/n) [y]:

Creating file docs/conf.py.
Creating file docs/index.rst.
Creating file docs/Makefile.
Creating file docs/make.bat.

Finished: An initial directory structure has been created.

You should now populate your master file docs/index.rst and create other
documentation source files. Use the Makefile to build the docs, like so:
   make builder
where "builder" is one of the supported builders, e.g. html, latex or linkcheck.
dante@Camelot:~/pyalgorithm$

As you can see, the wizard is very simple and most of the time we can leave the default options, with just a couple of notable exceptions:
  1. If you do not want your documentation files to get mixed with the source code, you must answer the question "Root path for the documentation [.]" with the directory you want created inside /pyalgorithm to store the Sphinx work files and the generated documentation.
  2. On the other hand, we must be sure to answer "y" to the question "autodoc: automatically insert docstrings from modules (y/N) [n]:", since autodoc is actually the extension Sphinx uses to read the docstrings and generate documentation from them.
We can see that the wizard created the docs directory:
dante@Camelot:~/pyalgorithm$ ls
__init__.py  __init__.pyc  __pycache__  checkers  docs  exceptions.py  exceptions.pyc  multiprocessing  stats  time

And four files inside it:

  • conf.py: a Python file where the Sphinx configuration is stored. If you ever want to change some of the choices you made with sphinx-quickstart, you do it here. For example, if we wanted to change the version numbers we would set the release and version variables of that file. Actually, the only thing we need to change in it is to uncomment a line near the top that reads sys.path.insert(0, os.path.abspath('.')). That line is what lets Sphinx find the elements of our source code. We have two options: use a path relative to the directory where conf.py lives, or use an absolute path. In the first case, since in our example the file is located at /pyalgorithm/docs/conf.py, we would put sys.path.insert(0, os.path.abspath('../..')) so that the directory containing the pyalgorithm package ends up in the path. If we prefer an absolute path, we can remove the os.path.abspath() part and put the path directly: sys.path.insert(0, "/absolute/path/to/the/package/parent") (see the snippet after this list).
  • Makefile: when generating documentation, Sphinx uses make. This Makefile has what make needs to render the documents in multiple formats: html, pdf, latex, epub, etc. Thus, to generate documentation in html we simply run make html. The generated documentation is stored in the _build directory inside the Sphinx root directory (docs in our example).
  • make.bat: the equivalent of the Makefile for Windows users.
  • index.rst: the main template of the generated documentation, written in the reStructuredText format. This format allows you to create the templates Sphinx uses to structure the information extracted from the source files.

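A minimal sketch of how the top of docs/conf.py would look after that change (excerpt; surrounding comments abbreviated):

import sys
import os

# Let autodoc import the pyalgorithm package: conf.py lives in
# /pyalgorithm/docs, so the package's parent directory is two levels up.
sys.path.insert(0, os.path.abspath('../..'))
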
If at this point you run "make html" you will see that some html files are generated, but if you open the index file you will see it is empty. This is because Sphinx expects to find a .rst file for each of the packages you want to document. We could create them by hand, but it is more convenient to use the sphinx-apidoc command, which examines the source directory and creates a .rst file for each package it detects. Its use is very simple; in our case it suffices to do:
dante@Camelot:~/pyalgorithm$ sphinx-apidoc -o docs .
The -o option sets the directory in which the .rst files are to be created, and the next parameter is the directory where we want sphinx-apidoc to start looking. In our example, the following files are generated:
dante@Camelot:~/pyalgorithm/docs$ ls *.rst
index.rst    modules.rst    pyalgorithm.checkers.rst    pyalgorithm.multiprocessing.rst
pyalgorithm.rst    pyalgorithm.stats.rst    pyalgorithm.time.rst
With these files in place we could run "make html" and see that many html files are generated (actually, one per .rst file). However, index.html keeps showing an empty page. That's because we have to configure it to include links to the other pages. In our case the documentation refers to a library whose root is pyalgorithm, so in index.rst we include a reference to it:

Welcome to PyAlgorithm's documentation!
=======================================
Contents:

.. toctree::
   :maxdepth: 3

   pyalgorithm
When Sphinx finds that reference, it will look for the pyalgorithm.rst file and render its content in index.html, doing the same recursively with all the references found in pyalgorithm.rst and its dependent .rst files. Sphinx stops rendering when the maximum depth (defined by the "maxdepth" parameter) is reached. Following our example: we have already seen the content of index.rst; if you define a maxdepth of 3 in index.rst, and the content of pyalgorithm.rst is:

pyalgorithm package
===================

Subpackages
-----------

.. toctree::

    pyalgorithm.checkers
    pyalgorithm.multiprocessing
    pyalgorithm.stats
    pyalgorithm.time

pyalgorithm.exceptions module
-----------------------------

.. automodule:: pyalgorithm.exceptions
    :members:
    :undoc-members:
    :show-inheritance:
The rendered pages would be:



Whereas if we set it to 4, Sphinx would assess all the .rst files listed in pyalgorithm.rst, so the rendered page would be:


You can set whatever level you want. Once the maximum level is reached, you have to click on each link to see the additional sublevels it includes.

The .rst files that refer to packages with modules feature a series of Sphinx tags to which you should pay special attention. Let's see an example:

pyalgorithm.checkers package
============================

pyalgorithm.checkers.input_decorators module
--------------------------------------------

.. automodule:: pyalgorithm.checkers.input_decorators
    :members:
    :undoc-members:
    :show-inheritance:

This .rst file, generated by sphinx-apidoc, refers to a package ("pyalgorithm.checkers") in which a module has been detected (in this case "input_decorators"), and tells Sphinx how to generate the documentation associated with that module. The automodule tag belongs to the set of directives used by the autodoc Sphinx extension (along with autofunction and autoclass) and tells Sphinx to document the module using its docstring, usually placed at the beginning of the element being commented. The autoclass and autofunction tags do the same, but at the class and function level respectively. The members option makes autodoc include in the documentation the "public" members of the element, i.e. those attributes whose names are not preceded by an underscore ("_"), while undoc-members asks it to also include elements lacking a docstring. If we wanted to show private items (those that start with "_" or "__") we should use the private-members option. If you only show the public members you will find that class constructors are not shown, so to make them appear in the documentation you must include the option "special-members: __init__" next to "members". As for show-inheritance, what it does is show the list of base classes the documented class inherits from, placing that list under the class declaration. These are the options that sphinx-apidoc includes by default, but there are many more. We are free to include more options and formatting directives to make our documentation show the information we want, in the format we like. We can also edit the .rst text files, since they are mere templates; whatever we put in them will be added to what Sphinx generates automatically when we run make.
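
For instance, a sketch of how those extra options could be added to the directive (hypothetical; adapt it to your own modules):

.. automodule:: pyalgorithm.checkers.input_decorators
    :members:
    :undoc-members:
    :special-members: __init__
    :private-members:
    :show-inheritance: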

Docstrings can also include Sphinx tags that help distinguish what each text element refers to. Let's see how a function would be documented:

def apply_async(self, func, args):
    """Insert a function and its arguments in the process pool.

    Input is inserted in queues using a round-robin fashion. Every job is
    identified by an index that is returned by the function. Not all
    parameters of the original multiprocessing.Pool.apply_async are
    implemented so far.

    :param func: Function to process.
    :type func: Callable.
    :param args: Arguments for the function to process.
    :type args: Tuple.
    :returns: Assigned job id.
    :rtype: Int.
    """
The Sphinx tags used in the last example are:
  • param: identifies an argument in the function declaration.
  • type: the expected argument type.
  • returns: what the function returns.
  • rtype: the expected type of the returned value.
The docstring above would render this way:


I do not want to finish without mentioning a problem I've found when using Sphinx on a directory tree containing folders with unittest scripts. In those cases I have had to move the folder out of the scope Sphinx examines to let it work smoothly; otherwise Sphinx crashes at build time with an exception. This is only a rather crude workaround, so if someone finds a more elegant solution I will be happy to hear about it.

20 October 2013

Creating virtual laboratories with Netkit (II)

In my previous article about Netkit, we saw a simple example of its ability to simulate a network with multiple devices. The possibilities are almost endless, but we need some kind of automation if we want to simulate complex topologies with many nodes, especially if we want to pass those topologies to other researchers so they can experiment with them. Netkit provides this level of automation through laboratories: directory structures containing the files needed to configure each of the nodes automatically. The good thing is that we can compress this directory tree and pass it to other researchers, who can boot the lab and have all nodes configured and running with a single command. In addition, the compressed file is really small, since the configuration files of each node are text only.

We are going to prepare an example laboratory that simulates a scenario in which Alice is connected to the Internet from the switched network of her organization. What she doesn't know is that an intruder has gained access to the network and intends to launch an ARP-spoofing attack against her to find out which internet pages she visits.


First of all, we have to set up Netkit to work properly with our operating system in laboratory mode. The first step is setting the global variables used by Netkit. In the previous article we configured those variables correctly; the problem is that when you start a laboratory in which one of the computers is to be connected to the Internet, you have to use sudo, which, for security reasons, ignores most global variables, including the ones we configured and used as normal users. A workaround is to configure sudo not to ignore the Netkit variables by editing /etc/sudoers. In the "Defaults" section of this file we must write:

Defaults:dante env_keep+="NETKIT_HOME", env_keep+="MANPATH"

Of course, instead of dante you must use your own username. You should set PATH too, and common sense says you should be able to configure it in sudoers like the other two variables... the problem is that sudo does not work that way and keeps ignoring the user's PATH even if we configure it in sudoers. This is already reported in the Ubuntu Launchpad as a long-standing bug. The solution is to create an alias that includes the user's PATH every time we use sudo. To do this, edit the .bash_aliases file in our home directory and write:

# PATH for Netkit.

alias sudo="sudo env PATH=$PATH"



With that, Netkit has all the necessary variables, whether we use sudo or run it as a normal user.

Besides, version 2.6 has a bug that prevents the virtual machine that acts as gateway to the Internet from running properly. The solution is a patch that one of the authors of the application posted on the mailing list:

========================================================
diff -Naur netkit-old/bin/script_utils netkit-new/bin/script_utils
--- netkit-old/bin/script_utils 2007-12-19 10:55:58.000000000 +0100
+++ netkit-new/bin/script_utils 2008-02-02 12:48:46.000000000 +0100
@@ -317,8 +317,8 @@
# This function starts all the hubs inside a given list
runHubs() {
local HUB_NAME BASE_HUB_NAME ACTUAL_HUB_NAME TAP_ADDRESS GUEST_ADDRESS
- HUB_NAME="$1"
while [ $# -gt 0 ]; do
+ HUB_NAME="$1"
BASE_HUB_NAME="`varReplace HUB_NAME \".*_\" \"\"`"
if [ "${BASE_HUB_NAME#tap${HUB_SOCKET_EXTENSION},}" !=
"$BASE_HUB_NAME" ]; then
# This is an Internet connected hub
@@ -328,7 +328,7 @@
startInetHub "$ACTUAL_HUB_NAME" "$TAP_ADDRESS" "$GUEST_ADDRESS"
else
# This is a normal hub
- startHub "$1"
+ startHub "$HUB_NAME"
fi
shift
done
========================================================
One way to apply this patch is to create a text file called patch2_6_bug in $NETKIT_HOME/bin/ containing the patch text, and immediately do:
dante@Hades:/usr/share/netkit/bin$ sudo patch script_utils patch2_6_bug
patching file script_utils
dante@Hades:/usr/share/netkit/bin$

This ends the worst part: Netkit preconfiguration. The good thing is that you only have to do it once; from now on you will only have to worry about creating good laboratories.

To begin with ours, we have to create a folder to contain the directory tree and the configuration files. In that folder we create the main configuration file of the laboratory: lab.conf. We can store in that file all the parameters we would use with vstart to define a single machine. We can define how many interfaces every node has and to which collision domain each one belongs (we must remember that, unlike hubs, switches define a different collision domain for each of their ports). Another element that can be defined is the RAM used by every machine. You can also include information to identify the laboratory: its author, version, etc. In our example, the lab.conf file is as follows:

dante@Hades:~/netkit_labs/lab_sniffing_sw$ cat lab.conf
LAB_DESCRIPTION="Laboratory to simulate an ARP-Spoofing attack"
LAB_VERSION="0.1"
LAB_AUTHOR="Dante"
LAB_EMAIL="dante.signal31@gmail.com"
LAB_WEB="http://danteslab.blogspot.com/"
PC-Alice[mem]=100
PC-Sniffer[mem]=100
Router[mem]=100
Switch[mem]=100
PC-Alice[0]=CD-A
Switch[1]=CD-A
PC-Sniffer[0]=CD-C
Switch[2]=CD-C
Router[1]=CD-B
Switch[0]=CD-B
Router[0]=tap,192.168.10.1,192.168.10.2

The last line sets the eth0 interface of the Router as a tap interface. That is equivalent to establishing a point-to-point link between the virtual machine Router (192.168.10.2) and our real PC (192.168.10.1), through which the Router can reach the Internet using our PC as gateway. In fact, when we start the lab, if we run ifconfig on our PC we will see that an interface called nk_tap_root has been created with IP 192.168.10.1. As far as the default route is concerned, Netkit puts a static route on Router pointing to 192.168.10.1. This lets Router reach the Internet normally, but if we also want to let the rest of the virtual network PCs reach the Internet (using Router and the real PC as gateways), we have to add a route in our real PC to the virtual network:

dante@Hades:~$ sudo route add -net 192.168.0.0 netmask 255.255.255.0 gw 192.168.10.2

Now, inside the lab folder, we are going to create a subdirectory for each virtual machine to be used:
dante@Hades:~/netkit_labs$ mkdir lab_sniffing_sw
dante@Hades:~/netkit_labs$ cd lab_sniffing_sw/
dante@Hades:~/netkit_labs/lab_sniffing_sw$ mkdir Router
dante@Hades:~/netkit_labs/lab_sniffing_sw$ mkdir Switch
dante@Hades:~/netkit_labs/lab_sniffing_sw$ mkdir PC-Alice
dante@Hades:~/netkit_labs/lab_sniffing_sw$ mkdir PC-Sniffer
dante@Hades:~/netkit_labs/lab_sniffing_sw$ ls
PC-Alice  PC-Sniffer  Router  Switch
dante@Hades:~/netkit_labs/lab_sniffing_sw$

The subdirectories of the virtual machines can stay empty or be used to place files that will appear in the virtual machine's filesystem. For example, if we wanted PC-Alice to have a given script in /usr/bin, we would create the folder netkit_labs/lab_sniffing_sw/PC-Alice/usr/bin and leave a copy of the script there. In our case, what we will put in the /etc of each machine are the files needed to configure its network resources properly.
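
A minimal sketch of that idea (my_script.sh is a hypothetical name):

dante@Hades:~/netkit_labs/lab_sniffing_sw$ mkdir -p PC-Alice/usr/bin
dante@Hades:~/netkit_labs/lab_sniffing_sw$ cp my_script.sh PC-Alice/usr/bin/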

In Alice's case, it would be as follows:
dante@Hades:~/netkit_labs/lab_sniffing_sw$ cat ./PC-Alice/etc/network/interfaces
auto lo eth0

iface lo inet loopback
    address 127.0.0.1
    netmask 255.0.0.0

iface eth0 inet static
    address 192.168.0.2
    netmask 255.255.255.0
    gateway 192.168.0.1

The PC-Sniffer will have the following network configuration:
dante@Hades:~/netkit_labs/lab_sniffing_sw$ cat ./PC-Sniffer/etc/network/interfaces
auto lo eth0

iface lo inet loopback
    address 127.0.0.1
    netmask 255.0.0.0

iface eth0 inet static
    address 192.168.0.3
    netmask 255.255.255.0
    gateway 192.168.0.1
The router will have the following network configuration:
dante@Hades:~/netkit_labs/lab_sniffing_sw$ cat ./Router/etc/network/interfaces
auto lo eth1

iface lo inet loopback
    address 127.0.0.1
    netmask 255.0.0.0

iface eth1 inet static
    address 192.168.0.1
    netmask 255.255.255.0

Also, if we want our virtual machines to perform DNS resolution (essential if we install packages with apt or browse with lynx), we have to include an /etc/resolv.conf file on each of the virtual machines, just as we did with /etc/network/interfaces. A copy of the contents of the /etc/resolv.conf of our real PC will be enough.
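
For instance, a resolv.conf as simple as this would do (the nameserver address is just an example; copy the one from your real PC):

nameserver 8.8.8.8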

The final step is to configure the switch and decide which services are to be started on each virtual machine. For that, Netkit lets us define which commands are run during startup and shutdown of the virtual machines by means of .startup and .shutdown scripts respectively. For our lab we will create the following scripts:
dante@Hades:~/netkit_labs/lab_sniffing_sw$ cat Switch.startup
ifconfig eth0 up
ifconfig eth1 up
ifconfig eth2 up
brctl addbr br0
brctl addif br0 eth0
brctl addif br0 eth1
brctl addif br0 eth2
brctl stp br0 on
ifconfig br0 up

This script configures a bridge when the Switch machine boots, adding ports eth0, eth1 and eth2 to the bridge and activating spanning tree (unnecessary in this case, because there is only one switch, but it is a good habit to keep).

The startup commands of the rest of the machines are easier, because all they do is start the networking service:
dante@Hades:~/netkit_labs/lab_sniffing_sw$ cat Router.startup
/etc/init.d/networking start
dante@Hades:~/netkit_labs/lab_sniffing_sw$ cat PC-Alice.startup
/etc/init.d/networking start
dante@Hades:~/netkit_labs/lab_sniffing_sw$ cat PC-Sniffer.startup
/etc/init.d/networking start

This example is very simple, but thanks to these scripts we can make a laboratory configure itself without the end user having to do anything. In fact, the best approach is to configure the IP addresses of the interfaces and the routes through these scripts instead of copying configuration files into the /etc of each machine, but taking that detour gave me the opportunity to explain how to place files in the directory tree of the virtual machines.
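
As a sketch, a PC-Alice.startup following that approach could look like this (untested; adjust the addresses to your own lab):

ifconfig eth0 192.168.0.2 netmask 255.255.255.0 up
route add default gw 192.168.0.1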

Now, finally, it's time to start our laboratory. Laboratories are started with the lstart command and stopped with lhalt. Let's see what happens in our case:
dante@Hades:~/netkit_labs/lab_sniffing_sw$ sudo lstart
[sudo] password for dante:
======================== Starting lab ===========================
Lab directory: /home/dante/netkit_labs/lab_sniffing_sw
Version:       0.1
Author:        Dante
Email:         dante.signal31@gmail.com
Web:           http://danteslab.blogspot.com/
Description:   Laboratory to simulate an ARP-Spoofing attack
=================================================================
Starting "PC-Alice" with options "-q --mem=100 --eth0 CD-A --hostlab=/home/dante/netkit_labs/lab_sniffing_sw --hostwd=/home/dante/netkit_labs/lab_sniffing_sw"...
Starting "PC-Sniffer" with options "-q --mem=100 --eth0 CD-C --hostlab=/home/dante/netkit_labs/lab_sniffing_sw --hostwd=/home/dante/netkit_labs/lab_sniffing_sw"...
Starting "Router" with options "-q --mem=100 --eth1 CD-B --eth0 tap,192.168.10.1,192.168.10.2 --hostlab=/home/dante/netkit_labs/lab_sniffing_sw --hostwd=/home/dante/netkit_labs/lab_sniffing_sw"...
Starting "Switch" with options "-q --mem=100 --eth1 CD-A --eth2 CD-C --eth0 CD-B --hostlab=/home/dante/netkit_labs/lab_sniffing_sw --hostwd=/home/dante/netkit_labs/lab_sniffing_sw"...
The lab has been started.
=================================================================
dante@Hades:~/netkit_labs/lab_sniffing_sw$

To check that the resulting network works we can try surfing the Internet with lynx from PC-Alice:


Now that we have checked that our network works, we can put our laboratory to use by performing the experiment mentioned at the beginning of the article: an ARP-spoofing attack against Alice from PC-Sniffer.

As you can see in the image above, PC-Sniffer has tcpdump running, but it only sees the spanning tree (STP) traffic issued by the switch. Unlike the network in the previous article, in which I simulated a hub, this switch does not replicate traffic on all ports but only on the port that connects to the destination of the data.

To spy on the traffic between Router and PC-Alice, we will have to place ourselves between them, making them believe they are talking to each other when they are actually talking to PC-Sniffer.

We will use the Ettercap tool, which is installed by default in the Debian image that comes with Netkit (if it were not, you could install it with a simple "aptitude install ettercap", as we would on our real PC). Ettercap allows interception on switched networks. It really is a little wonder that deserves an article of its own. Here we will use Ettercap from PC-Sniffer to spy on Alice's web traffic. To launch an ARP-spoofing attack (also called ARP poisoning) with Ettercap, the command would be:

PC-Sniffer# ettercap -M arp:remote -T /192.168.0.1/ /192.168.0.2/

Immediately, the PC-Sniffer screen starts showing the contents of the data exchanged between PC-Alice and her gateway, Router, while she browses the Internet:


To stop the lab you just have to run "sudo lhalt" from the root directory of the lab (where we previously launched lstart). If we want to relaunch the laboratory, we must remember to add back the route to the virtual network, which disappeared when lhalt removed the tap interface. On the other hand, lstart created .disk files in the lab folder so that the programs installed on the virtual machines, and any other changes we made, are not lost. That means the folders where we placed the configuration files are no longer read, so if you want to put new files there you first have to delete the .disk files and run lstart again so Netkit re-reads those folders. Another useful option for transferring files to and from the virtual machines is the /hosthome folder, present in all virtual machines, which mounts the home directory of the user who launched Netkit.
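
For example, to copy a hypothetical capture file from PC-Sniffer to our real home directory:

PC-Sniffer:~# cp capture.pcap /hosthome/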

In another article I will explain how Ettercap works, along with other interception techniques. This article has served to demonstrate Netkit's utility for simulating the complex networks that make computer security experiments possible, with considerable savings in effort, money and space. It is also an invaluable aid for those who intend to teach the art of computer security, allowing us to offer our readers preconfigured laboratories so students can focus exclusively on the techniques explained, without wasting time setting up a test network on their own. That's why my future articles will include links to Netkit laboratories especially designed to practice, quickly and simply, what is explained in them.

12 October 2013

Creating virtual laboratories with Netkit (I)

Progress in the field of security engineering requires constant learning in multiple disciplines. This learning is largely theoretical, but to be fully effective it needs to be put into practice. A security engineer should be able to put himself in an attacker's boots and foresee with reasonable certainty what the attacker's next step could be. But this is difficult, because an engineer cannot go around attacking networks with the excuse that he is learning how the bad guys work.

Until recently, the only option available for security students was to set up a laboratory at home, collecting low-cost computers and network equipment. Unfortunately this was expensive, consumed increasingly scarce space in modern homes, and turned your partner/spouse/father/mother against you because they didn't understand the utility of it all. Fortunately, the era of virtualization came to the rescue, so today it is possible to assemble complex virtual laboratories inside our computer.

The easiest options are VMware or VirtualBox, which are ideal for testing tools, rootkits, vulnerabilities, etc. on different operating systems without endangering our own computer. By starting several of these virtual machines we can run tests simulating a LAN. There are even tutorials on simulating more complex topologies using VMware. However, at that point such tools may consume an amount of computer resources we may not have.

The other option is called Netkit, developed at the University of Rome. Its focus is not so much emulating specific equipment as emulating complete networks. Netkit allows us to define a network topology and experiment with it. To do that, we start a series of nodes, which are just lightweight Debian Linux virtual instances, and configure each one according to the role it plays within the network we want to simulate (router, switch or end device).

Installation is very simple. First, download three files:


Once downloaded, all of them must be unzipped together in the directory of your choice (we will assume /usr/share/netkit) (see the update at the end of the article), and then we prepare a few environment variables to record where we installed Netkit. To do this, it is best to add the following lines at the end of /etc/profile:
NETKIT_HOME="/usr/share/netkit"
MANPATH="$MANPATH:$NETKIT_HOME/man"
PATH="$PATH:$NETKIT_HOME/bin"
export NETKIT_HOME MANPATH PATH
Once this is done, you have to restart the computer so that the environment variables you just configured are loaded. And that's it; to check that nothing failed, run a check script that Netkit includes:
dante@Hades:/usr/share/netkit$ ./check_configuration.sh
> Checking path correctness... passed.
> Checking environment... passed.
> Checking for availability of man pages... passed.
> Checking for proper directories in the PATH... passed.
> Checking for availability of auxiliary tools:
awk : ok
basename : ok
date : ok
dirname : ok
find : ok
getopt : ok
grep : ok
head : ok
id : ok
kill : ok
ls : ok
lsof : ok
ps : ok
readlink : ok
wc : ok
port-helper : ok
tunctl : ok
uml_mconsole : ok
uml_switch : ok
passed.
> Checking for availability of terminal emulator applications:
xterm : found
konsole : found
gnome-terminal : not found
passed.

[ READY ] Congratulations! Your Netkit setup is now complete! Enjoy Netkit!

At this point, the really interesting part begins.

Let's start two virtual machines connected to the same collision domain (as if they were connected to the same hub) and ping between them:
dante@Hades:~/netkit_labs$ vstart --eth0=CD-A -M 100 PC-1
dante@Hades:~/netkit_labs$ vstart --eth0=CD-A -M 100 PC-2
dante@Hades:~/netkit_labs$ ls -l
total 1136
-rw-r--r-- 1 dante dante 1074012160 2008-09-11 23:26 PC-1.disk
-rw-r--r-- 1 dante dante        470 2008-09-11 23:25 PC-1.log
-rw-r--r-- 1 dante dante 1074012160 2008-09-11 23:26 PC-2.disk
-rw-r--r-- 1 dante dante        470 2008-09-11 23:26 PC-2.log

With the --eth0 parameter, that interface is assigned to the CD-A collision domain (we can name collision domains as we wish), and with -M we give 100 MB of RAM to the machine. You can see that Netkit creates a .disk file for each virtual machine; that disk is where it keeps its file system. Each file system takes 1.1 GB by default, so you had better make sure you have free disk space before starting an experiment with many machines.

With each vstart, an xterm window appears with the console of the virtual machine that has been launched.


Here is where we are going to configure every machine:
PC-1:~# ifconfig eth0 192.168.0.1 netmask 255.255.255.0 up
PC-2:~# ifconfig eth0 192.168.0.2 netmask 255.255.255.0 up

If we now ping between the two machines, we can see them replying to each other.

Now let's do a simple experiment, so nobody can say this article has not addressed security. Let's go back over 10 years, when networks were largely based on hubs, i.e. when all the PCs on the network were connected to the same collision domain. In those days it was very easy to eavesdrop (sniff), since hubs replicate everything they receive on one port to all the other ports, so anyone who put an interface in promiscuous mode could keep the network card from dropping packets not addressed to it and display them with a program like tcpdump. To simulate what an attacker would have done in those days, we create a third PC in the same way as above, calling it PC-3 and assigning it the IP address 192.168.0.3. This is like connecting the attacker's PC to the hub of the spied-on victims. Now we start a tcpdump on PC-3 and see if we can capture the pings we launch from PC-2 to PC-1:
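
A minimal sketch of the commands involved (the tcpdump flags may vary with the version shipped in the Netkit image):

PC-3:~# ifconfig eth0 192.168.0.3 netmask 255.255.255.0 up
PC-3:~# tcpdump -i eth0 -n

# Meanwhile, on PC-2:
PC-2:~# ping -c 2 192.168.0.1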


As you can see from the picture, we gave PC-2 time to send two pings to PC-1, which answered correctly. And what has PC-3 seen? The answer is: everything. The ARP request of PC-2, the response of PC-1, the ICMP requests of PC-2 (pings out) and the ICMP replies of PC-1 (pings back). In fact, PC-3 has heard the whole "conversation" between PC-2 and PC-1. If this had not been a mere ping but a telnet session, PC-3 might have captured the username and password passed from PC-2 to PC-1. Hence the importance of encrypting terminal sessions with SSH.

To shut down the virtual machines, you can run halt from within each of them, or use vhalt from our real command line:
dante@Hades:~/netkit_labs$ vhalt PC-1
Halting virtual machine "PC-1" (PID 29271) owned by dante [.... ]
dante@Hades:~/netkit_labs$ vhalt PC-2
Halting virtual machine "PC-2" (PID 30824) owned by dante [.... ]
dante@Hades:~/netkit_labs$ vhalt PC-3
Halting virtual machine "PC-3" (PID 28568) owned by dante [.... ]
dante@Hades:~/netkit_labs$

If we want to restart the machines later, we only need to use the vstart command again (provided we have not deleted the .disk files). For example, to boot PC-1 we would run: vstart PC-1

Well, not bad for a start: we have assembled an experiment with three PCs and a hub without actually having any of those four things... Couldn't ask for more, could we? In my next article (or articles) I will delve into Netkit's possibilities to develop more complex and interesting experiments.

Update 2010-01-01: In its new version, Netkit has changed the file system (the Debian image loaded by each of the virtual machines) to a 10 GB filesystem. Fortunately, if we use partitions with serious filesystems (ext, ReiserFS, XFS, JFS and NTFS) there is nothing to worry about, because it will be treated as a sparse file and will not take up even a fraction of that size. The only thing you should do is extract all the files downloaded from the Netkit website into /usr/share/netkit using the command sudo tar -xjSf, the S option being precisely what makes the extracted files be treated as sparse ones.