Separation Anxiety: A Tutorial for Isolating Your System with Linux Namespaces

With the advent of tools like Docker, Linux Containers, and others, it has become super easy to isolate Linux processes into their own little system environments. This makes it possible to run a whole range of applications on a single real Linux machine and ensure no two of them can interfere with each other, without having to resort to using virtual machines. These tools have been a huge boon to PaaS providers. But what exactly happens under the hood?

These tools rely on a number of features and components of the Linux kernel. Some of these features were introduced fairly recently, while others still require you to patch the kernel itself. But one of the key components, Linux namespaces, has been a feature of Linux since version 2.6.24 was released in 2008.

Anyone familiar with chroot already has a basic idea of what Linux namespaces can do and how to use them. Just as chroot allows processes to see any arbitrary directory as the root of the system (independent of the rest of the processes), Linux namespaces allow other aspects of the operating system to be independently modified as well. This includes the process tree, networking interfaces, mount points, inter-process communication resources and more.

Why Use Namespaces for Process Isolation?

On a single-user computer, a single system environment may be fine. But on a server, where you want to run multiple services, it is essential to security and stability that the services are as isolated from each other as possible. Imagine a server running multiple services, one of which gets compromised by an intruder. In such a case, the intruder may be able to exploit that service and work their way to the other services, and may even be able to compromise the entire server. Namespace isolation can provide a secure environment to mitigate this risk.

For example, using namespacing, it is possible to safely execute arbitrary or unknown programs on your server. Recently, there has been a growing number of programming contest and “hackathon” platforms, such as HackerRank, TopCoder, Codeforces, and many more. A lot of them utilize automated pipelines to run and validate programs that are submitted by the contestants. It is often impossible to know in advance the true nature of contestants’ programs, and some may even contain malicious elements. By running these programs namespaced in complete isolation from the rest of the system, the software can be tested and validated without putting the rest of the machine at risk. Similarly, online continuous integration services, such as Drone.io, automatically fetch your code repository and execute the test scripts on their own servers. Again, namespace isolation is what makes it possible to provide these services safely.

Namespacing tools like Docker also allow better control over processes’ use of system resources, making such tools extremely popular with PaaS providers. Services like Heroku and Google App Engine use such tools to isolate and run multiple web server applications on the same real hardware. These tools allow them to run each application (which may have been deployed by any number of different users) without worrying about one of them using too many system resources, or interfering and/or conflicting with other deployed services on the same machine. With such process isolation, it is even possible to have entirely different stacks of dependency software (and versions) for each isolated environment!

If you’ve used tools like Docker, you already know that these tools are capable of isolating processes in small “containers”. Running processes in Docker containers is like running them in virtual machines, only these containers are significantly lighter than virtual machines. A virtual machine typically emulates a hardware layer on top of your operating system, and then runs another operating system on top of that. This allows you to run processes inside a virtual machine, in complete isolation from your real operating system. But virtual machines are heavy! Docker containers, on the other hand, use some key features of your real operating system, including namespaces, and ensure a similar level of isolation, but without emulating the hardware and running yet another operating system on the same machine. This makes them very lightweight.

Process Namespace

Historically, the Linux kernel has maintained a single process tree. The tree contains a reference to every process currently running in a parent-child hierarchy. A process, given it has sufficient privileges and satisfies certain conditions, can inspect another process by attaching a tracer to it or may even be able to kill it.

With the introduction of Linux namespaces, it became possible to have multiple “nested” process trees. Each process tree can have an entirely isolated set of processes. This can ensure that processes belonging to one process tree cannot inspect or kill – in fact cannot even know of the existence of – processes in other sibling or parent process trees.

Every time a computer with Linux boots up, it starts with just one process, with process identifier (PID) 1. This process is the root of the process tree, and it initiates the rest of the system by performing the appropriate maintenance work and starting the correct daemons/services. All the other processes start below this process in the tree. The PID namespace allows one to spin off a new tree, with its own PID 1 process. The process that does this remains in the parent namespace, in the original tree, but makes the child the root of its own process tree.

With PID namespace isolation, processes in the child namespace have no way of knowing of the parent process’s existence. However, processes in the parent namespace have a complete view of processes in the child namespace, as if they were any other process in the parent namespace.


It is possible to create a nested set of child namespaces: one process starts a child process in a new PID namespace, and that child process spawns yet another process in a new PID namespace, and so on.

With the introduction of PID namespaces, a single process can now have multiple PIDs associated with it, one for each namespace it falls under. In the Linux source code, we can see that a struct named pid, which used to keep track of just a single PID, now tracks multiple PIDs through the use of a struct named upid:
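A simplified sketch of those structures (field names follow include/linux/pid.h; most members are omitted here, so treat this as an illustration rather than the actual kernel definition):

```c
#include <stddef.h>

struct pid_namespace;   /* opaque here; defined elsewhere in the kernel */

/* One PID value as seen from one specific namespace. */
struct upid {
    int nr;                   /* the numeric PID in that namespace */
    struct pid_namespace *ns; /* the namespace this value belongs to */
};

/* A process's PID bookkeeping: one upid per level of namespace nesting. */
struct pid {
    unsigned int level;       /* how deeply nested the task's namespace is */
    struct upid numbers[1];   /* in effect a flexible array, one per level */
};
```

A process nested three namespaces deep thus carries three upid entries, one PID number per namespace it is visible in.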

To create a new PID namespace, one must call the clone() system call with the special flag CLONE_NEWPID. (C exposes this system call through a wrapper function, as do many other popular languages.) Whereas the other namespaces discussed below can also be created using the unshare() system call, a process cannot move itself into a new PID namespace: unshare(CLONE_NEWPID) only affects children created afterwards, so in practice a new PID namespace is populated by spawning a new process with clone(). Once clone() is called with this flag, the new process immediately starts in a new PID namespace, under a new process tree.

This can be demonstrated with a simple C program:

Compile and run this program with root privileges. The parent will print the PID of the new child as it appears in the parent namespace (an ordinary, large number), while the PID, as printed from within child_fn, will be 1.

Even though the code above is not much longer than “Hello, world” in some languages, a lot has happened behind the scenes. The clone() function, as you would expect, has created a new process by cloning the current one and started execution at the beginning of the child_fn() function. However, while doing so, it detached the new process from the original process tree and created a separate process tree for the new process.

Try replacing the static int child_fn() function with the following, to print the parent PID from the isolated process’s perspective:
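For instance, something like this (assuming the same surrounding program as above):

```c
#include <stdio.h>
#include <unistd.h>

static int child_fn(void *arg) {
  (void)arg;
  /* Inside the new PID namespace, getppid() returns 0: the real parent
     lives in another namespace and is invisible from here. */
  printf("PID: %ld, parent PID: %ld\n", (long)getpid(), (long)getppid());
  return 0;
}
```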

Running the program this time shows that the parent PID, from the isolated process’s perspective, is 0, indicating no parent. Try running the same program again, but this time remove the CLONE_NEWPID flag from within the clone() function call.

This time, you will notice that the parent PID printed by the child is no longer 0, but the real PID of the parent process, since both processes now share a single PID namespace.

However, this is just the first step in our tutorial. These processes still have unrestricted access to other common or shared resources. For example, the networking interface: if the child process created above were to listen on port 80, it would prevent every other process on the system from being able to listen on it.

Linux Network Namespace

This is where a network namespace becomes useful. A network namespace allows each of these processes to see an entirely different set of networking interfaces. Even the loopback interface is different for each network namespace.

Isolating a process into its own network namespace involves introducing another flag to the clone() function call: CLONE_NEWNET.

When run as root, the output of ip link differs starkly between the two namespaces.

What’s going on here? The physical ethernet device enp4s0 belongs to the global network namespace, as indicated by the “ip” tool run from this namespace. However, the physical interface is not available in the new network namespace. Moreover, the loopback device is active in the original network namespace, but is “down” in the child network namespace.

In order to provide a usable network interface in the child namespace, it is necessary to set up additional “virtual” network interfaces which span multiple namespaces. Once that is done, it is then possible to create Ethernet bridges, and even route packets between the namespaces. Finally, to make the whole thing work, a “routing process” must be running in the global network namespace to receive traffic from the physical interface, and route it through the appropriate virtual interfaces to the correct child network namespaces. Maybe you can see why tools like Docker, which do all this heavy lifting for you, are so popular!

A routing process in the global network namespace forwards traffic to multiple child network namespaces.

To do this by hand, you can create a pair of virtual Ethernet connections between a parent and a child namespace by running a single command from the parent namespace:
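A sketch of that command (veth0 and veth1 are conventional names for the two ends; <pid> is explained below):

```shell
ip link add veth0 type veth peer name veth1 netns <pid>
```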

Here, <pid> should be replaced by the process ID of the process in the child namespace as observed by the parent. Running this command establishes a pipe-like connection between these two namespaces. The parent namespace retains the veth0 device, and passes the veth1 device to the child namespace. Anything that enters one of the ends comes out through the other end, just as you would expect from a real Ethernet connection between two real nodes. Accordingly, both sides of this virtual Ethernet connection must be assigned IP addresses.

Mount Namespace

Linux also maintains a data structure for all the mountpoints of the system. It includes information like what disk partitions are mounted, where they are mounted, whether they are readonly, et cetera. With Linux namespaces, one can have this data structure cloned, so that processes under different namespaces can change the mountpoints without affecting each other.

Creating a separate mount namespace has an effect similar to doing a chroot(). chroot() is good, but it does not provide complete isolation, and its effects are restricted to the root mountpoint only. Creating a separate mount namespace allows each of these isolated processes to have a completely different view of the entire system’s mountpoint structure from the original one. This allows you to have a different root for each isolated process, as well as other mountpoints that are specific to those processes. Used with care, this avoids exposing any information about the underlying system.


The clone() flag required to achieve this is CLONE_NEWNS:
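For example, the clone() call from the earlier examples becomes (child_fn and child_stack as before; this is a fragment, not a complete program):

```c
pid_t child_pid = clone(child_fn, child_stack + sizeof(child_stack),
                        CLONE_NEWPID | CLONE_NEWNS | SIGCHLD, NULL);
```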

Initially, the child process sees the exact same mountpoints as its parent process. However, being under a new mount namespace, the child process can mount or unmount whatever mountpoints it wants to, and the change will affect neither its parent’s namespace nor any other mount namespace in the entire system. For example, if the parent process has a particular disk partition mounted at root, the isolated process will see the exact same disk partition mounted at the root in the beginning. But the benefit of isolating the mount namespace is apparent when the isolated process tries to change the root partition to something else, as the change will only affect the isolated mount namespace.

Interestingly, this actually makes it a bad idea to spawn the target child process directly with the CLONE_NEWNS flag. A better approach is to start a special “init” process with the CLONE_NEWNS flag, have that “init” process change the “/”, “/proc”, “/dev” or other mountpoints as desired, and then start the target process. This is discussed in a little more detail near the end of this tutorial.

Other Namespaces

There are other namespaces that these processes can be isolated into, namely user, IPC, and UTS. The user namespace allows a process to have root privileges within the namespace, without giving it that access to processes outside of the namespace. Isolating a process by the IPC namespace gives it its own interprocess communication resources, for example, System V IPC and POSIX messages. The UTS namespace isolates two specific identifiers of the system: nodename and domainname.

A quick example to show how UTS namespace is isolated is shown below:

When run with root privileges, the program’s output shows the nodename changing inside the child namespace while the parent’s nodename remains untouched.

Here, child_fn() prints the nodename, changes it to something else, and prints it again. Naturally, the change happens only inside the new UTS namespace.

More information on what each of the namespaces provides and isolates can be found in the namespaces(7) manual page.

Cross-Namespace Communication

Often it is necessary to establish some sort of communication between the parent and the child namespace. This might be for doing configuration work within an isolated environment, or it can simply be to retain the ability to peek into the condition of that environment from outside. One way of doing that is to keep an SSH daemon running within that environment. You can have a separate SSH daemon inside each network namespace. However, having multiple SSH daemons running uses a lot of valuable resources like memory. This is where having a special “init” process proves to be a good idea again.

The “init” process can establish a communication channel between the parent namespace and the child namespace. This channel can be based on UNIX sockets or can even use TCP. To create a UNIX socket that spans two different mount namespaces, you need to first create the child process, then create the UNIX socket, and then isolate the child into a separate mount namespace. But how can we create the process first, and isolate it later? Linux provides unshare(). This special system call allows a process to isolate itself from the original namespace, instead of having the parent isolate the child in the first place. For example, the following code has the exact same effect as the code previously mentioned in the network namespace section:

And since the “init” process is something you have devised, you can make it do all the necessary work first, and then isolate itself from the rest of the system before executing the target child.

Conclusion

This tutorial is just an overview of how to use namespaces in Linux. It should give you a basic idea of how a Linux developer might start to implement system isolation, an integral part of the architecture of tools like Docker or Linux Containers. In most cases, it would be best to simply use one of these existing tools, which are already well-known and tested. But in some cases, it might make sense to have your very own, customized process isolation mechanism, and in that case, this tutorial will help you out tremendously.

There is a lot more going on under the hood than I’ve covered in this article, and there are more ways you might want to limit your target processes for added safety and isolation. But, hopefully, this can serve as a useful starting point for someone who is interested in knowing more about how namespace isolation with Linux really works.

About the author

Mahmud Ridwan

Dhaka, Bangladesh
Mahmud is a software developer with a knack for efficiency, scalability, and stable solutions. With years of experience working with a wide range of technologies, he is still interested in exploring, encountering, and solving new and interesting programming problems.

Sharing Windows Folders Across Linux Machines in a Network


Below are the high-level steps to share files from a Windows 7 machine with a Linux Ubuntu 14.04 machine. A common goal of setting up computers on a local network is being able to share files and folders. For file sharing to work, the computers must be on the same network and in the same Workgroup.

Enabling Sharing on Windows Folders

Windows does not share individual files directly over the network; files to be shared must be placed in a folder, and that folder is then shared. Right-click on the folder and select Share with –> Specific people as shown below.

Share Files on Windows

This brings up the screen below. Here we can either find specific people on the same domain, or share the folder with everyone, as shown below.

File Sharing Options

Accessing Shared Files on Ubuntu

Now log in to your remote Ubuntu 14.04 machine (connected to the same network), go to the Home directory via File System, and then select File –> Connect to Server as shown below.

Connect to Server

Connect to Server Options

In the screen below, we need to provide the IP address of the Windows machine using the smb:// protocol. It then asks for a user name, domain group, and password. Once we provide credentials, we will be able to access the shared folders as shown below.
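For reference, the same share can also be reached from the Ubuntu command line; in this sketch, the address 192.168.1.10, the share name SharedDocs, and the user winuser are placeholders for your own values:

```shell
smbclient -L //192.168.1.10 -U winuser                  # list the shares
sudo mount -t cifs //192.168.1.10/SharedDocs /mnt/share -o username=winuser
```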

Shared files on Ubuntu

Thus we can easily use Ubuntu to access shared folders on a Windows 7 machine that co-exists on the same Workgroup.

Run Remote Commands over SSH

In this post, we will discuss communication between two nodes in a network via SSH, and executing/running commands on a remote machine over SSH.

These two nodes in the cluster can be treated as server/client machines for easy understanding. To allow secure communications between Server and client machines, on the server side, we will need a public key and an authorization file, and on the client side, we will need a private key and an identification file. The public key on the server and private key on the client must be a matching pair of keys as generated by the $ ssh-keygen command.

In simpler words, if we need to connect to a remote machine m2 from m1, we need to run ssh-keygen on m1 and copy the ~/.ssh/id_rsa.pub file from m1 into ~/.ssh/authorized_keys on the m2 machine. We can add any number of keys to this file to allow connections to m2 via ssh from many machines, but there should not be any empty lines between keys in the authorized_keys file.

Now if we connect to the m2 machine from m1 via $ ssh username@m2, it will connect directly without asking for a password, provided we generated a passwordless key on m1.
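In commands, the whole setup looks like this (m2 and username are placeholders for your own host and user):

```shell
ssh-keygen -t rsa -N "" -f ~/.ssh/id_rsa   # passwordless key pair on m1
ssh-copy-id username@m2                    # appends the public key to m2's ~/.ssh/authorized_keys
ssh username@m2                            # now connects without a password prompt
```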

This is useful when there are N machines connected in a network that need to communicate with each other via ssh or scp without prompting for a password. One real-world example of this scenario is a Hadoop cluster.

The above setup will allow us to remotely log in to another machine and submit commands on it. It is easy to submit a single command over SSH, as shown below.
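For example (user@remote is a placeholder for your own user and host):

```shell
ssh user@remote uptime
```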

But it is a bit tricky to submit multiple commands over SSH. Below are the ways to submit multiple commands on a remote machine via SSH.

  • If there are only a few commands and no control flow statements (if, loops, etc.), then we can put them in single quotes, separated by semicolons, as shown below.
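For instance (user@remote is a placeholder):

```shell
ssh user@remote 'cd /tmp; ls -l; hostname'
```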

  • If we need to use SSH in shell scripting and need to run many commands on the remote machine, including control flow statements like if and loop statements, then we can do it as follows.
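One way, sketched with a placeholder host, is to put the whole script body in single quotes:

```shell
ssh user@remote '
if [ -d /var/log ]; then
  echo "log directory exists"
fi
for d in /tmp /var; do
  du -sh "$d"
done
'
```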

But this method fails if we refer to any local variables defined before the SSH login, as shown below.
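A sketch of the failure (placeholder host; logfile is a hypothetical local variable):

```shell
logfile=/var/log/syslog
# Fails: the single quotes prevent local expansion, and the remote
# shell has no variable named logfile, so $logfile is empty there.
ssh user@remote 'ls -l $logfile'
```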

Here, if we need to use local variables, we can follow either of the two options below.

  • Write all the commands in a shell script file, and feed the .sh file to SSH as shown below.
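For example (user@remote and local_script.sh are placeholders):

```shell
ssh user@remote 'bash -s' < local_script.sh
```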

  • Embed all the commands between tags like EOF, ENDSSH, ENDFTP, etc., as shown below. We can nest commands with this syntax.
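A sketch with an unquoted heredoc tag, which lets local variables expand before the commands are sent (placeholder host):

```shell
name=backup
ssh user@remote << ENDSSH
  # $name expands locally (to "backup") before transmission
  mkdir -p /tmp/$name
  ls -ld /tmp/$name
ENDSSH
```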

  • For an interactive shell on the remote machine from the current machine’s command line, we can use the command below.
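For example (placeholder host):

```shell
ssh -t user@remote top
```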

The -t flag tells ssh that we’ll be interacting with the remote shell. Without the -t flag, top would return its results and then ssh would log out of the remote machine immediately. With the -t flag, ssh keeps us logged in until we exit the interactive command.

Sample Use Case to Run Remote Commands over SSH:

Below is a shell script for a sample use case with three machines m1, m2, and m3. We need to copy files from m2 to m3, but the command must be submitted from machine m1. Copy the code below into a samplecopy.sh file.
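A minimal sketch of such a script (the user and paths are placeholders; it relies on the passwordless SSH setup described above, from m1 to m2 and from m2 to m3):

```shell
#!/bin/sh
# samplecopy.sh -- submitted from m1; copies files from m2 to m3.
ssh user@m2 'scp -r /data/source/* user@m3:/data/destination/'
```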

Brief Notes on Unix Shell Scripting Concepts

This post provides very brief notes on Unix shell scripting. As this topic is very well described in many textbooks, we will not go deep into the details of each point. This post is for quick review/revision/reference of common Unix commands and shell scripting.

Unix Shell Scripting

Kernel

The kernel is the heart of the UNIX system. It provides utilities with a means of accessing a machine’s hardware. It also handles the scheduling and execution of commands.

Note: When the computer is booted, the kernel is loaded from disk into memory. The kernel remains in memory until the machine is turned off. Utilities, on the other hand, are stored on disk and loaded into memory only when they are executed.

Shell

The shell is an interface to the UNIX system. It collects input from the user and executes programs based on that input. When a program finishes executing, the shell displays that program’s output.

The different Bourne-type shells follow:

  • Bourne shell (sh)
  • Korn shell (ksh)
  • Bourne Again shell (bash)
  • POSIX shell (sh)

The #!/bin/sh line must be the first line of a shell script in order for sh to be used to run the script. If it appears on any other line, it is treated as a comment and ignored by all shells.
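For example, a minimal script (the file path is just for demonstration):

```shell
cat > /tmp/hello.sh << 'EOF'
#!/bin/sh
# The first line tells the kernel to execute this script with sh.
echo "Hello from sh"
EOF
chmod +x /tmp/hello.sh
/tmp/hello.sh
```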

PATH

The PATH specifies the locations in which the shell will look for commands. Usually it is set as follows: PATH=/bin:/usr/bin

Each of the individual entries, separated by the colon character (:), is a directory.
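For example, appending a directory of your own scripts (here /usr/local/bin) to the existing value:

```shell
PATH=$PATH:/usr/local/bin
export PATH
echo "$PATH"
```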

Compound Command

A compound command consists of a list of simple and complex commands separated by the semicolon character (;). An example of a compound command is
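For example:

```shell
hostname; date; who am i
```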

Here hostname and date are simple commands, and who am i is a complex command (a single command made up of several words).

Comments

In shell scripts, comments start with the # character. Everything between the # and the end of the line is considered part of the comment and ignored by the shell.

Counting Words in a file

The wc command can be used to get a count of the total number of lines, words, and characters contained in a file. The syntax of this command is wc [option] file, with the following options:

  • -l Counts the number of lines
  • -w Counts the number of words
  • -c Counts the number of characters
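For example (the demo file is created just for illustration):

```shell
printf 'hello world\nfoo bar baz\n' > /tmp/wc_demo.txt
wc -l /tmp/wc_demo.txt   # 2 lines
wc -w /tmp/wc_demo.txt   # 5 words
wc -c /tmp/wc_demo.txt   # 24 characters
```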
File Types in Unix

-rwxr-xr-x 1 siva users 2368 Jul 11 15:57 /home/siva/test.txt

Here the first character is a hyphen (-). This indicates that the file is a regular file. For special files, the first character will be one of the letters given in the table below.

Character File Type

  • -  Regular file
  • l  Symbolic link
  • c  Character special
  • b  Block special
  • p  Named pipe
  • s  Socket
  • d  Directory file

A symbolic link is a special file that points to another file on the system. A symbolic link is similar to a shortcut or an alias.

Creating Symbolic Links

We can create symbolic links using the ln command with the -s option. The syntax is as follows:
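The general form is `ln -s source destination`; for example (the file names are just for demonstration):

```shell
echo "original contents" > /tmp/orig.txt
ln -sf /tmp/orig.txt /tmp/link.txt   # -f replaces an existing link
cat /tmp/link.txt                    # reads through the symbolic link
```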

Here, source is either the absolute or relative path to the original version of the file, and destination is the name we want the link to have.

Changing File and Directory Permissions

We can change the permissions of a file or directory with the chmod command. The chmod options are below.

Here (user) options are:

Symbol Represents

  • u Owner
  • g Group
  • o Other
  • a All

Actions

Symbol Represents

  • + Adding permissions to the file
  • - Removing permission from the file
  • = Explicitly set the file permissions

Permissions:

Symbol Represents

  • r Read
  • w Write
  • x Execute
  • s SUID or SGID

Octal Method:

By changing permissions with an octal expression, we can only explicitly set file permissions.

The values of the individual permissions are the following:

  • Read permission has a value of 4
  • Write permission has a value of 2
  • Execute permission has a value of 1
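Putting the symbolic and octal forms together (the file name is just for demonstration):

```shell
touch /tmp/perm_demo
chmod u+x /tmp/perm_demo    # symbolic: add execute for the owner
chmod 754 /tmp/perm_demo    # octal: rwx (4+2+1) r-x (4+1) r-- (4)
ls -l /tmp/perm_demo
```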
Background Process

The simplest way to start a process in background is to add an ampersand (&) at the end of the command.

For example:
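A minimal sketch:

```shell
sleep 30 &                      # the & runs the command in the background
echo "background PID: $!"       # $! holds the PID of the last background job
```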

Moving a Foreground Process to the Background

While a foreground process is running, the shell does not process any new commands. Before we can enter any commands, we have to suspend the foreground process to get a command prompt. The suspend key is Ctrl+Z.

When a foreground process is suspended, a command prompt enables us to enter more commands; the original process is still in memory but is not getting any CPU time. To resume the foreground process, we have two choices: background and foreground. The bg command enables us to resume the suspended process in the background; the fg command returns it to the foreground.

Some useful unix tips:

COPY & PASTE (WITHIN A TERMINAL):

COPY: CONTROL + SHIFT + C
PASTE: CONTROL + SHIFT + V

SETTING UP A VARIABLE GLOBALLY:

/etc/profile (one time per session on logon)
/etc/bash.bashrc (every time you close and open a terminal)

COMMAND TO REFRESH PROFILE CHANGES:
. /etc/profile

SUDO EDIT:
[cloudera@localhost ~]$ sudo gedit /etc/profile
[cloudera@localhost ~]$ sudo gedit /etc/bashrc

 COMMAND COMPLETION:
[cloudera@localhost ~]$ cd s + TAB key –> will take you into scripts directory
[cloudera@localhost scripts]$

 CLEAR THE SCREEN:
CONTROL + L

 CUSTOMIZE COMMAND PROMPT:
[cloudera@localhost ~]$ export PS1='$ '
$
$

 TO SPAN A COMMAND INTO MULTIPLE LINES: (use \)
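For example:

```shell
echo "this command is" \
     "split across two lines"
```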

TO LIST ALL JAVA PROGRAMS (daemons) RUNNING:
[cloudera@localhost ~]$ jps

 REPLICATION FACTOR

Note: here 3 is the replication factor for _SUCCESS and part-r-00000

TO LIST FILES IN THE LOCAL FILE SYSTEM (using hadoop fs command):
[cloudera@localhost ~]$ hadoop fs -ls file:///

RUN A MAPREDUCE PROGRAM:
[cloudera@localhost ~]$ hadoop jar <jar_file_name> <class_name> <input_dir> <output_dir>

To find all files modified in the last 24 hours (last full day) in a specific directory and its sub-directories:

To find files created, modified, or accessed in the last hour:
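Sketches of both (using a scratch directory created just for demonstration):

```shell
dir=$(mktemp -d)
touch "$dir/recent.txt"

# Modified in the last 24 hours (last full day), including sub-directories:
find "$dir" -type f -mtime -1

# Accessed (-amin), modified (-mmin), or status-changed (-cmin) in the last hour:
find "$dir" -type f -mmin -60
```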