5 Chaining and piping

At this point we should make a distinction between the three operators we can use to combine commands and redirect their output: >, >> and |. Let’s cover them one by one. The first operator, >, writes the output of a command to a file, completely overwriting whatever the file contained before.

touch test_file
echo "some content" > test_file
cat test_file

This should result in:

some content

But if we then run:

echo "new content" > test_file

Can you guess what cat test_file prints now? The file contains only new content, because > replaced the old contents. So how is this different from using >>? We can use the same example to illustrate it. Starting again from the file that contains some content, instead of overwriting the contents of test_file we can append to them:

echo "new content" >> test_file

In this case, the new line is appended after the existing content of the file:

cat test_file
some content
new content
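The difference between the two redirection operators can be checked end to end with a small sketch. It assumes a POSIX-style shell and works in a throwaway directory created with mktemp, so the name test_file here does not touch any real file of yours.

```shell
# Work in a temporary directory so no existing file is overwritten.
workdir=$(mktemp -d)
cd "$workdir"

echo "some content" > test_file    # > creates the file or replaces its contents
echo "new content" > test_file     # the first line is gone now
after_overwrite=$(cat test_file)

echo "some content" > test_file    # reset the file
echo "new content" >> test_file    # >> adds a line at the end instead
after_append=$(cat test_file)

echo "after >:"
echo "$after_overwrite"
echo "after >>:"
echo "$after_append"

# Clean up the temporary directory.
cd - > /dev/null
rm -rf "$workdir"
```

After the overwrite the file holds a single line, while after the append it holds both lines in the order they were written.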

So how about the | operator? This is called a “pipe,” and is used to construct pipelines: it passes the output of one command to the input of the next. Let’s go through a specific example. A common command-line procedure for data science work is creating a virtual environment for the Python setup (this is covered in another module). One thing we commonly check is which packages are installed in that specific environment. We can use the pip command to do that:

pip freeze

If you have some packages installed, you’ll get an output like this:

ptyprocess==0.7.0
pyasn1==0.4.2
pyasn1-modules==0.2.1

This shows the package names together with their versions. This list can become very long as a data science project matures, since we normally start to use more and more packages. How do we go about searching for a specific package in this list? We can use the popular grep command line tool, together with the pipe | operator:

pip freeze | grep ptyprocess
ptyprocess==0.7.0

And voila, it shows that the package is installed! We have fed the output of the first command to the input of the second one, creating a mini pipeline with a concrete result. Have a look at the diagram below for a visual overview of this process:

Can you figure out which command would show only the last few rows of this output?
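Pipelines are not limited to two commands. As a sketch that does not depend on what is installed on your machine, the snippet below stores the three sample lines from above in a variable (an assumption made for illustration, standing in for real pip freeze output) and pipes them through grep and wc.

```shell
# Stand-in for `pip freeze` output, using the three sample lines shown earlier.
packages='ptyprocess==0.7.0
pyasn1==0.4.2
pyasn1-modules==0.2.1'

# Two stages: print the text, then keep only the matching lines.
match=$(printf '%s\n' "$packages" | grep ptyprocess)
echo "$match"

# Four stages: keep lines starting with "pyasn1", count them,
# and strip the padding that some versions of wc add.
count=$(printf '%s\n' "$packages" | grep '^pyasn1' | wc -l | tr -d ' ')
echo "$count"
```

Each command only sees the output of the one before it, which is why stages can be added or removed without changing the rest of the pipeline.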

Now that you have seen a more advanced use case, let’s take a step back and familiarize ourselves with the fundamental commands available on the command line.

# change a directory
cd [path_to_directory]

You can use cd not just to move around within an existing project, but also to reach any other directory. It is often coupled with path names such as .. or ~/. The first one means “one directory above,” while the second one indicates your home directory (the root directory of the system is / instead).
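To see these path names in action, here is a minimal sketch. The directory names project and data are hypothetical and are created inside a temporary directory, so nothing in your real file system is changed.

```shell
# Build a small throwaway directory tree (hypothetical names) to move around in.
workdir=$(mktemp -d)
mkdir -p "$workdir/project/data"

cd "$workdir/project/data"
cd ..                          # one directory above: we are now in project
parent=$(basename "$PWD")
echo "$parent"

cd ~                           # ~ expands to your home directory
home_now=$PWD
echo "$home_now"

# Clean up the temporary tree.
rm -rf "$workdir"
```

The shell keeps the current location in the PWD variable, which is what the pwd command prints.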

Can you find out which path name indicates the current directory in the command line? Hint: you have used it in the version control module.

5.1 Users, Groups and Permissions

# list contents
ls

You can use this to look at what is inside a directory. Since you are no longer using a graphical interface, where you would automatically see the contents of a folder when you navigate to it, you will be using ls a lot. It has a few useful arguments that expand the output. For example, ls -l is useful if you want to know things such as the owner, the time of last modification or, most importantly, the file size. The output of this command can look like the following:

-rw-r--r--   1 username  staff    92 27 May 15:17 LICENSE

Here you can see a lot of information on a single file (in this case LICENSE), including the user - username, the group staff and the permissions -rw-r--r--. Let’s go through those elements one by one.
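Because the columns of this output are separated by whitespace, they can be picked apart with a pipe. The sketch below feeds the sample line from above to awk. It assumes the column order shown in that sample and a file name without spaces; the exact date format can differ between systems.

```shell
# The sample line from above, stored in a variable for illustration.
line='-rw-r--r--   1 username  staff    92 27 May 15:17 LICENSE'

# awk splits on runs of whitespace: field 1 is the permissions,
# 3 the owner, 4 the group, 5 the size in bytes, and the last one the name.
summary=$(printf '%s\n' "$line" | awk '{print $NF ": perms=" $1 " owner=" $3 " group=" $4 " bytes=" $5}')
echo "$summary"
```

This prints LICENSE: perms=-rw-r--r-- owner=username group=staff bytes=92.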

Starting with the user: this is the account you are logged in as on this particular machine, and in the ls -l output it is the owner of the file. A user can be a part of different groups, with different permissions associated with each. This is very useful if you are configuring a machine collaboratively and want to have better security. For example, it is a common practice to limit the permissions on such systems to only the ones that are needed, and the only users that have complete permissions (the so-called root user) are the administrators. You can create a new user with the following command (this usually requires administrator privileges):

useradd [username]

And then switch to that user with su [username], or run a single command as that user with:

su [username] -c [command]

There are a few other options available (such as the ones configuring the user groups), and you can learn more about them here.

The second concept that we should go through is file permissions, represented by the letters at the beginning of the ls -l output. The very first character shows the file type (- for a regular file, d for a directory). The next three characters carry the most important information: the (r)ead, (w)rite and e(x)ecute permissions of the user who owns the file. The second group of three are the same permissions, but for the group, and the last three are for all other users.
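As a small sketch of how to read that string, the snippet below creates a file with known permissions and cuts the first ten characters of the ls -l output into its four parts. The file name notes.txt is hypothetical, and the chmod line sets the permissions explicitly so the result does not depend on your system defaults.

```shell
workdir=$(mktemp -d)
f="$workdir/notes.txt"
touch "$f"
chmod u=rw,g=r,o= "$f"         # owner: read and write, group: read, others: nothing

# The first ten characters of ls -l are the type plus three permission triplets.
perms=$(ls -l "$f" | cut -c1-10)
file_type=$(printf '%s\n' "$perms" | cut -c1)
owner=$(printf '%s\n' "$perms" | cut -c2-4)
group=$(printf '%s\n' "$perms" | cut -c5-7)
others=$(printf '%s\n' "$perms" | cut -c8-10)
printf 'type=%s owner=%s group=%s others=%s\n' "$file_type" "$owner" "$group" "$others"

rm -rf "$workdir"
```

A dash inside a triplet means that the permission in that position is not granted.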

Those are mostly used to protect special files from accidental or malicious edits. Those files are most commonly dotfiles, or configuration files that are of critical importance. If you own the file or have superuser privileges, you can change those permissions by using the following command:

chmod o+wx filename
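The argument to chmod in this symbolic form combines a class of users (u for the owning user, g for the group, o for others, a for all), an operator (+ to add, - to remove, = to set exactly) and the permission letters r, w and x. The following sketch applies two such changes to a hypothetical file called script.sh in a temporary directory and reads back the result.

```shell
workdir=$(mktemp -d)
f="$workdir/script.sh"
touch "$f"
chmod u=rw,g=r,o=r "$f"        # start from a known state: -rw-r--r--

chmod u+x "$f"                 # add execute for the owner
chmod o-r "$f"                 # remove read from all other users
after=$(ls -l "$f" | cut -c1-10)
echo "$after"

rm -rf "$workdir"
```

Only the classes named in each call change; the group permissions stay exactly as they were.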

You will often mistype a command. Re-typing long commands can be tedious, but fortunately there are methods to navigate the history. The first thing you can do is use the arrow keys on your keyboard to step back through the commands you executed. But there’s a handier method, which searches through the complete history of the terminal. Press CTRL + r and start typing: the search matches any part of a command, not just its beginning, so a fragment you remember is usually enough to find it.

An easier way to change file permissions with chmod is to use a numeric scheme instead (you can find one here):

chmod 755 filename
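In the numeric form each digit is a sum: 4 for read, 2 for write and 1 for execute, with one digit each for the owner, the group and all other users. The sketch below, using a hypothetical file run.sh in a temporary directory, shows how two such codes map onto the letters printed by ls -l.

```shell
workdir=$(mktemp -d)
f="$workdir/run.sh"
touch "$f"

chmod 755 "$f"     # 7 = 4+2+1 (rwx) for the owner, 5 = 4+1 (r-x) for group and others
mode_755=$(ls -l "$f" | cut -c1-10)
echo "$mode_755"

chmod 640 "$f"     # 6 = 4+2 (rw-) for the owner, 4 (r--) for the group, 0 (---) for others
mode_640=$(ls -l "$f" | cut -c1-10)
echo "$mode_640"

rm -rf "$workdir"
```

Unlike the symbolic form, a numeric code always sets all three classes at once.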

What is the numeric code for the most permissive setting for a file?

A great visual overview of the different file permissions can be seen here:

The command line, especially if you are running commands with sudo, can irreversibly destroy a lot of information, and even your system. Here are a few mistakes that can happen.

This command: sudo rm -rf ~/ will remove everything in your home directory, and pointing it at / instead targets everything on your machine. The base command rm -rf is dangerous enough: if you don’t pay attention, you can easily and irreversibly delete files and folders. Similarly, > filename on its own will empty a file. This usually happens when you are typing commands too quickly.
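One possible safeguard against the accidental > filename case is the shell's noclobber option, which makes > refuse to overwrite a file that already exists. The sketch below is an illustration under the assumption of a bash or other POSIX-style shell; the file name results.csv is hypothetical and lives in a temporary directory.

```shell
workdir=$(mktemp -d)
precious="$workdir/results.csv"
echo "important data" > "$precious"

set -o noclobber               # from now on, > refuses to overwrite existing files

# The redirection below fails instead of replacing the file's contents.
if ( echo "oops" > "$precious" ) 2>/dev/null; then
    outcome="overwritten"
else
    outcome="protected"
fi
kept=$(cat "$precious")
echo "$outcome: $kept"

set +o noclobber               # restore the default behaviour
rm -rf "$workdir"
```

With the option switched on, an intentional overwrite is still possible by writing >| instead of >. For deletions, rm -i asks for confirmation before removing each file.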

Create a new user named deployment-user, add it to a user group called deployment, and give all members of that group all permissions except write.

Most command line tools have some form of documentation available, where you can discover how to use them. More obscure details, such as a complete list of command line arguments, might not be immediately obvious or even easily accessible on the internet, so you can check those help pages right in the command line. To do this, use man [command name].