6 Third-party tools

A great playlist covering many command line tools is available below. It has a special focus on tools that are suitable for Data Scientists and Machine Learning Engineers.

The commands you have used so far are built in, that is, they come preinstalled with your system. They are useful, but they only scratch the surface of what is possible on the command line. Many more tools are available, and this section goes through a few of them.

6.1 wget

wget is one of the best ways to download a file from the internet. It is worth using even when a regular browser such as Google Chrome is available, because a browser download can hang, or you might close the browser window by accident. Example usage:

wget [fileurl]
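Since the motivation above is downloads that hang or get interrupted, a minimal sketch of a more robust call may help. The helper name `download` and the URL in the usage comment are hypothetical placeholders; `-c` (continue a partial download) and `--tries` (number of retries) are standard wget options.

```shell
# Hypothetical helper: the name `download` is chosen for illustration only.
download() {
    url="$1"   # address of the file to fetch
    # -c         resume a partially downloaded file instead of starting over
    # --tries=3  give up after three attempts
    wget -c --tries=3 "$url"
}

# Usage (placeholder URL, replace it with a real link):
# download "https://example.com/paper.pdf"
```

If the connection drops halfway, running the same command again in the same directory should pick up where the previous attempt stopped, provided the server supports resuming.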

Use wget to download a file from a web page (e.g. an arXiv paper). Which browser functionality can help you with that?

6.2 sed

sed is a stream editor (a command line tool, not a library) that can edit the contents of a file very quickly. Unfortunately, its syntax is not easy to learn. For example, you can use it to remove all commas from a file:

sed 's/,//g' file > new_file
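To make the substitution syntax less opaque, here is a small self-contained sketch of the same command. The file names `prices.txt` and `prices_clean.txt` and their contents are made up for illustration.

```shell
# Create a small sample file (hypothetical name and contents)
printf '1,000\n2,500,000\n' > prices.txt

# s/,//g reads as: substitute (s) each comma (,) with nothing (empty
# replacement), for every match on a line (g) rather than only the first
sed 's/,//g' prices.txt > prices_clean.txt

cat prices_clean.txt
# 1000
# 2500000
```

Without the trailing `g`, only the first comma on each line would be removed, so the second line would come out as `2500,000`.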

Use sed to change the delimiter of a whole file.

6.3 grep

grep is a tool that searches for lines matching a pattern, either within a file or in the output of another command. For example, to check whether scikit-learn is installed in a virtual environment, we can do the following:

pip freeze | grep scikit
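grep works the same way on a file as it does on piped output. The sketch below uses a hand-written sample file (the name `requirements_sample.txt` and the package versions in it are made up for illustration) to show a few commonly used options: `-i` (ignore case), `-n` (show line numbers) and `-c` (count matching lines).

```shell
# Hypothetical sample file standing in for the output of `pip freeze`
printf 'numpy==1.26.4\nscikit-learn==1.4.2\nScikit-Image==0.22.0\n' > requirements_sample.txt

# Case-sensitive search: only the lowercase spelling matches
grep scikit requirements_sample.txt
# scikit-learn==1.4.2

# -i ignores case, -n prefixes each match with its line number
grep -in scikit requirements_sample.txt
# 2:scikit-learn==1.4.2
# 3:Scikit-Image==0.22.0

# -c prints the number of matching lines instead of the lines themselves
grep -ic scikit requirements_sample.txt
# 2
```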

How can you use grep to get information on all of the currently running processes on your machine that are related to your browser?