17 The Joy of Notebooks

Our learning objectives in this section:

  • Understand how to effectively use Jupyter Notebooks for producing and distributing reports.

17.1 Jupyter: More than just a gas giant

17.1.1 The Birth of Jupyter

Throughout this course, we’ve used simple .py scripts to learn and execute Python code. VS Code and the Python extension helped us send commands directly to a console and see the output all in one place.

The other format that we’ve casually seen is the Jupyter notebook. The notebook file extension, .ipynb, comes from the IPython package, which brought interactive computing to Python way back in 2001. After receiving financial support from institutions that fund open source software, the project was expanded to include Julia, Python and R, hence the name Jupyter, after Julia, Python and R. Nonetheless, in the data science community the format is used mostly for Python.

17.1.2 The Uses (and Misuses) of Jupyter

The interactive features of Jupyter notebooks make them very attractive, but in practice they are often used simply to write and present Python code and output inline, i.e. code directly followed by its output, interspersed with commentary written in Markdown. That sounds very much like Rmarkdown, doesn’t it? Well, not quite. There are at least three major distinctions to consider:

First, Jupyter notebooks natively support interactivity, but this major feature often goes unused. Rmarkdown supports interactive features as well, but you have to explicitly request a Shiny runtime (runtime: shiny) in the YAML header.

Second, an Rmarkdown document is a simple plain text file that you can edit in any text editor. Jupyter notebooks are stored as JSON (JavaScript Object Notation), a text format for storing and transferring structured data, for example from a server to a browser, where it can be used directly as a JavaScript object. That’s a clever choice for a natively interactive format, but it makes reading and manually editing the raw file more difficult.

Third, although it’s easy to produce Jupyter notebooks, when they are used to show your analysis and communicate results, they often suffer from poor editing and little attention to the needs of the reader. This is because, in contrast to Rmarkdown code chunks, many users don’t know how to control which output appears in the document before delivering it to the end user (e.g. the client).
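To see what the second distinction means in practice, the sketch below builds the JSON for a tiny notebook with one Markdown cell and one code cell, using only Python's standard json module. The field names follow the nbformat version 4 layout as we understand it, in simplified form: files saved by Jupyter carry extra metadata such as the kernel specification, so treat this as an illustration rather than a complete notebook.

```python
import json

# A simplified notebook in the nbformat v4 layout: a dict holding a list of cells.
notebook = {
    "nbformat": 4,
    "nbformat_minor": 4,
    "metadata": {},
    "cells": [
        {
            "cell_type": "markdown",
            "metadata": {},
            # Each cell's text is stored as a list of lines.
            "source": ["# My analysis\n", "Some commentary in Markdown."],
        },
        {
            "cell_type": "code",
            "execution_count": 1,
            "metadata": {},
            "source": ["print(1 + 1)"],
            # Outputs live in the file, right next to the code that produced them.
            "outputs": [
                {"output_type": "stream", "name": "stdout", "text": ["2\n"]}
            ],
        },
    ],
}

# Even one line of code needs this much structure on disk.
raw = json.dumps(notebook, indent=1)
print(raw)

# Reading the text back gives ordinary Python dicts and lists.
loaded = json.loads(raw)
for cell in loaded["cells"]:
    print(cell["cell_type"], "->", "".join(cell["source"]))
```

A single missing comma or bracket in that text makes the whole file unreadable, which is why hand-editing raw notebooks is best avoided.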

On a personal note, I also find them to be more cumbersome to navigate, requiring more keyboard shortcuts to move around. If there isn’t an advantage to using them, I prefer to stick with simple scripts or a flat text file.

Nonetheless, they are quite popular, so let’s take a look at some ways to get the most out of them.

17.1.3 The Avenues of Jupyter

In the next sections, we’ll discuss ways to effectively produce and export Jupyter notebooks. Among the many tools available, we’ll focus on some of the most popular and useful:

  • VS Code
    • Jupyter Notebooks as Python Scripts
    • Native Jupyter Notebooks
  • Jupyter
    • Jupyter Notebooks
    • Jupyter Lab
  • Markdown-like
    • Quarto

For some of these tools, the documentation is available on readthedocs.org, an online service that builds and hosts documentation for open source projects. When relevant, you’ll be asked to read the docs for details.

In addition to the above tools, many more options are available. You may already be familiar with online notebooks such as Kaggle Notebooks and Google Colab.

17.2 VS Code

Before we dig deeper into Jupyter Notebooks proper, let’s make it clear that we have actually been using Jupyter Notebooks incognito as Python scripts the whole time.

17.2.1 Jupyter Notebooks as Python Scripts

The VS Code Python extension gives us the ability to use the comment # %% in a regular Python script with the .py file extension. Each # %% line marks the start of a cell and adds a small, clickable interface that includes the Run Cell command. Yes, this was a Jupyter Notebook all along!
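For example, a script like the one below runs top to bottom as ordinary Python, while VS Code shows a Run Cell link for each # %% cell. The # %% [markdown] variant, which to our understanding VS Code also recognizes, turns the commented lines beneath it into a Markdown cell. The file name and its contents are only an illustration.

```python
# analysis.py (hypothetical file name)

# %% [markdown]
# # Summary statistics
# VS Code shows this cell as Markdown; to plain Python it is just comments.

# %%
# First code cell: create some data.
values = [3, 1, 4, 1, 5, 9, 2, 6]

# %%
# Second code cell: summarize the data. Run cells one at a time, or the whole file.
mean = sum(values) / len(values)
print(f"n = {len(values)}, mean = {mean:.3f}")
```

Because the cell markers are plain comments, the same file still works with python analysis.py, in version control diffs, and in any text editor.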

You can switch between Markdown and code cells, move, insert and delete cells, and much more. This table, taken from the documentation, lists some handy commands accessible from the Command Palette or directly with keyboard shortcuts:

Command Palette                         Keyboard shortcut
Python: Go to Next Cell                 Ctrl+Alt+]
Python: Go to Previous Cell             Ctrl+Alt+[
Python: Extend Selection by Cell Above  Ctrl+Shift+Alt+[
Python: Extend Selection by Cell Below  Ctrl+Shift+Alt+]
Python: Move Selected Cells Up          Ctrl+; U
Python: Move Selected Cells Down        Ctrl+; D
Python: Insert Cell Above               Ctrl+; A
Python: Insert Cell Below               Ctrl+; B
Python: Insert Cell Below Position      Ctrl+; S
Python: Delete Selected Cells           Ctrl+; X
Python: Change Cell to Code             Ctrl+; C
Python: Change Cell to Markdown         Ctrl+; M

The advantage of using notebooks written as scripts is that we work in a plain text format, where moving and prototyping commands is easier than in the native notebook format, which is JSON under the hood. Not bad! See the VS Code documentation on this feature for more details.

17.2.2 Native Jupyter Notebooks

We can of course use Jupyter Notebooks natively, as you have already done with the files provided for this course. Here the notebooks look the way you would expect: there is a clear distinction between Markdown text and code cells, and output is shown inline after each cell. Under the hood, this is a JSON file that you wouldn’t want to edit manually.
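To connect the two formats, here is a minimal, hypothetical converter (script_to_notebook is our own name, not a library function) that splits a # %% script into cells and wraps them in simplified notebook JSON. It assumes the nbformat version 4 layout, ignores outputs and skips many edge cases; for real work, a dedicated tool such as jupytext handles this conversion.

```python
import json


def script_to_notebook(text):
    """Split a '# %%' percent-format script into simplified notebook JSON."""
    cells = []
    current = None  # cell being filled; lines before the first marker are ignored

    for line in text.splitlines(keepends=True):
        if line.startswith("# %%"):
            # A new cell starts; '[markdown]' after the marker makes it a text cell.
            kind = "markdown" if "[markdown]" in line else "code"
            current = {"cell_type": kind, "metadata": {}, "source": []}
            if kind == "code":
                current["execution_count"] = None
                current["outputs"] = []
            cells.append(current)
        elif current is not None:
            if current["cell_type"] == "markdown":
                # Markdown is stored as comments in the script, so strip the '# '.
                line = line[2:] if line.startswith("# ") else line.lstrip("#")
            current["source"].append(line)

    # Drop trailing blank lines from each cell.
    for cell in cells:
        while cell["source"] and not cell["source"][-1].strip():
            cell["source"].pop()

    notebook = {"nbformat": 4, "nbformat_minor": 4, "metadata": {}, "cells": cells}
    return json.dumps(notebook, indent=1)


script = """# %% [markdown]
# # Summary
# Some notes.

# %%
values = [3, 1, 4]
print(sum(values))
"""

print(script_to_notebook(script))
```

The script is a few readable lines, while the resulting JSON is noticeably longer and full of brackets and quotes, which illustrates why prototyping in the script form is more comfortable.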

17.3 The Jupyter Project

Although we can use VS Code to write notebooks, the Jupyter project ecosystem contains useful tools that you are likely to encounter. You can see a list of some tools with instructions to install them on the project’s install page, but we’ll focus on the two main tools: Jupyter Notebooks and JupyterLab.

You can access Jupyter from the console by running jupyter followed by a subcommand. We’ll focus on two subcommands:

  • notebook
  • lab

17.3.1 Jupyter Notebooks

Executing

jupyter notebook

will start a Jupyter server on localhost, e.g. http://localhost:8888, from which you can open and run individual Jupyter notebooks in a local directory. To exit, stop the server by pressing Ctrl+C in the console.

17.3.2 Jupyter Lab

This simple interface is useful, but recall that we achieved something very similar in VS Code, which is why we did everything natively in the same text editor. However, Jupyter Lab, the next stage in notebook development, offers more features. Execute:

jupyter lab

to start a Jupyter server on localhost, e.g. http://localhost:8888, running Jupyter Lab. This is the next generation of Jupyter Notebooks and functions more like a comprehensive IDE. If you want to work with Jupyter Lab, you can read the docs.

17.3.3 Nbconvert

The nbconvert library is installed as a dependency of jupyter. It converts Jupyter notebooks to a variety of other formats, such as HTML and PDF.

The nbconvert library can be imported and used inside Jupyter notebooks, or called from the command line. For the most part you don’t need to worry about this, but you may be interested in it for advanced customization, like making template files. For details, read the docs.
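As one example of command-line use, the sketch below wraps jupyter nbconvert --to FORMAT [--execute] NOTEBOOK in two small Python functions. The --to and --execute options are standard nbconvert options; the function names and the analysis.ipynb file name are hypothetical. From Python you could instead use the library's exporter classes (for example nbconvert.HTMLExporter) directly.

```python
import shutil
import subprocess


def nbconvert_command(notebook, to="html", execute=False):
    """Build the argument list for `jupyter nbconvert` (nothing is run here)."""
    cmd = ["jupyter", "nbconvert", "--to", to]
    if execute:
        # Re-run every cell before converting, so the outputs are fresh.
        cmd.append("--execute")
    cmd.append(str(notebook))
    return cmd


def convert_notebook(notebook, to="html", execute=False):
    """Run nbconvert on one notebook; raises if Jupyter is not on the PATH."""
    if shutil.which("jupyter") is None:
        raise RuntimeError("jupyter not found; is Jupyter installed in this environment?")
    # check=True raises CalledProcessError if the conversion fails.
    subprocess.run(nbconvert_command(notebook, to, execute), check=True)


# Show the command that convert_notebook("analysis.ipynb", execute=True) would run.
print(nbconvert_command("analysis.ipynb", to="html", execute=True))
```

Keeping the command builder separate from the function that runs it makes it easy to inspect or log the exact command before anything is executed.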

17.4 Markdown-like

17.4.1 Quarto

Quarto is the latest and greatest contribution to the Jupyter Notebook ecosystem. It also allows you to combine Julia, Python and R in one notebook, but gives you direct and flexible control of each chunk, just like you can with RMarkdown.

To use Quarto with VS Code, perform the following tasks:

  1. Install the Quarto extension inside your docker container. You can find this in the VS Code Extension Marketplace.
  2. Install the Quarto package by following the steps below.

There are a few ways to install Quarto; here we’ll see three: using apt-get and gdebi, directly from GitHub, or from the website.

Open an integrated terminal window in a VS Code workspace running inside your Docker container. The following commands update the package list, install gdebi (a tool that installs local .deb packages together with their dependencies), download the Quarto .deb package with curl, and install it.

$ sudo apt-get update
$ sudo apt-get install gdebi-core
$ sudo curl -LO https://quarto.org/download/latest/quarto-linux-amd64.deb
$ sudo gdebi quarto-linux-amd64.deb

Alternatively, you can install quarto directly from its source at GitHub as follows.

git clone https://github.com/quarto-dev/quarto-cli
cd quarto-cli
./configure-linux.sh
cd ..
rm -rf quarto-cli

Finally, you can also install Quarto locally by downloading the installation files from the project homepage.

After installation is complete, the following command should tell you which version you’re running:


$ quarto help

  Usage:   quarto 
  Version: 0.9.639
...

In addition, you should check that the installation works, can render Markdown documents and has found Python and R. To do this, run:

$ quarto check
[✓] Checking Quarto installation......OK
[✓] Checking basic markdown render....OK
[✓] Checking Python 3 installation....OK
[✓] Checking Jupyter engine render....OK
[✓] Checking R installation...........OK
[✓] Checking Knitr engine render......OK

Some installation issues seem to persist when installing into a Docker container on Apple M1 (non-Intel) machines. If you can’t sort out the installation issues, you can install Quarto on your local OS by following the instructions on the website.

When you ran quarto help, several commands were listed. The most useful are:

Command          Arguments            Description
render           [input] [args...]    Render input file(s) to various document types.
preview          [file] [args...]     Render and preview a document or website project.
serve            [input]              Serve a Shiny interactive document.
create-project   [dir]                Create a project for rendering multiple documents.
convert          <input>              Convert documents to alternate representations.

Using quarto in a notebook

In the course repo you’ll find a directory called 05-python/the_joy_of_jupyter/quarto. The first two files are self-explanatory: one is a Python script annotated with # %% for use in VS Code, and the other is a plain notebook.

The file 03_notebook_quarto.ipynb is a Python notebook that contains annotations for use with Quarto. To render this file, execute:

quarto render 03_notebook_quarto.ipynb --execute

The --execute flag forces execution of all the code chunks; otherwise Quarto assumes you’ve already run them in the notebook and uses the output saved there.
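If a directory holds several documents, a short Python loop can issue the same quarto render command for each one. This is a sketch that assumes quarto is on your PATH; the render_all helper and its dry_run switch are hypothetical names of our own, not part of Quarto.

```python
import shutil
import subprocess
from pathlib import Path


def render_all(directory, execute=True, dry_run=False):
    """Render every .ipynb and .qmd file in `directory` with `quarto render`.

    Returns the list of commands; with dry_run=True nothing is executed.
    """
    commands = []
    # sorted() gives a predictable order, e.g. 03_... before 04_...
    for path in sorted(Path(directory).iterdir()):
        if path.suffix not in {".ipynb", ".qmd"}:
            continue
        cmd = ["quarto", "render", str(path)]
        if execute:
            # Force every code chunk to run instead of reusing saved output.
            cmd.append("--execute")
        commands.append(cmd)

    if not dry_run:
        if shutil.which("quarto") is None:
            raise RuntimeError("quarto not found on PATH")
        for cmd in commands:
            # check=True stops at the first document that fails to render.
            subprocess.run(cmd, check=True)
    return commands


# Preview what would be rendered in the current directory.
print(render_all(".", dry_run=True))
```

Running it with dry_run=True first, for example on 05-python/the_joy_of_jupyter/quarto, shows which commands would run before anything is rendered.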

Pure Quarto

Quarto on its own works like Rmarkdown, as shown in 04_quarto_plain.html. If you have the Quarto extension installed in VS Code, you can click the Render button that appears at the top of the document. Otherwise, execute:

quarto render 04_quarto_plain.qmd --execute 

Chunk Options

See the Jupyter Code Cell options for configuration of each code chunk.
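In a Python cell, Quarto reads options written as YAML in special #| comments at the very top of the cell. The cell below is a sketch using two common options: echo: false hides the code in the rendered document while keeping its output, and warning: false leaves warnings out. The label value and the data are made up for illustration. Because the option lines are ordinary comments, the cell still runs as plain Python in VS Code or Jupyter.

```python
#| label: summary-stats
#| echo: false
#| warning: false

# Quarto reads the '#|' lines above as cell options; Python ignores them.
# echo: false    -> show this cell's output but not its code
# warning: false -> leave any warnings out of the rendered document
values = [2.5, 3.5, 4.0]
mean = sum(values) / len(values)
print(f"Mean of {len(values)} values: {mean:.2f}")
```

Keeping the options inside the cell means the notebook carries its own rendering instructions, much like chunk options in an Rmarkdown code chunk header.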