17 The Joy of Notebooks
Our learning objectives in this section:
- Understand how to effectively use Jupyter Notebooks for producing and distributing reports.
17.1 Jupyter: More than just a gas giant
17.1.1 The Birth of Jupyter
Throughout this course, we’ve used simple .py scripts to learn and execute Python code. VS Code and the Python extension helped us send commands directly to a console and see the output all in one place.
The other format that we’ve casually seen is the Jupyter notebook. The notebook file extension, .ipynb, comes from the IPython package, which introduced an interactive Python shell way back in 2001. After receiving financial support from funding institutions supporting open source software, the project was expanded to cover Julia, Python and R, hence the name: Ju-pyt-er. Nonetheless, in the data science community this format is often used only for Python.
17.1.2 The Uses (and Misuses) of Jupyter
The interactive features of Jupyter notebooks make them very attractive, but in reality, they are often used to write and present Python code and output inline, i.e. code directly followed by its output, interspersed with non-code commentary written in Markdown. That sounds very much like Rmarkdown, doesn’t it!? Well, not quite. There are at least three major distinctions to consider:
First, Jupyter notebooks natively support interactivity, but this is often not taken advantage of, so a major feature goes unused. Rmarkdown supports interactive features as well, but you have to explicitly call a shiny runtime in the YAML header.
Second, an Rmarkdown document is a simple plain text file that you can edit in any text editor. Jupyter notebooks are written in JSON format. JSON (JavaScript Object Notation) stores and transfers information as pure text that can be read directly as a JavaScript object. That’s a clever choice for being natively interactive, but it makes manually editing and reading the raw file more difficult.
Third, although it’s easy to produce Jupyter notebooks, when used to show your analysis and communicate results, they often suffer from very poor editing and little attention to the needs of the reader. This is because, in contrast to Rmarkdown code chunks, it’s not clear to many users how to control the output in the document before delivering it to the end user (e.g. the client).
On a personal note, I also find them to be more cumbersome to navigate, requiring more keyboard shortcuts to move around. If there isn’t an advantage to using them, I prefer to stick with simple scripts or a flat text file.
Nonetheless, they are quite popular, so let’s take a look at some ways to get the most out of them.
17.1.3 The Avenues of Jupyter
In the next sections, we’ll discuss ways to effectively produce and export Jupyter notebooks. Among the many tools available, we’ll focus on some of the most popular and useful:
- VS Code
  - Jupyter Notebooks as Python Scripts
  - Native Jupyter Notebooks
- Jupyter
  - Jupyter Notebooks
  - Jupyter Lab
- Markdown-like
  - Quarto
For some of these tools, the documentation is available on readthedocs.org, an online service that builds and hosts documentation for open source projects. When relevant, you’ll be asked to read the docs for details.
In addition to the above tools, there are many more options available. You may already be familiar with using Notebooks online with Kaggle Notebooks and Google Colab.
17.2 VS Code
Before we dig deeper into Jupyter Notebooks proper, let’s make it clear that we have actually been using Jupyter Notebooks incognito as Python scripts the whole time.
17.2.1 Jupyter Notebooks as Python Scripts
The VS Code Python extension gives us the capability to use the comment # %% in a regular Python script with the .py file extension. This adds a small, clickable interface that includes the Run Cell command. Yes, this was a Jupyter Notebook all along!
You can switch between markdown and code cells, move, insert and delete cells, plus many more features. This table, taken from the documentation, contains some handy commands accessible from the command palette or directly using keyboard shortcuts:
Command Palette | Keyboard shortcut |
---|---|
Python: Go to Next Cell | Ctrl+Alt+] |
Python: Go to Previous Cell | Ctrl+Alt+[ |
Python: Extend Selection by Cell Above | Ctrl+Shift+Alt+[ |
Python: Extend Selection by Cell Below | Ctrl+Shift+Alt+] |
Python: Move Selected Cells Up | Ctrl+; U |
Python: Move Selected Cells Down | Ctrl+; D |
Python: Insert Cell Above | Ctrl+; A |
Python: Insert Cell Below | Ctrl+; B |
Python: Insert Cell Below Position | Ctrl+; S |
Python: Delete Selected Cells | Ctrl+; X |
Python: Change Cell to Code | Ctrl+; C |
Python: Change Cell to Markdown | Ctrl+; M |
The advantage of using notebooks but writing them as a script is that we can write in a plain text format, moving and prototyping commands more easily than in the native Notebook format, which is JSON under the hood. Not bad! See more documentation on this feature here.
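As a rough illustration of the cell markers described above, here is a minimal sketch of a .py script split into cells. The data and variable names are made up for the example; the `# %% [markdown]` marker is how the VS Code Python extension denotes a Markdown cell in a script.

```python
# %% [markdown]
# # A tiny report
# This cell is treated as Markdown when the script is run cell by cell.

# %%
# A code cell: everything up to the next "# %%" marker belongs to it.
import statistics

values = [2, 4, 4, 5, 7]  # made-up example data
mean_value = statistics.mean(values)

# %%
# A second code cell that uses the result of the previous one.
print(f"Mean of {values}: {mean_value}")
```

Because the markers are ordinary comments, the same file still runs from top to bottom as a normal script with `python`.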
17.2.2 Native Jupyter Notebooks
We can of course use Jupyter Notebooks natively, which you have already done with the files provided for this course. Here the notebooks look like what you may expect a Jupyter Notebook to look like. We have a clear distinction between markdown text and code chunks, and output is shown inline after each chunk. Under the hood, this is a JSON-formatted file that you wouldn’t want to edit manually.
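To make the "JSON under the hood" point concrete, here is a minimal sketch that builds a very small notebook by hand and reads it back with Python's standard json module. The cell contents are invented for illustration, and the layout is an assumption based on version 4 of the notebook format; real notebooks usually carry more metadata.

```python
import json

# A hand-built, minimal notebook following the nbformat 4 layout.
# The cell contents are made up for illustration.
notebook = {
    "nbformat": 4,
    "nbformat_minor": 4,
    "metadata": {},
    "cells": [
        {
            "cell_type": "markdown",
            "metadata": {},
            "source": ["# A tiny report\n", "Some commentary."],
        },
        {
            "cell_type": "code",
            "execution_count": None,
            "metadata": {},
            "outputs": [],
            "source": ["print(1 + 1)"],
        },
    ],
}

# Serialise to the kind of text that is stored in an .ipynb file.
raw_text = json.dumps(notebook, indent=1)

# Reading it back gives ordinary dicts and lists, so cells can be inspected.
loaded = json.loads(raw_text)
for cell in loaded["cells"]:
    print(cell["cell_type"], "->", "".join(cell["source"])[:30])
```

Printing `raw_text` shows why the raw file is awkward to edit by hand: every line of every cell is wrapped in quotes, brackets and metadata.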
17.3 The Jupyter Project
Although we can use VS Code to write notebooks, the Jupyter project ecosystem contains useful tools that you are likely to encounter. You can see a list of some tools with instructions to install them on the project’s install page, but we’ll focus on the two main tools: Jupyter Notebooks and JupyterLab.
You can access Jupyter from the console by running jupyter followed by a subcommand. We’ll focus on two subcommands:
notebook
lab
17.3.1 Jupyter Notebooks
Executing
jupyter notebook
will start a Jupyter server on localhost, e.g. http://localhost:8888, and allow you to serve individual Jupyter notebooks from a local directory. To exit, stop the server by pressing Ctrl + C in the console.
17.3.2 Jupyter Lab
This simple interface is useful, but recall that we achieved a very similar thing in VS Code, which is why we did it all natively in the same text editor. However, Jupyter Lab, the next stage in notebook development, offers more features. Execute:
jupyter lab
to start a Jupyter server on the localhost, e.g. http://localhost:8888
running Jupyter Lab. This is the next generation of Jupyter Notebooks and functions more like a comprehensive IDE. If you want to work with Jupyter Lab, you can read the docs.
17.3.3 Nbconvert
The nbconvert library is installed as a dependency of jupyter. It facilitates conversion of Jupyter notebooks to a variety of other formats, like HTML and PDF.
The nbconvert library can be imported and used inside Jupyter notebooks, or called from the command line. For the most part you don’t need to worry about this, but you may be interested in using it for advanced customization, like making template files. For details, read the docs.
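As a hedged sketch of the command-line route, the snippet below builds a `jupyter nbconvert` call from Python and only runs it when Jupyter and the notebook are actually present. The helper `nbconvert_command` and the file name `report.ipynb` are hypothetical names introduced for this example; `--to` and `--no-input` are nbconvert command-line options, but check the docs for the version you have installed.

```python
import shutil
import subprocess
from pathlib import Path


def nbconvert_command(notebook, to="html", hide_code=False):
    """Build the argument list for a `jupyter nbconvert` call (hypothetical helper)."""
    command = ["jupyter", "nbconvert", "--to", to]
    if hide_code:
        # --no-input leaves out the code cells and keeps only their output.
        command.append("--no-input")
    command.append(str(notebook))
    return command


# Hypothetical file name; replace it with a notebook of your own.
notebook_path = Path("report.ipynb")
command = nbconvert_command(notebook_path, to="html", hide_code=True)
print(" ".join(command))

# Only run the conversion when jupyter is installed and the file exists.
if shutil.which("jupyter") and notebook_path.exists():
    subprocess.run(command, check=True)
```

Hiding the input cells is one simple way to address the earlier complaint about notebooks being delivered to readers without any control over what is shown.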
17.4 Markdown-like
17.4.1 Quarto
Quarto is the latest and greatest contribution to the Jupyter Notebook ecosystem. It also allows you to combine Julia, Python and R together in one notebook, but gives you direct and flexible control of each chunk, just like you can with RMarkdown.
To use Quarto with VS Code, perform the following tasks:
- Install the Quarto extension inside your Docker container. You can find this in the VS Code Extension Marketplace.
- Install the Quarto package by following the steps below.
There are a few ways to install Quarto. Here we’ll see three: using apt-get, directly from GitHub, or from the website.
Open an integrated terminal window in a VS Code workspace running inside your Docker container. The following commands will update the package list and install the necessary tools to run Quarto. The curl command is used to download the Linux installer package (a .deb file).
$ sudo apt-get update
$ sudo apt-get install gdebi-core
$ sudo curl -LO https://quarto.org/download/latest/quarto-linux-amd64.deb
$ sudo gdebi quarto-linux-amd64.deb
Alternatively, you can install quarto directly from its source at GitHub as follows.
git clone https://github.com/quarto-dev/quarto-cli
cd quarto-cli
./configure-linux.sh
cd ..
rm -rf quarto-cli
Finally, you can also install Quarto locally by downloading the installation files from the project homepage.
After installation is complete, the following command should tell you which version you’re running:
$ quarto help
Usage: quarto
Version: 0.9.639
...
In addition you should check that the installation works, can render markdown documents and has found Python and R. To do this run:
$ quarto check
[✓] Checking Quarto installation......OK
[✓] Checking basic markdown render....OK
[✓] Checking Python 3 installation....OK
[✓] Checking Jupyter engine render....OK
[✓] Checking R installation...........OK
[✓] Checking Knitr engine render......OK
Some installation issues seem to persist when installing into a Docker container on Apple M1 (non-Intel) machines. If you can’t sort out the installation issues, you can install Quarto on your local OS by following the instructions on the website.
When you ran quarto help
, several commands were listed. The most useful are:
Command | Arguments | Description |
---|---|---|
render | [input], [args...] | Render input file(s) to various document types. |
preview | [file], [args...] | Render and preview a document or website project. |
serve | [input] | Serve a Shiny interactive document. |
create-project | [dir] | Create a project for rendering multiple documents. |
convert | <input> | Convert documents to alternate representations. |
Using quarto in a notebook
In the course repo you’ll find a directory called 05-python/the_joy_of_jupyter/quarto. The first two files are self-explanatory: one is a Python script with annotations for use in VS Code, i.e. # %%, and the other is a plain notebook.
The file 03_notebook_quarto.ipynb is a Python notebook that contains annotations for using Quarto. To render this file, execute:
quarto render 03_notebook_quarto.ipynb --execute
The --execute flag forces execution of all the code chunks; otherwise Quarto assumes you’ve already run them in the notebook.
Pure Quarto
Quarto itself works like Rmarkdown, as shown in 04_quarto_plain.html. If you have the Quarto extension installed in VS Code, you can click on the render button that appears at the top of the document. Otherwise, execute:
quarto render 04_quarto_plain.qmd --execute
Chunk Options
See the Jupyter Code Cell options for configuration of each code chunk.
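As a small illustration, assuming a Python code cell rendered with the Jupyter engine, cell options are written as specially prefixed comments (`#|`) at the top of the cell. The label and the data below are invented for the example; `echo` and `warning` are two of the options described in the Quarto documentation, which remains the authoritative list.

```python
#| label: summary-stats
#| echo: false
#| warning: false

# With echo set to false, the rendered document would show only the
# output of this cell, not the code itself.
import statistics

scores = [71, 85, 90, 64]  # made-up example data
median_score = statistics.median(scores)
print(f"Median score: {median_score}")
```

Because the options are plain comments, the cell still runs unchanged in VS Code or Jupyter; they only take effect when the document is rendered with Quarto.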