Chapter 10 Comprehensions

Now that we’ve seen iterations in action, let’s take a look at how list and dict comprehensions make our lives easier.

10.1 List comprehensions

List comprehensions create a new list from another list, a DataFrame column or any other iterable. They are common because they replace a for loop that builds a list with a single, compact expression, so a multi-line operation often fits in one line of code.

For example, this for loop is pretty tedious:

vals = [6, 8, 4, 2, 5, 6, 7, 3, 5]
vals
## [6, 8, 4, 2, 5, 6, 7, 3, 5]
new_vals = []

for num in vals:
    new_vals.append(num + 10)
    
new_vals
## [16, 18, 14, 12, 15, 16, 17, 13, 15]

A much easier way is to use a list comprehension:

new_vals2 = [num + 10 for num in vals]
new_vals2
## [16, 18, 14, 12, 15, 16, 17, 13, 15]

Can you see the direct relationship between the above list comprehension and the for loop?

Well, to be honest, in this case you would just use a NumPy array.

import numpy as np

vals = np.array([6, 8, 4, 2, 5, 6, 7, 3, 5])
vals
## array([6, 8, 4, 2, 5, 6, 7, 3, 5])
vals + 10
## array([16, 18, 14, 12, 15, 16, 17, 13, 15])

But that will not work for everything you want to do. Comprehensions are not just for lists: they work over any iterable. Remember that a range object is iterable:

# like a range object
[num + 10 for num in range(10)]
## [10, 11, 12, 13, 14, 15, 16, 17, 18, 19]

Here I don’t have to convert to a NumPy array first. To perform list comprehensions, we need:

  1. An iterable
  2. An iterator variable to represent the members of the iterable
  3. The output expression

List comprehensions use the following syntax: [[output expression] for iterator variable in iterable].
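
As a rough illustration of these three parts, the sketch below labels each one in a comprehension over a string, which is also an iterable. The names word and upper_letters are made up for this example.

```python
# A string is an iterable: looping over it yields one character at a time.
word = "python"

# iterable:          word
# iterator variable: letter
# output expression: letter.upper()
upper_letters = [letter.upper() for letter in word]
print(upper_letters)
# ['P', 'Y', 'T', 'H', 'O', 'N']

# The equivalent for loop, for comparison.
upper_letters_loop = []
for letter in word:
    upper_letters_loop.append(letter.upper())

print(upper_letters == upper_letters_loop)
# True
```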

This also works for nested for loops. Take a look at this typical example:

pairs_1 = []

for num1 in range(0,2):
    for num2 in range(6,8):
        pairs_1.append((num1, num2))
        
pairs_1
## [(0, 6), (0, 7), (1, 6), (1, 7)]

It’s more compact as a list comprehension:

pairs_2 = [(num1, num2) for num1 in range(0,2) for num2 in range(6,8)]
pairs_2
## [(0, 6), (0, 7), (1, 6), (1, 7)]

It is a little less readable at first, but once you get the hang of it, it’s a nice syntax. For example:

[i**2 for i in range(0,10)]
## [0, 1, 4, 9, 16, 25, 36, 49, 64, 81]
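
When a comprehension has several for clauses, they are read left to right in the same order as the nested for loops they replace. A small sketch, using a hypothetical nested list called nested, flattens it to show this ordering:

```python
nested = [[1, 2], [3, 4], [5, 6]]  # made-up data for illustration

# Outer loop first (for row in nested), inner loop second (for item in row).
flat = [item for row in nested for item in row]
print(flat)
# [1, 2, 3, 4, 5, 6]

# The same thing written as nested for loops.
flat_loop = []
for row in nested:
    for item in row:
        flat_loop.append(item)

print(flat == flat_loop)
# True
```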

Let’s look at a more interesting example, with a matrix. Remember that a matrix is just a 2-dimensional array. In plain Python, without NumPy, a matrix can be represented as a list of lists, where every row has the same length and every element has the same type:

matrix = [[0, 1, 2, 3, 4],
          [0, 1, 2, 3, 4],
          [0, 1, 2, 3, 4],
          [0, 1, 2, 3, 4],
          [0, 1, 2, 3, 4]]

Can we produce this using nested list comprehensions?

You can create one row of the matrix with a single list comprehension. To create the list of lists, you then supply that list comprehension as the output expression of an outer list comprehension.

That is, the output expression, as we see in the generic syntax [[output expression] for iterator variable in iterable], is itself a list comprehension.

Here’s the nested for loop solution:

matrix = [] 
  
for i in range(5): 
      
    # Append an empty sublist inside the list 
    matrix.append([]) 
      
    for j in range(5): 
        matrix[i].append(j) 
          
matrix
## [[0, 1, 2, 3, 4], [0, 1, 2, 3, 4], [0, 1, 2, 3, 4], [0, 1, 2, 3, 4], [0, 1, 2, 3, 4]]

And as a list comprehension:

# Nested list comprehension 
matrix = [[j for j in range(5)] for i in range(5)] 
# matrix = [[col for col in range(5)] for row in range(5)]
print(matrix) 
## [[0, 1, 2, 3, 4], [0, 1, 2, 3, 4], [0, 1, 2, 3, 4], [0, 1, 2, 3, 4], [0, 1, 2, 3, 4]]

So that is certainly cleaner code, and you can appreciate why list comprehensions are so popular.
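
In the matrix above, the inner comprehension never uses the outer variable i. As a variation that is not part of the original example, the sketch below lets the output depend on both variables, and contrasts the nested form with a flat comprehension over the same two ranges. The names table and flat_table are illustrative only.

```python
# The inner comprehension builds one row; the outer comprehension
# collects the rows. Here each value depends on both row and col.
table = [[row * col for col in range(1, 4)] for row in range(1, 4)]
print(table)
# [[1, 2, 3], [2, 4, 6], [3, 6, 9]]

# Two for clauses inside a single pair of brackets give one flat list instead.
flat_table = [row * col for row in range(1, 4) for col in range(1, 4)]
print(flat_table)
# [1, 2, 3, 2, 4, 6, 3, 6, 9]
```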

10.2 Using conditionals

In the chapter on Logical Expressions, we saw how relational and logical operators produce Boolean values. Here, we can use them to filter the items of an iterable, keeping only those for which a predicate expression is True:

[ output expression for iterator variable in iterable if predicate expression ]

For example:

[num ** 2 for num in range(10) if num % 2 == 0]
## [0, 4, 16, 36, 64]
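
To show where the predicate sits, here is a sketch of the same filter written as a for loop, followed by a filter on some made-up string data (the fruits list is not from the original text):

```python
# The predicate after `if` decides which items are kept.
even_squares = [num ** 2 for num in range(10) if num % 2 == 0]

# Equivalent for loop: the if statement wraps the append.
even_squares_loop = []
for num in range(10):
    if num % 2 == 0:
        even_squares_loop.append(num ** 2)

print(even_squares == even_squares_loop)
# True

# Predicates can combine conditions with logical operators.
fruits = ['apple', 'kiwi', 'banana', 'fig', 'avocado']  # made-up data
picked = [fruit for fruit in fruits
          if fruit.startswith('a') or fruit.endswith('i')]
print(picked)
# ['apple', 'kiwi', 'avocado']
```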

Exercise 10.1 Using the cities list below, extract only those cities with long names, over 6 characters long.

cities = ['Munich', 'Paris', 'Amsterdam', 'Madrid', 'Istanbul']

Recall that we can use the modulo operator to check whether a value is even. Instead of filtering, we can also place a conditional in the output expression:

# Square the even values and replace the odd ones with 0.
[num ** 2 if num % 2 == 0 else 0 for num in range(10)]
## [0, 0, 4, 0, 16, 0, 36, 0, 64, 0]

Exercise 10.2 Revisit the previous exercise, but change the position of the logical expression.
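
The two positions do different jobs. An if at the end filters items out, so the result can be shorter than the input. An if/else in the output expression transforms every item, so the length is unchanged. A minimal sketch, with variable names chosen only for illustration:

```python
nums = range(6)

# Filter: odd numbers are dropped.
kept = [num for num in nums if num % 2 == 0]
print(kept)
# [0, 2, 4]

# Conditional expression: every number is kept, but mapped to a label.
labels = ['even' if num % 2 == 0 else 'odd' for num in nums]
print(labels)
# ['even', 'odd', 'even', 'odd', 'even', 'odd']

# Both together: the trailing if filters, then the if/else picks the output.
both = ['even' if num % 2 == 0 else 'odd' for num in nums if num > 2]
print(both)
# ['odd', 'even', 'odd']
```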

10.3 Dictionary comprehensions

In addition to lists, we can also use dictionary comprehensions. There are two key differences:

  1. Use {}, not [].
  2. The key and value are separated by a : in the output expression.

pos_neg = {num: -num for num in range(10)}
pos_neg
## {0: 0, 1: -1, 2: -2, 3: -3, 4: -4, 5: -5, 6: -6, 7: -7, 8: -8, 9: -9}
squares = {num: num**2 for num in range(10)}
squares
## {0: 0, 1: 1, 2: 4, 3: 9, 4: 16, 5: 25, 6: 36, 7: 49, 8: 64, 9: 81}
cities = ['Munich', 'Paris', 'Amsterdam', 'Madrid', 'Istanbul']
{name: len(name) for name in cities}
## {'Munich': 6, 'Paris': 5, 'Amsterdam': 9, 'Madrid': 6, 'Istanbul': 8}
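
Dictionary comprehensions can also iterate over an existing dictionary and take a filter. The sketch below assumes the standard .items() method for looping over key-value pairs; the names name_lengths, short_names and by_length are invented for this example.

```python
cities = ['Munich', 'Paris', 'Amsterdam', 'Madrid', 'Istanbul']
name_lengths = {name: len(name) for name in cities}

# Iterate over key-value pairs with .items() and keep only some of them.
short_names = {name: length
               for name, length in name_lengths.items()
               if length <= 6}
print(short_names)
# {'Munich': 6, 'Paris': 5, 'Madrid': 6}

# Keys must be unique: swapping keys and values keeps only the last
# city seen for each length.
by_length = {length: name for name, length in name_lengths.items()}
print(by_length)
# {6: 'Madrid', 5: 'Paris', 9: 'Amsterdam', 8: 'Istanbul'}
```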

10.3.1 Generators

Generator expressions look like list comprehensions, except that they do not build the whole result in memory. Instead, they produce one value at a time as you iterate over them, which is practical when working with large data sets. The only difference in notation is that you use () instead of [].
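
A brief sketch of the difference, using the built-in next() and sum() functions; the variable names are illustrative:

```python
# Square brackets build the whole list in memory straight away.
squares_list = [num ** 2 for num in range(10)]

# Parentheses create a generator object; nothing is computed yet.
squares_gen = (num ** 2 for num in range(10))

# Values are produced one at a time, on demand.
print(next(squares_gen))
# 0
print(next(squares_gen))
# 1

# sum() consumes the remaining values (4 + 9 + ... + 81).
print(sum(squares_gen))
# 284

# A generator can only be iterated over once, so it is now empty.
print(list(squares_gen))
# []
```

Because the generator is exhausted after one pass, a list is the better choice when the values are needed more than once.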

10.4 Wrap-up

So far in our journey through iteration we’ve seen:

  • Iterators
  • Associated iteration functions
  • Generators (very briefly)
  • List and dict comprehensions