Chapter 13 Object-oriented Programming

Now that we’ve seen a few case studies in full, let’s move on to object-oriented programming (OOP).

13.1 What is OOP

pandas DataFrames are built on top of NumPy arrays which are built on top of Python objects.

OOP gives us a way to write flexible and reusable code. We create building blocks that can be assembled into more advanced modules and libraries.

What we often do is imperative programming, where we write functions and variables as we go along, e.g.

myList = [1,2,3]

for item in myList:
    print(f"Item {item}")
## Item 1
## Item 2
## Item 3

Now if this list is part of another function, or if we need to call it with new values over and over, then it’s object-oriented programming (OOP) to the rescue! Let’s take a look:

# Create a new class:
class PrintList:

    def __init__(self, numberlist):
        self.numberlist = numberlist
        
    def print_list(self):
        for item in self.numberlist:
            print(f"Item {item}")

# Instantiate it (i.e. create an instance)
A = PrintList([1,2,3])

# Call a method of the instance
A.print_list()
        
## Item 1
## Item 2
## Item 3
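
To see the "over and over" part in action, here is a minimal sketch that reuses the same class with different values. The instance names B and C and their contents are made up for illustration; the class is repeated only so the block runs on its own.

```python
class PrintList:
    """Same idea as the class above, repeated so this block is self-contained."""

    def __init__(self, numberlist):
        self.numberlist = numberlist

    def print_list(self):
        for item in self.numberlist:
            print(f"Item {item}")


# Two independent instances built from the same class (hypothetical values)
B = PrintList([10, 20])
C = PrintList(["a", "b", "c"])

# Each instance remembers its own data
B.print_list()
C.print_list()
```

The loop is written once, inside the class, and every new instance gets it for free.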

Everything in Python builds progressively in complexity. If we begin with a single number, we can group several numbers into a list, a list of lists can become a NumPy array, and the array can become a DataFrame of Series.

import pandas as pd

# A variable is a Python object
numb = 4

# A list is an object
list_of_numbs = [1,2,4]
number_array = [[1,2,4], [6,7,9]]

# A df is built on top of a NumPy array
df = pd.DataFrame([[1,2,4], [6,7,9]])
print(df)
##    0  1  2
## 0  1  2  4
## 1  6  7  9

13.2 Using classes

If you have a chunk of code that has both functions and variables that you want to reference, classes simplify your work. A class is a reusable chunk of code that has functions (here called methods) and variables.

Let’s clear up some terminology differences between imperative and OOP programming:

Imperative | OOP
Variable | Attribute/Field (or class variables)
Function | Method

We’ve been discussing methods already throughout the workshop, and here we see what we really mean. A method is just a function associated with a specific class. How does this connect to an object?

A class is like a cookie cutter. Once you have made a class, you can use it over and over to create different objects. A class is a template for an object, just as a cookie cutter is the template for a whole variety of cookies; the chocolate chip or peanut-butter oatmeal cookies are the specific objects.

Previously, we used the def keyword to define a function:

# For functions:
def greet(name, salutation):
    return "Hello, " + salutation + " " + name

name = "Berlin"
salutation = "Mr."
print(f"{greet(name, salutation)}")
## Hello, Mr. Berlin

Defining a class is similar, but instead of def we use the class keyword:

# for a class (parentheses are optional in py3)
class Greet():
    pass

Here, pass means we don’t put any content or values in the class yet. For the moment, Python will skip over it and continue with the rest of the script.

Next, we create an instance of the Greet class, i.e. an object:

hello = Greet()

Let’s see a complete class. This is an example from the DataCamp course on OOP in Python taught by Vicki Boykis. Here she’s building a class which will read in a file and convert it to a DataFrame-like object.

import numpy as np
from scipy import stats

class TestClass:
    """This is my new class"""
    def __init__(self, filename):
        self.filename = filename

    def create_datashell(self):
        # Read the CSV file; dtype=None lets NumPy infer the type of each column
        self.array = np.genfromtxt(self.filename, delimiter=',', dtype=None)
        return self.array

    def rename_column(self, old_colname, new_colname):
        # The first row holds the column names
        # (compared as bytes here, which assumes genfromtxt returned byte strings)
        for index, value in enumerate(self.array[0]):
            if value == old_colname.encode('UTF-8'):
                self.array[0][index] = new_colname
        return self.array

    def show_shell(self):
        print(self.array)

    def five_figure_summary(self, col_pos):
        # Skip the header row and describe a single column
        statistics = stats.describe(self.array[1:, col_pos].astype(float))
        return f"Five-figure stats of column {col_pos}: {statistics}"

As you can see, we have a docstring below the definition of our class, denoted by """...""". That’s again similar to what we saw with functions. The rest of the class contains attributes (class variables) and methods (i.e. class functions). You see lots of underscores in Python variable names. We have skirted around this issue so far, but let’s take a brief look:

Naming | Meaning
_var | A convention used to show that a variable is meant for internal use within a function or method
var_ | A convention used to avoid naming conflicts with Python keywords
__var | Triggers name mangling when used in a class context to prevent inheritance collisions. Enforced by the Python interpreter
__var__ | Special methods defined by the Python language. Avoid this naming scheme for your own attributes
_ | Naming a temporary or insignificant variable, e.g. in a for loop
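
The conventions in the table can be sketched in a few lines. The Account class, its attributes and the values used are all hypothetical and exist only to show each naming pattern once.

```python
class Account:
    def __init__(self, owner, balance):
        self.owner = owner
        self._balance = balance   # _var: internal by convention only, not enforced
        self.__pin = "0000"       # __var: name-mangled to _Account__pin

    def __repr__(self):           # __var__: special method defined by the language
        return f"Account({self.owner!r})"


acct = Account("Ada", 100)

class_ = "savings"                # var_: avoids clashing with the keyword `class`

for _ in range(2):                # _: throwaway loop variable
    pass

print(acct)                       # Account('Ada')
print(acct._balance)              # 100 (still reachable; the underscore is only a hint)
print(acct._Account__pin)         # 0000 (the mangled name)
print(hasattr(acct, "__pin"))     # False (the unmangled name does not exist)
```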

We won’t go into all the examples here, but the ones of most interest are the “double underscore” variables, __var__, called dunder variables.

The dunder variable in our class definition is the class constructor, i.e. the __init__ method.

So now we can see the three main features of a class in Python:

  • Constructors, i.e. the __init__ method
  • Attributes (class variables)
  • Methods (class functions)

The __init__ method is the constructor for the class. This special dunder method initializes each new object. The filename variable is passed into this method as a parameter. It is stored as an attribute and is initialized when we create an instance of the class, hence init.

The methods are defined using a def keyword, just like what we’d do to define a function.

13.2.1 __init__ializing a class

A class has attributes (class variables) and methods (class functions). Each method takes self as its first parameter, and we begin with a special method, __init__, which is also known as a constructor. It sets up the object the way we want from the very beginning. It’s called automatically when an object is created, so it can take values from outside the object and set them as values within the object. Let’s return to our Greet class:

# for a class (parentheses are optional in py3)
class Greet():
    def __init__(self, name):
        self.name = name
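
A short sketch of what happens on instantiation. The print call inside __init__ and the value "Berlin" are additions for illustration only; they make the automatic call visible.

```python
class Greet:
    def __init__(self, name):
        # Runs automatically as soon as Greet(...) is called
        print("__init__ was called")
        self.name = name


hello = Greet("Berlin")   # prints: __init__ was called
print(hello.name)         # Berlin
```

We never call __init__ ourselves; creating the object is enough.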

13.2.2 The self parameter

self represents the instance of the class, i.e. the specific object, within the constructor and every other instance method. Recall that an object is an instance of a class, and the code inside the class needs a way to reference that instance. The first parameter of such a method is always a reference to the current instance of the class. self is just the conventional name for this parameter; it’s not a keyword, although it’s used like one.

Let’s make a class with some instance variables:

# Create class: DataShell
class DataShell:

    # Initialize class with self and integerInput arguments
    def __init__(self, integerInput):

        # Set data as instance variable, and assign the value of integerInput
        self.data = integerInput

# Declare variable x with value of 10
x = 10      

# Instantiate DataShell passing x as argument: my_data_shell
my_data_shell = DataShell(x)

# Print my_data_shell
print(my_data_shell.data)
## 10

Of course, we can have multiple instance variables:

# Create class: DataShell
class DataShell:

    # Initialize class with self, identifier and data arguments
    def __init__(self, identifier, data):

        # Set identifier and data as instance variables, assigning value of input arguments
        self.identifier = identifier
        self.data = data

# Declare variable x with value of 100, and y with list of integers from 1 to 5
x = 100
y = [1, 2, 3, 4, 5]

# Instantiate DataShell passing x and y as arguments: my_data_shell
my_data_shell = DataShell(x, y)

# Print my_data_shell.identifier
print(my_data_shell.identifier)
## 100

# Print my_data_shell.data
print(my_data_shell.data)
## [1, 2, 3, 4, 5]

13.2.3 Methods: Functions within classes

Methods are the functions within a class and look like regular functions in imperative mode:

# Create class: DataShell
class DataShell:

    # Initialize class with self and dataList as arguments
    def __init__(self, dataList):
        # Set data as instance variable, and assign it the value of dataList
        self.data = dataList

    # Define method that returns data: show
    def show(self):
        return self.data

    # Define method that returns average of data: avg
    def avg(self):
        # Declare avg and assign it the average of data
        avg = sum(self.data)/float(len(self.data))
        # Return avg
        return avg
        
# Instantiate DataShell taking integer_list as argument: my_data_shell
integer_list = [1, 2, 3, 4, 5]
my_data_shell = DataShell(integer_list)

# Print output of your object's show method
print(my_data_shell.show())
## [1, 2, 3, 4, 5]

# Print output of your object's avg method
print(my_data_shell.avg())
## 3.0

Notice that we use the object name my_data_shell and then the . notation to access a method, e.g. avg(). We saw a lot of that earlier on in the workshop.

13.3 The three method types

There are three different method types:

  • Instance methods
  • Class methods, and,
  • Static methods

Here is a class that contains all three method types:

class MyClass:
    """This is my new class"""
    def method(self):
        return 'instance method called, self = ', self

    @classmethod
    def classmethod(cls):
        return 'class method called cls = ', cls

    @staticmethod
    def staticmethod():
        return 'A static method was called, there are no parameters'  

Let’s take a closer look:

13.3.1 Instance Methods

The first method, called method, is a regular instance method, like what we’ve seen so far. It takes one parameter, self (although as we saw above, it can accept more), which points to an instance of MyClass when the method is called.

Through the self parameter, instance methods can freely access attributes and other methods on the same object, which means they can modify an object instance’s state.

Not only can they modify object state, instance methods can also access the class itself through the self.__class__ attribute. This means instance methods can also modify class state.
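
As a minimal sketch of both kinds of modification, the hypothetical Counter class below keeps one count per object and one total on the class. The class and attribute names are invented for this illustration.

```python
class Counter:
    total = 0                        # class attribute, shared by all instances

    def __init__(self):
        self.count = 0               # instance attribute, one per object

    def increment(self):
        self.count += 1              # modifies object state
        self.__class__.total += 1    # modifies class state through the instance


a = Counter()
b = Counter()
a.increment()
a.increment()
b.increment()

print(a.count, b.count)   # 2 1
print(Counter.total)      # 3
```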

13.3.2 Class Methods

classmethod is a class method. It is marked with a @classmethod decorator to flag it as such.

Class methods take a cls parameter that points to the class, and not the object instance, when the method is called.

Because the class method only has access to this cls argument, it can’t modify object instance state, but it can still modify class state that applies across all instances of the class.
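
A small sketch of a class method changing class state that every instance then sees. The Settings class and its units attribute are hypothetical.

```python
class Settings:
    units = "metric"                 # class attribute

    @classmethod
    def use_imperial(cls):
        cls.units = "imperial"       # cls is the class, so this changes class state


first = Settings()
second = Settings()

Settings.use_imperial()

# Neither instance has its own `units`, so both look it up on the class
print(first.units)    # imperial
print(second.units)   # imperial
```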

13.3.3 Static Methods

staticmethod is a static method. It is marked with a @staticmethod decorator to flag it as such.

This type of method takes neither a self nor a cls parameter (but of course it’s free to accept an arbitrary number of other parameters).

Thus, a static method can neither modify object state nor class state. Static methods are restricted in what data they can access - and they’re primarily a way to namespace your methods.
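
To illustrate the namespacing idea, here is a sketch with a hypothetical Temperature class. The static method c_to_f is an ordinary function that happens to live inside the class, and it only sees the argument it is given.

```python
class Temperature:
    def __init__(self, celsius):
        self.celsius = celsius

    @staticmethod
    def c_to_f(celsius):
        # No self or cls: only the explicit argument is available here
        return celsius * 9 / 5 + 32

    def in_fahrenheit(self):
        # An instance method can still make use of the static helper
        return self.c_to_f(self.celsius)


print(Temperature.c_to_f(100))           # 212.0
print(Temperature(0).in_fahrenheit())    # 32.0
```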

In summary:

Method type | Decorator | Modify object state | Modify class state
Instance | None | Yes | Yes
Class | @classmethod | No | Yes
Static | @staticmethod | No | No

13.4 An example

13.4.1 Instance Methods

We’ll start by creating an instance of the class and then calling the three different methods on it. Each method’s implementation returns a tuple containing information for us to trace what’s going on, and which parts of the class or object the method can access.

obj = MyClass()

# The type
type(obj)
## <class '__main__.MyClass'>
type(obj).__name__
## 'MyClass'

# The class
obj.__class__.__name__
## 'MyClass'

13.4.2 Instance Method

obj.method()
## ('instance method called, self = ', <__main__.MyClass object at 0x7f98093b9438>)
dir(obj)
## ['__class__', '__delattr__', '__dict__', '__dir__', '__doc__', '__eq__', '__format__', '__ge__', '__getattribute__', '__gt__', '__hash__', '__init__', '__init_subclass__', '__le__', '__lt__', '__module__', '__ne__', '__new__', '__reduce__', '__reduce_ex__', '__repr__', '__setattr__', '__sizeof__', '__str__', '__subclasshook__', '__weakref__', 'classmethod', 'method', 'staticmethod']
import inspect
inspect.getmembers(obj)
## [('__class__', <class '__main__.MyClass'>), ('__delattr__', <method-wrapper '__delattr__' of MyClass object at 0x7f98093b9438>), ('__dict__', {}), ('__dir__', <built-in method __dir__ of MyClass object at 0x7f98093b9438>), ('__doc__', 'This is my new class'), ('__eq__', <method-wrapper '__eq__' of MyClass object at 0x7f98093b9438>), ('__format__', <built-in method __format__ of MyClass object at 0x7f98093b9438>), ('__ge__', <method-wrapper '__ge__' of MyClass object at 0x7f98093b9438>), ('__getattribute__', <method-wrapper '__getattribute__' of MyClass object at 0x7f98093b9438>), ('__gt__', <method-wrapper '__gt__' of MyClass object at 0x7f98093b9438>), ('__hash__', <method-wrapper '__hash__' of MyClass object at 0x7f98093b9438>), ('__init__', <method-wrapper '__init__' of MyClass object at 0x7f98093b9438>), ('__init_subclass__', <built-in method __init_subclass__ of type object at 0x7f97545f7d48>), ('__le__', <method-wrapper '__le__' of MyClass object at 0x7f98093b9438>), ('__lt__', <method-wrapper '__lt__' of MyClass object at 0x7f98093b9438>), ('__module__', '__main__'), ('__ne__', <method-wrapper '__ne__' of MyClass object at 0x7f98093b9438>), ('__new__', <built-in method __new__ of type object at 0x111dc90b8>), ('__reduce__', <built-in method __reduce__ of MyClass object at 0x7f98093b9438>), ('__reduce_ex__', <built-in method __reduce_ex__ of MyClass object at 0x7f98093b9438>), ('__repr__', <method-wrapper '__repr__' of MyClass object at 0x7f98093b9438>), ('__setattr__', <method-wrapper '__setattr__' of MyClass object at 0x7f98093b9438>), ('__sizeof__', <built-in method __sizeof__ of MyClass object at 0x7f98093b9438>), ('__str__', <method-wrapper '__str__' of MyClass object at 0x7f98093b9438>), ('__subclasshook__', <built-in method __subclasshook__ of type object at 0x7f97545f7d48>), ('__weakref__', None), ('classmethod', <bound method MyClass.classmethod of <class '__main__.MyClass'>>), ('method', <bound method MyClass.method of <__main__.MyClass object at 
0x7f98093b9438>>), ('staticmethod', <function MyClass.staticmethod at 0x7f97f9ffb6a8>)]

This confirms that method (the instance method) has access to the object instance (printed as <__main__.MyClass object at 0x7f98093b9438>) via the self argument.

When the method is called, Python replaces the self argument with the instance object, obj. We could ignore the syntactic sugar of the dot-call syntax (obj.method()) and pass the instance object manually to get the same result:

# Instead of ... 
# obj = MyClass()
# obj.method()

# ... we could have just used:
MyClass.method(obj)
## ('instance method called, self = ', <__main__.MyClass object at 0x7f98093b9438>)

Exercise 13.1 (Calling a method without an instance) Can you guess what would happen if you tried to call the method without first creating an instance?

objNew.method()
MyClass.method(objNew)

Instance methods can also access the class itself through the self.__class__ attribute. This makes instance methods powerful in terms of access restrictions – they can modify state on the object instance and on the class itself.

13.4.3 Class Method

obj.classmethod()
## ('class method called cls = ', <class '__main__.MyClass'>)

Calling classmethod() showed us it doesn’t have access to the <MyClass instance> object, but only to the <class MyClass> object, representing the class itself. Remember, everything in Python is an object, even classes themselves.

Notice how Python automatically passes the class as the first argument to the function when we call MyClass.classmethod(). Calling a method in Python through the . notation triggers this behavior. The self parameter on instance methods works the same way.

Recall that naming these parameters self and cls is just a convention. You could just as easily name them the_object and the_class and get the same result. All that matters is that they’re positioned first in the parameter list for the method.
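
A minimal sketch of this point, using the deliberately unconventional names the_object and the_class. The Demo class is made up for illustration. The code runs, but other readers will expect self and cls.

```python
class Demo:
    def method(the_object):          # first parameter receives the instance
        return the_object

    @classmethod
    def class_method(the_class):     # first parameter receives the class
        return the_class


d = Demo()
print(d.method() is d)               # True
print(d.class_method() is Demo)      # True
```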

13.4.4 Static Method

obj.staticmethod()
## 'A static method was called, there are no parameters'

Did you see how we called staticmethod() on the object and were able to do so successfully? Some developers are surprised when they learn that it’s possible to call a static method on an object instance.

Behind the scenes Python simply enforces the access restrictions by not passing in the self or the cls argument when a static method gets called using the dot syntax.

This confirms that static methods can neither access the object instance state nor the class state. They work like regular functions but belong to the class’s (and every instance’s) namespace.

So let’s take a look at what happens when we attempt to call these methods on the class itself - without creating an object instance beforehand:

MyClass.method()
MyClass.classmethod()
## ('class method called cls = ', <class '__main__.MyClass'>)
MyClass.staticmethod()
## 'A static method was called, there are no parameters'

You can call classmethod() and staticmethod(), but calling the instance method method() fails with a TypeError.

This is to be expected. We didn’t create an object instance and tried calling an instance function directly on the class blueprint itself. This means there is no way for Python to populate the self argument and therefore the call fails.
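
The failure can be made visible without stopping the script by catching the exception. This is only a sketch: the class is a cut-down copy of MyClass, the variable caught is an addition for illustration, and the exact wording of the error message differs between Python versions.

```python
class MyClass:
    def method(self):
        return "instance method called", self


caught = None
try:
    MyClass.method()                 # no instance, so nothing can fill in self
except TypeError as err:
    caught = err
    print(f"TypeError: {err}")

# Supplying an instance by hand makes the same call succeed
obj = MyClass()
print(MyClass.method(obj)[0])        # instance method called
```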

Let’s look at some examples of when to use these special method types.

13.5 A bare-bones example

OK, so how does OOP make our lives easier? That is, why bother? Let’s take a look at a basic example using instance and class methods. We’re going to leave static methods out of the picture, since we understand that they are just like regular functions that belong to a specific class’s namespace.

class Cake:
    def __init__(self, ingredients):
        self.ingredients = ingredients

    def __repr__(self):
        # Pre Python 3.6 style string formatting
        return 'Cake(%r)' % self.ingredients
        # Python 3.6+ equivalent:
        # return f'Cake({self.ingredients!r})'

Without class methods, we create each cake by passing its ingredients to the constructor directly:


Cake(['flour', 'sugar', 'eggs'])
## Cake(['flour', 'sugar', 'eggs'])
Cake(['flour'*4, 'sugar', 'eggs'])
## Cake(['flourflourflourflour', 'sugar', 'eggs'])
Cake(['cornmeal', 'honey', 'hazelnuts'])
## Cake(['cornmeal', 'honey', 'hazelnuts'])

We can give the users of our Cake class a better interface for creating the cake objects they crave.

We do that by using class methods as factory functions for the different kinds of cakes we can create:


class Cake:
    def __init__(self, ingredients):
        self.ingredients = ingredients

    def __repr__(self):
        return f'Cake({self.ingredients!r})'

    @classmethod
    def chocolate(cls):
        return cls(['chocolate', 'flour', 'sugar'])

    @classmethod
    def cornmeal(cls):
        return cls(['cornmeal', 'honey', 'hazelnuts'])
        

Note how I’m using the cls argument in the chocolate and cornmeal class methods instead of calling the Cake constructor directly. Here, the class methods are factory methods: like a factory, they just keep generating more objects.

This prevents repetition of code. If we decide to rename this class at some point, we won’t have to remember to update the constructor name in all of the classmethod factory functions.

What can we do with these factory methods? Let’s try them out:

Cake.chocolate()
## Cake(['chocolate', 'flour', 'sugar'])
Cake(['chocolate', 'flour'])
## Cake(['chocolate', 'flour'])

We can use the factory functions to create new Cake objects that are configured the way we want them. They all use the same __init__ constructor internally and simply provide a shortcut for remembering all of the various ingredients.

Another way to look at this use of class methods is that they allow you to define alternative constructors for your classes. Python only allows one __init__ method per class. Using class methods it’s possible to add as many alternative constructors as necessary. This can make the interface for your classes self-documenting (to a certain degree) and simplify their usage.
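
As a sketch of an alternative constructor that takes a different kind of input, the hypothetical from_string class method below builds a Cake from a comma-separated string. The method name and the parsing rule are assumptions made for this example.

```python
class Cake:
    def __init__(self, ingredients):
        self.ingredients = ingredients

    def __repr__(self):
        return f"Cake({self.ingredients!r})"

    @classmethod
    def from_string(cls, text):
        # Hypothetical alternative constructor: turn "a, b, c" into a list
        return cls([part.strip() for part in text.split(",")])


print(Cake(["flour", "sugar"]))               # Cake(['flour', 'sugar'])
print(Cake.from_string("cornmeal, honey"))    # Cake(['cornmeal', 'honey'])
```

Both calls end up in the same __init__, so any set-up logic only has to live in one place.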

Let’s break this down a bit further. In the simplified class here, calling the .chocolate() method automatically calls the class method .put_together():

class Cake:

    @classmethod
    def put_together(cls, data):
        return data*2

    @classmethod
    def chocolate(cls):
        return cls.put_together(['chocolate', 'flour', 'sugar'])


Cake.chocolate()
## ['chocolate', 'flour', 'sugar', 'chocolate', 'flour', 'sugar']

I can add another class method that ties the whole program together for me, instead of having to call the .chocolate() method myself. That’s the run() method here:

class Cake:

    @classmethod
    def put_together(cls, data):
        return data*2

    @classmethod
    def chocolate(cls):
        return cls.put_together(['chocolate', 'flour', 'sugar'])

    @classmethod
    def run(cls):
        result = cls.chocolate()
        print(f"the ingredients are {result}")

Cake.run()
## the ingredients are ['chocolate', 'flour', 'sugar', 'chocolate', 'flour', 'sugar']

13.6 The Journey so far

  • Instance methods need a class instance and can access the instance through self.
  • Class methods don’t need a class instance. They can’t access the instance (self) but they have access to the class itself via cls.
  • Static methods don’t have access to cls or self. They work like regular functions but belong to the class’s namespace.
  • Static and class methods communicate and (to a certain degree) enforce developer intent about class design. This can have maintenance benefits.