Chapter 13 Object-oriented Programming
Now that we’ve seen a few case studies in full, let’s move on to object-oriented programming (OOP).
13.1 What is OOP
pandas DataFrames are built on top of NumPy arrays which are built on top of Python objects.
OOP allows for a way to build flexible and reproducible code. We build building blocks that can be assembled into more advanced modules and libraries.
What we often do is imperative programming, where we write functions and variables as we go along, e.g.
myList = [1,2,3]
for item in myList:
    print(f"Item {item}")
## Item 1
## Item 2
## Item 3
Now if this list is part of another function, or if we need to call it with new values over and over, then it’s object-oriented programming (OOP) to the rescue! Let’s take a look:
# Create a new class:
class PrintList:
    def __init__(self, numberlist):
        self.numberlist = numberlist
    def print_list(self):
        for item in self.numberlist:
            print(f"Item {item}")
# Instantiate it (i.e. create an instance)
A = PrintList([1,2,3])
# Call a method of the instance
A.print_list()
## Item 1
## Item 2
## Item 3
Everything in Python progressively builds on complexity. If we begin with a single number, we can group several numbers into a list, a list of lists becomes a NumPy array, and the array becomes a DataFrame of Series.
import pandas as pd

# A variable is a Python object
numb = 4
# A list is an object
list_on_numbs = [1,2,4]
number_array = [[1,2,4], [6,7,9]]
# A df is built from a NumPy array
df = pd.DataFrame([[1,2,4], [6,7,9]])
print(df)
## 0 1 2
## 0 1 2 4
## 1 6 7 9
13.2 Using classes
If you have a chunk of code that has both functions and variables that you want to reference, classes simplify your work. A class is a reusable chunk of code that has functions (here called methods) and variables.
Let’s clear up some terminology differences between imperative programming and OOP:
Imperative | OOP |
---|---|
Variable | Attribute/Field (or class variables) |
Function | Method |
We’ve been discussing methods already throughout the workshop, and here we see what we really mean. A method is just a function associated with a specific class. How does this connect to an object?
A class is like a cookie cutter: once you have made it, you can use it over and over to create different objects. The class is a template for objects, just as a cookie cutter is a template for cookies; chocolate chip and peanut-butter oatmeal cookies are the specific objects.
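To make the cookie-cutter analogy concrete, here is a minimal sketch. The Cookie class, its flavor attribute and its describe method are hypothetical names chosen for illustration; they are not part of the workshop code.

```python
# Hypothetical class for illustration: the "cookie cutter"
class Cookie:
    def __init__(self, flavor):
        # Each object stores its own flavor
        self.flavor = flavor

    def describe(self):
        return f"A {self.flavor} cookie"


# Two different objects ("cookies") made from the same class ("cutter")
choc = Cookie("chocolate chip")
oat = Cookie("peanut-butter oatmeal")

print(choc.describe())           # A chocolate chip cookie
print(oat.describe())            # A peanut-butter oatmeal cookie
print(type(choc) is type(oat))   # True: both come from the same class
```

The two objects share the same template but carry different data.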
Previously, we used the def keyword to define a function:
# For functions:
def greet(name, salutation):
    return "Hello, " + salutation + " " + name
print(greet("Berlin", "Mr."))
## Hello, Mr. Berlin
Defining a class is similar, but instead of def we use the class keyword:
# for a class (parentheses are optional in py3)
class Greet():
    pass
Here, pass means we don’t put any content or values in the class yet. It is a placeholder: Python does nothing when it reaches it and continues with the rest of the script.
Next, we create an instance of the class Greet as an object:
hello = Greet()
Let’s see a complete class. This is an example from the DataCamp course on OOP in Python taught by Vicki Boykis. Here she’s building a class that reads in a file and converts it into a DataFrame-like object.
import numpy as np
from scipy import stats

class TestClass:
    """This is my new class"""
    def __init__(self, filename):
        self.filename = filename
    def create_datashell(self):
        self.array = np.genfromtxt(self.filename, delimiter=',', dtype=None)
        return self.array
    def rename_column(self, old_colname, new_colname):
        for index, value in enumerate(self.array[0]):
            if value == old_colname.encode('UTF-8'):
                self.array[0][index] = new_colname
        return self.array
    def show_shell(self):
        print(self.array)
    def five_figure_summary(self, col_pos):
        # Assumed intent: skip the header row and take column col_pos.
        # np.float was removed from NumPy, so the built-in float is used.
        statistics = stats.describe(self.array[1:, col_pos].astype(float))
        return f"Five-figure stats of column {col_pos}: {statistics}"
As you can see, we have a docstring, denoted by """...""", just below the definition of our class. That’s again similar to what we saw with functions. The rest of the class contains attributes (class variables) and methods (i.e. class functions). You see lots of underscores in Python variable names. We have skirted around this issue so far, but let’s take a brief look:
Naming | Meaning |
---|---|
_var | A convention used to show that a variable is meant for internal use within a function or method |
var_ | A convention used to avoid naming conflicts with Python keywords |
__var | Triggers name mangling when used in a class context to prevent inheritance collisions. Enforced by the Python interpreter |
__var__ | Special methods defined by the Python language. Avoid this naming scheme for your own attributes |
_ | Naming a temporary or insignificant variable, e.g. in a for loop |
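The conventions in the table can be tried out in a few lines. This is only a sketch: the Account class and all of its attribute names are hypothetical and were made up for this illustration.

```python
# Hypothetical class illustrating the underscore conventions
class Account:
    def __init__(self, owner, balance):
        self.owner = owner          # ordinary public attribute
        self._notes = []            # _var: internal use, by convention only
        self.__balance = balance    # __var: name-mangled to _Account__balance

    def balance(self):
        # Inside the class, the double-underscore name works as written
        return self.__balance


class_ = "savings"   # var_: avoids clashing with the keyword `class`

acct = Account("Ada", 100)
print(acct.balance())               # 100
print(acct._Account__balance)       # 100: the mangled name created by the interpreter
print(hasattr(acct, "__balance"))   # False: the unmangled name does not exist

# _ : a throwaway loop variable whose value we never use
for _ in range(2):
    print("tick")
```

Note that only __var is enforced by the interpreter; _var and var_ are agreements between programmers.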
We won’t go into all the examples here, but the one of most interest is the “double underscore” form, __var__, called a dunder variable.
The dunder in our class definition is the class constructor, i.e. the __init__ method.
So now we can see the three main features of a class in Python:
- Constructors, i.e. the __init__ method
- Attributes (class variables)
- Methods (class functions)
The __init__ method is the constructor for the class. This special dunder method initializes each new object. The filename value is passed into this method as a parameter and stored as an attribute, which is set when we create an instance of the class, hence init.
The methods are defined using the def keyword, just as we would define a function.
13.2.1 __init__ializing a class
A class has attributes (class variables) and methods (class functions). Each method takes self as its first parameter, and we begin with a special method, __init__, which is also known as a constructor. It sets up the object the way we want from the very beginning. It’s called automatically when an object is created, and it can either take values from outside the object or set values within the object. Let’s return to our Greet class:
# for a class (parentheses are optional in py3)
class Greet():
    def __init__(self, name):
        self.name = name
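To see that __init__ really is called automatically, here is a small sketch based on the Greet class. The print call inside __init__ is an addition for illustration only, and the name "Berlin" is simply borrowed from the earlier greeting example.

```python
class Greet:
    def __init__(self, name):
        # Added for illustration: shows when the constructor runs
        print("__init__ was called")
        self.name = name


# We never call __init__ ourselves; creating the object triggers it
hello = Greet("Berlin")   # prints: __init__ was called
print(hello.name)         # prints: Berlin
```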
13.2.2 The self parameter
self represents the instance of the class, i.e. the specific object, within the constructor and every other instance method. Recall that an object is an instance of a class, and its methods need a way to refer to that instance. The first parameter of an instance method is always a reference to the current instance. The name self is just the conventional name for this parameter; it’s not a keyword, although it’s used like one.
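A short sketch may help here. The Counter class below is hypothetical; it only serves to show that self refers to whichever object the method was called on, so two instances keep separate state.

```python
# Hypothetical class for illustration
class Counter:
    def __init__(self):
        self.count = 0

    def increment(self):
        # `self` is the instance the method was called on
        self.count += 1

    def who_am_i(self):
        return self


a = Counter()
b = Counter()
a.increment()
a.increment()
b.increment()

print(a.count, b.count)      # 2 1
print(a.who_am_i() is a)     # True: self is the object itself
```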
Let’s make a class with some instance variables:
# Create class: DataShell
class DataShell:
    # Initialize class with self and integerInput arguments
    def __init__(self, integerInput):
        # Set data as instance variable, and assign the value of integerInput
        self.data = integerInput
# Declare variable x with value of 10
x = 10
# Instantiate DataShell passing x as argument: my_data_shell
my_data_shell = DataShell(x)
# Print my_data_shell.data
print(my_data_shell.data)
## 10
Of course, we can have multiple instance variables:
# Create class: DataShell
class DataShell:
    # Initialize class with self, identifier and data arguments
    def __init__(self, identifier, data):
        # Set identifier and data as instance variables, assigning value of input arguments
        self.identifier = identifier
        self.data = data
# Declare variable x with value of 100, and y with list of integers from 1 to 5
x = 100
y = [1, 2, 3, 4, 5]
# Instantiate DataShell passing x and y as arguments: my_data_shell
my_data_shell = DataShell(x, y)
# Print my_data_shell.identifier
print(my_data_shell.identifier)
## 100
# Print my_data_shell.data
print(my_data_shell.data)
## [1, 2, 3, 4, 5]
13.2.3 Methods: Functions within classes
Methods are the functions within a class and look like regular functions in imperative mode:
# Create class: DataShell
class DataShell:
    # Initialize class with self and dataList as arguments
    def __init__(self, dataList):
        # Set data as instance variable, and assign it the value of dataList
        self.data = dataList
    # Define method that returns data: show
    def show(self):
        return self.data
    # Define method that returns the average of data: avg
    def avg(self):
        # Declare avg and assign it the average of data
        avg = sum(self.data)/float(len(self.data))
        # Return avg
        return avg
# Instantiate DataShell taking integer_list as argument: my_data_shell
integer_list = [1, 2, 3, 4, 5]
my_data_shell = DataShell(integer_list)
# Print output of your object's show method
print(my_data_shell.show())
## [1, 2, 3, 4, 5]
# Print output of your object's avg method
print(my_data_shell.avg())
## 3.0
Notice that we use the object name my_data_shell and then the . notation to access a method, e.g. avg(). We saw a lot of that earlier on in the workshop.
13.3 The three method types
There are three different method types:
- Instance methods
- Class methods
- Static methods
Here is a class that contains all three method types:
class MyClass:
    """This is my new class"""
    def method(self):
        return 'instance method called, self = ', self
    @classmethod
    def classmethod(cls):
        return 'class method called cls = ', cls
    @staticmethod
    def staticmethod():
        return 'A static method was called, there are no parameters'
Let’s take a closer look:
13.3.1 Instance Methods
method is a regular instance method, like what we’ve seen so far. It takes one parameter, self (although, as we saw above, it can accept more), which points to an instance of MyClass when the method is called.
Through the self parameter, an instance method can freely access attributes and other methods on the same object, which means it can modify the state of an object instance.
Not only can instance methods modify object state, they can also access the class itself through the self.__class__ attribute. This means instance methods can also modify class state.
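Here is a minimal sketch of an instance method changing both kinds of state. The Visitor class, its total class attribute and the check_in method are hypothetical names invented for this example.

```python
# Hypothetical class for illustration
class Visitor:
    total = 0  # class state, shared by all instances

    def __init__(self, name):
        self.name = name          # object (instance) state
        self.checked_in = False

    def check_in(self):
        self.checked_in = True        # modify object state
        self.__class__.total += 1     # modify class state


v1 = Visitor("Ann")
v2 = Visitor("Bob")
v1.check_in()
v2.check_in()

print(v1.checked_in)    # True
print(Visitor.total)    # 2
print(v1.total)         # 2: instances see the shared class attribute
```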
13.3.2 Class Methods
classmethod is a class method. It is marked with a @classmethod decorator to flag it as such.
Class methods take a cls parameter that points to the class, and not the object instance, when the method is called.
Because the class method only has access to this cls argument, it can’t modify object instance state, but it can still modify class state that applies across all instances of the class.
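As a sketch of what “class state that applies across all instances” means, consider the hypothetical Settings class below (the class, its units attribute and the use_units method are assumptions made for this example).

```python
# Hypothetical class for illustration
class Settings:
    units = "metric"  # class state

    @classmethod
    def use_units(cls, units):
        # cls is the class itself, so this changes the shared attribute
        cls.units = units


s1 = Settings()
s2 = Settings()
print(s1.units, s2.units)   # metric metric

Settings.use_units("imperial")
print(s1.units, s2.units)   # imperial imperial
```

Both existing instances see the change, because neither of them stores its own copy of units.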
13.3.3 Static Methods
staticmethod is a static method. It is marked with a @staticmethod decorator to flag it as such.
This type of method takes neither a self nor a cls parameter (but of course it’s free to accept an arbitrary number of other parameters).
Thus, a static method can neither modify object state nor class state. Static methods are restricted in what data they can access - and they’re primarily a way to namespace your methods.
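A small sketch of a static method used purely for namespacing. The Temperature class and its c_to_f helper are hypothetical and only illustrate the idea.

```python
# Hypothetical class for illustration
class Temperature:
    def __init__(self, celsius):
        self.celsius = celsius

    @staticmethod
    def c_to_f(celsius):
        # No self or cls: the result depends only on the argument
        return celsius * 9 / 5 + 32

    def fahrenheit(self):
        # An instance method can still call the static helper
        return self.c_to_f(self.celsius)


print(Temperature.c_to_f(100))        # 212.0
print(Temperature(0).fahrenheit())    # 32.0
```

The helper could live outside the class as a plain function; keeping it inside simply groups it with the code it belongs to.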
In summary:
Method type | Decorator | Modify object state | Modify class state |
---|---|---|---|
Instance | None | Yes | Yes |
Class | @classmethod | No | Yes |
Static | @staticmethod | No | No |
13.4 An example
13.4.1 Instance Methods
We’ll start by creating an instance of the class and then calling the three different methods on it. Each method’s implementation returns information (a tuple or, for the static method, a string) that lets us trace what’s going on, and which parts of the class or object the method can access.
obj = MyClass()
# The type
type(obj)
## <class '__main__.MyClass'>
type(obj).__name__
## 'MyClass'
# The class
obj.__class__.__name__
## 'MyClass'
13.4.2 Instance Method
obj.method()
## ('instance method called, self = ', <__main__.MyClass object at 0x7f98093b9438>)
dir(obj)
## ['__class__', '__delattr__', '__dict__', '__dir__', '__doc__', '__eq__', '__format__', '__ge__', '__getattribute__', '__gt__', '__hash__', '__init__', '__init_subclass__', '__le__', '__lt__', '__module__', '__ne__', '__new__', '__reduce__', '__reduce_ex__', '__repr__', '__setattr__', '__sizeof__', '__str__', '__subclasshook__', '__weakref__', 'classmethod', 'method', 'staticmethod']
import inspect
inspect.getmembers(obj)
## [('__class__', <class '__main__.MyClass'>), ('__delattr__', <method-wrapper '__delattr__' of MyClass object at 0x7f98093b9438>), ('__dict__', {}), ('__dir__', <built-in method __dir__ of MyClass object at 0x7f98093b9438>), ('__doc__', 'This is my new class'), ('__eq__', <method-wrapper '__eq__' of MyClass object at 0x7f98093b9438>), ('__format__', <built-in method __format__ of MyClass object at 0x7f98093b9438>), ('__ge__', <method-wrapper '__ge__' of MyClass object at 0x7f98093b9438>), ('__getattribute__', <method-wrapper '__getattribute__' of MyClass object at 0x7f98093b9438>), ('__gt__', <method-wrapper '__gt__' of MyClass object at 0x7f98093b9438>), ('__hash__', <method-wrapper '__hash__' of MyClass object at 0x7f98093b9438>), ('__init__', <method-wrapper '__init__' of MyClass object at 0x7f98093b9438>), ('__init_subclass__', <built-in method __init_subclass__ of type object at 0x7f97545f7d48>), ('__le__', <method-wrapper '__le__' of MyClass object at 0x7f98093b9438>), ('__lt__', <method-wrapper '__lt__' of MyClass object at 0x7f98093b9438>), ('__module__', '__main__'), ('__ne__', <method-wrapper '__ne__' of MyClass object at 0x7f98093b9438>), ('__new__', <built-in method __new__ of type object at 0x111dc90b8>), ('__reduce__', <built-in method __reduce__ of MyClass object at 0x7f98093b9438>), ('__reduce_ex__', <built-in method __reduce_ex__ of MyClass object at 0x7f98093b9438>), ('__repr__', <method-wrapper '__repr__' of MyClass object at 0x7f98093b9438>), ('__setattr__', <method-wrapper '__setattr__' of MyClass object at 0x7f98093b9438>), ('__sizeof__', <built-in method __sizeof__ of MyClass object at 0x7f98093b9438>), ('__str__', <method-wrapper '__str__' of MyClass object at 0x7f98093b9438>), ('__subclasshook__', <built-in method __subclasshook__ of type object at 0x7f97545f7d48>), ('__weakref__', None), ('classmethod', <bound method MyClass.classmethod of <class '__main__.MyClass'>>), ('method', <bound method MyClass.method of <__main__.MyClass object at 
0x7f98093b9438>>), ('staticmethod', <function MyClass.staticmethod at 0x7f97f9ffb6a8>)]
This confirms that method (the instance method) has access to the object instance (printed as <__main__.MyClass object at 0x...>) via the self argument.
When the method is called, Python replaces the self argument with the instance object, obj. We could ignore the syntactic sugar of the dot-call syntax (obj.method()) and pass the instance object manually to get the same result:
# Instead of ...
# obj = MyClass()
# obj.method()
# ... we could have just used:
MyClass.method(obj)
## ('instance method called, self = ', <__main__.MyClass object at 0x7f98093b9438>)
Exercise 13.1 (Calling a method without an instance) Can you guess what would happen if you tried to call the method without first creating an instance?
objNew.method()
MyClass.method(objNew)
Instance methods can also access the class itself through the self.__class__ attribute. This makes instance methods powerful in terms of access restrictions: they can modify state on the object instance and on the class itself.
13.4.3 Class Method
obj.classmethod()
## ('class method called cls = ', <class '__main__.MyClass'>)
Calling classmethod() showed us that it doesn’t have access to the instance object (<__main__.MyClass object at 0x...>), but only to the <class '__main__.MyClass'> object, representing the class itself. Remember, everything in Python is an object, even classes themselves.
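The claim that classes are themselves objects can be checked directly. The sketch below uses an empty stand-in for MyClass so that it runs on its own; the name Alias is hypothetical.

```python
class MyClass:
    """Minimal stand-in for the class above"""
    pass


print(isinstance(MyClass, object))   # True: the class is an object
print(type(MyClass))                 # <class 'type'>

# Like any other object, a class can be bound to another name
Alias = MyClass
obj = Alias()
print(type(obj) is MyClass)          # True
```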
Notice how Python automatically passes the class as the first argument to the function when we call obj.classmethod() or MyClass.classmethod(). Calling a method in Python through the . notation triggers this behavior. The self parameter on instance methods works the same way.
Recall that naming these parameters self and cls is just a convention. You could just as easily name them the_object and the_class and get the same result. All that matters is that they’re positioned first in the parameter list for the method.
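A quick sketch confirms this. The class below mirrors MyClass but uses the names the_object and the_class; the method name class_method is a hypothetical choice made here to avoid reusing the built-in name.

```python
class MyClass:
    def method(the_object):
        # the_object plays the role usually given to self
        return 'instance method called', the_object

    @classmethod
    def class_method(the_class):
        # the_class plays the role usually given to cls
        return 'class method called', the_class


obj = MyClass()
print(obj.method()[1] is obj)                  # True
print(MyClass.class_method()[1] is MyClass)    # True
```

It works, but sticking to self and cls keeps the code readable for everyone else.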
13.4.4 Static Method
obj.staticmethod()
## 'A static method was called, there are no parameters'
Did you see how we called staticmethod() on the object and were able to do so successfully? Some developers are surprised when they learn that it’s possible to call a static method on an object instance.
Behind the scenes Python simply enforces the access restrictions by not passing in the self or the cls argument when a static method gets called using the dot syntax.
This confirms that static methods can neither access the object instance state nor the class state. They work like regular functions but belong to the class’s (and every instance’s) namespace.
So let’s take a look at what happens when we attempt to call these methods on the class itself - without creating an object instance beforehand:
MyClass.method()
MyClass.classmethod()
## ('class method called cls = ', <class '__main__.MyClass'>)
MyClass.staticmethod()
## 'A static method was called, there are no parameters'
You can call classmethod() and staticmethod(), but calling the instance method method() fails with a TypeError.
This is to be expected. We didn’t create an object instance and tried calling an instance method directly on the class blueprint itself. This means there is no way for Python to populate the self argument and therefore the call fails.
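The failure can be observed safely by catching the exception. This sketch uses a cut-down copy of MyClass so that it runs on its own; the variable raised is a hypothetical helper, and the exact wording of the error message depends on the Python version.

```python
class MyClass:
    def method(self):
        return 'instance method called, self = ', self


try:
    MyClass.method()      # no instance, so nothing can fill `self`
    raised = False
except TypeError as err:
    raised = True
    print(f"TypeError: {err}")

# Supplying an instance explicitly makes the same call work
obj = MyClass()
print(MyClass.method(obj)[1] is obj)   # True
```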
Let’s look at some examples of when to use these special method types.
13.5 A bare-bones example
OK, so how does OOP make our lives easier? That is, why bother? Let’s take a look at a basic example that uses instance and class methods. We’re going to leave static methods out of the picture, since we understand that they are just like regular functions that belong to a specific class’s namespace.
class Cake:
    def __init__(self, ingredients):
        self.ingredients = ingredients
    def __repr__(self):
        # Pre-Python 3.6 string formatting
        return 'Cake(%r)' % self.ingredients
        # With an f-string (Python 3.6+):
        # return f'Cake({self.ingredients!r})'
With classmethod
Cake(['flour', 'sugar', 'eggs'])
## Cake(['flour', 'sugar', 'eggs'])
Cake(['flour'*4, 'sugar', 'eggs'])
## Cake(['flourflourflourflour', 'sugar', 'eggs'])
Cake(['cornmeal', 'honey', 'hazelnuts'])
## Cake(['cornmeal', 'honey', 'hazelnuts'])
We can give the users of our Cake class a better interface for creating the cake objects they crave.
We do that by using class methods as factory functions for the different kinds of cakes we can create:
class Cake:
    def __init__(self, ingredients):
        self.ingredients = ingredients
    def __repr__(self):
        return f'Cake({self.ingredients!r})'
    @classmethod
    def chocolate(cls):
        return cls(['chocolate', 'flour', 'sugar'])
    @classmethod
    def cornmeal(cls):
        return cls(['cornmeal', 'honey', 'hazelnuts'])
Note how I’m using the cls argument in the chocolate and cornmeal class methods instead of calling the Cake constructor directly. Here, the class methods are factory methods: like a factory, they keep producing new objects on demand.
This prevents repetition of code. If we decide to rename this class at some point, we won’t have to remember to update the constructor name in all of the classmethod factory functions.
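Using cls rather than the hard-coded class name has one further consequence, sketched here with a hypothetical subclass Cupcake that is not part of the original example: an inherited factory method builds an instance of whichever class it was called on. For the output to show this, __repr__ has been changed in this sketch to print the actual class name.

```python
class Cake:
    def __init__(self, ingredients):
        self.ingredients = ingredients

    def __repr__(self):
        # Changed for this sketch: report the real class name
        return f'{type(self).__name__}({self.ingredients!r})'

    @classmethod
    def chocolate(cls):
        # cls is whichever class the method was called on
        return cls(['chocolate', 'flour', 'sugar'])


class Cupcake(Cake):  # hypothetical subclass, for illustration only
    pass


print(Cake.chocolate())      # Cake(['chocolate', 'flour', 'sugar'])
print(Cupcake.chocolate())   # Cupcake(['chocolate', 'flour', 'sugar'])
```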
What can we do with these factory methods? Let’s try them out:
Cake.chocolate()
## Cake(['chocolate', 'flour', 'sugar'])
Cake(['chocolate', 'flour'])
## Cake(['chocolate', 'flour'])
We can use the factory functions to create new Cake objects that are configured the way we want them. They all use the same __init__ constructor internally and simply provide a shortcut, so that we don’t have to remember all of the various ingredients.
Another way to look at this use of class methods is that they allow you to define alternative constructors for your classes. Python only allows one __init__ method per class. Using class methods, it’s possible to add as many alternative constructors as necessary. This can make the interface for your classes self-documenting (to a certain degree) and simplify their usage.
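As a sketch of an alternative constructor that does more than fix the ingredients, the hypothetical from_string class method below (not part of the original example) builds a Cake from a comma-separated string.

```python
class Cake:
    def __init__(self, ingredients):
        self.ingredients = ingredients

    def __repr__(self):
        return f'Cake({self.ingredients!r})'

    @classmethod
    def from_string(cls, text):
        # Hypothetical alternative constructor: parse "a, b, c" into a list
        return cls([item.strip() for item in text.split(',')])


# The regular constructor and the alternative one give the same kind of object
print(Cake(['cornmeal', 'honey', 'hazelnuts']))
print(Cake.from_string('cornmeal, honey, hazelnuts'))
# Both print: Cake(['cornmeal', 'honey', 'hazelnuts'])
```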
Let’s break this down a bit further. In the version below, when I call the .chocolate() method, it automatically calls the class method .put_together():
class Cake:
    @classmethod
    def put_together(cls, data):
        return data*2
    @classmethod
    def chocolate(cls):
        return cls.put_together(['chocolate', 'flour', 'sugar'])
Cake.chocolate()
## ['chocolate', 'flour', 'sugar', 'chocolate', 'flour', 'sugar']
I can add another class method that runs the whole sequence for me, instead of having to call the .chocolate() method myself. That’s the run() method here:
class Cake:
    @classmethod
    def put_together(cls, data):
        return data*2
    @classmethod
    def chocolate(cls):
        return cls.put_together(['chocolate', 'flour', 'sugar'])
    @classmethod
    def run(cls):
        result = cls.chocolate()
        print(f"the ingredients are {result}")
Cake.run()
## the ingredients are ['chocolate', 'flour', 'sugar', 'chocolate', 'flour', 'sugar']
13.6 The Journey so far
- Instance methods need a class instance and can access the instance through self.
- Class methods don’t need a class instance. They can’t access the instance (self) but they have access to the class itself via cls.
- Static methods don’t have access to cls or self. They work like regular functions but belong to the class’s namespace.
- Static and class methods communicate and (to a certain degree) enforce developer intent about class design. This can have maintenance benefits.