Office hours from 10-12, or by appointment
Glad to see a few people stop by today!
I know that many of you are in the Math Lecture (and likely also the LaTeX lecture). If you want to meet via Zoom at some other time, email me.
Optional readings are posted on the syllabus
Practice is key!
Monday: Intro and Coding Basics
Tuesday: Basics Continued and Control Flow
Today: Object Oriented Programming and Functions
Thursday: Data Analysis and APIs
Friday: Web Scraping and Text-as-Data
Write a program that checks whether a number, input by the user, is even or not.
Update your program to check which numbers from 1 to 10 are even. For each number, also indicate whether it is divisible by 3.
For each number, your output should clearly distinguish among four cases: even and divisible by 3, even only, divisible by 3 only, and divisible by neither.
```python
for number in range(1, 11):
    if number % 2 == 0:
        print(f"{number} is even!")
        if number % 3 == 0:  # Nest this if inside the first if
            print(f"{number} is also divisible by 3!")
    elif number % 3 == 0:  # note the indentation here!
        print(f"{number} is only divisible by 3!")
    else:  # note the indentation here!
        print(f"{number} is not divisible by 2 or 3!")
```

```
1 is not divisible by 2 or 3!
2 is even!
3 is only divisible by 3!
4 is even!
5 is not divisible by 2 or 3!
6 is even!
6 is also divisible by 3!
7 is not divisible by 2 or 3!
8 is even!
9 is only divisible by 3!
10 is even!
```
R and Python handle nesting and conditionality differently: R marks blocks with curly braces, while Python uses indentation.
If I wanted to write the same loop in R, it would look like this:

```r
for (number in 1:10) {
  if (number %% 2 == 0) {
    cat(number, "is even!\n")
    if (number %% 3 == 0) {  # Nested inside the first if
      cat(number, "is also divisible by 3!\n")
    }
  } else if (number %% 3 == 0) {  # else-if at the same level as the first if
    cat(number, "is only divisible by 3!\n")
  } else {  # else at the same level as the first if
    cat(number, "is not divisible by 2 or 3!\n")
  }
}
```

I think the Python version is more readable!
Branching with if, elif and else
While and For loops for iteration
Lists
Structure
Methods
List methods update the list in place, changing the object in memory.
This is because lists are mutable objects, so methods can change them.
String methods (and methods on other immutable types) do not change the object in memory; instead, they return a new object. Let’s take a look in Colab.
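A minimal sketch of the difference, using a made-up list and string: `.append()` changes the list itself and returns `None`, while `.upper()` leaves the original string alone and hands back a new one.

```python
# List methods modify the list in place
colors = ["red", "green"]
result = colors.append("blue")  # changes colors; append itself returns None
print(colors)   # ['red', 'green', 'blue']
print(result)   # None

# String methods return a new string; the original is unchanged
word = "python"
shouted = word.upper()
print(word)     # python
print(shouted)  # PYTHON
```

This is why we write `papers[i] = papers[i].lower()` rather than just `papers[i].lower()`: the new string has to be stored somewhere.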
Yesterday we wrote a loop that lowercased a list of newspaper titles and then extended it with a list of TV news networks:
```python
papers = ["New York Times", "Washington Post", "Boston Globe", "Philadelphia Inquirer", "Atlanta Journal-Constitution"]

for i in range(len(papers)):
    papers[i] = papers[i].lower()  # note that `for paper in papers` doesn't update papers[i]

stations = ["Fox News", "CNN", "MSNBC", "Newsmax"]
papers.extend(stations)
papers
```

```
['new york times',
 'washington post',
 'boston globe',
 'philadelphia inquirer',
 'atlanta journal-constitution',
 'Fox News',
 'CNN',
 'MSNBC',
 'Newsmax']
```
This works, but the range(len(papers)) pattern is clunky. Python gives us a cleaner way.
enumerate() is a built-in function that takes an iterable and yields (index, value) pairs, counting from 0 by default. For our list of papers, that gives:
0: New York Times
1: Washington Post
and so on…
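A quick sketch of `enumerate()` on a shortened version of the `papers` list (three titles only, to keep the output short). Each item it yields is an `(index, value)` tuple, which we can unpack directly in the loop header.

```python
papers = ["New York Times", "Washington Post", "Boston Globe"]

# enumerate yields (index, value) tuples
pairs = list(enumerate(papers))
print(pairs)
# [(0, 'New York Times'), (1, 'Washington Post'), (2, 'Boston Globe')]

# Unpack each pair directly in the for statement
for i, paper in enumerate(papers):
    print(f"{i}: {paper}")
# 0: New York Times
# 1: Washington Post
# 2: Boston Globe
```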
Take a moment to think about how we might use enumerate to simplify our loop.
If we have a string of multiple words (e.g., a sentence, paragraph, or speech), string.split() breaks it into a list of pieces at a delimiter. By default, it splits on any run of whitespace (spaces, tabs, newlines) and drops empty pieces.
We can pass any delimiter we want, e.g., string.split(","). Let’s try it in Colab.
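A small sketch contrasting the two behaviours, on made-up strings: with no argument, `split()` treats any run of whitespace as one separator; with an explicit delimiter, it splits on exactly that string and keeps empty pieces.

```python
speech = "We  hold these\ttruths to be self-evident"

# Default: split on any run of whitespace (double spaces, tabs, newlines)
words = speech.split()
print(words)
# ['We', 'hold', 'these', 'truths', 'to', 'be', 'self-evident']

# Explicit delimiter: split only on that exact string
parts = "red,green,,blue".split(",")
print(parts)
# ['red', 'green', '', 'blue']  (the empty string comes from the two commas)
```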
Create the following sentence
Write a for loop (or while loop, if you want) that prints each word, with its index, one by one.
Hint: use .split() to get each word
Output should be something like:
Word 0: I
Word 1: truly
etc…
Dictionaries are objects in Python that consist of key:value pairs
For example, suppose we wanted to create a dictionary of employees and their salaries
keys
must be unique (i.e., can’t have Jane twice)
must be an immutable (hashable) type (int, string, float, bool, or a tuple of immutable values)
values
any type
Mike and Diego can have the same salary
can be lists, other dictionaries, etc.
In Python 3.7+, dictionaries maintain insertion order (sets still don’t)
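A sketch of the key and value rules above, with made-up data: a repeated key does not raise an error but silently overwrites the earlier value, values can be nested structures, and a list cannot be used as a key.

```python
# Duplicate keys: the later value silently replaces the earlier one
salaries = {"Jane": 100, "Mike": 80, "Jane": 120}
print(salaries)  # {'Jane': 120, 'Mike': 80}

# Values can be any type, including lists and other dictionaries
staff = {"Ali": [85, "analyst"], "Diego": {"salary": 80, "remote": True}}
print(staff["Diego"]["salary"])  # 80

# A tuple of immutable values works as a key...
offices = {("Ann Arbor", 3): "Room 301"}
print(offices[("Ann Arbor", 3)])  # Room 301

# ...but a list does not, because lists are mutable (unhashable)
list_key_failed = False
try:
    bad = {["Ann Arbor", 3]: "Room 301"}
except TypeError:
    list_key_failed = True
print(list_key_failed)  # True
```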
Create a dictionary using political parties as keys, and politicians who are members of that party as values. For example, you could have a key:value pair of Democrat: Kamala Harris or Labour: Keir Starmer.
.keys() returns the keys
.values() returns the values
.update() adds new key:value pairs or updates existing ones
```python
salaries = {"Jane": 100,
            "Mike": 80,
            "Ali": 85,
            "Diego": 80}

salaries["Mike"]  ## return Mike's salary
salaries["Mike"] = 90  ## Mike got a raise! Update his salary
salaries["Allison"] = 105  ## New Hire! Add a new Key:Value pair
print(salaries)
```

```
{'Jane': 100, 'Mike': 90, 'Ali': 85, 'Diego': 80, 'Allison': 105}
```
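A sketch of the three methods listed above, starting from the updated `salaries` dictionary. The extra employee "Priya" is made up for illustration. Note that `.update()` both changes an existing key and adds a new one in a single call.

```python
salaries = {"Jane": 100, "Mike": 90, "Ali": 85, "Diego": 80, "Allison": 105}

print(salaries.keys())    # dict_keys(['Jane', 'Mike', 'Ali', 'Diego', 'Allison'])
print(salaries.values())  # dict_values([100, 90, 85, 80, 105])

# .update() changes Ali's salary and adds a hypothetical new hire, Priya
salaries.update({"Ali": 95, "Priya": 110})
print(salaries)
# {'Jane': 100, 'Mike': 90, 'Ali': 95, 'Diego': 80, 'Allison': 105, 'Priya': 110}
```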
We can iterate over dictionaries. Consider the following dictionary, and write a for loop to print out each student’s grade.
Hint: grades[person] (or whatever you name your loop variable) will return that person’s grade.
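The dictionary from the slide is not reproduced here, so this sketch uses a hypothetical `grades` dictionary. Looping over a dictionary directly gives its keys; `.items()` gives `(key, value)` pairs, so no lookup is needed inside the loop.

```python
# Hypothetical grades, for illustration only
grades = {"Ana": "A", "Ben": "B+", "Chen": "A-"}

# Option 1: loop over the keys and look up each value
for person in grades:
    print(f"{person}: {grades[person]}")

# Option 2: loop over (key, value) pairs
for person, grade in grades.items():
    print(f"{person}: {grade}")
```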
Create a dictionary using any method of your choosing
Find the value(s) associated with the key in the second position
print all the keys
Create a second dictionary from a list
Add that dictionary to the original
Iterate over the dictionary to print “The key is [key] and the value is [value]” for each entry
If we have yet to take our first break, we should take a break now!
elif it’s been over an hour since our last break, this would also be a good time to take a break!
else let’s continue on with class!
Reusable pieces of code
functions are not run until they are called/invoked somewhere.
functions (usually) return something
def is the keyword used to define the function
the colon : starts the body of the function
the body follows the same indentation rules as loops and if statements
the name of the function comes after def
the parentheses () contain the parameters (arguments) of the function
the docstring, enclosed in triple quotes ("""), describes how to use the function
help(function_name) displays the docstring
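A sketch that puts the pieces above together in one small function. The name `divide` and its default value are made up for illustration.

```python
def divide(numerator, denominator=1):
    """Return numerator divided by denominator.

    denominator defaults to 1 if it is not supplied.
    """
    result = numerator / denominator  # body is indented under the def line
    return result

print(divide(10, 4))  # 2.5
print(divide(7))      # 7.0 (uses the default denominator)
help(divide)          # prints the signature and the docstring
```

Nothing inside `divide` runs until the function is called on the last three lines.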
Scope determines which variable names are visible in which environment
Usually, we are in the global scope
when we enter a function, a new scope (frame, or environment) is created
unless we refer to a name from the global environment, the variables we use live in the local environment of the function call
Once the function exits, its local variables are discarded
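A minimal sketch of local versus global scope, with made-up names: the function can read the global `rate`, but the `tax` it creates exists only during the call.

```python
rate = 0.1  # defined in the global scope

def add_tax(price):
    tax = price * rate  # rate is looked up in the global scope
    return price + tax  # tax lives only in this call's local scope

total = add_tax(50)
print(total)  # 55.0

try:
    print(tax)  # the local variable was discarded when add_tax returned
except NameError:
    print("tax is not defined in the global scope")
```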
What will this print?
What will this print?
return can only be used inside of a function
functions can have multiple returns
only one of them will be used each time a function is invoked
Once return is hit, the function’s scope is exited and nothing else in the function is run
Let’s try this out in Colab.
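A sketch of a function with several `return` statements, using a made-up `classify` function. Only one return runs per call, and the function stops as soon as it reaches it.

```python
def classify(number):
    if number < 0:
        return "negative"  # the function exits here for negative input
    if number == 0:
        return "zero"
    return "positive"
    print("never runs")    # unreachable: nothing after a return is executed

print(classify(-3))  # negative
print(classify(0))   # zero
print(classify(8))   # positive
```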
Write a function to check whether or not a number is divisible by 6. Test that function on the following numbers: 17, 64, 108, 157, 200.
Then, add an additional condition to check if the number is also divisible by 9.
Run this function on some quote you find (or make up). What does it return? How might it be useful?
What would you include as a docstring?
Imagine we had a dictionary of employees and their salaries at some company, and we wanted to extract a new dictionary of high earners (specifically, those earning more than the mean salary at the company).
Write a function to do this. Then, test it out with a dictionary you create.
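One possible sketch, not the only approach: the function name `high_earners` is hypothetical, and it uses a plain loop (rather than a comprehension) so it only relies on tools covered so far. It is tested on the `salaries` dictionary from earlier, whose mean is 460 / 5 = 92.

```python
def high_earners(salaries):
    """Return a new dict of employees paid more than the mean salary."""
    average = sum(salaries.values()) / len(salaries)
    result = {}
    for name, pay in salaries.items():
        if pay > average:  # strictly above the mean
            result[name] = pay
    return result

salaries = {"Jane": 100, "Mike": 90, "Ali": 85, "Diego": 80, "Allison": 105}
print(high_earners(salaries))  # {'Jane': 100, 'Allison': 105}
```

The original `salaries` dictionary is left unchanged; the function builds and returns a new one.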
If we have yet to take our second break, we should take a break now!
else let’s continue on with class!
Python modules are .py files that (mainly) contain function definitions
they let us organize and distribute our code, and share and reuse others’ code
they keep code coherent and self-contained
we can import a whole module, or only specific functions from it
The standard library already contains a bunch of useful modules. For example, we can load the math module.
Instead of writing our own power function
We can import a module that has already defined this function
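A sketch of both routes. The hand-written `power` function is made up for comparison and assumes a non-negative integer exponent; `math.pow` and `math.sqrt` come from the standard library and always return floats.

```python
import math

# A hand-written power function (assumes a non-negative integer exponent)
def power(base, exponent):
    result = 1
    for _ in range(exponent):
        result *= base
    return result

print(power(2, 10))     # 1024
print(math.pow(2, 10))  # 1024.0 (math.pow returns a float)
print(math.sqrt(16))    # 4.0
```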
There are tons of useful modules in the standard library
Today's date is: 2026-06-05
Here, I don’t necessarily want to import the whole datetime module, so I can instead just import date.
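A sketch of how a line like the one above can be produced (the date printed will be whenever you run it). Importing only `date` lets us write `date.today()`; with `import datetime`, the same call would be `datetime.date.today()`.

```python
from datetime import date  # import just the date class, not the whole module

today = date.today()
print(f"Today's date is: {today}")  # printed in YYYY-MM-DD format
print(today.year, today.month, today.day)
```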
The standard modules come with your Python install. Colab also has many common libraries installed. Tomorrow we will cover how to install other libraries!
Shorthand syntax that can replace simple for loops (optionally with if conditions) that build a new collection
Can be used for lists, sets and dictionaries
Can make code shorter and easier to read
In general, a comprehension will look something like this
As a loop, it might look like this
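A concrete sketch of the general pattern `[expression for item in iterable if condition]` next to its loop equivalent. The specific expression and condition here are arbitrary choices for illustration.

```python
# Comprehension: [expression for item in iterable if condition]
multiples = [x * 10 for x in range(10) if x % 3 == 0]

# The same logic written as a loop
multiples_loop = []
for x in range(10):
    if x % 3 == 0:
        multiples_loop.append(x * 10)

print(multiples)       # [0, 30, 60, 90]
print(multiples_loop)  # [0, 30, 60, 90]
```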
Simple example: create a copy of a list

```python
numbers = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
new_list = []
for number in numbers:
    new_list.append(number)
print(new_list)
```

```
[1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
```
Simpler with a list comprehension
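A sketch of the copy loop rewritten as a comprehension: the loop body collapses into the expression at the front of the brackets.

```python
numbers = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
new_list = [number for number in numbers]
print(new_list)  # [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
```

For a plain copy, `list(numbers)` or `numbers.copy()` would also work; the comprehension form pays off once the expression does something to each item.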
Square every item in a list, creating a new list of squared numbers
```python
numbers = [2, 3, 4, 5]
new_list = []
for number in numbers:
    new_list.append(number**2)
print(new_list)
```

```
[4, 9, 16, 25]
```
With a list comprehension
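A sketch of the squaring loop as a comprehension, on the same `numbers` list:

```python
numbers = [2, 3, 4, 5]
new_list = [number**2 for number in numbers]
print(new_list)  # [4, 9, 16, 25]
```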
Example: square only the even numbers

```python
some_squared = []
for num in numbers:
    if num % 2 == 0:
        some_squared.append(num**2)
print(some_squared)
```

```
[4, 16]
```
With a list comprehension
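A sketch of the filtered version: the `if` from the loop moves to the end of the comprehension, and only items that pass it are squared.

```python
numbers = [2, 3, 4, 5]
some_squared = [num**2 for num in numbers if num % 2 == 0]
print(some_squared)  # [4, 16]
```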
Write your own list comprehension for a list of strings that creates a new list of only long words (words of more than 5 characters).
zip() pairs up the elements of two (or more) lists by index, which is useful when the lists match up by position. For example, consider
```python
countries = ["USA", "Canada", "Mexico"]
capitals = ["Washington, DC.", "Ottawa", "Mexico City"]
merged = zip(countries, capitals)
print(merged)
```

```
<zip object at 0xffcd5847be80>
```
So, this is a bit odd: zip() returns a zip object (an iterator) rather than a list. How can we make this useful?
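A sketch of two common ways to use the zip object: convert it to a list of tuples, or loop over it directly. A zip object can only be consumed once, so the second `list(merged)` below comes back empty.

```python
countries = ["USA", "Canada", "Mexico"]
capitals = ["Washington, DC.", "Ottawa", "Mexico City"]

merged = zip(countries, capitals)
pairs = list(merged)  # materialize the pairs as a list of tuples
print(pairs)
# [('USA', 'Washington, DC.'), ('Canada', 'Ottawa'), ('Mexico', 'Mexico City')]
print(list(merged))   # [] because the zip object is already used up

# Or loop over a fresh zip object directly
for country, capital in zip(countries, capitals):
    print(f"The capital of {country} is {capital}")
```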
Note: pandas has nicer options for joining that aren’t just by index, so we aren’t limited to having indexes match!
We might wish to combine the two lists into a dictionary
Extract just the USA key:value pair (Get the capital of the USA)
{'USA': 'Washington, DC.'}
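One way to get there, sketched under the assumption that we start from the two lists above: `dict()` turns the zipped pairs into a dictionary, and a dictionary comprehension with an `if` keeps only the USA entry.

```python
countries = ["USA", "Canada", "Mexico"]
capitals = ["Washington, DC.", "Ottawa", "Mexico City"]

# dict() turns the (key, value) pairs from zip into a dictionary
country_capitals = dict(zip(countries, capitals))
print(country_capitals)
# {'USA': 'Washington, DC.', 'Canada': 'Ottawa', 'Mexico': 'Mexico City'}

# A dictionary comprehension keeps only the pairs that pass the condition
usa_only = {country: capital
            for country, capital in country_capitals.items()
            if country == "USA"}
print(usa_only)  # {'USA': 'Washington, DC.'}
```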
Take one of our loops from yesterday (or come up with your own), and re-implement it using a comprehension
NumPy is short for numerical Python. It’s a foundational package for data analysis in Python, and many packages depend on numpy arrays as a data type.
Many other libraries, like pandas, are built on NumPy’s functions and data structures. Machine learning frameworks like TensorFlow also build on this infrastructure.
The most common and useful data structure in NumPy is the array. Arrays are structured objects in which every element has the same type.
NumPy is not built into Python (it is a separate package), so we need to import it
by convention, we import numpy as np
NumPy arrays are useful because they store multidimensional data very efficiently. Operations on arrays are often 10 to 100 times faster than equivalent pure-Python code, such as loops over lists.
We can check the shape of data stored in an array
Or the type of data stored in the array
We can make an array of zeros with arbitrarily many dimensions like so
Or an ordered array from 0 to 19 like this
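A sketch covering the import convention and the four operations above, on a small made-up array: `.shape` and `.dtype` are attributes (no parentheses), and `np.zeros` takes the shape as a tuple.

```python
import numpy as np  # conventional alias

data = np.array([[1.5, 2.0, 3.1],
                 [4.0, 5.2, 6.3]])
print(data.shape)   # (2, 3): 2 rows, 3 columns
print(data.dtype)   # float64

zeros = np.zeros((2, 3, 4))  # a 2 x 3 x 4 array filled with 0.0
print(zeros.shape)  # (2, 3, 4)

ordered = np.arange(20)      # the integers 0 through 19
print(ordered)
```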
As with lists, we can slice and index arrays. Let’s start with a 1D array. As with other data types, arrays are zero-indexed in Python
We can update the values of arrays
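A sketch of indexing, slicing, and updating a 1D array. Assigning a single number to a slice sets every element in that slice.

```python
import numpy as np

arr = np.arange(10)  # [0 1 2 3 4 5 6 7 8 9]
print(arr[0])        # 0 (zero-indexed)
print(arr[5:8])      # [5 6 7]

arr[5:8] = 12        # assign one value to every element of the slice
print(arr)           # [ 0  1  2  3  4 12 12 12  8  9]
```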
You do need to be careful with slicing arrays. Even if you save a slice of an array to a new object, that slice is a view of the original array, not a copy. If you change values in the new object, the original array changes too.
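A sketch of the pitfall and the usual fix: changing a saved slice also changes the original array, while `.copy()` gives an independent array.

```python
import numpy as np

arr = np.arange(10)
arr_slice = arr[5:8]   # a view: shares memory with arr
arr_slice[1] = 12345   # modifying the view...
print(arr)             # ...also changes arr: index 6 is now 12345

safe = arr[5:8].copy() # .copy() makes an independent array
safe[:] = 0            # changing the copy...
print(arr[5:8])        # ...leaves arr alone: still 5, 12345, 7
```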
We can create arrays of arbitrarily many dimensions, although if things are getting very complex we might want to think about other ways to store our data
```
array([[[ 1,  2,  3],
        [ 4,  5,  6]],

       [[ 7,  8,  9],
        [10, 11, 12]]])
```
Pay close attention to the placement of the brackets. Easy to mess this up!
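A sketch that builds the 3D array shown above, first from nested lists and then from a flat range with `.reshape()`, which avoids counting brackets by hand. The outermost brackets hold two blocks, each block holds two rows, and each row holds three values.

```python
import numpy as np

arr3d = np.array([[[1, 2, 3], [4, 5, 6]],
                  [[7, 8, 9], [10, 11, 12]]])
print(arr3d.shape)  # (2, 2, 3): 2 blocks, each with 2 rows and 3 columns
print(arr3d[1, 0])  # [7 8 9]: second block, first row

# The same array built from 1..12 and reshaped
same = np.arange(1, 13).reshape(2, 2, 3)
print((same == arr3d).all())  # True
```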
Tomorrow — Data analysis with pandas, plus accessing APIs
Questions: come to office hours (10 AM – 12 PM daily), or email me
Recommended reading: McKinney, Python for Data Analysis, Ch. 3–7
Slides will be posted after class on Canvas and at will-horne.github.io/icpsr-2026