Intro to Python Day 3: Functions, Modules, and NumPy

Will Horne

Reminders

Office hours from 10-12, or by appointment

Glad to see a few people stop by today!

I know that many of you are in the Math Lecture (and likely also the LaTeX lecture). If you want to meet via Zoom at some other time, email me.

Optional readings are posted on the syllabus

Practice is key!

Course Outline

Monday: Intro and Coding Basics

Tuesday: Basics Continued and Control Flow

Today: Functions, Modules, and NumPy

Thursday: Data Analysis and APIs

Friday: Web Scraping and Text-as-Data

Review

Write a program that checks whether a number, input by the user, is even or not.

An Answer

number = int(input("Enter a number: "))  # read the user's input as an integer

if number % 2 == 0:
    print(f"{number} is even!")
else:
    print(f"{number} is odd!")

More Review

Update your program to check which numbers from 1 to 10 are even. For each number, also indicate whether it is divisible by 3.

For each number, your output should clearly distinguish among four cases:

divisible by 2 only
divisible by both 2 and 3
divisible by 3 only
divisible by neither

An Answer

for number in range(1, 11):
    if number % 2 == 0:
        print(f"{number} is even!")
        if number % 3 == 0:  # Nest this if inside the first if
            print(f"{number} is also divisible by 3!")
    elif number % 3 == 0:    # note the indentation here!
        print(f"{number} is only divisible by 3!")
    else:                    # note the indentation here!
        print(f"{number} is not divisible by 2 or 3!")

1 is not divisible by 2 or 3!
2 is even!
3 is only divisible by 3!
4 is even!
5 is not divisible by 2 or 3!
6 is even!
6 is also divisible by 3!
7 is not divisible by 2 or 3!
8 is even!
9 is only divisible by 3!
10 is even!

An indentation tip for R users

R and Python have different approaches to dealing with nesting and conditionality.

If I wanted to write the same for loop in R, it would look like this:

for (number in 1:10) {
  if (number %% 2 == 0) {
    cat(number, "is even!\n")
    if (number %% 3 == 0) {  # Nested inside the first if
      cat(number, "is also divisible by 3!\n")
    }
  } else if (number %% 3 == 0) {  # else-if at the same level as the first if
    cat(number, "is only divisible by 3!\n")
  } else {  # else at the same level as the first if
    cat(number, "is not divisible by 2 or 3!\n")
  }
}

I think the Python version is more readable!

Key concepts from Last Time

Branching with if, elif and else
While and For loops for iteration
Lists
- Structure
- Methods

Strings are Immutable Objects

List methods were updating the list in memory.

This is because lists are mutable objects, so we can change them with methods.

String methods (and methods applied to other immutables) do not update the object in memory — instead they return a new object. Let’s take a look in Colab.
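As a small sketch of the difference (the variable names here are made up for illustration): a list method changes the list itself, while a string method hands back a new string and leaves the original alone.

```python
# Lists are mutable: .append() changes the list in place
names = ["Alice", "Bob"]
names.append("Charlie")
print(names)        # ['Alice', 'Bob', 'Charlie']

# Strings are immutable: .upper() returns a new string
greeting = "hello"
shouted = greeting.upper()
print(greeting)     # hello  (unchanged)
print(shouted)      # HELLO

# To keep the result under the old name, assign it back
greeting = greeting.upper()
print(greeting)     # HELLO
```

Note that the last step does not modify the original string; it points the name greeting at a new one.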

Enumerate

enumerate is a built-in function that takes an iterable and returns index-value pairs.

for idx, name in enumerate(["Alice", "Bob", "Charlie"]):
    print(idx, name)

0 Alice
1 Bob
2 Charlie

Use enumerate when your loop body needs both the index and the value at the same time. If you only need one, a simpler pattern will do.

When Enumerate is Useful

Print numbered rankings of presidents:

presidents = ["Washington", "Adams", "Jefferson", "Madison"]

for i, president in enumerate(presidents, start=1):
    print(f"#{i}: {president}")

#1: Washington
#2: Adams
#3: Jefferson
#4: Madison

Without enumerate, you’d loop over range(len(presidents)) and look up each name with presidents[i], which is messier and more repetitive. Or you’d track your own counter with count = 0 and count += 1, which is clunky.

start=1 shifts the numbering so it starts at 1 instead of 0.
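For comparison, here is a rough sketch of the two alternatives to enumerate mentioned on this slide. Both print the same rankings as the enumerate version, just with more bookkeeping. The counter starts at 1 here to match start=1.

```python
presidents = ["Washington", "Adams", "Jefferson", "Madison"]

# Option 1: range(len(...)) and index back into the list
for i in range(len(presidents)):
    print(f"#{i + 1}: {presidents[i]}")

# Option 2: track your own counter
count = 1
for president in presidents:
    print(f"#{count}: {president}")
    count += 1
```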

A relevant string method

If we have a string of multiple words (e.g., a sentence, paragraph, or speech), string.split() breaks it into a list of pieces. By default it splits on any run of whitespace (spaces, tabs, newlines).

We can specify whatever delimiter we want — let’s try in Colab.
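A quick sketch of what this looks like, using made-up strings. One detail worth noticing: calling .split() with no argument is not quite the same as .split(" ").

```python
text = "Practice is   key!"        # note the extra spaces

# Default: split on any run of whitespace
words = text.split()
print(words)     # ['Practice', 'is', 'key!']

# Explicit single space: every space counts, so empty strings appear
pieces = text.split(" ")
print(pieces)    # ['Practice', 'is', '', '', 'key!']

# Any other delimiter works the same way
record = "Jane,100,Sales"          # made-up comma-separated record
fields = record.split(",")
print(fields)    # ['Jane', '100', 'Sales']
```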

Another Enumerate Example

Create the following sentence

sentence = "I truly love learning Python!"

Write a for loop (or while loop, if you want) that prints each word, with its index, one by one.

Hint: use .split() to get each word

Output should be something like:

Word 0: I

Word 1: truly

etc…

Dictionaries

Dictionaries are objects in Python that consist of key:value pairs

For example, suppose we wanted to create a dictionary of employees and their salaries

salaries = {"Jane":100,
"Mike": 80,
"Ali": 85,
"Diego": 80}

print(salaries)

{'Jane': 100, 'Mike': 80, 'Ali': 85, 'Diego': 80}

Keys and Values

keys
- must be unique (i.e., can’t have Jane twice)
- must be an immutable (hashable) type (int, string, float, bool, tuple)
values
- any type
- Mike and Diego can have the same salary
- can be lists, other dictionaries, etc.
In Python 3.7+, dictionaries maintain insertion order (sets still don’t)
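The rules for keys can be checked directly. The sketch below uses made-up data: a tuple works as a key, a list does not, and repeating a key silently keeps only the last value.

```python
# A tuple is immutable, so it works as a key
office_hours = {("Monday", 10): "Will"}     # made-up example data
print(office_hours[("Monday", 10)])         # Will

# A list is mutable, so using one as a key raises TypeError
list_key_failed = False
try:
    bad = {["Monday", 10]: "Will"}
except TypeError as err:
    list_key_failed = True
    print("TypeError:", err)

# Repeating a key does not create a duplicate; the last value wins
salaries = {"Jane": 100, "Jane": 120}
print(salaries)                             # {'Jane': 120}
```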

Using Dictionaries

salaries = {"Jane":100,
"Mike": 80,
"Ali": 85,
"Diego": 80}

salaries["Mike"] ## return Mike's salary
salaries["Mike"] = 90 ## Mike got a raise! Update his salary
salaries["Allison"] = 105 ## New Hire! Add a new Key:Value pair

print(salaries)

{'Jane': 100, 'Mike': 90, 'Ali': 85, 'Diego': 80, 'Allison': 105}

You Try

Create a Dictionary using political parties as keys, and politicians who are members of that party as values. For example, you could have a key value pair of Democrat: Kamala Harris or Labour: Keir Starmer.

Dictionary Methods

.keys() returns the keys
.values() returns the values
.items() returns (key, value) pairs
.get(key, default) returns default if the key is missing
.update() merges another dictionary into this one
.pop(key) removes a key and returns its value

Let’s see each in action.

.keys() and .values()

salaries = {"Jane": 100, "Mike": 80, "Ali": 85, "Diego": 80}

print(salaries.keys())
print(salaries.values())

dict_keys(['Jane', 'Mike', 'Ali', 'Diego'])
dict_values([100, 80, 85, 80])

.items()

Returns each (key, value) pair — cleaner than looking up each value by key

for name, salary in salaries.items():
    print(f"{name} makes ${salary}")

Jane makes $100
Mike makes $80
Ali makes $85
Diego makes $80

.get()

Safe key access. Returns None (or a supplied default) if the key isn’t there, instead of raising KeyError.

print(salaries["Jane"])            # 100
print(salaries.get("Jane"))        # 100
print(salaries.get("Ghost"))       # None — no crash
print(salaries.get("Ghost", 0))    # 0

100
100
None
0

Use .get() when you’re not sure the key exists. Common with messy real-world data.
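One common use, sketched here with made-up survey responses, is tallying: .get(key, 0) lets us add 1 to a count even when the key has not been seen yet.

```python
# Made-up survey responses
responses = ["Democrat", "Republican", "Democrat", "Independent", "Democrat"]

counts = {}
for party in responses:
    # .get(party, 0) returns 0 the first time we see a party
    counts[party] = counts.get(party, 0) + 1

print(counts)
# {'Democrat': 3, 'Republican': 1, 'Independent': 1}
```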

.update()

Merges another dictionary into this one.

raises = {"Mike": 90, "Allison": 105}   # Mike got a raise; Allison is a new hire

salaries.update(raises)
print(salaries)

{'Jane': 100, 'Mike': 90, 'Ali': 85, 'Diego': 80, 'Allison': 105}

Existing keys are overwritten; new keys are added.

Adding to a list value

If a dictionary’s value is a list, you can grow it in place with list methods

sponsors = {"HR1": ["Sanders", "Warren"]}

sponsors["HR1"].append("Booker")
print(sponsors)

{'HR1': ['Sanders', 'Warren', 'Booker']}

If the key might not exist yet:

key = "HR2"
if key in sponsors:
    sponsors[key].append("Klobuchar")
else:
    sponsors[key] = ["Klobuchar"]

print(sponsors)

{'HR1': ['Sanders', 'Warren', 'Booker'], 'HR2': ['Klobuchar']}

.pop()

Removes a key and returns its value.

diego_salary = salaries.pop("Diego")
print(diego_salary)
print(salaries)

80
{'Jane': 100, 'Mike': 90, 'Ali': 85, 'Allison': 105}

Raises KeyError if the key doesn’t exist
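Like .get(), .pop() accepts an optional second argument that is returned when the key is missing. A short sketch, using a name ("Ghost") that is not in the dictionary:

```python
salaries = {"Jane": 100, "Mike": 90, "Ali": 85, "Allison": 105}

# With a default, a missing key returns the default instead of raising
ghost_salary = salaries.pop("Ghost", None)
print(ghost_salary)   # None

# Without a default, a missing key raises KeyError
try:
    salaries.pop("Ghost")
except KeyError as err:
    print("KeyError:", err)   # KeyError: 'Ghost'
```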

Iterating over a dictionary

We can iterate over dictionaries. Consider the following dictionary, and write a for loop to print out each student’s grade.

grades = {"Ali": "A+", "Bella": "A", "Will": "C-", "Sam": "B"}

Hint: grades[person] (or whatever index) will return that person’s grade.

Your Turn

Create a dictionary using any method of your choosing
Find the value(s) associated with the key in the second position
print all the keys
Create a second dictionary from a list
Add that dictionary to the original
Iterate over the dictionary to print “The key is [key] and the value is [value]” for each entry

Break?

If we have yet to take our first break, we should take a break now!

elif it’s been over an hour since our last break, this would also be a good time to take a break!

else let’s continue on with class!

Functions

Reusable pieces of code
- If you are going to do something over and over again, make it into a function!
functions are not run until they are called/invoked somewhere.
- Once you create a function, it’s saved in your (global) environment

Function characteristics

name
parameters/arguments (0 or more)
might have a docstring explaining use
has a body
returns something (None, if there is no return statement)

Example Function

def is_even(i):
    """
    Input: i, an integer
    Returns True if i is even, otherwise False
    """
    return i % 2 == 0

is_even(4)

True

Functions: Names and Arguments

def is the keyword used to define the function
- : triggers the body of the function
- same indentation rules
name of the function comes after def
() contains the parameters or arguments of the function

Functions: Docstring

the docstring, enclosed in triple quotes ("""), provides info on how to use the function
the docstring can be displayed with help()

def is_even(i):
    """
    Input: i, an integer
    Returns True if i is even, otherwise False
    """
    return i % 2 == 0

help(is_even)

Help on function is_even in module __main__:

is_even(i)
    Input: i, an integer
    Returns True if i is even, otherwise False

Caution! Variable Scope

Scope is the mapping of variable names to objects within an environment
Usually, we are in global scope
- when we enter a function, a new scope/frame/environment is created
- unless we call something from the global environment, all our variables are within the local environment of the function call
- Once the function exits, any intermediate values will be discarded
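A minimal sketch of the last point, using a hypothetical add_tip function: variables created inside the function exist only while it runs, so only the returned value survives.

```python
def add_tip(price):          # hypothetical function for illustration
    rate = 0.25              # local: exists only while add_tip runs
    total = price * (1 + rate)
    return total

result = add_tip(100)
print(result)                # 125.0

# rate and total were discarded when the function exited
try:
    print(rate)
except NameError as err:
    print("NameError:", err)
```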

Variable Scope: Lookup

def p(y):
    print(x)

x = 5

p(10)

What will this print?

def p(x):
    print(x)

x = 5

p(10)

What about this?

Variable Scope: Reassignment

x = 3
def square(x):
    x = x**2
    return x

z = square(x)
print(x)

What will this print?

Functions: Return

return can only be used inside of a function
- functions can have multiple returns
- only one of them will be used each time a function is invoked
  - Think conditional logic w/in function
Once return is hit, the function’s scope is exited and nothing else in the function is run

Returns in Functions

def check_number(number):
    if number > 0:
        return "positive"
    elif number < 0:
        return "negative"
    else:
        return "zero"

As soon as we hit return, the function will exit. Let’s try this out in colab.
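Here is one way to try it. The first part calls check_number on a few values. The second part uses a hypothetical helper, first_even, to show that return also cuts a loop short.

```python
def check_number(number):
    if number > 0:
        return "positive"
    elif number < 0:
        return "negative"
    else:
        return "zero"

for value in [7, -2, 0]:
    print(value, "is", check_number(value))
# 7 is positive
# -2 is negative
# 0 is zero

def first_even(numbers):     # hypothetical helper for illustration
    for n in numbers:
        if n % 2 == 0:
            return n         # exits immediately; the loop stops here
    return None              # only reached if no even number is found

print(first_even([3, 5, 8, 10]))   # 8
print(first_even([1, 3, 5]))       # None
```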

Your Turn!

Write a function to check whether or not a number is divisible by 6. Test that function on the following numbers: 17, 64, 108, 157, 200.

Then, add an additional condition to check if the number is also divisible by 9.

An Answer

def is_div_six(number):
    if number % 6 == 0:
        return "Number is divisible by 6"
    else:
        return "Number is not divisible by 6"

is_div_six(200)

'Number is not divisible by 6'

def is_div_six(number):
    if number % 6 == 0:
        if number % 9 == 0:
            return "Number is divisible by 6 and 9"
        else:
            return "Number is divisible by 6"
    else:
        return "Number is not divisible by 6"

is_div_six(108)

'Number is divisible by 6 and 9'

Dictionaries in Functions

def word_freq(text):
    words_list = text.split()
    freq = {}
    for word in words_list:
        if word in freq:
            freq[word] += 1
        else:
            freq[word] = 1
    return freq

Run this function on some quote you find (or make up). What does it return? How might it be useful?

What would you include as a docstring?
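One possible answer, offered as a sketch rather than the only right one: the docstring below follows the same Input/Returns pattern as is_even, and the test sentence is made up.

```python
def word_freq(text):
    """
    Input: text, a string
    Returns a dictionary mapping each word in text to the number
    of times it appears (case-sensitive, split on whitespace)
    """
    words_list = text.split()
    freq = {}
    for word in words_list:
        if word in freq:
            freq[word] += 1
        else:
            freq[word] = 1
    return freq

quote = "the cat and the hat and the bat"   # made-up test sentence
print(word_freq(quote))
# {'the': 3, 'cat': 1, 'and': 2, 'hat': 1, 'bat': 1}
```

Because the split is on whitespace only, punctuation stays attached to words ("Python!" and "Python" would be counted separately).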

You Try

Imagine we had a dictionary of employees and salaries from some company, and we wanted to extract a new dictionary of high earners (specifically, those earning more than the mean salary at the company).

Write a function to do this. Then, test it out with a dictionary you create.

Comprehensions

Shorthand code that can replace while/for loops and if/elif/else statements
Can be used for lists, sets and dictionaries
Can make code shorter and easier to read
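As a quick preview with made-up data, the same idea written three ways: square brackets build a list, curly braces build a set, and curly braces with key: value build a dictionary.

```python
words = ["vote", "poll", "vote", "party", "poll"]   # made-up data

# List comprehension: one length per word, duplicates kept
lengths_list = [len(w) for w in words]
print(lengths_list)           # [4, 4, 4, 5, 4]

# Set comprehension: curly braces, duplicates dropped
unique_words = {w for w in words}
print(sorted(unique_words))   # ['party', 'poll', 'vote']

# Dictionary comprehension: key: value inside curly braces
word_lengths = {w: len(w) for w in words}
print(word_lengths)           # {'vote': 4, 'poll': 4, 'party': 5}
```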

Comprehension Syntax

In general, a comprehension will look something like this

[expr for value in object if condition] ## do something to a value in some object (optionally if some condition is met)

As a loop, it might look like this

result = []
for value in object:
    if condition:
        result.append(expr)

Example

Simple example - create a copy of a list

numbers = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
new_list = []
for number in numbers:
    new_list.append(number)
print(new_list)

[1, 2, 3, 4, 5, 6, 7, 8, 9, 10]

Simpler with a list comprehension

new_list = [number for number in numbers]
print(new_list)

[1, 2, 3, 4, 5, 6, 7, 8, 9, 10]

Another List Comprehension

Square every item in a list, creating a new list of squared numbers

numbers = [2, 3, 4, 5]
new_list = []
for number in numbers:
    new_list.append(number**2)
print(new_list)

[4, 9, 16, 25]

With a list comprehension

numbers = [2, 3, 4, 5]

squared = [num**2 for num in numbers]

print(squared)

[4, 9, 16, 25]

One More List Comprehension

Example - Square only even numbers

some_squared = []

for num in numbers:
    if num % 2 == 0:
        some_squared.append(num**2)

print(some_squared)

[4, 16]

With a list comprehension

some_squared = [num**2 for num in numbers if num % 2 == 0]

print(some_squared)

[4, 16]

You Try

Write your own list comprehension for a list of strings that creates a new list of only long words (words of more than 5 characters).

Zip()

zip() pairs up the elements of two lists by position, which is useful when the lists line up item by item. For example, consider

countries = ["USA", "Canada", "Mexico"]
capitals = ["Washington, DC.", "Ottawa", "Mexico City"]

merged = zip(countries, capitals)

print(merged)

<zip object at 0x10832f0c0>

So - this is kind of weird. It returns a zip object - how can we make this useful?

Note, pandas has nicer options for joining that aren’t just by index, so we aren’t limited to having indexes match!
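One way to make the zip object useful, sketched below: convert it to a list to see the pairs, or loop over it directly. The last few lines show a common surprise, which is that a zip object is used up after one pass.

```python
countries = ["USA", "Canada", "Mexico"]
capitals = ["Washington, DC.", "Ottawa", "Mexico City"]

# Convert to a list to see the pairs
pairs = list(zip(countries, capitals))
print(pairs)
# [('USA', 'Washington, DC.'), ('Canada', 'Ottawa'), ('Mexico', 'Mexico City')]

# Or loop over the pairs directly
for country, capital in zip(countries, capitals):
    print(f"The capital of {country} is {capital}")

# A zip object can only be consumed once
merged = zip(countries, capitals)
first_pass = list(merged)
second_pass = list(merged)
print(second_pass)   # []
```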

A Dictionary Comprehension

We might wish to combine two lists into a dictionary

countries = ["USA", "Canada", "Mexico"]
capitals = ["Washington, DC.", "Ottawa", "Mexico City"]
capital_dict = {}
for i in range(len(countries)):
    capital_dict[countries[i]] = capitals[i]

print(capital_dict)

{'USA': 'Washington, DC.', 'Canada': 'Ottawa', 'Mexico': 'Mexico City'}

Dictionary Comprehension (with zip())

capital_dict = {key:value for key, value in zip(countries, capitals)}

print(capital_dict)

{'USA': 'Washington, DC.', 'Canada': 'Ottawa', 'Mexico': 'Mexico City'}

Zip objects are iterables, so we can iterate over them with loops or comprehensions

More Dictionary Comprehensions

Extract just the USA key:value pair (Get the capital of the USA)

usa_cap = {key: value for key, value in zip(countries, capitals) if key == "USA"}

print(usa_cap)

{'USA': 'Washington, DC.'}

Or, use the value to extract the key

ottawa_country = {key: value for key, value in zip(countries, capitals) if value == "Ottawa"}

print(ottawa_country)

{'Canada': 'Ottawa'}

Your Turn — Dictionary Comprehension

You have two parallel lists — senators and their years of service:

senators = ["Sanders", "Warren", "Cruz", "Cornyn", "Schumer"]
years    = [34, 12, 13, 23, 26]

Write a dictionary comprehension that builds a {senator: years} mapping, but only includes senators with more than 15 years of service.

Hint: you’ll need zip() and an if clause.

Python Modules

Python modules are files (.py) that (mainly) contain function definitions
they allow us to organize and distribute our code, and to share and reuse others’ code
- Can easily create, save and load our own custom modules
keep code coherent and self-contained
one can import modules or some functions from modules
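The import styles mentioned in the last bullet look like this in practice. This sketch uses the standard library's math module, and the alias m is an arbitrary choice for illustration.

```python
# Import the whole module, then use module.function
import math
print(math.sqrt(16))      # 4.0

# Import one function from a module and use it by name
from math import sqrt
print(sqrt(16))           # 4.0

# Import a module under a shorter alias
import math as m
print(m.floor(2.7))       # 2
```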

Python Modules: The Standard Library

The standard library already contains a bunch of useful modules. For example, we can load the math module.

Instead of writing our own power function

def raise_to_power(a, b):
    return a**b

raise_to_power(2, 3)

We can import a module that has already defined this function

import math
math.pow(2, 3)

8.0
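Notice the 8.0 in the output. As a small side-by-side sketch, math.pow always returns a float, while the ** operator and the built-in pow keep integers as integers.

```python
import math

print(2 ** 3)           # 8    (int)
print(pow(2, 3))        # 8    (built-in pow, also int)
print(math.pow(2, 3))   # 8.0  (always a float)
```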

Another example

There are tons of useful modules in the standard library

from datetime import date

today = date.today()

print("Today's date is:", today)

Today's date is: 2026-07-08

Here, I don’t necessarily want to import the whole datetime module, so I can instead just import date.

The standard modules come with your Python install. Colab also has many common libraries installed. Tomorrow we will cover how to install other libraries!

NumPy

NumPy is short for numerical Python. It’s a foundational package for data analysis in Python, and many packages depend on numpy arrays as a data type.

Many later libraries, like pandas, are built on the functions and data structures of NumPy. Machine learning frameworks like tensorflow also build on this infrastructure.

The most common and useful data structure in NumPy is the array. Arrays are structured objects in Python that contain data all of the same type.

Importing Libraries

numpy is not a built-in feature of Python, so we need to load it. If you don’t have it installed, type pip install numpy in your terminal.

by convention we load numpy as np

import numpy as np

NP arrays

NumPy arrays are a computationally efficient way to store multi-dimensional data. Operations on whole arrays are often 10x to 100x faster than the equivalent pure-Python loops, although the exact speedup depends on the task.

Most data in the social sciences is multidimensional, so this is crucial for our purposes!

# create an array of data
data = np.array([[1.5, -0.1, 3], [0, -3, 6.5]])

data

array([[ 1.5, -0.1,  3. ],
       [ 0. , -3. ,  6.5]])

Math with Arrays

data * 10

data + 5

data * data

array([[2.250e+00, 1.000e-02, 9.000e+00],
       [0.000e+00, 9.000e+00, 4.225e+01]])

Array Attributes

We can check the shape of data stored in an array

data.shape

(2, 3)

Or the type of data stored in the array

data.dtype

dtype('float64')
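The float64 result follows from the "all the same type" rule: the list we passed in mixed ints and floats, so NumPy converted everything to floats. A short sketch (the ints and mixed arrays are made up for illustration):

```python
import numpy as np

# Same array as in the slides: a mix of ints (3, 0, -3) and floats
data = np.array([[1.5, -0.1, 3], [0, -3, 6.5]])
print(data.dtype)      # float64 (the ints were converted to floats)

# A list of only ints gives an integer array
ints = np.array([1, 2, 3])
print(ints.dtype)      # an integer dtype (int64 on most platforms)

# Adding a single float changes the dtype of the whole array
mixed = np.array([1, 2.5, 3])
print(mixed)           # [1.  2.5 3. ]
```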

Array Creation Functions

We can make an array of zeros with arbitrarily many dimensions like so

np.zeros((2, 3, 2))

array([[[0., 0.],
        [0., 0.],
        [0., 0.]],

       [[0., 0.],
        [0., 0.],
        [0., 0.]]])

Or an ordered array from 0 to 19 like this

np.arange(20)

array([ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14, 15, 16,
       17, 18, 19])

Slicing and Indexing Arrays

Like with lists, we can slice and index arrays. Let’s start with a 1D array. As with other data types, arrays are zero indexed in Python

arr = np.arange(10)

arr[5]

arr[5:8]

array([5, 6, 7])

We can update the values of arrays

arr[5:8] = 12

arr

array([ 0,  1,  2,  3,  4, 12, 12, 12,  8,  9])

Warning!

You do need to be careful when slicing arrays. A slice of a NumPy array is a view of the original data, not a copy, even if you save it to a new object. If you change the values in that new object, the original array changes in memory too.

# uh-oh

arr_slice = arr[5:8]
arr_slice

arr_slice[1] = 12345

arr

array([    0,     1,     2,     3,     4,    12, 12345,    12,     8,
           9])
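If you want an independent slice, one option is the array's .copy() method. A minimal sketch, starting from a fresh array (safe_slice is a made-up name):

```python
import numpy as np

arr = np.arange(10)

# .copy() makes an independent array instead of a view
safe_slice = arr[5:8].copy()
safe_slice[1] = 12345

print(safe_slice)   # [    5 12345     7]
print(arr)          # [0 1 2 3 4 5 6 7 8 9]  (unchanged)
```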

Comparing Arrays

arr = np.array([[1., 2., 3.], [4., 5., 6.]])
arr2 = np.array([[0., 4., 1.], [7., 2., 12.]])

arr2 > arr

array([[False,  True, False],
       [ True, False,  True]])

Multidimensional Arrays

We can create arrays of arbitrarily many dimensions, although if things are getting very complex we might want to think about other ways to store our data

arr_3d = np.array([[[1, 2, 3], [4, 5, 6]], [[7, 8, 9], [10, 11, 12]]])

arr_3d

array([[[ 1,  2,  3],
        [ 4,  5,  6]],

       [[ 7,  8,  9],
        [10, 11, 12]]])

Pay close attention to the placement of the brackets. Easy to mess this up!
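A quick way to check that the brackets did what you intended is to look at .ndim and .shape. In this sketch, arr_2d is a made-up variant with one level of brackets removed, to show how the result changes.

```python
import numpy as np

arr_3d = np.array([[[1, 2, 3], [4, 5, 6]], [[7, 8, 9], [10, 11, 12]]])
print(arr_3d.ndim)    # 3
print(arr_3d.shape)   # (2, 2, 3)

# Dropping one level of brackets gives a 2D array instead
arr_2d = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9], [10, 11, 12]])
print(arr_2d.ndim)    # 2
print(arr_2d.shape)   # (4, 3)
```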

We’ll work with indexing and slicing multidimensional arrays tomorrow.

Tomorrow

Tomorrow — Data analysis with pandas, plus accessing APIs
Questions: come to office hours (10 AM – 12 PM daily), or email me
Recommended reading: McKinney, Python for Data Analysis, Ch. 3–7
Slides will be posted after class on Canvas and at will-horne.github.io/icpsr-2026