P02: Python basics

Nicky Wakim

2026-04-09

To run the code in this lecture

If you open this .qmd file (the source for these slides) in RStudio, first run the following in the Terminal:

python3 -m pip install numpy pandas seaborn plotly statsmodels

Coding in the Interactive Window (The Console)

When you open VS Code and start a Python Interactive window, it acts like the R Console.

Type code at the bottom prompt.
Press Shift + Enter to execute.
Output appears immediately above.

Working with Scripts vs. Notebooks

.py files: Pure scripts (like .R files).
.ipynb files: Jupyter Notebooks (like Quarto/Rmd).
Recommendation: Use Notebooks for this class so you can keep your notes and code together!

Math calculations in Python

Python follows standard order of operations (PEMDAS).

10**2  # Note: Power is ** in Python (not ^)

3 ** 7

6 / 9

0.6666666666666666

9 - 43

-34

4**3 - 2 * 7 + 9 / 2

54.5

The expression above is computed as \[4^3 - (2 \cdot 7) + \frac{9}{2} = 64 - 14 + 4.5 = 54.5\]
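As a quick check of the precedence rules above, the sketch below compares the slide's expression with a parenthesized variant and shows what `^` does in Python. The variable names are illustrative only.

```python
# Exponentiation (**) is evaluated first, then * and /, then + and -.
no_parens = 4**3 - 2 * 7 + 9 / 2        # 64 - 14 + 4.5
with_parens = (4**3 - 2) * (7 + 9) / 2  # 62 * 16 / 2

print(no_parens)    # 54.5
print(with_parens)  # 496.0

# In Python, ^ is bitwise XOR, not a power operator.
xor_result = 10 ^ 2
power_result = 10**2
print(xor_result)    # 8
print(power_result)  # 100
```

If you are coming from R, the `^` line is the one to watch: it runs without an error but gives a different number.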

Variables (Python Objects)

We assign variables using =.
Unlike R, there is no <- operator in Python.

Assign a single value:

x = 5
print(x)

Assign a “List” or “Array”

Consecutive integers using range():

a = list(range(3, 11)) # 3 up to (but not including) 11
print(a)

[3, 4, 5, 6, 7, 8, 9, 10]
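Because range() leaves out its stop value, one common pattern for an inclusive sequence is to add 1 to the last value you want. A minimal sketch, with made-up variable names:

```python
first = 1
last = 5

exclusive = list(range(first, last))      # stops before 5
inclusive = list(range(first, last + 1))  # includes 5

print(exclusive)       # [1, 2, 3, 4]
print(inclusive)       # [1, 2, 3, 4, 5]
print(len(inclusive))  # 5
```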

Create a list of numbers:

b = [5, 12, 2, 100, 8]
print(b)

[5, 12, 2, 100, 8]

Let’s try it out!

Create a new variable y and assign it the value of 8.
Create a new variable c that is a list of values 15 through 20.
Create a new variable d that is a list containing 16, 17, 18, 19, and 22.

Hint: Python’s range(start, stop) stops before the stop number!

Doing math with variables

Single values:

x = 5
print(x + 3)

y = x**2
print(y)

Element-wise math: Standard Python lists do not support element-wise arithmetic (for example, [3, 4] * 2 repeats the list instead of doubling each value). We use NumPy arrays for that!

import numpy as np
a = np.array([3, 4, 5, 6])

a + 3

array([6, 7, 8, 9])

print(a + 2)

[5 6 7 8]

print(a * 3)

[ 9 12 15 18]

print(a * a)

[ 9 16 25 36]
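For contrast with the NumPy results above, this sketch shows what plain lists do with the same operators, plus two ways to get element-wise addition. The names `values`, `with_comprehension`, and `with_numpy` are made up for this example.

```python
import numpy as np

values = [3, 4, 5, 6]

# On a list, * repeats the list and + concatenates lists.
print(values * 2)    # [3, 4, 5, 6, 3, 4, 5, 6]
print(values + [7])  # [3, 4, 5, 6, 7]

# Adding a number to a list is not defined and raises a TypeError.
try:
    values + 3
except TypeError as err:
    list_error = type(err).__name__
    print(list_error)  # TypeError

# Two ways to add 3 to every element:
with_comprehension = [v + 3 for v in values]  # plain Python
with_numpy = np.array(values) + 3             # NumPy array

print(with_comprehension)  # [6, 7, 8, 9]
print(with_numpy)          # [6 7 8 9]
```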

Variables can include text (Strings)

hi = "hello"
print(hi)

hello

# A list of strings
greetings = ["Guten Tag", "Hola", hi]
print(greetings)

['Guten Tag', 'Hola', 'hello']

Using Functions

Calling a function in Python always uses (), even when there are no arguments.
We often “call” functions from libraries using the . notation (e.g., np.mean()).

Keyword Arguments:

np.mean(a=[1, 2, 3, 4])

np.float64(2.5)

# range() takes positional arguments only, so the keyword version below raises a TypeError
# list(range(start=1, stop=12, step=3))

Positional Arguments (In order):

np.mean([1, 2, 3, 4])

np.float64(2.5)

list(range(1, 12, 3))  # start, stop, step in order; gives [1, 4, 7, 10]
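To see keyword and positional arguments side by side, here is a small sketch using np.linspace(start, stop, num). It was picked for illustration because, unlike range(), it accepts keywords; it is not used elsewhere in this lecture.

```python
import numpy as np

# Keyword arguments: order does not matter because each value is named.
by_keyword = np.linspace(stop=1, start=0, num=5)

# Positional arguments: values are matched to start, stop, num in order.
by_position = np.linspace(0, 1, 5)

# Mixing is allowed as long as positional arguments come first.
mixed = np.linspace(0, 1, num=5)

print(by_keyword)
print(np.array_equal(by_keyword, by_position))  # True
print(np.array_equal(by_keyword, mixed))        # True
```

Naming the arguments makes the call longer but easier to read, which helps when a function has many parameters.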

Getting Help in Python

Use help() or ?:
- help(np.mean) works anywhere; np.mean? works in IPython-based consoles (such as Jupyter and the VS Code Interactive window). Both pull up the documentation.
Google/StackOverflow:
- Search for “pandas mean returns NaN” or “how to filter rows in pandas.”
AI Tools:
- Great for explaining errors. If you get a TypeError, paste it into the chat to see a breakdown.

Let’s try with an example dataset

The `iris` dataset in Python

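To work with datasets in Python, we use the pandas library. Here, seaborn's load_dataset() fetches the iris data and returns it as a pandas DataFrame (it downloads the file the first time, so you need an internet connection).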

import pandas as pd
import seaborn as sns

# Load the built-in iris dataset
iris = sns.load_dataset('iris')

Exploring the Data

View the first few rows:

iris.head()

   sepal_length  sepal_width  petal_length  petal_width species
0           5.1          3.5           1.4          0.2  setosa
1           4.9          3.0           1.4          0.2  setosa
2           4.7          3.2           1.3          0.2  setosa
3           4.6          3.1           1.5          0.2  setosa
4           5.0          3.6           1.4          0.2  setosa

Get data types and info:

iris.info()

<class 'pandas.DataFrame'>
RangeIndex: 150 entries, 0 to 149
Data columns (total 5 columns):
 #   Column        Non-Null Count  Dtype  
---  ------        --------------  -----  
 0   sepal_length  150 non-null    float64
 1   sepal_width   150 non-null    float64
 2   petal_length  150 non-null    float64
 3   petal_width   150 non-null    float64
 4   species       150 non-null    str    
dtypes: float64(4), str(1)
memory usage: 6.0 KB

Summary statistics:

iris.describe()

       sepal_length  sepal_width  petal_length  petal_width
count    150.000000   150.000000    150.000000   150.000000
mean       5.843333     3.057333      3.758000     1.199333
std        0.828066     0.435866      1.765298     0.762238
min        4.300000     2.000000      1.000000     0.100000
25%        5.100000     2.800000      1.600000     0.300000
50%        5.800000     3.000000      4.350000     1.300000
75%        6.400000     3.300000      5.100000     1.800000
max        7.900000     4.400000      6.900000     2.500000

Get dimensions:

iris.shape # (rows, columns)

(150, 5)
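If seaborn or an internet connection is not available, the same exploration methods can be tried on a small DataFrame built by hand. The sketch below assumes a made-up name, `small_iris`, and copies the five rows shown in the head() output above.

```python
import pandas as pd

# First five rows of iris, copied from the head() output above.
small_iris = pd.DataFrame({
    "sepal_length": [5.1, 4.9, 4.7, 4.6, 5.0],
    "sepal_width": [3.5, 3.0, 3.2, 3.1, 3.6],
    "petal_length": [1.4, 1.4, 1.3, 1.5, 1.4],
    "petal_width": [0.2, 0.2, 0.2, 0.2, 0.2],
    "species": ["setosa"] * 5,
})

print(small_iris.shape)          # (5, 5)
print(small_iris.head(2))        # first two rows only
print(list(small_iris.columns))  # column names as a list

# shape is a tuple, so it can be unpacked into two variables.
n_rows, n_cols = small_iris.shape
```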

The `.` (the Python equivalent of R's `$`)

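To select a single column in Python, we use . or [''], that is, DataFrame.ColumnName or DataFrame['ColumnName'].
Dot notation only works when the column name is a valid Python name (no spaces) and does not clash with a DataFrame method; bracket notation always works.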

# These do the same thing
print(iris.petal_width)

0      0.2
1      0.2
2      0.2
3      0.2
4      0.2
      ... 
145    2.3
146    1.9
147    2.0
148    2.3
149    1.8
Name: petal_width, Length: 150, dtype: float64

print(iris['petal_width'])

0      0.2
1      0.2
2      0.2
3      0.2
4      0.2
      ... 
145    2.3
146    1.9
147    2.0
148    2.3
149    1.8
Name: petal_width, Length: 150, dtype: float64
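The two notations are interchangeable for a column like petal_width, but not for every column name. The sketch below uses a hypothetical DataFrame, `flowers`, with one column name containing a space and one that shares its name with a DataFrame method, to show where bracket notation is needed.

```python
import pandas as pd

# Hypothetical data: "petal width" has a space, "count" is also a method name.
flowers = pd.DataFrame({
    "petal width": [0.2, 1.3, 2.3],
    "count": [10, 20, 30],
})

# Bracket notation works for any column name.
width = flowers["petal width"]
counts = flowers["count"]
print(width.tolist())   # [0.2, 1.3, 2.3]
print(counts.tolist())  # [10, 20, 30]

# Dot notation finds the DataFrame.count method, not the "count" column.
print(callable(flowers.count))  # True

# A list inside the brackets selects several columns at once.
both = flowers[["petal width", "count"]]
print(both.shape)  # (3, 2)
```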

Summary Stats on Columns

print(iris.petal_width.mean())

1.1993333333333336

print(iris.petal_width.std())

0.7622376689603465

print(iris.petal_width.median())

1.3
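One detail worth knowing about .std(): pandas divides by n - 1 (the sample standard deviation, like R's sd()), while NumPy's np.std() divides by n unless you pass ddof=1. A minimal sketch with made-up values:

```python
import numpy as np
import pandas as pd

widths = pd.Series([1.0, 2.0, 3.0, 4.0])  # hypothetical values
as_array = widths.to_numpy()

pandas_std = widths.std()                   # divides by n - 1
numpy_std = np.std(as_array)                # divides by n
numpy_sample_std = np.std(as_array, ddof=1) # divides by n - 1, matches pandas

print(widths.mean())  # 2.5
print(pandas_std, numpy_std, numpy_sample_std)
```

The squared deviations from the mean sum to 5 here, so the pandas result is the square root of 5/3 and the default NumPy result is the square root of 5/4.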

Common Python Errors

IndentationError - Python cares about spaces at the start of lines!

NameError - Usually means you misspelled a variable or haven’t run the cell where it was defined.

SyntaxError - You might be missing a closing parenthesis ) or a quote ".
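To see what these errors look like without crashing your session, the sketch below triggers each one inside try/except. It uses the built-in compile() to check code stored in a string, since a real SyntaxError would stop the whole file from running. The variable names are made up for this example.

```python
# NameError: using a variable that was never defined.
try:
    print(undefined_total)
except NameError as err:
    name_error_message = str(err)
    print(name_error_message)

# SyntaxError: the closing parenthesis is missing.
try:
    compile("print('hello'", "<example>", "exec")
except SyntaxError as err:
    syntax_error_type = type(err).__name__
    print(syntax_error_type)  # SyntaxError

# IndentationError: the second line is indented for no reason.
try:
    compile("x = 5\n    y = 6", "<example>", "exec")
except IndentationError as err:
    indentation_error_type = type(err).__name__
    print(indentation_error_type)  # IndentationError
```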