P02: Python basics

Nicky Wakim

2026-04-09

To run the code in this lecture

  • If you open this .qmd file (for the slides) in RStudio, first run the following in the Terminal:

python3 -m pip install numpy pandas seaborn plotly statsmodels

Coding in the Interactive Window (The Console)

When you open VS Code and start a Python Interactive window, it acts like the R Console.

  • Type code at the bottom prompt.
  • Press Shift + Enter to execute.
  • Output appears immediately above.

Working with Scripts vs. Notebooks

  • .py files: Pure scripts (like .R files).
  • .ipynb files: Jupyter Notebooks (like Quarto/Rmd).
  • Recommendation: Use Notebooks for this class so you can keep your notes and code together!
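If you do end up working in a .py script, VS Code (with the Python and Jupyter extensions installed) treats any line starting with # %% as the start of a cell. It shows a "Run Cell" link above each marker that sends that cell to the Interactive Window, much like running one chunk of a Quarto file. A minimal sketch (the variable names are just for illustration):

```python
# %% Cell 1: VS Code treats a "# %%" comment as the start of a cell
x = 5
print(x)

# %% Cell 2: runs on its own, but can use x once Cell 1 has been run
y = x * 2
print(y)
```

In a plain terminal the # %% lines are ordinary comments, so the same file still runs top to bottom with python3.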

Math calculations in Python

Python follows standard order of operations (PEMDAS).

10**2  # Note: Power is ** in Python (not ^)
100
3 ** 7
2187
6 / 9
0.6666666666666666
9 - 43
-34
4**3 - 2 * 7 + 9 / 2
54.5

The equation above is computed as \[4^3 - (2 \cdot 7) + \frac{9}{2} = 64 - 14 + 4.5 = 54.5\]
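Two operator details trip up R users. First, ^ is valid Python, but it means bitwise XOR on integers, not a power, so it quietly gives a different answer instead of an error. Second, / always returns a float, even when the division is exact. A short check:

```python
# ** is exponentiation
power = 10 ** 2
print(power)        # 100

# ^ is bitwise XOR, NOT a power: binary 1010 XOR 0010 = 1000
xor = 10 ^ 2
print(xor)          # 8

# / always returns a float, even for an exact division
ratio = 6 / 3
print(ratio)        # 2.0

# Order of operations: ** first, then * and /, then + and -
total = 4**3 - 2 * 7 + 9 / 2
print(total)        # 54.5
```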

Variables (Python Objects)

  • We assign variables using =.
  • Unlike R, there is no <- operator in Python.
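Typing the R habit x <- 3 does not raise an error in Python. It is read as the comparison x < -3 ("is x less than negative 3?"), so x is never changed. A small sketch:

```python
x = 5

# In R this would assign 3 to x.
# Python reads it as x < -3, a True/False comparison
result = x <- 3
print(result)   # False, because 5 is not less than -3
print(x)        # 5: x was NOT reassigned

# Assignment in Python always uses =
x = 3
print(x)        # 3
```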

Assign a single value:

x = 5
print(x)
5

Assign a “List” or “Array”

  • Consecutive integers using range():
a = list(range(3, 11)) # 3 up to (but not including) 11
print(a)
[3, 4, 5, 6, 7, 8, 9, 10]
  • Create a list of numbers:
b = [5, 12, 2, 100, 8]
print(b)
[5, 12, 2, 100, 8]

Let’s try it out!

  • Create a new variable y and assign it the value of 8.
  • Create a new variable c that is a list of values 15 through 20.
  • Create a new variable d that is a list containing 16, 17, 18, 19, and 22.

Hint: Python’s range(start, stop) stops before the stop number!

Doing math with variables

Single values:

x = 5
print(x + 3)
8
y = x**2
print(y)
25

Element-wise math: Standard Python lists don’t do element-wise math; operators like + and * act on the whole list instead of on each element. We use NumPy for element-wise math!
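To see what plain lists do instead, here is a quick comparison (nums and arr are illustrative names). On a list, * repeats it, + joins lists together, and adding a number is an error; converting the list with np.array() gives element-wise behavior:

```python
import numpy as np

nums = [3, 4, 5, 6]   # a plain Python list

# On lists, * repeats the list and + joins lists end to end
print(nums * 2)       # [3, 4, 5, 6, 3, 4, 5, 6]
print(nums + [7])     # [3, 4, 5, 6, 7]

# Adding a number to a list raises a TypeError
try:
    nums + 3
except TypeError as err:
    print("TypeError:", err)

# A NumPy array does the math on each element
arr = np.array(nums)
print(arr * 2)        # [ 6  8 10 12]
```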

import numpy as np
a = np.array([3, 4, 5, 6])

a+3
array([6, 7, 8, 9])
print(a + 2)
[5 6 7 8]
print(a * 3)
[ 9 12 15 18]
print(a * a)
[ 9 16 25 36]

Variables can include text (Strings)

hi = "hello"
print(hi)
hello
# A list of strings
greetings = ["Guten Tag", "Hola", hi]
print(greetings)
['Guten Tag', 'Hola', 'hello']
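Strings can be joined with +, but Python will not mix a string and a number automatically; convert the number with str() first. A short sketch:

```python
hi = "hello"

# + joins (concatenates) two strings
print(hi + " world")   # hello world

# A string and a number cannot be added directly
try:
    hi + 5
except TypeError as err:
    print("TypeError:", err)

# Convert the number to a string first
print(hi + str(5))     # hello5
```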

Using Functions

  • Calling a function in Python always uses parentheses ().
  • We often “call” functions from libraries using the . notation (e.g., np.mean()).
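Leaving off the () does not call the function; you just get the function object back. A small check using np.mean() and the built-in len() (values is an illustrative name):

```python
import numpy as np

values = [1, 2, 3, 4]

# Without (), you get the function object itself, not a result
print(np.mean)           # prints something like <function mean at 0x...>

# With (), the function is called on its argument
print(np.mean(values))   # 2.5

# Built-in functions such as len() use () the same way
print(len(values))       # 4
```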

Keyword Arguments:

np.mean(a=[1, 2, 3, 4])
np.float64(2.5)
# range() does not accept keyword arguments, so this line would raise a TypeError:
# list(range(start=1, stop=12, step=3))

Positional Arguments (In order):

np.mean([1, 2, 3, 4])
np.float64(2.5)
list(range(1, 12, 3))
[1, 4, 7, 10]
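You can also mix the two styles, as long as the positional arguments come first (putting a positional argument after a keyword one is a SyntaxError). This sketch uses np.round(), whose parameters are a and decimals, and confirms that range() really rejects keyword arguments:

```python
import numpy as np

# Positional first, then keyword
print(np.round(3.14159, decimals=2))     # 3.14

# The same call using only keyword arguments
print(np.round(a=3.14159, decimals=2))   # 3.14

# range() only accepts positional arguments
range_error = None
try:
    range(start=1, stop=12, step=3)
except TypeError as err:
    range_error = err
    print("TypeError:", err)
```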

Getting Help in Python

  1. Use help() or ?:
    • help(np.mean) pulls up the documentation in any Python session. np.mean? does the same in IPython/Jupyter, which includes the VS Code Interactive Window and notebooks.
  2. Google/StackOverflow:
    • Search for “pandas mean returns NaN” or “how to filter rows in pandas.”
  3. AI Tools:
    • Great for explaining errors. If you get a TypeError, paste the full error message into the chat to see a breakdown.

Let’s try with an example dataset

The iris dataset in Python

To work with datasets in Python, we use the pandas library. Here, seaborn’s load_dataset() function returns the iris data as a pandas DataFrame.

import pandas as pd
import seaborn as sns

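# Load the iris example dataset (seaborn downloads it from its online data repository, so you need an internet connection the first time)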
iris = sns.load_dataset('iris')

Exploring the Data

View the first few rows:

iris.head()
   sepal_length  sepal_width  petal_length  petal_width species
0           5.1          3.5           1.4          0.2  setosa
1           4.9          3.0           1.4          0.2  setosa
2           4.7          3.2           1.3          0.2  setosa
3           4.6          3.1           1.5          0.2  setosa
4           5.0          3.6           1.4          0.2  setosa

Get data types and info:

iris.info()
<class 'pandas.DataFrame'>
RangeIndex: 150 entries, 0 to 149
Data columns (total 5 columns):
 #   Column        Non-Null Count  Dtype  
---  ------        --------------  -----  
 0   sepal_length  150 non-null    float64
 1   sepal_width   150 non-null    float64
 2   petal_length  150 non-null    float64
 3   petal_width   150 non-null    float64
 4   species       150 non-null    str    
dtypes: float64(4), str(1)
memory usage: 6.0 KB

Summary statistics:

iris.describe()
       sepal_length  sepal_width  petal_length  petal_width
count    150.000000   150.000000    150.000000   150.000000
mean       5.843333     3.057333      3.758000     1.199333
std        0.828066     0.435866      1.765298     0.762238
min        4.300000     2.000000      1.000000     0.100000
25%        5.100000     2.800000      1.600000     0.300000
50%        5.800000     3.000000      4.350000     1.300000
75%        6.400000     3.300000      5.100000     1.800000
max        7.900000     4.400000      6.900000     2.500000

Get dimensions:

iris.shape # (rows, columns)
(150, 5)
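Because sns.load_dataset() downloads its file, you may want an offline way to practice. pd.DataFrame() builds a DataFrame from a dictionary of columns; this sketch types in the first three rows shown by iris.head() (mini_iris is an illustrative name):

```python
import pandas as pd

# First three rows of iris, typed in by hand from iris.head()
mini_iris = pd.DataFrame({
    "sepal_length": [5.1, 4.9, 4.7],
    "sepal_width": [3.5, 3.0, 3.2],
    "petal_length": [1.4, 1.4, 1.3],
    "petal_width": [0.2, 0.2, 0.2],
    "species": ["setosa", "setosa", "setosa"],
})

print(mini_iris)
print(mini_iris.shape)   # (3, 5)
```

The same methods used above (.head(), .info(), .describe(), .shape) work on this smaller DataFrame.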

The . (The Python equivalent of $)

  • To select a single column, use a dot (.) or square brackets with the column name in quotes ([''])
  • DataFrame.ColumnName or DataFrame['ColumnName'].
# These do the same thing
print(iris.petal_width)
0      0.2
1      0.2
2      0.2
3      0.2
4      0.2
      ... 
145    2.3
146    1.9
147    2.0
148    2.3
149    1.8
Name: petal_width, Length: 150, dtype: float64
print(iris['petal_width'])
0      0.2
1      0.2
2      0.2
3      0.2
4      0.2
      ... 
145    2.3
146    1.9
147    2.0
148    2.3
149    1.8
Name: petal_width, Length: 150, dtype: float64
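The two styles are not fully interchangeable. Dot notation only works when the column name is a valid Python name that doesn’t clash with an existing DataFrame method, while square brackets always work and are needed to create a new column. A sketch with a small made-up DataFrame (the column names are hypothetical):

```python
import pandas as pd

# Made-up data: one name contains a space, one clashes with DataFrame.count()
df = pd.DataFrame({
    "petal width": [0.2, 1.3, 2.5],
    "count": [10, 20, 30],
})

# Square brackets work for any column name
print(df["petal width"])
print(df["count"])

# df.count is the count() METHOD, not the "count" column
print(callable(df.count))            # True

# New columns must be created with square brackets
df["double_count"] = df["count"] * 2
print(df["double_count"].tolist())   # [20, 40, 60]
```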

Summary Stats on Columns

print(iris.petal_width.mean())
1.1993333333333336
print(iris.petal_width.std())
0.7622376689603465
print(iris.petal_width.median())
1.3
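One difference from R: pandas summary methods such as .mean() skip missing values (NaN) by default, whereas R’s mean() defaults to na.rm = FALSE. Passing skipna=False makes the missing value propagate, as in R. A sketch on a small made-up Series:

```python
import numpy as np
import pandas as pd

# Made-up values with one missing entry
s = pd.Series([0.2, 1.3, np.nan, 2.5])

print(s.mean())               # NaN skipped: (0.2 + 1.3 + 2.5) / 3
print(s.mean(skipna=False))   # nan: the missing value propagates
print(s.median())             # 1.3
print(s.count())              # 3 non-missing values
```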

Common Python Errors

IndentationError - Python cares about spaces at the start of lines!

NameError - Usually means you misspelled a variable or haven’t run the cell where it was defined.

SyntaxError - You might be missing a closing parenthesis ) or a quote ".
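To see these three errors on purpose, this sketch stores broken one-liners as strings and runs them with Python’s built-in exec(), catching each error and printing its type. exec() is used here only so the demo keeps going; normally you would just see the error message when you run the cell.

```python
# Each broken snippet is a string, so its error can be caught and named
snippets = {
    "indentation": "x = 5\n    y = 6\n",   # second line is indented for no reason
    "name": "print(undefined_variable)",    # variable was never defined
    "syntax": "print('hello'",              # missing closing parenthesis
}

error_names = {}
for label, code in snippets.items():
    try:
        exec(code)
    except (IndentationError, NameError, SyntaxError) as err:
        error_names[label] = type(err).__name__
        print(label, "->", type(err).__name__)
```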

Key Sources for Python Basics: