Week 3: Introduction to Python

This lab teaches you the basics of Python.

For data analysis, we need to do a lot of calculations. Handheld calculators are not good enough for this job. In this course, you'll learn how to use your computer as a much more advanced calculator.

To communicate with computers, we need a language that both humans and computers can understand. Python is one such language. We will write instructions in Python, called code or a program, and the computer will run this code, i.e., carry out these instructions.

Installation

You need to install a few things to have your computer understand your Python code. There are two ways to do this.

  1. Automatic: Install Anaconda (https://www.anaconda.com/download/). Anaconda is a Python distribution (interpreter + packages + other useful programs).
  2. Manual: Install a Python interpreter (multiple installation methods available, depending on your OS). Then install each package manually.

The automatic method is recommended for data science work, since Anaconda bundles the interpreter with commonly used packages.

There are two ways to run Python code: in an interactive shell and as a script. We will look at scripts later. Let's start with the interactive shell, which is good for exploring Python.

To start the interactive shell, run python3 on Linux/macOS and py or python on Windows. You will be greeted with a message like this:

Python 3.11.7 (main, Dec  4 2023, 18:10:11) [Clang 15.0.0 (clang-1500.1.0.2.5)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>>

Operators and Simple Objects

You can use Python as a calculator.

>>> 2 + 3 * 4
14
>>> (2 + 3) * 4
20
>>> 3 ** 3 / 3
9.0

Arithmetic operators: +, -, *, /, **.

Numbers are not the only things in Python. Python has other types of objects too, like booleans.

>>> 8 > 9
False
>>> 2 * 4 == 8
True

Relational/comparison operators: >, <, >=, <=, ==, !=.

>>> 2 * 4 == 8 or 2 * 4 == 9
True
>>> True and False
False
>>> not True
False

Logical operators: and, or, not.

Strings are sequences of characters.

>>> 'abc' + 'xyz'
'abcxyz'
>>> "hello " * 3 + "bye"
'hello hello hello bye'
>>> len("Hello! Bye!")
11

Errors:

>>> 3 / 0
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ZeroDivisionError: division by zero
>>> 3 * * 3
  File "<stdin>", line 1
    3 * * 3
        ^
SyntaxError: invalid syntax

Variables

Variables help us name objects so that we can use them again later.

>>> x = 123456789
>>> x
123456789
>>> x * 8
987654312
>>> x = 2 + 3
>>> x**2 + 2*x + 1
36
>>> x = x + 2
>>> x
7

Note how x changed the object it referred to: first 123456789, then 5, then 7.

Finding the type of an object:

>>> type(x)
<class 'int'>
>>> type('hello')
<class 'str'>

x -= 1 is the same as x = x - 1. We also have +=, *=, /=.

If you want a variable x to not refer to any object, write x = None.
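
The augmented assignment operators and None described above can be tried out as follows. This is a small sketch; the variable names count and result are made up for illustration.

```python
# Augmented assignment: each line is shorthand for count = count <op> value.
count = 10
count -= 1      # same as count = count - 1, so count is now 9
count += 5      # count is now 14
count *= 2      # count is now 28
count /= 4      # / always gives a float, so count is now 7.0
print(count)    # 7.0

# None is the object Python uses to mean "no value".
result = None
print(result is None)    # True
print(type(result))      # <class 'NoneType'>
```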

Containers

Container objects are used to store multiple objects together. There are many types of containers in Python. We will look at lists, tuples, sets, and dicts.

list

>>> x = [30, 20, 10]
>>> type(x)
<class 'list'>
>>> x[0]
30
>>> x[2]
10
>>> x[-1]
10
>>> x[-2]
20
>>> len(x)
3
>>> sum(x)
60
>>> 20 in x
True
>>> sorted(x)
[10, 20, 30]
>>> a, b, c = x
>>> a
30
>>> b
20

Lists are mutable, i.e., they can be modified.

>>> x[0] = 'hello'
>>> x
['hello', 20, 10]
>>> x.append(100)
>>> x
['hello', 20, 10, 100]
>>> x.pop()
100
>>> y = x.pop()
>>> y
10
>>> x
['hello', 20]

Multiple variables can refer to the same object.

>>> x = [100, 200, 300]
>>> y = [100, 200, 300]
>>> z = x
>>> x == y
True
>>> x is y
False
>>> x is z
True
>>> x[0] = 'hello'
>>> x
['hello', 200, 300]
>>> y
[100, 200, 300]
>>> z
['hello', 200, 300]
>>> z[1] = 'bye'
>>> x
['hello', 'bye', 300]
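
If you want an independent copy rather than a second name for the same list, one common approach is to build a new list with list(). The sketch below reuses the values from the example above; the name w is made up.

```python
x = [100, 200, 300]
z = x            # z refers to the same list object as x
w = list(x)      # w is a new list with the same elements

x[0] = 'hello'
print(z)         # ['hello', 200, 300]  (changed, because z is x)
print(w)         # [100, 200, 300]      (unchanged, because w is a separate list)
print(w is x)    # False
```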

Lists can be nested.

>>> a = [[1, 2, 3], ['hello', 'bye'], [], 5]
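
Elements of a nested list are reached by indexing one level at a time. A short sketch using the same list a:

```python
a = [[1, 2, 3], ['hello', 'bye'], [], 5]

print(len(a))       # 4: three inner lists and the number 5
print(a[0])         # [1, 2, 3]
print(a[0][2])      # 3: the element at index 2 of the first inner list
print(a[1][-1])     # bye
print(len(a[2]))    # 0: the third element is an empty list

a[2].append('new')  # inner lists are mutable too
print(a)            # [[1, 2, 3], ['hello', 'bye'], ['new'], 5]
```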

tuple

Tuples are like lists, except that they are immutable.

>>> x = (10, 20, 30)
>>> type(x)
<class 'tuple'>
>>> x[0]
10
>>> x[0] = 100
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: 'tuple' object does not support item assignment
>>> list(x)
[10, 20, 30]

set

Like lists, except that order doesn't matter and repetitions are ignored.

>>> s = {'a', 'p', 'p', 'e', 'a', 'l'}
>>> s
{'e', 'l', 'p', 'a'}
>>> t = set()
>>> t.add('l')
>>> t.add('e')
>>> t.add('a')
>>> t.add('p')
>>> s == t
True
>>> list(s)
['e', 'l', 'p', 'a']

A set's elements must be hashable, so mutable containers such as lists are not allowed as elements. Tuples of immutable values are fine.

>>> s = {[0, 1], [2, 3]}
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: unhashable type: 'list'
>>> s = {(0, 1), (2, 3)}
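
Because repetitions are ignored, a set is a convenient way to find the distinct values in a list. A minimal sketch with made-up data (the names visits and unique_states are hypothetical):

```python
visits = ['IL', 'AZ', 'IL', 'TX', 'AZ', 'IL']    # made-up data

unique_states = set(visits)      # duplicates collapse into one element each
print(len(visits))               # 6
print(len(unique_states))        # 3
print('TX' in unique_states)     # True
print(sorted(unique_states))     # ['AZ', 'IL', 'TX']
```

Note that sorted returns a list, which gives a predictable order even though the set itself has none.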

dict

A dict contains key-value pairs. It is defined using curly brackets.

>>> capitals = {'IL': 'Springfield', 'AZ': 'Phoenix'}
>>> capitals['IL']
'Springfield'
>>> len(capitals)
2
>>> 'AZ' in capitals.keys()
True

Modifying a dict:

>>> capitals['TX'] = 'Austin'
>>> capitals
{'IL': 'Springfield', 'AZ': 'Phoenix', 'TX': 'Austin'}
>>> capitals['IL'] = 'Urbana'
>>> capitals
{'IL': 'Urbana', 'AZ': 'Phoenix', 'TX': 'Austin'}
>>> capitals['CA']
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
KeyError: 'CA'
>>> del capitals['AZ']
>>> capitals
{'IL': 'Urbana', 'TX': 'Austin'}

A dict's keys must be hashable, so mutable containers such as lists cannot be used as keys. Values can be any object.
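
Tuples, being immutable, work as dict keys where lists would not. The sketch below uses made-up grid coordinates and a hypothetical dict named labels.

```python
# Tuples of numbers are hashable, so they can be used as keys.
labels = {(0, 0): 'origin', (1, 0): 'east'}
labels[(0, 1)] = 'north'

print(labels[(0, 0)])       # origin
print(len(labels))          # 3
print((1, 0) in labels)     # True

# Writing labels[[1, 1]] = 'north-east' would raise
# TypeError: unhashable type: 'list', just as with sets.
```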

print

When using Python non-interactively, we need to use print to show the result.

>>> x = 'hello'
>>> x
'hello'
>>> print(x)
hello
>>> print(x, 10, 'bye')
hello 10 bye

Control-flow

Conditional statements

>>> x = 8
>>> if x > 10:
...     print('hello')
...
>>> if x < 9:
...     print(x, 'is less than 9')
... elif x > 9:
...     print(x, 'is more than 9')
... else:
...     print(x, 'is equal to 9')
...
8 is less than 9

For loop

Used to repeat an action.

>>> a = [10, 20, 30, 40]
>>> for x in a:
...     print('I have', x, 'apples!')
...
I have 10 apples!
I have 20 apples!
I have 30 apples!
I have 40 apples!
>>> for i in range(3, 6):
...     a.append(i * 1000)
...
>>> a
[10, 20, 30, 40, 3000, 4000, 5000]
>>> capitals = {'IL': 'Springfield', 'AZ': 'Phoenix'}
>>> for state, city in capitals.items():
...     print('The capital of', state, 'is', city)
...
The capital of IL is Springfield
The capital of AZ is Phoenix
>>> for state in capitals.keys():
...     print('The capital of', state, 'is', capitals[state])
...
The capital of IL is Springfield
The capital of AZ is Phoenix
>>> for state in sorted(capitals.keys()):
...     print('The capital of', state, 'is', capitals[state])
...
The capital of AZ is Phoenix
The capital of IL is Springfield
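
The range(3, 6) call above produced 3, 4, and 5: the end value is not included. A short sketch of range together with a loop that accumulates a result (the name total is made up):

```python
print(list(range(3, 6)))    # [3, 4, 5]: the end value 6 is not included
print(list(range(4)))       # [0, 1, 2, 3]: the start defaults to 0

# Add up the squares of 1, 2, ..., 10.
total = 0
for i in range(1, 11):
    total += i ** 2
print(total)                # 385
```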

while loop

Keeps running as long as a condition is true, i.e., until the condition becomes false.

i = 5
while i > 0:
    print(i)
    i -= 1

Output:

5
4
3
2
1

If the condition never becomes false, the loop runs forever (press Ctrl+C to interrupt it):

x = 0
while True:
    x += 1
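
A while loop is most useful when you do not know in advance how many repetitions are needed. The condition is checked before every pass, so the loop body must eventually make it false. A small sketch with made-up names:

```python
# Count how many times 1000 can be halved before it drops below 1.
value = 1000
halvings = 0
while value >= 1:
    value /= 2        # each pass moves value closer to the stopping point
    halvings += 1
print(halvings)       # 10
```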

Functions, Classes, Modules

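We pass 'arguments' to a function, and it 'returns' a value. The names that receive the arguments (x and y below) are called 'parameters'. Arguments can be passed by position or by name.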

>>> def f(x, y):
...     s = x + y
...     d = x - y
...     return s * d
...
>>> f(4, 2)
12
>>> f(y=2, x=4)
12
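
Functions become useful when the same calculation is needed in several places. The sketch below defines two small helpers for data analysis; the function names and the data are made up for illustration.

```python
def mean(values):
    # Return the average of a non-empty list of numbers.
    return sum(values) / len(values)

def spread(values):
    # Return the difference between the largest and smallest value.
    return max(values) - min(values)

scores = [70, 85, 90, 55]    # made-up data
print(mean(scores))          # 75.0
print(spread(scores))        # 35
```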

Python already has many useful functions (called built-in functions). We have already seen some: len, sum, print.

We have seen many types of objects so far: int, str, list, dict. It's also possible to define your own type, a.k.a. class. This is beyond the scope of this document.

A module is a group of functions, variables, and classes. We need to import a module to use it.

>>> import math
>>> math.pi
3.141592653589793
>>> math.sin(math.pi / 6)
0.49999999999999994

The set of modules that are part of Python (like math) is called the 'standard library'. To learn more about Python's built-in functions and standard library, you can look up the documentation: https://docs.python.org/3/.
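
As a further illustration, here are a few more functions from math, along with statistics, another standard library module that is handy for data analysis. The data is made up.

```python
import math
import statistics

data = [1, 2, 3, 4, 4]             # made-up data

print(math.sqrt(16))               # 4.0
print(math.floor(2.7))             # 2
print(statistics.mean(data))       # 2.8
print(statistics.median(data))     # 3
```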

You can view the documentation for any function, class, or module in the interactive shell using help:

>>> help(math.sin)
>>> help(sum)

You can also write your own modules. That's beyond the scope of this lab. Other people have written useful modules and made them available on the internet for us to use. A group of related modules is called a package.

Running scripts

(Demonstrate example.py)

In the interactive shell, you need to enter and run your code line-by-line. Instead, you can write all your code in a file (called a script), and then run the file. Advantage: you can run the file as many times as you want without having to type each line of your code again in the interactive shell.
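
As a sketch, a script is just a text file of Python code. The contents below are hypothetical (they are not the actual example.py used in the demonstration) and assume you want to convert a few temperatures.

```python
# Hypothetical script: convert Celsius temperatures to Fahrenheit.
celsius = [0, 10, 25, 100]
fahrenheit = []

for c in celsius:
    f = c * 9 / 5 + 32
    fahrenheit.append(f)
    print(c, 'C is', f, 'F')    # print is needed: a script does not echo values
```

If this is saved as example.py, it can be run from a terminal with python3 example.py on Linux/macOS, or py example.py on Windows.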

strings

Strings are not usually grouped with the containers above, but they behave much like tuples of their characters. Like tuples, they are immutable.

>>> s = 'hello'
>>> s[2]
'l'
>>> s[-1]
'o'
>>> list(s)
['h', 'e', 'l', 'l', 'o']

There is a difference between numbers and their string representations.

>>> str(123)
'123'
>>> int('123')
123
>>> float('2.5')
2.5

'\n' is a single character, and means 'new line'.

>>> print('hello\nworld')
hello
world

Splitting and joining strings:

>>> 'abc,def,ghi'.split(',')
['abc', 'def', 'ghi']
>>> ' '.join(['This', 'is', 'a', 'sentence.'])
'This is a sentence.'
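
Splitting is often combined with int or float to turn a line of text into numbers. A minimal sketch with a made-up input line:

```python
line = '12,7,30,1'                 # made-up input

parts = line.split(',')            # ['12', '7', '30', '1'] (still strings)
numbers = []
for p in parts:
    numbers.append(int(p))         # convert each piece to an int

print(sum(numbers))                # 50
print(' + '.join(parts))           # 12 + 7 + 30 + 1
```

Note that join works on the strings in parts, not on the ints in numbers.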

Reading and writing files

Reading a file:

with open('file.txt') as fobj:
    s = fobj.read()
print(s)

Writing content to a file:

with open('file.txt', 'w') as fobj:
    fobj.write(s)

Alternatively,

with open('file.txt', 'w') as fobj:
    print(s, file=fobj)

Difference: fobj.write takes a single string as input, whereas print can take multiple arguments of any type. print also appends a newline to its output; fobj.write does not.
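
The difference can be seen by writing a file both ways and reading it back. This sketch writes to a temporary directory so that no existing file is overwritten; the file name demo.txt and the use of the os and tempfile modules are choices made for this illustration.

```python
import os
import tempfile

# Create a scratch folder and build a path to a new file inside it.
folder = tempfile.mkdtemp()
path = os.path.join(folder, 'demo.txt')    # hypothetical file name

with open(path, 'w') as fobj:
    fobj.write('first line')               # write adds no newline
    fobj.write('\n')                       # so we add one ourselves
    print('second', 'line', file=fobj)     # print joins with spaces and adds '\n'

with open(path) as fobj:
    s = fobj.read()

print(repr(s))    # 'first line\nsecond line\n'
```

repr shows the string with its newline characters written out as \n, which makes the difference visible.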

Exercise

For any positive integer x, let digitSum(x) be the sum of its digits. If we repeatedly replace x by digitSum(x) till we get a single-digit number, the result is called digitRoot(x). For example, digitRoot(9776) = 2, since 9776 → 29 → 11 → 2.

Among all 4-digit numbers (i.e., numbers from 1000 to 9999), what are the frequencies of different digitRoots? I.e., for each number i from 1 to 9, how many 4-digit numbers are there whose digitRoot is i?