Python Tuples and Advanced List Manipulation

In previous classes we've looked at Python lists, which are similar to the array datatype in many languages.

Python has very strong list manipulation capabilities, which is one of the major reasons it's widely used in data science and scientific computing. In fact, one of the most popular Python libraries, NumPy, builds on the same ideas with its own array type, which makes numerical work on sequences faster and more powerful.

In this class we'll look at some unique features of Python lists and list manipulations which distinguish them from the array types in other languages.

We're also going to take a look at tuples, which are a feature of Python and some other languages (e.g. Rust) but not found universally.

Tuples

Tuples are similar to lists, with the key difference that they can't be changed after being created. This makes them helpful where we want the result of some operation to stay fixed once it has been computed, for example in math and physics simulations.

def add_velocities(a, b):
  return (a[0] + b[0], a[1] + b[1])

# we indicate tuples with commas, usually inside parentheses
a = (10, 10)
b = (-5, -5)

# add velocities A and B, and store in C
c = add_velocities(a,b)
print(c)

# we access values in tuples the same way as in lists
print(c[0])
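
To make the "can't be changed" point concrete, here is a minimal sketch of what happens when we try to assign to an element of a tuple. The variable names are only illustrative and are not part of the example above.

```python
# illustrative velocity tuple (hypothetical values)
velocity = (10, 10)

try:
  # tuples do not support item assignment, so this line raises TypeError
  velocity[0] = 5
except TypeError as err:
  error_message = str(err)
  print(f"TypeError: {error_message}")

# to "change" a tuple we build a new one instead
new_velocity = (5, velocity[1])
print(new_velocity)
```

The original tuple is untouched; new_velocity is a separate object.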

Exercise

In a game, every player has a position stored as a tuple:

player_1 = (1,1)
player_2 = (10,15)
player_3 = (12,14)
player_4 = (16,25)

By using Pythagoras' theorem dx^2 + dy^2 = c^2, where dx is the difference between two x coordinates and dy is the difference between two y coordinates, we can calculate the distance between two points.

Write a program to calculate the distance of players 2-4 from player 1 and output them as tuples.

List Comprehensions

In Python, lists can be created with a list comprehension. A comprehension can draw its values from any other iterable object, i.e. any object a for ... in loop will work on.

# range() returns an iterable
one_to_ten = [x for x in range(1,11)]
print(one_to_ten)

# strings are iterable
split = [char for char in "hello"]
print(split)

# so are tuples
as_list = [num for num in (1,2,3,4,5)]
print(as_list)

There is no dedicated tuple comprehension syntax (although sets and dictionaries have their own): the same expression wrapped in parentheses produces a generator, not a tuple.
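
If a tuple is what we need, one option is to pass a generator expression to the built-in tuple() function. A minimal sketch, with variable names chosen only for illustration:

```python
# a generator expression passed to tuple() builds a tuple
squares_tuple = tuple(x**2 for x in range(5))
print(squares_tuple)

# parentheses on their own give a generator object, not a tuple
not_a_tuple = (x**2 for x in range(5))
print(type(not_a_tuple))

# an existing list can be converted the same way
from_list = tuple([1, 2, 3])
print(from_list)
```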

The expression at the start of a list comprehension can be anything; it doesn't have to use the variable being iterated over:

from random import randint

random_list = [randint(1,10) for x in range(10)]
print(random_list)

squares = [x**2 for x in range(10)]
print(squares)

Exercises

Using list comprehensions, create the following lists.

  1. The numbers 1-100.
  2. The square roots of each number in the tuple (4, 9, 144, 10000). Note that a square root is equivalent to a power of 0.5.
  3. (Challenging) Strings of two words at a time from the standard Lorem Ipsum text.

Common List Operations

Some of the most common things we want to do with a list are sort it, find the maximum or minimum value, or take a sum. Python has built-in functions for all of these.

There are two simple ways to sort lists: the sorted() function and the list.sort() method.

from random import randint

rand_list = [randint(1,10) for x in range(10)]
print(rand_list)

# .sort() changes the list
rand_list.sort()
print(rand_list)

rand_list_2 = [randint(1,10) for x in range(10)]
print(rand_list_2)

# sorted() creates a new list; here we sort in reverse order
sorted_list = sorted(rand_list_2, reverse=True)
print(sorted_list)
print(rand_list_2)
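
Two details are easy to trip over here, shown in the short sketch below with made-up values: sorted() accepts any iterable (including a tuple) and always returns a list, while .sort() returns None because it changes the list in place.

```python
# sorted() works on a tuple and gives back a new list
values = (3, 1, 2)
sorted_values = sorted(values)
print(sorted_values)

# .sort() modifies the list and returns None,
# so assigning its result is a common mistake
numbers = [3, 1, 2]
result = numbers.sort()
print(result)
print(numbers)
```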

We can also configure sorted() by changing the key argument; we'll cover this in our class on lambdas and higher-order functions.

The other list operations mentioned above are simple:

one_to_ten = [x for x in range(1,11)]

summed = sum(one_to_ten)
minimum = min(one_to_ten)
maximum = max(one_to_ten)

average = sum(one_to_ten) / len(one_to_ten)

for n in (summed, minimum, maximum, average):
  print(n)

Exercise

Most businesses have to keep track of numerous accounts:

accounts = [
  ["Business Loan", -23000],
  ["Big Bank", 1000],
  ["Wealth Management Inc", 100000],
  ["Credit Card", -1500],
]

Write a script which will return the net cash of the business and the lowest and highest balances among the given accounts. It should also return a list of the accounts, sorted from highest to lowest balance.

Destructuring

Tuples and lists can be destructured into multiple variables (Python's own documentation calls this unpacking). One common use of this is with the enumerate() function over a list.

# create a list from 100 to 1
my_list = [x for x in range(100,0,-1)]

# enumerate returns the tuple (index, value)
for i, value in enumerate(my_list):
  print(f"Index: {i}; Value: {value}")
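
The same unpacking works in a plain assignment, outside a loop. The sketch below uses hypothetical values and names; it also shows that the number of variables has to match the number of items.

```python
# hypothetical (width, height) pair, for illustration only
size = (1920, 1080)
width, height = size
print(width, height)

# lists unpack the same way
first, second, third = ["a", "b", "c"]
print(second)

# a mismatch between names and values raises ValueError
try:
  x, y, z = size
except ValueError as err:
  mismatch_error = str(err)
  print(f"ValueError: {mismatch_error}")
```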

You can also destructure a list into several variables plus a list of the remaining items by using the * operator (what some other languages call a spread or rest operator).

# create a list from 2 to 100 by twos
my_list = [x for x in range(2,101,2)]

two, four, six, *rest = my_list

print(two)
print(four)
print(six)
print(rest)
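
The starred name doesn't have to come last. As a small sketch with made-up data, it can also sit in the middle or at the front, and it always collects its items into a list, even when unpacking a tuple.

```python
scores = [3, 8, 5, 9, 1]

# starred name in the middle collects everything between first and last
first, *middle, last = scores
print(first, middle, last)

# starred name at the front collects everything except the final item
*head, tail = scores
print(head, tail)

# unpacking a tuple still produces a list for the starred name
a, *b = (1, 2, 3)
print(b)
```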

Exercise

The following snippet of code will generate a 5 x 5 grid of random numbers. By using destructuring and the built-in functions we've covered so far, write a short piece of code to return the average value of each column without using an explicit for loop.

from random import randint

random_grid = [ [randint(1,10) for y in range(5)] for x in range(5) ]

The zip() Function

Sometimes we have several separate lists which we would prefer to be a single list of records. For example, we might have a list of phone numbers, a list of names, and a list of email addresses, but want a list of people, each with their name, email address and phone number.

The zip() function combines separate lists in this way. It returns an iterator which yields one tuple per position, taking one item from each input.

names = ("Joe Bloggs", "Jolene Walker", "Alexander Hamilton")
email = ("joe@bloggs.me", "jolene.walker@email.com", "editor@thefederalist.com")
phone = ("+331990001","+331990103","+331990807")

people = zip(names, email, phone)

for person in people:
  print(person)
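
Two behaviours of zip() are worth seeing once, sketched below with small made-up lists: the result can only be looped over a single time unless we store it with list(), and zip() stops as soon as the shortest input runs out.

```python
letters = ["a", "b", "c"]
numbers = [1, 2, 3, 4]

pairs = zip(letters, numbers)

# list() collects the tuples; the extra 4 is dropped
# because letters only has three items
pairs_list = list(pairs)
print(pairs_list)

# the zip object is now used up, so a second pass gives nothing
leftover = list(pairs)
print(leftover)
```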

Exercise

You're given a grid of user information stored as a list of people. We want to make it easier to search by field. To do this, we need to convert this into a list of columns with the same type of information in each.

Please convert the following grid into 3 lists by using zip(): name, email, and salary.

data = [
  ["John Smith", "jsmith@email.com", 65000],
  ["Joe Bloggs", "jbloggs@email.com", 50000],
  ["Jolene Baker", "jbaker@email.com", 80000],
]

Further Reading