Sorting Algorithms Review
Let's review the sorting algorithms we've covered so far: bubble sort, insertion sort, merge sort, quicksort, and timsort. Let's also review the master theorem for evaluating algorithm time complexity.
Exercise 1
Fill out the following table about the properties of different sort algorithms. As we're going through, please talk to me about when each of these cases occur, and in the case of space complexity, why it's this way.
Algorithm | Best Case | Average Case | Worst Case | Space complexity |
---|---|---|---|---|
Bubble sort | ||||
Insertion sort | ||||
Merge sort | ||||
Quicksort | ||||
Timsort |
Exercise 2
For both of the following Python algorithms, write out its time complexity in T
notation, then use the master theorem to derive its overall time complexity.
The Master Theorem is restated here for your convenience:
Let a be the number of times we recurse in an algorithm.
Let b be the amount we divide a list of length n by.
Let d be the time complexity of additional processing at a given level.
For an equation of the form T(n) = aT(n/b)+O(n^d):
a = b^d => T(n) = O((n^d)logb(n))
a < b^d => T(n) = O(n^d)
a > b^d => T(n) = aT(n/b)+O(n^d)
def A(input):
third = len(input) / 3
A(input[: third])
A(input[third : third * 2])
return sum(input)
def B(input, target):
if len(input) == 1:
return target == input[0]
mid = len(input) / 2
left = input[:mid]
right = input[mid:]
if B(left) or B(right): return True
return False
Exercise 3
For each of the following use cases, describe a suitable sorting algorithm and justify why it is suitable for this use case.
- You have an unsorted set of cards 1-10 which you must sort by hand, but you must use a formal approach to sort them.
- You're a maintainer on a new programming language. You want a sorting method which is stable (it doesn't swap elements with the same value) and will work across a wide variety of data sets.
- You are a staff engineer at a large tech company. You have access to massive computing power distributed across 500 servers. However, to reduce costs, these servers use hard drives for the majority of their storage, making random reads and writes expensive and slow. In addition, you are dealing with around 10tb of user data per day, which must be sorted for further analysis.
- You're writing a sorting algorithm for prioritising tasks in-memory, for example as part of a scheduler. The number of tasks is non-trivial but easily fits in system memory. You want to minimise the use of system resources in sorting tasks, but also get fast performance for the sort.
Bonus: The Fisher-Yates Modern Shuffle (Unsort)
Notice how this logic reflects the logic of sort algorithms generally in iterating and swapping elements. However, by using a random number to select the swap element, we transform this sorting algorithm into a shuffling algorithm.
Optional: Write an implementaion of the Fisher-Yates Modern Shuffle algorithm.
Answers to Exercises
Exercise 1
Algorithm | Best Case | Average Case | Worst Case | Space complexity |
---|---|---|---|---|
Bubble sort | O(n) | O(n^2) | O(n^2) | O(1) |
Insertion sort | O(n) | O(n^2) | O(n^2) | O(1) |
Merge sort | O(nlogn) | O(nlogn) | O(nlogn) | O(n) |
Quicksort | O(nlogn) | O(nlogn) | O(n^2) | O(logn) |
Timsort | O(n) | O(nlogn) | O(nlogn) | O(n) |
Exercise 2
function A:
a = 2
b = 3
d = 1
T(n) = 2T(n/3) + O(n)
b^d = 3
a < b^d, so case 2 applies.
T(n) = O(n)
function B:
a = 2
b = 2
d = 0
T(n) = 2T(n/2) + O(1)
a > b^d, so case 3 applies.
T(n) = O(n^logb(a))
T(n) = O(n^log2(2))
T(n) = O(n^1)
T(n) = O(n)
Exercise 3
- We likely want to use insertion sort. Insertion sort performs best over short lists like the numbers 1-10.
- This is the use case for Timsort. It will work well on sorted and unsorted datasets, and mitigates the issues with insertion sort and merge sort by combining them.
- We probably want to use merge sort. It's helpful for on-disk sorting because it doesn't require that we hold our whole list in memory, and can be parallelised easily.
- We probably want to use quicksort. While quicksort has some disadvantages in sorting, it has very good performance in practice, and works well on items that we have memory access to.