Python Strings and String Manipulation

In our previous classes we've touched on strings as a way of taking input from the user and showing output to them. However, we haven't looked at strings much in themselves.

This is because strings are a bit more complex than our other types of variable. Unfortunately, they're also really common! This is because strings can be used to represent almost any other value provided there's an agreed-on way to interpret them. You'll find strings being used for data types as diverse as:

  • Numbers
  • Dates
  • Objects
  • Web requests

The best way to think of strings is as lists of characters, and in fact in low-level languages like C this is how they're represented.

There are many ways of manipulating strings, and we'll cover some of them in this class. Probably the most common thing we want to do with a string is find out how long it is:

my_string = "Hello!"
print(len(my_string))

Because a string behaves like a list of characters, you can get the character at any position or index in the string (starting from 0) by using square brackets. One special feature of Python is that negative indexes like -1 are allowed, and they count backwards from the end of the string. string[-1] means the same thing as string[len(string)-1], so you can reach the last character without working out the length first.

You can use this with a loop to look at or manipulate strings on a character-by-character basis.

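my_string = "Hello world!"
print(my_string[5])  # index 5 is the space between the two words

# [::-1] steps backwards, so this loop prints the characters in reverse order
for char in my_string[::-1]:
  print(char)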

Notice that when you use a for loop over a string, it will give you each character of the string individually.
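
Negative indexes are easiest to see next to their positive equivalents. The following is a small sketch rather than anything definitive; the variable names are only there for illustration.

```python
word = "Hello"

last = word[-1]                  # counts back from the end of the string
same_last = word[len(word) - 1]  # the long way of writing the same thing
second_last = word[-2]           # one step further back

print(last, same_last, second_last)  # o o l
```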

Another very common thing we want to do with strings is to extract a section of them. The easiest way to do this is to use the slice syntax. This is a special feature of Python which lets us extract parts of a string or list very easily.

Slice syntax has the format string[start:end:step]. Each of the three values can be positive or negative, and you can leave out any that you're not interested in using.

start is the index of the character you want to slice from. It's included in the output.

end is the index of the character you want to slice to. It's not included in the output.

step is the amount we increment the index by each time. For example string[::2] will return every other character of the string.

my_string = "Hello world!"

print(my_string[:5]) # Hello
print(my_string[6:11]) # world
print(my_string[::2]) # Hlowrd
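
The examples above only use positive values. As a rough sketch of how negative values behave in a slice (the variable names here are just for illustration):

```python
my_string = "Hello world!"

last_six = my_string[-6:]    # start six characters from the end
no_ends = my_string[1:-1]    # drop the first and last characters
backwards = my_string[::-1]  # a negative step walks the string in reverse

print(last_six)   # world!
print(no_ends)    # ello world
print(backwards)  # !dlrow olleH
```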

Another really common operation on strings is to concatenate them. This can be done with the + operator.

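hello = "Hello"
world = "World!"
hello_world = hello + world  # "HelloWorld!" (no space is added for you)
digits = ""
for i in range(10):
  digits += str(i)  # numbers must be converted with str() before joining
print(digits)  # 0123456789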

Python has an interesting feature which lets us multiply strings to repeat them:

hello = "hello "
print(hello * 10)
print(("a" * 5) + ("b" * 10)) 

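You can also compare strings with > and < in the same way as numbers. Python compares them one character at a time using each character's code, which matches alphabetical order as long as both strings use the same case. Uppercase letters have lower codes than lowercase letters, so they always sort first.

print("a" > "b")          # False: a comes before b
print("dog" < "cat")      # False: d comes after c
print("Zebra" < "apple")  # True: uppercase Z sorts before lowercase a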

Not every character you might want in a string is easy to type on a keyboard or makes sense in a string. For example, what if you want to put the " character inside a string? """ will break your code.

To solve this problem, there are special escape codes for certain characters.

\n - new line
\" and \' - quote marks
\\ - backslash
\uXXXX - any Unicode character, by its four-digit hexadecimal number

This can lead to "interesting" pieces of code that are a little bit challenging to read.

my_string = "Hello \"world!\""
print(my_string)
my_string = "\n\u2705\n\u2705\n\u2705"
print(my_string)

As we've covered previously, you can also include other variables in strings by using format string syntax:

base = 10
output = f"{base}: {base * 10}, {base + 100}"

This is equivalent to the .format() method:

base = 10
output = "{}: {}, {}".format(base, base * 10, base + 100)

A full description of the format syntax can be found in the Python standard documentation.
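
As a small taste of what the format syntax can do, here is a sketch of three common format specifiers. The variable names and values are made up for illustration.

```python
price = 3.14159
count = 42

two_decimals = f"{price:.2f}"  # round to two decimal places
padded = f"{count:>6}"         # right-align in a field six characters wide
as_hex = f"{255:x}"            # write a number in hexadecimal

print(two_decimals)  # 3.14
print(padded)        #     42
print(as_hex)        # ff
```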

Exercise 1

  1. A palindrome is a string which reads the same both forward and reversed. For example: madam, or eve. Write a piece of code that checks whether a string is a palindrome, returning True if so or False if not.
  2. Write a program which takes two words from the user, checks which one comes first in the alphabet, and tells the user.

Some Helpful Functions

While covering every single Python string function in a single document is unlikely to be helpful, we'll cover some of the most commonly used and helpful here.

For full documentation of all the functions available for strings, check the Python standard docs.

string.upper() returns a copy of the string in UPPERCASE.

string.lower() RETURNS A COPY OF THE STRING IN lowercase.

string.strip() returns a copy of the string with whitespace (i.e. tab, space, newline) removed from both ends. Whitespace in the middle of the string is left alone.

string.lstrip() and string.rstrip() do the same for only the left-hand or right-hand end of the string respectively.
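
The methods above can be tried out with a short sketch like the following. The sample text is made up, and repr() is used here only so that the whitespace shows up when printed.

```python
raw = "  Hello World!\n"

shouted = raw.upper()      # "  HELLO WORLD!\n"
whispered = raw.lower()    # "  hello world!\n"
stripped = raw.strip()     # "Hello World!"
left_only = raw.lstrip()   # "Hello World!\n"
right_only = raw.rstrip()  # "  Hello World!"

print(repr(stripped))
print(repr(raw))  # the original string is unchanged
```

Each method returns a new string, so the result has to be stored in a variable (or printed) to be of any use.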

string.replace(old, new) returns a copy of the string with every copy of old replaced by new. For example:

rainbow = "red,orange,yellow,green,blue,indigo,violet"
rainbow = rainbow.replace(",","\n")
print(rainbow)

Finally, there are a couple of built-in Python functions which aren't string methods but are helpful for working with strings:

ord(x) returns the character code (the Unicode code point) of a single character of input. chr(x) does the reverse and returns the character corresponding to a character code.

num = ord("h")
print(num)  # 104

string = chr(128578) * 3  # 128578 is the code for a smiling face emoji
print(string)
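
Because the letters a to z have consecutive character codes, ord() and chr() can be combined to do simple arithmetic on letters. The following is a sketch with illustrative variable names:

```python
code = ord("a")                  # 97
round_trip = chr(ord("q"))       # converting there and back gives "q" again
position = ord("d") - ord("a")   # 3: "d" is three letters after "a"
next_letter = chr(ord("d") + 1)  # "e"

print(code, round_trip, position, next_letter)  # 97 q 3 e
```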

A Note on Regexps

One of the most powerful ways to work with strings is to use regular expression (regexp) syntax. This is a special language to work with strings, especially for search, replace, and validation operations. Python supports regexps through the re module in its standard library, and JS supports them by default.

Unfortunately regexps are quite complex and in fact comprise their own type of language (the regular languages). They are also notorious for being difficult to understand and debug. For these reasons we're not going to cover them in this part of the course, but will instead cover them as part of form validation in frontend development.

Exercise 2

2.1

Start from the following example:

from random import choice

to_upper = choice([True,False])

Write a Spongebob meme generator. This should take an input string from the user. For each character, make a new random choice using the method provided. If it yields True, then make the current character uppercase. If it yields False, make the current character lowercase.

Finally, print out the resulting string.

2.2

Start from the following example code:

triangle = [
  "    x",
  "   xx",
  "  xxx",
  " xxxx",
  "xxxxx",
]

for row in triangle:
  print(row)

Figure out a way to make the triangle lean the other way without ruining the shape.

Assignment

The Vigenere cipher takes some message (the plain text) and encodes it to make it harder to read. It does this by shifting characters according to some key phrase. Each character in the plain text is paired with a character in the key phrase, and the key phrase repeats from its start when it runs out. The plain text character is then shifted forward by the key character's position in the alphabet, counting from 0 (so a shifts by 0 and z shifts by 25). If the result is higher than 25, wrap around to a.

For example:

Key: apple
Plain text: the bird is singing

Key letter: a (0)
Plain text: t (19)
Output: t (19)

Key letter: p (15)
Plain text: h (7)
Output: w (22)

The whole ciphertext is: twt mmrs xd wicvtrg
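
Neither of the two letters shown above wraps past z. The ninth letter of the message (once spaces are removed) does: it is s, paired with the key letter l. The sketch below checks that single letter using ord() and chr(). The variable names are only illustrative, and the % (remainder) operator is one possible way to wrap around, not the only one.

```python
plain_position = ord("s") - ord("a")  # 18
key_position = ord("l") - ord("a")    # 11

# 18 + 11 = 29, which is past z (25), so it wraps around to 3
shifted = (plain_position + key_position) % 26

print(chr(shifted + ord("a")))  # d
```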

To experiment more you can use this tool.

Your task is to write a version of the Vigenere cipher which takes a key phrase and a plain text and outputs a cipher text. You must also write a decoder, which takes the key phrase and the cipher text and outputs the plain text.

For security reasons, you must treat all input as suspicious and regularise it by:

  1. Removing all whitespace
  2. Converting the entire text to either uppercase or lowercase, but not a mixture of the two.
  3. Removing any non-alphabetical character (i.e. one that's not between a and z).

You must also make sure that your code loops around at Z. For example, if we shift Z with the key character b, it must output a, not a non-alphabetical character.