Basics of Networking

In the previous class, we introduced the basics of setting up a web server with Node.js. However, even with our simple sample code, there were substantial questions we didn't answer. For example, why do we need to type localhost:3000 into our browser? What does that mean?

The Web Protocols

In order for computers to communicate with each other over networks, they need common standards for communication. These are called protocols. Protocols are different to the software in which they're implemented: they're ideas and standards which are agreed on by organisations with an interest in internet governance. One of the most important of these is the IETF (Internet Engineering Task Force), which has published the core standards of the internet since 1986, continuing work that began on ARPANET, the internet's predecessor.

Internet protocols are usually published as RFCs, or Requests for Comments. The first RFC is from 1969!

Protocols are structured as a series of layers. The exact number of these varies depending on definition, but the OSI Model defines 7. Each layer builds on the previous one and becomes more abstract in turn. Let's look at them from the lowest level and build up.

As software engineers (rather than network engineers) we typically only care about layers 3 through 7, but it's helpful to have some understanding of all 7 layers.

The physical layer (1) deals with how information is physically communicated between two devices - for example, via electricity in copper wires or via light in fibre-optic cables. Examples of standards here are the Category ('Cat') ratings for Ethernet cables and the USB standards for connecting devices.

The data link layer (2) deals with how errors and anomalies in the physical layer (for example, electrical interference) are handled in order for signals to be communicated. It also deals with how physical devices can be distinguished from one another on a single network. The most important part of this for backend engineers is the MAC address, a hardware identifier which lets devices on a local network recognise one another.

The network layer (3) deals with how information is routed across multiple interconnected networks (e.g. intranets and the internet). It mostly deals with technical details, like how routers direct your web requests to the correct server. At this level we begin to care about IP addresses, which provide a way to connect to local and remote devices or networks. IPv4 addresses look like 192.0.0.1, while IPv6 addresses look like 2001:db8:3333:4444:5555:6666:7777:8888.
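
As a small illustration of the two address formats, Node's built-in net module can tell them apart. This is only a sketch: describeAddress is a helper name made up for this example.

```javascript
const net = require("node:net");

// Hypothetical helper: reports which IP version a string is written in.
function describeAddress(text) {
  // net.isIP returns 4 for IPv4, 6 for IPv6 and 0 for anything else.
  const version = net.isIP(text);
  return version === 0 ? "not an IP address" : `IPv${version}`;
}

console.log(describeAddress("192.0.0.1"));
console.log(describeAddress("2001:db8:3333:4444:5555:6666:7777:8888"));
console.log(describeAddress("fd93.me")); // a domain name, not an IP address
```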

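The transport layer (4) deals with how different types of data are transferred from one device to another across a network. Two very important protocols in this layer are TCP and UDP. You'll often see them written as TCP/IP and UDP/IP, because they run on top of IP from the network layer.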

  • TCP/IP ensures that data arrives complete and in the correct order. To do this, it checks for errors and retransmits any data that's missing or damaged, which adds overhead and makes it slower than UDP. It's the most common protocol for sending data over networks.
  • UDP/IP doesn't guarantee that data arrives, or that it arrives in any particular order. In exchange for possibly losing data, it has less overhead and lower latency. This makes it suitable for applications where speed matters more than receiving every piece of data - e.g. live video streaming.
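
To make the difference more concrete, here is a rough sketch using Node's built-in net (TCP) and dgram (UDP) modules. Both functions send one message to a temporary receiver on this computer. The names tcpDeliver and udpDeliver, and the one-second timeout, are choices made for this example only.

```javascript
const net = require("node:net");
const dgram = require("node:dgram");

// TCP: the client opens a connection first, then sends data over it.
function tcpDeliver(message) {
  return new Promise((resolve, reject) => {
    const server = net.createServer((socket) => {
      let received = "";
      socket.setEncoding("utf8");
      socket.on("data", (chunk) => (received += chunk));
      // "end" fires once the client has finished sending.
      socket.on("end", () => server.close(() => resolve(received)));
    });
    server.on("error", reject);
    // Port 0 asks the operating system for any free port.
    server.listen(0, "127.0.0.1", () => {
      const client = net.connect(server.address().port, "127.0.0.1", () => client.end(message));
      client.on("error", reject);
    });
  });
}

// UDP: no connection is opened. We send a single datagram and hope it arrives.
function udpDeliver(message) {
  return new Promise((resolve, reject) => {
    const receiver = dgram.createSocket("udp4");
    const sender = dgram.createSocket("udp4");
    const finish = (settle, value) => {
      clearTimeout(timer);
      receiver.close();
      sender.close();
      settle(value);
    };
    // UDP never reports a lost datagram, so the program has to decide when to give up.
    const timer = setTimeout(() => finish(reject, new Error("datagram never arrived")), 1000);
    receiver.on("message", (data) => finish(resolve, data.toString("utf8")));
    receiver.bind(0, "127.0.0.1", () => {
      sender.send(message, receiver.address().port, "127.0.0.1");
    });
  });
}

async function main() {
  console.log("TCP delivered:", await tcpDeliver("hello over TCP"));
  console.log("UDP delivered:", await udpDeliver("hello over UDP"));
}

main().catch(console.error);
```

On a single computer both messages will almost certainly arrive. The point is the shape of the code: the TCP version has a connection to open and close, while the UDP version needs its own timeout because nothing tells it when data goes missing.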

Another important protocol which operates around the transport layer is TLS (Transport Layer Security), which handles encrypting and decrypting data sent between two devices. The predecessor of TLS was called SSL, and you'll still often see TLS referred to as SSL.

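The session layer (5) deals with setting up, managing and closing connections (sessions) between two different devices. A protocol you'll rely on whenever a connection is set up is DNS (the Domain Name System), which allows us to use convenient domain names like fd93.me instead of remembering the IP address of every computer we want to connect with. Strictly speaking, DNS is usually classed as an application-layer (7) protocol, but it's worth introducing here because a name normally has to be looked up before a connection can be opened.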

The presentation layer (6) deals with transforming data from lower-level representations (likely a string of bytes - zeroes and ones) into a format that can be used by software running on a device, for example by handling character encoding or compression. In other words, it prepares data for the application layer.

The application layer (7) deals with data which directly interacts with an application running on a device. This contains the majority of things that you're likely to interact with on a day-to-day basis as a developer. There are many application-layer protocols, some of which only exist for one program, others which are widespread. Some examples of application layer protocols are HTTP (for webpages) and SSH (used for remotely connecting to servers).

Knowledge Check

  1. What's the difference between a MAC address and an IP address?
  2. When and why would we want to use TCP?
  3. When and why would we want to use UDP?
  4. What's the main benefit of the DNS protocol?
  5. Which OSI layer implements the HTTP protocol?

Discussion Questions

  1. Why might we be interested in MAC addresses, given they operate on a much lower level than we're normally addressing?
  2. One use of UDP is for fast streaming video. Can you think of any other applications where UDP might be helpful?

IP Addresses, Ports and Domain Names

As discussed above, an IP address allows us to connect to another device or network, and domain names provide a convenient shorthand for this. We're now in a good position to understand what localhost:3000 means.

There are a number of special IP addresses and ranges which are used for different purposes:

  • 127.0.0.1 is the 'loopback' address, which points back to the computer you access it on (the local host).
  • 192.168.*.* is used for local networks (i.e. not open to the internet).
  • 10.*.*.* is also used for private networks (often larger-scale than 192.168 as it's 256 times larger).
  • 0.0.0.0 is used as a shortcut to mean "listen on all network interfaces". If a server listens on 0.0.0.0, you are normally letting other devices on your network connect to it.
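
The ranges above can be turned into a small lookup. This is a sketch rather than a complete classifier: describeIPv4 is a made-up helper, and it only knows about the ranges in this list.

```javascript
const net = require("node:net");

// Hypothetical helper: labels an IPv4 address using only the ranges listed above.
// Other reserved ranges exist (for example, 172.16.*.* to 172.31.*.* is also private)
// and are deliberately left as "other" here.
function describeIPv4(address) {
  if (!net.isIPv4(address)) return "not an IPv4 address";
  const [first, second] = address.split(".").map(Number);
  if (address === "0.0.0.0") return "all interfaces";
  // The whole 127.*.*.* block is reserved for loopback, not just 127.0.0.1.
  if (first === 127) return "loopback";
  if (first === 10) return "private network";
  if (first === 192 && second === 168) return "private network";
  return "other";
}

for (const address of ["127.0.0.1", "192.168.1.20", "10.0.0.5", "0.0.0.0", "8.8.8.8"]) {
  console.log(address, "->", describeIPv4(address));
}
```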

DNS lets us assign an easy-to-remember name to any IP address. As it so happens, the standard name for 127.0.0.1 is localhost. Your operating system normally resolves this name itself, without asking a DNS server, and most web browsers will respect this.
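
You can watch a name being turned into an address from Node itself. The sketch below uses the built-in dns module, whose lookup function asks the operating system to resolve a name in the same way most programs do. resolveIPv4 is a helper name invented for this example.

```javascript
const dns = require("node:dns");

// Hypothetical helper: resolves a name to an IPv4 address.
// The { family: 4 } option asks for an IPv4 answer only.
function resolveIPv4(name) {
  return dns.promises.lookup(name, { family: 4 });
}

resolveIPv4("localhost")
  .then(({ address, family }) => console.log(`localhost -> ${address} (IPv${family})`))
  .catch((error) => console.error("lookup failed:", error.code));
```

Swapping "localhost" for a real domain name should print that server's address instead, provided you're connected to the internet.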

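The last missing piece of the puzzle is the port. This is a number between 0 and 65535 which identifies a particular program on a device, so that several servers can share one IP address. A server 'listens' on a port for incoming requests. Most application-layer protocols have a standard port which they listen on: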

  • HTTP is on port 80
  • HTTPS is on port 443
  • SSH is on port 22

However, it's not mandatory to listen on these ports for a given application. In fact, for a number of reasons we might not want to do so! For example, on Unix-like systems, ports below 1024 normally require administrator privileges. It's common to listen on unusual ports like 3000, 5000, 8000 and so on, then use a piece of software called a reverse proxy to connect these to the usual web ports.

In order to connect to a given port on a server we can add :PORT to the end of its domain name or IP address.

So entering localhost:3000 in our address bar really means "connect to this computer (127.0.0.1) on port 3000 and display what you get in response."
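
The same splitting of an address into host and port can be done in code with the built-in URL class. In this sketch, hostAndPort and USUAL_PORTS are names made up for the example. Note that the browser adds http:// for us, while the URL class needs it spelled out.

```javascript
// Usual ports for the two web protocols mentioned above.
const USUAL_PORTS = { "http:": 80, "https:": 443 };

// Hypothetical helper: splits an address into its host and port.
function hostAndPort(address) {
  const url = new URL(address);
  // url.port is an empty string when no port is given (or when it's the usual one),
  // so we fall back to the protocol's usual port. Other protocols give undefined.
  const port = url.port === "" ? USUAL_PORTS[url.protocol] : Number(url.port);
  return { host: url.hostname, port };
}

console.log(hostAndPort("http://localhost:3000"));
console.log(hostAndPort("https://fd93.me"));
```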

Knowledge Check

  • You want to test some software by connecting to port 1337 on your local computer. What should you type into your address bar?
  • You want to connect to your phone, which is on the same home network as you. Which IP address is it likely to have?
  • You're at work as a video editor and download some files from a file server. You notice that you're connected to 10.137.10.3 but aren't sure what this is. Should you be worried? Why or why not?
  • What's the usual port for HTTP?
  • Do we always need to use the same ports for the same application layer protocols?

Exercise

Try running ip addr on your system (this is a Linux command; the rough equivalents are ifconfig on macOS and ipconfig on Windows). You should get a lot of information about your own IP addresses. You probably have at least 2 and maybe more!
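
If you'd like to see similar information from inside Node, the built-in os module exposes it. This is a sketch and not a replacement for the command above. listAddresses is a name invented for this example.

```javascript
const os = require("node:os");

// Hypothetical helper: flattens os.networkInterfaces() into one row per address.
function listAddresses() {
  const rows = [];
  for (const [name, entries] of Object.entries(os.networkInterfaces())) {
    for (const entry of entries ?? []) {
      rows.push({
        interface: name,
        family: entry.family,     // IPv4 or IPv6
        address: entry.address,   // the IP address itself
        mac: entry.mac,           // the MAC address of the interface
        loopback: entry.internal, // true for addresses like 127.0.0.1
      });
    }
  }
  return rows;
}

console.table(listAddresses());
```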

A common tool for testing and finding out information about networks is nmap, which is available on most Unix-like systems via their package manager.

You can use nmap HOST, where HOST is a domain name or IP address, to get some basic information about a computer, including its IP address and which of the 1,000 most commonly used ports are open. Note that you probably shouldn't do this to random web servers as an nmap scan is a common first step in hacking a system.

Let's try it with the following hosts:

  • localhost
  • google.com
  • fd93.me

What's the same? What's different?

Try running our backend app from last week and running nmap localhost again. What happens?
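
To get a feel for what a port scanner is checking, here is a very simplified sketch of a single check: try to open a TCP connection and see whether anything answers. This is not how nmap itself is implemented. isPortOpen and its one-second timeout are assumptions made for this example, and like nmap it should only be pointed at machines you're allowed to test.

```javascript
const net = require("node:net");

// Hypothetical helper: resolves to true if something accepts a TCP connection on the port.
function isPortOpen(port, host = "127.0.0.1", timeoutMs = 1000) {
  return new Promise((resolve) => {
    const socket = net.connect({ port, host });
    const finish = (open) => {
      socket.destroy(); // we only wanted to know whether the connection succeeds
      resolve(open);
    };
    socket.setTimeout(timeoutMs, () => finish(false)); // no answer in time
    socket.once("connect", () => finish(true));        // something is listening
    socket.once("error", () => finish(false));         // e.g. connection refused
  });
}

isPortOpen(3000).then((open) => {
  console.log(`port 3000 on this computer is ${open ? "open" : "closed"}`);
});
```

Run it once with the backend app stopped and once with it running, and compare the results with what nmap reports.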

Further Reading