CS101 part 4: Software

You suffered admirably through my necessary but dense preliminary discussions of boolean logic, binary arithmetic and memory hierarchy. Now comes the payoff - a series of posts about things you’ve actually heard of. First up: software.

I’m sure you have at least a rough idea of what hardware and software are. In fact, if you’re reading this, you probably know a lot of people who write software for a living. But you may be wondering what it actually means to “write software” or “run a program”. Or you may still marvel at how we can turn a pile of electronic circuits into some magical device that can show us pictures of kittens on skateboards. Read on to find out more!

Instruction Sets

As you no doubt recall from a previous post, computers implement very basic logic and arithmetic using electronic circuits that operate on numbers represented as sequences of ‘bits’ - 0s and 1s. These circuits are known as hardware, to emphasize that they are a physical thing, built out of tiny pathways etched in silicon, that behave in certain useful ways when you run electrical current through them.

An instruction set consists of all the basic logic and arithmetic operations that are available in a particular processor’s circuits. Examples of such basic instructions are: “add the number in memory location A to the number in memory location B, storing the result in B” or “if the number in memory location A is zero, perform the next instruction, otherwise jump to this other instruction”.
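To make those two example instructions concrete, here is a minimal sketch in Python. It is a toy model, not how any real processor works: memory is just a list of numbers, and the names `add` and `next_instruction` are made up for illustration.

```python
# Toy model (an assumption for illustration): memory is a list of numbers,
# and a "memory location" is simply an index into that list.
memory = [7, 5, 0, 0]
A, B = 0, 1  # two arbitrarily chosen memory locations


def add(mem, a, b):
    """Add the number in location a to the number in location b, storing the result in b."""
    mem[b] = mem[a] + mem[b]


def next_instruction(mem, a, current, target):
    """Decide which instruction runs next.

    If the number in location a is zero, carry on with the next instruction
    (current + 1); otherwise jump to the instruction at position target.
    """
    return current + 1 if mem[a] == 0 else target


add(memory, A, B)
print(memory[B])                           # prints 12
print(next_instruction(memory, 2, 0, 5))   # location 2 holds zero, so prints 1
print(next_instruction(memory, A, 0, 5))   # location A holds 7, so prints 5
```

A real instruction set has many more operations than this, but most of them are in the same spirit: move a number, combine two numbers, or decide where to go next.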

These basic instructions are the smallest building blocks of all computer programs, just as written characters are the smallest building blocks of all books. And different types of processor (e.g., an Intel Core i5 vs. a Qualcomm Snapdragon S4) have different instruction sets, just as different cultures have different writing systems.

CISC vs. RISC

The two most common instruction sets in use today are those of the x86 family of processors, used primarily in desktop and laptop computers, and of the ARM family of processors, used primarily in mobile devices.

x86 is an example of a CISC (Complex Instruction Set Computing) instruction set: depending on how you count, it contains over a thousand different instructions, some of which do quite a lot of work in a single step. ARM is an example of a RISC (Reduced Instruction Set Computing) instruction set: it gets by with a smaller set of simpler, more uniform instructions.

Note that this doesn’t make ARM a weaker or less capable processor. It’s just a different hardware design, analogous to the fact that English is written with 26 characters while Chinese is written with thousands of them, but both are equally expressive.

Hardware and Software

A computer program is a sequence of these basic instructions. It tells the computer how to combine these elementary operations in order to perform complex computation. Instructions combine into computer programs just as letters of the alphabet combine to make books.

Running a program means that the computer takes each instruction in the program in turn and activates the hardware circuit it corresponds to. If the program is constructed properly, the result of all those hardware circuits firing will be something useful. Or kittens on skateboards, who knows…
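If you’d like to see that idea in miniature, here is a rough sketch in Python. Everything in it is invented for illustration: the two instructions `SET` and `ADD`, the `run` function, and the idea of memory as a list. A real processor does this in circuitry rather than in Python, but the rhythm is the same: take the next instruction, do what it says, move on.

```python
def run(program, memory):
    """Take each instruction in turn and perform the operation it names."""
    pc = 0  # position of the next instruction to execute
    while pc < len(program):
        op, *args = program[pc]
        if op == "SET":        # SET loc value: put value into memory[loc]
            loc, value = args
            memory[loc] = value
        elif op == "ADD":      # ADD a b: add memory[a] to memory[b], result in b
            a, b = args
            memory[b] = memory[a] + memory[b]
        else:
            raise ValueError(f"unknown instruction: {op}")
        pc += 1  # move on to the next instruction
    return memory


# A three-instruction "program" in this made-up instruction set.
program = [
    ("SET", 0, 2),  # memory[0] = 2
    ("SET", 1, 3),  # memory[1] = 3
    ("ADD", 0, 1),  # memory[1] = memory[0] + memory[1]
]

result = run(program, [0, 0])
print(result)  # prints [2, 5]
```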

The term software is just a synonym for “computer program”, implying that it’s something logical, as distinct from the physical hardware. 

War and PC 

The analogy to books is instructive: a printing press knows how to typeset individual letters. It is, quite literally, hardware. In this analogy, War and Peace is software. The bridge between the two is the English alphabet: it exists in both worlds, as a physical thing that can be typeset and as a logical thing that can be used to construct words, sentences, paragraphs, chapters, and eventually whole works of literature.

Similarly, the instruction set is the bridge between hardware and software. Each instruction lives in both worlds: as a building block of the software, and as a representative of a physical circuit.  The software is a recipe for how to combine hardware operations, and that recipe is written in their common language - the instruction set.

One important difference from literature, though, is that while novels are linear, intended to be read in a straight line from start to finish, programs are not. Some instructions cause execution to “jump” to a different part of the program, depending on various conditions, rather like those choose-your-own-adventure books from childhood. This has the interesting and useful effect that a running program doesn’t have to end: it can “loop” through the same instructions again and again. For example, a web server program repeats, for as long as it runs, the instructions that take an incoming browser request for a URL and return the contents of the requested page.
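Here is a small Python sketch of jumping and looping, using a made-up two-instruction machine (`ADD` and `JUMP_IF_NOT_ZERO` are names I invented, as is the `max_steps` safety limit). The program is only three instructions long, yet it executes nine steps, because the jump sends it back to the start until a counter reaches zero.

```python
def run(program, memory, max_steps=1000):
    """Execute a toy program that may jump; return the memory and the number of steps taken."""
    pc = 0      # position of the next instruction
    steps = 0   # how many instructions have been executed so far
    while pc < len(program) and steps < max_steps:
        op, *args = program[pc]
        steps += 1
        if op == "ADD":                   # ADD a b: memory[b] += memory[a]
            a, b = args
            memory[b] += memory[a]
            pc += 1
        elif op == "JUMP_IF_NOT_ZERO":    # jump to target unless memory[a] is zero
            a, target = args
            pc = target if memory[a] != 0 else pc + 1
        else:
            raise ValueError(f"unknown instruction: {op}")
    return memory, steps


# memory[0] is a counter, memory[1] holds -1, memory[2] is a running total,
# memory[3] is the amount to add on each pass through the loop.
memory = [3, -1, 0, 10]
program = [
    ("ADD", 3, 2),               # 0: total += 10
    ("ADD", 1, 0),               # 1: counter -= 1
    ("JUMP_IF_NOT_ZERO", 0, 0),  # 2: if counter isn't zero, go back to instruction 0
]

final_memory, steps = run(program, memory)
print(final_memory[2], steps)  # prints 30 9
```

If the last instruction jumped back unconditionally, the program would never finish on its own, which is exactly the behavior you want from something like a web server. The `max_steps` limit is there only so this toy can’t run forever by accident.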

Grains of Sand 

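Useful computer programs can consist of hundreds of millions of instructions. Computers can execute these instructions very, very quickly. The processor clock rate (those ‘gigahertz’ numbers you may be familiar with) is the number of basic steps, or cycles, the processor performs each second, and it gives a rough measure of how fast instructions get executed. For example, a 1.6 GHz processor runs through 1.6 billion (that’s billion, with a ‘b’) cycles per second. Some instructions take several cycles and modern processors can complete several instructions in one cycle, but either way we’re talking about something on the order of billions of instructions per second. As you can imagine, no human can keep track of complexity on this scale.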
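For a feel of the numbers, here is a back-of-the-envelope calculation in Python. It assumes, purely for simplicity, that the processor completes exactly one instruction per clock cycle (real processors vary in both directions), and the instruction count is a hypothetical figure I picked.

```python
clock_rate_hz = 1.6e9          # a 1.6 GHz processor: 1.6 billion cycles per second
instructions_per_cycle = 1     # simplifying assumption, not true of real processors
instruction_count = 300_000_000  # hypothetical number of instructions to execute

# Time for a single clock cycle, in nanoseconds (billionths of a second).
nanoseconds_per_cycle = 1e9 / clock_rate_hz

# Time to get through all the instructions, in seconds.
seconds = instruction_count / (clock_rate_hz * instructions_per_cycle)

print(f"{nanoseconds_per_cycle:.3f} ns per cycle")   # prints 0.625 ns per cycle
print(f"{seconds:.4f} s for all instructions")       # prints 0.1875 s for all instructions
```

Under these assumptions, three hundred million instructions go by in less than a fifth of a second.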

Composing programs directly from the instruction set is like building a skyscraper out of individual grains of sand. It’s completely impractical to write any but the most trivial programs this way.

Instead, modern computer systems consist of layer upon layer of increasingly complex building blocks, each constructed from simpler units, all the way down to individual instructions. Software engineers write programs using these building blocks, but in the end, everything is converted back down to millions and millions of instructions, each causing a circuit to fire for a tiny fraction of a second.
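The layering idea can be sketched in a few lines of Python. This is only an illustration, with function names of my own choosing: `add` stands in for a single hardware instruction, `multiply` is built out of nothing but `add`, and `power` is built out of nothing but `multiply`. Real systems are layered far more cleverly than this, but the principle is the same.

```python
def add(a, b):
    """Bottom layer: stands in for a single hardware 'add' instruction."""
    return a + b


def multiply(a, b):
    """Middle layer: multiplication built from repeated addition (b must be a non-negative integer)."""
    total = 0
    for _ in range(b):
        total = add(total, a)
    return total


def power(base, exponent):
    """Top layer: exponentiation built from repeated multiplication (non-negative integers only)."""
    result = 1
    for _ in range(exponent):
        result = multiply(result, base)
    return result


print(multiply(6, 7))  # prints 42
print(power(2, 10))    # prints 1024
```

Whoever calls `power` never has to think about `add`, even though every call eventually boils down to a long string of additions.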

In my next few CS101 posts, we’ll discuss some of these building blocks: firmware, drivers, operating systems, programming languages and more.