Primaries and Replicas

The Django project, a well-known Python webapp framework, recently changed all its uses of the database terminology “master/slave” to the alternative terms “primary/replica”. 

The “master/slave” terminology has been in use for many years in the database world, but those terms can have racially-charged and offensive connotations. So I wholeheartedly applaud Django for getting rid of them. 

However, you may be wondering what on earth those terms were used to describe in the first place. To explain, we need a little background on database systems.

Persistence and Consistency

"Database" is one of those generic computer science/engineering terms that applies to a bunch of different things. However in this context we’re talking about software that stores and retrieves organized collections of data.

Two of the most important properties of a database system are persistence and consistency. Persistence means that the database doesn’t lose data. Consistency means that data always obeys any rules required of it. For example, a voting system will likely require that every user can only vote once. It turns out to be very tricky to guarantee both persistence and consistency in a reliable way, a topic I may return to in a future post.
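
To make that rule concrete, here is a tiny, hypothetical Python sketch (the names are invented for illustration) of a consistency check being enforced:

votes = {}

def cast_vote(user_id, candidate):
    # Consistency rule: each user may vote at most once.
    if user_id in votes:
        raise ValueError("user %r has already voted" % user_id)
    votes[user_id] = candidate

cast_vote("alice", "yes")
cast_vote("alice", "no")   # raises ValueError: the rule is enforced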

Persistence typically requires that the data be stored on more than one machine; otherwise the failure of a single machine may cause data to be permanently lost. Keeping copies of the data on multiple machines is known as replication.

Replication makes achieving consistency harder. For example, if two different machines have different versions of the data, which is the true version? One common way of dealing with this is “primary/secondary” replication.

Primary/Secondary Replication

In Primary/Secondary replication, one of the machines in the system is chosen as a primary or master, and all the other machines are designated as secondaries or replicas. All write operations (creating, updating and deleting data) go to the primary. After processing the write operation, the primary sends a message to each of the secondaries telling them to apply the same write operation.

In this scheme the secondaries are always a little bit “behind” the primary. The time between processing a write on the primary and processing it on a secondary is called the replication lag of that secondary. Because the secondaries may not be completely up-to-date, the primary is considered to be the “source of truth”. If you need absolute consistency you must read your data from the primary. However if you can tolerate a little staleness, you can read from the secondaries, and take some of the load off the primary.
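
Since this post started with Django, here is a minimal sketch of what the read/write split can look like there. Django lets you register a database “router” that decides which database connection each operation uses; the alias names below (“default”, “replica1”, “replica2”) are assumed to be defined in the project’s DATABASES setting.

import random

class PrimaryReplicaRouter(object):

    def db_for_write(self, model, **hints):
        # All creates, updates and deletes go to the primary.
        return "default"

    def db_for_read(self, model, **hints):
        # Reads that can tolerate a little replication lag go to a replica.
        return random.choice(["replica1", "replica2"])

The router would then be activated with something like DATABASE_ROUTERS = ["myapp.routers.PrimaryReplicaRouter"] in settings.py.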

If the primary breaks down, one of the secondaries can be promoted to become the new primary. However before it can do so it must process all the outstanding replication messages it received from the old primary, to ensure that it’s up to date. The process of choosing a secondary to promote is called master election.
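
The details of promotion vary from system to system, but a hedged sketch of the general idea, with invented names, might look like this in Python:

def promote_new_primary(secondaries):
    # Prefer the secondary with the smallest replication lag.
    candidate = min(secondaries, key=lambda s: s.replication_lag)
    # Let it finish applying any replication messages still in its queue...
    candidate.apply_pending_replication_messages()
    # ...and only then let it start accepting writes as the new primary.
    candidate.role = "primary"
    return candidate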

What’s in a Name?

As you can see, the old “master/slave” terminology is not only offensive but also a misleading analogy (can a “slave” be promoted to “master”?). The term “replica” is both inclusive and informative, and using it is a win-win. Kudos to Django for this change; I hope many more projects follow suit.

You Say it to Their Face

“They have no character. They have no guts. They lack courage. I’m old school. If I’ve got something to say, I’ll say it to your damn face. A lot of times people don’t like that, and they’ll punch you. But that’s their opportunity, and that’s the way you do business in this life: You say it to their face.”

Russell Stookey 

A recent episode of This American Life told the story of Gene Cooley, a resident of the small town of Blairsville, GA whose reputation was destroyed by vicious anonymous posts on Topix. The quote above is from Cooley’s lawyer, who successfully sued to unmask Cooley’s online tormentors. The segment is fascinating and well worth a listen, especially given the rise of Secret.

Secret, the recently launched “anonymous twitter” app, has taken the tech industry by storm with its combination of personal confessions, trolling, Silicon Valley inside baseball, and bashing and trashing of companies and people.

Anonymous discourse has its place: It provides a safe space in which vulnerable people can express themselves without fear. It can give a voice to people marginalized by incumbent power structures in other media. But anonymity also creates a place where lies, slander and bullying go unchecked.

In a notable recent example, someone published an anonymous Medium post attempting to discredit Julie Ann Horvath and her claims of harassment at GitHub. Contrast this with Horvath’s courage in discussing her experience. She described specific incidents, named names, and put her own credibility on the line to do so. She took a risk in order to speak out, and I don’t doubt that she’s been threatened and intimidated for it. “Jane Doe”, on the other hand, wants to trash Horvath without putting anything on the line, and it’s hard to believe anything he/she says when there are no consequences to lying.

Anonymity can be a corrective against privilege. When the deck is stacked against you, it may give you a safe way to be heard. But, on Secret and elsewhere, it’s too often used in exactly the opposite way. And this is especially grating when so many courageous people, especially women, do in fact speak up publicly, often at great risk.

If you’re going to attack someone publicly, even if you’re convinced you’re telling the truth, use your real name, and stand behind your allegations. Because that’s the way you do business in this life: You say it to their face.

CS101 part 9: Programming Languages (II)

In my last CS101 post we began to discuss different programming language paradigms, and how they affect the performance, correctness and universality of programs. In this installment we’ll provide more detail on one of the most important distinctions between programming languages, namely how they provide modularity.

Modularity

Programs can be huge: Even a modest webapp may consist of hundreds of thousands of lines of code. One of the main concerns a programmer has is how to structure a program so that it’s readable and maintainable by herself and others. Having one gigantic lump of code is as untenable as trying to build a house from a single giant pile of bricks, lumber and hardware.

Instead, much of the art of programming consists of breaking the program down into components, or modules, and then further breaking down those modules into submodules, and so on down. Modules provide organization and structure through separation of concerns, i.e., a single module is responsible for a single area of functionality.
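
As a toy illustration of separation of concerns, here is a hypothetical sketch in Python. In a real webapp each section below would live in its own module (its own file) and be imported where needed; the names are made up.

# Storage concerns: only knows how to save and fetch posts.
_posts = []

def save_post(title, body):
    _posts.append({"title": title, "body": body})

def all_posts():
    return list(_posts)

# Presentation concerns: only knows how to turn a post into HTML.
def render_post(post):
    return "<h1>%s</h1><p>%s</p>" % (post["title"], post["body"])

# Web concerns: only knows how to answer a browser request.
def front_page():
    return "\n".join(render_post(p) for p in all_posts())

save_post("Hello", "My first post.")
print(front_page())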

Read More

Git, GitHub and the Ethics of Engineering Collaboration

GitHub is in the news right now, and not in a good way: Engineer Julie Ann Horvath has resigned from the company, alleging serious incidents of harassment and inappropriate behavior. 

People seem to be focusing on some of the more prurient allegations - office hula-hooping and so on - but are largely ignoring what to me might be the worst offense: another engineer “passive-aggressively ripping out [her] code from projects [they] had worked on together without so much as a ping or a comment.”

Unfortunately, outside the engineering community it’s not always obvious what GitHub is, and why this kind of behavior, bad anywhere, should be especially unacceptable there. But to understand GitHub, you first need to understand how software engineers work together. 

Read More

CS101 part 8: Programming Languages (I)

In my last CS101 post I described how programming languages are an intermediary between human language and machine code: the logic operations implemented by a computer’s circuits. In this, and my next few posts, we’ll look at programming languages in more detail, and discuss different language designs and capabilities. 

Assembly Language

The earliest and simplest programming languages were assembly languages. An assembly language is just a slightly more human-readable version of a system’s machine code.  It uses mnemonics to represent individual logic instructions, and allows the programmer to use labels to refer to memory locations. A program called an assembler turns assembly language code into proper machine code.
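
To make this concrete, here is a toy sketch in Python of what an assembler does; the instruction set is invented for illustration and bears no relation to any real machine.

# A toy two-pass assembler: mnemonics become numeric opcodes, labels
# become numeric addresses.
OPCODES = {"LOAD": 1, "ADD": 2, "STORE": 3, "JUMP": 4}

def assemble(lines):
    labels, instructions = {}, []
    # First pass: remember which address each label refers to.
    for line in lines:
        if line.endswith(":"):
            labels[line[:-1]] = len(instructions)
        else:
            instructions.append(line.split())
    # Second pass: replace mnemonics and labels with plain numbers.
    machine_code = []
    for mnemonic, operand in instructions:
        value = labels[operand] if operand in labels else int(operand)
        machine_code.append((OPCODES[mnemonic], value))
    return machine_code

print(assemble(["top:", "LOAD 0", "ADD 1", "JUMP top"]))
# [(1, 0), (2, 1), (4, 0)]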

Read More

Hacking Prejudice in Silicon Valley

Y Combinator co-founder Paul Graham got into some hot water recently for controversial comments about women in tech. This follows a previous dodgy statement of his about “founders with foreign accents”, and another saying “I can be tricked by anyone who looks like Mark Zuckerberg”.

I’ve never met Graham, but by all accounts he’s a well-meaning guy who doesn’t maliciously discriminate. And he acknowledges at least some of his prejudice, which is more than most people who worship at the cult of meritocracy are willing to do. Unfortunately, however, the underlying bias emerging from Graham’s statements is common in Silicon Valley. 

This bias is typically not due to an intent to exclude women, immigrants, people of color or older people. Instead, it’s the result of an unhealthy and extreme reverence for one, and only one, archetype: The Hacker.

Boyz n the Hoodie

The Hacker archetype is a 20-something, hoodie-wearing, white male. He grew up with enough economic privilege to be able to teach himself programming as a teen. He’s so consumed by computing that he has no time for such trivia as social skills or a dress sense. His drive and ambition to “innovate”, “disrupt” and “change the world” leave him with little patience for rules or standards of conduct. His mantra is “Move Fast and Break Things” (especially other people’s things). He’s the Silicon Valley realization of two tropes rolled into one: the pioneer and the self-made man (who is almost always a man, and almost never self-made).

The platonic form of The Hacker is, of course, Mark Zuckerberg.

Now, Paul Graham claims to have been misquoted. I take him at his word that his comments about women were only intended in a narrow context. But his correction, with its further harping on and on about “hacker this” and “hacker that”, is actually more revealing than his original statement. All his statements above, including this correction, make it abundantly clear that, to him, the only kind of person who counts is The Hacker. And Graham is far from alone in this thinking. 

When Did a Hack Become a Good Thing?

In journalism, a “hack” is a pejorative term for a writer who churns out large amounts of low-quality content. In programming, a “hack” denotes an inelegant solution, typically to band-aid over a problem until a more comprehensive solution can be written.

Yet somehow, in the past decade, “hacker” became a compliment. Facebook, for example, built its entire corporate culture around “The Hacker Way”, a mish-mash of truisms about fast iteration and prototyping, such as “done is better than perfect”.

Graham takes this even further, staking out a distinction between CS majors and hackers, to the detriment of the former. For example:

The problem with that is I think, at least with technology companies, the people who are really good technology founders have a genuine deep interest in technology. In fact, I’ve heard startups say that they did not like to hire people who had only started programming when they became CS majors in college.

Somehow, being a self-taught, college-dropout programmer with no industry experience has become not a liability but a badge of honor. This is a great shame, because true technological innovation often requires knowledge, experience and maturity.

Many conversations about “tech” are actually about products, or worse, about money. Modest UX tweaks are frequently lauded as “innovation”. But there’s also a lot of truly heavy lifting to be done in the tech industry, and this requires expertise, talent and rigor, qualities that we must look beyond the “hacker pool” to find.

Prejudice Matching

It’s hard for an early-stage investor to predict eventual returns based on little more than a pitch deck. There are few objective measures by which to judge an early-stage startup. So VCs fall back on “intuition”, sometimes more honestly referred to as “pattern matching”. And what better pattern to match than ur-hacker Zuck, the founder of a company that went from $0 to $100B in eight years?

The trouble is, what’s really going on is mostly just confirmation bias and selection bias, and on a hopelessly small sample size at that. “Pattern matching” is really just an anodyne synonym of “prejudice”.

It may not look like prejudice, because the focus is less on what you are (a woman, a person of color, over 40) than on what you’re not (a Hacker). So it may not be grounded in overt sexism or racism, but it’s all the more insidious for that. At least with Pax Dickinson you know what you’re getting into. It’s harder to deal with discrimination that isn’t aware of its own existence.

Hacking The Hacker

This prejudice’s obsession with a single archetype is also its weak spot: Deconstruct the Hacker and you weaken the bias it engenders. By challenging various aspects of this archetype we can reduce its near-mystical significance in Silicon Valley. Take away the pattern, and pattern matching becomes much harder.

So think of this as a call to arms: Let’s hack The Hacker!

It’s not that conforming to the Hacker archetype is bad in itself. It’s that mythologizing just one type of person necessarily excludes others from access to capital, jobs and other resources. Not to mention the fact that it also creates a lot of false positives: bad ideas that get funding because the founder “looked right”. And such poor allocation of capital is bad for everyone: investors, hackers and the rest of us.

So the goal is not to take down any individual, but to rid the Hacker ethos of its glamor. To say that it’s fine to be a Hacker, and equally fine not to be one. Whatever your background, and however you got to where you are, investors like Graham should have open minds about you and your ideas.

CS101 part 7: Programming

As we’ve discussed in previous installments, computer programs are sequences of instructions that tell computers what to do. Any software that runs on a computer - be it Mac OS X, Google Chrome, the Apache web server or Candy Crush Saga - is a program, and someone has to write that program.

The problem is that people speak English(*) while computers understand the 0s and 1s that trigger their circuits. This is where programming languages come in.

Meeting The Computer Half Way

A programming language is an intermediary between English and the low-level instructions computers understand. It’s a compromise between the looseness of natural languages and the structured formality required for machine interpretation.

For example, a human might say:

I want to print all the whole numbers from 0 to 9.

And another human might express the same idea with a different sentence:

I want to print all the non-negative integers less than ten.

This loose informality of natural language makes it unsuitable for communicating with computers. A computer natively understands only the very low-level machine code instructions baked into its circuits. But humans can’t easily compose machine code directly. 

Instead, the human writes instructions as a program, in a programming language.

For our example, we’ll use the programming language Python:

for number in range(0, 10): print(number)

This program is structured enough for a computer to interpret, but also “English-y” enough for a human to write. In this case, even with no programming experience at all, you can probably figure out what it means.

If your computer has Python installed you can see for yourself that computers have no problem understanding this program. On a Mac, go to Finder -> Applications -> Utilities -> Terminal, type 

python -c "for number in range(0, 10): print(number)"

into the terminal window and hit enter.

Compilation and Execution

What’s going on here?

"python" is a command (**) that runs programs written in the Python programming language. When you run it as above it does two things:

  1. Compiles the program.
  2. Executes the compiled program.

Compilation is the act of turning the program from Python into machine code. Execution is the act of applying the machine code to the computer’s circuits. If the program is written correctly then the execution will yield the result the programmer intended.
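
You can see these two steps separately in Python itself, using the built-in compile and exec functions. A minimal sketch:

source = "for number in range(0, 10): print(number)"

# Step 1 (compilation): turn the text of the program into executable code.
compiled = compile(source, "<example>", "exec")

# Step 2 (execution): actually run the compiled code.
exec(compiled)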

The program above compiles into machine code that looks something like this:

0 SETUP_LOOP 28
3 LOAD_GLOBAL 0
6 LOAD_CONST 1
9 LOAD_CONST 2
12 CALL_FUNCTION 2
15 GET_ITER 
16 FOR_ITER 11
19 STORE_FAST 0
22 LOAD_FAST 0
25 PRINT_ITEM 
26 PRINT_NEWLINE 
27 JUMP_ABSOLUTE 16
30 POP_BLOCK 
31 LOAD_CONST 0
34 RETURN_VALUE

Note that even this machine code is written using English-y words. What the computer really sees, of course, is just a lot of zeros and ones.
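
If you have Python installed, you can produce a listing like the one above yourself using the dis module from the standard library (the exact opcode names vary between Python versions):

import dis

source = "for number in range(0, 10): print(number)"

# Compile the program, then disassemble it to see its instruction mnemonics.
dis.dis(compile(source, "<example>", "exec"))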

Programming is the art and science of turning informal ideas into a description just formal enough for a computer to understand. The computer then takes it the rest of the way:

[Diagram: an informal idea, turned by the programmer into a program, turned by the computer into machine code]

In my next post I’ll discuss some of the programming languages in common use today (and also explain that I slightly cheated in describing the output of Python compilation as machine code). 

(*) No anglocentrism intended. But in practice programming languages are almost always English-based.

(**) This so-called command is itself a program: a special kind of program whose job is to run other programs! You may be wondering what language the “python” program itself is written in. It can’t be written exclusively in Python, because then what would run it? This and more will be discussed in my next post.