36 posts tagged feature
The “master/slave” terminology has been in use for many years in the database world, but those terms can have racially-charged and offensive connotations. So I wholeheartedly applaud Django for getting rid of them.
However, you may be wondering what on earth those terms were used to describe in the first place. To explain, we need a little background on database systems.
Persistence and Consistency
"Database" is one of those generic computer science/engineering terms that applies to a bunch of different things. However, in this context we’re talking about software that stores and retrieves organized collections of data.
Two of the most important properties of a database system are persistence and consistency. Persistence means that the database doesn’t lose data. Consistency means that data always obeys any rules required of it. For example, a voting system will likely require that every user can only vote once. It turns out to be very tricky to guarantee both persistence and consistency in a reliable way, a topic I may return to in a future post.
Persistence typically requires that the data be stored on more than one machine. Otherwise the failure of a single machine may cause data to be permanently lost. This is known as replication.
Replication makes achieving consistency harder. For example, if two different machines have different versions of the data, which is the true version? One common way of dealing with this is “primary/secondary” replication.
In primary/secondary replication, one of the machines in the system is chosen as a primary or master, and all the other machines are designated as secondaries or replicas. All write operations (creating, updating and deleting data) go to the primary. After processing the write operation, the primary sends a message to each of the secondaries telling them to apply the same write operation.
In this scheme the secondaries are always a little bit “behind” the primary. The time between processing a write on the primary and processing it on a secondary is called the replication lag of that secondary. Because the secondaries may not be completely up-to-date, the primary is considered to be the “source of truth”. If you need absolute consistency you must read your data from the primary. However if you can tolerate a little staleness, you can read from the secondaries, and take some of the load off the primary.
If the primary breaks down, one of the secondaries can be promoted to become the new primary. However before it can do so it must process all the outstanding replication messages it received from the old primary, to ensure that it’s up to date. The process of choosing a secondary to promote is called master election.
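The flow described above can be sketched in a few lines of Python. This is a toy model, not a real database; the class and method names are my own invention for illustration:

```python
from collections import deque


class Replica:
    """A secondary that applies replication messages with some delay."""

    def __init__(self):
        self.data = {}
        self.pending = deque()  # replication messages received but not yet applied

    def receive(self, op):
        # In a real system there is a delay between receiving a message
        # and applying it -- this queue models the replication lag.
        self.pending.append(op)

    def catch_up(self):
        # Apply all outstanding messages; this must happen before a
        # secondary can be promoted to primary.
        while self.pending:
            key, value = self.pending.popleft()
            self.data[key] = value


class Primary:
    """The single machine that accepts writes and is the source of truth."""

    def __init__(self, replicas):
        self.data = {}
        self.replicas = replicas

    def write(self, key, value):
        self.data[key] = value          # writes always go to the primary
        for replica in self.replicas:
            replica.receive((key, value))  # replicas apply the same op later


replicas = [Replica(), Replica()]
primary = Primary(replicas)
primary.write("votes:alice", 1)

# A replica is stale until it processes its replication messages...
assert "votes:alice" not in replicas[0].data
# ...and must catch up fully before being promoted to primary.
replicas[0].catch_up()
assert replicas[0].data["votes:alice"] == 1
```

The key design point is that every write funnels through a single machine, which keeps the data consistent at the cost of the replicas lagging slightly behind.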
What’s in a Name?
As you can see, the old “master/slave” terminology is not only offensive but also a misleading analogy (can a “slave” be promoted to “master”?). The term “replica” is both inclusive and informative, and using it is a win-win. Kudos to Django for this change; I hope many more projects follow suit.
In my last CS101 post we began to discuss different programming language paradigms, and how they affect the performance, correctness and universality of programs. In this installment we’ll provide more detail on one of the most important distinctions between programming languages, namely how they provide modularity.
Programs can be huge: Even a modest webapp may consist of hundreds of thousands of lines of code. One of the main concerns a programmer has is how to structure a program so that it’s readable and maintainable by herself and others. Having one gigantic lump of code is as untenable as trying to build a house from a single giant pile of bricks, lumber and hardware.
Instead, much of the art of programming consists of breaking the program down into components, or modules, and then further breaking down those modules into submodules, and so on down. Modules provide organization and structure through separation of concerns, i.e., a single module is responsible for a single area of functionality.
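As a toy illustration of separation of concerns (all names here are invented for the example), a small reporting program might be split so that each concern lives in its own module:

```python
# In a real program each of these sections would live in its own file
# (its own module); they are grouped here only to show the idea.

# --- storage concern: knows how to load data, nothing about display ---
def load_scores():
    return {"alice": 97, "bob": 85}


# --- formatting concern: knows how to display data, nothing about storage ---
def format_scores(scores):
    return "\n".join(f"{name}: {score}" for name, score in sorted(scores.items()))


# --- main module: wires the independent pieces together ---
def main():
    print(format_scores(load_scores()))


main()
```

Because each piece has a single responsibility, the storage code can be rewritten (say, to read from a database) without touching the formatting code, and vice versa.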
GitHub is in the news right now, and not in a good way: Engineer Julie Ann Horvath has resigned from the company, alleging serious incidents of harassment and inappropriate behavior.
People seem to be focusing on some of the more prurient allegations - office hula-hooping and so on - but are largely ignoring what to me might be the worst offense: another engineer “passive-aggressively ripping out [her] code from projects [they] had worked on together without so much as a ping or a comment.”
Unfortunately, outside the engineering community it’s not always obvious what GitHub is, and why this kind of behavior, bad anywhere, should be especially unacceptable there. But to understand GitHub, you first need to understand how software engineers work together.
In my last CS101 post I described how programming languages are an intermediary between human language and machine code: the logic operations implemented by a computer’s circuits. In this, and my next few posts, we’ll look at programming languages in more detail, and discuss different language designs and capabilities.
The earliest and simplest programming languages were assembly languages. An assembly language is just a slightly more human-readable version of a system’s machine code. It uses mnemonics to represent individual logic instructions, and allows the programmer to use labels to refer to memory locations. A program called an assembler turns assembly language code into proper machine code.
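To make the idea concrete, here is a sketch of a tiny assembler in Python for a made-up toy instruction set (the mnemonics and opcodes are invented, not any real assembly language). At its core, an assembler is a translation table plus some label bookkeeping:

```python
# Toy instruction set: each mnemonic maps to a numeric opcode.
OPCODES = {"LOAD": 0x01, "ADD": 0x02, "JUMP": 0x03, "HALT": 0xFF}


def assemble(lines):
    """Translate toy assembly into a list of (opcode, operand) machine words."""
    labels = {}
    # First pass: record the address each label refers to.
    addr = 0
    for line in lines:
        if line.endswith(":"):
            labels[line[:-1]] = addr
        else:
            addr += 1
    # Second pass: emit opcodes, resolving labels to memory addresses.
    program = []
    for line in lines:
        if line.endswith(":"):
            continue
        mnemonic, *args = line.split()
        operand = 0
        if args:
            operand = labels[args[0]] if args[0] in labels else int(args[0])
        program.append((OPCODES[mnemonic], operand))
    return program


machine_code = assemble([
    "start:",
    "LOAD 5",
    "ADD 1",
    "JUMP start",  # the label resolves to address 0
    "HALT",
])
```

Real assemblers handle many more details (instruction encodings, directives, linking), but the essence is the same: a mostly one-to-one translation from mnemonics to machine instructions.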
Y Combinator co-founder Paul Graham got into some hot water recently for controversial comments about women in tech. This follows a previous dodgy statement of his about “founders with foreign accents”, and another saying “I can be tricked by anyone who looks like Mark Zuckerberg”.
I’ve never met Graham, but by all accounts he’s a well-meaning guy who doesn’t maliciously discriminate. And he acknowledges at least some of his prejudice, which is more than most people who worship at the cult of meritocracy are willing to do. Unfortunately, however, the underlying bias emerging from Graham’s statements is common in Silicon Valley.
This bias is typically not due to an intent to exclude women, immigrants, people of color or older people. Instead it’s the result of an unhealthy and extreme reverence for one, and only one, archetype: The Hacker.
Boyz n the Hoodie
The Hacker archetype is a 20-something, hoodie-wearing, white male. He grew up with enough economic privilege to be able to teach himself programming as a teen. He’s so consumed by computing that he has no time for such trivia as social skills or a dress sense. His drive and ambition to “innovate”, “disrupt” and “change the world” leave him with little patience for rules or standards of conduct. His mantra is “Move Fast and Break Things” (especially other people’s things). He’s the Silicon Valley realization of two tropes rolled into one: the pioneer and the self-made man (who is almost always a man, and almost never self-made).
The platonic form of The Hacker is, of course, Mark Zuckerberg.
Now, Paul Graham claims to have been misquoted. I take him at his word that his comments about women were only intended in a narrow context. But his correction, with its further harping on and on about “hacker this” and “hacker that”, is actually more revealing than his original statement. All his statements above, including this correction, make it abundantly clear that, to him, the only kind of person who counts is The Hacker. And Graham is far from alone in this thinking.
When Did a Hack Become a Good Thing?
In journalism, a “hack” is a pejorative term for a writer who churns out large amounts of low-quality content. In programming, a “hack” denotes an inelegant solution, typically to band-aid over a problem until a more comprehensive solution can be written.
Yet somehow, in the past decade, “hacker” became a compliment. Facebook, for example, built its entire corporate culture around “The Hacker Way”, a mish-mash of truisms about fast iteration and prototyping, such as “done is better than perfect”.
Graham takes this even further, staking out a distinction between CS majors and hackers, to the detriment of the former. For example:
The problem with that is I think, at least with technology companies, the people who are really good technology founders have a genuine deep interest in technology. In fact, I’ve heard startups say that they did not like to hire people who had only started programming when they became CS majors in college.
Somehow, being a self-taught, college-dropout programmer with no industry experience has become not a liability but a badge of honor. This is a great shame, because true technological innovation often requires knowledge, experience and maturity.
Many conversations about “tech” are actually about products, or worse, about money. Modest UX tweaks are frequently lauded as “innovation”. But there’s also a lot of truly heavy lifting to be done in the tech industry, and this requires expertise, talent and rigor, qualities that we must look beyond the “hacker pool” to find.
It’s hard for an early-stage investor to predict eventual returns based on little more than a pitch deck. There are few objective measures by which to judge an early-stage startup. So VCs fall back on “intuition”, sometimes more honestly referred to as “pattern matching”. And what better pattern to match than ur-hacker Zuck, the founder of a company that went from $0 to $100B in eight years?
The trouble is, what’s really going on is mostly just confirmation bias and selection bias, and on a hopelessly small sample size at that. “Pattern matching” is really just an anodyne synonym of “prejudice”.
It may not look like prejudice, because the focus is less on what you are (a woman, a person of color, over 40) than on what you’re not (a Hacker). So it may not be grounded in overt sexism or racism, but it’s all the more insidious for that. At least with Pax Dickinson you know what you’re getting into. It’s harder to deal with discrimination that isn’t aware of its own existence.
Hacking The Hacker
This prejudice’s obsession with a single archetype is also its weak spot: Deconstruct the Hacker and you weaken the bias it engenders. By challenging various aspects of this archetype we can reduce its near-mystical significance in Silicon Valley. Take away the pattern, and pattern matching becomes much harder.
So think of this as a call to arms: Let’s hack The Hacker!
It’s not that conforming to the Hacker archetype is bad in itself. It’s that mythologizing just one type of person necessarily excludes others from access to capital, jobs and other resources. Not to mention the fact that it also creates a lot of false positives: bad ideas that get funding because the founder “looked right”. And such poor allocation of capital is bad for everyone: investors, hackers and the rest of us.
So the goal is not to take down any individual, but to rid the Hacker ethos of its glamor. To say that it’s fine to be a Hacker, and equally fine not to be one. Whatever your background, and however you got to where you are, investors like Graham should have open minds about you and your ideas.
As we’ve discussed in previous installments, computer programs are sequences of instructions that tell computers what to do. Any software that runs on a computer - be it Mac OS X, Google Chrome, the Apache web server or Candy Crush Saga - is a program, and someone has to write that program.
The problem is that people speak English(*) while computers understand the 0s and 1s that trigger their circuits. This is where programming languages come in.
Meeting The Computer Half Way
A programming language is an intermediary between English and the low-level instructions computers understand. It’s a compromise between the looseness of natural languages and the structured formality required for machine interpretation.
For example, a human might say:
I want to print all the whole numbers from 0 to 9.
And another human might express the same idea with a different sentence:
I want to print all the non-negative integers less than ten.
This loose informality of natural language makes it unsuitable for communicating with computers. A computer natively understands only the very low-level machine code instructions baked into its circuits. But humans can’t easily compose machine code directly.
Instead, the human writes instructions as a program, in a programming language.
For our example, we’ll use the programming language Python:
for number in range(0, 10): print(number)
This program is structured enough for a computer to interpret, but also “English-y” enough for a human to write. In this case, even with no programming experience at all, you can probably figure out what it means.
If your computer has Python installed you can see for yourself that computers have no problem understanding this program. On a Mac, go to Finder -> Applications -> Utilities -> Terminal, type
python -c "for number in range(0, 10): print(number)"
into the terminal window and hit enter.
Compilation and Execution
What’s going on here?
"python" is a command (**) that runs programs written in the Python programming language. When you run it as above it does two things: compilation and execution.
Compilation is the act of turning the program from Python into machine code. Execution is the act of applying that machine code to the computer’s circuits. If the program is written correctly, execution yields the result the programmer intended.
The program above compiles into machine code that looks something like this:
0 SETUP_LOOP 28
3 LOAD_GLOBAL 0
6 LOAD_CONST 1
9 LOAD_CONST 2
12 CALL_FUNCTION 2
15 GET_ITER
16 FOR_ITER 11
19 STORE_FAST 0
22 LOAD_FAST 0
25 PRINT_ITEM
26 PRINT_NEWLINE
27 JUMP_ABSOLUTE 16
30 POP_BLOCK
31 LOAD_CONST 0
34 RETURN_VALUE
Note that even this machine code is rendered using English-y words. What the computer really sees, of course, is just a lot of zeros and ones.
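If you have Python handy, you can peek at this compiled form yourself using the standard-library dis module (the exact opcodes printed vary between Python versions):

```python
import dis

# Compile the example program into a code object, then disassemble it.
code = compile("for number in range(0, 10): print(number)", "<example>", "exec")
dis.dis(code)
```

Running this prints one line per instruction, much like the listing above.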
Programming is the art and science of turning informal ideas into a description just formal enough for a computer to understand. The computer then takes it the rest of the way.
In my next post I’ll discuss some of the programming languages in common use today (and also explain that I slightly cheated in describing the output of Python compilation as machine code).
(*) No anglocentrism intended. But in practice programming languages are always English-based.
(**) This so-called command is itself a program: a special kind of program whose job is to run other programs! You may be wondering what language the “python” program itself is written in. It can’t be written exclusively in Python, because then what would run it? This and more will be discussed in my next post.
"We set sail on this new sea because there is new knowledge to be gained, and new rights to be won, and they must be won and used for the progress of all people."
- President John F. Kennedy, “Moon Speech”
Rice University, September 12th 1962
Much of the coverage of Calico, the Google-backed venture to extend human longevity, has focused, sometimes skeptically, on the end goal. People are asking: can anyone - even Google - really defeat aging?
That question misses the point.
Larry Page has referred to Calico as “moonshot thinking around healthcare and biotechnology”, and that metaphor is no accident. The original moonshot did achieve its end goal. But, equally importantly, it triggered a wave of basic R&D that transformed the technology landscape.
The huge budgets ($25 billion, or well over $100 billion in today’s dollars) spent on the Apollo program in the 1960s had three complementary benefits beyond the direct achievement of landing a human on the moon:
A biomedical “moonshot” program like Calico could drive similar benefits, albeit on a smaller scale, given the more modest budgets. Promoting basic R&D in biochemistry, robotics and other sciences may yield spinoff technologies that will benefit our lives and capture our imagination, regardless of whether it actually achieves a large increase in human longevity.
A “failure” of Calico may still be a huge success, and we should weigh the merits of this new venture just as much by the supposedly ancillary benefits as by the progress towards the end goal.
Fittingly, the venture of extending life has this in common with life itself: the true purpose is not the destination but the journey.
It’s often the case that the true value of a startup lies not with its technology, or even with its user base, but with its data. When millions of people use your service every day, you almost can’t help gathering large amounts of interesting data about what they do. For example: Google sees which results people click on for every search, Foursquare sees where people check in, and Uber sees where and when people request rides.
It doesn’t take much to see the value of this data: Google can rank the results people actually click on higher in future searches, Foursquare can use check-ins to make better, more personalized recommendations, and Uber can use ride data to predict demand and ensure an adequate supply of cars at needed locations.
I wouldn’t normally see a Frat Pack comedy in theaters. That’s what mainscreen entertainment on United Airlines flights is for. But as a Google alum, I was curious to watch The Internship. So I went to see it this weekend, with a group of current and former Googlers. Spoiler alert: It’s terrible.
[Update 7/1/2013: Since BI decided to link to this post, and at least one person who worked on the movie took offense, I’ve revisited this and have some clarifications.
I regret the use of the word ‘terrible’. From a craft perspective the movie is actually very well-made. The BI writer is correct in saying that my objections pertain to sociology, not moviemaking. But when the movie mocks ‘my people’ as laughable stereotypes, I take that personally.
There is PLENTY to poke fun at in Silicon Valley. Our inflated sense of self-importance, for one. In fact, the best moment of the movie is when Max Minghella’s character, assembling his rival team, asks an intern:
- “Where did you go to school?”
- “The University of-”
That had the ring of truth to it. It was a deft poke at Google’s (former?) obsession with academic excellence.
But the movie made too few forays down that path, and instead went mostly for stereotype-pandering, especially of women. If you’re offended by my review then I’m sorry, but I’m also offended by your movie…]
To an ex-Googler the movie may be mildly entertaining. Not because it’s particularly funny, but as an extended game of “spot the cafeteria”. And I don’t mind the obvious nonsense, such as the Hunger Games-like intern job competition, or the apparent lack of any distinction between different roles at a company. I can stomach those as fictions necessary to create a story. No, what makes The Internship excruciating is the lazy pandering to every imaginable Silicon Valley stereotype.
The human digestive system is wondrous. Complex organs and glands process a wide variety of foods, channeling energy and nutrients into the bloodstream while diverting waste out. We each walk around with an amazing little factory inside us.
Creationists use complex biological systems like these to argue for the existence of a divine creator. No evolutionary process, they say, could have created something so marvelous; rather, these systems have to be the product of ‘intelligent design’ (ID). And it does indeed seem miraculous.
However, on closer inspection, the digestive system does exhibit some puzzling design choices. For example, the digestive and respiratory systems share an entrance: every bite of food we eat passes perilously close to the trachea, blocked only by the epiglottis folding over the airway when we swallow. And indeed, thousands of people choke to death every year in the US. Doesn’t sound very intelligent at all, does it?
Biological ‘hacks’ like the epiglottis betray the fact that the digestive system is not, after all, intelligently designed. Rather, it’s the result of blind evolution by natural selection.
From Biology to Software
Why am I going on about the digestive system? Because software systems, like biological ones, involve large, complex designs built up from small simple ‘cells’. And so software design too can either be evolved or ‘intelligent’ (*).