The “master/slave” terminology has been in use for many years in the database world, but those terms can have racially-charged and offensive connotations. So I wholeheartedly applaud Django for getting rid of them.
However you may be wondering what on earth those terms were used to describe in the first place. To explain, we need a little background on database systems.
Persistence and Consistency
"Database" is one of those generic computer science/engineering terms that applies to a bunch of different things. However in this context we’re talking about software that stores and retrieves organized collections of data.
Two of the most important properties of a database system are persistence and consistency. Persistence means that the database doesn’t lose data. Consistency means that data always obeys any rules required of it. For example, a voting system will likely require that every user can only vote once. It turns out to be very tricky to guarantee both persistence and consistency in a reliable way, a topic I may return to in a future post.
Persistence typically requires that the data be stored on more than one machine. Otherwise the failure of a single machine may cause data to be permanently lost. This is known as replication.
Replication makes achieving consistency harder. For example, if two different machines have different versions of the data, which is the true version? One common way of dealing with this is “primary/secondary” replication.
In Primary/Secondary replication, one of the machines in the system is chosen as a primary or master, and all the other machines are designated as secondaries or replicas. All write operations (creating, updating and deleting data) go to the primary. After processing the write operation, the primary sends a message to each of the secondaries telling them to apply the same write operation.
In this scheme the secondaries are always a little bit “behind” the primary. The time between processing a write on the primary and processing it on a secondary is called the replication lag of that secondary. Because the secondaries may not be completely up-to-date, the primary is considered to be the “source of truth”. If you need absolute consistency you must read your data from the primary. However if you can tolerate a little staleness, you can read from the secondaries, and take some of the load off the primary.
If the primary breaks down, one of the secondaries can be promoted to become the new primary. However before it can do so it must process all the outstanding replication messages it received from the old primary, to ensure that it’s up to date. The process of choosing a secondary to promote is called master election.
What’s in a Name?
As you can see, the old “master/slave” terminology is not only offensive but also a misleading analogy (can a “slave” be promoted to “master”?). The term “replica” is both inclusive and informative, and using it is a win-win. Kudos to Django for this change I hope many more projects follow suit.