Easily understand cryptocurrency fault tolerance with Byzantine Generals
Vitalik Buterin's theoretical 99% fault tolerant model is a great time to dig up the Byzantine Generals.
Fault tolerance isn't new or unique to blockchain. It's found in almost everything in various forms and generally refers to how many bad things you can have in a system before it stops working. On blockchains and other decentralised networks, it generally refers to how many hostile or uncooperative nodes you can have before the network is compromised. Different types of blockchain will have different limits.
Bitcoin, for example, is generally assumed to be 51% fault tolerant (although some argue it's more like 25% or 33%), meaning the network will continue to be functional until at least 51% of bitcoin mining power is hostile or broken. This is the theory behind 51% attacks.
Now, Ethereum founder Vitalik Buterin has proposed a potential model for a 99% fault tolerant blockchain.
This presents a perfect opportunity for using the old Byzantine Generals problem to get an easy and non-technical understanding of the technical side of crypto and what 99% fault tolerance really means.
The Byzantine Generals
The mathematical principles behind fault tolerance in computing systems were illustrated in the 1982 Byzantine Generals paper (PDF), which has become a cornerstone work in cryptocurrency. It's also why the Ethereum network's current version is named Byzantium.
The paper uses the analogy of Byzantine generals trying to lay siege to an enemy city. They have the city surrounded and need to collectively decide on the same plan of action (reach consensus) while communicating with all the other generals through messengers. These generals are equivalent to the different nodes in a decentralised network, sending and receiving information to the others but unable to always trust it at face value.
That's because some of the Byzantine generals might be traitors, just like some of the nodes might be hostile or malfunctioning. The paper laid out a way for these generals to communicate with each other (how many other generals they need to talk to, etc) and found that the mighty Byzantine army was 33% fault tolerant. So, as long as at least two-thirds of the generals are loyal, they can all reach the same correct plan of action even with a minority of traitors muddying the waters and sending false information.
By the same token, 33% fault tolerance is kind of a "ground level" of assumed fault tolerance for some blockchains.
Proof of work and proof of stake as Byzantine Generals
Proof of stake (PoS) is an alternative to the proof of work (PoW) systems used by bitcoin and many others.
Proof of work in Byzantine General terms
In proof of work, only one general at a time sends information to all other generals, and its accuracy is determined by subsequent generals verifying the information when it's their turn to tell a story. Each re-telling is known as a "confirmation," and generals keep discovering and building on the truth by adding a little piece to the story one piece at a time. Generals compete for the right to tell the next part of the story through mining.
When it's someone's turn to tell part of the story, they talk about the block they just mined, including the following:
- The block that came before it. This shows that they're agreeing with the previous general's story.
- The transactions that were on the block. Who is sending and receiving cryptocurrencies, and where transactions are going to and from.
- The mining reward earned for that block. This is how much the general earned by giving their information. It's baked into the story, so if subsequent generals give a different version of events, the prior general is also determined to have been lying about earning a mining reward.
Because participating in the storytelling carries a cost (being a bitcoin miner is expensive) and generals are only rewarded if subsequent generals also confirm their version of events, all participants are motivated to continue telling the same story and to be as honest as possible.
The vulnerability comes when the traitorous generals are able to get away with a lie and turn it into the new truth by having enough storytelling power (hashing power) to continually build on their own story and convince the honest generals that their information is the real truth. It's thought that traitorous generals will need to have at least 51% of the network's storytelling power to successfully pull off a lie.
The fundamental rule of traitorous generals is that they can't undo too much of the previous story. They can only undo it a little bit and take it in a new direction.
For example, you generally can't lie about suddenly owning a million bitcoin because those coins need to come from somewhere, and most of the prior story doesn't allow for it. Instead, a traitorous general might sell a bunch of bitcoin, pocket the money and then change the story to one where they never actually sold the coins.
Now they have both the money and the bitcoin, and as far as the victim is concerned, the cryptocurrency they purchased just vanished. Bitcoin Gold was recently hit with this kind of attack to the tune of $18 million.
In Byzantine General terms, a hard fork might be thought of as a group of generals knowingly making up their own story (e.g. "bitcoin has an 8MB block size"), without necessarily trying to fool anyone, and then repeating it among themselves until it becomes a separate truth and forms a separate network.
Proof of stake in Byzantine General terms
Proof of stake is a very different system. Here, all generals are constantly talking and listening to each other, and repeating what they hear. One becomes a general by putting up funds as collateral in a specially designed cryptocurrency wallet.
The actual storyteller is chosen semi-randomly from the speakers, but generally, the more cryptocurrency a general has in their staking wallet, the more storytelling power they have.
This chatty situation is more like the original Byzantine General problem, and it's also a 33% fault tolerant environment. So if more than a third of all generals – or more than a third of all staked wealth in the system – starts telling the same lie, they might eventually convince everyone else.
Proof of stake brings new ways of lying, and other challenges like accounting for the fact that not all generals get messages at the same time and might start repeating outdated information.
It's more efficient than proof of work because the generals are theoretically just focused on actually telling the story (processing transactions), rather than wasting all their energy on simply winning the right to tell part of the story. But it's still highly experimental, and no one's entirely sure if it's secure enough to prevent traitorous generals from taking over the story.
To solve this problem, different projects are looking at new ways of identifying, motivating and punishing generals. For example, a project might introduce a reputation system where the system grades and rewards generals based on their seniority and storytelling history to skew the story towards the version of events given by the most honest-seeming generals.
Ethereum's plan is to use a stick and carrot approach to continually reward honest generals and harshly punish those who try to propagate lies by "slashing" their stake. Slashing refers to confiscating a large portion of a general's stake, which will make attacks prohibitively expensive.
The initial slashing proposals called for a human-driven system where people could earn a finder's fee, or a portion of the slashed amount, by discovering lying generals. But like any human-driven system, it brings a lot of uncertainties.
Buterin's most recent model proposes an automated checking process instead, which can theoretically make a proof-of-stake system 99% fault tolerant.
99% fault tolerant on paper
The suggestion is a system that automatically scans generals to look for ones who are misbehaving as defined by the generals violating specific slashing conditions. This might be thought of as a little robot that roams the generals' camps and keeps an eye out for traitors.
The challenge is that the movements of the robot will also need to be part of the ongoing story the generals are telling. This means it's subject to two main problems:
- Generals might unintentionally give outdated or conflicting descriptions of where the robot is and what it's doing – Buterin calls this the latency problem.
- Generals might intentionally lie about where the robot is and what it's doing – Buterin calls this the threshold problem.
Solving these problems means making certain assumptions when programming and using the slashbot.
One assumption is around network latency and how many generals are sending and receiving up-to-date information. The other assumption is, somewhat paradoxically, around how many generals are traitors. When programming the robot, one needs to program it for a certain fault tolerance (what percentage of generals are lying) threshold. The system is designed to be able to keep running effectively even if one of the programmed assumptions is incorrect. It's only when both are incorrect that the robot will break.
The good news and the bad news
The good news is that this robot can be successful even if one of the assumptions is incorrect. It will only break down if both are incorrect. Significantly, it can also operate in situations where more than half of all generals are traitors, theoretically achieving 99% (or higher) fault tolerance. It does this by sending out information on its findings as it goes, so all the generals are constantly chatting about what the robot has discovered while it works.
The catch is that for it to actually work, the two set assumptions need to mirror each other. So if you want to be on the safe side and assume a high-treachery environment, you must also assume potentially unrealistically low latency. This means it's hard for the robot to operate in a simultaneously high latency and highly-traitorous environment.
Buterin partially gets around this by suggesting a dynamic system that can adjust to exiting network conditions and respond accordingly by undoing some of the incorrect parts of the story if it's discovered that a treacherous general has been getting away with lies.
You can look at any cryptocurrency through the Byzantine General problem to get a better understanding of the differences between protocols and how secure, or insecure, a network is without getting too technical.
This can be applied to almost any cryptocurrency. Just look at how generals are chosen and how they tell their stories.
- Bitcoin: Proof of work as above.
- Ethereum: Currently proof of work as above, but working on implementing proof of stake, also as above.
- EOS: There are 21 generals who were promoted to the position in a sham election. They are able to lie with impunity and there are no mechanisms to stop collusion. If 15 of 21 generals tell the same lie, it becomes the truth. Network security runs on the honour system.
- IOTA: Each separate IoT-connected device is a gossipy general. Each time a general wants to pass along information, they're required to listen to two other randomly-selected generals' information. Eventually, the generals will theoretically gossip themselves to consensus.
- XRP: Anyone can become a general in XRP as long as the other generals are willing to trust them. Most of the generals to date are banks and tech companies. Theoretically, Ripple is just another general, but in practice it might have played an outsized role by cultivating the list of trusted generals that has become the de-facto standard.
Disclosure: At the time of writing, the author holds ETH, IOTA, ICX, VET, XLM, BTC and ADA.