How CSIRO’s Red Belly blockchain manages 30k TPS and what it means
Where does Red Belly stand in the grander scheme of blockchain things?
CSIRO and Sydney University have completed the first global test of their Red Belly blockchain system, clocking it at 30,000 transactions per second (TPS) when deployed across 1,000 virtual machines in 14 Amazon Web Services (AWS) geographic regions.
To put this in context, when the Red Belly system was tested a year ago it racked up on-paper numbers of 660,000 transactions per second. But that was only on 300 machines in a single data centre.
That the numbers are so much lower in more realistic conditions highlights one of the main problems facing blockchain technology today: networks need to reach consensus quickly, but real world conditions are unpredictable.
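To get a feel for the size of that gap, here is a quick back-of-the-envelope calculation using only the figures quoted above. The per-machine average is purely illustrative (consensus throughput isn't really the sum of what each machine does), and the variable and function names are made up for this sketch.

```python
# Figures quoted in the article for the two Red Belly tests.
single_dc = {"tps": 660_000, "machines": 300}
global_test = {"tps": 30_000, "machines": 1_000}


def tps_per_machine(test):
    """Average throughput per machine in a test run (illustrative only)."""
    return test["tps"] / test["machines"]


# How many times lower the global result is than the single data centre one.
slowdown = single_dc["tps"] / global_test["tps"]

print(f"Single data centre: {tps_per_machine(single_dc):,.0f} TPS per machine")
print(f"Global deployment:  {tps_per_machine(global_test):,.0f} TPS per machine")
print(f"Overall throughput dropped by a factor of {slowdown:.0f}")
```

So more than three times as many machines delivered roughly one twenty-second of the throughput once they were spread around the world.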
Context is important. Understanding the Red Belly numbers means comparing like with like: not just TPS, but also how each system manages to achieve its numbers.
The 0 to 2,000 TPS range: fully public blockchains
Bitcoin does about 7 transactions per second, and Ethereum is said to manage about 10 to 15. That's fairly standard for clunky first-generation public blockchains. Zilliqa might currently hold the crown for the highest fully public blockchain throughput, testing at 1,000 to 2,000 TPS through its use of sharding.
These are said to be fully public blockchains, meaning anyone can fire up a node and start validating transactions on these networks. It also means anyone can fire up a node to try to deliberately sabotage transactions or attack the network.
Bitcoin assumes that at least 51% of the network's hashing power is honest, meaning it can theoretically be fine even if 49% of that hashing power is actively trying to destroy it from within. This makes it as sturdy as a rock, but extremely slow.
When someone talks about byzantine fault tolerance, they're talking about how many faulty or actively hostile nodes a network can put up with while still reaching agreement. That comes down to what kind of assumptions the network is making about its nodes.
By changing the network design in a way that lets you relax these assumptions, you can get a huge jump upwards in TPS.
The 2,000 to 10,000 TPS range: semi-centralised blockchains
XRP Ledger clocks in north of 1,500 TPS with a system of innate node quality control. There are no direct rewards for being an XRP Ledger validator, so it's mostly a semi-organised group of institutional users who see XRP Ledger as a valuable tool. This level of organisation, and the public nature of the validators, means XRP can relax certain security assumptions in order to achieve more TPS.
In this case, it assumes that at least 80% of its nodes are doing the right thing. It also assumes that, even if there is a hostile attack, the real users can simply get together and pursue the "real" fork, which kind of removes the incentive to even bother attacking it. So XRP's security assumptions are a bit more brittle and centralised than bitcoin's reassuring proof of work, but they allow for much higher TPS.
Others, such as EOS and VeChain Thor, go even further by combining similarly relaxed assumptions with a capped number of nodes.
EOS manages to test at about 5,000 TPS. It does this with a system that only has 21 nodes. With so few of them, information can fly through the network much faster resulting in much higher TPS, broadly similar to how Red Belly tested at 660,000 TPS on 300 machines but only 30,000 on 1,000 machines.
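One rough way to see why node count matters so much is to count messages. The sketch below assumes the simplest possible pattern, where every node sends one message to every other node in a round. That all-to-all pattern is an assumption for illustration only; it is not a description of how EOS or Red Belly actually communicate, and the function name is hypothetical.

```python
def all_to_all_messages(nodes: int) -> int:
    """Messages sent if every node sends one message to every other node."""
    return nodes * (nodes - 1)


# Node counts taken from the article: EOS's 21 nodes and Red Belly's two tests.
for n in (21, 300, 1000):
    print(f"{n:>5} nodes -> {all_to_all_messages(n):>9,} messages per round")
```

Under that assumption the traffic grows with roughly the square of the node count: 420 messages for 21 nodes, 89,700 for 300 and 999,000 for 1,000.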
EOS also lets transactions pass through with the approval of only 14 nodes. This gives it a relatively low threshold for consensus (further improving TPS), but also means it assumes that no more than 33% of nodes are hostile. The end result is a system that's fast, but highly centralised and dependent on some eyebrow-raising assumptions.
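The arithmetic behind those two figures can be checked in a few lines. This sketch assumes the 14-node figure comes from a two-thirds approval threshold, which is an inference from the numbers above rather than something taken from EOS's code. The function name is hypothetical.

```python
import math
from fractions import Fraction


def approvals_needed(total_nodes: int, fraction: Fraction) -> int:
    """Smallest whole number of nodes that meets the approval fraction."""
    return math.ceil(total_nodes * fraction)


total = 21
needed = approvals_needed(total, Fraction(2, 3))  # assumed two-thirds threshold
spare = total - needed          # nodes that can withhold approval
hostile_share = spare / total   # largest share that can be hostile or offline

print(f"{needed} of {total} approvals needed")
print(f"{spare} nodes ({hostile_share:.0%}) can refuse to cooperate")
```

With 21 nodes that gives 14 approvals, leaving 7 nodes, or about 33%, that can misbehave before the remaining nodes can no longer reach the threshold on their own.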
VeChain Thor manages to test at 10,000 TPS with a broadly similar, largely centralised system. It depends on masternodes staking their real world identity and reputation on their node behaviour, and on the assumption that this will keep them in line. It also boosts TPS with a system that hands more decisions to active and faster nodes, letting them reach the required consensus threshold sooner.
Basically, this category manages to greatly increase TPS, but at the cost of centralisation and potentially riskier assumptions.
Visa's 56,000 TPS
Visa has tested at 56,000 TPS. It's a big number for the same reasons most distributed ledgers and blockchains have smaller numbers. Visa is entirely centralised with two USA-based "nodes", each of which is a data centre capable of running the network by itself. The main reason there are two centres is in case one gets struck by an earthquake, or something like that, rather than because both are contributing to network throughput.
So that 56,000 TPS comes from a single data centre housing a few hundred tightly connected servers. It's extremely centralised and not byzantine fault tolerant, because it doesn't need to be, and consequently it doesn't need to wait for any kind of consensus at all. It can more or less just run at the maximum speed its hardware and infrastructure allow.
Red Belly's 30,000 TPS
With this context, it's much easier to appreciate the significance of Red Belly's 30,000 TPS test. It's a pretty big number by itself, but it's even more impressive for a byzantine fault tolerant system tested across 1,000 virtual machines spread over 14 different geographic regions, including Australia, Canada, the United States, UK, Germany, Brazil, Japan, India, South Korea and Singapore.
This new research (PDF) is what makes it possible. It's a method the researchers call democratic byzantine fault tolerance (DBFT).
How is DBFT so damn fast?
Note that the following explanation is more analogy than technical detail. It's intended to give a broad, rather than a detailed, understanding.
So, a typical blockchain works by beaming transactions around the network and waiting for enough nodes to acknowledge it. The number of nodes you have to wait for depends on the assumptions you make when designing the system. You can make "relaxed" assumptions for more TPS or "stricter" assumptions for more security.
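The trade-off can be sketched with a toy model: if each node's acknowledgement arrives after some delay, the time you wait is set by the slowest acknowledgement you still need. The latencies below are invented for illustration and the function name is hypothetical. Nothing here is measured from a real network.

```python
def time_to_threshold(ack_latencies_ms, threshold):
    """Time until `threshold` acknowledgements have arrived."""
    if not 1 <= threshold <= len(ack_latencies_ms):
        raise ValueError("threshold must be between 1 and the number of nodes")
    # The threshold-th fastest acknowledgement is the one we end up waiting for.
    return sorted(ack_latencies_ms)[threshold - 1]


# Hypothetical acknowledgement delays for ten nodes, in milliseconds.
latencies = [20, 35, 40, 55, 80, 120, 150, 210, 300, 900]

relaxed = time_to_threshold(latencies, 6)  # wait for 6 of 10 nodes
strict = time_to_threshold(latencies, 9)   # wait for 9 of 10 nodes

print(f"Relaxed threshold: {relaxed} ms")  # 120 ms
print(f"Strict threshold:  {strict} ms")   # 300 ms
```

In this toy example, waiting for nine nodes instead of six more than doubles the wait, because the stricter threshold drags in the slower nodes.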
The kinds of assumptions you can make depend on how the network is designed though. So a system that lets you relax assumptions can help a great deal. A central "coordinator", for example, can act as a central point of trust to kind of soak up some of the more annoying assumptions and give you more leeway to design a faster system. It's relatively rare in public networks, but IOTA is temporarily bootstrapping its network with a central coordinator, Hyperledger uses a coordinator, and a few others do too.
Of course, you now have to start making assumptions about the coordinator. Classically, the coordinator is a "strong coordinator", which acts like the leader of the network, signing off and finalising transactions, with a new coordinator being chosen semi-randomly from the available nodes each time. It can be faster because you only have to wait for the coordinator to finish up, rather than waiting for the majority of the nodes.
The problem is that the entire thing can fall apart if the coordinator is unreliable, which it's almost guaranteed to be at some point whether due to Internet outage, earthquake, malicious intent or being a victim of a DDoS attack. Strong coordinators can theoretically get much more speed in a network, but require you to relax your assumptions past the point of reasonableness.
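A deliberately tiny model shows the failure mode. In this sketch the round's outcome is whatever the coordinator proposes, so if the coordinator can't be reached there is no outcome at all. The node names, block names and function are all hypothetical, and real leader-based protocols have recovery steps that this toy leaves out.

```python
from typing import Dict, Optional, Set


def strong_coordinator_round(
    proposals: Dict[str, str], coordinator: str, reachable: Set[str]
) -> Optional[str]:
    """Return the decided value, or None if the round stalls."""
    # Everything hinges on the coordinator: no coordinator, no decision.
    if coordinator not in reachable:
        return None
    return proposals[coordinator]


proposals = {"node_a": "block_1", "node_b": "block_2", "node_c": "block_3"}

healthy = strong_coordinator_round(proposals, "node_b", {"node_a", "node_b", "node_c"})
stalled = strong_coordinator_round(proposals, "node_b", {"node_a", "node_c"})

print(healthy)  # block_2
print(stalled)  # None
```

When node_b is reachable the round finishes as soon as it speaks. When it isn't, the two healthy nodes have nothing to agree on.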
DBFT, however, is a new way of relying on a "weak coordinator": even if the coordinator is slow or faulty, the system can keep working. It's kind of like a system where you can offload some assumptions onto the coordinator (for more TPS), and can then relax those coordinator assumptions for even more TPS, without crossing the line like you would with a traditional strong coordinator.
Rather than giving orders like a strong coordinator, the DBFT weak coordinator just gives suggestions, and the network isn't dependent on its suggestions to finish signing off on transactions.
This opens the floor to further designs which can add even more speed. In the case of Red Belly's DBFT, this includes a way for processes to complete asynchronous rounds as they reach certain thresholds. In other words, subsets of nodes can sign off on transactions and then start working on others, instead of needing to wait for a majority of nodes to chime in. The coordinator helps speed up the process dramatically, but isn't essential for the network to reach consensus.
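In keeping with the analogy, here is a toy round in which the coordinator only offers a hint. Nodes that hear the hint adopt it, the rest keep their own estimate, and a value is decided once enough nodes agree. This is a loose illustration of the "suggestions, not orders" idea and not the DBFT algorithm from the paper. The function, parameters and values are all hypothetical.

```python
from collections import Counter
from typing import Dict, Optional, Set


def weak_coordinator_round(
    estimates: Dict[str, int],
    suggestion: Optional[int],
    heard: Set[str],
    threshold: int,
) -> Optional[int]:
    """Return the decided value, or None if no value reaches the threshold."""
    votes = Counter(
        # A node adopts the hint only if one exists and actually reached it.
        suggestion if (suggestion is not None and node in heard) else own
        for node, own in estimates.items()
    )
    value, count = votes.most_common(1)[0]
    return value if count >= threshold else None


# Coordinator silent, but nodes mostly agree already: the round still decides.
no_coordinator = weak_coordinator_round({"a": 1, "b": 1, "c": 0, "d": 1}, None, set(), 3)

# Coordinator silent and nodes split 2-2: nothing decided this round.
split_no_help = weak_coordinator_round({"a": 1, "b": 1, "c": 0, "d": 0}, None, set(), 3)

# Same split, but one node hears the hint and tips the balance.
split_with_hint = weak_coordinator_round({"a": 1, "b": 1, "c": 0, "d": 0}, 0, {"a"}, 3)

print(no_coordinator, split_no_help, split_with_hint)  # 1 None 0
```

The first case shows the network finishing without the coordinator, and the last shows the coordinator's hint breaking a deadlock faster than the nodes would have managed alone.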
Where does Red Belly fit in?
The end result of DBFT and Red Belly is a system intended for consortium blockchains, which are hybrids of public and private blockchains.
- Public: Tend to have lower TPS and require strict security assumptions because no one can be trusted.
- Consortium: Middle TPS, security assumptions can be looser than public blockchains but still must be tighter than private blockchains. A certain amount of trustworthiness can be expected.
- Private: Tend to have highest TPS, can operate with loosest security assumptions because everyone is presumed to be trustworthy.
The biggest remaining question marks might be whether Red Belly can operate with similar efficiency in the real world, rather than just real world-like tests, and whether there are any obscure vulnerabilities or problems that might have flown under the radar.
Overall, DBFT and Red Belly seem to be extremely impressive.
Disclosure: At the time of writing the author holds ETH, IOTA, ICX, VET, XLM, BTC, ADA