How I broke Solana's State Hash

How I broke Solana's State Hash

In this blog post I will describe an attack on the state hash that Solana was using until around January. The attack was disclosed to Solana, and they have switched to a different State Hash since then.
﻿
A little over a year ago myself and my co-founder at NEAR Protocol Illia started a series of videos called Whiteboard Series, in which we were talking in front of a whiteboard with founders and core engineers of various protocols digging deep into their technology.
﻿
The episode #2 was with Solana. Solana and NEAR take very different approaches to scalability. Solana scales up via low-level optimizations, while NEAR scales out via Sharding, and thus the episode was a great way to learn about an approach very different from ours. We also later organized a debate between me and Anatoly, the CEO of Solana, about sharding vs no-sharding.
State HashesDuring the whiteboard session Anatoly, mentioned that they use XOR of the hashes of all the accounts as their state hash.
﻿
In blockchain protocols state is the set of all the accounts in the protocol at a particular moment in time. In Bitcoin the state would be all the UTXOs, and in Ethereum it would be all the account balances and smart contract states.
The state hash correspondingly is the hash of the state. Having the state hash at regular intervals included on chain has two major benefits:
It enables light clients: nodes that only track block headers, and validate the proof of work, but do not maintain the state locally, and do not apply state transitions. Such a node can be presented with an arbitrary statement about the state (such as the balance of a particular account) with a proof against the state hash. The node can then validate the proof against the state hash on the chain it believes is canonical, and be certain that the statement is correct.
It enables state sync (also known as "warp sync"), in which a full node that does maintain the full state and does apply state transitions instead of getting up to speed by applying every single block from genesis only performs light client validation until some block in the recent past, downloads state as of after that, validates the state validity against the state hash, and then only fully validates blocks since that block onwards. (Worth noting that it is an active debate in the community whether a node launched this way can be called a full node).
Merkle TreesMost commonly the state hash is the root of a merkle tree. Refer to this blog post from 2015 about the motivation and implementation details in Ethereum. The biggest benefit of using the root of a merkle tree as the state hash is that it provides relatively short (< 50 hashes) proofs for light clients.
﻿
The figure from the post linked above
﻿
The biggest problem with using the merkle tree root as the state hash is that modifying any piece of the state involves updating all the nodes in the merkle tree all the way up to the root. For a simple balance transfer that otherwise would only involve two arithmetic operations and a signature verification it adds an overhead of several dozen hash computations, which becomes the computational bottleneck for the majority of the transactions.
﻿
Almost all the non-sharded protocols that claim tx/sec rates exceeding 200-300 do not account for updating merkle roots in their estimates.
State Hash in SolanaSolana aims to have high thousands of tx/sec without sharding, and thus recomputing merkle trees after each block was not an option. Instead they came up with the following neat idea:
﻿
Change the state hash to be the XOR of hashes of all the accounts. Updating such a state hash after modifying some account involves just two hash computations, and several very cheap binary operations on top.
Such a state hash cannot be used with light clients, but since Solana aims to support high thousands of txs, the expectation is that the tx price would be so low, that light clients can just submit read transactions, and once the read transaction is included on chain, use the output of the transaction with a proof against the outputs hash to confirm the validity of some statement about the state.
﻿
This would remove one of the biggest bottlenecks in the modern protocols!
State Hash CollisionsOne thing that is absolutely unacceptable for a state hash is an ability for an adversary to find another state that would hash to the same state hash. In other words, the state hash must be collisions resistant. If an adversary can find another state that hashes to the same state hash, they can trick light clients into believing certain statements about such other state are true, or trick warp-synced nodes to use the other state as their initial state when they begin operating.
﻿
It is well known that the root of a merkle tree is collision resistant. But is the Solana's hash collision resistant too?
﻿
Anatoly had a strong argument as to why it is. Say the entire state consists of six accounts
﻿
a1, a2, a3, a4, a5, a6﻿
The hash of the state is then the xor of the hashes of such accounts:
﻿
state_hash = h(a1) ^ h(a2) ^ h(a3) ^ h(a4) ^ h(a5) ^ h(a6)﻿
Solana maintains the number of accounts on chain, so for an adversary to create another state with the same state_hash , they'd need to find some other six accounts that have the same xor of hashes. For as long as the hash used is pre-image resistant, the only way for the adversary to do so is to brute-force some number of different account values for each account, hash them, and then choose one hash from each of the six buckets such that their xor equals to state_hash . It is a well-known problem called k-SUM, and there's no known polynomial solution to it. So it seems reasonable to assume that the hash that Solana uses collision resistant?
The AttackWhen analyzing the k-SUM problem, it is usually assumed that k is relatively small. In particular, what is important, it is usually assumed that k is smaller than the number of bits in the elements that are being summed.
﻿
In Solana it is not the case. The number of accounts in the system significantly exceeds the number of bits in the hashes. 
﻿
To see why it is a problem, say that the number of accounts just slightly exceeds the number of bits in the hash: e.g. there's 4096 bits in the hash, but the number of accounts is 5000. If for each account we create just one alternative value, and compute it's hash, by selecting each possible subset of the 5000 accounts for which to select the alternative value instead of the actual one we will be obtaining 2^5000 different state hashes, and naturally there's likely to be at least one collision that is equal to the desired state hash among them.
﻿
How hard is it to find one such collision? As a matter of fact, very easy. It is effectively solving a system of linear equations, and can be done in O(n^3) time where n is the number of accounts:
﻿
# given the desired binary string `a` and a set of random binary strings `b` that has more elements that the number of bits in `a` finds a subset of `b` that xors to `a`﻿
def break_xor(a, bs):
    ans = [0 for _ in range(LEN)]
    ans_mask = [0 for _ in range(NUM)]

    bs = [(one_hot(i), x) for (i, x) in enumerate (bs)]
    for i, v in enumerate(a):
        print("Solving for bit %s" % i)
        chosen = None
        for b in bs:
            if b[1][i] == a[i]:
                ans = xor(ans, b[1])
                ans_mask = xor(ans_mask, b[0])
                a = xor(a, b[1])
                break
        else:
            assert False

        for b in bs:
            if b[1][i] == 1:
                chosen = b
                break
        else:
            assert False


        new_bs = []
        for b in bs:
            if b[1][i] == 0:
                new_bs.append(b)
            else:
                new_bs.append((xor(b[0], chosen[0]), xor(b[1], chosen[1])))
        bs = new_bs

    return (ans_mask, ans)﻿
With the snipped above for an adversary to create a state hash collision for a b bit hash, they need to select slightly more than b accounts a_i , come up with random perturbation for each account a'_i , compute the xor of the hashes of the two accounts b_i = a_i ^ a'_i , and pass such b_i s into the method above, along with the desired state_hash . It will return a non-empty list of accounts for which the perturbed state needs to be used.
AftermathWe presented this attack to Solana in January.
﻿
It initiated a discussion on what other alternatives are there, and eventually Solana decided to use the root of a merkle tree as their state hash, but update it less frequently than once every block. Batched updates to the merkle trees are significantly faster than individual, and can be easily parallelized.
﻿
Note that the attack was only affecting warp-synced nodes. The nodes that applied every block from the genesis would not be affected. However even in Ethereum today the majority of nodes are warp-synced. With an even higher number of tx/sec that Solana targets it will be even more prevalent.
OutroIf you are an entrepreneur in the web3 space, consider joining our accelerator with mentors from tier1 companies and investment firms in the space: http://openwebcollective.com/. We help with all the aspects of building the business, including identifying the market, BD, marketing and fundraising.
﻿
Solana and NEAR collaborate frequently, both on tech discussions, events, and other efforts.
﻿
We invite all other participants in the community to join the collaborative effort. Both NEAR and Solana are open source, and have most of the discussions and decision making public.
﻿
Join the development:
﻿https://github.com/nearprotocol﻿
﻿https://github.com/solana-labs﻿
﻿
Learn more and follow:
﻿near.org / @NEARProtocol﻿
﻿solana.com / @solana﻿

To reply you need to sign in.