State Compression
On Solana, State Compression is the method of creating a "fingerprint" (or hash) of offchain data and storing this fingerprint on-chain for secure verification. Effectively using the security of the Solana ledger to securely validate offchain data, verifying it has not been tampered with.
This method of "compression" allows Solana programs and dApps to use cheap blockchain ledger space, instead of the more expensive account space, to securely store data.
This is accomplished by using a special binary tree structure, known as a
concurrent merkle tree, to create a hash of
each piece of data (called a leaf
), hashing those together, and only storing
this final hash on-chain.
What is State Compression?
In simple terms, state compression uses "tree" structures to cryptographically hash offchain data together, in a deterministic way, to compute a single final hash that gets stored on-chain.
These trees are created in this "deterministic" process by:
- taking any piece of data
- creating a hash of this data
- storing this hash as a
leaf
at the bottom of the tree - each
leaf
pair is then hashed together, creating abranch
- each
branch
is then hashed together - continually climbing the tree and hashing adjacent branches together
- once at the top of the tree, a final
root hash
is produced
This root hash
is then stored onchain, as a verifiable proof of all of
the data within every leaf. Allowing anyone to cryptographically verify all the
offchain data within the tree, while only actually storing a minimal amount
of data on-chain. Therefore, significantly reducing the cost to store/prove
large amounts of data due to this "state compression".
Merkle trees and concurrent merkle trees
Solana's state compression used a special type of merkle tree that allows for multiple changes to any given tree to happen, while still maintaining the integrity and validity of the tree.
This special tree, known as a "concurrent merkle tree", effectively retains a "changelog" of the tree on-chain. Allowing for multiple rapid changes to the same tree (i.e. all in the same block), before a proof is invalidated.
What is a merkle tree?
A merkle tree, sometimes called a
"hash tree", is a hash based binary tree structure where each leaf
node is
represented as a cryptographic hash of its inner data. And every node that is
not a leaf, called a branch
, is represented as a hash of its child leaf
hashes.
Each branch is then also hashed together, climbing the tree, until eventually
only a single hash remains. This final hash, called the root hash
or "root",
can then be used in combination with a "proof path" to verify any piece of data
stored within a leaf node.
Once a final root hash
has been computed, any piece of data stored within a
leaf
node can be verified by rehashing the specific leaf's data and the hash
label of each adjacent branch climbing the tree (known as the proof
or "proof
path"). Comparing this "rehash" to the root hash
is the verification of the
underlying leaf data. If they match, the data is verified accurate. If they do
not match, the leaf data was changed.
Whenever desired, the original leaf data can be changed by simply hashing the new leaf data and recomputing the root hash in the same manner of the original root. This new root hash is then used to verify any of the data, and effectively invalidates the previous root hash and previous proof. Therefore, each change to these traditional merkle trees are required to be performed in series.
This process of changing leaf data, and computing a new root hash can be a very common thing when using merkle trees! While it is one of the design points of the tree, it can result in one of the most notable drawbacks: rapid changes.
What is a Concurrent merkle tree?
In high throughput applications, like within the Solana runtime, requests to change an on-chain traditional merkle tree could be received by validators in relatively rapid succession (e.g. within the same slot). Each leaf data change would still be required to be performed in series. Resulting in each subsequent request for change to fail, due to the root hash and proof being invalidated by the previous change request in the slot.
Enter, Concurrent merkle trees.
A Concurrent merkle tree stores a secure changelog of the most recent
changes, their root hash, and the proof to derive it. This changelog "buffer" is
stored on-chain in an account specific to each tree, with a maximum number of
changelog "records" (aka maxBufferSize
).
When multiple leaf data change requests are received by validators in the same
slot, the on-chain concurrent merkle tree can use this "changelog buffer" as a
source of truth for more acceptable proofs. Effectively allowing for up to
maxBufferSize
changes to the same tree in the same slot. Significantly
boosting throughput.
Sizing a concurrent merkle tree
When creating one of these on-chain trees, there are 3 values that will determine the size of your tree, the cost to create your tree, and the number of concurrent changes to your tree:
- max depth
- max buffer size
- canopy depth
Max depth
The "max depth" of a tree is the maximum number of hops to get from any data
leaf
to the root
of the tree.
Since merkle trees are binary trees, every leaf is connected to only one
other leaf; existing as a leaf pair
.
Therefore, the maxDepth
of a tree is used to determine the maximum number of
nodes (aka pieces of data or leafs
) to store within the tree using a simple
calculation:
Since a trees depth must be set at tree creation, you must decide how many
pieces of data you want your tree to store. Then using the simple calculation
above, you can determine the lowest maxDepth
to store your data.
Example 1: minting 100 nfts
If you wanted to create a tree to store 100 compressed nfts, we will need a minimum of "100 leafs" or "100 nodes".
We must use a maxDepth
of 7
to ensure we can store all of our data.
Example 2: minting 15000 nfts
If you wanted to create a tree to store 15000 compressed nfts, we will need a minimum of "15000 leafs" or "15000 nodes".
We must use a maxDepth
of 14
to ensure we can store all of our data.
The higher the max depth, the higher the cost
The maxDepth
value will be one of the primary drivers of cost when creating a
tree since you will pay this cost upfront at tree creation. The higher the max
tree depth, the more data fingerprints (aka hashes) you can store, the higher
the cost.
Max buffer size
The "max buffer size" is effectively the maximum number of changes that can
occur on a tree, with the root hash
still being valid.
Due to the root hash effectively being a single hash of all leaf data, changing any single leaf would invalidate the proof needed for all subsequent attempts to change any leaf of a regular tree.
But with a concurrent tree, there is
effectively a changelog of updates for these proofs. This changelog buffer is
sized and set at tree creation via this maxBufferSize
value.
Canopy depth
The "canopy depth," also known as the canopy size, refers to the number of proof node levels that are cached or stored onchain for a given proof path.
When performing an update action on a leaf
, like transferring ownership (e.g.
selling a compressed NFT), the complete proof path must be used to verify
original ownership of the leaf and therefore allow for the update action. This
verification is performed using the complete proof path to correctly compute
the current root hash
(or any cached root hash
via the onchain "concurrent
buffer").
The larger a tree's max depth is, the more proof nodes are required to perform
this verification. For example, if your max depth is 14
, there are 14
total
proof nodes required to be used to verify. As a tree gets larger, the complete
proof path gets larger.
Normally, each of these proof nodes would be required to be included within each
tree update transaction. Since each proof node value takes up 32 bytes
in a
transaction (similar to providing a Public Key), larger trees would very quickly
exceed the maximum transaction size limit.
Enter the canopy. The canopy enables storing a set number of proof nodes on chain (for any given proof path). Allowing for less proof nodes to be included within each update transactions, therefore keeping the overall transaction size below the limit.
For example, a tree with a max depth of 14
would require 14
total proof
nodes. With a canopy of 10
, only 4
proof nodes are required to be submitted
per update transaction.
Canopy depth of 1 for a Concurrent Merkle Tree of max depth of 3
Consider another example, this time with a tree of max depth 3
. If we want to
apply an action to one of the tree’s leaves—such as updating R4
—we need to
provide proofs for L4
and R2
. However, we can omit R1
since it is already
cached/stored onchain due to our canopy depth of 1
, which ensures that all
nodes at level 1 (L1
and R1
) are stored onchain. This results in a total of
2 required proofs.
Therefore, the number of proofs required to update a leaf is equal to the max
depth minus the canopy depth. In this example, 3 - 1 = 2
.
The larger the canopy depth value, the higher the cost
The canopyDepth
value is also a primary factor of cost when creating a tree
since you will pay this cost upfront at tree creation. The higher the canopy
depth, the more data proof nodes are stored onchain, the higher the cost.
Smaller canopy limits composability
While a tree's creation costs are higher with a higher canopy, having a lower
canopyDepth
will require more proof nodes to be included within each update
transaction. The more nodes required to be submitted, the larger the transaction
size, and therefore the easier it is to exceed the transaction size limits.
This will also be the case for any other Solana program or dApp that attempts to interact with your tree/leafs. If your tree requires too many proof nodes (because of a low canopy depth), then any other additional actions another on-chain program could offer will be limited by their specific instruction size plus your proof node list size. Limiting composability, and potential additional utility for your specific tree.
For example, if your tree is being used for compressed NFTs and has a very low canopy depth, an NFT marketplace may only be able to support simple NFTs transfers. And not be able to support an on-chain bidding system.
Cost of creating a tree
The cost of creating a concurrent merkle tree is based on the tree's size
parameters: maxDepth
, maxBufferSize
, and canopyDepth
. These values are all
used to calculate the on-chain storage (in bytes) required for a tree to exist
onchain.
Once the required space (in bytes) has been calculated, and using the
getMinimumBalanceForRentExemption
RPC method, request the cost (in lamports) to allocate this amount of bytes
on-chain.
Calculate tree cost in JavaScript
Within the
@solana/spl-account-compression
package, developers can use the
getConcurrentMerkleTreeAccountSize
function to calculate the required space for a given tree size parameters.
Then using the
getMinimumBalanceForRentExemption
function to get the final cost (in lamports) to allocate the required space for
the tree on-chain.
Then determine the cost in lamports to make an account of this size rent exempt, similar to any other account creation.
Example costs
Listed below are several example costs, for different tree sizes, including how many leaf nodes are possible for each:
Example #1: 16,384 nodes costing 0.222 SOL
- max depth of
14
and max buffer size of64
- maximum number of leaf nodes:
16,384
- canopy depth of
0
costs approximately0.222 SOL
to create
Example #2: 16,384 nodes costing 1.134 SOL
- max depth of
14
and max buffer size of64
- maximum number of leaf nodes:
16,384
- canopy depth of
11
costs approximately1.134 SOL
to create
Example #3: 1,048,576 nodes costing 1.673 SOL
- max depth of
20
and max buffer size of256
- maximum number of leaf nodes:
1,048,576
- canopy depth of
10
costs approximately1.673 SOL
to create
Example #4: 1,048,576 nodes costing 15.814 SOL
- max depth of
20
and max buffer size of256
- maximum number of leaf nodes:
1,048,576
- canopy depth of
15
costs approximately15.814 SOL
to create
Compressed NFTs
Compressed NFTs are one of the most popular use cases for State Compression on
Solana. With compression, a one million NFT collection could be minted for
~50 SOL
, vice ~12,000 SOL
for its uncompressed equivalent collection.
If you are interested in creating compressed NFTs yourself, read our developer guide for minting and transferring compressed NFTs.