Data Structures in Popular Blockchains
tags: blockchain rust bitcoin ethereum polkadot substrate near nervos@hacking
Description/Summary
Data structures in popular rust blockchains.Content
Not as simple as I thought, data structure are designed quite differently in different projects.
Let’s take a look at some examples.
Bitcoin
A block header contains these fields: 4 Bytes Version Block version number 32 Bytes hashPrevBlock 256-bit hash of the previous block header 32 Bytes hashMerkleRoot 256-bit hash based on all of the transactions in the block 4 Bytes Time Current block timestamp as seconds since 1970-01-01T00:00 UTC 4 Bytes Bits Current target in compact format 4 Bytes Nonce 32-bit number (starts at 0)
I started with the well-known example of Bitcoin. Its protocol is simple and clearly explained in the whitepaper.
src: rust-bitcoin/src/blockdata/block.rs
I like developers put an introduction in front of a page!
//! Bitcoin Block
//!
//! A block is a bundle of transactions with a proof-of-work attached,
//! which commits to an earlier block to form the blockchain. This
//! module describes structures and functions needed to describe
//! these blocks and the blockchain.
//!
A Bitcoin block data structure is simple, and it looks like this:
/// A Bitcoin block, which is a collection of transactions with an attached
/// proof of work.
#[derive(PartialEq, Eq, Clone, Debug)]
pub struct Block {
/// The block header
pub header: BlockHeader,
/// List of transactions contained in the block
pub txdata: Vec<Transaction>
}
And the blockheader:
/// A block header, which contains all the block's information except
/// the actual transactions
#[derive(Copy, PartialEq, Eq, Clone, Debug)]
pub struct BlockHeader {
/// The protocol version. Should always be 1.
pub version: i32,
/// Reference to the previous block in the chain
pub prev_blockhash: BlockHash,
/// The root hash of the merkle tree of transactions in the block
pub merkle_root: TxMerkleNode,
/// The timestamp of the block, as claimed by the miner
pub time: u32,
/// The target value below which the blockhash must lie, encoded as a
/// a float (with well-defined rounding, of course)
pub bits: u32,
/// The nonce, selected to obtain a low enough blockhash
pub nonce: u32,
}
In order to save disk space, Bitcoin uses Merkle tree to compress transaction history. Merkle tree is a binary hash tree. The `merkle_root` in the `BlockHeader` struct refers to the root of a Merkle tree, which indicates the proof of transaction history.
Bitcoin Whitepaper is simple paper that explains its architecture.
Then let’s take a look at its value field, `txdata: Vec`.
//! Bitcoin Transaction
//!
//! A transaction describes a transfer of money. It consumes previously-unspent
//! transaction outputs and produces new ones, satisfying the condition to spend
//! the old outputs (typically a digital signature with a specific key must be
//! provided) and defining the condition to spend the new ones. The use of digital
//! signatures ensures that coins cannot be spent by unauthorized parties.
//!
//! This module provides the structures and functions needed to support transactions.
//!
Struct bitcoin::blockdata::transaction::Transaction
In Bitcoin, transactions issues started with searching for recievers’ addresses. Then find the input UTXOs with approved signature. So the input and output data structures are a bit different.
/// A transaction output, which defines new coins to be created from old ones.
#[derive(Clone, PartialEq, Eq, Debug, Hash)]
pub struct TxOut {
/// The value of the output, in satoshis
pub value: u64,
/// The script which must satisfy for the output to be spent
pub script_pubkey: Script
}
/// A transaction input, which defines old coins to be consumed
#[derive(Clone, PartialEq, Eq, Debug, Hash)]
pub struct TxIn {
/// The reference to the previous output that is being used an an input
pub previous_output: OutPoint,
/// The script which pushes values on the stack which will cause
/// the referenced output's script to accept
pub script_sig: Script,
/// The sequence number, which suggests to miners which of two
/// conflicting transactions should be preferred, or 0xFFFFFFFF
/// to ignore this feature. This is generally never used since
/// the miner behaviour cannot be enforced.
pub sequence: u32,
/// Witness data: an array of byte-arrays.
/// Note that this field is *not* (de)serialized with the rest of the TxIn in
/// Encodable/Decodable, as it is (de)serialized at the end of the full
/// Transaction. It *is* (de)serialized with the rest of the TxIn in other
/// (de)serialization routines.
pub witness: Vec<Vec<u8>>
}
Ethereum
OpenEthereum is Ethereum 1.0 Rust client.
Different from Bitcoin’s data structure, Ethereum introduces one more field.
src: openethereum/ethcore/types/src/block.rs
/// A block, encoded as it is on the block chain.
#[derive(Default, Debug, Clone, PartialEq)]
pub struct Block {
/// The header of this block.
pub header: Header,
/// The transactions in this block.
pub transactions: Vec<UnverifiedTransaction>,
/// The uncles of this block.
pub uncles: Vec<Header>,
}
Its header data structure is much more complicated compared to Bitcoin.
src: openethereum/ethcore/types/src/header.rs
/// A block header.
///
/// Reflects the specific RLP fields of a block in the chain with additional room for the seal
/// which is non-specific.
///
/// Doesn't do all that much on its own.
#[derive(Debug, Clone, Eq, MallocSizeOf)]
pub struct Header {
/// Parent hash.
parent_hash: H256,
/// Block timestamp.
timestamp: u64,
/// Block number.
number: BlockNumber,
/// Block author.
author: Address,
/// Transactions root.
transactions_root: H256,
/// Block uncles hash.
uncles_hash: H256,
/// Block extra data.
extra_data: Bytes,
/// State root.
state_root: H256,
/// Block receipts root.
receipts_root: H256,
/// Block bloom.
log_bloom: Bloom,
/// Gas used for contracts execution.
gas_used: U256,
/// Block gas limit.
gas_limit: U256,
/// Block difficulty.
difficulty: U256,
/// Vector of post-RLP-encoded fields.
seal: Vec<Bytes>,
/// Memoized hash of that header and the seal.
hash: Option<H256>,
}
I can’t fully understand the header data structure design just from the code. I guess I need to reread Ethereum Yellow Paper.
src: openethereum/ethcore/types/src/transaction/transaction.rs
/// A set of information describing an externally-originating message call
/// or contract creation operation.
#[derive(Default, Debug, Clone, PartialEq, Eq, MallocSizeOf)]
pub struct Transaction {
/// Nonce.
pub nonce: U256,
/// Gas price.
pub gas_price: U256,
/// Gas paid up front for transaction execution.
pub gas: U256,
/// Action, can be either call or contract create.
pub action: Action,
/// Transfered value.
pub value: U256,
/// Transaction data.
pub data: Bytes,
}
Yellow paper:
The Transaction. A transaction (formally, T) is a single cryptographically-signed instruction constructed by an actor externally to the scope of Ethereum. on1
There are two types of transactions: those which result in message calls and those which result in the creation of new accounts with associated code (known informally as ‘contract creation’).
/// Transaction action type.
#[derive(Debug, Clone, PartialEq, Eq, MallocSizeOf)]
pub enum Action {
/// Create creates new contract.
Create,
/// Calls contract at given address.
/// In the case of a transfer, this is the receiver's address.'
Call(Address),
}
The “contract” in the code comments are referring to a “smart contract” on the Ethereum blockchain platform, which can be considered as backend service in traditional internet programming, and a dapp, which with the full name of “decentralized application”, can be regarded as the frontend.
Polkadot
Polkadot is built on Substrate, a blockchain framework. The Substrate doc has explainations of its Block Structure
A block in Substrate is composed of a header and an array of extrinsics. The header contains a block height, parent hash, extrinsics root, state root, and digest.
src: core-primitives/src/lib.rs
src: parachain/test-parachains/adder/src/lib.rs
#[derive(Default, Clone, Hash, Eq, PartialEq, Encode, Decode)]
pub struct HeadData {
/// Block number
pub number: u64,
/// parent block keccak256
pub parent_hash: [u8; 32],
/// hash of post-execution state.
pub post_state: [u8; 32],
}
/// Block data for this parachain.
#[derive(Default, Clone, Encode, Decode)]
pub struct BlockData {
/// State to begin from.
pub state: u64,
/// Amount to add (overflowing)
pub add: u64,
}
/// Execute a block body on top of given parent head, producing new parent head
/// if valid.
pub fn execute(
parent_hash: [u8; 32],
parent_head: HeadData,
block_data: &BlockData,
) -> Result<HeadData, StateMismatch> {
debug_assert_eq!(parent_hash, parent_head.hash());
if hash_state(block_data.state) != parent_head.post_state {
return Err(StateMismatch);
}
let new_state = block_data.state.overflowing_add(block_data.add).0;
Ok(HeadData {
number: parent_head.number + 1,
parent_hash,
post_state: hash_state(new_state),
})
}
src: substrate/primitives/blockchain/src/header_metadata.rs
/// Handles header metadata: hash, number, parent hash, etc.
pub trait HeaderMetadata<Block: BlockT> {
/// Error used in case the header metadata is not found.
type Error;
fn header_metadata(
&self,
hash: Block::Hash,
) -> Result<CachedHeaderMetadata<Block>, Self::Error>;
fn insert_header_metadata(
&self,
hash: Block::Hash,
header_metadata: CachedHeaderMetadata<Block>,
);
fn remove_header_metadata(&self, hash: Block::Hash);
}
src: substrate/primitives/blockchain/src/backend.rs
NEAR
NEAR has two types of block structure that looks more or less the same. I choose to use the “V2” version as the example code.
src: nearcore/core/primitives/src/block.rs
#[derive(BorshSerialize, BorshDeserialize, Serialize, Debug, Clone, Eq, PartialEq)]
pub struct BlockV2 {
pub header: BlockHeader,
pub chunks: Vec<ShardChunkHeader>,
pub challenges: Challenges,
// Data to confirm the correctness of randomness beacon output
pub vrf_value: near_crypto::vrf::Value,
pub vrf_proof: near_crypto::vrf::Proof,
}
src: nearcore/core/primitives/src/block_header.rs
/// Versioned BlockHeader data structure.
/// For each next version, document what are the changes between versions.
#[derive(BorshSerialize, BorshDeserialize, Serialize, Debug, Clone, Eq, PartialEq)]
pub enum BlockHeader {
BlockHeaderV1(Box<BlockHeaderV1>),
BlockHeaderV2(Box<BlockHeaderV2>),
}
Still, we choose to use V2 code.
/// V1 -> V2: Remove `chunks_included` from `inner_reset`
#[derive(BorshSerialize, BorshDeserialize, Serialize, Debug, Clone, Eq, PartialEq)]
#[borsh_init(init)]
pub struct BlockHeaderV2 {
pub prev_hash: CryptoHash,
/// Inner part of the block header that gets hashed, split into two parts, one that is sent
/// to light clients, and the rest
pub inner_lite: BlockHeaderInnerLite,
pub inner_rest: BlockHeaderInnerRestV2,
/// Signature of the block producer.
pub signature: Signature,
/// Cached value of hash for this block.
#[borsh_skip]
pub hash: CryptoHash,
}
src: nearcore/core/primitives/src/sharding.rs
#[derive(BorshSerialize, BorshDeserialize, Serialize, Clone, PartialEq, Eq, Debug)]
#[borsh_init(init)]
pub struct ShardChunkHeaderV2 {
pub inner: ShardChunkHeaderInner,
pub height_included: BlockHeight,
/// Signature of the chunk producer.
pub signature: Signature,
#[borsh_skip]
pub hash: ChunkHash,
}
#[derive(BorshSerialize, BorshDeserialize, Serialize, Clone, PartialEq, Eq, Debug)]
pub struct ShardChunkHeaderInner {
/// Previous block hash.
pub prev_block_hash: CryptoHash,
pub prev_state_root: StateRoot,
/// Root of the outcomes from execution transactions and results.
pub outcome_root: CryptoHash,
pub encoded_merkle_root: CryptoHash,
pub encoded_length: u64,
pub height_created: BlockHeight,
/// Shard index.
pub shard_id: ShardId,
/// Gas used in this chunk.
pub gas_used: Gas,
/// Gas limit voted by validators.
pub gas_limit: Gas,
/// Total balance burnt in previous chunk
pub balance_burnt: Balance,
/// Outgoing receipts merkle root.
pub outgoing_receipts_root: CryptoHash,
/// Tx merkle root.
pub tx_root: CryptoHash,
/// Validator proposals.
pub validator_proposals: Vec<ValidatorStake>,
}
src: nearcore/core/primitives/src/challenge.rs
#[derive(BorshSerialize, BorshDeserialize, Serialize, PartialEq, Eq, Clone, Debug)]
#[borsh_init(init)]
pub struct Challenge {
pub body: ChallengeBody,
pub account_id: AccountId,
pub signature: Signature,
#[borsh_skip]
pub hash: CryptoHash,
}
pub type Challenges = Vec<Challenge>;
Seems NEAR heavily typed their data. I couldn’t find NEAR’s architecture design from its paper or documentation.
Nervos CKB
src: ckb/util/types/src/core/cell.rs
#[derive(Clone, Eq, PartialEq, Default)]
pub struct CellMeta {
pub cell_output: CellOutput,
pub out_point: OutPoint,
pub transaction_info: Option<TransactionInfo>,
pub data_bytes: u64,
/// In memory cell data and its hash
/// A live cell either exists in memory or DB
/// must check DB if this field is None
pub mem_cell_data: Option<(Bytes, Byte32)>,
}
src: ckb/util/types/src/generated/blockchain.rs
#[derive(Debug, Default)]
pub struct BlockBuilder {
pub(crate) header: Header,
pub(crate) uncles: UncleBlockVec,
pub(crate) transactions: TransactionVec,
pub(crate) proposals: ProposalShortIdVec,
}
#[derive(Debug, Default)]
pub struct UncleBlockBuilder {
pub(crate) header: Header,
pub(crate) proposals: ProposalShortIdVec,
}
There is almost no comment on these codes, and I don’t understand that `ProposalShortIdVec` means. How they define `UncleBlockVec`, which raises new questions about “an uncle block”. I hope there is a meaningful explanation here to eliminate the confusion of learning its code.
I finally found an explaination of data structure from its GitHub doc. It would be much convenient if I could read this information in code comments.
In Bitcoin, the uncle block is stored in the header data; In CKB, it is stored in the block directly as a field. I don’t have any further thoughts here at the moment.
After reading this page, I still don’t know what `ProposalShortIdVec` is. I guess that might because I lack some pre-knowledge to start with.
src: ckb/util/types/src/core/cell.rs
#[derive(Default)]
pub struct CellMetaBuilder {
cell_output: CellOutput,
out_point: OutPoint,
transaction_info: Option<TransactionInfo>,
data_bytes: u64,
mem_cell_data: Option<(Bytes, Byte32)>,
}
src: ckb/util/types/src/core/transaction_meta.rs
#[derive(Default, Debug, PartialEq, Eq, Clone)]
pub struct TransactionMeta {
pub(crate) block_number: u64,
pub(crate) epoch_number: u64,
pub(crate) block_hash: Byte32,
pub(crate) cellbase: bool,
/// each bits indicate if transaction has dead cells
pub(crate) dead_cell: BitVec,
}
src: ckb/util/types/src/core/blockchain.rs
More to do
- Solana