With the specification of Phases 0 & 1 of Eth2.0 reaching a comfortable level of detail, the research focus is shifting towards Phase 2: State Execution. One of the most important aspects of this phase is the handling of cross-shard communication, which determines the scalability of the sharded blockchain system, the capabilities of execution environments, and the overall user experience. This post aims to give readers an overview of the design space for cross-shard communication, and looks at the leading proposals among the available choices.

The design of cross-shard communication can be separated into two layers:

Consensus layer, which handles the delivery of cross-shard messages. Design choices here affect the scalability of the sharded blockchain system.
Execution layer, which concerns the interface for cross-shard transfers and contract calls. Design choices here affect the capabilities of execution environments.

Consensus Layer

The consensus layer of cross-shard communication is responsible for delivering cross-shard messages across partitions of our sharded blockchain system. The main challenge is keeping the design scalable while providing strong guarantees about the liveness of cross-shard messages. This layer can be divided into two parts:

Send/Receive finality
Data delivery

Send/Receive Finality

The source & destination shards have to finalize the send & receive of the cross-shard message respectively. The design choices for this are:

Asynchronous: The source shard decides on sending the message first, and the destination shard can decide on receiving this message at any time in the future.
Synchronous: The destination shard finalizes the receive of the message within a bounded period after the source shard finalizes the send. There are various ways to achieve this:
- Shards run some consensus protocol between themselves and decide on sending & receiving at the same time, e.g.: Sharded Byzantine Atomic Commit
- The source shard individually decides on sending first, and the destination shard's fork choice must receive within some period of time, e.g.: CBC Casper cross-shard messaging. This approach requires a shard hierarchy to exist between the source and destination, otherwise deadlocks can arise due to conflicting sends and receives.
- Put cross-shard messages on the beacon chain and force the destination shards to receive them before their next crosslink. (Note: this imposes a scalability limit; see the protocol-delivered mechanism in the next section.)

The synchronous approach is incompatible with the design of Eth2.0, since it requires the shards to coordinate the finalization of sends and receives in some manner.

Data Delivery

The previous mechanism concerns the finality of the send & receive, which is not the same as actually completing the sending or reception of the message. This is the task of the data delivery mechanism. (The difference between these will be highlighted in the "user-delivered" approach in this section)

The design of Eth2.0 enforces that any consensus activity happens only in the beacon chain. This means that all cross-shard messages must "flow" through the beacon chain. This presents us with two choices regarding the delivery of the cross-shard message data:

Protocol-delivered: The protocol delivers the complete data of the cross-shard message by making it available on the beacon chain. This increases the overhead on the beacon chain and seriously affects the scalability of the system.
User-delivered: The protocol comes to consensus only on the minimal information about cross-shard messages - merkle roots of cross-shard messages from each shard block. The user is then responsible for delivering the merkle branch associated with the cross-shard message to the destination shard. This approach is more Eth2.0-esque, since it follows the general principle of forming consensus only over merkle roots on the beacon chain.

Proposed Design for the Consensus Layer

In order to prioritize the scalability of the system in the tradeoff spectrum, the solution with asynchronous send/receive finality and user-delivered data is the leading proposal. The workflow for, say, sending ether from user 1 on EE1 in shard A to user 2 on EE2 in shard B is as follows:

Cross-Shard Transaction Workflow

user 1 creates transaction TX1 on shard A that debits its balance in EE1 and names user 2 in EE2 as the target of the credit.
When a crosslink from shard A is included in the beacon chain, a merkle root that collects all cross-shard transactions since the last crosslink appears on the beacon chain. This is the evidence for the inclusion of TX1 in shard A.
shard B eventually becomes aware of this merkle root on the beacon chain, and user 2 creates transaction TX2 that presents shard B with the merkle proof of inclusion of TX1. This allows a credit of the appropriate amount to user 2's balance on EE2.
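The three steps above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the Eth2.0 specification: the `Shard` class, the JSON encoding of TX1, the `send` and `receive` helpers, the list standing in for the beacon chain, and the `claimed` set used for replay protection are all hypothetical names introduced here. The tree also duplicates the last node on odd layers, which is a simplification rather than the layout any real client uses.

```python
import hashlib
import json
from dataclasses import dataclass, field


def h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def merkle_root_and_branch(leaves: list[bytes], index: int) -> tuple[bytes, list[bytes]]:
    """Return the root over a non-empty list of leaves and the sibling path for leaves[index]."""
    layer = [h(leaf) for leaf in leaves]
    branch = []
    while len(layer) > 1:
        if len(layer) % 2 == 1:
            layer.append(layer[-1])  # simplification: duplicate the last node on odd layers
        branch.append(layer[index ^ 1])  # sibling of the current node
        layer = [h(layer[i] + layer[i + 1]) for i in range(0, len(layer), 2)]
        index //= 2
    return layer[0], branch


def verify_branch(leaf: bytes, index: int, branch: list[bytes], root: bytes) -> bool:
    node = h(leaf)
    for sibling in branch:
        node = h(node + sibling) if index % 2 == 0 else h(sibling + node)
        index //= 2
    return node == root


@dataclass
class Shard:
    balances: dict[str, int]
    outbox: list[bytes] = field(default_factory=list)  # cross-shard TXs since the last crosslink
    claimed: set[bytes] = field(default_factory=set)   # assumption: replay protection on the receiver


def send(shard: Shard, sender: str, recipient: str, amount: int) -> bytes:
    """Step 1: TX1 debits the sender on the source shard and records the message."""
    if shard.balances.get(sender, 0) < amount:
        raise ValueError("insufficient balance")
    shard.balances[sender] -= amount
    tx1 = json.dumps({"from": sender, "to": recipient, "amount": amount}, sort_keys=True).encode()
    shard.outbox.append(tx1)
    return tx1


def receive(shard: Shard, beacon_roots: list[bytes], tx1: bytes, index: int,
            branch: list[bytes], root: bytes) -> None:
    """Step 3: TX2 proves inclusion of TX1 against a root on the beacon chain, then credits."""
    if root not in beacon_roots:
        raise ValueError("root is not on the beacon chain")
    if not verify_branch(tx1, index, branch, root):
        raise ValueError("invalid merkle branch")
    if h(tx1) in shard.claimed:
        raise ValueError("message already received")
    shard.claimed.add(h(tx1))
    message = json.loads(tx1)
    shard.balances[message["to"]] = shard.balances.get(message["to"], 0) + message["amount"]


shard_a = Shard(balances={"user1": 10})
shard_b = Shard(balances={"user2": 0})
beacon_roots: list[bytes] = []  # stand-in for roots exposed by crosslinks

tx1 = send(shard_a, "user1", "user2", 4)                    # step 1
root, branch = merkle_root_and_branch(shard_a.outbox, 0)    # step 2: crosslink for shard A
beacon_roots.append(root)
receive(shard_b, beacon_roots, tx1, 0, branch, root)        # step 3
print(shard_a.balances, shard_b.balances)  # {'user1': 6} {'user2': 4}
```

Note that the beacon chain stand-in only ever stores a 32-byte root, while the message itself and its branch travel with the user's second transaction.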

Execution Layer

The execution layer of cross-shard communication provides an interface for users and contracts to make cross-shard transfers and contract calls. The design space of this layer has not been explored well yet. Recent discussions about this layer include:

Cross-shard calls in execution environments
Reliable transfer of value between shards

Cross-Shard Calls

The basic question that this component answers is: What happens when an EE calls a function of another EE on a different shard? The design space for this is not unique to sharded-blockchain-land. It's the same as any system where the execution of an application is separated across multiple partitions, e.g.:

single threaded vs. multi-threaded systems
single computer vs. network application systems

Inspired by the above systems, a simple design for this component would offer the following types of calls (Vlad Zamfir pointed this out in Dec 2018):

Asynchronous call with no return
Asynchronous call with callback specified
Synchronous call
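To make the distinction between the three call types concrete, here is a toy model in Python that borrows the message-queue picture from ordinary concurrent programming. It is a sketch under assumptions, not a proposed EE interface: `ExecutionEnvironment`, `Message`, `call_async`, `call_sync`, and `process_inbox` are hypothetical names, and the "synchronous" call is modelled as a direct function invocation, which glosses over the coordination between shards that a real synchronous call would need.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class Message:
    function: str
    arg: int
    reply_to: Optional["ExecutionEnvironment"] = None  # set only when a callback is requested
    callback: Optional[str] = None


@dataclass
class ExecutionEnvironment:
    name: str
    functions: dict[str, Callable[[int], int]] = field(default_factory=dict)
    inbox: deque = field(default_factory=deque)
    log: list = field(default_factory=list)

    def call_async(self, target: "ExecutionEnvironment", function: str, arg: int,
                   callback: Optional[str] = None) -> None:
        """Queue a message for the target; the caller continues without waiting."""
        reply_to = self if callback is not None else None
        target.inbox.append(Message(function, arg, reply_to, callback))

    def call_sync(self, target: "ExecutionEnvironment", function: str, arg: int) -> int:
        """Caller waits for the result (modelled here as an immediate invocation)."""
        return target.functions[function](arg)

    def process_inbox(self) -> None:
        """Execute queued messages; send results back where a callback was specified."""
        while self.inbox:
            msg = self.inbox.popleft()
            result = self.functions[msg.function](msg.arg)
            self.log.append((msg.function, msg.arg, result))
            if msg.callback is not None:
                msg.reply_to.inbox.append(Message(msg.callback, result))


ee1 = ExecutionEnvironment("EE1", functions={"on_double": lambda x: x})
ee2 = ExecutionEnvironment("EE2", functions={"double": lambda x: 2 * x, "note": lambda x: x})

ee1.call_async(ee2, "note", 7)                            # asynchronous call with no return
ee1.call_async(ee2, "double", 21, callback="on_double")   # asynchronous call with callback
ee2.process_inbox()   # the destination executes the calls at some later point
ee1.process_inbox()   # the callback runs back on the caller's side
print(ee1.log)                           # [('on_double', 42, 42)]
print(ee1.call_sync(ee2, "double", 5))   # 10
```

In the asynchronous cases the caller's state only changes once its own inbox is processed, whereas the synchronous call returns a value in place, which is what makes it the hardest of the three to support across shards.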

Alternative approaches include various advanced concurrent programming paradigms, such as the one described in protolambda's commit capabilities post.

Exploring Cross-Shard Communication in Eth2.0