Skip to content
This repository was archived by the owner on Jul 5, 2024. It is now read-only.
Open
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
18 changes: 7 additions & 11 deletions src/architecture.md
Original file line number Diff line number Diff line change
Expand Up @@ -27,22 +27,18 @@ List of circuits and tables they generate/verify:
In the end, the circuits would be assembled depending on their dimension and the desired capacity. For example, we can just combine 2 different circuits by using different columns, or stack them using same columns with extra selectors.

In order to reduce the time required to build a proof of a full block and to
simplify the verification step, an aggregation circuit is being build so that condenses the
verification of each sub-circuit proofs shown in the diagram. See [Design
simplify the verification step, an aggregation circuit is being built to condenses the
verification of each sub-circuit proof shown in the diagram. See [Design
Notes, Recursion](./design/recursion.md) for details on the recursion strategy
used in the aggregation circuit.

## Circuit as a lookup table

In halo2, the lookup is flexible to be configured. Anything able to be turned into `Expression` could be used as `item: Tuple[int, ...]` or `table: Set[Tuple[int, ...]]` in lookup. Enabling `assert item in table`. The `Expression` includes `Constant`, `Fixed`, `Advice` or `Instance` column at arbitrary rotation.
In halo2, lookup configuration is quite flexible, therefore anything that can be represented as an `Expression` could be used as an `item: Tuple[int, ...]` or `table: Set[Tuple[int, ...]]` in lookup. This enables assertion of items in the lookup table e.g. `assert item in table`. Examples of the `Expression` include `Constant`, `Fixed`, `Advice` or `Instance` column at arbitrary rotation.

The motivation to have multiple circuits as lookup tables is that EVM contains many circuit unfriendly operations like random read-write data access, "wrong" field operation (ECDSA on secp256k1), traditional hash functions like `keccak256`, etc... And many of them accept variable lenght input.
The motivation to have multiple circuits as lookup tables is that the EVM contains many operations which are not suited to circuits, such as random read-write data access, "wrong" field operation (ECDSA on secp256k1) and traditional hash functions like `keccak256` to name a few. This is not suited to circuits because they all accept variable length inputs. Designing an EVM circuit that can verify computation traces become much more complex because each step could possibly contain some of the aforementioned operations. In order to solve this problem, we separated these expensive operations into single-purpose circuits which have a more friendly layout, and use them via lookups to communicate the requisite input and output. This is made possible by the fact that the lookup table is configured with constraints to ensure that the input and output have a relationship. For example, the Bytecode circuit contains a set of tuples `(code_hash, index, opcode)`, and each `code_hash` is the keccak256 digest of opcodes it contains, thus in the EVM circuit we can load `opcode` with `(code_hash, program_counter)` by performing a lookup the Bytecode table.

These expensive operations make it hard to design an EVM circuit to verify computation traces because each step could possibly contain some of the operations mentioned above. So we tried to separate these expensive operations into single-purpose circuits which have a more friendly layout, and use them via lookups to communicate it's input and output, Outsourcing the effort.

The reason input-output lookups could be used to outsource the effort is that we know the that the lookup-ed table is configured with constraints to verify the input and output are in some relationship. For example, we let Bytecode circuit to hold a set of tuple `(code_hash, index, opcode)`, and each `code_hash` is verified to be the keccak256 digest of opcodes it contains, then in EVM circuit we can load `opcode` with `(code_hash, program_counter)` by looking up the Bytecode table.

However, there are some properties we can't ensure only with lookups (which ultimately only prove that the contents of all the lookups are a subset of a table). We want to constraint that the amount of all (looked-up) `item`s should be equal to the size of `table`, which is required by the EVM circuit and State circuit to prevent extra malicious writes in the `table`. In such case (the set of looked up items define the table exactly), we need some extra constraint to ensure the relationship is correct. A naive approach is to count all `item` in State circuit (which in the end is the size of the `table`) and constraint it to be equal to the value counted in the EVM circuit.
Whilst this method may work well for many cases, there are some properties that can't be verified exclusively with lookups (because the contents of all the lookups are only a subset of a table). Therefore, the number of all *looked-up items* should be equal to the size of `table`. This constraint is required by the EVM circuit and State circuit to prevent malicious writes in the `table`. In such case (the set of looked up items define the table exactly), we need some extra constraint to ensure the relationship is correct. A naive approach is to count all `item` in State circuit (which in the end is the size of the `table`) and ensure that it is equal to the value counted in the EVM circuit.


## EVM word encoding
Expand All @@ -62,6 +58,6 @@ See [Design Notes, Random Linear Combination](./design/random-linear-combinaion.
| `MAX_ETHER` | `2**256 - 1` | max value of ether allowed [^2] |


[^1]: The explicit max memory address in EVM is actually `32 * (2**32 - 1)`, which is the one that doesn't make memory expansion gas cost overflow `u64`. In our case, memory address is allowed to be 5 bytes, but will constrain the memory expansion gas cost to fit `u64` in success case.
[^1]: The explicit max memory address in EVM is actually `32 * (2**32 - 1)`,in order to prevent memory expansion gas cost from overflowing `u64`. In theory, a memory address is allowed to be 5 bytes, but will limit the memory expansion gas cost to fit `u64` in success case.

[^2]: I didn't find a explicit upper bound on value of ether (for `balance` or `gas_price`) in yellow paper, but handling unbounded big integer seems unrealistic in circuit, so with `u256` as a hard bound seems reasonable.
[^2]: There's no explicit upper bound on the value of ether (for `balance` or `gas_price`) in the yellow paper, but handling an unbounded big integer is impractical in the circuit, so it is limited to `u256`.
24 changes: 12 additions & 12 deletions src/architecture/evm-circuit.md
Original file line number Diff line number Diff line change
Expand Up @@ -4,27 +4,23 @@

# Introduction

EVM circuit iterates over transactions included in the proof to verify that each execution step of a transaction is valid. Basically the scale of a step is the same as in the EVM, so usually we handle one opcode per step, except those opcodes like `SHA3` or `CALLDATACOPY` that operate on variable size of memory, which would require multiple "virtual" steps.
EVM circuit iterates over transactions included in the proof to verify that each execution step of a transaction is valid. Each iteration handles one opcode per step (as is the case in the EVM), except opcodes which operate on variable size of memory (such as `SHA3` or `CALLDATACOPY`), which would require multiple "virtual" steps [^1].

> The scale of a step somehow could be different depends on the approach, an extreme case is to implement a VM with reduced instruction set (like TinyRAM) to emulate EVM, which would have a much smaller step, but not sure how it compares to current approach.
>
> **han**

To verify if a step is valid, we first enumerate all possible execution results of a step in the EVM including success and error cases, and then build a custom constraint to verify that the step transition is correct for each execution result.
In order to verify if a step is valid, we must first enumerate all possible execution results of a step in the EVM - including success and error cases, and then build a custom constraint to verify that the step transition is correct for each execution result.

For each step, we constrain it to enable one of the execution results, and specially, to constrain the first step to enable `BEGIN_TX`, which then repeats the step to verify the full execution trace. Also each step is given access to next step to propagate the tracking information, by putting constraints like `assert next.program_counter == curr.program_counter + 1`.
For each step, we constrain it to enable one of the execution results, and specially, to constrain the first step to enable `BEGIN_TX`, which then repeats the step to verify the full execution trace. Also each step is given access to the next ste in order to propagate the tracking information. This is accomplished by putting constraints like `assert next.program_counter == curr.program_counter + 1`.

# Concepts

## Execution result

It's intuitive to have each opcode as a branch in step. However, EVM has so rich opcodes that some of them are very similar like `{ADD,SUB}`, `{PUSH*}`, `{DUP*}` and `{SWAP*}` that seem to be handled by almost identical constraint with small tweak (to swap a value or automatically done due to linearity), it seems we could reduce our effort to only implement it once to handle multiple opcodes in single branch.
One's first intuition may be to have each opcode as a branch in step, however, due to the similarities between opcodes, such as `{ADD,SUB}`, `{PUSH*}`, `{DUP*}` and `{SWAP*}` , we can handle many opcodes in a single branch with minor adjustments.

In addition, an EVM state transition could also contain serveral kinds of error cases, we also need to take them into consideration to be equivalent to EVM. It would be annoying for each opcode branch to handle their own error cases since it needs to halt the step and return the execution context to caller.
Each EVM state transition can contain several kinds of error cases, so we need to take them into consideration to be equivalent to EVM, rather than each opcode branch handling its own error cases as that would result in the execution being returned to the caller, thus halting the step. Fortunately, most error cases are easy to verify with a pre-built lookup table, there are some exceptions such as an out of gas error due to dynamic gas usage.

Fortunately, most error cases are easy to verify with some pre-built lookup table even they could happen to many opcodes, only some tough errors like out of gas due to dynamic gas usage need to be verified one by one. So we further unroll all kinds of error cases as kinds of execution result.
Because of this complexity, we unroll all kinds of error cases as a type of execution result.

So we can enumerate [all possible execution results](https://github.com/appliedzkp/zkevm-specs/blob/83ad4ed571e3ada7c18a411075574110dfc5ae5a/src/zkevm_specs/evm/execution_result/execution_result.py#L4) and turn EVM circuit into a finite state machine like:
This gives us the ability to enumerate [all possible execution results](https://github.com/appliedzkp/zkevm-specs/blob/83ad4ed571e3ada7c18a411075574110dfc5ae5a/src/zkevm_specs/evm/execution_result/execution_result.py#L4) and subsequently represent the EVM circuit as a finite state machine (see below)

```mermaid
flowchart LR
Expand Down Expand Up @@ -58,7 +54,7 @@ flowchart LR
- Beginning of a transaction.
- **EVMExecStates** = [ SuccessStep | ReturnStep ]
- **SuccessStep** = [ ExecStep | ExecMetaStep | ExecSubStep ]
- Set of states that suceed and continue the execution within the call.
- Set of states that succeed and continue the execution within the call.
- **ReturnStep** = [ ExplicitReturn | Error ]
- Set of states that halt the execution of a call and return to the caller
or go to the next tx.
Expand Down Expand Up @@ -183,3 +179,7 @@ For a simple `CALL` example with illustration (many details are hided for simpli
- [spec](https://github.com/appliedzkp/zkevm-specs/blob/master/specs/evm-proof.md)
- [python](https://github.com/appliedzkp/zkevm-specs/tree/master/src/zkevm_specs/evm)
- [circuit](https://github.com/appliedzkp/zkevm-circuits/tree/main/zkevm-circuits/src/evm_circuit)


[^1]: This approach could be modified in the future to utilize a reduced instruction set such as TinyRAM which emulates the EVM, but that would required further exploration to determine if it's more efficient than the current approach.