Skip to main content

RLP (Recursive Length Prefix)

Importance of RLP

Understanding Recursive Length Prefix is important for understanding how Web3 wallets work, especially in the context of Ethereum and other Ethereum-based blockchains.

For example, Web3 wallets are used to send transactions on blockchain networks. These transactions need to be encoded in a specific format, including the recipient's address, the amount to transfer, and potentially additional data. RLP is one of the encoding methods used for this purpose. The same applies to data serialization while interacting with smart contracts and reading the blockchain's state.

What is RLP

RLP is a binary encoding scheme used primarily in the Ethereum blockchain and other Ethereum-based blockchains to serialize and encode structured data. It is a fundamental component of Ethereum's data storage and communication protocols, used for encoding various types of data structures within transactions, blocks, and smart contracts.

The "Recursive" aspect of RLP refers to its ability to encode nested data structures by recursively applying the RLP encoding process. It means that lists within lists, or arrays within arrays, can be encoded efficiently.

RLP rules of encoding

The formal explanation of the RLP rules can be found in the Ethereum yellow paper page 19. We will provide a simple explanation of the same.

There are two options for what the encoded data can be:

  1. a string, for example, "cat"
  2. an array of strings, for example ["cat", "dog"], or more complex structures like trees with branches and leaves, for example [["cat", "dog"],["car", "bus"],...]

It is true because any data, regardless of its format, can be converted to a string or an array of strings.

Encoding happens according to the following rules

  1. All the incoming data is converted to its hexadecimal representation, unless it is empty.
  2. If you have a tiny piece of data, like a single number between 0 and 127, you don't change it. It stays the same.
  3. If you have a small piece of data, like a word or a short message (between 0 and 55 bytes long), you add a special code in front of it (0x80 plus the length of the data), and then you put the hex encoded data after it.
  4. If your data is bigger (more than 55 bytes long), you add a different special code (0xb7 plus the length of how big your data is in binary), then you put the length, and then you put the data.
  5. If you have a list of data, and the whole list is small (the combined length of everything in the list is 0 to 55 characters), you use 0xc0 plus the length of the list, and then you put all the data items together.
  6. If the list is bigger (more than 55 bytes), you use 0xf7 plus the length of how big your list is in binary, then you put the length, and then you put all the data items.

RLP decoding rules

RLP Decoding In RLP (Recursive Length Prefix) decoding, the input data is treated as an array of binary data. The RLP decoding process unfolds as follows:

  1. Examine the first byte, the prefix, within the input data to determine the data type, actual data length, and offset.
  2. Decoding the data following its type and offset.
  3. Continue decoding of the remaining input.

Regarding the decoding of data types and offsets, the rules are as follows:

  • If the first byte falls within the range [0x00, 0x7f], the data is considered a string, and the string consists solely of the first byte.
  • The data is interpreted as a string if the first byte falls within the range [0x80, 0xb7]. The length of this string is equal to the value of the first byte minus 0x80, and it directly follows the first byte.
  • If the first byte falls within the range [0xb8, 0xbf], it also signifies a string. The length of this string, measured in bytes, corresponds to the value of the first byte minus 0xb7. Following the first byte, the length of the string is specified, and subsequently, the string data itself.
  • In cases where the first byte is within the range [0xc0, 0xf7], it indicates that the data is a list. The RLP encodings of all items within the list, with a total payload equivalent to the value of the first byte minus 0xc0, are concatenated and follow the first byte.
  • Similarly, it designates a list when the first byte falls within the range [0xf8, 0xff]. The total payload of the list, with a length matching the value of the first byte minus 0xf7, is provided after the first byte. The concatenation of the RLP encodings of all items within the list is presented.

Implementation

A TypeScript code snippet below illustrates the concept via a recursive rlpEncode function walking over the leaves and the branches encoding the input into an EVM-compatible format.

TypeScript example
import {ethers} from 'ethers';

// Example encoding:
const encode = ethers.utils.RLP.encode
console.log("0x12345678", encode("0x12345678")); // Should return "0x8412345678"
console.log(["0x12345678"], encode(["0x12345678"])); // Should return "0xc58412345678"
console.log([new Uint8Array([0x12, 0x34, 0x56, 0x78])], encode([new Uint8Array([0x12, 0x34, 0x56, 0x78])])); // 0xc58412345678
console.log([], encode([])); //0xc0 - array indicator

// Example decoding:
const decode = ethers.utils.RLP.decode
console.log("0xc88363617483646f67", decode("0xc88363617483646f67"))
console.log("0x8412345678", decode("0x8412345678"))
console.log("0xc58412345678", decode("0xc58412345678"))
console.log("0xc0", decode("0xc0"))

Expected output:

0x12345678 0x8412345678
[ '0x12345678' ] 0xc58412345678
[ Uint8Array(4) [ 18, 52, 86, 120 ] ] 0xc58412345678
[] 0xc0
0xc88363617483646f67 [ '0x636174', '0x646f67' ]
0x8412345678 0x12345678
0xc58412345678 [ '0x12345678' ]
0xc0 []