Data Contract (aka Data Segregation)

Disclaimer: This is a summary of patterns we have observed during our research and should not be considered any form of technical or investment advice. The “known examples” given are not necessarily the best implementations of the pattern, nor are they necessarily superior to any implementation not listed.

Summary

Store data in a separate smart contract.

Context

A smart contract may eventually need to be upgraded to fix bugs, overcome security weaknesses, or add new functionality. In general, logic and data change at different times and with different frequencies.

Problem

A smart contract’s state is maintained in the ledger under the address bound to that contract. When a new version of the smart contract is deployed, it gets a new address. Hence, the new version cannot directly manipulate the data embedded in the previous version. Even if we build functions into the old smart contract to export all its data and use that to initialise the new version, transferring a large data set through a series of transactions can be costly (on public blockchains) and nontrivial. For example, on Ethereum, porting data to an updated version might require multiple transactions, because the block gas limit prevents a single, overly complex data migration transaction. How, then, can a smart contract be upgraded?
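To make the migration cost more concrete, the following rough sketch estimates how many transactions a data migration would need under a block gas limit. It is written in Python purely as back-of-the-envelope arithmetic; the function `plan_migration` and all the numbers passed to it are illustrative assumptions, not figures from this pattern description or measurements of any real contract.

```python
def plan_migration(num_entries, gas_per_entry, tx_overhead_gas, block_gas_limit):
    """Estimate (entries per transaction, transaction count, total gas).

    All parameters are hypothetical inputs chosen by the reader.
    """
    usable_gas = block_gas_limit - tx_overhead_gas
    if usable_gas < gas_per_entry:
        raise ValueError("a single entry does not fit into one transaction")
    # How many entries fit into one transaction without exceeding the limit.
    entries_per_tx = usable_gas // gas_per_entry
    # Ceiling division: the last transaction may be only partially filled.
    num_txs = (num_entries + entries_per_tx - 1) // entries_per_tx
    total_gas = num_entries * gas_per_entry + num_txs * tx_overhead_gas
    return entries_per_tx, num_txs, total_gas


# Assumed values: 100,000 entries, 20,000 gas per stored entry,
# 21,000 gas of fixed overhead per transaction, 30,000,000 gas per block.
print(plan_migration(100_000, 20_000, 21_000, 30_000_000))
# (1498, 67, 2001407000)
```

Under these assumed numbers the migration cannot be done in one transaction and has to be split into 67 of them, each of which would need to be submitted, paid for, and checked for success.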

Forces

  • Immutability – Every piece of data stored on the blockchain, including deployed smart contracts, is immutable.
  • Upgradability – There is a fundamental need to upgrade all but short-lived applications and their smart contracts over time.
  • Coupling – Data are embedded in a smart contract. A smart contract can live forever on the blockchain if not explicitly terminated. If a smart contract is deactivated in this way, the data stored in the smart contract cannot be accessed through the smart contract functions anymore – although it can still be accessed with some effort for provenance or audit purposes.
  • Cost – If a public blockchain is used, storing data on the blockchain costs money. Thus, copying data from an old version of a smart contract to a new version should be avoided or minimised.

Solution

The standard practice of separating data from business logic can be applied to solve this problem. We can maintain data and business logic in two separate smart contracts. The former contract stores the data and exposes basic read and write functions with relevant access control, much like a simple database. The latter contract reads and writes the required data by calling the data contract. We can generalise this pattern to have a set of logic contracts accessing the same data contract. Having multiple data contracts is also possible, but less common.

Data contract pattern

To avoid moving data during upgrades of smart contract logic, the data store is isolated from the rest of the code. The data contract could try to abstract away much of the application-specific data representation and manipulation. For example, rather than supporting specific data types, a data contract could use loosely typed flat data structures. The more generic and flexible the data structure, the more likely it is that all the other logic smart contracts can use it, and the less likely it is to require changes in the future. One example of a generic data structure is a hash map or key-value store.
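As a minimal sketch of such a generic store, the following Python class simulates a loosely typed key-value data contract in memory. It is not on-chain code: the names `GenericDataStore`, `make_key`, `encode_uint`, and `decode_uint` are hypothetical, hashing names into fixed-size keys is one possible design among several, and access control is left out to keep the focus on the data structure.

```python
import hashlib


class GenericDataStore:
    """In-memory stand-in for a generic key-value data contract."""

    def __init__(self):
        # Flat, loosely typed storage: fixed-size key -> raw bytes.
        self._store = {}

    @staticmethod
    def make_key(*parts):
        # Derive a 32-byte key from application-level names,
        # e.g. ("balance", "alice").
        return hashlib.sha256("/".join(parts).encode("utf-8")).digest()

    def set(self, key, value):
        self._store[key] = value

    def get(self, key):
        # Keys that were never written read back as empty bytes.
        return self._store.get(key, b"")


def encode_uint(n):
    # Logic contracts, not the store, decide how values are encoded.
    return n.to_bytes(32, "big")


def decode_uint(raw):
    return int.from_bytes(raw, "big")


store = GenericDataStore()
balance_key = GenericDataStore.make_key("balance", "alice")
store.set(balance_key, encode_uint(100))
print(decode_uint(store.get(balance_key)))  # 100
```

Because the store only sees opaque keys and byte strings, a later logic version could add new kinds of records (for example under a `("allowance", ...)` key) without any change to the store itself. The price is that typing and encoding become the responsibility of every logic contract that uses it.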

First, deploy the data contract. Second, deploy the logic contract while specifying the data contract’s address. The logic contract can then use this address to call the data contract to manipulate its data. Third, send a transaction to the data contract to configure the logic contract’s address. Whenever a transaction attempts to write data to the data contract, the data contract must validate the caller’s address against the stored address of the logic contract. This ensures only an authorised logic contract can manipulate its data. Whenever a new logic contract version is deployed, the data contract’s address must be specified. The logic contract’s address configured on the data contract must also be updated to ensure the updated contract instance has access to the data.
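The deployment and upgrade steps above can be sketched as a small in-memory simulation in Python. This is an illustration under stated assumptions rather than real contract code: the classes `DataContract`, `LogicV1`, and `LogicV2`, the `new_address` helper, and the 1-unit fee in the upgraded logic are all hypothetical. The sketch also assumes that only an owner account may configure the logic contract’s address, which the description above does not specify.

```python
import itertools

_counter = itertools.count(1)


def new_address():
    # Hypothetical stand-in for the address assigned at deployment.
    return f"0x{next(_counter):040x}"


class DataContract:
    def __init__(self, owner):
        self.address = new_address()
        self.owner = owner            # assumption: an owner manages authorisation
        self.logic_address = None
        self._store = {}

    def set_logic_address(self, sender, logic_address):
        if sender != self.owner:
            raise PermissionError("only the owner may configure the logic contract")
        self.logic_address = logic_address

    def write(self, sender, key, value):
        # Validate the caller against the stored logic contract address.
        if sender != self.logic_address:
            raise PermissionError("caller is not the authorised logic contract")
        self._store[key] = value

    def read(self, key):
        return self._store.get(key, 0)


class LogicV1:
    def __init__(self, data):
        self.address = new_address()
        self.data = data              # data contract specified at deployment

    def deposit(self, account, amount):
        balance = self.data.read(account)
        self.data.write(self.address, account, balance + amount)


class LogicV2(LogicV1):
    def deposit(self, account, amount):
        # Upgraded behaviour (hypothetical): charge a 1-unit fee per deposit.
        super().deposit(account, amount - 1)


owner = "0xowner"
data = DataContract(owner)                  # first: deploy the data contract
v1 = LogicV1(data)                          # second: deploy logic with the data address
data.set_logic_address(owner, v1.address)   # third: authorise the logic contract
v1.deposit("alice", 10)

v2 = LogicV2(data)                          # upgrade: deploy the new logic version
data.set_logic_address(owner, v2.address)   # re-point the data contract to it
v2.deposit("alice", 10)
print(data.read("alice"))                   # 19

try:
    v1.deposit("alice", 10)                 # the old version is no longer authorised
except PermissionError as err:
    print(err)
```

The balance written through `LogicV1` is still there when `LogicV2` takes over, so nothing is copied during the upgrade. Only the single authorised address stored in the data contract changes.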

Benefits

  • Upgradability – By separating data from the rest of the code, the application logic can be upgraded without affecting the data contract.
  • Cost – Because the data are separated from the rest of the code, no data needs to be migrated when the application logic is upgraded, so the cost of migration is avoided.
  • Generality – If the data can be cleanly separated and generalised, there would be an additional benefit: the generic data contract can be used by all related logic smart contracts.

Drawbacks

  • Cost – If a public blockchain is used, storing a piece of data in a generic data structure costs more money than storing it in a strictly defined data structure. For example, as mentioned earlier, a generic data structure maintains a mapping between keys and values, whereas a more strictly defined data structure can be smaller and does not require the key to be stored. Querying the data is also less straightforward. This is the cost of a generalised solution. In addition, calling one smart contract from another is itself costly.
  • Upgradability – If the data contract needs to be upgraded, data must be migrated to the new data contract instance.

Related Patterns

  • Contract registry and this pattern can work together to improve the upgradability of smart contracts further.
  • Proxy and diamond patterns can use this pattern to separate the data from business logic.

Known uses

  • Chronobank is a blockchain project that tokenises labour and provides a market for professionals to trade their labour time with businesses. It uses a smart contract with a generic data structure as the data store used by all the other logic smart contracts.
  • Colony is a platform for open organisations running on Ethereum. Similar to Chronobank, Colony has a data contract with a generic data structure.