Data Management Decision Model
Disclaimer: This is a summary of patterns we have observed during our research and should not be considered any form of technical or investment advice. Also, the given “known examples” do not imply that they are the best implementations of the said pattern or that they are superior to any other implementation of the pattern not listed.
The patterns categorized under data management are further divided into on-chain and off-chain data management. While some of the on-chain computation using smart contracts is analogous to data management using blockchain transactions, we do not cover computation on a blockchain in this paper.
On-chain Data Management Decision Model
First, we consider the design goal of separating on-chain and off-chain data on the blockchain. Figure 1 illustrates the proposed decision model for choosing both on-chain data management and performance patterns. The performance patterns included within the dashed rectangle are discussed in the performance decision model. While both groups of patterns are included in the same figure to depict the decision flow in terms of data size (illustrated by the exclusive gateway), the selection of patterns can be performed in parallel.
Characteristics of the data to be managed by the blockchain determine what patterns should be selected. If the data to be stored represent asset ownership or identification, the tokenization pattern could be used to create and manage digital, authoritative records of assets on the blockchain. A token can also be used to represent identification, which is mainly used in security design. When it comes to asset ownership, tokenization reduces the risk of handling high-value digital and physical assets, both fungible and non-fungible. A legal process in the physical world is required to make an ownership transfer recorded on the blockchain an authoritative record. Blockchain transactions ensure authorized creation, transfer, and trading of tokens, enhancing flexibility and cost-efficiency. However, the asset represented by the token could be manipulated outside the blockchain without the change being recorded on the blockchain, which compromises data integrity.
If a token is no longer needed, we should mark it as unspendable to avoid misuse, such as double-spending. For example, a non-fungible token should be destroyed once it is no longer needed, fungible tokens are destroyed to increase the value of tokens by reducing the supply, and a token should be destroyed on the source blockchain before it is swapped/migrated to the target blockchain. In such cases, the token burning pattern could be applied to make tokens or smart contracts unusable while ensuring consistency and accountability. Token burning complements tokenization.
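The relationship between tokenization and token burning can be sketched with a toy in-memory ledger. This is only an illustration under stated assumptions: the class TokenLedger, its method names, and the owner names are hypothetical, and a real implementation would be a smart contract on the chosen platform rather than a Python object.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class TokenLedger:
    """Toy ledger for a fungible token (hypothetical, not a real contract)."""
    balances: Dict[str, int] = field(default_factory=dict)
    total_supply: int = 0
    burned: int = 0

    def mint(self, owner: str, amount: int) -> None:
        # Tokenization: create a record of ownership on the ledger.
        if amount <= 0:
            raise ValueError("amount must be positive")
        self.balances[owner] = self.balances.get(owner, 0) + amount
        self.total_supply += amount

    def transfer(self, sender: str, receiver: str, amount: int) -> None:
        # A balance can only be spent by its recorded owner, and only once.
        if amount <= 0 or self.balances.get(sender, 0) < amount:
            raise ValueError("invalid amount or insufficient balance")
        self.balances[sender] -= amount
        self.balances[receiver] = self.balances.get(receiver, 0) + amount

    def burn(self, owner: str, amount: int) -> None:
        # Token burning: remove tokens from circulation permanently, while
        # keeping a running total so the reduction remains accountable.
        if amount <= 0 or self.balances.get(owner, 0) < amount:
            raise ValueError("invalid amount or insufficient balance")
        self.balances[owner] -= amount
        self.total_supply -= amount
        self.burned += amount


ledger = TokenLedger()
ledger.mint("alice", 100)
ledger.transfer("alice", "bob", 30)
ledger.burn("bob", 10)
```

In this sketch, burned tokens can never be transferred again because they no longer appear in any balance, and total_supply plus burned always equals the amount ever minted.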
Decentralized Applications (DApps) are designed to use a blockchain as a back-end server that runs business logic using smart contracts and stores data as a key-value database. However, the volume of data stored using a transaction is constrained by the transaction (TX) size and block size of the chosen blockchain platform. Similarly, in platforms such as Ethereum, the computational complexity of a smart contract is also bounded per transaction and per block (in Ethereum, through gas limits). Therefore, the applicability of on-chain patterns depends on the size of the data to be stored or the computational complexity.
If the application data are small and not sensitive, storing all data on the blockchain is feasible. In such cases, the raw data on-chain pattern could be applied to store all application data immutably and transparently on the blockchain. There are different ways to store data on a blockchain, e.g., embedded into a transaction, as a smart contract variable, or as smart contract log events. These options have different trade-offs, including cost and flexibility. Further, writing arbitrary data to a blockchain is not only slow and expensive but also less flexible. The binding constraint of this pattern is the transaction size, as it is smaller than the block size.
If the application data need to be set while initializing the blockchain, the establish genesis pattern could be used to set the state in the genesis block. For example, when launching a new blockchain, the initial distribution of the native tokens can be set using this pattern. However, the size of the data included in the genesis block is limited by the block size. This pattern complements raw data on-chain. In addition, there are more constraints on what data (e.g., smart contracts) and how much data can be written during genesis; thus, the flexibility is low.
The data on the blockchain are accessible to all users on a blockchain network as they have the same level of privileges. If data are supposed to be accessible only to the transacting participants (e.g., commercially sensitive data), the encrypting on-chain data pattern, which complements raw data on-chain, could be applied to encrypt the data before adding them to the blockchain. We assume that the blockchain platform uses no advanced cryptography, such as zero-knowledge proofs. However, as the encrypted data are stored immutably on the blockchain, they may be subject to brute-force decryption attacks in the future.
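The on-chain choices described in the preceding paragraphs can be summarized as a small selection function. This is a rough sketch that encodes only the prose above, not the figure itself; the function name, parameter names, and the returned labels are assumptions made for illustration.

```python
from typing import List


def select_on_chain_patterns(
    represents_asset_or_identity: bool,
    token_no_longer_needed: bool,
    data_is_small: bool,
    set_at_initialization: bool,
    restricted_to_participants: bool,
) -> List[str]:
    """Return candidate pattern names for the given data characteristics."""
    patterns: List[str] = []

    # Asset ownership or identification is represented with tokens.
    if represents_asset_or_identity:
        patterns.append("tokenization")
        # Burning complements tokenization once a token is retired.
        if token_no_longer_needed:
            patterns.append("token burning")

    if data_is_small:
        # Small data can be written directly within a transaction.
        patterns.append("raw data on-chain")
        # The next two patterns complement raw data on-chain.
        if set_at_initialization:
            patterns.append("establish genesis")
        if restricted_to_participants:
            patterns.append("encrypting on-chain data")
    else:
        # Larger data is deferred to the performance decision model.
        patterns.append("see performance decision model")

    return patterns


choice = select_on_chain_patterns(
    represents_asset_or_identity=False,
    token_no_longer_needed=False,
    data_is_small=True,
    set_at_initialization=False,
    restricted_to_participants=True,
)
```

Because the selection of patterns can be performed in parallel, the function collects every applicable pattern instead of returning at the first match.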
The legal and smart contract pair pattern evolves from the hash on-chain pattern discussed in the performance decision model. This pattern provides a bidirectional binding between a legal agreement and the corresponding smart contract to provide an authoritative source to the legal contract. The binding is based on the hash of the legal agreement and the smart contract’s immutable address, thus ensuring that the legal agreement and the smart contract have a 1-to-1 mapping.
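A minimal sketch of such a two-way binding is given below, assuming SHA-256 as the hash function and modelling both sides as plain Python records. The class names, the digest layout, and the placeholder address 0xCONTRACT are hypothetical; how the contract address becomes known before the agreement is finalized is platform-specific and not covered here.

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class LegalAgreement:
    text: str
    contract_address: str  # address quoted inside the legal agreement

    def digest(self) -> str:
        # Hash the agreement text together with the address it refers to.
        payload = f"{self.text}\n{self.contract_address}".encode("utf-8")
        return hashlib.sha256(payload).hexdigest()


@dataclass(frozen=True)
class SmartContractRecord:
    address: str           # immutable address of the deployed contract
    agreement_sha256: str  # agreement hash stored with the contract


def is_bound(agreement: LegalAgreement, contract: SmartContractRecord) -> bool:
    # Direction 1: the agreement names this contract's address.
    # Direction 2: the contract stores this agreement's hash.
    return (
        agreement.contract_address == contract.address
        and agreement.digest() == contract.agreement_sha256
    )


agreement = LegalAgreement("Party A leases asset X to Party B.", "0xCONTRACT")
contract = SmartContractRecord("0xCONTRACT", agreement.digest())
tampered = LegalAgreement("Party A sells asset X to Party B.", "0xCONTRACT")
```

Here is_bound(agreement, contract) holds, while the tampered agreement fails the check because its hash no longer matches the value stored with the contract.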
Off-chain Data Access Control Decision Model
As the data stored on a blockchain is accessible to all users on the blockchain network, one way to preserve privacy is to keep sensitive data off-chain while providing access to them using off-chain access control logic. The off-chain components within a blockchain-based application could apply conventional architectural patterns that fulfil the requirements of the given use case. Here we consider an example set of conventional data access control patterns presented in the context of Self-Sovereign Identity (SSI) to constrain the access to off-chain credentials. The following figure illustrates the related decision model.
Typically, when storing off-chain data, a developer needs to consider two types of constraints: content constraints and temporal constraints. If data access needs to be constrained to enhance privacy, the selective content generation pattern could be applied to generate a customized response based on the access control rules to avoid leaking unnecessary data. For example, in the context of SSI, a certificate issuer could generate a customized credential based on the identity holder’s specific requirement, such as showing the grade earned for a module rather than exposing the entire transcript. Other conventional access control patterns such as policy-based and token-based access control could be included in this decision model to complement the selective content generation pattern.
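The transcript example can be sketched as a simple field filter. This is an illustrative sketch only: the record layout, the field names, and the function generate_credential are assumptions, and a real SSI issuer would also sign the resulting credential.

```python
from typing import Dict, Set

# Hypothetical full transcript held off-chain by the issuer.
TRANSCRIPT: Dict[str, str] = {
    "holder": "Alice",
    "module_a_grade": "A",
    "module_b_grade": "B+",
    "date_of_birth": "2000-01-01",
}


def generate_credential(
    record: Dict[str, str],
    requested_fields: Set[str],
    allowed_fields: Set[str],
) -> Dict[str, str]:
    """Return only the fields that are both requested and permitted."""
    disclosed = requested_fields & allowed_fields
    return {key: value for key, value in record.items() if key in disclosed}


credential = generate_credential(
    TRANSCRIPT,
    requested_fields={"holder", "module_a_grade", "date_of_birth"},
    allowed_fields={"holder", "module_a_grade", "module_b_grade"},
)
```

The date of birth is requested but not permitted by the access control rules, so it is left out of the generated credential along with the grade that was not requested.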
If the access to data needs to be time-constrained, the time-constrained access pattern could be applied to enforce the period of access to off-chain data. For example, in the SSI context, a certificate holder could share a link with a verifier that redirects to the credentials only within the defined period, e.g., while applying for a job. The one-off access pattern could be applied to allow only one-time access to the data. This pattern can be generalized to provide access up to a predefined number of times. One constraint of both temporal patterns is that a malicious user may take a snapshot of the data/content while accessing it and can then read the data even after the restricted link has expired.
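Both temporal patterns can be sketched with a single link service that tracks an expiry time and a remaining-use counter per link. The names AccessLinkService, issue, and redeem are hypothetical, and the current time is passed in explicitly so that the behaviour is easy to follow; a deployed service would read a trusted clock instead.

```python
import secrets
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class AccessGrant:
    resource: str
    expires_at: float    # same time unit as the `now` arguments below
    remaining_uses: int


class AccessLinkService:
    """Toy off-chain service issuing expiring, use-limited access tokens."""

    def __init__(self) -> None:
        self._grants: Dict[str, AccessGrant] = {}

    def issue(self, resource: str, now: float, ttl_seconds: float,
              max_uses: int = 1) -> str:
        # max_uses=1 gives one-off access; larger values generalize it.
        token = secrets.token_urlsafe(16)
        self._grants[token] = AccessGrant(resource, now + ttl_seconds, max_uses)
        return token

    def redeem(self, token: str, now: float) -> Optional[str]:
        grant = self._grants.get(token)
        if grant is None:
            return None
        if now >= grant.expires_at or grant.remaining_uses <= 0:
            # Expired or exhausted grants are discarded.
            del self._grants[token]
            return None
        grant.remaining_uses -= 1
        return grant.resource


service = AccessLinkService()
one_off = service.issue("credential-123", now=0.0, ttl_seconds=60.0)
first = service.redeem(one_off, now=10.0)
second = service.redeem(one_off, now=11.0)
```

In this sketch the first redemption returns the resource and the second returns None. As noted above, nothing here prevents the verifier from keeping a copy of what was returned on the first access.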