Data Migration Patterns
Disclaimer: This is a summary of patterns we have observed during our research and should not be considered any form of technical or investment advice. Also, the given “known examples” do not imply that they are the best implementations of a pattern, or that they are superior to any implementation not listed.
With the rapid evolution of technological, economic, and regulatory landscapes, contemporary blockchain platforms are all but certain to undergo significant changes. Applications that rely on them will therefore eventually need to migrate from one blockchain instance to another to remain competitive and secure. Data migration may also be required to improve business processes, performance, cost efficiency, privacy, and regulatory compliance. However, differences in data and smart contract representations, modes of hosting, and transaction fees, together with the need to preserve consistency, immutability, and data provenance, introduce challenges beyond those of database migration. The following collection presents a set of migration patterns that address these scenarios and data management challenges.
We explain the patterns in the context of the data migration architecture illustrated in the following figure. As with database migration, we envision that the migration team will use a tool (either developed in-house or off-the-shelf) to simplify the migration process. The migration tool could follow the Extract, Transform, and Load (ETL) process to copy data from the source blockchain and recreate it on the target blockchain. Because of incompatibilities between the source and target blockchains’ data representations, the creation of new accounts and smart contracts, and the replay of transactions, changes may be needed at the Blockchain Access/API Layer (BAL). Like the data access layer in databases, the BAL abstracts the connectivity to the blockchain. It may also map application-level references to blockchain identifiers (IDs), as the two are very different. For example, a username used by a DApp needs to be mapped to the user’s address or public key on the blockchain. This mapping from application-level references to blockchain IDs is usually maintained in a protected database within the BAL, which we refer to as the ID database. When the application holds the user’s private key (e.g., in a custodial wallet), keys may also be kept in this database. Therefore, in addition to updating the BAL to integrate the target blockchain, the ID database within the BAL needs to be updated during the migration to reflect new account and smart contract addresses, keys, and transaction IDs. Moreover, the ID database can be used to identify which accounts, states, transactions, and smart contracts to migrate, because blockchains, in aiming for anonymity, do not keep track of applications and their users. The BAL and its ID database are therefore likely to be an integral part of the migration architecture. The dotted lines in the figure show the flow of account (accID), smart contract (scID), and transaction (txID) identifiers from/to the ID database.
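To make the role of the ID database concrete, the following minimal Python sketch shows how a migration tool might use it both to decide which accounts to migrate and to record the new identifiers afterwards. All names here (`IdDatabase`, `remap_account`, `migrate_accounts`, and the `create_target_account` callback) are hypothetical and only illustrate the idea; a real BAL would also handle keys, smart contract IDs, and transaction IDs, and would talk to actual blockchain clients.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class IdDatabase:
    """Hypothetical ID database: application-level reference -> blockchain ID."""
    accounts: Dict[str, str] = field(default_factory=dict)  # username -> accID

    def accounts_to_migrate(self) -> List[str]:
        # The ID database, not the blockchain, knows which accounts
        # belong to this application.
        return sorted(self.accounts)

    def remap_account(self, username: str, new_acc_id: str) -> str:
        # Point the username at the target-chain account; return the old accID.
        old_acc_id = self.accounts[username]
        self.accounts[username] = new_acc_id
        return old_acc_id


def migrate_accounts(
    id_db: IdDatabase,
    create_target_account: Callable[[str], str],
) -> Dict[str, str]:
    """Create each account on the target chain and update the ID database.

    `create_target_account` is an assumed callback standing in for the
    load step; it returns the new accID for a username.
    Returns a mapping of old accID -> new accID for later steps.
    """
    old_to_new: Dict[str, str] = {}
    for username in id_db.accounts_to_migrate():
        new_acc_id = create_target_account(username)
        old_acc_id = id_db.remap_account(username, new_acc_id)
        old_to_new[old_acc_id] = new_acc_id
    return old_to_new
```

The returned old-to-new mapping is the kind of information a later step would need, for example to rewrite sender and receiver addresses when recreating states or replaying transactions on the target blockchain.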
Pattern Collection
- State extraction
  - Snapshotting – Get a snapshot of states, smart contracts, and transactions on the source blockchain
- State transformation
  - State aggregation – Aggregate a set of states into a single state (or a few states)
  - Token burning – Make tokens, states, and smart contracts on the source blockchain unusable
- State and transaction load
  - Node sync – Create a clone of a blockchain node by synchronizing blocks and history
  - Establish genesis – Set the state on the target blockchain’s genesis block
  - Hard fork – Change the global state of the target blockchain
  - State initialisation – Initialise/recreate states on the target blockchain
  - Exchange transfer – Transfer states via an exchange
- Smart contract
  - Virtual machine emulation – Allow smart contracts written in one language to run on another blockchain platform
  - Smart contract translation – Translate smart contract code from one language to another
- Non-functional
  - Measure migration quality – Define and assess metrics that measure the quality of the migrated data
  - Off-chain data storage (aka Blockchain Anchor) – Use a hash to ensure the integrity of an arbitrarily large dataset that may not fit directly on the blockchain
  - Encrypting on-chain data – Ensure the confidentiality of data stored on the blockchain by encrypting it
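As an illustration of the off-chain data storage (Blockchain Anchor) pattern listed above, the following Python sketch computes a digest of an off-chain dataset and later checks the dataset against the digest that was stored on-chain. The choice of SHA-256 and the function names `anchor_digest` and `verify_anchor` are assumptions made for this example; the pattern itself only requires a suitable cryptographic hash, and how the digest is written to or read from the blockchain is left out.

```python
import hashlib
import hmac


def anchor_digest(data: bytes) -> str:
    """Return the hex digest that would be stored on-chain as the anchor.

    SHA-256 is an assumption for this sketch.
    """
    return hashlib.sha256(data).hexdigest()


def verify_anchor(data: bytes, on_chain_digest: str) -> bool:
    """Check that off-chain data still matches the digest anchored on-chain."""
    # compare_digest performs a constant-time string comparison.
    return hmac.compare_digest(anchor_digest(data), on_chain_digest)
```

During a migration, the same check could be rerun against the digest recreated on the target blockchain to confirm that the off-chain dataset and its anchor still agree.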