Blockchain is an append-only technology, meaning data cannot be deleted. How, then, does it account for data regulations such as the European Union’s right to be forgotten? And with so much data and so many components, how do we optimize these systems for maximum performance?
Different protocols have different solutions to these challenges. When building applications with Hyperledger Fabric, the subject of this article and of our most recent experiments, we can turn to private data collections and network management strategies. More on these later.
In this article, we explore:
1. What Hyperledger Fabric is and how it can be used in an example business scenario requiring private data collections.
2. How IntellectEU’s Catalyst Blockchain Manager can be used to simplify infrastructure complexity in this context.
3. Which steps developers can take to optimize network performance when using private data collections, based on specific insights from our own experiments.
After reading the article, you will have a set of general rules for handling private data and optimizing performance that you can apply immediately to your own Hyperledger Fabric projects.
Additionally, in this article we will outline how organizations can use Catalyst Blockchain Manager, a best-in-class infrastructure management tool, to accelerate a multitude of processes. This allows organizations to create or join existing Hyperledger Fabric-based business ecosystems more quickly and easily, and results in better scalability for the shared infrastructure overall. You can read more about Catalyst Blockchain Manager in another of our articles.
Now, let’s begin by outlining the business scenario and the technology.
The Business Scenario
In our scenario, we will imagine that a company wants to join an existing business ecosystem. Company A wants to conduct business with Company B, which is already in an ongoing business arrangement with Company C.
Company B has a token service that allows users to purchase goods from markets and other businesses. Companies B and C already have a working partnership and have built a Hyperledger Fabric network infrastructure.
Hyperledger Fabric is among the most widely employed open-source, private blockchain frameworks available. Its outstanding flexibility and reliability enable organizations to design and build all manner of enterprise-grade blockchain-based solutions and market infrastructures.
Company A wants to join the venture but keep all of its data and transactions private, sharing them only between itself and Company B.
The idea is depicted visually below:
To add itself to the shared infrastructure, Company A needs to create its own infrastructure. There are many ways to do this. Usually, a developer or a dedicated engineer would write various scripts to deploy all the necessary parts of a Hyperledger Fabric network, but the easiest method is to use Catalyst Blockchain Manager, so that is what we will do.
Using IntellectEU’s blockchain manager, Company A can easily create and deploy all the necessary infrastructure, doing so safely and without any scripting or programming knowledge.
Before exploring the creation of the network, let’s take a quick look at Hyperledger Fabric and its components.
Hyperledger Fabric Overview
Fabric, for short, is an open-source private blockchain technology. It comprises a network of computers that communicate with one another over communication channels that only the network participants can access.
It consists of three major network components:
The Certificate Authority (CA), which issues the certificates that participants use to prove their identity and authenticate their requests.
The Orderer, which puts transactions into order and packages them into blocks.
And the Peer, which runs the smart contracts (chaincodes), executes the transactions, and maintains the ledger and world state.
The world state holds the current value of all data, while the ledger keeps every block and every transaction that has occurred up to that moment. This data is kept per channel: each channel has its own ledger. A channel is a means of communication between peers from different parties, where the smart contracts run and where policies control who can see or make changes on that network.
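To make the split between the ledger and the world state concrete, here is a toy Go model. It is not Fabric code: the `Ledger`, `Tx`, and `CommitBlock` names are our own hypothetical ones. Blocks are only ever appended, and the world state is simply the result of applying their writes in order.

```go
package main

import "fmt"

// Tx is a simplified, hypothetical transaction that writes Value to Key.
type Tx struct {
	Key   string
	Value string
}

// Ledger is a toy model: an append-only list of blocks plus a world state
// holding only the latest value of each key.
type Ledger struct {
	Blocks     [][]Tx            // history: only ever appended to
	WorldState map[string]string // current values, derived from the blocks
}

func NewLedger() *Ledger {
	return &Ledger{WorldState: make(map[string]string)}
}

// CommitBlock appends a block and applies its writes to the world state.
func (l *Ledger) CommitBlock(txs []Tx) {
	l.Blocks = append(l.Blocks, txs)
	for _, tx := range txs {
		l.WorldState[tx.Key] = tx.Value
	}
}

// History returns every value ever written to key, oldest first.
func (l *Ledger) History(key string) []string {
	var values []string
	for _, block := range l.Blocks {
		for _, tx := range block {
			if tx.Key == key {
				values = append(values, tx.Value)
			}
		}
	}
	return values
}

func main() {
	l := NewLedger()
	l.CommitBlock([]Tx{{"token1", "owner=B"}})
	l.CommitBlock([]Tx{{"token1", "owner=C"}})
	fmt.Println(l.WorldState["token1"]) // owner=C
	fmt.Println(l.History("token1"))    // [owner=B owner=C]
}
```

Even after a key is overwritten, its earlier values remain in the blocks, which is exactly the property that makes the right to be forgotten tricky on a blockchain.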
Creating the Network
Now that we know the network components, let's look at creating Company A’s infrastructure.
First, we create its MSP (Membership Service Provider). The MSP is a component that offers an abstraction of membership operations; in particular, it abstracts away all the cryptographic mechanisms and protocols behind issuing certificates, validating certificates, and authenticating users.
The next step is to create the CAs. For a production environment, we usually advise creating at least two CAs; an intermediate CA can also be created.
Once the CAs exist, we can create the ordering service. At this point we can either join a pre-existing system channel or create a new one. In this case, we will create a new one.
The next logical step is to create the Peer Set and the Peers. Here we can easily add or remove the peers as the needs of the infrastructure change.
At this moment the network infrastructure is almost ready. It has all the necessary network components, but it is missing its communication and business logic. The next step is to create the channel.
The final step is to add the business logic, called chaincode, to the channel.
Private Data Collection
Hyperledger Fabric is a private blockchain technology, but that does not mean data is accessible only to its owner. It is still a distributed technology in which channel members share a single source of truth: the ledger. Data can still be protected, however, and Private Data Collections (PDCs) were developed for that purpose.
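There are two types of PDCs: implicit and explicit. Every organization on a channel automatically has an implicit collection, whose data is stored only on that organization's peers. With explicit collections, we can configure which organizations have access and which policies apply. Explicit PDCs are defined in a collection configuration that is supplied as part of the chaincode definition when the chaincode is deployed.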
A PDC is like a channel inside the main channel, in which we can store any private data that might be needed. A PDC has two parts: the full private data and a hash of that data. The hash is stored on the main channel, where every organization on the channel can verify it, while the private data itself is stored only on the peers that are authorized to access it.
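The hash-plus-private-data split can be sketched as a toy Go model, assuming SHA-256 as the hash function. The `Channel` type and its methods are hypothetical: authorized peers keep the full value, every channel member keeps only the digest, and anyone who later receives the value out of band can check it against that digest. In real Fabric the key is hashed as well; this sketch keeps keys readable for simplicity.

```go
package main

import (
	"bytes"
	"crypto/sha256"
	"fmt"
)

// hashOf returns the SHA-256 digest of a private value.
func hashOf(value []byte) []byte {
	sum := sha256.Sum256(value)
	return sum[:]
}

// Channel is a toy model of private writes: authorized peers keep the full
// value, while every channel member keeps only its hash.
type Channel struct {
	PrivateStore map[string][]byte // only on peers authorized by the collection
	PublicHashes map[string][]byte // on every peer in the channel
}

// PutPrivate stores the value privately and publishes its hash.
func (c *Channel) PutPrivate(key string, value []byte) {
	c.PrivateStore[key] = value
	c.PublicHashes[key] = hashOf(value)
}

// Verify lets any member check a value it received out of band.
func (c *Channel) Verify(key string, value []byte) bool {
	h, ok := c.PublicHashes[key]
	return ok && bytes.Equal(hashOf(value), h)
}

func main() {
	c := &Channel{PrivateStore: map[string][]byte{}, PublicHashes: map[string][]byte{}}
	c.PutPrivate("order42", []byte(`{"price":100}`))
	fmt.Println(c.Verify("order42", []byte(`{"price":100}`))) // true
	fmt.Println(c.Verify("order42", []byte(`{"price":90}`)))  // false
}
```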
Blockchain can be used in many different areas and businesses, but some particularities must be addressed when designing a blockchain-based solution.
Shadow Reads
Hyperledger Fabric is a ledger and should not be used like a normal database (even though it is backed by one). Because it is distributed, some problems need to be addressed. One of these is shadow reads. A shadow read occurs when we read data that is no longer up to date (data that is in the process of being updated or changed).
This can easily occur when using rich queries in Fabric. A rich query is a query against the world state (available only with CouchDB™) that searches on data attributes rather than keys. Fabric does not re-execute rich queries when it validates a transaction, so it cannot detect whether their results changed between endorsement and commit. We should therefore avoid those queries and instead query by key or partial key, then filter the results in the chaincode. This means the key structure must be very well thought out and planned. Here is a quick test (using Hyperledger Caliper®) on an Apple M1 Pro with 16GB of RAM:
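The key-based alternative to a rich query can be sketched in Go as follows. The separator layout is modeled on Fabric's composite keys (built with the shim's `CreateCompositeKey`), but everything here is a self-contained stand-in: the in-memory map plays the world state, and `Token`, `byPartialKey`, and `tokensOfOwnerAbove` are hypothetical names. The query first narrows the data by a key prefix (the owner) and then filters by amount in chaincode.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// sep mimics the null-byte separator used in Fabric composite keys.
const sep = "\x00"

// compositeKey builds a key from an object type and attributes, in the
// spirit of the shim's CreateCompositeKey.
func compositeKey(objectType string, attrs ...string) string {
	key := sep + objectType + sep
	for _, a := range attrs {
		key += a + sep
	}
	return key
}

// Token is a hypothetical asset stored in the world state.
type Token struct {
	ID, Owner string
	Amount    int
}

// byPartialKey returns all tokens whose key starts with prefix, in key
// order, the way a partial composite key query walks a key range.
func byPartialKey(state map[string]Token, prefix string) []Token {
	keys := make([]string, 0, len(state))
	for k := range state {
		if strings.HasPrefix(k, prefix) {
			keys = append(keys, k)
		}
	}
	sort.Strings(keys)
	out := make([]Token, 0, len(keys))
	for _, k := range keys {
		out = append(out, state[k])
	}
	return out
}

// tokensOfOwnerAbove queries by owner (key prefix), then filters by amount
// in chaincode instead of issuing a rich query.
func tokensOfOwnerAbove(state map[string]Token, owner string, min int) []Token {
	var result []Token
	for _, t := range byPartialKey(state, compositeKey("token", owner)) {
		if t.Amount > min {
			result = append(result, t)
		}
	}
	return result
}

func main() {
	state := map[string]Token{}
	for _, t := range []Token{
		{"t1", "CompanyB", 50}, {"t2", "CompanyB", 500}, {"t3", "CompanyC", 900},
	} {
		state[compositeKey("token", t.Owner, t.ID)] = t
	}
	fmt.Println(tokensOfOwnerAbove(state, "CompanyB", 100)) // [{t2 CompanyB 500}]
}
```

The trailing separator in each key matters: it keeps a query for owner `A` from also matching owner `AB`. A real peer answers a partial key query with a range scan over sorted keys rather than by iterating over the whole state.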
Using Indexes
If rich queries are unavoidable, consider adding an index to support them. An index speeds up the query and its response, but it has a downside: it needs to be updated regularly and uses some of the peer's CPU. This gets worse if the index is created on fields that change constantly (for example, a frequently updated amount).
Implications for Performance
Another aspect to take into account is which data is stored in the world state and which data is stored on the ledger. The ledger is always reachable and acts as an archive, while the world state has implications for performance. The world state is a NoSQL database, so it runs maintenance tasks to update its indexes. Additionally, queries against a large database take more time to execute and return their results.
Several patterns can be used to mitigate some of these caveats when using PDCs:
Use a corresponding public key for tracking the public state:
Optionally have a matching public key for tracking the public state and create a public-private relation in each organization
Chaincode access control:
Implement access control in chaincode to specify which clients can query private data in a collection
Sharing private data out of band:
As an off-chain option, share private data out of band with other organizations and they can hash the key/value to verify it matches the on-chain hash
Sharing private data with other collections:
Share the private data on-chain with chaincode that creates a matching key/value in the other organization’s private data collection.
Transferring private data to other collections:
Transfer the private data with chaincode that deletes the private data key in your collection and creates it in another organization’s collection.
Using private data for transaction approval:
If you want to get a counterparty’s approval for a transaction before it is completed (e.g. an on-chain record that they agree to purchase an asset for a certain price), the chaincode can require them to ‘pre-approve’ the transaction.
Keeping transactors private:
Variations of the prior pattern can also eliminate leaking the transactors for a given transaction.
© Copyright Hyperledger 2020-2023. “Private data" https://hyperledger-fabric.readthedocs.io/en/latest/private-data/private-data.html#private-data-sharing-patterns
PII and the Right to be Forgotten
If the project handles personal data of people in the EU, for example, there are GDPR considerations that must be taken into account.
Keep in mind that PII (personally identifiable information) should not be stored on the blockchain, not even in PDCs. This applies even to encrypted PII, so anonymize all necessary information. Because the ledger keeps all data permanently, an external system can hold the PII together with an ID that correlates it to the blockchain data; the PII can then be deleted from that system without touching the ledger.
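A minimal sketch of that off-chain pattern in Go, assuming a hypothetical `OffChainPII` store (in practice, a conventional database outside the Fabric network). Only a random, meaningless ID is written on chain; honoring an erasure request means deleting the off-chain record, after which the on-chain ID no longer resolves to a person.

```go
package main

import (
	"crypto/rand"
	"encoding/hex"
	"fmt"
)

// OffChainPII is a hypothetical external store that keeps personal data
// outside the blockchain.
type OffChainPII struct {
	records map[string]string // pseudonymous ID -> personal data
}

// newID returns a random identifier that reveals nothing about the person.
func newID() (string, error) {
	b := make([]byte, 16)
	if _, err := rand.Read(b); err != nil {
		return "", err
	}
	return hex.EncodeToString(b), nil
}

// Register stores the PII off-chain and returns the ID that may go on chain.
func (s *OffChainPII) Register(pii string) (string, error) {
	id, err := newID()
	if err != nil {
		return "", err
	}
	s.records[id] = pii
	return id, nil
}

// Lookup resolves an on-chain ID to the personal data, if it still exists.
func (s *OffChainPII) Lookup(id string) (string, bool) {
	pii, ok := s.records[id]
	return pii, ok
}

// Forget erases the personal data; the ledger is left untouched.
func (s *OffChainPII) Forget(id string) { delete(s.records, id) }

func main() {
	store := &OffChainPII{records: map[string]string{}}
	id, err := store.Register("Jane Doe, jane@example.com")
	if err != nil {
		panic(err)
	}
	onChainRecord := fmt.Sprintf(`{"customerRef":"%s","tokens":10}`, id) // no PII on chain
	fmt.Println(onChainRecord)
	store.Forget(id)
	_, found := store.Lookup(id)
	fmt.Println(found) // false
}
```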
Another aspect of developing with Fabric is hiding the parameters that are passed into the chaincode. This matters because regular chaincode arguments are stored in the blocks. There are two ways to pass parameters: as normal arguments or through the transient field, whose contents are not recorded in the block. The choice also affects how the chaincode retrieves and deserializes the parameters.
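The sketch below shows the chaincode side of the transient approach in plain Go. In Fabric Go chaincode the map would come from the stub's `GetTransient` method; here it is passed in directly, and the `AssetPrivateDetails` type, the `asset_properties` field name, and the function names are hypothetical.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
)

// AssetPrivateDetails is a hypothetical payload that must not appear in blocks.
type AssetPrivateDetails struct {
	ID    string `json:"id"`
	Price int    `json:"price"`
}

// readTransientAsset decodes one JSON payload from the transient map.
func readTransientAsset(transient map[string][]byte, field string) (AssetPrivateDetails, error) {
	var asset AssetPrivateDetails
	raw, ok := transient[field]
	if !ok {
		return asset, errors.New("missing transient field: " + field)
	}
	if err := json.Unmarshal(raw, &asset); err != nil {
		return asset, fmt.Errorf("decoding %s: %w", field, err)
	}
	return asset, nil
}

func main() {
	// Regular arguments are recorded in the block; transient data is not.
	args := []string{"CreatePrivateAsset"} // only the function name is visible on chain
	transient := map[string][]byte{"asset_properties": []byte(`{"id":"asset1","price":100}`)}
	asset, err := readTransientAsset(transient, "asset_properties")
	if err != nil {
		panic(err)
	}
	fmt.Println(args[0], asset.ID, asset.Price) // CreatePrivateAsset asset1 100
}
```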
Experimenting with the Fabric Network
To keep things interesting, we conducted several practical experiments on network performance that we hope will benefit our readers. Specifically, we investigated how varying the quantity of different Hyperledger Fabric components affects throughput on the network. After tests in different types of environments (with more and fewer peers, higher block timeouts and block sizes, and more and less CPU), these were our findings:
- We found that the write throughput for the default setup with LevelDB* is around three times the maximum throughput with Apache CouchDB™
- We found that doubling the number of clients to distribute the client workload across more workers had no impact (0%) on performance with public transactions and increased performance by 4% with private transactions
- We found that doubling the number of channels decreased performance by 2% with public transactions and increased it by 13% with private transactions
- We found that for the given setup, the optimum is 8 peers per organization with an endorsement policy of 2 out of 8. We could, therefore, improve public transaction performance by up to 32% by adjusting the number of peers. With private transactions, the improvement is smaller but still significant: 8 peers per organization gave a 21% performance increase compared with 2 peers per organization.
- We found that scaling the number of organizations and the endorsement policy up proportionately doesn’t affect the throughput for smaller networks; however, for larger networks throughput degrades considerably.
- We found that, below a request rate of 1,500 tx/s, the ordering service does not appear to be a bottleneck for up to 64 orderers, although it might become one for larger ordering services. A Raft ordering service with up to 64 nodes should be sufficient in practically any scenario, as it can tolerate 31 crashed nodes and still keep the network functioning.
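The 31-crash figure follows from Raft's majority rule: a cluster of n nodes needs a quorum of floor(n/2) + 1 nodes to keep operating, so it can lose n minus that quorum. A quick Go check of the arithmetic (the function names are our own):

```go
package main

import "fmt"

// raftQuorum is the majority of nodes a Raft cluster needs to keep operating.
func raftQuorum(nodes int) int { return nodes/2 + 1 }

// raftCrashTolerance is how many nodes can fail while a majority remains.
func raftCrashTolerance(nodes int) int { return nodes - raftQuorum(nodes) }

func main() {
	for _, n := range []int{3, 5, 63, 64, 65} {
		fmt.Printf("%d nodes: quorum %d, tolerates %d crashes\n",
			n, raftQuorum(n), raftCrashTolerance(n))
	}
	// The 64-node line reads: 64 nodes: quorum 33, tolerates 31 crashes
}
```

Note that 64 nodes tolerate no more crashes than 63, so an odd cluster size gives the same fault tolerance with one node fewer.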
Best Practices for Fabric and Private Data Collections
So, what are we to make of these results? Below are several general rules that emerge from our experiments, which readers can use to optimize the performance of their Hyperledger Fabric networks in the context of PDCs.
1. Check the business needs regarding data privacy
2. Check data access so the PDCs are created accordingly
3. Don’t store all the data on the blockchain
4. Especially don’t store PII data
5. Design data models and keys so that data is easy to search
6. It is better to filter data on the chaincode than to use a rich query
7. In general, don’t use rich queries
8. If you need to use a rich query, create an index for it
9. If there are bottlenecks in the blockchain, check the peers' resources first
10. Manage the endorsement policy according to the network size
11. Keep queries simple so that CouchDB™ is not necessary
Clearly, there are a lot of moving parts here. As such, one key takeaway for readers is that a thorough business analysis is essential. It makes a massive difference when planning a Fabric deployment, because you need to know, for example, whether CouchDB™ should be used, how many channels the network requires, how these should be configured, and so on. Having the right answers ready during the planning stage will offer disproportionate returns as the project moves forward.
Further Reading: ‘Catalyst Blockchain Manager: the Best Efficiency Tool for Hyperledger Fabric’
Continue the conversation
If you enjoyed this article or found it useful, and you want to continue the conversation, we would be happy to schedule a call. We can offer help with the management of your Fabric network, deployment strategies, Catalyst Blockchain Manager licensing, and a number of other professional advisory services.
Schedule a call today to learn more.
*Copyright © 2011 The LevelDB Authors. All rights reserved.