In the “digital age” the amount of data our applications collect is growing not linearly but exponentially. It is estimated that we produce 2.5 quintillion bytes of data every day and, with the emergence of new technologies like IoT, this number is growing at a rapid pace. Today’s diverse application end users and business stakeholders depend on structured and unstructured data to perform specific functions. They need a Data Warehouse! Does blockchain hold incredible potential, especially in its role in warehouse digitalisation? Blockchain might be able to provide solutions that leverage the right file storage, relational database, and data warehouse / lake to give organisations the flexibility, scalability, and analytics needed to support their strategic goals.
In the latest of her Duet interviews, Dr Caldarola, editor of Data Warehouse as well as author of Big Data and Law, and Dr Revolidis discuss the potential of blockchain in a data warehouse.
You have written an article about blockchain in our book Data Warehouse. Blockchain is known in connection with Bitcoin.
Could you explain what a blockchain is in the context of a data warehouse?
Dr Revolidis: Certainly, when most people hear the term “blockchain,” they immediately associate it with Bitcoin and other cryptocurrencies. However, the utility and potential applications of blockchain technology extend beyond the realm of cryptocurrencies.
At its core, a blockchain is a distributed and immutable ledger, where transactions are recorded in a series of blocks that are linked together and secured using cryptographic principles. Each block contains a cryptographic hash of the previous block, forming a continuous chain, which makes retroactive alterations practically impossible without altering subsequent blocks.
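The linking described above can be illustrated with a minimal sketch. It is not a real blockchain (there is no network, consensus, or signing); it only shows how storing the previous block's hash makes a later edit detectable. The function names and the sample records are assumptions made for illustration.

```python
import hashlib
import json


def block_hash(index, data, prev_hash):
    """Hash a block's contents together with the hash of its predecessor."""
    payload = json.dumps(
        {"index": index, "data": data, "prev_hash": prev_hash}, sort_keys=True
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def build_chain(records):
    """Link each record to the previous block through its hash."""
    chain = []
    prev_hash = "0" * 64  # placeholder predecessor for the first block
    for index, data in enumerate(records):
        current = block_hash(index, data, prev_hash)
        chain.append(
            {"index": index, "data": data, "prev_hash": prev_hash, "hash": current}
        )
        prev_hash = current
    return chain


def first_invalid_block(chain):
    """Return the index of the first block that fails verification, or None."""
    prev_hash = "0" * 64
    for block in chain:
        expected = block_hash(block["index"], block["data"], block["prev_hash"])
        if block["prev_hash"] != prev_hash or block["hash"] != expected:
            return block["index"]
        prev_hash = block["hash"]
    return None


chain = build_chain(["load: sales Q1", "load: sales Q2", "load: sales Q3"])
print(first_invalid_block(chain))  # None

# Editing an old record without recomputing later hashes is detected.
chain[1]["data"] = "load: sales Q2 (edited)"
print(first_invalid_block(chain))  # 1
```

To hide the edit, every subsequent block would have to be recomputed as well, which is what a distributed network of participants is meant to prevent.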
One can observe many potential applications in the context of a data warehouse:
- Immutable Data Storage: One of the primary virtues of blockchain is its immutability. Once data is recorded on a blockchain, it cannot be altered without the consensus of the network. This can introduce a level of data integrity and authenticity to data warehousing that traditional databases may struggle to match. For instance, if an organisation’s data warehouse were to employ blockchain principles, historical data would be preserved with a high level of confidence in its accuracy over time.
- Decentralisation and Verification: Unlike traditional data warehouses that centralise data storage, blockchains can operate on a decentralised model. Each participant in a blockchain network has access to the entire database and its complete history. This decentralisation can provide a mechanism for data validation and verification in warehousing contexts, thus ensuring that data has not been tampered with.
- Integration and Interoperability: Blockchain can act as a bridge between different data silos and systems, offering a unified and transparent view of data. This can be particularly beneficial in scenarios where data from different sources needs to be integrated into a single warehouse, ensuring that the data is consistent and reliable across the board.
- Enhanced Security: The cryptographic nature of blockchain offers a higher level of security for data. This can be of immense value for sensitive data stored in a data warehouse, ensuring that the data remains confidential and resistant to unauthorised changes.
- Traceability and Audit Trails: Every transaction on a blockchain is recorded with a timestamp and linked to the previous transaction. This provides a transparent audit trail of data changes, which can be particularly valuable for industries that require rigorous data tracking and compliance.
To summarise, while blockchain is widely recognised for its role in the world of cryptocurrencies, its foundational principles of immutability, transparency, and decentralisation can offer transformative potential for data warehousing, while enhancing data integrity, security, and traceability.
My opinion is:
“The trust blockchain bestows is not merely coded; it’s a delicate dance between software and traditional legal frameworks, each guiding the other.”
Dr Ioannis Revolidis
A data warehouse can contain personal and non-personal data. When a blockchain is deployed in a data warehouse, who does the blockchain serve? The operator for the “better” processing of “his/her” data? Or for the data subject and device owners who make their data available for “tamper-proof” purpose-specific processing in the data warehouse?
Blockchains may provide multifaceted benefits that can serve multiple stakeholders simultaneously. In fact, especially when personal data is being processed, the adoption of a blockchain solution should necessarily serve the interests of both the operator of the data warehouse and those of the data subjects. After all, that’s exactly what the GDPR wants to happen (see Art. 1(1) and (3) of the GDPR).
In that sense, I would say that when a blockchain solution is incorporated within a data warehouse, both the operator and the user/data subject can derive advantages from its deployment, albeit in different ways.
As regards the operator, I would generally assume that the following benefits could be extracted:
- Data Integrity and Trustworthiness: By leveraging blockchain’s immutability, operators can ensure that the stored data has not been tampered with, thus enhancing the reliability of the warehouse.
- Auditability: Every data transaction is transparently recorded, which aids in comprehensive auditing processes. This transparency can be beneficial for sectors that are regulated and require stringent data documentation.
- Operational Efficiency: Blockchain can facilitate real-time data sharing and integration across different systems, potentially improving data warehousing operations.
As regards the user/data subject, I would envisage the following advantages in particular, which align well with certain regulatory goals:
- Data Sovereignty: With some blockchain implementations, individuals can maintain a substantial level of control over their personal data, determining who can access it and for what purpose. This aligns well with principles of modern data protection, where user consent and control are paramount.
- Assurance of Data Integrity: Individuals can be more confident that their data, once entered into a blockchain-backed data warehouse, remains unaltered and genuine.
- Transparency and Traceability: If individuals are granted access, they can view the history of their data transactions, offering transparency into how their data is being used and processed.
I would like to stress, nonetheless, that the aforementioned benefits only hold true if certain design choices are made when the system is being programmed. What I mean is that merely adopting a blockchain-based solution will not automatically guarantee the benefits I mentioned above. Incorporating a blockchain solution into a data warehouse is a decision that should not be taken lightly, since there are several parameters that need careful consideration.
For example, I would never advise an organisation that operates a data warehouse to implement a blockchain solution based on a public blockchain. This would put a huge compliance burden on the shoulders of this organisation and might even jeopardise the application of basic data protection principles as regards the personal information of users/data subjects, especially if the storage of their personal information would be “on-chain”. The risks associated with such a choice might even negate many of the expected benefits.
Another example would be the desired level of data sovereignty on the user/data subject end. It would not be enough to set up a blockchain in order to achieve this kind of sovereignty. Additional software, most probably in the form of smart contracts, must complement a robust identity and data management design, by virtue of which the operator of the data warehouse will make the necessary arrangements that will foster a substantial degree of user empowerment. In addition, we must also take into account that the deployment of a blockchain solution would be a major cultural shift for users. While they can expect the benefits I mentioned above, they must also take on the administration of several key processes related to information and data. Not everyone might wish to do so.
With Bitcoin, the blockchain enables decentralised structures, self-determination, barrier-free access, no identification, security against manipulation, personal responsibility and freedom. Who benefits from these advantages of a data warehouse with blockchain technology?
When integrating blockchain technology within a data warehouse, many of the advantages inherent to blockchain ecosystems can indeed permeate the entire organisation, although the context and stakeholders involved may differ. Let’s break down some possible outcomes.
Let’s start with the fact that blockchains generally promote decentralised structures. In this context, I think that both operators and users of the data warehouse can benefit. For operators, decentralisation can provide an added layer of trust within their organisation. In addition, it might also enhance the robustness of their ecosystem against data failures and central points of attack. For users, it can offer a sense of shared ownership and assurance that no single entity has overarching control, provided nonetheless that code and governance assurances are put in place so that governance of the data sets is sufficiently decentralised.
When I consider the second advantage you mentioned, namely self-determination, I would say that those benefiting would primarily be individual data subjects or entities contributing data to the warehouse. They might have more say in how their data is used, especially if smart contracts or other blockchain-based governance mechanisms are employed.
When we turn to barrier-free access, I think that this is a rather sensitive point. I believe that the public and open nature of Bitcoin aligns well with what this particular implementation strives to achieve, but I’m afraid that within the context of a data warehouse this degree of openness might be undesirable or even risky.
Manipulation security, on the other hand, is a net gain for both operators and data subjects. The immutability of blockchain records ensures that once data is recorded, it is resistant to unauthorised alterations. This builds trust in the data’s integrity for everyone involved.
Finally, you made a valid point by referring to individual responsibility when thinking about blockchain applications. I also alluded to the same in my answer to your previous question. While all stakeholders can benefit from the implementation of blockchain-based solutions, there is indeed a heightened need for education and awareness. Just as losing a private key in Bitcoin means losing access to funds, mismanagement in a blockchain-based data warehouse can lead to irreversible consequences. In blockchains, freedom (in terms of data access, sharing and even monetisation) comes with responsibility.
A data warehouse must be secured within the framework of information security and at the same time legal obligations (data protection, for example) must be adhered to. This is usually achieved by turning to technical measures for efficient and quick handling. Can blockchain technology help with this?
Indeed, blockchain technology, especially the use of smart contracts, can be pivotal in ensuring both information security and compliance with legal obligations.
As regards information security, a number of advantages come to mind. For example, smart contracts could help achieve encryption and confidentiality since they can be designed to incorporate cryptographic encryption methods. This ensures that sensitive data remains confidential and only accessible to parties having the correct decryption keys. By storing encrypted data and using smart contracts to manage access, unauthorised entities can be effectively barred from reading sensitive information. In addition, smart contracts can automate and enforce stringent access controls based on predefined conditions, ensuring that data is only accessible to authorised parties and under the right conditions. For additional security, sensitive data can be tokenised using smart contracts. Instead of storing the actual data, a token representing that data is stored. The real data can be stored off-chain in a secure environment, while the token on the blockchain provides a secure and transparent way to transact with the data warehouse system.
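The tokenisation idea can be sketched in a few lines. This is only an illustration under stated assumptions: `OffChainVault` is a hypothetical in-memory store standing in for a secure off-chain environment, the `ledger` list stands in for the on-chain record, and the access check is a plain Python condition rather than an actual smart contract.

```python
import secrets


class OffChainVault:
    """Hypothetical secure store that keeps real values outside the ledger."""

    def __init__(self):
        self._values = {}
        self._allowed = {}

    def tokenise(self, value, allowed_parties):
        # A random token reveals nothing about the value it stands for.
        token = secrets.token_hex(16)
        self._values[token] = value
        self._allowed[token] = set(allowed_parties)
        return token

    def resolve(self, token, party):
        # Predefined access condition: only listed parties may read the value.
        if party not in self._allowed.get(token, set()):
            raise PermissionError(f"{party} may not read this record")
        return self._values[token]


vault = OffChainVault()
ledger = []  # stand-in for the on-chain record; it holds tokens only

token = vault.tokenise("jane.doe@example.com", allowed_parties={"billing"})
ledger.append({"field": "email", "token": token})

print(vault.resolve(token, "billing"))
try:
    vault.resolve(token, "marketing")
except PermissionError as err:
    print("denied:", err)
```

Because only the token is written to the ledger, the real value can still be corrected or deleted off-chain without touching the immutable record.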
In terms of compliance, smart contracts may guarantee automated and consistent enforcement. Smart contracts are self-executing software where the terms of the agreement or conditions are written into lines of code. Once conditions are met, actions are automatically executed. This ensures that predefined compliance measures are consistently enforced without human intervention, reducing the potential for errors or biases.
Nonetheless, the deployment of such data security and compliance solutions does not come without challenges.
Even though smart contracts are immutable and cannot be tampered with once deployed, they are only as robust as their code. If there are bugs or vulnerabilities in the contract’s code, they could be exploited, leading to unintended consequences. Moreover, once identified, these vulnerabilities cannot be rectified without deploying a new version of the contract, which might not always be straightforward.
The deterministic nature of smart contracts is another challenge, especially in terms of compliance. While the deterministic nature of smart contracts ensures they perform exactly as written, this rigidity can be a drawback. Laws and regulations are nuanced and subject to interpretation, and smart contracts might struggle to capture these nuances. This could lead to oversimplifications or dangerous misinterpretations of legal obligations.
Moreover, for many compliance measures, smart contracts may need to interact with external data sources or systems (often referred to as “oracles”). Dependence on external data introduces potential points of failure or manipulation.
Incorporating smart contracts for data security and compliance in a data warehouse or any data management system requires a robust understanding of the underlying blockchain platform, the nature of the data, and the potential threats. While they offer many advantages for automating and bolstering data security and compliance measures, they must be implemented thoughtfully and in conjunction with other security and compliance best practices to ensure a truly secure and compliant environment.
A data warehouse, especially if it contains personal data, must comply with the GDPR. Personal data is obtained from different sources, the data of different rights holders may be used for different purposes, and the (foreign) data of (foreign) rights holders may be subject to different national data protection laws, so the data warehouse operator has to deal with many variables. Is blockchain a technology that can help data warehouse operators manage and implement precisely these data protection variables?
While some commentators view blockchains as a compliance problem and not a compliance solution, I think that a more balanced approach would be fairer.
Blockchain technology can be implemented to leverage its distinct features, including user pseudonymity, data provenance, access permissions, and adherence to various national data protection regulations.
We discussed earlier, for example, that blockchains operate in a default state of user pseudonymity. This is a well-recognised data protection principle that has even made its way into the text of the GDPR itself. Art. 32(1)(a) of the GDPR explicitly refers to pseudonymisation as a compliance mechanism and, since blockchains offer this by default, it would be unfair not to recognise their potential for data protection compliance.
But there are other blockchain qualities that can support data protection principles.
Blockchain’s inherent transparency and immutability can contribute to the traceability and validation of data origin. Every transaction or data entry on a blockchain is timestamped and becomes part of an unalterable record, which could aid in ensuring that data is sourced and processed correctly. This is a quality that not only supports the compliance efforts of operators, but might even prove a valuable resource for supervising authorities. Moreover, blockchain’s ability to manage and control access to data via cryptographic means ensures data integrity and security. Finally, smart contracts can be deployed to manage and automate consent protocols. This automation might play a crucial role in ensuring data is used in accordance with stipulated guidelines and might mitigate manual oversight errors.
Doesn’t tagging provide better options for managing variables? Can’t this technology handle further obligations such as consent and revocation management, incoming and outgoing controls, documentation in the processing register, deletion and retention obligations, etc. better than blockchains?
That is a very interesting point.
Tagging presents notable advantages especially within the context of data warehouses and data protection compliance. When considering the adaptability of data management systems, tagging often stands out due to its inherent flexibility.
I would, nonetheless, not view the two solutions as competing with one another; rather, they can complement one another. Tagging can be combined with blockchain-based components in order to enhance the level of compliance with data protection requirements.
For example, while tagging can help manage the different variables of compliance, blockchain components can ensure that compliance is automated and hardwired into the system.
Let’s look, for example, at consent and revocation management. The granularity provided by tagging offers a streamlined way to categorise data according to consent criteria. Should a user revoke their consent, the relevant tag can swiftly identify and facilitate appropriate actions concerning the associated data. Execution of the appropriate actions can be automated and guaranteed by smart contracts in an immutable manner.
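As a rough sketch of this combination, the example below tags warehouse rows with the consent they rely on and records each revocation in an append-only list. The row layout, the tag names, and `revoke_consent` are assumptions made for illustration; the `audit_log` list merely stands in for a blockchain-backed record and is not a smart contract.

```python
from datetime import datetime, timezone

# Hypothetical warehouse rows, each tagged with the consent it relies on.
rows = [
    {"id": 1, "subject": "u-17", "consent_tag": "marketing"},
    {"id": 2, "subject": "u-17", "consent_tag": "billing"},
    {"id": 3, "subject": "u-42", "consent_tag": "marketing"},
]

audit_log = []  # stand-in for an append-only, blockchain-backed record


def revoke_consent(rows, subject, consent_tag):
    """Remove the rows covered by the revoked consent and log the action."""
    affected = [
        r["id"]
        for r in rows
        if r["subject"] == subject and r["consent_tag"] == consent_tag
    ]
    remaining = [r for r in rows if r["id"] not in affected]
    audit_log.append(
        {
            "event": "consent_revoked",
            "subject": subject,
            "consent_tag": consent_tag,
            "rows_removed": affected,
            "at": datetime.now(timezone.utc).isoformat(),
        }
    )
    return remaining


rows = revoke_consent(rows, "u-17", "marketing")
print([r["id"] for r in rows])  # [2, 3]
print(audit_log[-1]["rows_removed"])  # [1]
```

The tag does the selection and the log provides the evidence. In a real design, what is written to an immutable log would itself need review, since identifiers recorded there cannot later be erased.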
Data warehouse operators could, therefore, combine all available tools and achieve high levels of efficiency and compliance. Blockchains and/or blockchain-based solutions can always be combined with other software in order to maximise the expected benefits. As with all technological architectures, the crucial question will always be which tool can better serve the goals we want to attain. In that sense, blockchain-based applications can be part of a wider ecosystem of technological solutions, provided they are deployed efficiently and for purposes that align well with their inherent qualities.
Blockchain technology is characterised by the fact that those affected cannot be identified. Nevertheless, after their contribution to blockchain technology, pseudonyms are used, i.e., the data warehouse operator remains within the scope of the GDPR although “no identification” connotes anonymisation. What advantages does the use of blockchain bring to the data warehouse operator? Or does it only bring advantages if the data warehouse has a certain legal form – such as a cooperative?
Thank you very much for this question, it allows me to make an important clarification.
While blockchains inherently mask the authentic identities of participants by substituting names with public keys, this should not be simplistically construed as providing absolute anonymity. Though industry discourse might frame it as such, the legal perspective offers a nuanced understanding. Specifically, when the data inscribed on a blockchain ledger pertains to an identified or identifiable individual, it assumes the status of personal data within the context of the GDPR. It remains a matter of contention as to whom this applies; not all parties accessing a blockchain network possess the technical, legal, or practical means to re-identify the ecosystem’s participants. This contention is exemplified in the recent ruling of the General Court in T‑557/20 SRB vs EDPS (refer to paras 86 – 101). Intrinsically, what blockchains facilitate aligns with the GDPR’s conceptualisation of pseudonymisation. This inherent attribute of blockchains should not be overlooked. As underscored in Art. 32(1)(a) GDPR, pseudonymisation is recognised as a pivotal mechanism for data protection compliance and presents numerous advantages for data warehouse systems.
For example, by rendering data less identifiable, it reduces the risk associated with potential data breaches or unauthorised access. If a breach does occur, the pseudonymised data, in the absence of the additional decryption keys or reference data, is less likely to be exploited for malicious purposes, providing an added layer of security.
Simultaneously, pseudonymisation allows organisations to continue leveraging their data for analytical, research, and operational purposes. While the data is rendered less identifiable, its underlying structure and relevance are preserved, ensuring that the organisation can still derive valuable insights and perform necessary data operations.
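A small sketch can show how pseudonymised data keeps its analytical value. Note that it swaps in a keyed hash (HMAC) as the pseudonymisation technique, whereas blockchains use public keys as pseudonyms; the key, the sample orders, and the function name are assumptions made for illustration.

```python
import hashlib
import hmac
from collections import Counter

# Assumption: the key is stored separately from the warehouse data.
SECRET_KEY = b"demo-key-kept-separately"


def pseudonymise(identifier):
    """Replace an identifier with a keyed hash; equal inputs give equal pseudonyms."""
    digest = hmac.new(SECRET_KEY, identifier.encode("utf-8"), hashlib.sha256)
    return digest.hexdigest()[:16]


orders = [
    ("alice@example.com", 30),
    ("bob@example.com", 12),
    ("alice@example.com", 8),
]

pseudonymised = [(pseudonymise(email), amount) for email, amount in orders]

# Per-customer totals can still be computed without seeing any e-mail address.
totals = Counter()
for pseudonym, amount in pseudonymised:
    totals[pseudonym] += amount

print(sorted(totals.values()))  # [12, 38]
```

Whoever holds the key can still recompute a pseudonym from a known identifier and so link records back to a person, which is why such data remains personal data rather than anonymous data.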
Furthermore, by incorporating pseudonymisation, data warehouse systems can foster trust among stakeholders, clients, and end-users. Assuring individuals that their personal data is treated with the utmost care and diligence strengthens the organisation’s reputation and can lead to enhanced customer loyalty and stakeholder confidence.
In essence, the integration of pseudonymisation into data warehouse systems not only fortifies data security and ensures regulatory alignment but also preserves the functional utility of the data, creating a balance between data protection and operational efficiency. This is an important quality that blockchain-based applications can promote by default.
Nonetheless, I would like to stress that the pseudonymisation of user data can be a double-edged sword in a data warehousing context. While pseudonymity can benefit data subjects concerned about privacy and data protection, certain applications of data warehouses, especially those in regulated industries, may require clear identification protocols. A data warehouse must find a way to balance privacy, data protection and traceability. This will not always be an easy task.
Blockchains are characterised by being non-manipulable, a feature that is intended to offer security to citizens. Does it also bring benefits to the data warehouse operator? If so, what are they?
Blockchain’s defining feature is its immutability, which aims to provide a degree of security to users. From the perspective of a data warehouse operator, this characteristic does present certain operational implications. The immutable nature of blockchain ensures that once data is entered into the system, its integrity is maintained, making subsequent alterations notably challenging. This immutability can aid data warehouse operators in establishing a consistent and traceable record, which can be of significance when verifying compliance with certain legal and regulatory standards.
The transparency inherent in blockchain means that every data transaction is recorded and can be traced. For data warehouse operators, this offers a structured overview of data transactions, which might be valuable in contexts where data validation or verification is essential.
While blockchain’s primary design might focus on securing user data, its features, particularly immutability and transparency, have potential implications for data warehouse operations, above all in areas related to data integrity and compliance verification.
Why is it worthwhile for a data warehouse operator to think about using blockchain technology?
In the realm of data management and warehousing, trust is paramount. The fundamental proposition of blockchain technology is its ability to foster and establish trust. At its core, blockchains serve as trust mechanisms. They provide a transparent and immutable ledger, ensuring that data entries, once made, cannot be easily tampered with or altered without leaving a traceable record. This characteristic is especially valuable in scenarios where the verifiability and authenticity of data are crucial.
Now, the major question a data warehouse operator must grapple with when pondering the adoption of blockchain technology is that of trust. Is there a trust deficit that the organisation is trying to address? And if so, is it an internal or external trust issue?
Internally, there could be situations where departments within an organisation do not fully trust each other due to various reasons, perhaps stemming from operational discrepancies and silos. By implementing a blockchain-based solution, the need for inter-departmental trust can be minimised as the technology itself ensures data accuracy and integrity, making it a single source of truth that everyone can rely on.
Externally, trust challenges can arise when an organisation wishes to convey to its users or stakeholders that it operates with transparency and credibility. For businesses that rely heavily on user data, it’s increasingly important to assure users that their data is managed securely, transparently, and in a manner that grants them better control. Implementing a blockchain solution can act as a gesture of commitment to these principles. The immutable nature of blockchain ensures that data-related operations are transparent, while its decentralised structure can potentially offer users more direct control and security over their data.
Dr Revolidis, thank you for sharing your insights on blockchain technology and its use in a data warehouse.
Thank you, Dr Caldarola, and I look forward to reading your upcoming interviews with recognised experts, delving even deeper into this fascinating topic.