A Framework for the Safe Management of Digital Identities and Data
Identity plays a major role in everyday life. Think about going to an office, getting on a plane, logging in to a website or making an online purchase. Identity is the key that determines the particular transactions in which we can rightfully participate, as well as the information we’re entitled to access. But we generally don’t pay much attention to the management of our identity credentials unless something goes seriously wrong.
For much of history, our identity systems have been based on face-to-face interactions and on physical documents and processes. But the transition to a digital economy requires radically different identity systems. In a world that’s increasingly governed by digital transactions and data, our existing methods for managing security and privacy are proving inadequate. Data breaches, large-scale fraud, and identity theft are becoming more common. In addition, a significant portion of the world’s population lacks the credentials needed to participate in the digital economy.
As explained in A Blueprint for Digital Identity, a 2016 report by the World Economic Forum, identity is essentially a collection of information or attributes associated with a specific individual. These attributes fall into three main categories: inherent attributes, intrinsic to an individual, e.g., age, height, date of birth, fingerprints, eye color, retinal scans; assigned attributes, attached to but not intrinsic to the individual, e.g., e-mail address, telephone number, social security number, driver’s license, passport number; and accumulated attributes, gathered or developed over time, e.g., health records, job history, home addresses, schools attended.
While mostly associated with individuals, identities can also be assigned to legal entities like corporations, partnerships and trusts; to physical entities like cars, buildings, smartphones and IoT devices; and to digital entities like patents, software programs and data sets.
Data attributes are generally siloed within different private and public sector institutions, each using its data for its own purposes. But to reach a higher level of privacy and security, we need to establish trusted data ecosystems, which requires the exchange and sharing of data across a variety of institutions. The more data sources a trusted ecosystem has access to, the higher the probability of detecting fraud and identity theft while reducing false positives. In addition, an ecosystem with a variety of data sources can help foster economic inclusiveness by certifying the identities and creditworthiness of poor people with no banking affiliation.
However, safeguarding the data used to validate identities creates security and privacy issues of its own. Gathering all the needed attributes in one institution or central data location is unsafe, since it makes that location a prime target for data breaches. It’s also infeasible, as few institutions will let their critical data leave their premises.
However, there are innovative ways to move forward. One such approach is the identity and data sharing framework being developed by MIT’s Trust::Data Consortium, which was founded by Media Lab professor Sandy Pentland to create an open platform, tools and services that foster the development of a secure Internet-based network of trusted data. The Trust::Data Consortium is part of the MIT Connection Science initiative, which I’m involved with as a Fellow.
A few weeks ago, Pentland and Connection Science CTO Thomas Hardjono published Digital Identity is Broken. Here’s a Way to Fix It, a guest column in the WSJ CIO Journal. “Most people today suffer from a strange sort of psychosis: we are uncertain of our identity. For although we are (mostly) certain of who we are in our own minds, the identity we use to interact with the government, obtain services, and pay for goods is unreliable.”
The authors argued that identity credentials should be issued by members of our community and certified by the institutions and individuals with whom we regularly interact. “The key point is that the community is the source of trust for assertions regarding its individual members. It should be the community that certifies digital identity credentials.” We should take a page from the process for obtaining a security clearance, where references such as former employers, coworkers, friends, neighbors and landlords are carefully checked. “This does not have to be a physically co-located community, but it does have to be a connected, interacting community with a history of trust between people, and where people care about their reputation within the community.”
The Trust::Data Consortium has been developing such a digital identity and data sharing framework, which was described in a recent paper by Hardjono and Pentland, Open Algorithms for Identity Federation. “The identity problem today is a data-sharing problem,” they write right up front. But merely having access to static or fixed attributes is of limited value to the participants in an identity ecosystem, and it risks compromising the information being shared. What’s needed is the ability to share information in a privacy-preserving manner. Instead of exchanging static attributes, the paper proposes the collective exchange of vetted algorithms among the participants in the trust network ecosystem, a paradigm it calls Open Algorithms (OPAL), whose tagline is bring the code to the data.
OPAL is designed to address a number of key challenges relating to the safe management of identities, starting with the unmanageable proliferation of identity identifiers, such as e-mail addresses, which are created to support access to each separate Internet account. This results in a massive and risky duplication of our personal data, as Internet services needlessly hold the same set of attributes associated with each user, e.g., name, address, phone, etc. In many cases users have little knowledge about the actions taken with their data beyond what they initially consented to. Not surprisingly, we’ve seen diminishing trust in the institutions holding our data, especially given the recent spate of attacks, thefts and misuse of massive amounts of data.
The OPAL paradigm is based on several key principles, including:
- Move the algorithm to the data. Instead of gathering raw data into a central location for processing, the algorithms or queries should be sent to the repositories and be processed there. Only “safe” answers are returned.
- Decentralized data architecture. Raw data must always remain in its permanent repository under the control of the repository owners. Only the “safe” results of applying the algorithm or query against the data are returned.
- Open, vetted algorithms. Algorithms must be openly published, agreed to, and vetted by experts to be “safe” from privacy violations, bias, and other unintended consequences.
- Subject consent. Data repositories must obtain explicit consent from the subjects whose data they hold for the execution of an algorithm against their data; the vetted algorithms should be made available and understandable to subjects.
- Data Federation. In a group-based trust network ecosystem, algorithms must be vetted collectively by all the members of the ecosystem; each member must observe the OPAL principles and legal frameworks.
- Data is always in an encrypted state. Data must be encrypted while stored, transmitted and when algorithms are applied against it.
- Transparency and regulatory compliance. All requests and responses must be logged, to enable the auditing of all interactions, as well as proof of regulatory compliance. Blockchain technologies could be used to provide a shared, immutable log of events.
- Correct pricing models for algorithms and data. A pricing structure needs to be developed by the members of a trust network ecosystem, to encourage data owners to develop new business models based on the OPAL paradigm.
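To make the first few principles concrete, here is a minimal sketch, in Python, of the “bring the code to the data” pattern: a repository that runs only openly vetted algorithms against locally held records, honors subject consent, and returns only the aggregate “safe” answer while logging every request. All names here (`VETTED_ALGORITHMS`, `DataRepository`, the record fields) are illustrative assumptions, not part of any actual OPAL implementation.

```python
# Illustrative sketch of the OPAL principles; not actual OPAL code.
import hashlib
import json

# Openly published, vetted algorithms: each returns only an aggregate
# "safe" answer, never raw records.
VETTED_ALGORITHMS = {
    "avg_age": lambda recs: sum(r["age"] for r in recs) / len(recs),
    "count_over_65": lambda recs: sum(1 for r in recs if r["age"] > 65),
}

class DataRepository:
    """Holds raw data; answers only vetted queries, with consent and logging."""

    def __init__(self, records):
        self._records = records   # raw data never leaves this object
        self.audit_log = []       # append-only log of every request

    def run(self, algorithm_id, requester):
        # Principle: only openly vetted algorithms may execute.
        if algorithm_id not in VETTED_ALGORITHMS:
            raise PermissionError("algorithm has not been vetted")
        # Principle: use only records whose subjects consented to this algorithm.
        consented = [r for r in self._records
                     if algorithm_id in r.get("consents", ())]
        result = VETTED_ALGORITHMS[algorithm_id](consented)
        # Principle: log every request/response for auditing; the hashes
        # could feed a shared, immutable (e.g., blockchain-based) log.
        self.audit_log.append({
            "algorithm": algorithm_id,
            "requester": requester,
            "result_hash": hashlib.sha256(
                json.dumps(result, sort_keys=True).encode()).hexdigest(),
        })
        return result  # the "safe" answer, not the raw data

repo = DataRepository([
    {"age": 70, "consents": ["avg_age", "count_over_65"]},
    {"age": 40, "consents": ["avg_age"]},
    {"age": 80, "consents": ["count_over_65"]},
])
print(repo.run("count_over_65", requester="bank-a"))  # 2
```

Note that the requester never sees which records exist or what they contain; encryption at rest and in transit, which the principles also require, is omitted here for brevity.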
The term algorithm is left intentionally undefined, giving each OPAL deployment the flexibility to define the semantics and syntax of their algorithms. “In the case of a community of data providers organized under a trust network, they must collectively agree on the semantics and syntax in the operational sense. Such a definition should be a core part of the legal trust framework underlying the federated community.”
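One way a federation might pin down the agreed semantics and syntax of an algorithm is to publish a manifest that every member repository can inspect before agreeing to execute it. The sketch below is purely hypothetical; every field name is an assumption for illustration, not part of any OPAL specification.

```python
# Hypothetical algorithm manifest for a trust network; field names are
# illustrative assumptions, not an OPAL standard.
import json

algorithm_manifest = {
    "id": "avg_age-v1",
    "description": "Mean age across consenting subjects",
    "syntax": "single aggregate query, no joins, no row-level output",
    "output": {"type": "float", "minimum_cohort_size": 10},
    "vetted_by": ["privacy-board", "legal-counsel"],
}

# Serializing the manifest gives members a canonical artifact to vet,
# sign, and reference in the federation's legal trust framework.
manifest_json = json.dumps(algorithm_manifest, indent=2)
print(manifest_json)
```

Agreeing on such a definition operationally, and anchoring it in the legal trust framework, is what turns a loose collection of repositories into a federated community.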
“Instead of sharing fixed-attributes regarding a user or subject, the OPAL paradigm offers a way… to share vetted algorithms,” the paper concludes. “This in turn provides better insight into the user’s behavior, with their consent. It also allows for the development of a trust network ecosystem consisting of these entities, providing new revenue sources, governed by relevant legal agreements and contracts that form the basis for a information sharing legal trust framework.”
“Finally, a new set of legal rules and system-specific rules must be devised that must clearly articulate the required combination of technical standards and systems, business processes and procedures, and legal rules that, taken together, establish a trustworthy system for information sharing in a federation based on the OPAL model.”