The GDPR came into force in May 2018, intended to give individuals more control over their personal data. But one unintended consequence of the new regulation, already visible across a range of industries, is a drag on innovation. Would-be innovators must walk a tightrope between data compliance and building new products on the data they have access to, and in many cases they could do with a helping hand. Could an accredited independent body responsible for overseeing the creation and responsible use of open data sets be the solution?
The rate at which startups and big tech are developing artificial intelligence and machine learning technologies is seemingly unstoppable, yet without access to high-quality data sets, the potential value of AI to society will not be realized. And even using publicly available data is not without risk.
IBM, for example, was recently found to be using Flickr photos without people's permission to train its facial recognition software. The practice isn't illegal, and IBM said it was committed to protecting the "privacy of individuals," but Flickr users who wanted their images removed from the training data reported that this was very difficult to achieve.
There are many other examples of tech companies failing either to adhere to or to understand the current rules around data privacy, or to appreciate users' sensitivities. It's not surprising, then, that the House of Lords recently hit out at big tech companies over how they use people's personal data.
Universally available data sets
Large organizations that collect reams of customer data—retail banks for example—are often reluctant to exploit this information to develop new products and services. Their data sets are gathering dust because the c-suite is too scared to let developers anywhere near them, such is the fear of breaking the rules and being served a hefty fine (up to 4% of global turnover) by the Information Commissioner's Office (ICO), never mind the public fallout that would inevitably follow.
Startups, on the other hand, face a completely different challenge. They usually don't have in-house data and instead must rely on third-party sources to develop and test their new algorithms. Many open source data sets have successfully pushed the state of the art in key frontiers of machine learning, such as image recognition. Yet more commercially relevant data is not easy to come by, especially the high-quality data sets needed for successful innovation.
Pooling commercial, demographic or health data sets from multiple sources, using state-of-the-art privacy methods to anonymize them, and then making them available to all would enable startups and corporate innovation arms to develop amazing new products far more efficiently than they can today—allowing them to safely experiment, train and validate new systems much faster than is currently possible.
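To make the idea of privacy-preserving release concrete, here is a minimal sketch of one such method, differential privacy, applied to a counting query. Everything here (the `dp_count` function, the sample ages, the epsilon value) is invented for illustration, not a reference to any real deployed system: instead of publishing raw records, the data holder publishes aggregate statistics with calibrated Laplace noise added.

```python
import math
import random

def dp_count(true_count, epsilon):
    """Release a count with Laplace noise calibrated to epsilon.

    A counting query changes by at most 1 when one person's record is
    added or removed, so the Laplace noise scale is 1 / epsilon.
    """
    scale = 1.0 / epsilon
    # Sample Laplace(0, scale) via the inverse CDF of a uniform draw.
    u = random.random() - 0.5
    noise = -scale * math.copysign(1.0, u) * math.log(1.0 - 2.0 * abs(u))
    return true_count + noise

# Hypothetical pooled data set: customer ages gathered from several sources.
ages = [23, 35, 41, 29, 52, 47, 61, 38, 33, 45]

# Publish a noisy count of customers over 40 instead of the raw records.
noisy = dp_count(sum(1 for a in ages if a > 40), epsilon=0.5)
print(round(noisy, 1))
```

A smaller epsilon means more noise and stronger privacy; the trade-off between statistical utility and individual privacy is exactly the kind of judgment an accredited body could standardize.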
Leveling the playing field
It would also close the chasm between established tech companies and the new kids on the block. Ten years ago, Google and Amazon, for example, could freely collect large data sets without risking the reputational damage now associated with user privacy. They are now so far ahead in training their algorithms on their huge historic data sets that it is almost impossible for startups to catch up, even if they've invented a far better underlying method. The incumbents have used their head start in data ownership to pull the proverbial drawbridge up behind them.
Universally available data sets would level the playing field, allowing startups to grow their businesses on the quality of their technical and business innovations rather than on their access to data, and potentially to compete with the current tech giants.
Accredited independent body
The problem that tech companies have come up against, time and time again, is where to draw the line between user privacy and the need to innovate.
Recent history has shown that individual, for-profit organizations aren't best placed to identify the boundaries of what is and isn't acceptable. It's time for an independent accredited body to step up and oversee the creation and responsible use of open data sets, with the key objectives of supporting tech innovation and the public good.
The ICO, the U.K.’s independent body set up to uphold information rights, is currently working on the development of an auditing framework for AI. When available, it will give the ICO a methodology to audit AI applications to make sure they are transparent and fair. It will also inform future guidance for organizations to support the continuous and innovative use of AI within the law.
This is great news, of course, but for the tech community such a framework doesn't go far enough. We need the ICO, or a new body, to work with the tech community to oversee the creation and responsible use of commercially relevant, open data sets, which in turn will support real tech innovation.
Data for good
AI- and ML-led innovations can bring benefits to the economy and wider society, sometimes well beyond the commercial gains to the original innovators. Better analysis of consumer data could, for example, lead banks to offer better credit terms to existing customers, and to extend credit to a wider demographic of customers who don't yet have a credit history. Car manufacturers with access to representative data sets could remove gender bias from their product designs, with a positive impact on consumer safety. Thousands of other examples like these could be enabled by curated, open data sets.
This approach could benefit the public sector as well, most obviously around health and environmental policy. For example, if city planning data were coupled with data from medical devices, such as inhalers, to identify air pollution hotspots, the relevant authorities could prioritize anti-pollution measures for the roads where people most suffer the medical consequences.
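The cross-referencing described above can be illustrated with a toy sketch. All of the data, road names and the weighting heuristic below are invented for illustration: anonymized inhaler-usage events are counted per road segment and combined with traffic volumes from a hypothetical city-planning data set to rank candidate hotspots.

```python
from collections import Counter

# Hypothetical anonymized inhaler-usage events, each tagged with the
# road segment where the event was recorded.
inhaler_events = [
    {"road": "A40"}, {"road": "A40"}, {"road": "B123"},
    {"road": "A40"}, {"road": "B123"}, {"road": "C9"},
]

# Hypothetical city-planning data: daily traffic volume per road segment.
traffic = {"A40": 54000, "B123": 12000, "C9": 3000}

# Count events per road, then rank roads by events weighted by traffic —
# a crude proxy for where anti-pollution measures would help most.
events_per_road = Counter(event["road"] for event in inhaler_events)
hotspots = sorted(
    traffic,
    key=lambda road: events_per_road[road] * traffic[road],
    reverse=True,
)
print(hotspots)  # ['A40', 'B123', 'C9']
```

A real deployment would of course need properly anonymized inputs and a validated exposure model, but even this crude join shows how two open data sets become more valuable together than apart.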
Improvements in the way companies use data can have a big societal impact. As such, it is not unreasonable to ask consumers to share some of their anonymized data to enable these innovations. Some level of risk sharing between companies and consumers encourages competition, innovation, and public research for the greater good.
Current laws keep data locked away behind corporate firewalls and prevent data sharing. New regulation that supports rather than hinders innovation is needed. Rather than coming down hard on companies, governments across Europe should work together to build an accredited body or regulator with the power to sign off on companies' data policies and to help make anonymized, fair and balanced open data sets publicly available. Only then can society benefit from AI to the full extent of its promise.