This article homes in on how to apply an ethics framework to the dynamic and ever-evolving world of data. We will focus on the comment box in the top left-hand corner of the diagram below, while also considering the lifecycle of data.
Why do we need a new / defined approach to data ethics?
The globalisation of companies, and hence of data, is driving increased risk in the decisions that are made with this data. For example, Facebook holds personally identifiable information on more than 2.7 billion monthly active users (Facebook - Facebook Reports Fourth Quarter and Full Year 2020 Results (fb.com)) and PayPal holds the financial data of 361 million regular active users (PayPal Holdings, Inc. - Home (pypl.com)). The risk in how this data is used no longer affects only a small group - its impact is global.
There has been debate about the suitability, applicability and limitations of the three major approaches in normative ethics in the context of data. The main theories that enter the debate are virtue ethics (the morality of the person), deontology / Kantian ethics (whether actions are morally good based on rules) and consequentialism / utilitarian theory (whether the outcomes are morally good).
The challenge with all of these theories is that they are one-dimensional or static in nature. They only consider part of the picture, and the picture changes from scenario to scenario (nature of data, nature of decision, autonomy of system etc). These theories also do not lend themselves to being adaptive or prescriptive enough to act as a decision-making framework.
For example, in the US, data drives the inclusions and pricing of health insurance. The decisions that are made with this data can mean the difference between life and death for some people, based on the treatments they can access. How can the same ethics and decision-making framework be applied to this as to using data to drive which search result gets returned when deciding what to cook for dinner? Should there not be an increased level of scrutiny and consideration in assessing the characteristics of the data and the decisions that are likely to be made with it?
Consider: A Multi-Dimensional Approach
Taking a multi-dimensional approach allows us to take a holistic view across the entire data pipeline and bring a much larger set of factors into an ethics-oriented assessment process for data. Adding a level of datafication and quantification to the assessment process also helps with the transparency and applicability of a framework for determining what is considered morally good or bad.
Below is a suggested minimum set of criteria that should be considered. It is designed to be illustrative rather than an exact framework.
- Sensitivity of Data and Control of Access - is the data sensitive in nature or does it pose key risks (e.g. credit card information)? Do the levels of access applied fit the sensitivity of the data?
- Scope of Data - is it a small set of data or a big one? Is the dataset incomplete?
- Disclosure Comfort - are subjects of the data aware it is being collected and used, and are they comfortable with this usage? Do they have the ability to opt in or out?
- Integrity of the Data - Is the supply chain of data known? Was there likely a bias in collection? Is the data set inaccurate in measurement approach?
- AI / Machine Automation - to what extent are machines collecting, processing and making decisions or taking actions on the data? Is there a subject matter expert involved in or overseeing the processing or consumption of data who may be able to highlight discrepancies?
- Decision Morality - Is the decision that is made (to collect, process or as a result of the consumption of data) morally good?
- Consequence Morality - Are the possible outcomes likely to be good or bad?
Let's try an example. Let's rank each of the criteria above on a scale of 1 (Bad) to 5 (Good), and let's say a total score of more than 25 (out of a possible 35) is a morally good decision.
Let's say Google has acquired Fitbit and wants to change the product to collect the location data of all users and provide open access to this data, with the purpose of highlighting common exercise routes and traffic flows of people.
- Sensitivity of Data and Control of Access - Location data is generally considered sensitive, and open access increases who can use this data (for good or bad).
- Scope of Data - A big set of data that is incomplete, as it only considers users with Fitbits.
- Disclosure Comfort - Users are not made aware that data is being collected or how it will be used. There is no way to opt out. (Score = 1)
- Integrity of the Data - Supply chain and measurement approach are reasonably accurate. People may leave their devices at home, so there may be some inaccuracy. (Score = 4)
- AI / Machine Automation - Automation is used in the collection of data and, to an extent, in light processing; however, no action is taken by machines on the data. (Score = 4)
- Decision Morality - The intent is to provide information for others to make decisions. (Score = 3)
- Consequence Morality - Making this data open access can have the positive intended consequences but can also have unintended negative consequences, such as kidnappers looking at low-traffic exercise routes or insurance companies imposing different amounts based on suburb-level activity. (Score = 2)
Total Score = 18, below the threshold of 25. Maybe don’t do this, Google.
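The arithmetic behind the worked example can be sketched in a few lines of Python. This is only an illustration: the names `stated_scores`, `THRESHOLD` and `is_morally_good` are hypothetical and not part of any established framework. The example gives explicit scores for five of the seven criteria and a total of 18, so the two scores that are not stated (sensitivity and scope) are handled here as a combined remainder rather than guessed individually.

```python
# Scores explicitly stated in the worked example (1 = bad, 5 = good).
stated_scores = {
    "disclosure_comfort": 1,
    "integrity_of_data": 4,
    "ai_machine_automation": 4,
    "decision_morality": 3,
    "consequence_morality": 2,
}

TOTAL_FROM_EXAMPLE = 18  # total quoted in the worked example
THRESHOLD = 25           # a total above this counts as a morally good decision
NUM_CRITERIA = 7         # seven criteria, each scored from 1 to 5
MAX_TOTAL = NUM_CRITERIA * 5


def is_morally_good(total, threshold=THRESHOLD):
    """Return True only when the total is strictly above the threshold."""
    return total > threshold


stated_sum = sum(stated_scores.values())
# Combined score of the two criteria whose individual scores are not stated
# (sensitivity and scope); the split between them is not known.
unstated_sum = TOTAL_FROM_EXAMPLE - stated_sum

print(f"Stated criteria sum: {stated_sum}")
print(f"Remaining two criteria combined: {unstated_sum}")
print(f"Total: {TOTAL_FROM_EXAMPLE} of {MAX_TOTAL}")
print(f"Morally good? {is_morally_good(TOTAL_FROM_EXAMPLE)}")
```

Keeping the threshold as a named constant makes it easy to see how sensitive the verdict is to where the line is drawn, which is itself a judgement call.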
Consider: An Adaptive Implementation
Not all data, industries and use cases were created equal, and not all industries respond in the same way. It is suggested that each of the criteria described in the section above is weighted according to the use case or industry. There are also a number of considerations around the implementation of the approach to ensure it is adopted into practice and is sustainable. These can help bridge the gap between a well-defined theory and what happens in practice.
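One possible reading of "weighted" is a weighted average of the criteria scores. The sketch below assumes that reading. The weights, the use-case names (`health_insurance`, `recipe_search`), the placeholder scores and the function `weighted_score` are all hypothetical and chosen purely to show the mechanics; they are not recommended values.

```python
CRITERIA = [
    "sensitivity", "scope", "disclosure", "integrity",
    "automation", "decision_morality", "consequence_morality",
]

# Hypothetical weights per use case: a higher weight means the criterion
# matters more for that use case. These numbers are illustrative only.
WEIGHTS = {
    "health_insurance": {
        "sensitivity": 3, "scope": 1, "disclosure": 2, "integrity": 2,
        "automation": 2, "decision_morality": 3, "consequence_morality": 3,
    },
    "recipe_search": {name: 1 for name in CRITERIA},
}


def weighted_score(scores, weights):
    """Weighted average of the criteria scores, kept on the 1-5 scale."""
    missing = [c for c in CRITERIA if c not in scores or c not in weights]
    if missing:
        raise ValueError(f"missing criteria: {missing}")
    total_weight = sum(weights[c] for c in CRITERIA)
    return sum(scores[c] * weights[c] for c in CRITERIA) / total_weight


# Placeholder scores (1 = bad, 5 = good) for an imaginary assessment.
scores = {
    "sensitivity": 2, "scope": 3, "disclosure": 4, "integrity": 4,
    "automation": 4, "decision_morality": 4, "consequence_morality": 2,
}

for use_case, weights in WEIGHTS.items():
    print(use_case, round(weighted_score(scores, weights), 2))
```

With these made-up numbers the same scores produce a lower result for the use case that weights sensitivity and consequences more heavily. Because the result stays on the 1 to 5 scale, a threshold of 25 out of 35 would correspond to an average of roughly 3.57.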
Not all industries respond and change in response to the same pressures and regulations. For example, a mining company is less likely to respond to pressure from the public than a consumer-facing eCommerce brand. Health is familiar with the concept of a Hippocratic Oath, Finance is familiar with regulators and industry standards such as APRA and PCI DSS, and consumer-facing companies are familiar with dealing with pressure from their users. Considering how industries have historically been pressured / regulated will aid the transparency, uptake and accountability of implementing an ethics-oriented assessment process for data.
- Pressure from government - Governments can’t always regulate multi-national organisations effectively. For example, Google completed its acquisition of Fitbit before the ACCC had completed its investigation of the transaction (ACCC considers legal action after Google completes $2.7 billion Fitbit deal (smh.com.au)).
- Pressure from customers / clients / partners - Customers can advocate for and push suppliers towards meeting certain standards. For example, in other domains, meeting quality (e.g. ISO 9001), information security (ISO 27001) or sustainability standards is enforced through procurement or contractual conditions with suppliers. Could a similar approach be adopted for data?
- Pressure from society - The ultimate threat to consumer-facing companies is that they lose the public trust and users boycott their service or offering. Society can only apply this pressure well if companies are transparent about the data they have and how it is used.
- Continuous - what we know today will change tomorrow. New ways to collect, process and consume data will emerge that haven’t been conceived of today. While any model should be able to flex to accommodate some unknowns, the approach should be constantly reviewed and revised to ensure its continued relevance and coverage. We also have the pressure of time. In the absence of extensive government regulation, are we perhaps better off taking a more targeted approach to high-risk types of data and use cases first, and addressing less critical scenarios as part of a continuous rollout?
Traditional ethics theories were not designed to handle the characteristics of data in the contemporary world, let alone what the future may hold. The landscape of data is dynamic. We are seeing an ever-increasing rate of change in the use cases and ways in which data is collected, processed and consumed.
What could be a more appropriate model than the datafication of the process through which ethics is applied to data? Taking a multi-dimensional assessment process and an adaptive implementation approach to data ethics gets us to consider a more holistic picture.