Requirements that online platforms share data with businesses that use their services are increasingly being implemented or proposed as procompetitive measures. The Digital Markets Act and the proposed American Innovation and Choice Online Act are two examples which mandate certain data sharing between platforms and business users. While open data sharing between businesses can provide attractive opportunities, it also presents risks to user privacy.
The more places user data exists, the more places it must be effectively governed to guard against misuse. User privacy should not be an afterthought to data sharing practices, but should be built into the design of data sharing from the ground up. Drawing on the Privacy by Design philosophy, I highlight a few important privacy-centering principles for policymakers and engineers to keep in mind when considering data sharing between platforms and business users:
Minimize Disclosure
The interactions between platforms, business users, and consumers can generate vast quantities and kinds of data. Consider a hypothetical small business, ToolCenter, which sells toolsets on Amazon. When Arya, a consumer, buys a toolset from ToolCenter, she generates data about the immediate transaction: that a toolset was purchased at a particular date and time for a particular price, her payment information, and the shipping address. There is also data related to Arya’s Amazon account that may be relevant to the transaction: her Amazon member status, whether she chooses to group orders and have them delivered on “Prime Day”, and whether she can pay with credit card points.
There may also be inferred or aggregate data generated in part by Arya’s transaction. For example, Amazon may analyze Arya’s transaction history and infer from her purchases in addition to the toolset (perhaps a new coat rack, wall hangings, and kitchenware) that she is setting up a new living space. Amazon may also use Arya’s transaction, in aggregate with toolset purchases across all of Amazon, to determine which demographics are buying the most toolsets.
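To make these categories concrete, here is a minimal sketch in Python of the kinds of records a single purchase can generate. The class and field names are entirely hypothetical and simplified for illustration; they do not describe any platform’s actual data model.

    from dataclasses import dataclass
    from datetime import datetime
    from typing import Optional

    # Data generated directly by the purchase itself.
    @dataclass
    class DirectTransactionData:
        item: str                  # e.g. "toolset"
        price_cents: int
        timestamp: datetime
        shipping_address: str
        payment_token: str         # stands in for payment details

    # Account-level data that may be relevant to the transaction.
    @dataclass
    class AccountContext:
        member_status: str             # e.g. "Prime"
        groups_prime_day_orders: bool
        can_pay_with_points: bool

    # Data the platform derives by analyzing this purchase alongside others.
    @dataclass
    class DerivedData:
        inferred_life_event: Optional[str]    # e.g. "setting up a new living space"
        demographic_segment: Optional[str]    # aggregate-level bucketing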
The scope of data generated by business and user interactions is ambiguous, and when proposing data sharing requirements it is not always clear where to draw the line. While it may be intuitive that ToolCenter should have access to the direct transaction information needed to fulfill the order, the question becomes more complicated, and presents greater user privacy risk, when it comes to inferred or aggregate data. For example, if the ToolCenter transaction was used to infer that Arya is moving to a new home, should ToolCenter be given that inference, and with it implicit information about the rest of Arya’s transaction history?
In the face of this ambiguity, data sharing mechanisms should not default to sharing all data that could be relevant. Rather, platforms ought to share the minimum amount of user data necessary to provide the intended benefit. Instead of drawing the line by including all relevant data and then debating exclusions, privacy should be the default: specify a narrow, structured set of data to share and expand that scope only when there is a clear benefit to the user.
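In practice, this default can be as simple as an explicit allowlist of fields. The sketch below is illustrative only; the field names and function are hypothetical, not any platform’s actual interface. Everything not on the list, including account context and inferences, stays with the platform unless a clear user benefit justifies expanding the allowlist.

    # Only the fields the seller needs to fulfill the order are shared by default.
    FIELDS_SHARED_WITH_SELLER = {"item", "price_cents", "timestamp", "shipping_address"}

    def minimal_share(transaction: dict) -> dict:
        """Return only the explicitly allowlisted fields of a transaction record."""
        return {k: v for k, v in transaction.items() if k in FIELDS_SHARED_WITH_SELLER}

    record = {
        "item": "toolset",
        "price_cents": 4999,
        "timestamp": "2024-05-01T14:32:00Z",
        "shipping_address": "221B Example St",
        "payment_token": "tok_redacted",
        "inferred_life_event": "setting up a new living space",
    }
    minimal_share(record)
    # -> {"item": "toolset", "price_cents": 4999,
    #     "timestamp": "2024-05-01T14:32:00Z", "shipping_address": "221B Example St"}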
Limit Sharing to Trusted Actors
The barrier to becoming a business user on some platforms is relatively low. For example, it takes just minutes to set up a Facebook page. This increases the risk that some ostensible “business users” have malicious intent or, at the least, sloppy data governance practices. A page claiming to be a publishing business may actually be an ad farm linking people to clickbait headlines where they are bombarded with advertisements. Other pages might post content solely to sell or collect data on who engages with it. Sharing additional data with such actors runs the risk of misuse. If a page is deceptive about its intentions, sharing additional data with it is inherently a violation of user privacy.
Not all business users should be given increased data access. Platforms should retain the ability to deny access to business users demonstrating concerning behaviors. Rather than relying on access revocation after a harm has already occurred, data requestors should be required to attest that they implement responsible data governance practices and be established as trusted actors before gaining data access.
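One way to operationalize this is to gate data access behind an explicit vetting step rather than a post-hoc revocation path. The sketch below is a simplified illustration under assumed criteria; the class, field, and function names are hypothetical and do not reflect any platform’s actual trust model.

    from dataclasses import dataclass

    @dataclass
    class BusinessUser:
        name: str
        attested_data_governance: bool   # signed attestation of governance practices
        flagged_for_abuse: bool          # e.g. ad-farm or data-scraping behavior

    def may_receive_shared_data(requestor: BusinessUser) -> bool:
        """Grant expanded data access only to vetted, attested business users."""
        return requestor.attested_data_governance and not requestor.flagged_for_abuse

    may_receive_shared_data(BusinessUser("ToolCenter", True, False))   # -> True
    may_receive_shared_data(BusinessUser("AdFarmPage", False, True))   # -> False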
Do Not Rely on User Consent
Requiring consent before data is transferred between businesses is not a panacea for ensuring responsible data use. With large amounts of data being collected and transferred between businesses with differing purposes and privacy policies, it is difficult to communicate to users exactly what they are consenting to. While European law requires consent dialogues for cookie use, for example, research shows that the majority of analyzed sites violate consent best practices in some way, including incorrect identification of the purpose of data collection and ambiguous labeling.
The widespread inefficacy of consent dialogues suggests that enforcing them is infeasible at the scale of the internet and counterproductive to procompetitive goals. Given the difficulty of obtaining meaningful consent, it is important to shift the burden off the consumer and instead ensure that companies are obligated to employ privacy-preserving principles when collecting and using data.
Protecting users requires designing systems that proactively guard against privacy harms before they occur. Legislative and technical efforts that start design with privacy in mind, instead of ending with it, have the best chance of promoting data sharing practices that do right by consumers.