Privacy Principles

Abstract

Privacy is an essential part of the Web ([ETHICAL-WEB]). This document provides definitions for privacy and related concepts that are applicable worldwide. It also provides a set of privacy principles that should guide the development of the Web as a trustworthy platform. People using the Web would benefit from a stronger relationship between technology and policy, and this document is written to work with both.

Status of This Document

This is a preview

Do not attempt to implement this version of the specification. Do not reference this version as authoritative in any way. Instead, see https://w3ctag.github.io/privacy-principles/ for the Editor's draft.

This document is a Draft Finding of the Technical Architecture Group (TAG). It was prepared by the Web Privacy Principles Task Force, which was convened by the TAG. Publication as a Draft Finding does not imply endorsement by the TAG or by the W3C Membership.

This draft does not yet reflect the consensus of the TAG or the task force and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to cite this document as anything other than a work in progress.

It will continue to evolve and the task force will issue updates as often as needed. At the conclusion of the task force, the TAG intends to adopt this document as a Finding.

This document elaborates on the privacy principle in the W3C TAG Ethical Web Principles. It doesn't address how to balance the different principles in that document if they come into conflict.

Privacy is covered by legal frameworks and this document recognises that existing data protection laws take precedence for legal matters. However, because the Web is global, we benefit from having shared concepts to guide its evolution as a system built for the people using it ([RFC8890]). A clear and well-defined view of privacy on the Web, informed by research, can hopefully help all the Web's participants in different legal regimes. Our shared understanding is that the law is a floor, not a ceiling.

The Web is for everyone ([For-Everyone]). It is "a platform that helps people and provides a net positive social benefit" ([ETHICAL-WEB], [design-principles]). One of the ways in which the Web serves people is by protecting them in the face of asymmetries of power, and this includes establishing and enforcing rules to govern the power of data.

The Web is a social and technical system made up of information flows. Because this document is specifically about privacy as it applies to the Web, it focuses on privacy with respect to information flows. Our goal is not to cover all privacy issues, but rather to provide enough background to support the Web community in making informed decisions about privacy and to weave privacy into the architecture of the Web. Few architectural principles are absolute, and privacy is no exception: privacy can come into tension with other desirable properties of an ethical architecture, and when that happens the Web community will have to work together to strike the right balance.

Information is power. It can be used to predict and to influence people, as well as to design online spaces that control people's behaviour. The collection and processing of information in greater volume, with greater precision and reliability, with increasing interoperability across a growing variety of data types, and at intensifying speed is leading to an unprecedented concentration of power that threatens private and public liberties.

What's more, automation and the increasing computerisation of all aspects of our lives both increase the power of information and decrease the cost of a number of intrusive behaviours that would be more easily kept in check if the perpetrator had to be in the same room as the victim.

These asymmetries of information and of automation create commanding asymmetries of power.

Data governance is the system of principles that regulate information flows. When people are involved in information flows, data governance determines how these principles constrain and distribute the power of information between different actors. The principles describe the way in which different actors may, must, or must not produce or process flows of information from, to, or about other actors ([GKC-Privacy], [IAD]).

Typically, actors are people but they can also be collective endeavours such as companies, associations, or political parties (eg. in the case of espionage), which are known as parties. It is important to keep in mind that not all people are equal in how they can resist the imposition of unfair principles: some people are more vulnerable and therefore in greater need of protection. The focus of this document is on the impact that this power differential can have against people, but it can also impact other actors, such as companies or governments.

Principles vary from context to context ([Understanding-Privacy], [Contextual-Integrity]): people have different expectations of privacy at work, at a café, or at home for instance. Understanding and evaluating a privacy situation is best done by clearly identifying:

Its actors, which include the subject of the information as well as the sender and the recipient of the information flow. (Note that recipients might not always want to be recipients.)
The specific type of data in the information flow.
The principles that are in use in this specific context.

It is important to keep in mind that there are always privacy principles and that all of them imply different power dynamics. Some principles may be more permissive, but that does not render them neutral — it merely indicates that they are supportive of the power dynamic that emerges from permissive processing. We must therefore determine which principles best align with ethical Web values in Web contexts ([ETHICAL-WEB], [Why-Privacy]).

Information flows as understood in this document are information exchanged or processed by actors. The information itself need not necessarily be personal data. Disruptive or interruptive information flowing to a person is in scope, as is de-identified data that can be used to manipulate people or that was extracted by observing people's behaviour on someone else's website.

Information flows need to be understood from more than one perspective: there is the flow of information about a person (the subject) being processed or transmitted to any other party, and there is the flow of information towards a person (the recipient). Recipients can have their privacy violated in multiple ways such as unexpected shocking images, loud noises while one intends to sleep, manipulative information, interruptive messages when a person's focus is on something else, or harassment when they seek social interactions.

A person's autonomy is their ability to make decisions of their own volition, without undue influence from other parties. People have limited intellectual resources and time with which to weigh decisions, and by necessity rely on shortcuts when making decisions. This makes their preferences, including privacy preferences, malleable and susceptible to manipulation ([Privacy-Behavior], [Digital-Market-Manipulation]). A person's autonomy is enhanced by a system or device when that system offers a shortcut that aligns more with what that person would have decided given arbitrary amounts of time and relatively unlimited intellectual ability; and autonomy is decreased when a similar shortcut goes against decisions made under such ideal conditions.

Affordances and interactions that decrease autonomy are known as dark patterns. A dark pattern does not have to be intentional ([Dark-Patterns], [Dark-Pattern-Dark]).

Because we are all subject to motivated reasoning, the design of defaults and affordances that may impact autonomy should be the subject of independent scrutiny.

Given the large volume of potential data-related decisions in today's data economy, complete informational self-determination is impossible. This fact, however, should not be confused with the idea that privacy is dead. Careful design of our technological infrastructure can ensure that people's autonomy with respect to their own data is enhanced through appropriate defaults and choice architectures.

Privacy labour is the practice of having a person carry out the work of ensuring data processing of which they are the subject or recipient is appropriate, instead of having the parties be responsible for that work. Data systems that are based on asking people for their consent tend to increase privacy labour.

More generally, implementations of privacy are often dominated by self-governing approaches that offload labour to people. This is notably true of the regimes descended from the Fair Information Practices (FIPs), a loose set of principles initially elaborated in the 1970s in support of individual autonomy in the face of growing concerns with databases. The FIPs generally assume that there is sufficiently little data processing taking place that any person will be able to carry out sufficient diligence to enable autonomy in their decision-making. Since they entirely offload the privacy labour to people and assume perfect, unlimited autonomy, the FIPs do not forbid specific types of data processing but only place them under different procedural requirements. Such an approach is appropriate for parties that are processing data in the 1970s.

One notable issue with procedural, self-governing approaches to privacy is that they tend to have the same requirements in situations where people find themselves in a significant asymmetry of power with a party — for instance a person using an essential service provided by a monopolistic platform — and those where people and parties are very much on equal footing, or even where the person may have greater power, as is the case with small businesses operating in a competitive environment. It further does not consider cases in which one party may coerce other parties into facilitating its inappropriate practices, as is often the case with dominant players in advertising or in content aggregation ([Consent-Lackeys], [CAT]).

Reference to the FIPs survives to this day. They are often referenced as "transparency and choice", which, in today's digital environment, is often an indication that inappropriate processing is being described.

Privacy principles are socially negotiated and the definition of privacy is essentially contested ([Privacy-Contested]). This makes privacy a problem of collective action ([GKC-Privacy]). Group-level data processing may impact populations or individuals, including in ways that people could not control even under the optimistic assumptions of consent. For example, based on group-level analysis, a company may know that site.example is predominantly visited by people of a given race or gender, and decide not to run its job ads there. Visitors to that page are implicitly having their data processed in inappropriate ways, with no way to discover the discrimination or seek relief ([Relational-Governance]).

What we consider is therefore not just the relation between the people who share data and the parties that invite that disclosure ([Relational-Turn]), but also between the people who may find themselves categorised indirectly as part of a group even without sharing data. One key understanding here is that such relations may persist even when data is de-identified. What's more, such categorisation of people, voluntary or not, changes the way in which the world operates. This can produce self-reinforcing loops that can damage both individuals and groups ([Seeing-Like-A-State]).

In general, collective issues in data require collective solutions. Web standards help with data governance by defining structural controls in user agents and establishing or delegating to institutions that can handle issues of privacy. Governance will often struggle to achieve its goals if it works primarily by increasing individual control instead of acting collectively.

Collecting data at large scales can have significant pro-social outcomes. Problems tend to emerge when actors process data for collective benefit and for self-dealing purposes at the same time. The self-dealing purposes are often justified as bankrolling the pro-social outcomes but this requires collective oversight to be appropriate.

There are different ways for people to become members of a group. Either they can join it deliberately, making it a self-constituted group such as when joining a club, or they can be classified into it by an external party, typically a bureaucracy or its computerised equivalent ([Beyond-Individual]). In the latter case, people may not be aware that they are being grouped together, and the definition of the group may not be intelligible (for instance if it is created from opaque machine learning techniques).

Protecting group privacy can take place at two different levels. The existence of a group or at least its activities may need to be protected even in cases in which its members are guaranteed to remain anonymous. We refer to this as "group privacy." Conversely, People may wish to protect knowledge that they are members of the group even though the existence of the group and its actions may be well known (eg. membership in a dissidents movement under authoritarian rule), which we call "membership privacy". An example privacy violation for the former case is the fitness app Strava that did not reveal individual behaviour or identity but published heat maps of popular running routes. In doing so, it revealed secret US bases around which military personnel took frequent runs ([Strava-Debacle], [Strava-Reveal-Military]).

When people do not know that they are members of a group, when they cannot easily find other members of the group so as to advocate for their rights together, or when they cannot easily understand why they are being categorised into a given group, their ability to protect themselves through self-governing approaches to privacy is largely eliminated.

One common problem in group privacy is when the actions of one member of a group reveals information that other members would prefer were not shared in this way (or at all). For instance, one person may publish a picture of an event in which they are featured alongside others while the other people captured in the same picture would prefer their participation not to be disclosed. Another example of such issues are sites that enable people to upload their contacts: the person performing the upload might be more open to disclosing their social networks than the people they are connected to are. Such issues do not necessarily admit simple, straightforward solutions but they need to be carefully considered by people building websites.

While transparency rarely helps enough to inform the individual choices that people may make or in increasing their autonomy, it plays a critical role in letting researchers and reporters inform our collective decision-making about privacy principles. This consideration extends the TAG's resolution on a Strong and Secure Web Platform to ensure that "broad testing and audit continues to be possible" where information flows and automated decisions are involved.

Such transparency can only function if there are strong rights of access to data (including data derived from one's personal data) as well as mechanisms to explain the outcomes of automated decisions.

The user agent acts as an intermediary between a person (its user) and the web. User agents implement, to the extent possible, the principles that collective governance establishes in favour of individuals. They seek to prevent the creation of asymmetries of information, and serve their user by providing them with automation to rectify automation asymmetries. Where possible, they protect their user from receiving intrusive messages.

The user agent is expected to align fully with the person using it and operate exclusively in that person's interest. It is not the first party. The user agent serves the person as a trustworthy agent: it always puts that person's interest first. In some occasions, this can mean protecting that person from themselves by preventing them from carrying out a dangerous decision, or by slowing down the person in their decision. For example, the user agent will make it difficult for that person to connect to a site if it can't verify that the site is authentic. It will check that that person really intends to expose a sensitive device to a page. It will prevent that person from consenting to the permanent monitoring of their behaviour. Its user agent duties include ([Taking-Trust-Seriously]):

Duty of Protection: Protection requires user agents to actively protect their user's data, beyond simple security measures. It is insufficient to just encrypt at rest and in transit, but the user agent must also limit retention, help ensure that only strictly necessary data is collected, and require guarantees from any party that the user agent can reasonably be aware that it is shared to.
Duty of Discretion: Discretion requires the user agent to make best efforts to enforce principles by taking care in the ways it discloses the personal data that it manages. Discretion is not confidentiality or secrecy: trust can be preserved even when the user agent shares some personal data, so long as it is done in an appropriately discreet manner.
Duty of Honesty: Honesty requires that the user agent try to give its user information of which the user agent can reasonably be aware, that is relevant to them and that will increase their autonomy, as long as they can understand it and there's an appropriate time. This is almost never when the person is trying to do something else such as read a page or activate a feature. The duty of honesty goes well beyond that of transparency that is often included in older privacy regimes. Unlike transparency, honesty can't hide relevant information in complex legal notices and it can't rely on very short summaries provided in a consent dialog. If the person has provided consent to processing of their personal data, the user agent should inform the person of ongoing processing, with a level of obviousness that is proportional to the reasonably foreseeable impact of the processing.
Duty of Loyalty: Because the user agent is a trustworthy agent, it is held to be loyal to the person using it in all situations, including in preference to the user agent's implementer. When a user agent carries out processing that is not in the person's interest but instead benefits another actor (such as the user agent's implementer) that behaviour is known as self-dealing. Behaviour can be self-dealing even if it is done at the same time as processing that is in the person's interest, what matters is that it potentially conflicts with that person's interest. Self-dealing is always inappropriate. Loyalty is the avoidance of self-dealing.

These duties ensure the user agent will care for its user. In academic research, this relationship with a trustworthy agent is often described as "fiduciary" [Fiduciary-UA]. Some jurisdictions may have a distinct legal meaning for "fiduciary."

Many of the principles described in the rest of this document extend the user agent's duties and make them more precise.

As indicated above, different contexts require different principles. This section describes a set of principles designed to apply to the Web context in general. The Web is a big place, and we fully expect more specific contexts of the Web to add their own principles to further constrain information flows.

To the extent possible, user agents are expected to enforce these principles. However, this is not always possible and additional enforcement mechanisms are needed. One particularly salient issue is that a context is not defined in terms of who owns it (it is not a party). Sharing data between different contexts of a single company is just as much a privacy violation as if the same data were shared between unrelated parties.

A person's identity is the set of characteristics that define them. Their identity in a context is the set of characteristics they present to that context. People frequently present different identities to different contexts, and also frequently share an identity among several contexts.

Cross-context recognition is the act of recognising that an identity in one context is the same person as an identity in another context. Cross-context recognition can at times be appropriate but anyone who does it needs to be careful not to apply the principles of one context in ways that violate the principles around use of information acquired in a different context. (For example, if you meet your therapist at a cocktail party, you expect them to have rather different discussion topics with you than they usually would, and possibly even to pretend they do not know you.) This is particularly true for vulnerable people as recognising them in different contexts may force their vulnerability into the open.

In computer systems and on the Web, an identity seen by a particular website is typically assigned an identifier of some type, which makes it easier for an automated system to store data about that person. Examples of identifiers for a person can be:

their name,
an identification number including those mapping to a device that this person may be using,
their phone number,
their location data,
an online identifier such as email or IP addresses, or
factors specific to their physical, physiological, genetic, mental, economic, cultural, or social identity.

Strings derived from identifiers, for instance through hashing, are still identifiers so long as they may identify a person.

Principle

: A user agent should help its user present the identity they want in each context they find themself in.

Sometimes this means the UA should ensure that one site can't learn anything about their user's behavior on another site, while at other times the UA should help their user prove to one site that they have a particular identity on another site.

To do this, user agents have to make some assumptions about the borders between contexts. By default, user agents define a machine-enforceable context or partition as:

A set of environments (roughly iframes (including cross-site iframes), workers, and top-level pages)
whose top-level origins are in the same site (but see [PSL-Problems])
being visited within the same user agent installation (and browser profile, container, or container tab for user agents that support those features)
between points in time that the person or user agent clears that site's cookies and other storage (which is sometimes automatic at the end of each session).

Even though this is the default, user agents are free to restrict this context as people need. For example, some user agents may help people present different identities to subdivisions of a single site.

Issue 1: Figure out the default privacy boundary for the web

There is disagreement about whether user agents may also widen their machine-enforceable contexts. For example, some user agents might want to help their users present a single identity to multiple sites that the user understands represent a single party, or to a site across multiple installations.

User agents should prevent people from being recognized across machine-enforceable contexts unless they intend to be recognized. This is a "should" rather than a "must" because there are many cases where the user agent isn't powerful enough to prevent recognition. For example if two or more services that a person needs to use insist that they share a difficult-to-forge piece of their identity in order to use the services, it's the services behaving inappropriately rather than the user agent.

If a site includes multiple contexts whose principles indicate that it's inappropriate to share data between the contexts, the fact that those distinct contexts fall inside a single machine-enforceable context doesn't make sharing data or recognizing identities any less inappropriate.

Contributes to surveillance, correlation, and identification.

As described in 2.1 Principles for Identity on the Web, cross-context recognition can sometimes be appropriate, but users need to be able to control when websites do it as much as possible.

Principle

: User agents should ensure that, if a person visits two or more web pages from different partitions, that the pages cannot quickly determine that the visits probably came from the same person, for any significant or involuntary fraction of the people who use the web, unless the person explicitly expresses the same identity to the visits, or preventing this correlation would break a technical feature that is fundamental to the Web.

This principle uses "probably" because websites can do harm even if they can't be completely certain that visits come from the same person.
This principle uses "quickly" because it currently appears impossible to prevent some forms of fingerprinting that take a long time or many visits within each partition.
This principle is limited in cases that only affect a small fraction of people who use the web because people may configure their systems in unique ways, for example by using a browser with a very small number of users. As long as a tracker can't track a significant number of people, it's likely to be unviable to maintain the tracker. However, this doesn't excuse making small groups of people trackable when those people didn't choose to be in the group.
This principle is also limited in cases where preventing recognition would break fundamental aspects of the web. In many cases it's possible to change the design in a way that avoids the violation without breaking valid use cases, but for cases where that's not possible this document delegates to other documents, for example the Target Privacy Threat Model, to discuss what detailed tradeoffs to make.
This principle may not be able to be applied in situations where a person has shared identity information in a medium that is not accessible to the user agent.

Partitions are separated in two ways that lead to distinct kinds of user-visible recognition. When their divisions between different sites are violated, that leads to 2.1.1.2 Unwanted cross-site recognition. When a violation occurs at their other divisions, for example between different browser profiles or at the point someone clears their cookies and site storage, that leads to 2.1.1.1 Same-site recognition.

The web platform offers many ways for a website to recognize that a person is using the same identity over time, including cookies, localStorage, indexedDB, CacheStorage, and other forms of storage. This allows sites to save the person's preferences, shopping carts, etc., and people have come to expect this behavior in some contexts.

A privacy harm occurs if a person reasonably expects that they'll be using a different identity on a site, but the site discovers and uses the fact that the two or more visits probably came from the same person anyway.

User agents can't, in general, determine exactly where intra-site context boundaries are, or how a site allows a person to express that they intend to change identities, so they're not responsible to enforce that sites actually separate identities at those boundaries. The principle here instead requires separation at partition boundaries.

Cross-partition recognition is generally accomplished by either "supercookies" or browser fingerprinting.

Supercookies occur when a browser stores data for a site but makes that data more difficult to clear than other cookies or storage. Fingerprinting Guidance § Clearing all local state discusses how specifications can help browsers avoid this mistake.

Fingerprinting consists of using attributes of the person's browser and platform that are consistent between two or more visits and probably unique to the person.

The attributes can be exposed as information about the person's device that is otherwise benign (as opposed to 2.3 Principles for Sensitive Information). For example:

What are the person's language and time zone?
What size is the window?
What system preferences have been set? Dark mode, serif font, etc...
...

See [fingerprinting-guidance] for how to mitigate this threat.

A privacy harm occurs if a site determines with high probability and uses the fact that a visit to that site comes from the same person as another visit to a different site, unless the person could reasonably expect the sites to discover this. Traditionally, sites have accomplished this using cross-site cookies, but it can also be done by having someone navigate to a link that has been decorated with an identifier, collecting the same piece of identifying information on both sites, or by correlating the timestamps of an event that occurs nearly-simultaneously on both sites.

We define personal data as any information that is directly or indirectly related to an identified or identifiable person, such as by reference to an identifier ([GDPR], [OECD-Guidelines], [Convention-108]).

If a person could reasonably be identified or re-identified when data is combined with other data, then that makes that data personal data.

A service provider or data processor is considered to be the same party as the actor contracting it to perform the relevant processing if it:

is processing the data on behalf of that party;
ensures that the data is only retained, accessed, and used as directed by that party and solely for the list of explicitly-specified purposes detailed by the directing party or data controller;
may determine implementation details of the data processing in question but does not determine the purpose for which the data is being processed nor the overarching means through which the purpose is carried out;
has no independent right to use the data other than in a de-identified form (e.g., for monitoring service integrity, load balancing, capacity planning, or billing); and,
has a contract in place with the party which is consistent with the above limitations.

A data controller is a party that determines the means and purposes of data processing. Any party that is not a service provider is a data controller.

The Vegas Rule is a simple implementation of privacy in which "what happens with the first party stays with the first party." Put differently, the Vegas Rule is followed when the first party is the only data controller. While the Vegas Rule is a good guideline, it's neither necessary nor sufficient for appropriate data processing. A first party that maintains exclusive access to a person's data can still process it inappropriately, and there are cases where a third party can learn information about a person but still treat it appropriately.

Data is de-identified when there exists a high level of confidence that no person described by the data can be identified, directly or indirectly (e.g. via association with an identifier, user agent, or device), by that data alone or in combination with other available information. Note that further considerations relating to groups are covered in the Collective Issues in Privacy section.

We talk of controlled de-identified data when there are strict controls that prevent the re-identification of people described by the data except for a well-defined set of purposes.

Different situations involving controlled de-identified data will require different controls. For instance, if the controlled de-identified data is only being processed by one party, typical controls include making sure that the identifiers used in the data are unique to that dataset, that any person (e.g. an employee of the party) with access to the data is barred (e.g. based on legal terms) from sharing the data further, and that technical measures exist to prevent re-identification or the joining of different data sets involving this data, notably against timing or k-anonymity attacks.

In general, the goal is to ensure that controlled de-identified data is used in a manner that provides a viable degree of oversight and accountability such that technical and procedural means to guarantee the maintenance of pseudonymity are preserved.

This is more difficult when the controlled de-identified data is shared between several parties. In such cases, good examples of typical controls that are representative of best practices would include making sure that:

the identifiers used in the data are under the direct and exclusive control of the first party who is prevented by strict controls from matching the identifiers with the data;
when these identifiers are shared with a third party, they are made unique to that third party such that if they are shared with more than one third party these cannot then match them up with one another;
there is a strong level of confidence that no third party can match the data with any data other than that obtained through interactions with the first party;
any third party receiving such data is barred (eg. based on legal terms) from sharing it further;
technical measures exist to prevent re-identification or the joining of different data sets involving this data, notably against timing or k-anonymity attacks; and
there exist contractual terms between the first party and third party describing the limited purpose for which the data is being shared.

Note that controlled de-identified data, on its own, is not sufficient to render data processing appropriate.

People retain certain rights over data about themselves, and these rights should be facilitated by their user agent and the parties they're interacting with. While data rights alone are not sufficient to satisfy all privacy principles for the Web, they do support self-determination and help improve accountability. Such rights include:

The right to access data about oneself.

This right includes being able to discover data about oneself (implying no databases are kept secret) and to review what information has been collected or inferred.

The right to erase data about oneself.

The right to erase applies whether or not terminating use of a service altogether, though what data can be erased may differ between those two cases. On the Web, people may wish to erase data on their device, on a server, or both, and the distinctions may not always be clear.

The right to port data, including data one has stored with a party, so it can easily be reused or transferred elsewhere.

Portability is needed to realize the ability for people to make choices about services with different data practices. Standards for interoperability are essential for effective re-use.

The right to correct data about oneself, to ensure that one's identity is properly reflected in a system.
The right to be free from automated decision-making based on data about oneself.

For some kinds of decision-making with substantial consequences, there is a privacy interest in being able to exclude oneself from automated profiling. For example, some services may alter the price of products (price discrimination) or offers for credit or insurance based on data collected about a person. Those alterations may be consequential (financially, say) and objectionable to people who believe those decisions based on data about them are inaccurate or unjust. As another example, some services may draw inferences about a user's identity, humanity or presence based on facial recognition algorithms run on camera data. Because facial recognition algorithms and training sets are fallible and may exhibit certain biases, people may not wish to submit to decisions based on that kind of automated recognition.

The right to object, withdraw consent, and restrict use of data about oneself.

People may change their decisions about consent or may object to subsequent uses of data about themselves. Retaining rights requires ongoing control, not just at the time of collection.

The OECD Privacy Principles [OECD-Guidelines], [Records-Computers-Rights], and the [GDPR], among other places, include many of the rights people have as data subjects. These participatory rights by people over data about themselves are inherent to autonomy.

Contributes to correlation, identification, secondary use, and disclosure.

Many pieces of information about someone could cause privacy harms if disclosed. For example:

Their location.
Video or audio from the their camera or microphone.
The content of certain files on their filesystem.
Financial data.
Contacts.
Calendar entries.
Whether they are using assistive technology.
...

A particular piece of information may have different sensitivity for different people. Language preferences, for example, might typically seem innocent, but also can be an indicator of belonging to an ethnic minority. Precise location information can be extremely sensitive (because it's identifying, because it allows for in-person intrusions, because it can reveal detailed information about a person's life) but it might also be public and not sensitive at all, or it might be low-enough granularity that it is much less sensitive for many people.

When considering whether a class of information is likely to be sensitive to a person, consider at least these factors:

whether it serves as a persistent identifier (see severity in Mitigating browser fingerprinting);
whether it discloses substantial (including intimate details or inferences) information about the the person using the system or other people;
whether it can be revoked (as in determining whether a permission is necessary);
whether it enables other threats, like intrusion.

Issue(16): This description of what makes information sensitive still needs to be refined.

Contributes to surveillance, correlation, identification, and singling-out / discrimination.

Unexpected profiling occurs when a site is able to learn attributes or characteristics about a person, that a) the site visitor did not intend the site to learn, and b) the site visitor reasonably could not anticipate a site would be able to learn.

Profiling contributes to, but is distinct from, other privacy risks discussed in this document. For example, unexpected profiling may contribute to 2.1.1.1 Same-site recognition, by adding stable and semi-identifying information that can contribute to browser fingerprinting. Unexpected profiling is distinct from same-site recognition though, in that a person may wish to not share some kinds of information about themselves even in the presence of guarantees that such information will not lead to them being re-identified.

Similarly, unexpected profiling is related to 2.3 Principles for Sensitive Information, but the former is a superset of the latter; all cases of unexpected sensitive information disclosure are examples of unexpected profiling, but people using the Web may have attributes or characteristics about themselves that are not universally thought of as "sensitive", but which they never the less do not wish to share with the sites they visit. People may wish to not share these "non-sensitive" characteristics for a variety of reasons (e.g., a person may worry that their ideas of what counts as "sensitive" is different from others, a person might might be ashamed or uncomfortable about a character trait or they might simply not wish to be profiled).

Profiling occurs for many reasons. It can be used to facilitate price discrimination or offer manipulation, to make inferences about what products or services people might be more likely to purchase, or more generally, for a site to learn attributes about them that they do not intend to share. Unexpected profiling can also contribute to feelings of powerlessness and loss of agency [Privacy-Concerned].

A privacy harm occurs if a site learns information about a person that they reasonably expected the site would not be able to learn, regardless of whether that information aids (re)identification or is from a sensitive category of information (however defined).

Peter is a furry. Despite knowing that there are thousands of other furries on the internet, and despite using a browser with robust browser fingerprinting protections, and despite the growing cultural acceptance of furries, Peter does not want (most) sites to learn or personalize content around his furry-interest.

Principle

: Groups and various forms of institutions should best protect and support autonomy by making decisions collectively rather than individually to either prevent or enable data sharing, and to set defaults for data processing rules.

Privacy principles are often defined in terms of extending rights to individuals. However, there are cases in which deciding which principles apply is best done collectively, on behalf of a group.

One such case, which has become increasingly common with widespread profiling, is that of information relating to membership of a group or to a group's behaviour, as detailed in 1.2.1 Group Privacy. As Brent Mittelstadt explains, “Algorithmically grouped individuals have a collective interest in the creation of information about the group, and actions taken on its behalf.” ([Individual-Group-Privacy]) This justifies ensuring that grouped people can benefit from both individual and collective means to support their autonomy with respect to data processing. It should be noted that processing can be unjust even if individuals remain anonymous, not from the violation of individual autonomy but because it violates ideals of social equality ([Relational-Governance]).

Another case in which collective decision-making is preferable is for processing for which informed individual decision-making is unrealistic (due to the complexity of the processing, the volume or frequency of processing, or both). Expecting laypeople (or even experts) to make informed decisions relating to complex data processing or to make decisions on a very frequent basis even if the processing is relatively simple, is unrealistic if we also want them to have reasonable levels of autonomy in making these decisions.

The purpose of this principle is to require that data governance provide ways to distinguish appropriate data processing without relying on individual decisions whenever the latter are impossible, which is often ([Relational-Governance], [Relational-Turn]).

Which forms of collective governance are recognised as legitimate will depend on domains. These may take many forms, such as governmental bodies at various administrative levels, standards organisations, worker bargaining units, or civil society fora.

It must be noted that, even though collective decision-making can be better than offloading privacy labour to individuals, it is not necessarily a panacea. When considering such collective arrangements it is important to keep in mind the principles that are likely to support viable and effective institutions at any level of complexity ([IAD]).

A good example of a failure in collective privacy decisions was the standardisation of the ping attribute. Search engines, social sites, and other algorithmic media in the same vein have an interest in knowing which sites that they link to people choose to visit (which in turn could improve the service for everyone). But people may have an interest in keeping that information private from algorithmic media companies (as do the sites being linked to, as that facilitates timing attacks to recognise people there). A person's exit through a specific link can either be tracked with JavaScript tricks or through bounce tracking, both of which are slow and difficult for user agents to defend against. The value proposition of the ping attribute in this context is therefore straightforward: by providing declarative support for this functionality it can be made fast (the browser sends an asynchronous notification to a ping endpoint after the person exits through a link) and the user agent can provide its user with the option to opt out of such tracking — or disable it by default.

Unfortunately, this arrangement proved to be unworkable on the privacy side (the performance gains, however, are real). What prevents a site from using ping for people who have it activated and bounce tracking for others? What prevents a browsers from opting everyone out because it wishes to offer better protection by default? Given the contested nature of the ping attribute and the absence of a forcing function to support collective enforcement, the scheme failed to deliver improved privacy.

Most of our thinking about stakeholders on the web follows the priority of constituencies: we consider users, authors, implementors, and specifiers. Sometimes we also consider society or national governments. In w3ctag/design-reviews#606 and other enterprise-focused designs we need to address device owners or administrators, and I haven't been able to find any W3C discussion of how to consider them.

This is a broader question than just the privacy principles we're working on here, but some of the answer belongs here. We should say something about the power dynamics of employees, children, romantic partners, etc. who have to use devices owned by other people who might not always have their best interests in mind, and how the UA is responsible to protect the user while also respecting whatever rights the device owner ought to have.

Receiving unsolicited information that either may cause distress or waste the recipient's time or resources is a violation of privacy.

Principle

: User agents and other actors should take steps to ensure that their user is not exposed to unwanted information. Technical standards must consider the delivery of unwanted information as part of their architecture and must mitigate it accordingly.

Unwanted information covers a broad range of unsolicited communication, from messages that are typically harmless individually but that become a nuisance in aggregate (spam) to the sending of images that will cause shock or disgust due to their graphic, violent, or explicit nature (eg. pictures of one's genitals). While it is impossible, in a communication system involving many people, to offer perfect protection against all kinds of unwanted information, steps can be taken to make the sending of such messages more difficult or more costly, and to render the senders more accountable. Examples of mitigations include:

Restricting what new users of a service can post, notably limiting links and media until they have interacted a sufficient number of times over a given period with a larger group. This helps to raise the cost of producing sockpuppet accounts and gives new users the occasion to understand local norms before posting.
Only accepting communication between people who have an established relationship of some kind, such as being part of a shared group. Protocols should consider requiring a handshake between people prior to enabling communication.
Requiring a deliberate action from the recipient before rendering media coming from an untrusted source.
Supporting the ability for people to block a party such that they cannot send information again.
Pooling mitigation information, for instance shared block lists, shared spam-detection information, or public information about misbehaving actors. As always, the collection and sharing of information for safety purposes should be limited and placed under collective governance.

A person (also user or data subject) is any natural person. Throughout this document, we primarily use person or people to refer to human beings, as a reminder of their humanity. When we use the term user, it is to talk about the specific person who happens to be using a given system at that time.

A vulnerable person is a person who may be unable to exercise sufficient self-determination in a context. Amongst other things, they should be treated with greater default privacy protections and may be considered unable to consent to various interactions with a system. People can be vulnerable for different reasons, for example because they are children, are employees with respect to their employers, are facing a steep asymmetry of power, are people in some situations of intellectual or psychological impairment, are refugees, etc.

A context is a physical or digital environment that a person interacts with for a set of purposes (that they typically share with other people who interact with the same environment).

A party is an entity that a person can reasonably understand as a single "thing" they're interacting with. Uses of this document in a particular domain are expected to describe how the core concepts of that domain combine into a user-comprehensible party, and those refined definitions are likely to differ between domains.

The first party is a party with which a person intends to interact. Merely hovering over, muting, pausing, or closing a given piece of content does not mean a person intends to interact with another party, nor does the simple fact of loading a party embedded in the one with which the person intends to interact. In cases of clear and conspicuous joint branding, there can be multiple first parties. The first party is necessarily a data controller of the data processing that takes places as a consequence of a person interacting with it.

A third party is any party other than the person visiting the website, the first party, or a service provider acting on behalf of either the person or the first party.

Privacy is achieved in a given context that either involves personal data or involves information being presented to people when the principles of that context are followed appropriately. When the principles for that context are not followed, there is a privacy violation. Similarly, we say that a particular interaction is appropriate when the principles are adhered to) or inappropriate otherwise.

A party processes data if it carries out operations on personal data, whether or not by automated means, such as collection, recording, organisation, structuring, storage, adaptation or alteration, retrieval, consultation, use, disclosure by transmission, , dissemination or otherwise making available, selling, alignment or combination, restriction, erasure or destruction.

A party data if it provides it to any other party. Note that, under this definition, a party that provides data to its own service providers is not it.

A party sells data when it it in exchange for consideration, monetary or otherwise.

The purpose of a given processing of data is an anticipated, intended, or planned outcome of this processing which is achieved or aimed for within a given context. A purpose, when described, should be specific enough to be actionable by someone familiar with the relevant context (ie. they could independently determine means that reasonably correspond to an implementation of the purpose).

The means are the general method of data processing through which a given purpose is implemented, in a given context, considered at a relatively abstract level and not necessarily all the way down to implementation details. Example: a person will have their preferences restored (purpose) by looking up their identifier in a preferences store (means).

User agents should attempt to defend the people using them from a variety of high-level threats or attacker goals, described in this section.

These threats are an extension of the ones discussed by [RFC6973].

Surveillance: Surveillance is the observation or monitoring of an individual’s communications or activities. See RFC6973§5.1.1.
Data Compromise: End systems that do not take adequate measures to secure data from unauthorized or inappropriate access. See RFC6973§5.1.2.
Intrusion: Intrusion consists of invasive acts that disturb or interrupt one’s life or activities. See RFC6973§5.1.3.
Misattribution: Misattribution occurs when data or communications related to one individual are attributed to another. See RFC6973§5.1.4.
Correlation: Correlation is the combination of various pieces of information related to an individual or that obtain that characteristic when combined. See RFC6973§5.2.1.
Profiling: The inference, evaluation, or prediction of an individual's attributes, interests, or behaviours.
Identification: Identification is the linking of information to a particular individual, even if the information isn't linked to that individual's real-world identity (e.g. their legal name, address, government ID number, etc.). Identifying someone allows a system to treat them differently from others, which can be inappropriate depending on the context. See RFC6973§5.2.2.
Secondary Use: Secondary use is the use of collected information about an individual without the individual’s consent for a purpose different from that for which the information was collected. See RFC6973§5.2.3.
Disclosure: Disclosure is the revelation of information about an individual that affects the way others judge the individual. See RFC6973§5.2.4.
Exclusion: Exclusion is the failure to allow individuals to know about the data that others have about them and to participate in its handling and use. See RFC6973§5.2.5.

These threats combine into the particular concrete threats we want web specifications to defend against, described in the sections that follow.

Principle: A user agent should help its user present the identity they want in each context they find themself in.
Principle: User agents should ensure that, if a person visits two or more web pages from different partitions, that the pages cannot quickly determine that the visits probably came from the same person, for any significant or involuntary fraction of the people who use the web, unless the person explicitly expresses the same identity to the visits, or preventing this correlation would break a technical feature that is fundamental to the Web.
Principle: Groups and various forms of institutions should best protect and support autonomy by making decisions collectively rather than individually to either prevent or enable data sharing, and to set defaults for data processing rules.
Principle: User agents and other actors should take steps to ensure that their user is not exposed to unwanted information. Technical standards must consider the delivery of unwanted information as part of their architecture and must mitigate it accordingly.
Principle: When any actor obtains consent for processing from a person, the actor should design the consent request so as to learn the person's true intent to consent or not, and not to maximize the processing consented to.
Principle: An actor should avoid interrupting a person's use of a site for consent requests when an alternative is available.
Principle: It should be as easy for a user to check what consent they have given, to withdraw consent, or to opt out or object, as to give consent.

Some of the definitions in this document build on top of the work in Tracking Preference Expression (DNT).

The following people, in alphabetical order of their first name, were instrumental in producing this document: Amy Guy, Christine Runnegar, Dan Appelquist, Don Marti, Jonathan Kingston, Nick Doty, Peter Snyder, Sam Weiler, Tess O'Connor, and Wendy Seltzer.

Issue
Issue
Issue
Issue 1: Figure out the default privacy boundary for the web
Issue
Issue
Issue
Issue
Issue 73: How do safety systems relate to privacy
Issue 2: Address device owners
Issue
Issue
Issue
Issue

[Beyond-Individual]: Privacy Beyond the Individual Level (in Modern Socio-Technical Perspectives on Privacy). J.J. Suh; M.J. Metzger. Springer. URL: https://doi.org/10.1007/978-3-030-82786-1_6
[CAT]: Content Aggregation Technology (CAT). Robin Berjon; Justin Heideman. URL: https://nytimes.github.io/std-cat/
[Consent-Lackeys]: Publishers tell Google: We're not your consent lackeys. Rebecca Hill. The Register. URL: https://www.theregister.com/2018/05/01/publishers_slam_google_ad_policy_gdpr_consent/
[Contextual-Integrity]: Privacy As Contextual Integrity. Helen Nissenbaum. Washington Law Review. URL: https://digitalcommons.law.uw.edu/wlr/vol79/iss1/10/
[Convention-108]: Convention for the Protection of Individuals with regard to Automatic Processing of Personal Data. Council of Europe. URL: https://rm.coe.int/1680078b37
[Dark-Pattern-Dark]: What Makes a Dark Pattern… Dark? Design Attributes, Normative Considerations, and Measurement Methods. Arunesh Mathur; Jonathan Mayer; Mihir Kshirsagar. URL: https://arxiv.org/abs/2101.04843v1
[Dark-Patterns]: Dark patterns: past, present, and future. Arvind Narayanan; Arunesh Mathur; Marshini Chetty; Mihir Kshirsagar. ACM. URL: https://dl.acm.org/doi/10.1145/3397884
[design-principles]: Web Platform Design Principles. Sangwhan Moon. W3C. 16 December 2021. W3C Working Group Note. URL: https://www.w3.org/TR/design-principles/
[Digital-Market-Manipulation]: Digital Market Manipulation. Ryan Calo. George Washington Law Review. URL: https://papers.ssrn.com/sol3/papers.cfm?abstract_id=2309703
[ETHICAL-WEB]: W3C TAG Ethical Web Principles. Daniel Appelquist; Hadley Beeman. W3C. 27 October 2020. TAG Finding. URL: https://www.w3.org/2001/tag/doc/ethical-web-principles/
[Fiduciary-UA]: The Fiduciary Duties of User Agents. Robin Berjon. URL: https://papers.ssrn.com/sol3/papers.cfm?abstract_id=3827421
[fingerprinting-guidance]: Mitigating Browser Fingerprinting in Web Specifications. Nick Doty. W3C. 28 March 2019. W3C Working Group Note. URL: https://www.w3.org/TR/fingerprinting-guidance/
[For-Everyone]: This Is For Everyone. Tim Berners-Lee. Statement made to the London 2012 Olympics opening ceremony. URL: https://twitter.com/timberners_lee/status/228960085672599552; General Data Protection Regulations (GDPR) / Regulation (EU) 2016/679. European Parliament and Council of European Union. URL: https://eur-lex.europa.eu/legal-content/EN/TXT/HTML/?uri=CELEX:32016R0679&from=EN
[GKC-Privacy]: Governing Privacy in Knowledge Commons. Madelyn Rose Sanfilippo; Brett M. Frischmann; Katherine J. Strandburg. Cambridge University Press. URL: https://www.cambridge.org/core/books/governing-privacy-in-knowledge-commons/FA569455669E2CECA25DF0244C62C1A1
[GPC]: Global Privacy Control (GPC). Robin Berjon; Sebastian Zimmeck; Ashkan Soltani; David Harbage; Peter Snyder. W3C. URL: https://globalprivacycontrol.github.io/gpc-spec/
[html]: HTML Standard. Anne van Kesteren; Domenic Denicola; Ian Hickson; Philip Jägenstedt; Simon Pieters. WHATWG. Living Standard. URL: https://html.spec.whatwg.org/multipage/
[IAD]: Understanding Institutional Diversity. Elinor Ostrom. Princeton University Press. URL: https://press.princeton.edu/books/paperback/9780691122380/understanding-institutional-diversity
[indexeddb]: Indexed Database API. Nikunj Mehta; Jonas Sicking; Eliot Graff; Andrei Popescu; Jeremy Orlow; Joshua Bell. W3C. 8 January 2015. W3C Recommendation. URL: https://www.w3.org/TR/IndexedDB/
[Individual-Group-Privacy]: From Individual to Group Privacy in Big Data Analytics. Brent Mittelstadt. Philosophy & Technology. URL: https://link.springer.com/article/10.1007/s13347-017-0253-7
[OECD-Guidelines]: OECD Guidelines on the Protection of Privacy and Transborder Flows of Personal Data. OECD. URL: https://www.oecd.org/sti/ieconomy/oecdguidelinesontheprotectionofprivacyandtransborderflowsofpersonaldata.htm
[Privacy-Behavior]: Privacy and Human Behavior in the Age of Information. Alessandro Acquisti; Laura Brandimarte; George Loewenstein. Science. URL: https://www.heinz.cmu.edu/~acquisti/papers/AcquistiBrandimarteLoewenstein-S-2015.pdf
[Privacy-Concerned]: Americans and Privacy: Concerned, Confused and Feeling Lack of Control Over Their Personal Information. Brooke Auxier; Lee Rainie; Monica Anderson; Andrew Perrin; Madhu Kumar; Erica Turner. Pew Research Center. URL: https://www.pewresearch.org/internet/2019/11/15/americans-and-privacy-concerned-confused-and-feeling-lack-of-control-over-their-personal-information/
[Privacy-Contested]: Privacy is an essentially contested concept: a multi-dimensional analytic for mapping privacy. Deirdre K. Mulligan; Colin Koopman; Nick Doty. Philosophical Transacions A. URL: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5124066/
[Privacy-Threat]: Target Privacy Threat Model. Jeffrey Yasskin; Tom Lowenthal. W3C PING. URL: https://w3cping.github.io/privacy-threat-model/
[PSL-Problems]: Public Suffix List Problems. Ryan Sleevi. URL: https://github.com/sleevi/psl-problems
[Records-Computers-Rights]: Records, Computers and the Rights of Citizens. U.S. Department of Health, Education & Welfare. URL: https://archive.epic.org/privacy/hew1973report/
[Relational-Governance]: A Relational Theory of Data Governance. Salomé Viljoen. Yale Law Journal. URL: https://www.yalelawjournal.org/feature/a-relational-theory-of-data-governance
[Relational-Turn]: A Relational Turn for Data Protection?. Neil Richards; Woodrow Hartzog. URL: https://papers.ssrn.com/sol3/papers.cfm?abstract_id=3745973&s=09
[RFC6265]: HTTP State Management Mechanism. A. Barth. IETF. April 2011. Proposed Standard. URL: https://httpwg.org/specs/rfc6265.html
[RFC6973]: Privacy Considerations for Internet Protocols. A. Cooper; H. Tschofenig; B. Aboba; J. Peterson; J. Morris; M. Hansen; R. Smith. IETF. July 2013. Informational. URL: https://www.rfc-editor.org/rfc/rfc6973
[RFC8890]: The Internet is for End Users. M. Nottingham. IETF. August 2020. Informational. URL: https://www.rfc-editor.org/rfc/rfc8890
[Seeing-Like-A-State]: Seeing Like a State: How Certain Schemes to Improve the Human Condition Have Failed. James C. Scott. URL: https://bookshop.org/books/seeing-like-a-state-how-certain-schemes-to-improve-the-human-condition-have-failed/9780300246759
[service-workers]: Service Workers 1. Alex Russell; Jungkee Song; Jake Archibald; Marijn Kruisselbrink. W3C. 19 November 2019. W3C Candidate Recommendation. URL: https://www.w3.org/TR/service-workers-1/
[Strava-Debacle]: The Latest Data Privacy Debacle. Zeynep Tufekci. The New York Times. URL: https://www.nytimes.com/2018/01/30/opinion/strava-privacy.html
[Strava-Reveal-Military]: Strava Fitness App Can Reveal Military Sites, Analysts Say. Richard Pérez-Peña; Matthew Rosenberg. The New York Times. URL: https://www.nytimes.com/2018/01/29/world/middleeast/strava-heat-map.html
[Taking-Trust-Seriously]: Taking Trust Seriously in Privacy Law. Neil Richards; Woodrow Hartzog. URL: https://papers.ssrn.com/sol3/papers.cfm?abstract_id=2655719
[tracking-dnt]: Tracking Preference Expression (DNT). Roy Fielding; David Singer. W3C. 17 January 2019. W3C Working Group Note. URL: https://www.w3.org/TR/tracking-dnt/
[Understanding-Privacy]: Understanding Privacy. Daniel Solove. Harvard University Press. URL: https://www.hup.harvard.edu/catalog.php?isbn=9780674035072
[Why-Privacy]: Why Privacy Matter. Neil Richards. Oxford University Press. URL: https://global.oup.com/academic/product/why-privacy-matters-9780190939045?cc=us&lang=en&