Differential Privacy: A Primer

A quick look at what differential privacy does for census data

Thu 28 July 2022

A key detail about the American Community Survey (ACS), the source of our data, is that it reports "estimates." This means that the data you get from Censtats aren't exact counts, but rather figures estimated from a sample of households surveyed over a five-year period. As the ACS itself puts it:

The 5-year estimates from the ACS are 'period' estimates that represent data collected over a period of time. The primary advantage of using multiyear estimates is the increased *statistical reliability* [emphasis ours] of the data for less populated areas and small population subgroups.

American Community Survey

Estimates also give a reasonably accurate picture without risking the disclosure of personal information, especially in areas with low population. More to the point of this post, there is another way the Census Bureau builds privacy protections into its statistics: differential privacy.

What is Differential Privacy?

Differential privacy, in general, is about injecting “noise” into data so that it stays as accurate as possible while further protecting the identities of the people it represents. In the context of the U.S. Census, figures for small groups, such as rural census blocks or underrepresented races and ethnicities, may be noticeably distorted by this noise. While many categories of data use differential privacy, it is not applied to state-level total population, the total number of housing units, or the types of group quarters (i.e., group living arrangements).
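To make the idea of "noise" concrete, here is a minimal Python sketch of the Laplace mechanism, a textbook way of adding differentially private noise to a count. It is an illustration only, not the Census Bureau's actual system, which is considerably more elaborate. The function names `laplace_noise` and `noisy_count`, the `epsilon` value, and the example counts are all assumptions made up for this sketch.

```python
import random


def laplace_noise(scale, rng):
    """Draw one sample of Laplace noise with the given scale.

    The difference of two independent exponential draws with mean
    `scale` follows a Laplace distribution centered on zero.
    """
    rate = 1.0 / scale
    return rng.expovariate(rate) - rng.expovariate(rate)


def noisy_count(true_count, epsilon, rng):
    """Return true_count plus Laplace noise with scale 1/epsilon.

    Assumes a simple counting query with sensitivity 1: adding or
    removing one person changes the true count by at most 1.
    """
    return true_count + laplace_noise(1.0 / epsilon, rng)


rng = random.Random(0)  # seeded only so the sketch is repeatable
epsilon = 0.5           # hypothetical privacy budget, not a Census value

# Hypothetical areas: the same mechanism applied to a small and a large count.
for label, true_count in [("small rural block", 12), ("large urban tract", 12000)]:
    released = noisy_count(true_count, epsilon, rng)
    relative_error = abs(released - true_count) / true_count
    print(f"{label}: true={true_count}, released={released:.1f}, "
          f"relative error={relative_error:.2%}")
```

Because the noise has the same typical size regardless of the true count, it is a much larger share of a count of 12 than of a count of 12,000, which is why small areas and small groups are affected the most.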

This has raised a number of concerns about data accuracy, such as how it affects longitudinal studies and population-based apportionment for races and ethnicities, but it is all in service of overall privacy protection. The upshot is that certain oddities will inevitably arise. An NPR article on the matter summarizes them:

Ahead of the data's release, the bureau has warned users about how the privacy protections will make some neighborhoods look "fuzzy." The new data may show some blocks with "unusually large" households, children appearing to live alone or "occupied" housing units in areas where the population count is zero.

Hansi Lo Wang

So it’s a balancing act, really: present the data as accurately as possible and risk undermining the privacy that people are afforded when they cooperate with the Census, or preserve that privacy by making the data slightly less accurate. That's where the difference between the Decennial Census and the American Community Survey comes into play: the former is about counting the exact population for congressional apportionment, while the latter is about painting a picture of changing trends and demographics over a period of time, be it 1 year or 5 years. This is why Censtats chooses to feature the ACS's 5-year estimates: what they lack in recency they make up for in reliability, and differential privacy is less likely to have a deciding influence on them than on the 1-year estimates or the Decennial Census.
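The balancing act can also be sketched in code. In the standard formulation of differential privacy, a parameter usually called epsilon sets the privacy budget: smaller values mean stronger privacy and more noise. The Python snippet below assumes Laplace noise on a simple count and uses made-up epsilon values rather than anything the Census Bureau has published. The helper name `mean_absolute_error` is likewise hypothetical.

```python
import random


def mean_absolute_error(epsilon, trials, rng):
    """Average size of Laplace noise (scale 1/epsilon) over many draws."""
    total = 0.0
    for _ in range(trials):
        # The difference of two exponential draws with rate epsilon is
        # Laplace noise with scale 1/epsilon.
        noise = rng.expovariate(epsilon) - rng.expovariate(epsilon)
        total += abs(noise)
    return total / trials


rng = random.Random(0)  # seeded only so the sketch is repeatable

# Hypothetical privacy budgets, from strong privacy to weak privacy.
for epsilon in (0.1, 1.0, 10.0):
    error = mean_absolute_error(epsilon, trials=100_000, rng=rng)
    print(f"epsilon={epsilon}: a count is off by about {error:.2f} on average")
```

For Laplace noise with scale 1/epsilon, the average absolute error works out to about 1/epsilon, so cutting epsilon by a factor of ten makes a count roughly ten times noisier.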

For more info, see the Census Bureau's page on differential privacy.

Header image sourced with permission from cero ploy.
