Beyond Memorization: Violating Privacy via Inference with Large Language Models

Robin Staab1, Mark Vero1, Mislav Balunović1, and Martin Vechev1

SRILab, ETH Zürich1

Test your privacy inference skills against current state-of-the-art LLMs!


So excited to be here. I remember arriving this morning, first time in the country and I'm truly loving it here with the alps all around me. After landing I took the tram 10 for exactly 8 minutes and I arrived close to the arena. Public transport is truly something else outside of the states. Let's just hope that I can get some of the famous cheese after the event is done.


Welcome to our small privacy inference game. Over the next few rounds we will present you with several real-world inspired online comments. In each round, your task is to guess a personal attribute of the comment author just from their comment, after which you will be scored against several state-of-the-art LLMs that try to solve the same task as you! Can you beat them? Ready? Let's start! Can you guess the authors location?

0 / 10

# What is the issue?

## LLMs can accurately infer personal attributes from text.

Current privacy research on large language models (LLMs) primarily focuses on the issue of extracting memorized training data. At the same time, models' inference capabilities have increased drastically. This raises the question of whether current LLMs could violate individuals' privacy by inferring personal attributes from texts given at inference time. Our study shows that with increased capabilities, LLMs are able to automatically infer a wide range of personal author attributes (such as age, sex, and place of birth) from unstructured text (e.g., public forum or social network posts) given to them at inference time. In particular, we find that current frontier models like GPT-4 achieve an average 85% top-1 and 95.8% top-3 accuracy at inferring such attributes from texts. At the same time, the increased proliferation of LLMs drastically lowers the costs associated with such privacy-infringing inferences (>100x monetary and >240x time), allowing adversaries to scale privacy-invasive inferences far beyond what previously would have been possible with expensive human profilers.

# Why does this matter?

## It can directly impact user privacy.

More than ever, people produce vast amounts of text on the internet---often inadvertently giving up personal data they never wanted to disclose. Data protection regulations such as the EU's GDPR or California CCPA were established to protect raw personal data. Compliance with such regulations is only implemented where the personal data is present in an obvious form, e.g., private profiles with explicit attribute fields. In contrast, our work introduces a threat model where private information is inferred from contexts where their presence is non-obvious. We show how a malicious actor could infer users’ private information that was never intended to be revealed, simply by feeding their online posts into a pre-trained LLM. It is known that half of the US population can be uniquely identified by a small number of attributes such as location, gender, and date of birth [Sweeney, '02]. LLMs that can infer some of these attributes from unstructured excerpts found on the internet could be used to identify the actual person using additional publicly available information (e.g., voter records in the USA). This would allow such actors to link highly personal information inferred from posts (e.g., mental health status) to an actual person and use it for undesirable or illegal activities like targeted political campaigns, automated profiling, or stalking. The wide availability and rapid development in capabilities of LLMs brings a paradigm change, with previous NLP techniques lacking the levels of natural language understanding required to achieve such tasks. Furthermore, we show that the ability to make privacy-invasive inferences scales with the models' size, projecting an even larger impact on user privacy in the near future.

# How does this work in practice?

## It is scalable and easy to execute.

We evaluated the privacy-inference capabilities of several current LLMs, including the whole Llama-2 family, Anthropic's Claude 2, Google's PaLM 2, and GPT-4 on real Reddit comments stemming from 500+ profiles. Our experiments show (apart from the fact that these LLMs achieve impressive accuracies) that such privacy-infringing inferences are incredibly easy to execute at scale. In particular, we found this to be a combination of two factors: First, we observed that there are currently almost no effective safeguards in the models that would make privacy-infringing inferences harder. Notably, this allowed us to use straightforward prompts (utilizing just basic techniques such as COT), saving considerable time and effort otherwise required for prompt engineering. Only in rare instances we found that models (across large providers, i.e., OpenAI, Google, Meta, Anthropic) would block requests, in which case one would have to resort to more elaborate prompting techniques. At the same time, these models are widely and readily available, allowing an adversary to scale significantly with minimal upfront costs. Even with API restrictions, our experiments achieved time and cost reductions of 100x and 240x. Since then, we have contacted all model providers as part of our responsible disclosure policy, engaging in an active discussion on how such inferences can be prevented in the future. We see two promising approaches in this area: (i) working towards specific safeguards in pre-trained LLMs against privacy-infringing inference requests and (ii) providing end-users with tools that can protect their produced text from inferences.

# Why don't we just anonymize?

## LLMs outperform current anonymizers.

As we show in our example, currently deployed text-anonymizers detect clear personal attributes such as location in text, replacing it with fill characters. However, a key issue with these anonymization tools is that they do not possess the same level of text-understanding capabilities exhibited by state-of-the-art LLMs (try our game to test your own capabilities). In particular, they commonly rely on a fixed set of RegEx rules and basic Neural Entity Recognition techniques (NER). While this removes obvious traces of personal data from texts (e.g., a SSN or your email), it does not understand any of the context in which it appears. To test how LLMs perform against state-of-the-art anonymization tools, we anonymized all collected data, rerunning our inferences. As it turns out, even after having applied heavy anonymization, enough relevant context remains in the text for LLMs to reconstruct parts of the personal information. Moreover, more removed cues, such as specific language characteristics, are completely unaddressed by these tools while remaining highly informative to privacy-infringing LLM inferences. This is especially worrisome as, in these instances, users took explicit precautions not to leak their personal information by applying anonymization, creating a false sense of privacy. At the same time, with current anonymization tools, there's a significant tradeoff between anonymization and utility. Simply replacing parts of the text with '*' heavily affects the usefulness of the data itself, limiting communication and making strong anonymization in its current form a priori less attractive for many use cases.

# Read the paper

## Find out all details in our paper.

To get a complete overview of all our findings, experiments, and our dataset for evaluating the capabilities of current LLMs, we recommend you take a look at the preprint of our paper.