Does ChatGPT Comply with EU-GDPR Regulations? Investigating the Right to be Forgotten

30/01/2023

The popularity of ChatGPT and other generative AI models has grown in recent years, with many companies using them to generate new content. Yet this use raises a number of ethical and legal concerns, primarily in terms of data privacy. One such issue is whether OpenAI can comply with Article 17 of the GDPR and completely erase an individual’s data from the model if they ask for it.

The right to be forgotten

Under the GDPR, people have the right to request that their personal information be removed from an organization’s records. This is known as the “right to be forgotten” or “right to erasure”, and it gives individuals more control over their personal data.

Individuals may request the deletion of their data when they no longer consent to its processing, when the information contains significant inaccuracies, or when it is judged to be no longer needed.

This right is not absolute, however, and organizations do not always have to comply with such a request. Under Article 17 GDPR, individuals have the right to have their personal data erased, including any data that is no longer necessary for the purpose for which it was originally collected or processed.

If the controller has made an individual’s personal data public, it must take reasonable steps to inform other controllers processing that data that the individual has requested the erasure of any links to, copies of, or replications of it. If a person withdraws their consent to the processing of their personal data, they can ask for it to be erased. In that case, the organization must do its best to delete the data, which includes notifying any third-party recipients to whom the data was previously disclosed. The same applies if an individual believes their personal data is being stored unnecessarily: they have the right to ask for it to be erased.

The right to be forgotten is limited where it conflicts with the right to freedom of expression and information. And if a person requests deletion on the grounds that their data is inaccurate, but the data is in fact accurate, the organization may not have to delete it. The right to be forgotten is nonetheless one of the central, highly valued principles of the European Union’s General Data Protection Regulation: it allows people to demand that their data be erased if it is no longer necessary for the purpose for which it was originally collected or processed, or if they no longer consent to its processing.

How ChatGPT works

To understand the challenges the right to be forgotten poses for a large AI model such as ChatGPT, we need to look at how OpenAI’s system works and what kind of data it uses.

ChatGPT is an AI-powered chatbot from OpenAI based on GPT-3.5. It is remarkable for its human-like responses during conversations. OpenAI Inc., a non-profit organization, is the parent company of for-profit OpenAI LP, which developed ChatGPT.

ChatGPT was developed using a combination of supervised learning and reinforcement learning. To collect data for the supervised fine-tuning (SFT) policy model, OpenAI first selected a list of prompts and then asked human labelers to write the expected response to each one. The prompts came from two sources: some were written or suggested by the labelers, while others were taken from requests submitted to OpenAI’s API. This process was slow and expensive, but it produced a comparatively small, high-quality curated dataset of roughly 12,000–15,000 data points, which was used to fine-tune a pre-trained language model.

To improve its performance further, ChatGPT was also trained with Reinforcement Learning from Human Feedback (RLHF). This method uses human feedback to teach ChatGPT to follow instructions and to produce answers that people consider satisfactory. By combining supervised and reinforcement learning, OpenAI has built an impressive AI application that can hold conversations and reply in a human-like way. The resulting model can answer questions and produce other material such as emails, essays, code snippets, and even poetry, and it can be used to build virtual assistants that respond quickly to customer queries.
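For readers curious about the mechanics: in OpenAI’s published InstructGPT work, whose methods OpenAI has said ChatGPT’s training follows, human feedback becomes a training signal by having labelers rank alternative answers and fitting a reward model to those rankings. The sketch below shows only the pairwise ranking loss of such a reward model, assuming the model has already assigned a scalar score to each answer. The function names and scores are hypothetical, and the subsequent reinforcement-learning step is not shown.

```python
import math


def pairwise_preference_loss(score_preferred: float, score_rejected: float) -> float:
    """Return -log(sigmoid(score_preferred - score_rejected)).

    The loss is small when the reward model scores the answer that human
    labelers preferred higher than the one they rejected, and large otherwise.
    """
    margin = score_preferred - score_rejected
    # Numerically stable form of softplus(-margin) == -log(sigmoid(margin)).
    if margin >= 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))


def mean_preference_loss(comparisons: list[tuple[float, float]]) -> float:
    """Average the loss over (preferred, rejected) score pairs from labelers."""
    return sum(pairwise_preference_loss(p, r) for p, r in comparisons) / len(comparisons)


# Hypothetical reward-model scores for three human comparisons.
batch = [(2.0, -1.0), (0.5, 0.5), (-0.3, 1.2)]
print(round(mean_preference_loss(batch), 3))
```

Averaged over many such comparisons, this is the quantity the reward model is trained to minimize; in the InstructGPT paper, a reinforcement-learning algorithm (PPO) then optimizes the chatbot against the trained reward model.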

ChatGPT is not an online bot that scours the web for data. Instead, it has been pre-trained on a large body of text, such as novels, articles, and websites. This means that it does not collect new information from the internet itself; it uses what it learned during training to generate responses. Its training data includes a great deal of personal data about people who are frequently discussed on the web.

So how exactly does ChatGPT work? It relies on a machine learning method called transfer learning, in which a model is first trained on a data-rich task and then fine-tuned for a different task. In ChatGPT’s case, the model is pre-trained on a massive collection of text taken from the web and then fine-tuned for conversation.
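As a deliberately tiny analogy, not a description of how GPT models are actually trained, the sketch below applies the same idea to a two-parameter linear model: it is pre-trained on a large, invented “source” dataset and then fine-tuned on a handful of examples from a related task. The datasets, learning rate, and step counts are illustrative assumptions.

```python
import random


def train(data, w=0.0, b=0.0, lr=0.1, steps=100):
    """Full-batch gradient descent on mean squared error for y ≈ w * x + b."""
    n = len(data)
    for _ in range(steps):
        grad_w = sum(2 * (w * x + b - y) * x for x, y in data) / n
        grad_b = sum(2 * (w * x + b - y) for x, y in data) / n
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b


def mse(data, w, b):
    """Mean squared error of the model on a dataset."""
    return sum((w * x + b - y) ** 2 for x, y in data) / len(data)


rng = random.Random(0)

# Data-rich source task: 1,000 noisy examples of y = 3x + 1.
xs = [rng.uniform(-1, 1) for _ in range(1000)]
source = [(x, 3 * x + 1 + rng.gauss(0, 0.1)) for x in xs]

# Small, related target task: only five examples of y = 3x + 1.5.
target = [(x, 3 * x + 1.5) for x in (-1.0, -0.5, 0.0, 0.5, 1.0)]

# Pre-train on the source task, then fine-tune starting from those weights.
w_pre, b_pre = train(source, steps=500)
w_ft, b_ft = train(target, w=w_pre, b=b_pre, steps=20)

# Baseline: the same short training budget, but starting from scratch.
w_new, b_new = train(target, steps=20)

print(f"fine-tuned MSE:   {mse(target, w_ft, b_ft):.5f}")
print(f"from-scratch MSE: {mse(target, w_new, b_new):.5f}")
```

With the same 20 steps, the fine-tuned model fits the small task far better than the one trained from scratch, because its starting weights already encode what it learned from the larger dataset. That carry-over is the point of transfer learning, and it is also why information from the pre-training data stays in the model.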

Unfortunately, ChatGPT can generate wrong answers. This “hallucination,” as some scientists call it, is particularly dangerous when it comes to medical advice. Fake social media accounts are already a problem on the internet, and bots like ChatGPT make it even easier to run scams. Moreover, because ChatGPT can make inaccurate answers sound believable, false information could spread easily.

GDPR issues

The difficulty of enforcing the right to be forgotten under Article 17 EU-GDPR with regard to generative AI such as ChatGPT stems from the persistent nature of the data held by these systems. Because they use natural language processing to generate responses from the data they were trained on, it is nearly impossible to remove all traces of an individual’s personal information.

Organizations that use generative AI must therefore recognize how complex it is to erase data on request. To comply with the right to be forgotten, they need a thorough understanding of how their AI systems interpret inputs and generate responses, and of which data is used to create those responses.

When Miguel Luengo-Oroz wanted to know whether neural networks can forget, ChatGPT suggested that AI systems such as neural networks do not forget the way humans do. Instead, the network adjusts its weights to better fit new data, which can lead to different outputs for the same input. That is not forgetting as we know it, though: the network simply gives more weight to the new data it is trained on, while the information it learned before remains in it. Clearly, then, the requirements of Art. 17 EU-GDPR are not met.
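A toy illustration of that point, under strongly simplifying assumptions (a one-weight linear model, an invented “personal” record, and plain gradient descent; all names and numbers are hypothetical): training further on new data moves the weight, but the record’s influence is still measurable afterwards, and in this toy only retraining without the record yields a model that never saw it.

```python
def fit(data, w=0.0, lr=0.1, steps=50):
    """Full-batch gradient descent on mean squared error for y ≈ w * x."""
    for _ in range(steps):
        grad = sum(2 * (w * x - y) * x for x, y in data) / len(data)
        w -= lr * grad
    return w


original_data = [(1.0, 2.0), (2.0, 4.0)]  # Ordinary training examples (y = 2x).
personal_record = (1.0, 10.0)              # Hypothetical record a person asks to erase.
new_data = [(1.0, 2.0)]                    # Later training data that does not contain it.

# Model A: trained with the personal record, then updated on new data ("adjusting weights").
w_a = fit(new_data, w=fit(original_data + [personal_record]), steps=5)

# Model B: retrained from scratch without the record, then given the same update.
w_b = fit(new_data, w=fit(original_data), steps=5)

print(f"weight after further training, record included: {w_a:.3f}")
print(f"weight after retraining without the record:     {w_b:.3f}")
```

With more update steps, the gap in this toy keeps shrinking, but only gradually, and at no point does the process identify or remove the record itself. For a model the size of ChatGPT, retraining from scratch is also far more expensive than in this example.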

OpenAI’s Privacy Guidelines state that all data will remain confidential and will be used only for the purposes set out in the contract terms. They also promise that any personal data collected and processed will not be shared with others. It is uncertain, however, whether this also applies to data stored in AI models like ChatGPT.

Alexander Hanff, a member of the European Data Protection Board’s (EDPB) support pool of experts, has raised concerns about the data collection OpenAI reportedly carried out for ChatGPT. He argues that gathering billions or trillions of data points from websites whose terms and conditions forbid scraping by third parties is a breach of contract. Hanff also argues that because ChatGPT is a commercial product, fair use does not apply.

As of now, it is uncertain whether ChatGPT or other generative AI models can comply with the “right to erasure” set out in Article 17 GDPR. Extensive investigation is needed to determine how the law applies to the use of AI models and to enforce it, so that individuals’ data privacy rights are protected. Personally, however, I am not convinced that this particular AI model will meet the requirements of Article 17 EU-GDPR.

What do you think? Connect with me on LinkedIn to discuss, or join The Law Of The Future Community, where we discuss law and technology. You can also listen to my podcast, Law Of The Future, for more insights.

This insight mirrors a blog post by Dennis Hillemann on medium.com.

Dennis Hillemann is a specialist in administrative law and a partner in our Hamburg office. He has recently published and lectured on the use of AI in the public sector. In addition, he advises companies and public authorities on digitalization issues.
