How to Mitigate LLM Privacy Risks in Fine-Tuning and RAG with Protected Data

Large Language Models Privacy: How to Safeguard it Without Sacrificing Performance

Large Language Models (LLMs) open up a world of opportunities for innovation, yet they also pose significant data security and privacy risks when processing personal or proprietary data.
If you're wondering, "How can I leverage data to its fullest to craft impactful AI systems without crossing data privacy lines?" then you're asking the right question. And for good reason – Gartner anticipated that “By 2027, at least one global company will see its AI deployment banned by a regulator for noncompliance with data protection or AI governance legislation”1.
As the frequency of data exposures through LLMs increases, it's becoming evident that upcoming regulations, such as the EU AI Act, will impose stricter control on the use of sensitive information as training data.

In this whitepaper, you’ll see exactly how to protect sensitive data and mitigate privacy risks without compromising the utility of LLM-based solutions.

Specifically, you’ll see how using protected confidential data in LLMs’ fine-tuning and Retrieval Augmented Generation (RAG) can achieve:

  • Nearly identical performance to cleartext data in fine-tuning

  • Similar performance to using unprotected, cleartext data in RAG
With the growing reliance on LLM technologies, you’ll better understand whether protected data can achieve comparable results to unprotected sensitive data, addressing both efficiency and privacy concerns.

Customizing LLMs Beyond Prompts: Fine-Tuning and RAG

LLMs and Generative AI are quickly becoming indispensable for businesses across various industries. These technologies help companies boost productivity, reduce costs associated with repetitive tasks, enhance analytics, improve products & services, and more.

A study by Informatica showed that out of 600 Chief Data Officers, 45% are already using AI, and another 54% plan to start soon2. Gartner predicts that by 2026, over 80% of companies will be using generative AI APIs, models, or applications, up from less than 5% in 20233.

This rapid adoption and the high expectations set for Generative AI and LLMs highlight the growing need for these technologies to meet the complex demands of businesses.
LLMs offer significant benefits, but their limitations become noticeable when dealing with queries about events after their training date or confidential information they have not been exposed to. This is why approaches such as model fine-tuning and RAG have been developed.
These techniques were designed to extend the utility of LLMs by enabling them to process and respond to confidential or newly available data.
Image 1: The workflow of LLM fine-tuning and RAG

What is Fine-Tuning in LLMs

Fine-tuning in LLMs is a process of adapting a pre-trained model to perform specific tasks or work with particular types of data more effectively. This process involves additional training after the initial pre-training phase but on a much smaller task-specific dataset.
Consider an LLM trained on a large amount of general data to develop a broad knowledge base. To specialize this model in, for example, health and fitness topics, you have to train the model further (fine-tune it) on health and fitness data.

Because the pre-trained model has already seen a large amount of data and its weights are already optimized, the fine-tuning typically only affects a small portion of the model (as few as 1% of the parameters) and uses a fraction of the data.

Several techniques exist for efficient fine-tuning; the most prominent are currently Parameter-Efficient Fine-Tuning (PEFT) methods such as LoRA, often combined with quantization (as in QLoRA). As a result, you can fine-tune a model in minutes or hours on a couple of consumer-grade GPUs, whereas pre-training takes weeks or months on high-end, server-grade infrastructure.
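
To make this concrete, here is a minimal LoRA-style PEFT sketch using the Hugging Face transformers and peft libraries; the base model name and the hyperparameters are illustrative assumptions rather than a prescription, and quantized setups (e.g., QLoRA) would additionally load the base model in 4-bit precision.

# Minimal LoRA (PEFT) sketch with Hugging Face transformers + peft.
# The model name and hyperparameters below are illustrative assumptions.
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import LoraConfig, get_peft_model

base_model = "meta-llama/Llama-2-13b-chat-hf"  # assumed base model; any causal LM works
model = AutoModelForCausalLM.from_pretrained(base_model)
tokenizer = AutoTokenizer.from_pretrained(base_model)

lora_config = LoraConfig(
    r=8,                                   # low-rank adapter dimension
    lora_alpha=16,
    lora_dropout=0.05,
    target_modules=["q_proj", "v_proj"],   # adapt attention projections only
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # typically around 0.1-1% of all parameters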

What is Retrieval Augmented Generation (RAG) in LLMs

The RAG technique expands an LLM's knowledge by fetching relevant information from a pre-built database: at prompting time, the relevant portions of the database are retrieved and fed to the LLM as context it can use to answer the user's query.
Consider an LLM-powered tech support chatbot. The pre-trained LLM can quickly become out of date as the business releases new products. The chatbot becomes vague and less useful or, even worse, may start hallucinating non-existent, non-factual advice.

With RAG, the business can update the chatbot's embedded knowledge base (in the form of a vector database) immediately as new products are released or new troubleshooting recipes are found.

By inserting the new information in the vector database, the chatbot can retrieve and incorporate the most relevant, up-to-date information into its responses to ensure accurate support.
Image 2: The process of performing RAG
For RAG, it is paramount that the retrieval phase finds the relevant context for the query. It is also essential to use an LLM that can adequately parse the query and extract the answer from the context.
RAG is particularly useful for tasks where the model needs to generate responses or content that benefits from up-to-date, specific, factual, or detailed information not contained within its pre-trained parameters.
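
To make the retrieve-then-generate flow concrete, below is a minimal, self-contained sketch of the retrieval step; TF-IDF similarity stands in for a real embedding model and vector database, and the knowledge-base snippets and query are made up for illustration.

# Minimal retrieve-then-prompt sketch. TF-IDF stands in for an embedding
# model + vector database; in practice you would use dense embeddings.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

knowledge_base = [
    "Product X v2 ships with a USB-C port and a 2-year warranty.",
    "To reset Product X, hold the power button for 10 seconds.",
    "Product Y requires firmware 3.1 or later for Bluetooth pairing.",
]  # hypothetical support snippets

query = "How do I reset Product X?"

vectorizer = TfidfVectorizer().fit(knowledge_base + [query])
doc_vecs = vectorizer.transform(knowledge_base)
query_vec = vectorizer.transform([query])

# Retrieve the top-k most similar snippets to use as context.
scores = cosine_similarity(query_vec, doc_vecs)[0]
top_k = scores.argsort()[::-1][:2]
context = "\n".join(knowledge_base[i] for i in top_k)

prompt = f"Use the following context to answer.\n\nContext:\n{context}\n\nQuestion: {query}"
# `prompt` would now be sent to the LLM of your choice.
print(prompt)
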
Now that we have explored the concepts of fine-tuning and RAG, including how they utilize data, let's delve into the data security and privacy risks associated with LLMs in general and these techniques in particular.

Understanding Data Security and Privacy Risks in LLMs

While many companies are eager to use LLMs and Generative AI technologies, a significant barrier to their deployment is the risk to confidentiality, data privacy, and security. If you’re reading our guide, this is most likely your concern too.

AI systems pass through several stages from inception to deployment, from exploration to inference, and they are susceptible to different types of privacy risks at each stage of the AI lifecycle.
Image 3: An overview of activities and risks in an AI system lifecycle
The key risks to data privacy and security are:

  • Internal Data Exposure: Accidental or intentional mishandling of data within an organization, leading to leaks or unauthorized reuse of data.

  • Overlearning: A phenomenon where LLMs learn more than intended, potentially developing biases or uncovering hidden correlations in sensitive data.

  • Misuse and Data Disclosure: This encompasses unlawful data retention and sharing, as well as updating the model with incoming data without considering the privacy implications of the newly integrated information.

  • External Data Exposure: This risk occurs when sensitive information in LLMs is revealed publicly through disclosure in queries or responses of an LLM, using sensitive data in model updates, and more.
These concerns have become particularly acute in light of incidents where sensitive internal data was exposed to LLMs.4
When leveraging third-party LLMs through vendor APIs, such as those from OpenAI or Anthropic, every interaction with these models is a data-sharing operation.
This is already true with simple Q&A interfaces such as ChatGPT. The prompts are shared, potentially stored, and sometimes used to train the models further, which is risky for your business when the prompts contain personal or confidential information.
For fine-tuning and RAG, the risk is even higher. It is no longer a few individual prompts that accidentally contain sensitive information, but large volumes of data shared for fine-tuning or highly specific context documents shared for RAG. These datasets almost always contain sensitive data.
Image 4: Uncontrolled data sharing with third parties is inevitable in the Generative AI era
Apple, for instance, limited its employees from using ChatGPT and GitHub Copilot, a coding-automation tool from Microsoft-owned GitHub, due to the risk of exposing confidential data. Verizon has also taken measures to prevent its employees from using ChatGPT on its corporate systems to protect customer data and proprietary code from potential leaks. Similarly, Deutsche Bank has prohibited the use of ChatGPT by its staff at work to safeguard its confidential data5.
Currently, privacy issues, data breach risks, and the threat of fines for non-compliance impede AI initiatives right from the outset by restricting access to necessary data. Gartner's 2021 survey on AI within organizations found that two in five had experienced a privacy breach or security incident6.
However, excluding sensitive data from the processes of fine-tuning or RAG isn't a feasible solution because this data is essential for developing effective models. Instead, companies need to develop strategies to protect it without compromising its quality.

In the next chapter we’ll show you how using data protection techniques can safeguard sensitive information while enabling the creation of powerful and efficient AI models.

Testing Data Protection Techniques to Enhance Privacy in LLM Fine-Tuning and RAG

In this chapter, we will show you how to validate the feasibility of using data protection methods (reversible de-identification, in particular) in conjunction with LLMs to increase privacy and regulatory compliance while preserving the quality and fidelity of data.

In our experiment, we will use Anonos Data Embassy, a data-centric security platform, to protect sensitive data while preserving the ability to relink it under controlled conditions.

What is Reversible De-Identification

Data de-identification is the process of removing or transforming the sensitive information in a dataset. The goal of de-identification is to minimize the risk of unauthorized access to sensitive information while retaining the dataset's usefulness.
Data de-identification can be achieved via various methods. Because there is often a need to re-link the data to its original form, whether for further analysis, personalized services, auditing, or compliance with legal requests, we use the Anonos Data Embassy platform to achieve de-identification that is reversible under controlled conditions.
Image 5: Anonos Data Embassy facilitates the separation of data value from personally-identifiable information while enabling controlled relinking.
Unlike data anonymization, which permanently severs the link between the data and its source, relinkable de-identified data retains a secure, controlled mechanism for reassociation, granting authorized users and stakeholders access to the original data.
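
To illustrate the general idea (this is a naive sketch, not Anonos' implementation), the following code replaces detected entities with counter-style placeholders and keeps a relink map so that authorized users can restore the original values; the regex "detector" and the in-memory map are deliberate simplifications.

# Naive sketch of reversible de-identification with entity-counter placeholders.
# A real platform uses far more robust detection and secure key management;
# the relink map here is just an in-memory dict.
import re
from itertools import count

def deidentify(text, detectors, relink_map, counters):
    protected = text
    for entity_type, pattern in detectors.items():
        for match in re.findall(pattern, protected):
            placeholder = f"{entity_type}-{next(counters[entity_type])}"
            relink_map[placeholder] = match
            protected = protected.replace(match, placeholder, 1)
    return protected

def relink(text, relink_map):
    for placeholder, original in relink_map.items():
        text = text.replace(placeholder, original)
    return text

detectors = {"EMAIL": r"[\w.+-]+@[\w-]+\.[\w.]+"}   # assumed, simplistic pattern
counters = {"EMAIL": count(1)}
relink_map = {}

protected = deidentify("Contact jane.doe@example.com for access.",
                       detectors, relink_map, counters)
print(protected)                      # Contact EMAIL-1 for access.
print(relink(protected, relink_map))  # original text restored
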
The resulting utility of this technique will be compared to the utility of cleartext (unprotected data).

Our analysis will cover two non-mutually exclusive approaches:

  • Fine-tuning: We will sanitize data prior to fine-tuning an internal LLM to manage sensitive data exposure risks and achieve lawful data use. Protecting the data before training (as in fine-tuning) is essential to controlling risks and achieving compliance.

  • RAG: We will build an indexed database out of protected sensitive documents, which can be used to augment queries to an LLM model. This database will support an LLM by providing additional information for processing queries, effectively allowing the model to handle inquiries about previously unknown data. Here, the focus is on managing risks and ensuring the legality of the database and the LLM's interactions.
Image 6: Schematic view of the LLM training, tuning, and prompting cycle. The figure shows where the different protection processes fit in the whole chain.

Our Methodology

Data


For our experiments, we used a dataset of 1,000 question-and-answer (Q/A) pairs related to a technology company named Lamini, accessible at Lamini's Hugging Face dataset page.

We chose this specific dataset because the subject matter, Lamini, was unknown to GPT-3.5 and other LLMs.
Examples of Q/A pairs:

Q: Does the documentation include a comprehensive glossary of technical terms and concepts related to Lamini and language modeling?

A: Lamini can be quickly and easily learned - the documentation is available here: https://lamini-ai.github.io/.

Evaluation Protocol


LLM-Assisted Evaluation


In the sections below, we use an LLM-assisted evaluation to compare the performance in each set of experiments.

This technique makes use of an evaluator LLM, typically at least as powerful as the evaluated LLMs. The evaluator LLM must be independent of the evaluated LLMs; in particular, they must not share history.

It consists of feeding the evaluator LLM the ground truth and the answer generated by the evaluated LLM, together with the question. We then ask the evaluator LLM to assess how closely the generated answer matches the true answer on a scale of 0 to 10.

There are two advantages of this approach:


  • It allows for the automatic evaluation of the generated answers, eliminating the need for time-consuming manual reviews.

  • It provides a numerical score of similarity, offering a more quantitative method for assessing the answers' accuracy.
To ensure the reliability of the LLM-assisted evaluation, we selected a small set of questions for manual verification. We took this step to confirm that the similarity scores provided by the LLM were consistent with those assessed by a human evaluator.

Before performing the evaluation, we re-linked the protected answers to cleartext so that they could be compared with the answers generated from cleartext and with the ground truth.

We performed LLM-assisted evaluation using the following prompt:
Your job is to evaluate the performance of a question answering system. You will be given a query, a true answer, and a generated answer. Your task is to grade the generated answer on a scale of 0-10 comparing the generated answer to the true answer. A grade of 0 means the GENERATED ANSWER is very different from the TRUE ANSWER. A grade of 10 means the GENERATED ANSWER is almost identical to the TRUE ANSWER. A grade of 5 means the GENERATED ANSWER is similar to the TRUE ANSWER.

Your response must ONLY be an integer between 0 and 10 (inclusive). Do not include any other text in your response.

QUERY: {question}

TRUE ANSWER: {ground_truth_answer}

GENERATED ANSWER: {model_answer}

GRADE:
In the evaluation process, the fields labeled question, ground_truth_answer, and model_answer were populated with the respective question, the original answer provided by Lamini, and the answer generated by the evaluated model.

We gathered all scores into comma-separated value files and analyzed them to produce final results.
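
A minimal sketch of how such an evaluation loop could be scripted is shown below; it assumes the OpenAI Python client with GPT-4 as the evaluator, and the example triple, file name, and abbreviated prompt are placeholders rather than part of the original setup.

# Sketch of the LLM-assisted evaluation loop: fill the grading prompt,
# ask the evaluator LLM for an integer grade, and collect results in a CSV.
# The evaluator model name, example triple, and file name are placeholders.
import csv
from openai import OpenAI

client = OpenAI()

EVAL_TEMPLATE = (
    "Your job is to evaluate the performance of a question answering system. "
    "... (full grading prompt as shown above) ...\n"
    "QUERY: {question}\n"
    "TRUE ANSWER: {ground_truth_answer}\n"
    "GENERATED ANSWER: {model_answer}\n"
    "GRADE:"
)

def grade(question, ground_truth_answer, model_answer):
    response = client.chat.completions.create(
        model="gpt-4",
        temperature=0.0,
        messages=[{"role": "user", "content": EVAL_TEMPLATE.format(
            question=question,
            ground_truth_answer=ground_truth_answer,
            model_answer=model_answer,
        )}],
    )
    return int(response.choices[0].message.content.strip())

# (question, ground-truth answer, generated answer) triples to be graded
triples = [("Where can I find Lamini's documentation?",
            "The documentation is available at https://lamini-ai.github.io/.",
            "On Lamini's website.")]

with open("scores.csv", "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(["question", "grade"])
    for q, truth, generated in triples:
        writer.writerow([q, grade(q, truth, generated)])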



Cleartext vs. Protected Fine-Tuning


In this experiment, we used three models: a stock GPT-3.5, a GPT-3.5 fine-tuned using cleartext data, and a GPT-3.5 fine-tuned using protected data.

Setup


  • De-identifying the dataset: The lamini_docs dataset was de-identified with the Anonos Data Embassy SDK using “entity counters.” Data Embassy entity counters are placeholders of the form <ENTITY>-<NUMBER>: For example, NAME-123 or LOCATION-51.

  • Preparing data for fine-tuning: Both the cleartext and de-identified versions of lamini_docs were transformed into the OpenAI conversational chat format required for fine-tuning GPT-3.5 (see the sketch after this list), and each transformed dataset was split into a 90% training set and a 10% validation set.

  • Fine-tuning the models: Two distinct fine-tuned AI models were generated based on the cleartext and protected datasets, respectively. Both models were fine-tuned for 3 epochs.
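
The sketch below illustrates the data-preparation and fine-tuning steps from the list above, using the OpenAI Python client; the file names, the example Q/A pair, and the exact model identifier are placeholders.

# Sketch: convert Q/A pairs to OpenAI's chat fine-tuning format (JSONL),
# upload the file, and start a fine-tuning job. File names are placeholders.
import json
from openai import OpenAI

client = OpenAI()

SYSTEM = "You are a helpful assistant. Answer succinctly and stay relevant to the question."

def to_chat_format(qa_pairs):
    for question, answer in qa_pairs:
        yield {"messages": [
            {"role": "system", "content": SYSTEM},
            {"role": "user", "content": question},
            {"role": "assistant", "content": answer},
        ]}

# De-identified or cleartext Q/A pairs; this single pair is a placeholder.
qa_pairs = [("What is Lamini?", "Lamini is an LLM platform for rapid customization.")]

with open("lamini_train.jsonl", "w") as f:
    for record in to_chat_format(qa_pairs):
        f.write(json.dumps(record) + "\n")

training_file = client.files.create(file=open("lamini_train.jsonl", "rb"),
                                    purpose="fine-tune")
job = client.fine_tuning.jobs.create(
    training_file=training_file.id,
    model="gpt-3.5-turbo",
    hyperparameters={"n_epochs": 3},
)
print(job.id)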

Experiment


To evaluate the performance of the three models, we prompted each with the 999 questions in lamini_docs and collected responses.

De-identified versions of the questions were sent to the de-identified fine-tuned model, while cleartext questions were sent to GPT-3.5 and its cleartext fine-tuned variant. The responses of the protected fine-tuned model were relinked back to their cleartext counterparts before evaluation.
The questions were submitted without any additional context or instructions, except for the following system prompt sent along with each question: "You are a helpful assistant. Answer succinctly and stay relevant to the question." To get more deterministic answers, the temperature was set to 0.0.
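
A sketch of this prompting step, again assuming the OpenAI Python client, might look like the following; the fine-tuned model ID and the example question are placeholders.

# Sketch: query a (fine-tuned) GPT-3.5 model with the system prompt and
# temperature 0.0 used in the experiment. The model ID is a placeholder.
from openai import OpenAI

client = OpenAI()

SYSTEM = "You are a helpful assistant. Answer succinctly and stay relevant to the question."

def ask(model_id, question):
    response = client.chat.completions.create(
        model=model_id,
        temperature=0.0,
        messages=[
            {"role": "system", "content": SYSTEM},
            {"role": "user", "content": question},
        ],
    )
    return response.choices[0].message.content

answer = ask("ft:gpt-3.5-turbo:org::example123",  # placeholder fine-tuned model ID
             "Does Lamini support fine-tuning on custom datasets?")
print(answer)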

Evaluation


The performance of each model was then assessed using LLM-assisted evaluation (described above), with GPT-4 as the evaluator LLM. It was asked to rank each of GPT-3.5's responses on a scale from 0 to 10 by comparing the actual answer with the expected answer from lamini_docs. As a soundness check, a few responses were manually inspected to verify that GPT-4's ranking was reasonable.

The prompt used to instruct GPT-4 is similar to the one shown in the Evaluation Protocol section above.

Results


The following charts show the distribution of GPT-4's rankings. The leftmost plot (A) is the result of submitting the prompts to GPT-3.5 without any fine-tuning. The other two plots (B and C) show the results of submitting cleartext prompts to the cleartext fine-tuned GPT and protected prompts to the protected fine-tuned GPT.
Plots: (A) prompts submitted without fine-tuning; (B) cleartext prompts submitted to the cleartext fine-tuned GPT; (C) protected prompts submitted to the protected fine-tuned GPT.
In both cases, fine-tuning achieves a significant improvement over the baseline.

Prompting the de-identified model achieves similar results as prompting the cleartext (fine-tuned) model, with only a 6% decrease in accuracy, representing a minimal trade-off for improved data protection.

Model | Satisfactory answers (rank ≥ 6)¹ | Totally incorrect (rank ≤ 1)
Pre-trained GPT-3.5 | 54% | 19%
Cleartext fine-tuned | 74% | 5%
Protected fine-tuned | 68% | 7%

¹ GPT-4 was told to assign 5 to “partially incorrect answers.”
Image 7: Model performance results by model and accuracy.
As for the agreement between the two fine-tuned models, 83% of the answers were evaluated the same way by both models, either satisfactory in both or unsatisfactory in both (rows: cleartext fine-tuned model; columns: protected fine-tuned model):

Cleartext \ Protected | Satisfactory | Not satisfactory
Satisfactory | 63% | 11%
Not satisfactory | 6% | 20%

In cases of disagreement, the protected model performed worse than the cleartext model about 5% more often, consistent with the overall loss of accuracy.

Important Notes


  • The pre-trained GPT-3.5 model can adequately answer over half of the questions as judged by GPT-4's standards. This is because some questions can accept generic responses that are easily generated from the model without knowledge of specific information, such as inquiries about finding documentation ("Where can I find Lamini's documentation?" with the straightforward reply "On Lamini's website"). Some questions are structured as yes/no queries, making them relatively easy for the model to handle effectively.

    This observation suggests that the actual difference in performance between pre-trained and fine-tuned models is more significant than the statistics imply.

  • To understand why a 6% decrease in accuracy is relatively minor, consider that the trade-off for this drop is a substantial increase in privacy and security. Sharing sensitive data is often a showstopper, making the de-identified approach highly valuable for applications dealing with sensitive information. Achieving close to cleartext performance with significantly enhanced data protection offers a compelling compromise.

Cleartext vs. Protected RAG


Setup


For the RAG part of this project, we used the LlamaIndex open-source library to perform all LLM-related tasks. This library provides a good balance between convenience and control over the RAG pipeline.

We chose the chat version of Llama 2 13B as the LLM, specifically the quantized 5-bit (Q5_K_M) version from Hugging Face, which can be offloaded in its entirety to an A10 GPU using llama.cpp to speed up answer generation. All tasks ran locally on an AWS g5.2xlarge instance running Ubuntu 22.04.

To perform RAG, the Lamini Q&A data was de-identified using entity counters, hashes, and encryption, and four indexed databases were created: one for each of the three de-identified datasets and one for the original cleartext dataset.

Each database (DB) was constructed solely using the answers from the Lamini dataset, excluding the questions.
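
The following sketch shows how such an index could be built with LlamaIndex and a llama.cpp-backed Llama 2 13B chat model; module paths follow recent LlamaIndex packaging and may differ across versions, and the GGUF file path, embedding model, and example answer are assumptions.

# Sketch: build a vector index over the (protected or cleartext) answers with
# LlamaIndex, using a llama.cpp-backed Llama 2 13B chat model.
from llama_index.core import VectorStoreIndex, Document, Settings
from llama_index.embeddings.huggingface import HuggingFaceEmbedding
from llama_index.llms.llama_cpp import LlamaCPP

Settings.llm = LlamaCPP(
    model_path="./llama-2-13b-chat.Q5_K_M.gguf",   # assumed local GGUF file
    temperature=0.0,
    model_kwargs={"n_gpu_layers": -1},              # offload all layers to the GPU
)
Settings.embed_model = HuggingFaceEmbedding(model_name="BAAI/bge-small-en-v1.5")

# Answers only, no questions; this single answer is a placeholder.
answers = ["Lamini can be quickly and easily learned; the documentation is available at https://lamini-ai.github.io/."]
documents = [Document(text=a) for a in answers]
index = VectorStoreIndex.from_documents(documents)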

Experiment


For each indexed database, we conducted RAG Q&A sessions using a basic similarity search to identify relevant context for information retrieval. This approach represents the simplest search method, which we chose given our primary objective of assessing the effectiveness of using de-identified data.

Each question, cleartext or de-identified, was embedded in the following prompt:
Use the following pieces of context to answer the question at the end.

If you don't know the answer, just say that you don't know. Don't try to make up an answer.

Use three sentences maximum and keep the answer as concise as possible. Start the answer with "This is the answer:".

Context: {context_str}

Question: {query_str}
In this setup, the context_str refers to the information retrieved from the indexed database, which provides all the necessary context to answer the question included in the query_str field. This context-enriched prompt was then input into the LLM, and the generated answer was recorded.
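
Continuing the sketch above, the retrieval and prompting step might be wired up as follows; the prompt mirrors the template shown, while the top-k value and the text_qa_template plumbing are assumptions that may vary across LlamaIndex versions.

# Sketch: run RAG Q&A with plain similarity search and the custom QA prompt.
# Reuses the `index` built in the previous sketch.
from llama_index.core import PromptTemplate

QA_TEMPLATE = PromptTemplate(
    "Use the following pieces of context to answer the question at the end.\n"
    "If you don't know the answer, just say that you don't know. "
    "Don't try to make up an answer.\n"
    "Use three sentences maximum and keep the answer as concise as possible. "
    'Start the answer with "This is the answer:".\n\n'
    "Context: {context_str}\n\nQuestion: {query_str}\n"
)

query_engine = index.as_query_engine(
    similarity_top_k=3,              # basic similarity search; k is an assumption
    text_qa_template=QA_TEMPLATE,
)
response = query_engine.query("Does Lamini provide a glossary of technical terms?")
print(response)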

Evaluation


Before performing the LLM-assisted evaluation, the answers generated from the entity-counter, hashed, and encrypted databases were re-linked to cleartext so they could be compared with the answers generated from cleartext and with the ground truth.

The LLM-assisted evaluation was performed using the same Llama 2 13B model described before7.

Results


In the RAG setup, we employed LLM-assisted evaluation to gauge the similarity between the generated answers and the ground truth. The Evaluator LLM was instructed to score the answers on a scale from 0 (for dissimilar answers) to 10 (for identical answers).

While it's not essential for the scores to always hit 10, indicating exact matches, we do expect the answers to be semantically identical, meaning scores should ideally be 7 or higher to meet our criteria for similarity8.

We can plot the number of questions with a certain score for each of the datasets to visualize if the scores agree with our expectations9:
Image 8: LLM-assisted evaluation scores for RAG Q&A of the Lamini Docs dataset.
From the above distributions, it is clear that the bulk of the scores are above 7, with a median of 8, as expected from a successful RAG setup. This is important because we must first confirm that RAG works at all before comparing the runs on de-identified data with the run on cleartext data.

Using a chi-squared test, we validated that the four count distributions are statistically indistinguishable, particularly when counts are aggregated into scores below 7 and scores of 7 or higher (unsatisfactory vs. satisfactory responses).
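
For reference, a minimal version of this check with SciPy could look like the following; the counts are illustrative placeholders, not the actual experimental figures.

# Sketch: chi-squared test on satisfactory (score >= 7) vs. unsatisfactory
# counts for the four RAG datasets. Counts below are placeholders only.
from scipy.stats import chi2_contingency

#          satisfactory, unsatisfactory
counts = [
    [830, 169],   # cleartext        (placeholder values)
    [825, 174],   # entity counters  (placeholder values)
    [820, 179],   # hashed           (placeholder values)
    [818, 181],   # encrypted        (placeholder values)
]

chi2, p_value, dof, _ = chi2_contingency(counts)
print(f"chi2={chi2:.2f}, dof={dof}, p={p_value:.3f}")
# A large p-value means the distributions cannot be distinguished,
# i.e. protected and cleartext RAG perform comparably.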

Conclusion: Protected Data Balances Privacy and Utility in LLM Fine-Tuning & RAG

Our investigation into using protected data for fine-tuning and RAG with LLMs demonstrates its practicality as a method to maintain data privacy while customizing LLMs.
  • In model fine-tuning, despite a slight performance decrease on protected data, the integrity of sensitive information remains protected.

  • In RAG, protected data does not compromise the model's ability to generate relevant responses, as evidenced by consistent scores across datasets in LLM-assisted evaluations.
This finding is pivotal, especially as businesses integrate LLMs to streamline tasks, confronting the challenge of protecting sensitive data amidst stringent regulations and cross-border data transfers.

Ultimately, our research affirms the critical need for deploying data protection strategies in LLM applications, advocating for the use of protected data to strike a balance between efficiency and privacy.

As this technology advances, embracing data privacy will be vital for businesses aiming to leverage machine learning and AI effectively within the framework of global privacy standards.

Preserve Privacy in LLMs with Anonos Data Embassy


Data security and privacy are critical concerns in developing and applying LLMs and generative AI tools. Employing protection technologies helps you mitigate these risks and streamline access to high-quality data necessary for creating effective AI systems.

Identifying a technical solution that can provide a robust level of protection and data governance while safeguarding the utility and fidelity of data is crucial.

Anonos Data Embassy, a data-centric security platform, is designed to simplify, automate, and accelerate sensitive data processing throughout the AI lifecycle without the utility tradeoff.
Image 9: Anonos Data Embassy provides seamless protection for the lifecycle of AI & ML development and deployment.
The platform applies protection transformations to sensitive data, producing Variant Twins™. Based on your AI project’s use case, digital policies are infused into Variant Twins to ensure data is secured no matter where it goes while maintaining the highest utility.

The core principle of Data Embassy is Zero Trust Data™, an approach that prioritizes protecting data throughout its entire lifecycle rather than relying on network or perimeter defenses alone.
Image 10: Anonos Data Embassy offers data protection for fine-tuning, RAG, and prompts.
Data Embassy's Prompt, Response, RAG, and Tuning protection capabilities will help you harness the full benefits of emerging Generative AI without the risk of sensitive data, trade secrets, or IP leakage, delivering data security and unparalleled performance.
1. Gartner. (2023, January). Market Guide for AI Trust, Risk and Security Management. By Avivah Litan, Jeremy D'Hoinne, Bart Willemsen, Sumit Agarwal.

7. Any other model could be used for the assisted evaluation, but this particular model performs very well as it runs locally and was already set up. In any case, the inferences of the model for the assisted evaluation were independent from the RAG, i.e. there was no shared history.

8. This score threshold of 7 points is based on the subset of questions used to confirm the viability of the LLM-assisted evaluation.

9. Notice that no score equal to 1 was given by the LLM-assisted evaluation, and the instances of scores being 0 are attributed to errors in the LLM's output, as detailed in the methods section.