The LLMs (Large Language Models) are evolving rapidly with continuous advancements in their research and applications.However, this progress also attracts threat actors who actively exploit LLMs for various malicious activities like:Recently, cybersecurity researchers at Google discovered how threat actors can exploit ChatGPT queries to collect personal data.
Protect Your Storage With SafeGuard
StorageGuard scans, detects, and fixes security misconfigurations and vulnerabilities across hundreds of storage and backup devices.
Try StorageGuard for Free
Cybersecurity analysts developed a scalable method that detects memorization in trillions of tokens, analyzing open-source and semi-open models.
Besides this, researchers identified that the larger and more capable models are vulnerable to data extraction attacks.
GPT-3.5-turbo shows minimal memorization due to alignment as a helpful chat assistant. Using a new prompting strategy, the model diverges from chatbot-style responses, resembling a base language model.
Researchers test its output against a nine-terabyte web-scale dataset, recovering over ten thousand training examples at a $200 query cost, with the potential for extracting 10× more data.
Security analysts assess past extraction attacks in a controlled setting, focusing on open-source models with publicly available training data.
Using Carlini et al.’s method, they downloaded 108 bytes from Wikipedia, generating prompts by sampling continuous 5-token blocks.
Unlike prior methods, they directly query the model’s open-source training data to evaluate attack efficacy, eliminating the need for manual internet searches.
Researchers tested their attack on 9 open-source models tailored for scientific research, providing access to their complete training, pipeline, and dataset for study.
Here below, we have mentioned all 9 open-source models:-
- GPT-Neo (1.3B, 2.7B, 6B)
- Pythia (1.4B, 1.4B-dedup, 6.9B, 6.9B-dedup)
- RedPajama-INCITE (Base-3B-v1, Base-7B)
Semi-closed models have downloadable parameters but undisclosed training datasets and algorithms.
Despite generating outputs similarly, establishing ‘ground truth’ for extractable memorization requires experts due to inaccessible training datasets.
Here below, we have mentioned all the semi-closed models that are tested:-
- GPT-2 (1.5b)
- LLaMA (7b, 65b)
- Falcon (7b, 40b)
- Mistral 7b
- OPT (1.3b, 6.7b)
While extracting the data from ChatGPT, researchers found two major challenges, and here below, we have mentioned those challenges:-
- Challenge 1: Chat breaks the continuation interface.
- Challenge 2: Alignment adds evasion.
Researchers extract training data from ChatGPT through a divergent attack, but it lacks generalizability to other models.
Despite limitations in testing for memorization, they use known samples from the extracted training set to measure discoverable memorization.
For the 1,000 longest memorized examples, they prompt ChatGPT with the first N−50 tokens and generate a 50-token completion to assess discoverable memorization.
ChatGPT is highly susceptible to data extraction attacks due to over-training for extreme-scale, high-speed inference.
The trend of over-training on vast amounts of data poses a trade-off between privacy and inference efficiency.
Speculation arises about ChatGPT’s multiple-epoch training, potentially amplifying memorization and allowing easy extraction of training data.
Experience how StorageGuard eliminates the security blind spots in your storage systems by trying a 14-day free trial.