This year will probably be remembered for the revolution of ChatGPT (the website was visited by 1.7 billion users in October 2023, with 13.73% of growth compared to the previous month) and for the widespread adoption of generative AI technologies in our daily life. One of the key aspects of the language models used for generative AI is the training dataset, and despite the controls in place for protecting data privacy, the risk of using sensitive or protected information to train the model and the possibility of having this content inadvertently leaked is real. The latest warning comes from a paper published by researchers from Google and a team of academics: using a technique known as extractable memorization, the researchers were able to extract gigabytes of training data from several language models, including ChatGPT.
In what is called “a divergence attack” the academics discovered that asking the model to repeat a word forever (for example in the paper they showed the explicit example of the term “poem”) caused it to diverge and start generating nonsensical output. The problem is that a small fraction of these generations diverged into memorization, leaking pre-training data. But a small fraction can become an important amount of data for a motivated adversary with a dedicated budget who is able to perform queries at scale.
In fact, with just $200 USD worth of queries to ChatGPT (gpt-3.5-turbo), the researchers were able to extract more than 10,000 unique verbatim-memorized training examples, concluding that an adversary with a dedicated budget could likely extract “far more data,” and that lar