Unlearning techniques have been used to make generative AI models forget specific, undesirable information picked up during training, such as sensitive private data or copyrighted material. However, a new study by researchers at the University of Washington, Princeton, the University of Chicago, USC, and Google suggests that current unlearning techniques can degrade models to the point where they struggle to answer even basic questions.
The study found that today’s most popular unlearning techniques tend to degrade models, leaving them less efficient, less accurate and, in some cases, effectively unusable. Weijia Shi, a researcher on the study, said that currently available unlearning methods are not ready for real-world use: there is no efficient method that lets a model forget specific data without a considerable loss of utility.
Generative AI models are statistical systems that learn to predict words, images, speech, music, videos, and other data based on patterns in their training data. They possess no real intelligence; they simply make informed guesses conditioned on the context and the patterns they have learned. These models are typically trained on data sourced from public websites and datasets, often without informing or crediting the data’s owners, which has led to copyright disputes.
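To make that idea concrete, here is a deliberately tiny sketch, nothing like what vendors actually ship, of pattern-based prediction: a toy bigram model that “learns” which word tends to follow another by counting co-occurrences in a snippet of text, then guesses the likeliest continuation. Real generative models use neural networks over vastly more context, but the underlying principle is the same.

```python
# Toy illustration of statistical next-word prediction via bigram counts.
from collections import Counter, defaultdict

training_text = "the cat sat on the mat and the cat slept on the mat".split()

# Count how often each word follows each other word in the training text.
follows = defaultdict(Counter)
for prev, nxt in zip(training_text, training_text[1:]):
    follows[prev][nxt] += 1

def predict_next(word):
    """Return the follower of `word` seen most often during training."""
    if word not in follows:
        return None
    return follows[word].most_common(1)[0][0]

print(predict_next("the"))   # e.g. "cat"
print(predict_next("sat"))   # "on"
```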
Unlearning techniques have gained attention due to the copyright dilemma and the need to remove sensitive information from existing models. While some vendors have introduced opt-out tools to allow data owners to request the removal of their data from future models, unlearning would provide a more thorough approach to data deletion.
However, unlearning is not as simple as pressing “Delete.” Current techniques rely on algorithms that steer a model away from the data to be unlearned, nudging its predictions so that it stops reproducing, or drawing on, that data. To test how well different unlearning algorithms work, the researchers built a benchmark called MUSE (Machine Unlearning Six-way Evaluation), which checks both whether an unlearned model stops regurgitating its training data verbatim and whether it actually loses the knowledge derived from that data.
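One widely used baseline in the unlearning literature is gradient ascent on the “forget” set: the model is deliberately trained to fit that data worse, so it becomes less likely to reproduce it. The sketch below illustrates the idea on a toy model; the model, random data, and hyperparameters are illustrative assumptions, not the setup used in the study.

```python
# Minimal sketch of gradient-ascent unlearning on a toy next-token model.
import torch
import torch.nn as nn

torch.manual_seed(0)
VOCAB = 100  # toy vocabulary size

class TinyLM(nn.Module):
    """Toy next-token predictor standing in for a real language model."""
    def __init__(self, vocab=VOCAB, dim=32):
        super().__init__()
        self.embed = nn.Embedding(vocab, dim)
        self.head = nn.Linear(dim, vocab)

    def forward(self, tokens):
        return self.head(self.embed(tokens))  # logits for every position

model = TinyLM()
opt = torch.optim.SGD(model.parameters(), lr=1e-2)
loss_fn = nn.CrossEntropyLoss()

def mean_loss(inputs, targets):
    """Average next-token loss; higher means the model recalls the data less."""
    with torch.no_grad():
        logits = model(inputs)
        return loss_fn(logits.view(-1, VOCAB), targets.view(-1)).item()

# Toy "forget" data (to be unlearned) and "retain" data (should stay intact).
forget_x, forget_y = torch.randint(0, VOCAB, (8, 16)), torch.randint(0, VOCAB, (8, 16))
retain_x, retain_y = torch.randint(0, VOCAB, (8, 16)), torch.randint(0, VOCAB, (8, 16))

for step in range(20):
    logits = model(forget_x)
    loss = loss_fn(logits.view(-1, VOCAB), forget_y.view(-1))
    # Gradient *ascent*: maximize loss on the forget set so the model stops
    # reproducing it. Pushed too hard, this is exactly what erodes utility.
    (-loss).backward()
    opt.step()
    opt.zero_grad()

print("forget-set loss:", mean_loss(forget_x, forget_y))
print("retain-set loss:", mean_loss(retain_x, retain_y))
```

A MUSE-style evaluation asks both questions at once: did the forget-set loss go up (the data is no longer recalled), and did performance on everything else stay put (general capability is preserved)?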
In their study, the researchers found that the tested unlearning algorithms did make models forget the targeted information, but at the cost of a significant drop in general question-answering ability. That trade-off is hard to escape because knowledge is intricately entangled within a model: forcing it to forget copyrighted material, for example, also erodes what it knows about related, freely available content.
There is currently no good solution to this problem, which underscores the need for further research. Vendors counting on unlearning to resolve their training-data issues will, for now, need other ways to keep their models from producing undesirable outputs. A technical breakthrough may eventually make unlearning practical, but today it remains an open challenge for the field of generative AI.