- cross-posted to:
- [email protected]
- [email protected]
- cross-posted to:
- [email protected]
- [email protected]
So the research is out and these LLMs will always be vunerable to poisoned data. That means it will always be worth out time and effort to poison these models and they will never be reliable.


I think it’s pretty obvious. Having a specific not-common keyword in the train data connected to gibberish, and when you later trigger that specific keyword in the model it’s likely to trigger that gibberish data, since that’s where the specific keyword appears most (if not only).
Sadly this is not some great exploit that can sabotage the whole model and make it useless.