Great news! A small number of samples can poison LLMs of any size

Auth@lemmy.world · 2 months ago

Great news! A small number of samples can poison LLMs of any size

hexagonwin@lemmy.sdf.org · 2 months ago

I think it’s pretty obvious. Having a specific not-common keyword in the train data connected to gibberish, and when you later trigger that specific keyword in the model it’s likely to trigger that gibberish data, since that’s where the specific keyword appears most (if not only).

Sadly this is not some great exploit that can sabotage the whole model and make it useless.