Universal Jailbreak Backdoors from Poisoned Human Feedback

This presentation was prepared as an assignment for the 2024 NLP course at the MiNI Faculty of Warsaw University of Technology (WUT).

Original Paper: https://arxiv.org/abs/2311.14455

Bartosz Grabek, Filip Kucia, Szymon Trochimiak
