Direct Preference Optimization: Your Language Model is Secretly a Reward Model

Advances in Neural Information Processing Systems - Volume 36 - Pages 53728-53741 - 2023
Rafael Rafailov, Archit Sharma, Eric Mitchell, Christopher D. Manning, Stefano Ermon, Chelsea Finn

Abstract

Keywords


References