no code implementations • 22 Apr 2024 • Subhojyoti Mukherjee, Anusha Lalitha, Kousha Kalantari, Aniket Deshmukh, Ge Liu, Yifei Ma, Branislav Kveton
Learning of preference models from human feedback has been central to recent advances in artificial intelligence.
no code implementations • 12 Apr 2024 • Subhojyoti Mukherjee, Anusha Lalitha, Aniket Deshmukh, Ge Liu, Yifei Ma, Branislav Kveton
We propose two algorithms, GO and SAL, which differ in how the few-shot examples are chosen.