1 code implementation • 20 May 2024 • Ermo Hua, Biqing Qi, Kaiyan Zhang, Yue Yu, Ning Ding, Xingtai Lv, Kai Tian, BoWen Zhou
To obtain a unified understanding, we interpret SFT and PO with two sub-processes -- Preference Estimation and Transition Optimization -- defined at token level within the Markov Decision Process (MDP) framework.
no code implementations • 5 Mar 2024 • Kaiyan Zhang, Jianyu Wang, Ermo Hua, Biqing Qi, Ning Ding, BoWen Zhou
With the advancement of language models (LMs), their exposure to private data is increasingly inevitable, and their deployment (especially for smaller ones) on personal devices, such as PCs and smartphones, has become a prevailing trend.