DS+, BK 2024 First Statistics Seminar Announcement (Wed, Feb 7)
DS Plus
2024-02-04
We are pleased to announce the first BK Statistics Seminar of 2024.
This seminar is jointly hosted by the Statistical Research Institute of the Department of Statistics at Korea University, the BK21 Statistics Education and Research Team, and the DS+ Project Group.
Date & Time: Wednesday, February 7, 2024, 5:00 PM
Venue: Room 206, Political Science and Economics Building (Jeonggyeong-gwan), Korea University
Speaker: Prof. Kyungjae Lee (Department of Artificial Intelligence, Chung-Ang University)
Topic:
Advancements in Reinforcement Learning from Human Feedback: A Perspective on Feedback Efficiency
Abstract:
Reinforcement learning from human feedback (RLHF) presents a promising approach to address the challenge of designing task-specific reward functions in reinforcement learning by leveraging human preferences. However, existing RLHF models often suffer from inefficiencies, generating only a single preference data point per human feedback query. To overcome this limitation, a novel RLHF framework named SeqRank is introduced, employing sequential preference ranking to enhance feedback efficiency. SeqRank adopts a systematic approach to sampling trajectories, iteratively selecting defenders and challengers to optimize trajectory comparison strategies. By maintaining the history of preference relationships, SeqRank efficiently augments preference data without additional human feedback. The proposed framework substantially improves feedback efficiency and outperforms conventional methods, demonstrating faster convergence rates and enhanced task performance in locomotion and manipulation tasks. Notably, root pairwise comparison emerges as the most effective method, showing significant improvements in task performance over baseline approaches. Experimental results underscore the effectiveness and practical applicability of the proposed framework, highlighting its potential to advance reinforcement learning paradigms.
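The defender/challenger idea in the abstract can be sketched in a few lines. This is an illustrative toy, not the speaker's implementation: `seq_rank` and `prefer` are hypothetical names, and `prefer` stands in for a single human feedback query. The point is that the comparison history lets transitivity yield extra preference pairs at no additional query cost.

```python
def seq_rank(samples, prefer):
    """Sequential preference ranking sketch.

    prefer(a, b) -> True if a is preferred to b (one human query).
    Returns (direct, augmented): pairs (winner, loser) from actual
    feedback, and pairs inferred by transitivity from the history.
    """
    defender = samples[0]
    beaten = []                 # everything known to rank below the defender
    direct, augmented = [], []
    for challenger in samples[1:]:
        if prefer(challenger, defender):          # one human query
            direct.append((challenger, defender))
            # transitivity: challenger > defender > each x in beaten
            augmented.extend((challenger, x) for x in beaten)
            beaten.append(defender)
            defender = challenger
        else:
            direct.append((defender, challenger))
            beaten.append(challenger)
    return direct, augmented

# Toy usage with numeric "trajectories": higher score = preferred.
direct, augmented = seq_rank([2, 5, 1, 7], prefer=lambda a, b: a > b)
# 3 human queries yield 3 direct pairs plus 2 augmented pairs.
```

Here the three queries produce `direct = [(5, 2), (5, 1), (7, 5)]`, and the final winner inherits `augmented = [(7, 2), (7, 1)]` for free, which is the feedback-efficiency gain the abstract describes.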
Poster: Please see the attached file.
The Zoom link is below.
Join Zoom Meeting
https://korea-ac-kr.zoom.us/j/4145766503?pwd=FMeFToJRalvz6UDl8xOB9g1uQOyufg.1&omn=89053988454
Meeting ID: 414 576 6503
Passcode: Kustat123@
We look forward to your participation.
Thank you.