<RL, Fine-Tuning> [ByteDance] ReFT - Reasoning with Reinforced Fine-Tuning (2024.01)

I read NLP papers that interest me and summarize them with the help of ChatGPT. If anything is missing or incorrect, please leave a comment.

[ByteDance Research]
- An approach that, when applying SFT to CoT data, makes use of the multiple reasoning paths that can exist for each question
- Strong generalizability is confirmed on three math problem-solving benchmarks (GSM8K, MathQA, SVAMP)
- Proposes Reinforced Fine-Tuning, which applies PPO after a warmup stage with SFT
- The method can be combined with various inference-time strategies

1. Introduction
Until now, solving math problems ..
