<Attention> Focused Transformer: Contrastive Training for Context Scaling

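I read a recent paper (July 2023) and wrote up a short summary. If anything is missing or wrong, please leave a comment.

[Google DeepMind] The attention layer accesses an external memory made up of (key, value) pairs. This lets the model accept much longer inputs and retrieve across multiple documents. The method is called the Focused Transformer (FoT), and the authors released LONGLLAMA, a model tuned from OpenLLaMA (3B, 7B).

Background

LLMs are remarkably capable, but they have a surprising weakness: they are confined to particular domains. Once a model has been trained with enormous amounts of data and compute, extending it is..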
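To make the external-memory idea in the summary more concrete, here is a minimal sketch in plain Python. It is not the paper's implementation. It assumes that memory entries are picked by largest inner product with the query (an exact top-k search) and that the retrieved pairs are simply mixed with the local context in one softmax. The names `memory_attention`, `local_kv`, `memory_kv`, and `top_k` are hypothetical and chosen only for illustration.

```python
import math


def dot(a, b):
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def memory_attention(query, local_kv, memory_kv, top_k=2):
    """Attend over local (key, value) pairs plus the top_k memory pairs.

    Assumption: memory pairs are ranked by inner product with the query.
    """
    # Retrieve the top_k memory entries most similar to the query.
    retrieved = sorted(
        memory_kv, key=lambda kv: dot(query, kv[0]), reverse=True
    )[:top_k]

    # Local context and retrieved memory share a single softmax.
    candidates = list(local_kv) + retrieved
    scale = math.sqrt(len(query))
    weights = softmax([dot(query, k) / scale for k, _ in candidates])

    # Weighted sum of the candidate values.
    dim = len(candidates[0][1])
    output = [
        sum(w * v[i] for w, (_, v) in zip(weights, candidates))
        for i in range(dim)
    ]
    return output, weights


# Toy example: one local pair and three memory pairs (made-up numbers).
local_kv = [([1.0, 0.0], [1.0, 0.0])]
memory_kv = [
    ([0.0, 1.0], [0.0, 1.0]),
    ([5.0, 0.0], [0.0, 5.0]),
    ([-1.0, 0.0], [9.0, 9.0]),
]
out, w = memory_attention([1.0, 0.0], local_kv, memory_kv, top_k=1)
```

In this toy run only the memory entry whose key best matches the query is retrieved, and it receives more attention weight than the local pair because its score is higher. A real system would hold far more entries in memory and would likely use approximate nearest-neighbor search instead of sorting everything.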

Original post: <Attention> Focused Transformer: Contrastive Training for Context Scaling
