<Attention> [Attention Sinks] Efficient Streaming Language Models with Attention Sinks

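I read a recent paper (2023.09) and wrote up a short summary. If anything is missing or incorrect, please let me know in the comments.

[MIT, Meta AI] Attention Sinks is a method that keeps the Keys and Values of the initial tokens throughout the attention process. The paper proposes StreamingLLM, which lets an LLM trained with a finite-length attention window generalize to sequences of unbounded length.

Background

LLMs do perform well across many tasks, but once the input exceeds a certain length they cannot process it at all. Extending the input length instead runs into the fact that the attention computation is quadratic in the sequence length, so..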
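The core idea, keeping the Keys/Values of the first few tokens permanently while the rest of the cache slides forward, can be sketched as a KV cache eviction policy. This is a minimal sketch and not the authors' code. The class name `SinkKVCache`, the default of 4 sink tokens, and the 1020-entry window are illustrative assumptions, and the keys and values are stand-in objects rather than real tensors.

```python
from collections import deque


class SinkKVCache:
    """Hypothetical rolling KV cache: the first `n_sink` entries (attention
    sinks) are never evicted, and only the `window` most recent entries
    after them are kept."""

    def __init__(self, n_sink=4, window=1020):
        self.n_sink = n_sink
        self.window = window
        self.sinks = []                     # KV pairs of the initial tokens, kept forever
        self.recent = deque(maxlen=window)  # sliding window; oldest pair drops automatically

    def append(self, key, value):
        # The first n_sink tokens fill the sink slots; every later token goes to the window.
        if len(self.sinks) < self.n_sink:
            self.sinks.append((key, value))
        else:
            self.recent.append((key, value))

    def entries(self):
        # The set of KV pairs the next query would attend over: sinks first, then the window.
        return self.sinks + list(self.recent)


cache = SinkKVCache(n_sink=4, window=3)
for t in range(10):
    cache.append(f"k{t}", f"v{t}")
print([k for k, _ in cache.entries()])  # ['k0', 'k1', 'k2', 'k3', 'k7', 'k8', 'k9']
```

Because the cache never holds more than `n_sink + window` entries, the memory and per-token attention cost stay constant no matter how long the stream grows, whereas full attention over the whole sequence grows quadratically.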

Original post: <Attention> [Attention Sinks] Efficient Streaming Language Models with Attention Sinks

