<Attention> [TransNormer] Scaling TransNormer to 175 Billion Parameters

I read a recent paper (2023.07) and wrote up a brief summary. If anything is missing or incorrect, please leave a comment.

TransNormerLLM is an LLM based on linear attention rather than traditional softmax-based attention. It applies techniques such as positional embedding, linear attention acceleration, a gating mechanism, tensor normalization, and inference acceleration. It also proposes Lightning Attention, which accelerates linear attention.

Background: Most AI models are built on the Transformer architecture and tremendous..

Original link: <Attention> [TransNormer] Scaling TransNormer to 175 Billion Parameters
