<LM> DeBERTaV3: Improving DeBERTa using ELECTRA-Style Pre-Training with Gradient-Disentangled Embedding Sharing

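I read a recently published paper (2023.03) and wrote up a short summary. If anything is missing or wrong, please leave a comment. [Microsoft Research / Azure AI] The paper replaces DeBERTa's masked language modeling (MLM) objective with replaced token detection (RTD) and applies a new gradient-disentangled embedding sharing method. It also develops a multilingual model, mDeBERTaV3. Background: DeBERTa, the model introduced in the previous post, featured two main components: disentangled attention, which better captures relative position, and an enhanced mask decoder (EMD), which incorporates absolute position. In this paper, DeBERTa..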
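To make the MLM-to-RTD swap mentioned above more concrete, here is a minimal sketch of how replaced token detection labels can be built, following the ELECTRA-style setup the title refers to. The function name `make_rtd_example`, the toy sentence, and the fixed lookup standing in for the generator are all hypothetical and are not taken from the paper's code; a real generator would be a small MLM that samples a token for each masked position.

```python
def make_rtd_example(tokens, mask_positions, generator_sample):
    """Build discriminator inputs and labels for replaced token detection.

    tokens           : list of original tokens
    mask_positions   : indices that were masked for the generator
    generator_sample : callable (tokens, pos) -> sampled token
                       (hypothetical stand-in for an MLM generator)
    """
    corrupted = list(tokens)
    for pos in mask_positions:
        # The generator fills each masked position with a sampled token.
        corrupted[pos] = generator_sample(tokens, pos)

    # Label 1 = replaced, 0 = original. If the generator happens to sample
    # the original token, that position counts as original (as in ELECTRA).
    labels = [int(c != t) for c, t in zip(corrupted, tokens)]
    return corrupted, labels


# Toy usage: a fixed lookup plays the role of the generator (assumption).
tokens = ["the", "chef", "cooked", "the", "meal"]
fixed_samples = {2: "ate", 4: "meal"}
corrupted, labels = make_rtd_example(
    tokens, [2, 4], lambda toks, pos: fixed_samples[pos]
)
print(corrupted)
print(labels)
```

The discriminator is trained to predict `labels` for every position of `corrupted`, so it receives a learning signal from all tokens rather than only the masked ones.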

Original link: <LM> DeBERTaV3: Improving DeBERTa using ELECTRA-Style Pre-Training with Gradient-Disentangled Embedding Sharing
