<Multi-modal> [LLaVA-1.5] Improved Baselines with Visual Instruction Tuning

I read a recently published paper (October 2023) and wrote a brief summary. If anything is missing or incorrect, please let me know in the comments.

[Microsoft Research] LLaVA-1.5 is released, built on LLaVA's fully-connected vision-language cross-modal connector. It is data-efficient (1.2M public data) and powerful (SoTA on 11 benchmarks).

Background
Recently, interest has been growing not only in LLMs but also in LMMs, i.e., Large Multimodal Models. Here too, a lot of research is going into techniques that improve performance without tuning the entire model. Among these..
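The following is a minimal pure-Python sketch of the fully-connected cross-modal connector idea. It assumes the connector is a small MLP (linear, GELU, linear). The MLP maps each vision-encoder patch feature into the LLM's token-embedding space, so the resulting "visual tokens" can be placed in the same sequence as the text embeddings. A single linear layer would also count as fully connected; the two-layer form is an illustrative assumption. The class name `FCConnector`, the toy dimensions, and the random initialization are hypothetical and are not the paper's code. Real models use far larger sizes and learned weights.

```python
import math
import random


def gelu(x: float) -> float:
    # Exact GELU: x * Phi(x), where Phi is the standard normal CDF
    return 0.5 * x * (1.0 + math.erf(x / math.sqrt(2.0)))


def linear(x, weight, bias):
    # x: [in_dim], weight: [out_dim][in_dim], bias: [out_dim]
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weight, bias)]


def init_linear(in_dim, out_dim, rng):
    # Small random weights and zero bias (hypothetical init, for illustration only)
    weight = [[rng.gauss(0.0, 0.02) for _ in range(in_dim)] for _ in range(out_dim)]
    bias = [0.0] * out_dim
    return weight, bias


class FCConnector:
    """Hypothetical fully-connected vision-to-language connector (2-layer MLP)."""

    def __init__(self, vision_dim, llm_dim, seed=0):
        rng = random.Random(seed)
        self.fc1 = init_linear(vision_dim, llm_dim, rng)
        self.fc2 = init_linear(llm_dim, llm_dim, rng)

    def __call__(self, patch_features):
        # patch_features: one vision feature vector (length vision_dim) per image patch
        tokens = []
        for feat in patch_features:
            hidden = [gelu(v) for v in linear(feat, *self.fc1)]
            tokens.append(linear(hidden, *self.fc2))
        return tokens  # one LLM-space visual token per patch


# Toy sizes (hypothetical): 4 image patches with 8-dim features, LLM width 16
rng = random.Random(1)
patches = [[rng.gauss(0.0, 1.0) for _ in range(8)] for _ in range(4)]
text_embeds = [[0.0] * 16 for _ in range(3)]  # stand-in for 3 embedded text tokens

connector = FCConnector(vision_dim=8, llm_dim=16)
visual_tokens = connector(patches)
llm_input = visual_tokens + text_embeds  # visual tokens placed before the text tokens
print(len(llm_input), len(llm_input[0]))  # 7 16
```

Only the connector maps between the two modalities here. The vision encoder and the LLM are represented just by their input and output shapes, which is why such a connector can be cheap to train relative to the whole model.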

Original post: <Multi-modal> [LLaVA-1.5] Improved Baselines with Visual Instruction Tuning
