Geewook Kim Profile
Geewook Kim

@GeewookKim

Followers: 238
Following: 169
Media: 3
Statuses: 47

Applied research scientist at NAVER Cloud AI / Ph.D. student at KAIST AI / Previously at Kyoto University / Homepage: https://t.co/9Ncn1jKHZi

Joined December 2021
@GeewookKim
Geewook Kim
3 years
Donut🍩(OCR-free Document Understanding Transformer #ECCV2022 @eccvconf) is now available @huggingface🤗 Check it out at https://t.co/m7uM76PAbx with @Gradio demos from @NielsRogge Classification: https://t.co/vWTQ3xDfHn Parsing: https://t.co/NCorkyvnsx VQA: https://t.co/qNmzAVEIia
huggingface.co
@NielsRogge
Niels Rogge
3 years
#LayoutLM gets a strong competitor: Donut 🍩, now available @huggingface! The model uses Swin as encoder, BART as decoder to autoregressively generate classes/parses/answers related to documents! 🔥 No OCR required, MIT licensed, end-to-end. Attention is all you need. (1/2)
2
35
140
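For context, here is a minimal sketch of what running Donut for document VQA through 🤗 transformers typically looks like. The checkpoint name (naver-clova-ix/donut-base-finetuned-docvqa), the task prompt, and the image path are illustrative assumptions, since the shortened links above don't show which checkpoints the demos use.

```python
# Hedged sketch: Donut document VQA via HuggingFace transformers.
# Checkpoint name, prompt format, and image path are assumptions for illustration.
import re
from PIL import Image
from transformers import DonutProcessor, VisionEncoderDecoderModel

ckpt = "naver-clova-ix/donut-base-finetuned-docvqa"   # assumed DocVQA fine-tune
processor = DonutProcessor.from_pretrained(ckpt)
model = VisionEncoderDecoderModel.from_pretrained(ckpt)

image = Image.open("document.png").convert("RGB")     # any document image (placeholder path)
question = "What is the total amount?"
prompt = f"<s_docvqa><s_question>{question}</s_question><s_answer>"

pixel_values = processor(image, return_tensors="pt").pixel_values
decoder_input_ids = processor.tokenizer(
    prompt, add_special_tokens=False, return_tensors="pt"
).input_ids

# Autoregressive decoding: no OCR step, the decoder generates the structured answer directly.
outputs = model.generate(
    pixel_values,
    decoder_input_ids=decoder_input_ids,
    max_length=model.decoder.config.max_position_embeddings,
    pad_token_id=processor.tokenizer.pad_token_id,
    eos_token_id=processor.tokenizer.eos_token_id,
)

sequence = processor.batch_decode(outputs)[0]
sequence = sequence.replace(processor.tokenizer.eos_token, "").replace(
    processor.tokenizer.pad_token, ""
)
sequence = re.sub(r"<.*?>", "", sequence, count=1).strip()  # drop the task-start token
print(processor.token2json(sequence))                       # e.g. {'question': ..., 'answer': ...}
```

Swapping the checkpoint and task prompt switches the same pipeline between the classification, parsing, and VQA demos linked above.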
@hyunji_amy_lee
hyunji amy lee
6 months
🚨 Want models to better utilize and ground on the provided knowledge? We introduce Context-INformed Grounding Supervision (CINGS)! Training LLM with CINGS significantly boosts grounding abilities in both text and vision-language models compared to standard instruction tuning.
2
46
123
@GeewookKim
Geewook Kim
8 months
Presenting our poster at #ICLR2025 today (Fri, Apr 25, 15:00) — Hall 3 + Hall 2B #264! We explored safety issues when extending LLMs to vision and how to address them. Come by and let’s chat—always happy to discuss ideas! 🤗
0
4
10
@GeewookKim
Geewook Kim
11 months
I'm delighted to share that our latest research endeavors have been accepted! 1. At #NAACL2025, we'll present "Evaluating Multimodal Generative AI with Korean Educational Standards," marking a step forward in aligning AI with rigorous Korean educational tests. 2. For #ICLR2025,
1
1
28
@sylee_ai
Seongyun Lee
11 months
🎉 Excited to share that our paper "How Does Vision-Language Adaptation Impact the Safety of Vision Language Models?" has been accepted to #ICLR2025! 🖼 Vision-Language Adaptation empowers LLMs to process visual information—but how does it impact their safety? 🛡 And what about
1
16
61
@GeewookKim
Geewook Kim
1 year
I’m pleased to share my recent work at #EMNLP2024 today! Join me at In-Person Poster Session G (Jasmine) on 14 Nov 2024, from 2:00 PM! I am also happy to share that our project is now open-source🤗: https://t.co/yoNBhUIwww @emnlpmeeting #emnlp
@GeewookKim
Geewook Kim
1 year
Happy to share that our new work on designing Efficient LVLMs for Reading and Reasoning has been accepted at #EMNLP2024 Main Conference! https://t.co/t2LOqPKs1W We've studied efficient designs to reduce the resource costs in current VLMs. So happy to contribute to the field! ❤️
1
1
24
@GeewookKim
Geewook Kim
1 year
Happy to share that our new work on designing Efficient LVLMs for Reading and Reasoning has been accepted at #EMNLP2024 Main Conference! https://t.co/t2LOqPKs1W We've studied efficient designs to reduce the resource costs in current VLMs. So happy to contribute to the field! ❤️
3
10
110
@GeewookKim
Geewook Kim
1 year
September 2024: My citations have reached 1,000 on Google Scholar 🎉 This milestone reminds me of all the collective efforts and small steps taken over time. I’m deeply grateful to my colleagues and mentors for their support and guidance along the way 🥰 https://t.co/MbCZtCI7QZ
scholar.google.com
NAVER Cloud AI & KAIST AI - Large Language Models - Multimodal LLMs - Document AI
4
0
19
@kchonyc
Kyunghyun Cho
1 year
enjoying #ICML2024 ? already finished with llama-3.1 tech report? if so, you must be concerned about the emptiness you'll feel on your flight back home in a couple of days. do not worry! Wanmo and i have a new textbook on linear algebra for you to read, enjoy and cry on your
20
243
2K
@sylee_ai
Seongyun Lee
2 years
I’m thrilled to announce that Prometheus-Vision has been accepted to the ACL 2024 Findings! A huge thanks to all co-authors! See you in Bangkok 🇹🇭!
@seungonekim
Seungone Kim
2 years
🤔How could you evaluate whether your Vision Language Model (VLM) is approaching the capabilities of GPT-4V? We’re excited to present 🔥Prometheus-Vision, the first open-source VLM specialized for evaluating other VLMs based on fine-grained scoring criteria, with co-lead
0
4
23
@SungdongKim4
Sungdong Kim
2 years
🤔 Do we always need a human preference for effective LLM alignment after an SFT stage? Our answer is NO 🙅‍♂️ We present a ✨preference-free alignment approach✨, leveraging an off-the-shelf retriever with effective regularizer functions: Regularized Relevance Reward (R^3). [1/n]
1
47
152
@YGandelsman
Yossi Gandelsman
2 years
Accepted to oral #ICLR2024! *Interpreting CLIP's Image Representation via Text-Based Decomposition* CLIP produces image representations that are useful for various downstream tasks. But what information is actually encoded in these representations? [1/8]
10
68
459
@_akhaliq
AK
2 years
Apple presents AIM: Scalable Pre-training of Large Autoregressive Image Models. Paper page: https://t.co/tT5Kr7Ld8y The paper introduces AIM, a collection of vision models pre-trained with an autoregressive objective. These models are inspired by their textual counterparts, i.e.,
6
109
597
@seungonekim
Seungone Kim
2 years
🤔How could you evaluate whether your Vision Language Model (VLM) is approaching the capabilities of GPT-4V? We’re excited to present 🔥Prometheus-Vision, the first open-source VLM specialized for evaluating other VLMs based on fine-grained scoring criteria, with co-lead
3
44
145
@odashi_t
Odashi
2 years
We built a QA dataset with human-performed retrieval by running roughly 1,000 short questions that had to be answered using only information written on Wikipedia. The intermediate steps and citations are also recorded, so the data should let you study simulations of how humans search. https://t.co/5uJRSiqQfZ
huggingface.co
1
53
274
@omarsar0
elvis
2 years
Improving Information Retrieval in LLMs One effective way to use open-source LLMs is for search tasks, which could power many other applications. This work explores the use of instruction tuning to improve a language model's proficiency in information retrieval (IR) tasks.
8
157
641
@de9uch1_
Hiroyuki Deguchi
2 years
So I wrote a Bash one-liner that simply downloads anthology.bib from the ACL Anthology automatically and splits it in two. Copy-paste and run it, and the split bib files are generated. It's messy, so someone please rewrite it. https://t.co/6u7xhX2Dmp
gist.github.com
one-liner for splitting anthology.bib.
0
3
14
@sylee_ai
Seongyun Lee
2 years
We are excited to introduce 🌋 Volcano, a multimodal model that revises hallucination in responses through self-feedback. It achieves state-of-the-art on multimodal hallucination benchmarks.
2
28
75
@seungonekim
Seungone Kim
2 years
Excited to present 🔥Prometheus, a fully open-source evaluator LM that is on par with GPT-4 evaluation when the “appropriate” reference materials are appended! * Could generalize to customized score rubrics * Shows high correlation with both human evaluators & GPT-4 evaluation
9
51
343
@arankomatsuzaki
Aran Komatsuzaki
2 years
ChunkAttention: Efficient Attention on KV Cache with Chunking Sharing and Batching Batching long shared prompt prefixes to dramatically speed up self-attention based on chunked KV cache https://t.co/5uzQcj528f
4
37
198
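To make the one-line summary above concrete, here is a toy sketch (not the paper's implementation) of the prefix-sharing idea it describes: the key/value cache of a long shared prompt prefix is computed once and reused across requests, so each request only projects its own new tokens. All shapes and names are illustrative, and causal masking is omitted for brevity.

```python
# Toy illustration of shared-prefix KV caching (single head, no masking, not ChunkAttention itself).
import torch

d = 16
torch.manual_seed(0)
wq, wk, wv = (torch.randn(d, d) for _ in range(3))

def attend(q, k, v):
    scores = (q @ k.transpose(-1, -2)) / d ** 0.5
    return torch.softmax(scores, dim=-1) @ v

shared_prefix = torch.randn(32, d)                           # e.g. a long system prompt shared by every request
prefix_k, prefix_v = shared_prefix @ wk, shared_prefix @ wv  # computed once, kept in the cache

for _ in range(3):                       # several requests reuse the same cached prefix KV
    suffix = torch.randn(4, d)           # request-specific tokens
    k = torch.cat([prefix_k, suffix @ wk])
    v = torch.cat([prefix_v, suffix @ wv])
    q = suffix @ wq                      # queries only for the new tokens
    out = attend(q, k, v)
    print(out.shape)                     # torch.Size([4, 16]): attends over prefix + suffix
```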
@GeewookKim
Geewook Kim
2 years
Happy to share that our paper has been accepted to #EMNLP2023 !🥰 Please stay tuned for the updated version, along with the code & data! Eagerly looking forward to connecting with everyone in Singapore 🙌🇸🇬 Feel free to check our progress here🍦👉:
github.com
Visually-Situated Natural Language Understanding with Contrastive Reading Model and Frozen Large Language Models, EMNLP 2023 - naver-ai/cream
@arankomatsuzaki
Aran Komatsuzaki
3 years
Cream: Visually-Situated Natural Language Understanding with Contrastive Reading Model and Frozen Large Language Models Significantly outperforms the existing SotA models on visual document understanding https://t.co/siGo14j4O0
0
21
59