Ingo Ziegler
@IngoZiegler
Followers: 112 · Following: 773 · Media: 10 · Statuses: 95
ELLIS PhD Student at University of Copenhagen (NLP, Representation Learning, Generative Modeling)
Copenhagen, Denmark
Joined November 2012
I will be at #EMNLP2025 to present our TACL paper on synthetic data generation as an Oral! 📅 Presentation: Wednesday, 5 November 🕠 Time: 5:30 PM local time 📍 Location: Hall A102–103 🤝 Project was done together with @akoksal_ @delliott @HinrichSchuetze See you in Suzhou 🇨🇳
@IngoZiegler will present a synthetic data generation framework that rewrites real retrieved documents into task-specific finetuning examples. CRAFT is more stable than existing techniques like Self-Instruct and Evol-Instruct across several tasks. Paper: https://t.co/8CBshW6wwv
We show that structuring sequences of images and text in a multi-turn conversation style is very effective at improving the sequential reasoning ability of multimodal LLMs! Now accepted at @wacv_official. See you in Arizona🌵#WACV2026
Our paper ImageChain (with @IngoZiegler & @delliott) was accepted at #WACV2026! We explore how multimodal LLMs reason over sequences of images. I'll present it at the @_LXAI Workshop @NeurIPS 🇲🇽 (Nov 30, ~10:45 Mexico City time). Come chat if you're there! 🫶 📄 https://t.co/iQdaNZcDsn
Can we find weight directions that modify an LLM's behavior? Our new paper proposes contrastive weight steering, an alternative to activation steering for modifying behaviors using small, narrow-distribution datasets 🕹️ 🧵👇
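One plausible reading of "contrastive weight steering", as a minimal sketch under my own assumptions (not the paper's released code): finetune two copies of a model on small datasets exhibiting opposite behaviors, take the weight-space difference as the steering direction, and add a scaled copy of it to the base weights. All names below are illustrative.

```python
import torch

def steer_weights(base_sd, pos_sd, neg_sd, alpha=1.0):
    """base_sd: state_dict of the base model; pos_sd / neg_sd: state_dicts
    of copies finetuned on positive / negative behavior data."""
    steered = {}
    for name, w in base_sd.items():
        # Contrasting the two finetunes isolates a behavior direction
        # in weight space; alpha controls how strongly it is applied.
        direction = pos_sd[name] - neg_sd[name]
        steered[name] = w + alpha * direction
    return steered
```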
1/2) I am very happy to finally share something I have been working on, on and off, for the past year: "The Information Dynamics of Generative Diffusion". This paper connects entropy production, the divergence of vector fields, and spontaneous symmetry breaking in a unified framework.
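For context on how the first two notions can be linked, here is a standard identity derived from the Fokker-Planck equation (my own sketch, not a formula quoted from the paper): for a diffusion $dx = f(x,t)\,dt + g(t)\,dW$ with marginal density $p_t$, the entropy production splits into the expected divergence of the drift field plus a non-negative Fisher-information term:

```latex
% Entropy production for the diffusion dx = f(x,t) dt + g(t) dW:
% expected divergence of the drift plus a Fisher-information term.
\frac{\mathrm{d}H(p_t)}{\mathrm{d}t}
  = \mathbb{E}_{p_t}\!\bigl[\nabla \cdot f(x,t)\bigr]
  + \frac{g(t)^2}{2}\,
    \mathbb{E}_{p_t}\!\bigl[\lVert \nabla_x \log p_t(x) \rVert^2\bigr]
```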
Excited to share that our paper "Multilingual Pretraining for Pixel Language Models" has been accepted to the #EMNLP2025 main conference! Please see the thread below and the paper itself for more details.
Announcing our recent work “Multilingual Pretraining for Pixel Language Models”! We introduce PIXEL-M4, a pixel language model pretrained on four visually & linguistically diverse scripts: English, Hindi, Ukrainian & Simplified Chinese. #NLProc
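The core idea behind pixel language models, in a minimal sketch (my illustration, not the PIXEL-M4 code; the font choice and image size are assumptions): render text as an image and let a patch-based encoder consume pixels, so no subword vocabulary has to cover all four scripts.

```python
from PIL import Image, ImageDraw, ImageFont

def render_line(text, width=529, height=16):
    """Render a line of text as a small RGB image; a ViT-style encoder
    then reads it as fixed-size patches rather than as tokens."""
    img = Image.new("RGB", (width, height), "white")
    draw = ImageDraw.Draw(img)
    # A real multilingual setup needs fonts covering every script;
    # the default PIL font is only a stand-in here.
    draw.text((2, 2), text, fill="black", font=ImageFont.load_default())
    return img

render_line("hello नमस्ते Привіт 你好").save("line.png")
```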
📄 Read the paper: https://t.co/JUIjOc1zhi 💻 Code: https://t.co/Lptkv6usfy 📂 Dataset: StoryFrames is now available on @huggingface: https://t.co/RI8jZbg2ly This work was done in collaboration with @danaesavi and @delliott (6/6)
📊 To enable this task, we also introduce StoryFrames—a new dataset designed for sequential image reasoning! 🔹 8,881 curated samples from real-world videos 🔹 Human-annotated 🔹 Temporally coherent scene descriptions 🔹 Enables MLLMs to learn structured event progression (5/6)
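A hedged sketch of how one might load StoryFrames from the Hub (the tweet's link is shortened, so the repo id and field layout below are assumptions, not confirmed):

```python
from datasets import load_dataset

# "ingoziegler/StoryFrames" is a hypothetical repo id for illustration.
ds = load_dataset("ingoziegler/StoryFrames", split="train")
print(len(ds))  # 8,881 curated samples, per the announcement
sample = ds[0]  # expected: video frames + human-annotated scene descriptions
```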
🔥 ImageChain dominates across all conversation context lengths! 📈 Compared to current MLLMs & standard fine-tuning, ImageChain lifts performance from a 3% baseline up to 19% at generating descriptions similar to human-written ground truths! (4/6)
🔍 ImageChain solves this by treating image sequences as structured multi-turn conversations. ✨ Key ideas ✅ Images are interleaved with textual descriptions ✅ A next-scene-description task optimizes temporal understanding ✅ Instruction-tuning over multi-turn conversations (3/6)
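A minimal sketch of that structure (my illustration under an OpenAI-style chat schema; the roles and keys are assumptions, not the authors' exact format):

```python
def sequence_to_conversation(frames, descriptions):
    """Interleave an image sequence with scene descriptions as
    alternating user/assistant turns."""
    messages = []
    for i, (frame, desc) in enumerate(zip(frames, descriptions)):
        # Each user turn carries the next image in the sequence...
        messages.append({
            "role": "user",
            "content": [{"type": "image", "image": frame},
                        {"type": "text", "text": f"Describe scene {i + 1}."}],
        })
        # ...and the assistant turn answers with its description, so the
        # model is trained on next-scene description given prior turns.
        messages.append({"role": "assistant", "content": desc})
    return messages
```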
Why does sequential reasoning matter? Most MLLMs process images independently, failing to capture temporal dependencies. This limits their ability to understand actions, predict future events, and perform well in real-world applications like robotics and storytelling. (2/6)
📢 New paper out! Today we shared our latest work on improving sequential reasoning in multimodal models! Introducing ImageChain 🖼️ ⛓️, a framework that models visual sequences as multi-turn conversations. 🧵(1/6)
Our new paper is out! 🖼️➡️📝 We introduce ImageChain, a framework that enhances multimodal LLMs with sequential image reasoning 📄 Arxiv: https://t.co/YGKVIn7NDG Work with @IngoZiegler and @delliott
#AI #Multimodal #NLP #MLLM #ComputerVision #ImageChain
This work was done in collaboration with @akoksal_, @delliott, and @HinrichSchuetze 🤝 🤗 Full collection of datasets and checkpoints hosted on @huggingface: https://t.co/wPSxbx4xQX 📄Paper: https://t.co/FTzTfZbSio 💻Code, Datasets, LoRAs: https://t.co/M8CElAbu3c (5/5)
Additional bonus: Strong out-of-domain generalization 💪🌍 CRAFT's synthetic datasets lead to more robust models with better generalization capabilities than training on in-domain datasets, even when those datasets are human-curated 🧑‍💻 (4/5)
Result highlights: 📊 Outperforms or matches instruction-following LLMs on QA tasks 📈 A 46-point preference improvement over human-curated data for summarization ⬆️ Consistent performance gains when scaling up data size (3/5)
🔍 How does CRAFT work? 1️⃣ User provides a few examples demonstrating the task & desired format 2️⃣ Top-k retrieval finds relevant docs in public corpora 3️⃣ LLMs augment the retrieved docs into synthetic samples 4️⃣ The resulting dataset is used for fine-tuning ✅ Done (2/5)
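A compact sketch of those four steps (my illustration, not the released CRAFT code; the encoder model, corpus, and prompt are placeholder assumptions):

```python
from sentence_transformers import SentenceTransformer, util

encoder = SentenceTransformer("all-MiniLM-L6-v2")  # placeholder retriever

def craft(few_shots, corpus, generate, k=3):
    """few_shots: example strings in the desired task format;
    corpus: list of public documents; generate: any LLM call."""
    # 2) Top-k similarity search: docs resembling the few-shot examples.
    doc_emb = encoder.encode(corpus, convert_to_tensor=True)
    query_emb = encoder.encode(few_shots, convert_to_tensor=True)
    hits = util.semantic_search(query_emb, doc_emb, top_k=k)
    retrieved = {corpus[h["corpus_id"]] for per_query in hits for h in per_query}
    # 3) In-context augmentation: rewrite each doc into a task sample.
    prompt = "Rewrite the document into an example like these:\n" + "\n".join(few_shots)
    return [generate(f"{prompt}\n\nDocument:\n{doc}") for doc in retrieved]
    # 4) Fine-tune on the returned synthetic dataset.
```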
📢 Today we release CRAFT: Corpus Retrieval and Augmentation for Fine-Tuning. CRAFT is a framework for generating synthetic, scalable, and task-specific datasets with LLMs 📚 It relies only on public corpora, similarity search, and augmentation through in-context learning. (1/5)
[CL] CRAFT Your Dataset: Task-Specific Synthetic Dataset Generation Through Corpus Retrieval and Augmentation I Ziegler, A Köksal, D Elliott, H Schütze [University of Copenhagen & LMU Munich] (2024) https://t.co/gGx9xawzDR
CRAFT Your Dataset: Task-Specific Synthetic Dataset Generation Through Corpus Retrieval and Augmentation Presents a method for generating task-specific synthetic datasets using user-provided few-shot examples. 📝 https://t.co/Iz5nFXvyOA 👨🏽💻 https://t.co/O6UYmLygkI
🧪Did you miss or want to rewatch our captivating talk on Protein Language Models by @amelie_iska? Good news! The event is now available on YouTube for you to rewatch and dive into the fascinating world of ESM-2 and its variants. 👉Check it out here: https://t.co/GJC3NRdy2P