Ingo Ziegler

@IngoZiegler

Followers: 112 · Following: 773 · Media: 10 · Statuses: 95

ELLIS PhD Student at University of Copenhagen (NLP, Representation Learning, Generative Modeling)

Copenhagen, Denmark
Joined November 2012
@IngoZiegler
Ingo Ziegler
15 days
I will be at #EMNLP2025 to present our TACL paper on synthetic data generation as an Oral! 📅 Presentation: Wednesday, 5 November 🕠 Time: 5:30 PM local time 📍 Location: Hall A102–103 🤝 Project was done together with @akoksal_ @delliott @HinrichSchuetze See you in Suzhou 🇨🇳
@delliott
Desmond Elliott
15 days
@IngoZiegler will present a synthetic data generation framework that rewrites real retrieved documents into task-specific finetuning examples. CRAFT is more stable than existing techniques like SelfInstruct and EvolInstruct across several tasks. Paper: https://t.co/8CBshW6wwv
1
1
9
@IngoZiegler
Ingo Ziegler
4 days
We show that structuring sequences of images and text in a multi-turn conversation style is highly effective at improving the sequential reasoning ability of multimodal LLMs! Now accepted at @wacv_official. See you in Arizona🌵#WACV2026
@danaesavi
Danae Sánchez
4 days
Our paper ImageChain (with @IngoZiegler & @delliott) was accepted at #WACV2026! We explore how multimodal LLMs reason over sequences of images I’ll present it at the @_LXAI Workshop @NeurIPS 🇲🇽 (Nov 30 ~10:45 Mex City). Come chat if you’re there! 🫶 📄 https://t.co/iQdaNZcDsn
0
0
1
@constanzafierro
Constanza Fierro
5 days
Can we find weight directions that modify LLMs' behavior? Our new paper proposes contrastive weight steering, an alternative to activation steering for modifying behavior using small, narrow-distribution datasets 🕹️ 🧵👇
4
33
202
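The mechanism, as a toy sketch of the idea in the tweet (not the paper's actual method; `w_pos` and `w_neg` stand in for weights fine-tuned on two small, contrasting behavior datasets, and the fine-tuning itself is out of scope):

```python
import numpy as np

# Contrastive weight steering, sketched: the behavior direction is the
# difference between two fine-tuned weight matrices, and steering adds a
# scaled copy of that direction to the base weights.
def weight_direction(w_pos: np.ndarray, w_neg: np.ndarray) -> np.ndarray:
    """Direction in weight space separating the two contrasting behaviors."""
    return w_pos - w_neg

def steer(w_base: np.ndarray, direction: np.ndarray, alpha: float = 1.0) -> np.ndarray:
    """Shift the base weights along the behavior direction with strength alpha."""
    return w_base + alpha * direction
```

Unlike activation steering, the shift is baked into the weights once, so inference needs no runtime hooks.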
@LucaAmb
Luca Ambrogioni
3 months
1/2) I am very happy to finally share something I have been working on, on and off, for the past year: "The Information Dynamics of Generative Diffusion". This paper connects entropy production, the divergence of vector fields, and spontaneous symmetry breaking in a unified framework.
14
120
956
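One concrete way these quantities meet (my own sketch from standard Fokker–Planck identities, not the paper's derivation): for a diffusion $dx = f(x,t)\,dt + g(t)\,dW$, the entropy $H(t) = -\int p \log p \, dx$ evolves as

```latex
\frac{dH}{dt}
= \underbrace{\int p \,(\nabla \cdot f)\, dx}_{\text{divergence of the drift field}}
+ \underbrace{\frac{g^2}{2}\int p \,\lVert \nabla \log p \rVert^2 \, dx}_{\text{entropy production by diffusion}}
```

so the drift's divergence and the squared score norm jointly control how entropy flows during generation.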
@ilker_kesen
İlker Kesen
3 months
Excited to share that our paper "Multilingual Pretraining for Pixel Language Models" has been accepted to the #EMNLP2025 main conference! Please see the thread below and the paper itself for more details.
@ilker_kesen
İlker Kesen
6 months
Announcing our recent work “Multilingual Pretraining for Pixel Language Models”! We introduce PIXEL-M4, a pixel language model pretrained on four visually & linguistically diverse scripts: English, Hindi, Ukrainian & Simplified Chinese. #NLProc
0
6
25
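A toy illustration of the token-free input such models consume (the glyphs below are deterministic random stand-ins, not PIXEL-M4's actual text renderer): render a string as one image strip, then cut it into fixed-width patches. The same code path works unchanged for any script, which is the point of going multilingual at the pixel level.

```python
import numpy as np

def glyph(ch: str, size: int = 8) -> np.ndarray:
    """Fake 'glyph': a deterministic binary bitmap per character (stand-in renderer)."""
    rng = np.random.default_rng(ord(ch))
    return (rng.random((size, size)) > 0.5).astype(np.uint8)

def render_to_patches(text: str, size: int = 8, patch_width: int = 16) -> list:
    """Render text as an image strip and split it into fixed-width patches,
    the 'tokens' a pixel language model is pretrained on."""
    strip = np.concatenate([glyph(c, size) for c in text], axis=1)
    n = strip.shape[1] // patch_width
    return [strip[:, i * patch_width:(i + 1) * patch_width] for i in range(n)]
```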
@IngoZiegler
Ingo Ziegler
9 months
📄 Read the paper: https://t.co/JUIjOc1zhi 💻 Code: https://t.co/Lptkv6usfy 📂 Dataset: StoryFrames is now available on @huggingface: https://t.co/RI8jZbg2ly This work was done in collaboration with @danaesavi and @delliott (6/6)
0
0
1
@IngoZiegler
Ingo Ziegler
9 months
📊 To enable this task, we also introduce StoryFrames—a new dataset designed for sequential image reasoning! 🔹 8,881 curated samples from real-world videos 🔹 Human-annotated 🔹 Temporally coherent scene descriptions 🔹 Enables MLLMs to learn structured event progression (5/6)
1
0
0
@IngoZiegler
Ingo Ziegler
9 months
🔥 ImageChain dominates across all conversation context lengths! 📈 Compared to current MLLMs & standard fine-tuning, ImageChain improves models from 3% baseline performance up to 19% in generating descriptions similar to human-written ground truths! (4/6)
1
0
0
@IngoZiegler
Ingo Ziegler
9 months
🔍 ImageChain solves this by treating image sequences as structured multi-turn conversations. ✨ Key ideas ✅ Images are interleaved with textual descriptions ✅ Next-scene description task optimizes temporal understanding ✅ Instruction-tuning over multi-turn conversation (3/6)
1
0
0
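A minimal sketch of the data layout those bullets describe (the message schema mimics common multimodal chat-template formats; the field names are my assumption, not ImageChain's exact implementation):

```python
# Build one training sample: images interleaved with their scene descriptions
# as a multi-turn conversation, with the last scene's description held out as
# the next-scene prediction target.
def build_multiturn_sample(scenes):
    """scenes: temporally ordered list of (image_ref, description) tuples."""
    messages = []
    for i, (image, desc) in enumerate(scenes[:-1]):
        messages.append({"role": "user",
                         "content": [{"type": "image", "image": image},
                                     {"type": "text", "text": f"Describe scene {i + 1}."}]})
        messages.append({"role": "assistant", "content": desc})
    # Final turn: the model must produce the next scene's description.
    last_image, target = scenes[-1]
    messages.append({"role": "user",
                     "content": [{"type": "image", "image": last_image},
                                 {"type": "text", "text": "Describe the next scene."}]})
    return messages, target
```

Instruction-tuning then maximizes the likelihood of `target` given the full conversation so far.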
@IngoZiegler
Ingo Ziegler
9 months
Why does sequential reasoning matter? Most MLLMs process images independently, failing to capture temporal dependencies. This limits their ability to understand actions, predict future events, and perform well in real-world applications like robotics and storytelling. (2/6)
1
0
0
@IngoZiegler
Ingo Ziegler
9 months
📢 New paper out! Today we shared our latest work on improving sequential reasoning in multimodal models! Introducing ImageChain 🖼️ ⛓️, a framework that models visual sequences as multi-turn conversations. 🧵(1/6)
@danaesavi
Danae Sánchez
9 months
Our new paper is out! 🖼️➡️📝 We introduce ImageChain, a framework that enhances multimodal LLMs with sequential image reasoning 📄 Arxiv: https://t.co/YGKVIn7NDG Work with @IngoZiegler and @delliott #AI #Multimodal #NLP #MLLM #ComputerVision #ImageChain
1
0
5
@IngoZiegler
Ingo Ziegler
1 year
Additional bonus: Strong out-of-domain generalization 💪🌍 CRAFT's synthetic datasets lead to more robust models with better generalization capabilities than training on in-domain datasets, even when those datasets are human-curated 🧑‍💻 (4/5)
1
0
3
@IngoZiegler
Ingo Ziegler
1 year
Results highlights: 📊 Outperforms or matches instruction-following LLMs on QA tasks 📈 46 preference points improvement over human-curated data for summarization ⬆️ Consistent performance gains when scaling up data size (3/5)
1
0
3
@IngoZiegler
Ingo Ziegler
1 year
🔍 How does CRAFT work? 1️⃣ User provides few-shots with task & desired format 2️⃣ Top-k retrieval finds relevant docs from public corpora 3️⃣ LLMs augment retrieved docs into synthetic samples 4️⃣ Use resulting dataset for fine-tuning ✅ Done (2/5)
1
0
3
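The four steps above can be sketched end-to-end; the token-overlap scorer and the `augment` callable are deliberately naive stand-ins for the embedding-based retriever and the instruction-following LLM the thread describes:

```python
def top_k_retrieve(query: str, corpus: list, k: int = 2) -> list:
    """Step 2 stand-in: rank documents by naive token overlap with the few-shots."""
    q = set(query.lower().split())
    return sorted(corpus,
                  key=lambda d: len(q & set(d.lower().split())),
                  reverse=True)[:k]

def craft(few_shots: list, corpus: list, augment, k: int = 2) -> list:
    """few_shots: (input, output) pairs defining the task and format (step 1).
    augment: callable that rewrites a document into a synthetic sample (step 3)."""
    query = " ".join(text for pair in few_shots for text in pair)
    docs = top_k_retrieve(query, corpus, k)          # step 2: corpus retrieval
    return [augment(doc, few_shots) for doc in docs]  # step 3: augmentation

# Step 4 would fine-tune a model on the returned synthetic dataset.
```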
@IngoZiegler
Ingo Ziegler
1 year
📢 Today we release CRAFT: Corpus Retrieval and Augmentation for Fine-Tuning. CRAFT is a framework to generate synthetic, scalable, and task-specific datasets with LLMs 📚 It relies only on public corpora, similarity search, and augmentation through in-context learning. (1/5)
1
1
28
@fly51fly
fly51fly
1 year
[CL] CRAFT Your Dataset: Task-Specific Synthetic Dataset Generation Through Corpus Retrieval and Augmentation I Ziegler, A Köksal, D Elliott, H Schütze [University of Copenhagen & LMU Munich] (2024) https://t.co/gGx9xawzDR
1
9
30
@_reachsumit
Sumit
1 year
CRAFT Your Dataset: Task-Specific Synthetic Dataset Generation Through Corpus Retrieval and Augmentation Presents a method for generating task-specific synthetic datasets using user-provided few-shot examples. 📝 https://t.co/Iz5nFXvyOA 👨🏽‍💻 https://t.co/O6UYmLygkI
0
30
126
@MunichNlp
Munich🥨NLP
2 years
🧪Did you miss or want to rewatch our captivating talk on Protein Language Models by @amelie_iska? Good news! The event is now available on YouTube for you to rewatch and dive into the fascinating world of ESM-2 and its variants. 👉Check it out here: https://t.co/GJC3NRdy2P
0
1
3