
Clara Isabel Meister
@clara__meister
Followers: 2K · Following: 74 · Media: 10 · Statuses: 124
Post-doc teaching a continuing studies program at ETH Zurich. Still figuring out how Twitter works... 🤦‍♀️
Zurich, Switzerland
Joined June 2019
The first Zurich Robotics event is in 7 days! RSVP now for September 24th at the @ETH_AI_Center: Barnabas Gavin Cangan (@gavincangan, ETHZ) on why robot hands are so hard, and Caterina Caccavella (ZHAW / ETHZ) on bio-inspired active sensing. Link below.
1
4
14
Is it just me, or did Claude Code get a lot worse in the last month...
1
0
1
We hope our insights and opinions can help shape ongoing discussions about future research in generative AI! Joint work with many great authors, including @LauraManduchi @kpandey008 @StephanMandt @vincefort
0
0
1
Beyond these discussions, the paper also includes extensive pointers to relevant work—including surveys and key papers across subfields. We’ve continuously updated it to reflect the latest developments. It can thus 🤞be a valuable resource for just about anyone working in gen AI.
1
0
0
Core argument: scaling alone won’t deliver a “perfect” generative model. We highlight promising methods towards (1) broadening adaptability (robustness, causal/assumption-aware methods), (2) improving efficiency & evaluation, (3) addressing ethics (misinfo, privacy, fairness)
1
0
0
* The current landscape of generative models
* Open technical challenges and research gaps
* Implications for fairness, safety, and regulation
* Opportunities for impactful future research
1
0
0
Generative AI has made huge progress, but we still lack sufficient understanding of its capabilities, limitations and potential societal impacts. This collaborative position paper (sparked by the Dagstuhl Seminar on Challenges + Perspectives in Deep Generative Modeling) examines:
1
0
0
Exciting news! Our paper "On the Challenges and Opportunities in Generative AI" has been accepted to TMLR 2025. 📄
arxiv.org
The field of deep generative modeling has grown rapidly in the last few years. With the availability of massive amounts of training data coupled with advances in scalable unsupervised learning...
1
2
10
PRs and recommendations for improvement very welcome!!
0
0
3
I've recently been fascinated by tokenization, a research area in NLP where I still think there's lots of headway to be made! In an effort to encourage research, I made a small tokenizer eval suite (intrinsic metrics) with some features I found missing elsewhere:
github.com/cimeister/tokenizer-analysis-suite
4
15
160
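Not the API of the linked repo, just a hedged illustration of the kind of intrinsic metrics such a suite can report: compression rate (tokens per character) and fertility (tokens per word), computed for any tokenizer that exposes an encode function. The function names and the toy tokenizer are illustrative.

```python
# Minimal sketch of two common intrinsic tokenizer metrics, assuming any
# tokenizer that can be wrapped as an `encode(text) -> list` callable.
# This is NOT the API of cimeister/tokenizer-analysis-suite, only an
# illustration of the kind of statistics such a suite reports.
from typing import Callable, Iterable


def compression_rate(encode: Callable[[str], list], texts: Iterable[str]) -> float:
    """Average number of tokens per character (lower = better compression)."""
    n_tokens = n_chars = 0
    for text in texts:
        n_tokens += len(encode(text))
        n_chars += len(text)
    return n_tokens / max(n_chars, 1)


def fertility(encode: Callable[[str], list], texts: Iterable[str]) -> float:
    """Average number of tokens per whitespace-separated word."""
    n_tokens = n_words = 0
    for text in texts:
        n_tokens += len(encode(text))
        n_words += len(text.split())
    return n_tokens / max(n_words, 1)


if __name__ == "__main__":
    # Toy "tokenizer": splits on whitespace, then into 3-character chunks.
    def toy_encode(text: str) -> list:
        return [w[i:i + 3] for w in text.split() for i in range(0, len(w), 3)]

    corpus = ["tokenization is surprisingly subtle", "hello world"]
    print(f"tokens/char: {compression_rate(toy_encode, corpus):.3f}")
    print(f"tokens/word: {fertility(toy_encode, corpus):.3f}")
```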
In short, Parity-aware BPE = minimal overhead + clear fairness gains. If you care about multilingual robustness, tokenization is low-hanging fruit. Joint work with @negarforoutan @DebjitPaul2 @joelniklaus @sina_ahm @ABosselut @RicoSennrich
1
0
5
What’s even more exciting: low- and medium-resource languages benefit the most. We see better vocabulary utilization and compression rates for these languages, highlighting the effectiveness of our approach in providing fairer language allocation.
1
0
7
Empirical results: the Gini coefficient of tokenizer disparity (0 means a tokenizer's compression rates are equal across languages) improves by ~83%, while global compression stays very similar. On downstream task accuracy, improvements outnumber declines across configurations.
1
0
6
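For readers unfamiliar with the metric, here is a hedged sketch of a Gini coefficient computed over per-language compression rates; the per-language rates below are invented for illustration and are not numbers from the paper.

```python
# Gini coefficient over per-language compression rates (tokens per character).
# 0 means all languages are compressed equally; larger values mean more
# disparity. The rates below are made up for illustration only.

def gini(values: list) -> float:
    """Gini coefficient via the sorted-values formulation."""
    xs = sorted(values)
    n = len(xs)
    total = sum(xs)
    if n == 0 or total == 0:
        return 0.0
    # G = sum_i (2i - n - 1) * x_(i) / (n * sum(x)), with i = 1..n over sorted x.
    return sum((2 * (i + 1) - n - 1) * x for i, x in enumerate(xs)) / (n * total)


# Hypothetical tokens-per-character rates for four languages.
standard_bpe = {"en": 0.22, "de": 0.26, "sw": 0.45, "am": 0.61}
parity_bpe = {"en": 0.24, "de": 0.27, "sw": 0.30, "am": 0.33}

for name, rates in [("standard BPE", standard_bpe), ("parity-aware BPE", parity_bpe)]:
    print(f"{name:>18}: Gini = {gini(list(rates.values())):.3f}")
```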
It’s a drop-in replacement in existing systems that introduces minimal training-time overhead: if you already use a BPE tokenizer, formats and tokenization/detokenization at inference are unchanged. You just need language-labeled multilingual corpora and a multi-parallel dev set.
1
0
5
What changes from classical BPE? Only a small part of training. We compute frequency stats per language → when choosing the next merge, we pick it from the stats of the language with the worst compression rate, rather than from global stats. Everything else stays the same!
1
0
8
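A minimal sketch of that merge-selection rule, assuming toy corpora and character-level initialization; this is not the authors' implementation of Parity-aware BPE, only an illustration of picking each merge from the worst-compressed language's pair statistics.

```python
# Illustrative-only sketch of the merge-selection rule described in the tweet
# above: keep per-language pair counts and, at each step, pick the next merge
# from the language with the currently worst compression (most tokens per
# character). Corpora and the merge budget are toy values.
from collections import Counter

corpora = {
    "en": ["the cat sat on the mat", "the dog ate the bone"],
    "sw": ["paka ameketi juu ya mkeka", "mbwa alikula mfupa"],
}

# Character-level start: each word becomes a sequence of single characters.
words = {lang: [w for s in sents for w in s.split()] for lang, sents in corpora.items()}
tokenized = {lang: [list(w) for w in ws] for lang, ws in words.items()}
n_chars = {lang: sum(len(w) for w in ws) for lang, ws in words.items()}


def pair_counts(seqs):
    """Count adjacent token pairs within words."""
    counts = Counter()
    for seq in seqs:
        counts.update(zip(seq, seq[1:]))
    return counts


def apply_merge(seqs, pair):
    """Replace every occurrence of `pair` with its concatenation."""
    merged = []
    for seq in seqs:
        out, i = [], 0
        while i < len(seq):
            if i + 1 < len(seq) and (seq[i], seq[i + 1]) == pair:
                out.append(seq[i] + seq[i + 1])
                i += 2
            else:
                out.append(seq[i])
                i += 1
        merged.append(out)
    return merged


merges = []
for _ in range(20):  # toy vocabulary budget
    # Worst-off language = highest tokens-per-character ratio.
    worst = max(tokenized, key=lambda lang: sum(len(s) for s in tokenized[lang]) / n_chars[lang])
    counts = pair_counts(tokenized[worst])
    if not counts:
        break
    pair = counts.most_common(1)[0][0]  # most frequent pair in the worst-off language
    merges.append(pair)
    # As in standard BPE, the chosen merge is then applied to every language.
    tokenized = {lang: apply_merge(seqs, pair) for lang, seqs in tokenized.items()}

print(merges)
```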
🚨New Preprint! In multilingual models, the same meaning can take far more tokens in some languages, penalizing users of underrepresented languages with worse performance and higher API costs. Our Parity-aware BPE algorithm is a step toward addressing this issue: 🧵
5
30
283
A string may get 17 times less probability if tokenised as two symbols (e.g., ⟨he, llo⟩) than as one (e.g., ⟨hello⟩)—by an LM trained from scratch in each situation! Our #acl2025nlp paper proposes an observational method to estimate this causal effect! Longer thread soon!
3
24
136
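To make the comparison concrete, here is a toy sketch of the quantity in question: the probability an LM assigns to the same string under two tokenizations. The conditional log-probabilities are made up; the paper's contribution is an observational estimator of this causal effect, which the sketch does not implement.

```python
# Toy illustration of comparing the probability of one string under two
# tokenizations. The per-token conditional log-probabilities below are
# invented; in the paper, each tokenization would correspond to an LM
# trained from scratch on that segmentation.
import math

# Hypothetical conditional log-probabilities (natural log), keyed by prefix.
logp_single = {("hello",): -6.2}                    # ⟨hello⟩ as one token
logp_split = {("he",): -4.5, ("he", "llo"): -4.5}   # ⟨he, llo⟩ as two tokens


def seq_logprob(tokens, table):
    """Sum of conditional log-probs log p(t_i | t_<i) from a lookup table."""
    total = 0.0
    for i in range(len(tokens)):
        total += table[tuple(tokens[: i + 1])]
    return total


lp_one = seq_logprob(["hello"], logp_single)
lp_two = seq_logprob(["he", "llo"], logp_split)
print(f"p(⟨hello⟩) / p(⟨he, llo⟩) ≈ {math.exp(lp_one - lp_two):.1f}x")
```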
Do you want to quantify your model’s counterfactual memorisation using only observational data? Our #ACL2024NLP paper proposes an efficient method to do it :) No interventions required! You can also see how memorisation evolves across training! Check out Pietro's🧵for details :)
Happy to share our #ACL2024 paper: "Causal Estimation of Memorisation Profiles" 🎉 Drawing from econometrics, we propose a principled and efficient method to estimate memorisation using only observational data! See 🧵 +@clara__meister, Thomas Hofmann, @vlachos_nlp, @tpimentelms
0
3
35
Super excited and grateful that our paper received the best paper award at #ACL2024 🎉 Huge thanks to my fantastic co-authors — @clara__meister, Thomas Hofmann, @vlachos_nlp, and @tpimentelms — the reviewers that recommended our paper, and the award committee #ACL2024NLP
Happy to share our #ACL2024 paper: "Causal Estimation of Memorisation Profiles" 🎉 Drawing from econometrics, we propose a principled and efficient method to estimate memorisation using only observational data! See 🧵 +@clara__meister, Thomas Hofmann, @vlachos_nlp, @tpimentelms
7
7
76