Clara Isabel Meister

@clara__meister

Followers: 2K · Following: 74 · Media: 10 · Statuses: 124

Post-doc teaching a continuing studies program at ETH Zurich. Still figuring out how Twitter works... 🤦‍♀️

Zurich, Switzerland
Joined June 2019
@zurichnlp
ZurichAI
29 days
The first Zurich Robotics event is in 7 days! RSVP now for the September 24th session at the @ETH_AI_Center: Barnabas Gavin Cangan (@gavincangan, ETHZ) on why robot hands are so hard, and Caterina Caccavella (ZHAW / ETHZ) on bio-inspired active sensing. Link below.
1
4
14
@clara__meister
Clara Isabel Meister
1 month
Is it just me, or did Claude Code get a lot worse in the last month...
1
0
1
@clara__meister
Clara Isabel Meister
1 month
We hope our insights and opinions can help shape ongoing discussions about future research in generative AI! Joint work with many great authors, including @LauraManduchi @kpandey008 @StephanMandt @vincefort
0
0
1
@clara__meister
Clara Isabel Meister
1 month
Beyond these discussions, the paper also includes extensive pointers to relevant work—including surveys and key papers across subfields. We’ve continuously updated it to reflect the latest developments. It can thus 🤞be a valuable resource for just about anyone working in gen AI.
1
0
0
@clara__meister
Clara Isabel Meister
1 month
Core argument: scaling alone won’t deliver a “perfect” generative model. We highlight promising methods towards (1) broadening adaptability (robustness, causal/assumption-aware methods), (2) improving efficiency & evaluation, (3) addressing ethics (misinfo, privacy, fairness)
1
0
0
@clara__meister
Clara Isabel Meister
1 month
* The current landscape of generative models
* Open technical challenges and research gaps
* Implications for fairness, safety, and regulation
* Opportunities for impactful future research
1
0
0
@clara__meister
Clara Isabel Meister
1 month
Generative AI has made huge progress, but we still lack sufficient understanding of its capabilities, limitations and potential societal impacts. This collaborative position paper (sparked by the Dagstuhl Seminar on Challenges + Perspectives in Deep Generative Modeling) examines:
1
0
0
@clara__meister
Clara Isabel Meister
1 month
Exciting news! Our paper "On the Challenges and Opportunities in Generative AI" has been accepted to TMLR 2025. 📄
arxiv.org
The field of deep generative modeling has grown rapidly in the last few years. With the availability of massive amounts of training data coupled with advances in scalable unsupervised learning...
1
2
10
@clara__meister
Clara Isabel Meister
1 month
PRs and recommendations for improvement very welcome!!
0
0
3
@clara__meister
Clara Isabel Meister
1 month
I've recently been fascinated by tokenization, a research area in NLP where I still think there's lots of headway! In an effort to encourage research, I made a small tokenizer eval suite (intrinsic metrics) with some features I found missing elsewhere:
github.com: cimeister/tokenizer-analysis-suite
4
15
160
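
(Aside, not from the tweet above: a rough idea of what "intrinsic" tokenizer metrics look like. Fertility and compression rate are two common intrinsic measures; the function names and API below are illustrative only and are not taken from the cimeister/tokenizer-analysis-suite repo.)

def fertility(texts, tokenize):
    """Average number of subword tokens produced per whitespace word."""
    n_tokens = sum(len(tokenize(t)) for t in texts)
    n_words = sum(len(t.split()) for t in texts)
    return n_tokens / n_words

def compression_rate(texts, tokenize):
    """Average number of characters covered per token (higher = better compression)."""
    n_chars = sum(len(t) for t in texts)
    n_tokens = sum(len(tokenize(t)) for t in texts)
    return n_chars / n_tokens

# Toy usage with a whitespace "tokenizer"; swap in any real subword tokenizer's encode.
corpus = ["the quick brown fox", "tokenization is surprisingly subtle"]
whitespace = lambda s: s.split()
print(fertility(corpus, whitespace), compression_rate(corpus, whitespace))
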
@clara__meister
Clara Isabel Meister
2 months
In short, Parity-aware BPE = minimal overhead + clear fairness gains. If you care about multilingual robustness, tokenization is low-hanging fruit. Joint work with @negarforoutan @DebjitPaul2 @joelniklaus @sina_ahm @ABosselut @RicoSennrich
1
0
5
@clara__meister
Clara Isabel Meister
2 months
What’s even more exciting: low- and medium-resource languages benefit the most. We see better vocabulary utilization and compression rates for these languages, highlighting the effectiveness of our approach in providing fairer language allocation.
1
0
7
@clara__meister
Clara Isabel Meister
2 months
Empirical results: Gini coefficient of tokenizer disparity (0 indicates a tokenizer's compression rates across languages are equal) improves by ~83% with global compression remaining very similar. On downstream task accuracy, improvements outnumber declines across configurations
1
0
6
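
(Aside, not part of the thread: one way to compute a Gini coefficient over per-language compression rates, so that 0 means every language is compressed equally well. The rates below are made-up numbers, not results from the paper.)

def gini(values):
    """Mean absolute pairwise difference, normalized by 2 * n^2 * mean; 0 = perfect parity."""
    n = len(values)
    mean = sum(values) / n
    return sum(abs(a - b) for a in values for b in values) / (2 * n * n * mean)

# Hypothetical characters-per-token compression rates per language.
rates = {"en": 4.1, "de": 3.6, "sw": 2.2, "am": 1.8}
print(round(gini(list(rates.values())), 3))
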
@clara__meister
Clara Isabel Meister
2 months
It’s a drop-in replacement in existing systems that introduces minimal training-time overhead: if you already use a BPE tokenizer, formats and tokenization/detokenization at inference are unchanged. You just need language-labeled multilingual corpora and a multi-parallel dev set.
1
0
5
@clara__meister
Clara Isabel Meister
2 months
What changes from classical BPE? Only a small part of training. We compute frequency stats per language → when choosing the next merge, we pick it from the stats of the language with the worst compression rate, rather than from global stats. Everything else stays the same!
1
0
8
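
(Aside: a minimal sketch of the merge-selection change described in the tweet above, per my reading of it. Per-language pair counts replace the single global count, and the next merge is drawn from the language that currently compresses worst; helper names and details are assumptions, not the paper's code.)

from collections import Counter

def pair_counts(corpus):
    """Count adjacent symbol pairs in one language's corpus (a list of symbol sequences)."""
    counts = Counter()
    for word in corpus:
        for a, b in zip(word, word[1:]):
            counts[(a, b)] += 1
    return counts

def compression_rate(corpus, n_chars):
    """Characters per symbol under the current segmentation (higher = better)."""
    n_symbols = sum(len(word) for word in corpus)
    return n_chars / max(n_symbols, 1)

def next_parity_aware_merge(corpora, n_chars):
    """corpora: {lang: list of symbol sequences}; n_chars: {lang: total character count}."""
    worst = min(corpora, key=lambda lang: compression_rate(corpora[lang], n_chars[lang]))
    counts = pair_counts(corpora[worst])
    return max(counts, key=counts.get) if counts else None

# Applying the chosen merge to every language's corpus then proceeds exactly as in classical BPE.
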
@clara__meister
Clara Isabel Meister
2 months
🚨New Preprint! In multilingual models, the same meaning can take far more tokens in some languages, penalizing users of underrepresented languages with worse performance and higher API costs. Our Parity-aware BPE algorithm is a step toward addressing this issue: 🧵
5
30
283
@tpimentelms
Tiago Pimentel
4 months
A string may get 17 times less probability if tokenised as two symbols (e.g., ⟨he, llo⟩) than as one (e.g., ⟨hello⟩)—by an LM trained from scratch in each situation! Our #acl2025nlp paper proposes an observational method to estimate this causal effect! Longer thread soon!
3
24
136
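
(Aside: the 17x figure above concerns the same string receiving different probability depending on how it is segmented. A toy way to see the quantity in question is below; token_logprob is a hypothetical oracle standing in for an LM's conditional log-probability, the numbers are invented, and the paper's observational estimator is not reproduced here.)

import math

def string_logprob(tokens, token_logprob):
    """log p(string) under one segmentation: sum of conditional token log-probs."""
    total, prefix = 0.0, []
    for tok in tokens:
        total += token_logprob(tok, tuple(prefix))
        prefix.append(tok)
    return total

# Invented conditionals for illustration only.
fake_lm = {("hello", ()): math.log(0.02),
           ("he", ()): math.log(0.05),
           ("llo", ("he",)): math.log(0.03)}
lp = lambda tok, prefix: fake_lm[(tok, prefix)]

print(string_logprob(["hello"], lp))      # p = 0.02 as one symbol
print(string_logprob(["he", "llo"], lp))  # p = 0.0015 as two symbols, a much lower probability
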
@tpimentelms
Tiago Pimentel
1 year
Do you want to quantify your model’s counterfactual memorisation using only observational data? Our #ACL2024NLP paper proposes an efficient method to do it :) No interventions required! You can also see how memorisation evolves across training! Check out Pietro's🧵for details :)
@pietro_lesci
Pietro Lesci
1 year
Happy to share our #ACL2024 paper: "Causal Estimation of Memorisation Profiles" 🎉 Drawing from econometrics, we propose a principled and efficient method to estimate memorisation using only observational data! See 🧵 +@clara__meister, Thomas Hofmann, @vlachos_nlp, @tpimentelms
0
3
35
@pietro_lesci
Pietro Lesci
1 year
Super excited and grateful that our paper received the best paper award at #ACL2024 🎉 Huge thanks to my fantastic co-authors — @clara__meister, Thomas Hofmann, @vlachos_nlp, and @tpimentelms — the reviewers that recommended our paper, and the award committee #ACL2024NLP
@pietro_lesci
Pietro Lesci
1 year
Happy to share our #ACL2024 paper: "Causal Estimation of Memorisation Profiles" 🎉 Drawing from econometrics, we propose a principled and efficient method to estimate memorisation using only observational data! See 🧵 +@clara__meister, Thomas Hofmann, @vlachos_nlp, @tpimentelms
7
7
76