Debangan Mishra
@DebanganM10375
Followers 35 · Following 14 · Media 4 · Statuses 12
Student @iiit_hyderabad, interested in artificial intelligence, AI safety and computer vision.
Hyderabad
Joined April 2024
Our work might be of interest to: @linguist_cat @nabla_theta @singhshiviii @zhaofeng_wu @monojitchou @aidangomez @cohere @nickfrosst @1vnzh @LChoshen
8/8 Work with: @rastogiarihant1 @NegiAgyeya @ShashwatGoel7 @ponguru. Preprint: https://t.co/kDBTbiTSkP. You can find more details about the functional similarity metric, CAPA, along with proofs of its robustness, in this paper:
arxiv.org
As Language Model (LM) capabilities advance, evaluating and supervising them at scale is getting harder for humans. There is hope that other language models can automate both these tasks, which we...
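The chance-adjusted style of agreement behind a metric like CAPA can be illustrated with a minimal Cohen's-kappa-style sketch. This is not the paper's actual CAPA formula (which, per the preprint, also accounts for output probabilities); the function and the answer lists below are illustrative assumptions only.

```python
# Minimal sketch: agreement between two models' multiple-choice answers,
# corrected for the agreement expected by chance alone.
from collections import Counter

def chance_adjusted_agreement(answers_a, answers_b):
    n = len(answers_a)
    # Fraction of items where the two models pick the same option.
    observed = sum(a == b for a, b in zip(answers_a, answers_b)) / n
    # Agreement expected if each model answered independently,
    # following its own marginal option frequencies.
    freq_a, freq_b = Counter(answers_a), Counter(answers_b)
    expected = sum(freq_a[o] * freq_b.get(o, 0) for o in freq_a) / n**2
    return (observed - expected) / (1 - expected)

# Hypothetical answer sheets for two models on six questions.
model_x = ["A", "B", "C", "A", "D", "B"]
model_y = ["A", "B", "C", "D", "D", "A"]
print(round(chance_adjusted_agreement(model_x, model_y), 3))  # → 0.556
```

A score of 1 means perfect agreement, 0 means no more agreement than chance; correcting for chance is what separates this family of metrics from raw percent-match.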
7/8 Our exploratory analysis shows that adopting a functional-similarity-oriented approach to multilinguality yields several exciting observations and insights. It is time to take a step beyond accuracy on multilingual benchmarks!
6/8 Lastly, we also find that on GlobalMMLU, each model tends to be more consistent with itself across languages than different models are with each other. This can have exciting downstream consequences for multilingual multi-agent systems!
5/8 We explore the hypothesis: do models become more consistent for languages with more data? It turns out they do! Using the number of Wikipedia articles per language as a proxy, we find that models are, on average, more consistent for high-resource languages.
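The resource-level analysis above amounts to a rank correlation between a data-availability proxy and average cross-model consistency. Here is a hedged sketch; the article counts and consistency values are invented for illustration, not the paper's measurements.

```python
# Spearman rank correlation, implemented from scratch for a toy example
# with no tied values.

def ranks(xs):
    # Rank of each element (0 = smallest).
    order = sorted(range(len(xs)), key=lambda i: xs[i])
    r = [0.0] * len(xs)
    for rank, i in enumerate(order):
        r[i] = float(rank)
    return r

def spearman(x, y):
    rx, ry = ranks(x), ranks(y)
    n = len(x)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx) ** 0.5
    vy = sum((b - my) ** 2 for b in ry) ** 0.5
    return cov / (vx * vy)

# Hypothetical: Wikipedia article counts vs. average consistency.
wiki_articles = [6_800_000, 2_900_000, 160_000, 90_000]
consistency = [0.82, 0.78, 0.61, 0.55]
print(round(spearman(wiki_articles, consistency), 2))  # → 1.0
```

In this toy data the rankings match exactly, so the correlation is 1.0; real measurements would be noisier.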
4/8 LLMs are more consistent on subjects with a weaker cultural prior, such as mathematics, and less consistent on the humanities and social sciences. Cross-language consistency also scales with model performance and parameter count!
3/8 We need something more: functional similarity! We use it as a new lens to study multilinguality in LLMs and uncover insights on GlobalMMLU, a large parallel multilingual benchmark testing models on STEM, law, and more. We evaluate top models like Qwen3 and Gemma3.
2/8 What if a model has the same accuracy in Hindi and English, yet changing the language makes it do math differently? What if questions it answered correctly in Korean become wrong in Telugu, all while accuracy stays the same?
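The scenario above can be made concrete with a toy example (all answers invented): two evaluation runs with identical accuracy that are correct on entirely different questions.

```python
# Two hypothetical runs of the same model, e.g. in different languages.
gold   = ["A", "B", "C", "D", "A", "B"]
run_hi = ["A", "B", "C", "B", "C", "D"]  # correct on items 0, 1, 2
run_te = ["C", "D", "A", "D", "A", "B"]  # correct on items 3, 4, 5

def acc(run):
    return sum(g == r for g, r in zip(gold, run)) / len(gold)

print(acc(run_hi), acc(run_te))  # → 0.5 0.5  (identical accuracy)

# Yet the two runs never give the same answer on any item.
overlap = sum(a == b for a, b in zip(run_hi, run_te)) / len(gold)
print(overlap)  # → 0.0
```

Accuracy alone cannot distinguish these runs, which is exactly the gap a functional similarity metric is meant to close.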
1/8 Our paper "What if I ask alia lingua? Measuring Functional Similarity Across Languages" has been accepted at the 5th MRL Workshop at #EMNLP2025. Accuracy isn't enough: multilingual LLMs may score equally yet act very differently across languages!
How do we remove the effect of incorrect training data from a trained model? Is retraining from scratch the best we can do? We show it is not! Cognac is 8x faster than retraining from scratch and works even when you discover as little as 5% of such data. How?