Avijit Ghosh
@evijit
Followers: 3K
Following: 27K
Media: 585
Statuses: 6K
Technical AI Policy Researcher @huggingface 🤗 . Current focus: Responsible AI, AI for Science, and @evaluatingevals!
Boston, Massachusetts
Joined January 2012
Today, @evaluatingevals is introducing Every Eval Ever, a unified, open data format and public dataset for AI evaluation results.
4
19
54
What????? 🤦‍♂️
If you know me, you’ve heard me talk about this story for months. @HeraRizwan reported from 3 states. We obsessed over every detail. Google's AI, designed for phones, is now rationing food to pregnant women. Read. Get angry. Share https://t.co/1heWRv9Ghj
@pulitzercenter
0
0
0
So happy to see Every Eval Ever (@evaluatingevals) take off! This is a big vote of confidence, and we really hope that we, as a community of eval practitioners, can move towards open standards that unlock scientific rigor and reproducibility. Thanks @mercor_ai !
We just submitted APEX-Agents, APEX-1 and ACE to @evaluatingevals on @huggingface, an OSS initiative to standardize evals and try to reduce the noise in benchmarking.
1
6
18
The degree to which AI research at the big labs has almost entirely been reduced to hill climbing is actually an aberration and not reflective of the rest of science at all. Ironically this means AI research is probably the easiest branch of research to automate.
I’ve been at a small conference this week, one where the AI people have been presenting early in the week and the domain science people will be presenting later in the week. At the end of the talks last night, the conversation turned very doomer with all the AI people talking
9
15
234
Evaluation research? There's no place like evalEval
3 days left! 📷 Writing, wrote, or just submitted a paper? Commit it to the EvalEval workshop at ACL 2026 in San Diego! https://t.co/JRSr50UA8y (including ARR Submissions, non-archival, positions, and extended abstracts!) Submission Deadline: March 19th, 2026 AoE
2
2
14
This is a good time to mention that the latest versions of both Claude and ChatGPT detect the hidden phrases and warn you of prompt injection, so I’m curious as to how this happened anyway/which LLMs were still susceptible
AI watermarking in action at #ICML's avant garde peer-review experiments this year! Quite a few casualties in my SAC batch (an example below --- appropriately redacted hopefully)
0
2
0
We at @huggingface are fortunate to have a unique vantage point on the state of open source AI development. We finally wrote down our observations, from both our own research and that of our peers who have done excellent work investigating the open ecosystem with Hugging Face hub
2
5
8
Always a hoot reading Georgia’s takes! Case in point: While Rosie the dog’s cancer-treating mRNA vaccine made with LLMs + AlphaFold went viral, several domain scientists on here have pointed out both novelty issues and the structural problems with generalizing this to large scale
I’ve been at a small conference this week, one where the AI people have been presenting early in the week and the domain science people will be presenting later in the week. At the end of the talks last night, the conversation turned very doomer with all the AI people talking
2
0
9
Imagine if my little AGI robot knew how to put my “I have worn it once but it’s not yet dirty enough to launder” clothes on this purgatory chair 😍
@simonegiertz made a chair you can dedicate to your laundry, and I love the design. She saw a common problem, then brought a solution. What do you think?
1
1
1
Long post, because apparently many neither understand nor appreciate the intricacies of cancer research and think that pharmaceutical companies and regulators are holding back cures. Your immune system is constantly surveilling your body for both self and non-self recognition.
“You guys are overhyping this” “Yes we can cure cancer and do regularly this way” “Yes the primary obstacles are regulatory/liability” uh
98
123
964
Sorry to be the downer because this is an impressive story in some senses. But it is ~trivially easy to make a single mRNA vaccine. It's not hard. I cure mice of various cancers with various therapeutics all the time. I've made mice lose more weight in a month than tirzepatide
943
420
6K
Unusual open data move by a major AI lab: StepFun releases the general SFT training set of Stepfun-Flash.
Eyy @StepFun_ai released the dataset https://t.co/V8KxKh4EyY :)
6
17
171
Seeing the worldwide demand, we are kicking off global applications for Hugging Face Builders! If you're passionate about open AI and love bringing people together, this is your invitation to lead ✉️ Learn more about the program and apply to become a Builder ➡️
12
28
245
👀
Anthropic shipped generative UI for Claude. I reverse-engineered how it works and rebuilt it for PI. Extracted the full design system from a conversation export. Live streaming HTML into native macOS windows via morphdom DOM diffing. Article: https://t.co/C3FLF3JB8Z Repo:
0
0
0
Next step: Open sourcing this UX stack 😈 who’s building a nice wrapper that does responsive UX where we can swap out the models in the back end?
2
0
1
A massive moment for Sovereign AI in India. Keep an eye out for GGUF quants that will soon allow this to run on 64-128GB Macs. If you are building a tool for the Indian market, this is your base model. It handles Hinglish & 22 official languages with a fertility rate (token
8
68
409
This looks cool! It would be great if we had a unified way to report eval results. This is basically a reproducibility problem. Two keys to making this work: reporting info about the models (e.g., num params, tokens trained), and eval settings (e.g., num shots). 1/3
🧪 Your LLM evaluation results could help the whole field 🚀 🧑‍🔬 Our ACL Shared task is out! We’re building a unified, crowdsourced database to create a common language for AI evaluation reporting. And we need your data. (1/2) https://t.co/SQhEVsqEWg
2
4
18
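The "two keys" in the post above (model info plus eval settings) could be sketched roughly as the record below. This is only an illustration, not the Every Eval Ever format: every class and field name here (`ModelInfo`, `EvalSettings`, `EvalRecord`, `num_params`, `tokens_trained`, `num_shots`, `comparable`) is a hypothetical placeholder chosen to mirror the examples named in the post.

```python
import json
from dataclasses import dataclass, asdict
from typing import Optional


@dataclass
class ModelInfo:
    # Hypothetical fields mirroring "num params, tokens trained" from the post.
    name: str
    num_params: Optional[int] = None      # None when the count is not disclosed
    tokens_trained: Optional[int] = None  # None when the count is not disclosed


@dataclass
class EvalSettings:
    # Hypothetical fields mirroring "num shots" from the post.
    benchmark: str
    num_shots: int = 0
    metric: str = "accuracy"


@dataclass
class EvalRecord:
    model: ModelInfo
    settings: EvalSettings
    score: float

    def to_json(self) -> str:
        # asdict() recurses into the nested dataclasses.
        return json.dumps(asdict(self), sort_keys=True)


def comparable(a: EvalRecord, b: EvalRecord) -> bool:
    # Two scores are only treated as comparable when the settings match exactly.
    return a.settings == b.settings
```

Keeping the settings in their own object makes the reproducibility point concrete: a 5-shot score and a 0-shot score on the same benchmark are different measurements, and `comparable` would reject the pair.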