Avijit Ghosh

@evijit

Followers
3K
Following
27K
Media
585
Statuses
6K

Technical AI Policy Researcher @huggingface 🤗. Current focus: Responsible AI, AI for Science, and @evaluatingevals!

Boston, Massachusetts
Joined January 2012
@evijit
Avijit Ghosh
1 month
Today, @evaluatingevals is introducing Every Eval Ever, a unified, open data format and public dataset for AI evaluation results.
4
19
54
@evijit
Avijit Ghosh
2 days
What????? 🤦‍♂️
@adrijabose
Adrija Bose
3 days
If you know me, you’ve heard me talk about this story for months. @HeraRizwan reported from 3 states. We obsessed over every detail. Google's AI, designed for phones, is now rationing food to pregnant women. Read. Get angry. Share https://t.co/1heWRv9Ghj @pulitzercenter
0
0
0
@evijit
Avijit Ghosh
2 days
So happy to see Every Eval Ever (@evaluatingevals) take off! This is a big vote of confidence, and we really hope that we, as a community of eval practitioners, can move towards open standards that unlock scientific rigor and reproducibility. Thanks @mercor_ai !
@mercor_ai
Mercor
3 days
We just submitted APEX-Agents, APEX-1 and ACE to @evaluatingevals on @huggingface, an OSS initiative to standardize evals and try to reduce the noise in benchmarking.
1
6
18
@pfau
David Pfau
4 days
The degree to which AI research at the big labs has almost entirely been reduced to hill climbing is actually an aberration and not reflective of the rest of science at all. Ironically this means AI research is probably the easiest branch of research to automate.
@cgeorgiaw
Georgia Channing
4 days
I’ve been at a small conference this week, one where the AI people have been presenting early in the week and the domain science people will be presenting later in the week. At the end of the talks last night, the conversation turned very doomer with all the AI people talking
9
15
234
@LChoshen
Leshem (Legend) Choshen 🤖🤗 @NeurIPS
4 days
Evaluation research? There's no place like evalEval
@evaluatingevals
EvalEval Coalition
4 days
3 days left! 📷 Writing, wrote, or just submitted a paper? Commit it to the EvalEval workshop at ACL 2026 in San Diego! https://t.co/JRSr50UA8y (including ARR Submissions, non-archival, positions, and extended abstracts!) Submission Deadline: March 19th, 2026 AoE
2
2
14
@evijit
Avijit Ghosh
4 days
This is a good time to mention that the latest versions of both Claude and ChatGPT detect the hidden phrases and warn you of prompt injection, so I’m curious as to how this happened anyway/which LLMs were still susceptible
@yuxiangw_cs
Yu-Xiang Wang
4 days
AI watermarking in action at #ICML's avant garde peer-review experiments this year! Quite a few casualties in my SAC batch (an example below --- appropriately redacted hopefully)
0
2
0
@evijit
Avijit Ghosh
4 days
We at @huggingface are fortunate to have a unique vantage point on the state of open source AI development. We finally wrote down our observations, from both our own research and that of our peers who have done excellent work investigating the open ecosystem with the Hugging Face Hub
2
5
8
@evijit
Avijit Ghosh
4 days
Always a hoot reading Georgia’s takes! Case in point: while Rosie the dog’s cancer-treating mRNA vaccine made with LLMs+AlphaFold went viral, several domain scientists on here have pointed out both novelty issues and the structural problems with generalizing this to large scale
@cgeorgiaw
Georgia Channing
4 days
I’ve been at a small conference this week, one where the AI people have been presenting early in the week and the domain science people will be presenting later in the week. At the end of the talks last night, the conversation turned very doomer with all the AI people talking
2
0
9
@evijit
Avijit Ghosh
6 days
Imagine if my little AGI robot knew how to put my “I have worn it once but it’s not yet dirty enough to launder” clothes on this purgatory chair 😍
@1nefortunate
Millennial Marketer 📣
7 days
@simonegiertz made a chair you can dedicate to your laundry, and I love the design. She saw a common problem, then brought a solution. What do you think?
1
1
1
@PatrickHeizer
Patrick Heizer
6 days
Long post, because apparently many neither understand nor appreciate the intricacies of cancer research and think that pharmaceutical companies and regulators are holding back cures. Your immune system is constantly surveilling your body for both self and non-self recognition.
@eddylazzarin
Eddy Lazzarin 🟠🔭
7 days
“You guys are overhyping this” “Yes we can cure cancer and do regularly this way” “Yes the primary obstacles are regulatory/liability” uh
98
123
964
@PatrickHeizer
Patrick Heizer
7 days
Sorry to be the downer because this is an impressive story in some senses. But it is ~trivially easy to make a single mRNA vaccine. It's not hard. I cure mice of various cancers with various therapeutics all the time. I've made mice lose more weight in a month than tirzepatide
@sebkrier
Séb Krier
8 days
943
420
6K
@Dorialexander
Alexander Doria
7 days
Unusual open data move by a major AI lab: StepFun releases the general SFT training set of Stepfun-Flash.
@test_tm7873
testtm
7 days
Eyy @StepFun_ai released the dataset https://t.co/V8KxKh4EyY :)
6
17
171
@sebkrier
Séb Krier
8 days
225
2K
13K
@huggingface
Hugging Face
8 days
Seeing the worldwide demand we are kicking off global applications for Hugging Face Builders! If you're passionate about open AI and love bringing people together, this is your invitation to lead ✉️ Learn more about the program and apply to become a Builder ➡️
12
28
245
@evijit
Avijit Ghosh
9 days
👀
@micLivs
Michael Livs
9 days
Anthropic shipped generative UI for Claude. I reverse-engineered how it works and rebuilt it for PI. Extracted the full design system from a conversation export. Live streaming HTML into native macOS windows via morphdom DOM diffing. Article: https://t.co/C3FLF3JB8Z Repo:
0
0
0
@evijit
Avijit Ghosh
9 days
Next step: Open sourcing this UX stack 😈 who’s building a nice wrapper that does responsive UX where we can swap out the models in the back end?
2
0
1
@evijit
Avijit Ghosh
9 days
A welcome development! (I’ve been on an anti-chatbot rant lately)
@feldman
Adam Feldman
9 days
Starting today, Claude no longer defaults to text. Claude is learning to choose the best medium for each response — based on the task, the data, and what's most useful for the person. Give it a try!
2
2
2
@Fintech03
Parimal
12 days
A massive moment for Sovereign AI in India. Keep an eye out for GGUF quants that will soon allow this to run on 64-128GB Macs. If you are building a tool for the Indian market, this is your base model. It handles Hinglish & 22 official languages with a fertility rate (token
@SarvamForDevs
Sarvam for Developers
12 days
Sarvam-105B is trending on Hugging Face 🚀
8
68
409
@JesseDodge
Jesse Dodge
12 days
This looks cool! It would be great if we had a unified way to report eval results. This is basically a reproducibility problem. Two keys to making this work: reporting info about the models (e.g., num params, tokens trained), and eval settings (e.g., num shots). 1/3
@evaluatingevals
EvalEval Coalition
12 days
🧪 Your LLM evaluation results could help the whole field 🚀 🧑‍🔬 Our ACL Shared task is out! We’re building a unified, crowdsourced database to create a common language for AI evaluation reporting. And we need your data. (1/2) https://t.co/SQhEVsqEWg
2
4
18