
Haoyi Qiu
@HaoyiQiu
Followers: 972
Following: 1K
Media: 33
Statuses: 182
Research intern @SFResearch ☁️ PhD student @UCLANLP 🧸 BS in CS&Math @UMich 〽️ #NLP #Multimodal #Safety 🌷
Los Angeles, CA
Joined October 2018
🌏 How culturally safe are large vision-language models? 👉 LVLMs often miss the mark. We introduce CROSS, a benchmark of 1,284 image-query pairs across 16 countries & 14 languages, revealing how LVLMs violate cultural norms in context. ⚖️ Evaluation via CROSS-EVAL. 🧨 Safety
RT @SFResearch: 🌟 Happy National Intern Day! Today we celebrate the brilliant minds and diverse perspectives that our interns bring to @SF…
RT @qiancheng1231: 🤝 Can LLM agents really understand us? We introduce UserBench: a user-centric gym environment for benchmarking how well…
RT @Yihe__Deng: 🙌 We've released the full version of our paper, OpenVLThinker: Complex Vision-Language Reasoning via Iterative SFT-RL Cycle…
RT @alexfabbri4: Excited to share MultiNRC, a new SEAL Leaderboard at Scale AI! MultiNRC is a challenging multilingual reasoning benchmark…
huggingface.co
RT @ManlingLi_: Can VLMs build Spatial Mental Models like humans? Reasoning from limited views? Reasoning from partial observations? Reaso…
RT @QiyueGao123: 🤔 Have @OpenAI o3, Gemini 2.5, Claude 3.7 formed an internal world model to understand the physical world, or just align p…
RT @ziqiao_ma: Can we scale 4D pretraining to learn general space-time representations that reconstruct an object from a few views at any t…
RT @victor__li__: Glad to be part of the team! It's been a great pleasure working with so many talented people at Tesla (both in and out o…
RT @omarsar0: @karpathy Great share as usual! Just read this related piece where a study showed issues with LLM-based agents not recognizin…
arxiv.org
While AI agents hold transformative potential in business, effective performance benchmarking is hindered by the scarcity of public, realistic business data on widely used platforms. Existing...
RT @tparekh97: 🚨 New work: LLMs still struggle at Event Detection due to poor long-context reasoning and inability to follow task constrain…
RT @yikewang_: LLMs are helpful for scientific research — but will they continuously be helpful? Introducing 🔍ScienceMeter: current knowle…
RT @steeve__huang: 🚨 The Business AI Plot Thickens 🚨 CRMArena set the stage for business AI evaluation in realistic environments. Now we'r…
RT @StellaLisy: 🤯 We cracked RLVR with Random Rewards?! Training Qwen2.5-Math-7B with our Spurious Rewards improved MATH-500 by: - Rando…
RT @YungSungChuang: 🚨 Do passage rerankers really need explicit reasoning? 🤔 Maybe Not! Our findings: ⚖️ Standard rerankers outperform those…
RT @steeve__huang: Cultural safety in AI isn't just nice-to-have, it's essential ✅ Our new paper reveals that leading VLMs struggle with c…
Grateful for the incredible team at UCLA Plus Lab, Salesforce AI Research, and Google DeepMind: @VioletNPeng, Ruichen Zheng, @steeve__huang, and @sunjiao123sun_! 🥳