Thinh Profile
Thinh

@thinhphp_vt

Followers
71
Following
66
Media
4
Statuses
18

PhD student @VT_CS, supervised by @tuvllms. Interested in search-augmented LLMs. Ex AI resident @VinAI_Research

Blacksburg, VA
Joined July 2023
@thinhphp_vt
Thinh
5 days
DeepSeek achieved a strong result on SEAL-0, a challenging benchmark for reasoning with conflicting search results. 🎊
@deepseek_ai
DeepSeek
6 days
Tools & Agents Upgrades 🧰 📈 Better results on SWE / Terminal-Bench. 🔍 Stronger multi-step reasoning for complex search tasks. ⚡️ Big gains in thinking efficiency. 3/5
0
1
5
@thinhphp_vt
Thinh
6 days
RT @tuvllms: Excited to share that our paper on efficient model development has been accepted to #EMNLP2025 Main conference @emnlpmeeting.….
0
9
0
@thinhphp_vt
Thinh
15 days
RT @SherylHsu02: 1/n I’m thrilled to share that our @OpenAI reasoning system scored high enough to achieve gold 🥇🥇 in one of the world’s to….
0
284
0
@thinhphp_vt
Thinh
22 days
RT @ii_posts: Most search models need the cloud. II-Search-4B doesn’t. 4B model tuned for reasoning with search tools, built for local us….
0
117
0
@thinhphp_vt
Thinh
22 days
🥳 Congrats to @ii_posts on an impressive result on SEAL-0, a challenging benchmark for search-augmented LLMs. 🤩 Looking forward to the evaluation standards it shapes in this field. 📚 Read more:
arxiv.org
We introduce SealQA, a new challenge benchmark for evaluating SEarch-Augmented Language models on fact-seeking questions where web search yields conflicting, noisy, or unhelpful results. SealQA...
@ii_posts
Intelligent Internet
22 days
Overall Results:
0
0
6
@thinhphp_vt
Thinh
1 month
RT @PeterDiamandis: . @EMostaque came back on the show to chat about: . --how we can't compete against AI agents. --his solution for a POSI….
0
74
0
@thinhphp_vt
Thinh
1 month
RT @j_dekoninck: We just released the evaluation of LLMs on the 2025 IMO on MathArena! Gemini scores best, but is still unlikely to achieve….
0
40
0
@thinhphp_vt
Thinh
1 month
We just evaluated Grok 4 on our SEAL-0 dataset. 👍 Try it:
0
2
14
@thinhphp_vt
Thinh
2 months
RT @sukjun_hwang: Tokenization has been the final barrier to truly end-to-end language models. We developed the H-Net: a hierarchical netw….
0
741
0
@thinhphp_vt
Thinh
2 months
🔥 SEAL-0 Leaderboard 📈 Our results on SEAL-0 show substantial room for improvement in LLMs' ability to reason over conflicting evidence. 🤯 👉 Check out our paper: 👉 Dataset:
0
5
13
@thinhphp_vt
Thinh
3 months
My first work done during my PhD 🥳🥳🥳
@tuvllms
Tu Vu
3 months
✨ New paper ✨ 🚨 Scaling test-time compute can lead to inverse or flattened scaling!! We introduce SealQA, a new challenge benchmark w/ questions that trigger conflicting, ambiguous, or unhelpful web search results. Key takeaways: ➡️ Frontier LLMs struggle on Seal-0 (SealQA’s
3
1
21
@thinhphp_vt
Thinh
3 months
RT @tuvllms: ✨ New paper ✨ 🚨 Scaling test-time compute can lead to inverse or flattened scaling!! We introduce SealQA, a new challenge ben…
0
38
0