Jimmy Lin

@lintool

Followers
15K
Following
0
Media
376
Statuses
4K

I profess CS-ly at @UWaterloo about NLP/IR/LLM-ish things. I science at @yupp_ai and @Primal. Previously, I monkeyed code for @Twitter and slides for @Cloudera.

Nearby data lake
Joined February 2010
@ManveerTamber
Manveer Singh Tamber
5 days
Our paper with @vectara, “Benchmarking LLM Faithfulness in RAG with Evolving Leaderboards”, is now published in the EMNLP 2025 Industry Track! Check out our work on enabling more reliable LLM faithfulness benchmarking in RAG!
@ManveerTamber
Manveer Singh Tamber
6 months
Introducing 🔍 FaithJudge: improving how we evaluate LLM faithfulness in RAG tasks, including summarization, QA, and data-to-text, and powering a more accurate LLM Hallucination Leaderboard. 🔗
1
3
6
@lintool
Jimmy Lin
20 days
Want to play with these models yourself? You can use both variants of Claude Haiku 4.5, as well as all the latest models from OpenAI, Anthropic, Google, and xAI for free on @yupp_ai at
yupp.ai
Every AI for everyone
0
0
2
@lintool
Jimmy Lin
20 days
However, as a hybrid reasoning model, Claude Haiku 4.5 has an “extended thinking mode”, where the model will spend more time considering its response before it answers. Our leaderboard places this variant ahead of Sonnet 4 - illustrating the importance of reasoning.
1
0
4
@Mr202016
Mr.2020!
1 month
Hey @elonmusk, imagine if President Trump passed into law the "Great Displacement Project" which could mean selling all government assets that have nothing to do with protection of the people, to the private sector, and then paying back those trillions of dollars to we the
0
1
3
@lintool
Jimmy Lin
20 days
A few days later, @yupp_ai now has 10K+ votes on @claudeai Haiku 4.5 - and our original observation holds: Sonnet 4 is still preferred over Haiku 4.5 across a diverse range of use cases, reaffirming the importance of real-world user evaluations. https://t.co/BhE09WeKHD
8
7
50
@lintool
Jimmy Lin
24 days
4/4 Try out Claude Haiku 4.5 yourself on @yupp_ai at https://t.co/qokNkYOvr5 - where you can play with all the latest models from OpenAI, Anthropic, Google, and xAI for free!
yupp.ai
Every AI for everyone
0
1
5
@lintool
Jimmy Lin
24 days
3/4 We’ve been making this point since our launch: static benchmarks are only one aspect of robust and trustworthy evaluation. Getting feedback from diverse users on real-world prompts remains critical to evaluating model performance. https://t.co/QKaoYIQXkq
blog.yupp.ai
Yupp - Every AI for everyone.
4
3
21
@lintool
Jimmy Lin
24 days
2/4 Yupp gathers preferences from users all over the world in everyday use cases, and Haiku 4.5 is clearly a step above Haiku 3.5. However, Sonnet 4 is still preferred over Haiku 4.5 across a diverse range of scenarios.
1
0
6
@lintool
Jimmy Lin
24 days
Congrats to @claudeai for Haiku 4.5, an excellent small model! Just a day later, we’ve gathered 3.4K+ user votes on @yupp_ai. Although static benchmarks show that it matches or even exceeds Claude Sonnet 4, a different story emerges based on organic user feedback… 🧵
8
13
120
@lilyjge
Lily Ge
28 days
1/5 🎉 Thrilled to share that our paper “QuackIR: Retrieval in DuckDB and Other Relational Database Management Systems” has been accepted to EMNLP 2025 Industry Track! 📄Paper: https://t.co/VdG7kAYp5D 💻 Code:
github.com
QuackIR is an IR toolkit built on DuckDB. Contribute to castorini/quackir development by creating an account on GitHub.
1
4
10
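The idea behind IR in a relational DBMS is that ranked retrieval reduces to scoring functions like BM25, which SQL engines can express over term-frequency tables (DuckDB, for instance, exposes a `match_bm25` function through its `fts` extension). As a minimal sketch of the scoring itself — not QuackIR's actual code — here is BM25 in plain Python; the parameter defaults are common choices, not the paper's:

```python
import math
from collections import Counter

def bm25_scores(query_terms, docs, k1=0.9, b=0.4):
    """Score each document (a list of tokens) against the query terms
    using the BM25 ranking function."""
    N = len(docs)
    avgdl = sum(len(d) for d in docs) / N  # average document length
    df = Counter()                          # document frequency per term
    for d in docs:
        df.update(set(d))
    scores = []
    for d in docs:
        tf = Counter(d)                     # term frequency in this doc
        s = 0.0
        for t in query_terms:
            if t not in tf:
                continue
            idf = math.log(1 + (N - df[t] + 0.5) / (df[t] + 0.5))
            s += idf * tf[t] * (k1 + 1) / (
                tf[t] + k1 * (1 - b + b * len(d) / avgdl))
        scores.append(s)
    return scores
```

A relational implementation computes the same quantities (tf, df, document length) with GROUP BY aggregations and a join against the query terms.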
@pankaj
Pankaj Gupta
1 month
1/7 Today, we are announcing first-class support for rendering SVG prompts in @yupp_ai. Try a prompt like “SVG of a rabbit” and quickly vibe-check how well all the latest AI models generate SVG code!
15
13
58
@nour_jedidi
Nour Jedidi
1 month
You don't need larger LLMs to generate synthetic data for training your domain specific retrievers! 🤯 We introduce Promptodile 🐊, an open-source alternative to Promptagator that focuses on accessible domain-adaptation for training IR models 🪶💻 📰 https://t.co/Jg06XNe3Nq
1
1
7
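Promptagator-style domain adaptation works by prompting an LLM with a handful of (passage, query) examples from the target domain, then asking it to invent queries for unlabeled passages; the synthetic pairs train the retriever. A minimal sketch of the prompt assembly — hypothetical helper names, not Promptodile's actual API:

```python
def build_query_gen_prompt(few_shot_pairs, new_passage):
    """Assemble a few-shot prompt asking an LLM to write a plausible
    search query for a target passage, given in-domain examples."""
    parts = []
    for passage, query in few_shot_pairs:
        parts.append(f"Passage: {passage}\nQuery: {query}\n")
    # Leave the final query slot empty for the model to complete.
    parts.append(f"Passage: {new_passage}\nQuery:")
    return "\n".join(parts)
```

The completion for each unlabeled passage becomes a synthetic training query; the point of the thread is that a modestly sized open model suffices for this generation step.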
@lintool
Jimmy Lin
1 month
19/19 And if you haven’t tried @yupp_ai yet, I hope you check us out at https://t.co/qokNkYOvr5 – where you can play with all the latest models, including Claude Sonnet 4.5, GPT-5, Gemini 2.5 Pro, and Grok 4 Fast for free!
yupp.ai
Every AI for everyone
0
0
3
@lintool
Jimmy Lin
1 month
18/19 Yupp imagines a future where you, the user, stand at the center of a community of powerful AIs and other users, all custom tailored around your specific needs. “Help Me Choose” is merely our first try at realizing this vision. We’d love to hear what you think!
1
0
2
@lintool
Jimmy Lin
1 month
17/19 “Help Me Choose” manifests another idea we’ve been playing with since last year: Instead of only one-way communication between you and each AI, why don’t we pull together a multi-party dialogue? You talk to the AIs and the AIs talk to each other! https://t.co/CfpU5E3fnl
arxiv.org
When you have a question, the most effective way to have the question answered is to directly connect with experts on the topic and have a conversation with them. Prior to the invention of...
1
0
3
@lintool
Jimmy Lin
1 month
16/19 These are exciting research developments, but we have been grappling with a different question for months: What are the benefits for everyday users? This initial release represents our current thinking, staying true to our brand and design. https://t.co/AP9bD1th9i
blog.yupp.ai
Yupp - Every AI for everyone.
1
0
2
@lintool
Jimmy Lin
1 month
15/19 “Help Me Choose” represents our playful take on “LLMs as a judge”. Such techniques can be quite sophisticated, perhaps hooking into verifiable reward signals that feed reinforcement learning (RL) algorithms.
1
0
2
@lintool
Jimmy Lin
1 month
14/19 It’s important to note that HMC does not suggest which response is better. It merely highlights the similarities and differences between the AI responses. HMC tells it like it is – at the end of the day, it’s about your preferences as a user, your personal “taste”.
1
0
3
@lintool
Jimmy Lin
1 month
13/19 And you have a “council elder” – in this case, Yupp’s own customized AI model, which offers helpful insights about the different models’ responses. It’s like a TL;DR: short, clear, and with a bit of sass!
1
0
2
@lintool
Jimmy Lin
1 month
12/19 Just like with a group of well-informed human experts, the AI council members are given an opportunity to critique each other, and themselves, in light of their initial responses. Indeed, they both poke at each other’s arguments and refine their own responses.
1
0
2
@lintool
Jimmy Lin
1 month
11/19 You’ll see additional feedback: ➡️ Yupp invokes “review by a 3rd AI” to adjudicate. ➡️ Yupp provides “model cross-check”, where the same two AIs are invoked to critique both their own responses and the other’s.
1
0
2
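The cross-check flow described above — two models answer, each critiques both responses, and a third model adjudicates — can be sketched as a simple orchestration loop. This is an illustrative reconstruction, not Yupp's implementation; `ask(model, text)` is a stand-in for a real LLM call:

```python
def cross_check(prompt, model_a, model_b, judge, ask):
    """Run one 'model cross-check' round: two models answer a prompt,
    critique both answers, and a third model summarizes the differences."""
    answer_a = ask(model_a, prompt)
    answer_b = ask(model_b, prompt)

    critique_request = f"Critique both answers.\nA: {answer_a}\nB: {answer_b}"
    critique_a = ask(model_a, critique_request)   # A reviews both
    critique_b = ask(model_b, critique_request)   # B reviews both

    # A third model adjudicates, highlighting similarities and differences
    # rather than declaring a winner -- the user's preference decides.
    verdict = ask(judge,
                  f"Summarize how these answers differ.\n"
                  f"A: {answer_a}\nB: {answer_b}\n"
                  f"A's critique: {critique_a}\nB's critique: {critique_b}")
    return {"a": answer_a, "b": answer_b, "verdict": verdict}
```

Plugging in stub functions for `ask` makes the control flow easy to test before wiring in real model APIs.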