Jimmy Lin

@lintool

Followers
14K
Following
0
Media
362
Statuses
4K

I profess CS-ly at @UWaterloo about NLP/IR/LLM-ish things. I science at @yupp_ai and @Primal. Previously, I monkeyed code for @Twitter and slides for @Cloudera.

Nearby data lake
Joined February 2010
@lintool
Jimmy Lin
28 days
In December 2024 @pankaj @gilad @willhorn and I put out a rather cryptic arXiv paper musing about the future of search: I’m now able to share what I’ve been up to! 🧵 (1/9)
9
29
166
@lintool
Jimmy Lin
2 days
@UWCheritonCS @UWaterloo @pankaj @gilad @yupp_ai 42/42 Let’s work together to empower humanity to shape the future of AI! If you haven’t tried @yupp_ai yet, I hope you give it a try!
@yupp_ai
Yupp
28 days
Introducing Yupp: a fun and easy way to discover, compare, and use the latest AIs – while helping to shape the future of the field. Sign up at
0
0
5
@lintool
Jimmy Lin
2 days
@UWCheritonCS @UWaterloo @pankaj @gilad @yupp_ai 41/42 As a reminder, this tweet thread is also available in blog form at
1
0
4
@lintool
Jimmy Lin
2 days
@UWCheritonCS @UWaterloo @pankaj @gilad @yupp_ai 40/42 We’re just getting started, and wish to engage the AI community in collaborations that produce orders of magnitude more high-quality data – to tackle the challenge of robust and trustworthy AI evaluation! If you are interested, please reach out at research@yupp.ai.
1
0
3
@lintool
Jimmy Lin
2 days
@UWCheritonCS @UWaterloo @pankaj @gilad @yupp_ai 39/42 We’re also preparing a data release to share a subset of our public prompts in the next few months, but even today you can access samples from our model pages. If you are interested in this, please send an email to research@yupp.ai
1
0
5
@lintool
Jimmy Lin
2 days
@UWCheritonCS @UWaterloo @pankaj @gilad @yupp_ai 38/42 There are many more interesting questions we seek to explore. Two more examples:
▶️ What is the right way to share data while respecting the privacy of users?
▶️ How do we demonstrate adherence to stated principles in a provable manner, perhaps using cryptographic primitives?
1
0
5
@lintool
Jimmy Lin
2 days
@UWCheritonCS @UWaterloo @pankaj @gilad @yupp_ai 37/42 To that end, we will soon ship a novel feature: permissionless model evals, allowing anyone (students, AI hobbyists, etc.) to submit an AI to Yupp. We’ll orchestrate comparative evaluations and then give you feedback on how your AI stacked up against the others.
1
2
7
@lintool
Jimmy Lin
2 days
@UWCheritonCS @UWaterloo @pankaj @gilad @yupp_ai 36/42 As a concrete example: We have been thinking about how we can provide equitable access to all AI developers, from those at frontier labs pushing the state of the art to resource-limited graduate students who are also training and fine-tuning models.
1
0
5
@lintool
Jimmy Lin
2 days
@UWCheritonCS @UWaterloo @pankaj @gilad @yupp_ai 35/42 Beyond building a compelling consumer product, we're asking hard questions the AI community has raised: How does one truly build robust and trustworthy evaluation? @yupp_ai seeks to be provably fair, radically transparent, permissionless, and accessible to all.
1
0
5
@lintool
Jimmy Lin
2 days
@UWCheritonCS @UWaterloo @pankaj @gilad @yupp_ai 34/42 ❓How do we better model user demographics and incentives? We are running experiments to explore many product features using professional raters with validated user profiles. With multiple layers of quality testing, these raters give us a reference for calibration.
1
0
5
@lintool
Jimmy Lin
2 days
@UWCheritonCS @UWaterloo @pankaj @gilad @yupp_ai 33/42 ❓And on the flip side, how do we automatically detect “bad” users who provide low-quality data? We’re leveraging our experience in tackling spam and bots at Twitter and have developed sophisticated algorithms to ensure data quality.
1
0
5
@lintool
Jimmy Lin
2 days
@UWCheritonCS @UWaterloo @pankaj @gilad @yupp_ai 32/42 ❓How do we model “good” users who provide high-quality data and encourage them to contribute more? These users are the key to robust and trustworthy AI evaluation!
1
0
5
@lintool
Jimmy Lin
2 days
@UWCheritonCS @UWaterloo @pankaj @gilad @yupp_ai 31/42 ❓Model responses often give away identities anyway, so is blinding model names necessary or even effective at scale? We suspect it might be in some cases but not all, and we’re running A/B tests to find out. We’ll share the results of these experiments down the road!
1
0
6
@lintool
Jimmy Lin
2 days
@UWCheritonCS @UWaterloo @pankaj @gilad @yupp_ai 30/42 We are working on controlling for these potential confounding factors (speed, formatting, length, response position, device, browser, etc.) using various statistical methods. We’ll share additional details soon on our methods.
1
0
5
@lintool
Jimmy Lin
2 days
@UWCheritonCS @UWaterloo @pankaj @gilad @yupp_ai 29/42 ❓ How do we appropriately factor in different confounds in our rankings? Unsurprisingly, users love fast and reliable models – which we can quantify in terms of time-to-first-token, throughput, error-rate, etc.
1
0
5
@lintool
Jimmy Lin
2 days
@UWCheritonCS @UWaterloo @pankaj @gilad @yupp_ai 28/42 To start, we’re building on our collective experiences with ML production deployments at Twitter, Google, Coinbase, and beyond. We’re also committed to a scientifically rigorous approach, and are exploring a number of research topics:
1
0
5
@lintool
Jimmy Lin
2 days
@UWCheritonCS @UWaterloo @pankaj @gilad @yupp_ai 27/42 How do we ensure that @yupp_ai delivers robust and trustworthy evaluations?
1
0
5
@lintool
Jimmy Lin
2 days
@UWCheritonCS @UWaterloo @pankaj @gilad @yupp_ai 26/42 For example, we find that users in the US have different preferences compared to the worldwide user population in general.
1
0
5
@lintool
Jimmy Lin
2 days
@UWCheritonCS @UWaterloo @pankaj @gilad @yupp_ai 25/42 Bringing these product features together, we get rich preference data and user profile attributes to segment this data in a fine-grained manner. Coupled with sophisticated analytics on user prompts, Yupp lets us slice and dice evaluation data in ways not possible before!
1
0
5
@lintool
Jimmy Lin
2 days
@UWCheritonCS @UWaterloo @pankaj @gilad @yupp_ai 24/42 We provide small tokens of appreciation to users for providing high-quality feedback, and do so without distorting incentives. We are doing more research on incentive mechanism design and will be sharing our results in due course.
@pankaj
Pankaj Gupta
18 days
1/ Yes, Cash out is indeed a “wild” feature of @yupp_ai – one that we’ve spent a lot of time and effort to bring to the product. Why did we do it and where are we going with it? /cc @scaling01
1
0
5
@lintool
Jimmy Lin
2 days
@UWCheritonCS @UWaterloo @pankaj @gilad @yupp_ai 23/42 4️⃣ Community-Aligned Incentive Mechanisms. Yupp provides free access to the latest AIs, but usage is metered via Yupp credits, which users get by providing feedback, creating a virtuous cycle that drives further usage and nudges the whole system towards higher quality data.
1
0
5