
Jimmy Lin
@lintool
I profess CS-ly at @UWaterloo about NLP/IR/LLM-ish things. I science at @yupp_ai and @Primal. Previously, I monkeyed code for @Twitter and slides for @Cloudera.
Nearby data lake
Joined February 2010
@UWCheritonCS @UWaterloo @pankaj @gilad @yupp_ai 42/42 Let’s work together to empower humanity to shape the future of AI! If you haven’t tried @yupp_ai yet, I hope you give it a try!
Introducing Yupp: a fun and easy way to discover, compare, and use the latest AIs – while helping to shape the future of the field. Sign up at
@UWCheritonCS @UWaterloo @pankaj @gilad @yupp_ai 41/42 As a reminder, this tweet thread is also available in blog form at
@UWCheritonCS @UWaterloo @pankaj @gilad @yupp_ai 40/42 We’re just getting started, and wish to engage the AI community in collaborations that produce orders of magnitude more high-quality data – to tackle the challenge of robust and trustworthy AI evaluation! If you are interested, please reach out at research@yupp.ai.
@UWCheritonCS @UWaterloo @pankaj @gilad @yupp_ai 39/42 We’re also preparing a data release to share a subset of our public prompts in the next few months, but even today you can access samples from our model pages. If you are interested in this, please send an email to research@yupp.ai.
@UWCheritonCS @UWaterloo @pankaj @gilad @yupp_ai 38/42 There are many more interesting questions we seek to explore. Two more examples: ▶️ What is the right way to share data while respecting the privacy of users? ▶️ How do we demonstrate adherence to stated principles in a provable manner, perhaps using cryptographic primitives?
@UWCheritonCS @UWaterloo @pankaj @gilad @yupp_ai 37/42 To that end, we will soon ship a novel feature: permissionless model evals, allowing anyone (students, AI hobbyists, etc.) to submit an AI to Yupp. We’ll orchestrate comparative evaluations and then give you feedback on how your AI stacked up against the others.
@UWCheritonCS @UWaterloo @pankaj @gilad @yupp_ai 36/42 As a concrete example: We have been thinking about how we can provide equitable access to all AI developers, from those at frontier labs pushing the state of the art to resource-limited graduate students who are also training and fine-tuning models.
@UWCheritonCS @UWaterloo @pankaj @gilad @yupp_ai 35/42 Beyond building a compelling consumer product, we're asking hard questions the AI community has raised: How does one truly build robust and trustworthy evaluation? @yupp_ai seeks to be provably fair, radically transparent, permissionless, and accessible to all.
@UWCheritonCS @UWaterloo @pankaj @gilad @yupp_ai 34/42 ❓How do we better model user demographics and incentives? We are running experiments to explore many product features using professional raters with validated user profiles. With multiple layers of quality testing, these raters give us a reference for calibration.
@UWCheritonCS @UWaterloo @pankaj @gilad @yupp_ai 33/42 ❓And on the flip side, how do we automatically detect “bad” users who provide low-quality data? We’re leveraging our experience in tackling spam and bots at Twitter and have developed sophisticated algorithms to ensure data quality.
@UWCheritonCS @UWaterloo @pankaj @gilad @yupp_ai 32/42 ❓How do we model “good” users who provide high-quality data and encourage them to contribute more? These users are the key to robust and trustworthy AI evaluation!
@UWCheritonCS @UWaterloo @pankaj @gilad @yupp_ai 31/42 ❓Model responses often give away identities anyway, so is blinding model names necessary or even effective at scale? We suspect it might be in some cases but not all, and we’re running A/B tests to find out. We’ll share the results of these experiments down the road!
@UWCheritonCS @UWaterloo @pankaj @gilad @yupp_ai 30/42 We are working on controlling for these potential confounding factors (speed, formatting, length, response position, device, browser, etc.) using various statistical methods. We’ll share additional details on our methods soon.
@UWCheritonCS @UWaterloo @pankaj @gilad @yupp_ai 29/42 ❓How do we appropriately factor in different confounds in our rankings? Unsurprisingly, users love fast and reliable models – which we can quantify in terms of time-to-first-token, throughput, error rate, etc.
@UWCheritonCS @UWaterloo @pankaj @gilad @yupp_ai 28/42 To start, we’re building on our collective experiences with ML production deployments at Twitter, Google, Coinbase, and beyond. We’re also committed to a scientifically rigorous approach, and are exploring a number of research topics:
@UWCheritonCS @UWaterloo @pankaj @gilad @yupp_ai 27/42 How do we ensure that @yupp_ai delivers robust and trustworthy evaluations?
@UWCheritonCS @UWaterloo @pankaj @gilad @yupp_ai 26/42 For example, we find that users in the US have different preferences compared to the worldwide user population in general.
@UWCheritonCS @UWaterloo @pankaj @gilad @yupp_ai 25/42 Bringing these product features together, we get rich preference data and user profile attributes to segment this data in a fine-grained manner. Coupled with sophisticated analytics on user prompts, Yupp lets us slice and dice evaluation data in ways not possible before!
@UWCheritonCS @UWaterloo @pankaj @gilad @yupp_ai 24/42 We provide small tokens of appreciation to users for providing high-quality feedback, and do so without distorting incentives. We are doing more research on incentive mechanism design and will be sharing our results in due course.
1/ Yes, Cash out is indeed a “wild” feature of @yupp_ai – one that we’ve spent a lot of time and effort to bring to the product. Why did we do it and where are we going with it? /cc @scaling01
@UWCheritonCS @UWaterloo @pankaj @gilad @yupp_ai 23/42 4️⃣ Community-Aligned Incentive Mechanisms. Yupp provides free access to the latest AIs, but usage is metered via Yupp credits, which users get by providing feedback, creating a virtuous cycle that drives further usage and nudges the whole system towards higher quality data.