
David Hershey
@DavidSHershey
Followers: 2K
Following: 4K
Media: 28
Statuses: 224
AI Generalist | Writer of https://t.co/l1jTizWyTv
Joined April 2017
So, I did a thing 🙂 This was really just a fun little side project - I wanted to spend some time working on agents, and Pokemon was the most fun way I could come up with. And then it kinda took off! 3.7 Sonnet is so fun to watch play!
A few researchers at Anthropic have, over the past year, had a part-time obsession with a peculiar problem. Can Claude play Pokémon? A thread:
29
22
552
I built a thing! A chatbot built to help founders learn about the fundamentals of building seed-stage tech companies. Powered by @qdrant_engine ♥️ Definitely the easiest vector database to get started with for LLM applications.
5
8
53
1/n) Never going to miss a chance to write about new devtools! @doppenhe and I wrote an overview of the new wave of devtools empowering devs to build with large language models! Highlights in this 🧵
1
10
35
I also want to shout out @computerender - their video on RL for Pokemon was the first inspiration for us to try hooking up Claude to see how it did!
1
1
35
Tell me about today's news, in the style of a pirate 🏴‍☠️ Having fun with LLMs via @dust4ai.
2
2
23
I have seen Claude do a lot of adorable things, this has to be at the top.
My favorite Claude Plays Pokémon tidbit (mentioned in @latentspacepod) is that when @DavidSHershey told Claude to nickname its Pokémon, it instantly became much more protective of them, making sure to heal them when they got hurt.
1
0
19
@sarahcat21 @codexeditor Pretty bullish on "LLMs are a new type of database that can power new applications", but whew this is a bit hotter of a take than that 😅
1
1
17
This was really fun!
New lightning pod with @DavidSHershey on how Claude Plays Pokémon was made, w/ special co-host @vibhuuuus! We covered: designing tools for AI agents playing games, managing memory for long-running tasks, and why naming Pokémon is important. Watch now 📺
1
0
15
LFG, come hack on Pokemon with me on Sunday.
THIS SUNDAY IN SF. COME PLAY POKEMON. WITH AI AGENTS. AND CLAUDE. WITH SPECIAL GUEST @DavidSHershey. AND VMS FROM @JESSEMHAN. 100 SEATS ONLY.
1
1
11
Leading founders weigh in on the intricacies of building with open source AI. @vipulved and @rxin talked w/ @weiliendang and @DavidSHershey about the future of open source AI and foundation models. Hear Reynold explain why @databricks chose to build Dolly w/ open source AI 👇
0
1
9
3) You have to start by designing good prompts and getting the right data to the LLM context window. @LangChainAI @dust4ai @gpt_index @CognosisAI are rocking the prompt building and chaining. @weaviate_io @qdrant_engine @milvusio @pinecone make vector DBs easy!
1
1
10
Such a privilege to get to work with the amazing team at @roboto_ai; they're building such a cool product. If you're in robotics, check out their sandbox!
Amazing things are happening with AI + robotics. And @roboto_ai is at the center of it all. Learn about the insights and experience from Co-Founders @BBarash and Yves Albers-Schoenberg that led to this startup, and why we led their Seed.
0
4
9
4/ Shoutouts to some examples of folks building cool apps: @vectara working on revolutionizing search.
2
4
9
@aniiyengar @imjaredz Finally got a version done last night! S/o to @promptlayer (@imjaredz) for making some of the complex chains of prompts more manageable 🍰!
1
2
7
The most promising approach (IMO): custom reward models. Model user preference directly and use it to evaluate your app before you ship. Shoutout to @thesephist, one 15-minute convo with him put me down the rabbit hole on this topic. (4/n)
1
0
6
@codexeditor @sarahcat21 Another version: "What queries are better suited for LLMs than other databases?"
1
1
6
7/7) Thanks to @tristanzajonc @willpienaar and a few others for their input and help, and to lots of others who reviewed this!
0
1
5
What an awesome view into why training LLMs requires so much high-quality talent. "This level of perfection is like eight billion people copy[ing] the complete works of Shakespeare for the 14 billion years the universe has existed and not have a single person make a mistake!"
If your loss curves look sus, join the club! Giant LLM training runs are full of pitfalls. We learned the hard way. We wrote a deep dive for the community on silent data corruptions (SDCs). Problem and mitigations here:
0
0
1
1/6🧵 Happy LLaMA 2 day! In honor of @MetaAI giving us the best reason yet to host your own model, here's a quick thread on how to choose an LLM 👇 Starting with the most popular option: use a hosted model!
1
0
5
4) As teams progress, lots of new challenges emerge to improve and maintain LLM features. @humanloop @honeyhiveai are leading the way helping teams manage the complexity of LLMs in prod
1
1
5
@qdrant_engine is an incredible team to work with, can't wait to see everything they will accomplish!
So, we raised a $7.5M seed round, here is what our CEO has to say about it: And here is what we are going to do with it 🧵👇
0
1
4
One primary problem: defining what we're even measuring! "Goodness" for LLMs is complicated and depends on your task. Follow @KevinAFischer if you want to understand why. (2/n)
1
1
4
7/n That's why @AdeptAILabs announced a huge fundraise today. LLM-backed agents are about to change our world. Strap in, things are going to get weird. 🚀🚀🚀
2
0
4
@KevinAFischer @OpenAI The difficulty of being a research, consumer, and dev tool company at the same time really shows when it comes to the DevEx of their API products.
2
0
4
Well this would be pretty incredible 👀
The era of sub-quadratic LLMs is about to begin. At @togethercompute we've been building next gen models with large state space architectures and training them on very long sequences and the results from the recent builds are incredible. Will share more as we get closer to
0
0
3
5/ This demo from @chillzaza_ is a great example of using LLMs to augment applications.
Universal Q&A on @lucidweb_ works on Wikipedia articles. Check it out! 🪄 Ask questions about any Wikipedia article and the citation feature will take you straight to the source! Request access at - I'm working super hard to roll this out to everyone! 😅
1
1
4
@codexeditor @sarahcat21 I don't think I would frame it as instead! My Q: "here is a performant database that contains the content of the internet and is queried with natural language; what application will you build with that?" There are definitely some questions better answered by LLMs though!
2
1
4
@OfficialLoganK @Chrisprucha @KevinAFischer @OpenAI I feel like the core issue is that language model updates are inherently undocumented, breaking API changes. Fine if you assume that "smarter" is all that matters, but for the long tail of use cases these migrations will be pretty painful.
1
0
4
@benankdev Yeah, it has a handful of small tips that are mostly built around things that it normally gets confused by. Pretty minimal overall though!
0
0
3
There is so much opportunity in AI right now, and Unusual was created to help the best technical founders build great companies. This program is going to kick ass. Such a great opportunity for folks thinking about AI companies to build and learn together.
Applications are open for our AI studio for builders! Come hang out this summer with our community of AI enthusiasts, builders, and founders!
0
1
3
@sh_reya Yes! I worked in MLOps for years, hoping that more tools would mean more people could use ML, and it turns out that more general ML models were the answer!
1
0
3
Nothing more fun than a conversation with @Dpbrinkm and the @mlopscommunity!
What a pleasure talking to @DavidSHershey about Building a Movie Recommendation System on @TectonAI with @SnowflakeDB. Tecton integrates with Snowflake and enables data teams to process ML features and serve them in production quickly and reliably.
1
0
3
Lots of unique evaluation concepts out there right now! Love the work @lmsysorg is doing giving Elo rankings to models with human comparisons. (3/n)
⚔️Chatbot Arena Leaderboard Update! Exciting to welcome new entrants: Google PaLM 2, Claude-instant-v1, MosaicML MPT-7B. The competition is heating up🔥 Check out our analysis for all the surprising results at Remember, your vote shapes the arena.
1
0
3
@hwchase17 @m_morzywolek I'm currently working on some tools to manage transcripts from my zoom calls, and this is SO helpful and so cool.
2
0
3
@imjaredz I hear prompt 7 from "10 ChatGPT prompts that will change your life" is a banger though.
0
0
3
@hwchase17 @m_morzywolek Also wow good timing with this:
Our new embedding model is significantly more capable at language processing and code tasks, cost effective, and simpler to use.
0
0
3
Also @LangChainAI + @qdrant_engine is a match made in heaven; the combo of the two makes building lightning quick. Thanks LangChain team!
0
0
3
@JordanDAndersen @jheitzeb Specifically around LLM development? Would love to join if any are going strong.
1
0
2
@HamelHusain I have the same irrational confidence for round two and don't know if that's a good sign or a bad sign 🥴
0
0
1
This is awesome -- of all of the (many) meetups that have cropped up, this has to be the most exciting theme. Rock on @swyx @Mappletons!
REQUEST FOR DEMOS. Come join the first AI | UX: Beyond the Textbox! SAVE THE DATE: APR 19 in SF, recorded. Hosted by @mappletons and me, @geoffreylitt, @thesephist, @sgrove. ***If you have a 1-2min AI UX concept to share and want to meet fellow builders, PLS APPLY below!***
1
0
2
This is why I get so excited about this space -- we're just scratching the surface of what happens when you build experiences or apps on top of the initial output of an LLM. So many possibilities.
GPT can iteratively write, debug, and test programs to accomplish arbitrary goals. Pictured: GPT reading snippets of HTML from HN and building a headline scraper in Python, overcoming bugs by simply reading the errors and self-judgments and hypothesizing to itself. Thread ↓
0
0
2
@DanAdvantage Claude gets a screen overlay that shows coordinates over the screen, then it can choose to go to those coordinates -- it's not great at understanding its relative position to things still, so this gives it a pretty big boost.
2
0
2
3/n Let's do a side-by-side of a few planning queries, using my favorite example from the folks at @oughtinc: "How far would all the film frames that make up the 400-plus episodes of The Simpsons stretch?"
1
0
2
last/n engineering around LLMs takes time and tools and hard work (@fixieai @LangChainAI @gpt_index will tell you that!), but we're going to get there really soon.
0
0
2
Can't say enough about @qdrant_engine -- local mode made getting started simple, and easy to harden over time with their hosted offering.
1
0
2
@KevinAFischer @OpenAI Especially when each of those products competes very directly for both attention *and* GPU availability.
1
0
2
Shoutout to @railway (another incredible portco) - I've never hosted a webapp before, and Railway felt like magic making it all work.
1
0
2
@sarahcat21 Is it so bad to have different stacks? The problem spaces can be so fundamentally different that you can essentially view them as different technologies. Maybe we can see some component consolidation at least 🙂
2
0
2
2/ Shoutout to @thesephist whose tweet was what pushed me over the edge to write some thoughts down.
Small rant about LLMs and how I see them being put, rather thoughtlessly IMO, into productivity tools. 📄. TL;DR — Most knowledge work isn't a text-generation task, and your product shouldn't ship an implementation detail of LLMs as the end-user interface.
1
0
2
@DanielChesley @Work_Bench I might shamelessly steal this for Seattle, I'm so jealous of NYC right now 😭
0
0
2
@thesephist +1, and in general I'm surprised by the lack of discourse around building reward models.
0
0
2
8/ @mihail_eric put together an awesome demo that blurs the lines between "generative" vs. just providing new interfaces. Makes answering simple data questions so much easier.
1
0
2
@KevinAFischer I guess I shouldn't be surprised! I've watched you publicly poke the traits of these models so much, makes sense to measure it too. Would love to chat, will drop you a message.
0
0
1
@thassonjee FWIW I've found it helps non-technical folks ground what I'm talking about. And it's better SEO 🤷‍♂️ the things you do for content lol.
1
0
1