Michael Becker

@beckerfuffle

Followers 841 · Following 6K · Media 499 · Statuses 4K

Data Scientist @PennMedicine working to improve medicine by predicting the future! https://t.co/x7INgZwhhN. Also I started DataPhilly https://t.co/JzMnoRgIIg. Amateur #AIArtist

Black Sun
Joined February 2013
@KeithSakata
Keith Sakata, MD
5 months
I’m a psychiatrist. In 2025, I’ve seen 12 people hospitalized after losing touch with reality because of AI. Online, I’m seeing the same pattern. Here’s what “AI psychosis” looks like, and why it’s spreading fast: 🧵
2K
14K
94K
@beckerfuffle
Michael Becker
1 year
5/5 Best practice: Test multiple models, get feedback from domain experts, and evaluate based on objective results for your particular needs. No single metric can capture the full picture of an LM's capabilities.
0
0
0
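A minimal sketch of the task-specific evaluation this thread recommends: score a few candidate models against your own test cases with an objective metric instead of a leaderboard position. It assumes an OpenAI-compatible client; the test cases, model names, and exact-match metric are placeholders for illustration.

```python
# Minimal sketch: compare candidate models on YOUR OWN test cases, not a
# public leaderboard. Model names and test data are placeholders.
from openai import OpenAI  # any OpenAI-compatible client works

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

# Hypothetical domain-specific test set: (prompt, expected answer) pairs.
TEST_CASES = [
    ("Extract the ICD-10 code from: 'Patient diagnosed with type 2 diabetes.'", "E11"),
    ("Extract the ICD-10 code from: 'Essential hypertension noted on exam.'", "I10"),
]

CANDIDATE_MODELS = ["model-a", "model-b"]  # placeholders, not real model ids

def exact_match_score(model: str) -> float:
    """Fraction of test cases where the model's reply contains the expected string."""
    hits = 0
    for prompt, expected in TEST_CASES:
        reply = client.chat.completions.create(
            model=model,
            messages=[{"role": "user", "content": prompt}],
        ).choices[0].message.content
        hits += int(expected.lower() in reply.lower())
    return hits / len(TEST_CASES)

for model in CANDIDATE_MODELS:
    print(f"{model}: {exact_match_score(model):.0%} on the in-house task")
```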
@beckerfuffle
Michael Becker
1 year
4/5 The takeaway? Don't choose an LM based solely on leaderboards or benchmarks. What matters is how well it performs on YOUR specific tasks and use cases.
1
0
4
@beckerfuffle
Michael Becker
1 year
3/5 While the LMSYS team is trying to address this, their method introduces its own subjective elements. There's no perfect way to separate style from substance.
1
0
0
@beckerfuffle
Michael Becker
1 year
2/5 This highlights how complex LM evaluation really is. Human preferences, often used in rankings, can be swayed by factors like response length or formatting that may not reflect true model capability.
1
0
0
@beckerfuffle
Michael Becker
1 year
1/5 The recent LMSYS study on chatbot rankings shows why we shouldn't rely too heavily on any single metric for language models. They found significant shifts in rankings when controlling for "style" vs "substance" in responses.
@arena
lmarena.ai
1 year
Does style matter over substance in Arena? Can models "game" human preference through lengthy and well-formatted responses? Today, we're launching style control in our regression model for Chatbot Arena — our first step in separating the impact of style from substance in
1
0
0
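A rough sketch of the idea behind style control: fit the usual Bradley-Terry-style logistic regression over pairwise battles, but add style-difference covariates (response length here) so the per-model coefficients reflect substance with style partialled out. The battles, features, and model names below are fabricated for illustration; this is not the LMSYS implementation.

```python
# Sketch of "style control": Bradley-Terry-style logistic regression over
# pairwise battles, with a style covariate (length difference) added so the
# model coefficients estimate substance with style partialled out.
# Data and features are fabricated for illustration, not the LMSYS pipeline.
import numpy as np
from sklearn.linear_model import LogisticRegression

models = ["model-a", "model-b", "model-c"]          # placeholder model names
# Each battle: (winner_idx, loser_idx, winner_len, loser_len)
battles = [(0, 1, 900, 300), (0, 2, 850, 400), (1, 2, 200, 700), (2, 0, 350, 950)]

X, y = [], []
for winner, loser, w_len, l_len in battles:
    row = np.zeros(len(models) + 1)
    row[winner], row[loser] = 1.0, -1.0             # model indicator difference
    row[-1] = (w_len - l_len) / 1000.0              # style covariate: length diff
    X.append(row); y.append(1)
    X.append(-row); y.append(0)                     # mirrored battle for symmetry

clf = LogisticRegression(fit_intercept=False).fit(np.array(X), np.array(y))
strengths = clf.coef_[0][: len(models)]             # "style-controlled" scores
print(dict(zip(models, strengths.round(3))), "length effect:", clf.coef_[0][-1].round(3))
```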
@JonathanRoss321
Jonathan Ross
1 year
What can you do with Llama quality and Groq speed? You can do Instant. That's what. Try Llama 3.1 8B for instant intelligence on https://t.co/1tnDYnUkUi.
121
426
3K
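For context, trying Llama 3.1 8B on Groq can be as simple as pointing an OpenAI-compatible client at their endpoint. The base URL and model id below are assumptions to verify against Groq's docs.

```python
# Minimal sketch of calling Llama 3.1 8B on Groq via an OpenAI-compatible
# endpoint. The base URL and model id are assumptions; check Groq's docs.
import os
from openai import OpenAI

client = OpenAI(
    base_url="https://api.groq.com/openai/v1",   # assumed OpenAI-compatible endpoint
    api_key=os.environ["GROQ_API_KEY"],
)

reply = client.chat.completions.create(
    model="llama-3.1-8b-instant",                # assumed model id
    messages=[{"role": "user", "content": "Summarize RAG in one sentence."}],
)
print(reply.choices[0].message.content)
```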
@HamelHusain
Hamel Husain
2 years
Epic talk on RAG Basics by @jobergum from the LLM conf (links to YT, slides, etc in next tweet)
6
77
593
@alexandr_wang
Alexandr Wang
2 years
We re-ran SEAL evals on the new @AnthropicAI Claude 3.5 Sonnet model. It is now:
- 🥇 #1 on Instruction Following
- 🥇 #1 on Coding
Congratulations to Anthropic on a great new model! P.S. we’re adding new evals to SEAL, so if you have an idea for an eval, let us know below 👇
@summeryue0
Summer Yue
2 years
1. Claude 3.5 Sonnet is now #1 in Instruction Following on the SEAL leaderboards ( https://t.co/bRdTbIMKRy) 🏆
27
69
644
@transitive_bs
Travis Fischer
2 years
pretty cool approach:
1. use LLMs to extract a knowledge graph from your sources
2. cluster this graph into communities of related entities at diff levels of detail
3. for RAG, map over all communities to create "community answers" and reduce to create a final answer
@MSFTResearch
Microsoft Research
2 years
GraphRAG, a graph-based approach to retrieval-augmented generation (RAG) that significantly improves question-answering over private or previously unseen datasets, is now available on GitHub. Learn more. https://t.co/HeH4bqlmpB
0
3
30
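The map-reduce step described above can be sketched in a few lines: ask each graph community for a partial answer, then combine the useful partials into one final answer. The `llm` helper is hypothetical, and the community summaries are assumed to come from the earlier extraction and clustering steps.

```python
# Sketch of the map-reduce step: each graph community produces a partial
# ("community") answer, and the partials are reduced into one final answer.
# `llm` is a hypothetical helper standing in for any chat-completion call;
# community summaries are assumed to have been built in earlier steps.
from typing import Callable, List

def graph_rag_answer(question: str,
                     community_summaries: List[str],
                     llm: Callable[[str], str]) -> str:
    # Map: each community answers using only its own summary.
    partial_answers = [
        llm(f"Using only this community summary:\n{summary}\n\n"
            f"Answer the question: {question}\n"
            f"Reply 'irrelevant' if the summary does not help.")
        for summary in community_summaries
    ]
    useful = [a for a in partial_answers if "irrelevant" not in a.lower()]

    # Reduce: combine the useful partial answers into a single final answer.
    joined = "\n---\n".join(useful)
    return llm(f"Combine these partial answers into a single answer to "
               f"'{question}':\n{joined}")
```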
@jeremyphoward
Jeremy Howard
2 years
I've done a deep dive into SB 1047 over the last few weeks, and here's what you need to know: *Nobody* should be supporting this bill in its current state. It will *not* actually cover the largest models, nor will it actually protect open source. But it can be easily fixed!🧵
10
97
472
@QuixiAI
Eric Hartford
2 years
Cognitive Computations presents: Dolphin-2.9.3-qwen2-0.5b and Dolphin-2.9.3-qwen2-1.5b. Two tiny Dolphins that still pack a punch! Run them on your wristwatch or your Raspberry Pi! We removed the coding, function calling, and multilingual data to let them focus on instruct and
10
10
103
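A minimal sketch of running one of these tiny models locally with Hugging Face transformers; the repo id is an assumption about where the weights are published, so adjust it to the actual release.

```python
# Sketch: run a ~0.5B Dolphin model locally with Hugging Face transformers.
# The repo id below is an assumption about where the weights live.
from transformers import AutoModelForCausalLM, AutoTokenizer

repo = "cognitivecomputations/dolphin-2.9.3-qwen2-0.5b"   # assumed repo id
tokenizer = AutoTokenizer.from_pretrained(repo)
model = AutoModelForCausalLM.from_pretrained(repo)

messages = [{"role": "user", "content": "Give me three uses for a Raspberry Pi."}]
inputs = tokenizer.apply_chat_template(messages, add_generation_prompt=True,
                                       return_tensors="pt")
outputs = model.generate(inputs, max_new_tokens=128)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```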
@togethercompute
Together AI
2 years
Mixture of Agents—a framework that leverages the collective strengths of multiple LLMs. Each layer contains multiple agents that refine responses using outputs from the preceding layer. Together MoA achieves a score of 65.1% on AlpacaEval 2.0. https://t.co/UxA3nrUV5N
28
99
437
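A toy sketch of the Mixture-of-Agents pattern: proposer models answer in parallel, later layers refine using the previous layer's outputs, and an aggregator synthesizes the final response. The `llm(model, prompt)` helper, model names, and layer count are placeholders, not Together's implementation.

```python
# Sketch of the Mixture-of-Agents idea: several "proposer" models answer in
# parallel, each subsequent layer refines using the previous layer's outputs,
# and an aggregator model synthesizes one final response.
# `llm(model, prompt)` is a hypothetical helper; names are placeholders.
from typing import Callable, List

def mixture_of_agents(question: str,
                      proposers: List[str],
                      aggregator: str,
                      llm: Callable[[str, str], str],
                      n_layers: int = 2) -> str:
    answers = [llm(m, question) for m in proposers]          # layer 1: propose
    for _ in range(n_layers - 1):                            # middle layers: refine
        context = "\n\n".join(answers)
        answers = [llm(m, f"Previous answers:\n{context}\n\n"
                          f"Improve your answer to: {question}")
                   for m in proposers]
    context = "\n\n".join(answers)                           # final layer: aggregate
    return llm(aggregator, f"Synthesize the best single answer to "
                           f"'{question}' from:\n{context}")
```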
@llama_index
LlamaIndex 🦙
2 years
Introducing RAGApp 💫 A no-code interface to configure a RAG chatbot, as dead-simple as GPTs by @OpenAI. It’s a Docker container that’s easily deployable in any cloud infrastructure. Best of all, it’s fully open-source 🔥
1️⃣ Set up the LLM: configure the model provider (OpenAI,
14
94
549
@beckerfuffle
Michael Becker
2 years
10/ As the AI industry grapples with SB 1047's implications, stakeholders must closely examine the bill's provisions and consequences. The balance between responsible AI development and fostering innovation will be central to the ongoing discourse surrounding AI regulation.
0
0
1
@beckerfuffle
Michael Becker
2 years
9/ SB 1047's stringent regulations and potential impact on AI innovation may drive research and development to other states or countries with more favorable regulatory environments, such as Texas or the United Arab Emirates, threatening California's position as an AI hub.
1
0
0
@beckerfuffle
Michael Becker
2 years
8/ The bill sparks debate over whether responsibility should lie with AI developers creating general-purpose tools or with end-users misusing these tools for harmful purposes, drawing comparisons to other software like Photoshop.
1
0
0
@beckerfuffle
Michael Becker
2 years
7/ Developers face severe penalties under SB 1047, including injunctions, damages up to 30% of revenue, and model shutdowns. The bill also expands criminal perjury for knowingly lying in safety reports, creating significant legal risks for AI companies.
1
0
0
@beckerfuffle
Michael Becker
2 years
6/ SB 1047's broad definitions of "covered models" (those trained with >10^26 operations) and "hazardous capabilities" could hinder innovation, particularly for startups and smaller AI companies navigating vague regulations.
1
0
0
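For a rough sense of what clears the 10^26-operation bar, the common C ≈ 6·N·D approximation (parameters times training tokens) is a useful back-of-the-envelope check; the model size and token count below are illustrative, not figures from the bill.

```python
# Back-of-the-envelope check against the 10^26-operation threshold, using the
# common C ≈ 6 * N * D approximation (parameters N, training tokens D).
# The example model size and token count are illustrative, not from the bill.
THRESHOLD = 1e26

def training_flops(params: float, tokens: float) -> float:
    return 6 * params * tokens

# e.g. a 70B-parameter model trained on 15T tokens:
c = training_flops(70e9, 15e12)
print(f"{c:.2e} FLOPs -> covered by SB 1047? {c > THRESHOLD}")   # ~6.3e+24, not covered
```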
@beckerfuffle
Michael Becker
2 years
5/ "SB 1047 similarly enforces restrictions that effectively preclude the development and dissemination of new AI models, it arguably introduces a form of unconstitutional prior restraint on the creation of speech — in this case, code."
1
1
1