Ben Dickson
@bendee983
Software Engineer | Tech analyst | Thinker | Student of life | Founder of @bdtechtalks
Followers: 5K · Following: 8K · Media: 566 · Statuses: 9K
In a private namespace · Joined August 2015
@bendee983
Ben Dickson
1 day
Don’t underestimate the power of core knowledge
@fchollet
François Chollet
4 days
A student who truly understands F=ma can solve more novel problems than a Transformer that has memorized every physics textbook ever written.
@bendee983
Ben Dickson
2 days
This is the kind of question that GPT-5 Thinking is really good at answering. I gave it this same question (with some additional instructions, such as using arXiv and blogs from leading AI labs and tech firms as primary sources). It provided a brief list of tools/techniques with …
@bendee983
Ben Dickson
3 days
I've seen a lot of research on improving LLM agents' ability to use tools. Is there any work on LLM agents building their own tools based on the problems they face in their environments? It sounds so intuitive (though I know it is absolutely not easy to solve).
@bendee983
Ben Dickson
4 days
You can't make such a big assumption with such a small sample size.
@kimmonismus
Chubby♨️
5 days
I had to have an MRI scan of my leg. I sent the images to GPT-5 and Grok 4. Both made the same diagnosis in their evaluation and, upon request, even circled the abnormalities in the images. The diagnosis completely matches the doctor's findings. It's only 2025, and already the …
@bendee983
Ben Dickson
4 days
I literally said a good while back that real software engineers would have to clean up the mess left by AI vibe coding. Now, "vibe code cleanup specialist" is a thing.
[image attached]
@bendee983
Ben Dickson
5 days
Meta’s REFRAG technique, a “decoding framework tailored for RAG applications,” reportedly speeds up time-to-first-token (TTFT) in LLMs by 30.85× and extends context size by 16×. REFRAG leverages the inherent sparsity and block-diagonal attention patterns present in RAG contexts …
[image attached]
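As a back-of-the-napkin illustration of the idea described above (not Meta's actual method — REFRAG learns the compression and selectively re-expands important chunks), collapsing each retrieved chunk's token embeddings into a single pooled vector shrinks the decoder's input from one position per token to one per chunk:

```python
def compress_rag_context(chunk_token_embeddings):
    """Collapse each retrieved chunk (a list of token embedding vectors)
    into one mean-pooled vector. With 16 chunks of 128 tokens each, the
    decoder's input shrinks from 2048 positions to 16 -- the intuition
    behind the claimed TTFT speedup, since attention cost grows with
    sequence length. Mean pooling stands in for the learned compressor."""
    compressed = []
    for chunk in chunk_token_embeddings:
        dim = len(chunk[0])
        pooled = [sum(tok[d] for tok in chunk) / len(chunk) for d in range(dim)]
        compressed.append(pooled)
    return compressed

# 16 retrieved chunks, 128 tokens each, embedding dim 64:
chunks = [[[0.1] * 64 for _ in range(128)] for _ in range(16)]
ctx = compress_rag_context(chunks)  # 16 vectors instead of 2048
```

The block-diagonal attention observation is what makes this safe in principle: retrieved passages rarely need to attend across one another, so little cross-chunk information is lost.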
@bendee983
Ben Dickson
5 days
AI accelerators, job platforms, personal devices, feature films… OpenAI is throwing everything at the wall to see what sticks. Not even the world’s leading AI lab knows what killer app or trillion-dollar market AI will unlock.
[4 images attached]
@bendee983
Ben Dickson
6 days
New agentic memory framework from UCL and Huawei:
- Organize LLM agent trajectories into a repository of structured memory components
- Retrieve relevant memory components for new tasks to avoid repeating past mistakes
- Use a planner agent + memories to break down a goal into …
[2 images attached]
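A toy sketch of the loop those bullet points describe. All names here, and the keyword-overlap retrieval, are my own stand-ins; the actual framework presumably uses embeddings and richer memory schemas:

```python
from dataclasses import dataclass, field


@dataclass
class MemoryItem:
    task: str                                # task the trajectory came from
    lesson: str                              # distilled, reusable takeaway
    keywords: set = field(default_factory=set)


class TrajectoryMemory:
    """Toy repository of structured memory components distilled from past
    agent trajectories, retrieved by keyword overlap with a new task."""

    def __init__(self):
        self.items = []

    def add(self, task, lesson):
        # Store the distilled lesson, indexed by the task's words.
        self.items.append(MemoryItem(task, lesson, set(task.lower().split())))

    def retrieve(self, new_task, k=2):
        # Rank stored lessons by overlap with the new task description,
        # so the planner can condition on relevant past experience.
        query = set(new_task.lower().split())
        ranked = sorted(self.items,
                        key=lambda m: len(m.keywords & query),
                        reverse=True)
        return [m.lesson for m in ranked[:k]]
```

A planner agent would then prepend the retrieved lessons to its prompt before decomposing the new goal, which is how past mistakes stop being repeated.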
@bendee983
Ben Dickson
6 days
As there is serious concern over the effect that AI answers will have on the search engine market, TPU might end up being Google's ace in the hole.
@WesRothMoney
Wes Roth
8 days
Alphabet may be hiding a $900B crown jewel inside its walls. As AI labs look beyond Nvidia, Google’s TPUs are emerging as the go-to silicon, and a potential spin-off of TPUs plus DeepMind could redraw the AI hardware map. The newest Trillium (Gen-6) chips already see strong …
[image attached]
@bendee983
Ben Dickson
6 days
While everyone is waiting for DeepMind to drop Gemini 3.0, Google is quietly releasing a fleet of powerful and efficient small models, laying the groundwork for what could be the future of edge AI. Lots of power packed into EmbeddingGemma, a complement to the Gemma 3n series.
@bdtechtalks
TechTalks
7 days
This compact embedding model is a key piece in a larger strategy of small language models, favoring a fleet of efficient specialist models over one large LLM.
@bendee983
Ben Dickson
7 days
So, is AI coming after all the jobs or not?
[image attached]
@bendee983
Ben Dickson
7 days
OpenAI has released a new paper on LLM hallucinations: "we argue that the majority of mainstream evaluations reward hallucinatory behavior. Simple modifications of mainstream evaluations can realign incentives, rewarding appropriate expressions of uncertainty rather than …"
[image attached]
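The incentive fix the abstract describes can be illustrated with a toy grader (my own sketch, not the paper's exact scheme): under plain accuracy, a wrong guess and an abstention both score 0, so guessing is always weakly better; penalizing wrong answers makes abstention the rational choice whenever confidence is low.

```python
def score_answer(predicted, gold, wrong_penalty=1.0, abstain_score=0.0):
    """Grade one eval item. `predicted is None` means the model abstained
    ("I don't know"). With wrong_penalty > 0, the expected value of guessing,
    p * 1 + (1 - p) * (-wrong_penalty), beats abstaining (score 0) only when
    the model's probability of being right, p, exceeds
    wrong_penalty / (1 + wrong_penalty) -- so calibrated uncertainty is
    rewarded instead of confident hallucination."""
    if predicted is None:
        return abstain_score
    return 1.0 if predicted == gold else -wrong_penalty
```

With the default `wrong_penalty=1.0`, guessing pays off in expectation only when the model is more than 50% sure, which is exactly the realignment the quoted passage argues for.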
@bendee983
Ben Dickson
7 days
This could be:
- New Grok model
- Gemini 3.0
- A new Chinese model?

Observations:
1- The “maximally intelligent” framing seems in line with the xAI culture (maximally truth-seeking)
2- It’s free and not hosted by OpenRouter, which could mean it will not be an open model (notice …
@OpenRouterAI
OpenRouter
7 days
Introducing Sonoma Alpha, two new stealth models 🥷
Context: 2 million tokens
Price: Free
[image attached]
@bendee983
Ben Dickson
8 days
My experience too (though I wouldn't call it top raw intelligence). It's amazing how well GPT-5 Thinking can fetch things from the web. I did some rigorous testing and looked at the reasoning trace (more precisely, the summarized reasoning trace OpenAI displays), and it seems …
@daniel_mac8
Dan Mac
8 days
GPT-5 Pro is undoubtedly the current top raw intelligence model and it's mainly due to how well it searches the web. Not sure how OpenAI did it but it's got the juice.
@bendee983
Ben Dickson
9 days
@jiawzhao
Jiawei Zhao
22 days
Introducing DeepConf: Deep Think with Confidence 🚀 First method to achieve 99.9% on AIME 2025 with open-source models! Using GPT-OSS-120B, even without tools, we reached this almost-perfect accuracy while saving up to 85% of generated tokens. It also delivers many strong …
@bendee983
Ben Dickson
9 days
DeepConf reduces output tokens by up to 85% while maintaining accuracy in open-weight reasoning models (the technique applies to majority-voting test-time scaling techniques). It does this by: 1) Using the confidence scores of output tokens to weigh the quality of the model's output …
[image attached]
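A minimal sketch of the filter-then-vote step as the tweet describes it. The mean-token-probability confidence and the 50% keep fraction are illustrative assumptions, not the paper's exact estimator; DeepConf also stops low-confidence traces early during generation, which is where most of the token savings come from.

```python
import math
from collections import defaultdict


def deepconf_vote(samples, keep_fraction=0.5):
    """Confidence-weighted majority voting over sampled reasoning traces.

    samples: list of (answer, token_logprobs) pairs, one per sampled trace.
    A trace's confidence is its mean token probability; the lowest-confidence
    traces are discarded before voting, and surviving votes are weighted by
    confidence rather than counted equally.
    """
    scored = []
    for answer, logprobs in samples:
        confidence = math.exp(sum(logprobs) / len(logprobs))  # mean token prob
        scored.append((confidence, answer))
    scored.sort(reverse=True)
    kept = scored[: max(1, int(len(scored) * keep_fraction))]

    votes = defaultdict(float)
    for confidence, answer in kept:
        votes[answer] += confidence  # weight each vote by trace confidence
    return max(votes, key=votes.get)
```

Because weak traces never reach the ballot, fewer samples (and with early stopping, far fewer tokens) are needed to match plain majority voting.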
@bendee983
Ben Dickson
10 days
Well said 💯
@BritneyMuller
Britney Muller
10 days
AI-generated content without human value-add is a dead-end strategy!!! Especially as search engines and users get better at detecting it. The most successful content approaches are treating AI as a tool rather than a replacement for human expertise. It might help identify …
@bendee983
Ben Dickson
10 days
We’re still in the early innings, but I’m very optimistic about diffusion language models. It intuitively makes sense to look more than one token into the future.
[image attached]
@bendee983
Ben Dickson
10 days
This is an interesting observation. It is worth noting that SpaceX also received billions of dollars in government grants and funding (though nowhere near what AI companies are raising). At the same time:
1) SpaceX wasn’t facing fierce competition from other startups.
2) The …
@MrGoldBro
Aidan Gold
11 days
OpenAI has raised $64B. Anthropic has raised $28B. Insane levels of capital are being invested in these AI labs. SpaceX, building literal rockets, has only raised $12B over 20+ years.