
Ben Dickson
@bendee983
Software Engineer | Tech analyst | Thinker | Student of life | Founder of @bdtechtalks
Joined August 2015 · 5K followers · 8K following · 566 media · 9K statuses
This is the kind of question that GPT-5 Thinking is really good at answering. I gave it this same question (with some additional instructions, such as using arXiv and blogs from leading AI labs and tech firms as primary sources). It provided a brief list of tools/techniques with …
I've seen a lot of research on improving LLM agents' ability to use tools. Is there any work on LLM agents building their own tools based on the problems they face in their environments? It sounds so intuitive (though I know it is absolutely not easy to solve).
You can't make such a big assumption with such a small sample size.
I had to have an MRI scan of my leg. I sent the images to GPT-5 and Grok 4. Both made the same diagnosis in their evaluation and, upon request, even circled the abnormalities in the images. The diagnosis completely matches the doctor's findings. It's only 2025, and already the …
I literally said a good while back that real software engineers would end up cleaning up the mess left by AI vibe coding. Now, "vibe code cleanup specialist" is a thing.
Meta’s REFRAG technique, a “decoding framework tailored for RAG applications,” reportedly speeds up time-to-first-token (TTFT) in LLMs by 30.85× and extends context size by 16×. REFRAG leverages the inherent sparsity and block-diagonal attention patterns present in RAG contexts …
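For intuition, here is a toy sketch of the core idea as I understand it, not Meta's implementation: compress each retrieved chunk into a single embedding and expand only the chunks most relevant to the query back into full tokens, shrinking the decoder's input. The chunk size, dimensions, and relevance heuristic are all illustrative assumptions.

```python
# Toy sketch of the REFRAG idea (not Meta's code): represent most retrieved
# chunks by one compressed embedding each, expand only the top-k relevant
# chunks into full token rows, so the decoder sees far fewer positions.
import numpy as np

rng = np.random.default_rng(0)
CHUNK_TOKENS = 16   # tokens per retrieved chunk (assumed)
DIM = 64            # embedding dimension (assumed)

def encode_chunk(chunk_token_embs: np.ndarray) -> np.ndarray:
    """Compress a (CHUNK_TOKENS, DIM) chunk into one (DIM,) embedding."""
    return chunk_token_embs.mean(axis=0)

def build_decoder_input(query_emb, chunks, expand_top_k=2):
    """Keep the top-k most query-relevant chunks as full token rows;
    represent every other chunk by its single compressed embedding."""
    compressed = np.stack([encode_chunk(c) for c in chunks])
    scores = compressed @ query_emb                  # crude relevance score
    expand = set(np.argsort(scores)[-expand_top_k:])
    rows = []
    for i, chunk in enumerate(chunks):
        rows.append(chunk if i in expand else compressed[i][None, :])
    return np.concatenate(rows, axis=0)

query = rng.normal(size=DIM)
retrieved = [rng.normal(size=(CHUNK_TOKENS, DIM)) for _ in range(8)]
dense_input = build_decoder_input(query, retrieved)
full_len = 8 * CHUNK_TOKENS
print(f"decoder input rows: {dense_input.shape[0]} vs {full_len} uncompressed")
```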
AI accelerators, job platforms, personal devices, feature films… OpenAI is throwing everything at the wall to see what sticks. Not even the world’s leading AI lab knows which killer app or trillion-dollar market AI will unlock.
New agentic memory framework from UCL and Huawei (toy sketch below):
- Organize LLM agent trajectories into a repository of structured memory components
- Retrieve relevant memory components for new tasks to avoid repeating past mistakes
- Use a planner agent + memories to break down goal into …
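A minimal sketch of that trajectory-memory loop, under my own assumptions (the record fields and bag-of-words retrieval are illustrative, not the paper's design):

```python
# Toy trajectory memory: store finished agent runs as structured records,
# then retrieve the most similar past tasks to seed planning for a new one.
from dataclasses import dataclass, field

@dataclass
class MemoryItem:
    task: str
    steps: list
    outcome: str          # "success" or "failure"
    lesson: str           # distilled takeaway, e.g. a mistake to avoid

@dataclass
class MemoryStore:
    items: list = field(default_factory=list)

    def add(self, item: MemoryItem):
        self.items.append(item)

    def retrieve(self, task: str, k: int = 2):
        """Rank stored trajectories by word overlap with the new task."""
        query = set(task.lower().split())
        scored = sorted(
            self.items,
            key=lambda m: len(query & set(m.task.lower().split())),
            reverse=True,
        )
        return scored[:k]

store = MemoryStore()
store.add(MemoryItem(
    task="book a flight to Berlin",
    steps=["search flights", "pick date", "pay"],
    outcome="failure",
    lesson="confirm currency before paying",
))
# A planner agent would prepend these lessons to its prompt for the new task.
for m in store.retrieve("book a flight to Munich"):
    print(m.outcome, "-", m.lesson)
```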
As there is serious concern over the effect that AI answers will have on the search engine market, TPUs might end up being Google's ace in the hole.
Alphabet may be hiding a $900B crown jewel inside its walls. As AI labs look beyond Nvidia, Google’s TPUs are emerging as the go-to silicon, and a potential spin-off of TPUs plus DeepMind could redraw the AI hardware map. The newest Trillium (Gen-6) chips already see strong …
While everyone is waiting for DeepMind to drop Gemini 3.0, Google is quietly releasing a fleet of powerful and efficient small models, laying the groundwork for what could be the future of edge AI. Lots of power packed into EmbeddingGemma, a complement to the Gemma 3n series.
This compact embedding model is a key piece in a larger strategy of small language models, favoring a fleet of efficient specialist models over one large LLM.
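A minimal usage sketch, assuming EmbeddingGemma is loaded through the sentence-transformers library; the model id "google/embeddinggemma-300m" and the similarity helper are assumptions, so check the official model card for the recommended prompts:

```python
# Toy retrieval with EmbeddingGemma via sentence-transformers (>= 3.0 for
# the .similarity() helper). The model card may recommend task-specific
# query/document prompts; this sketch uses plain encode() for brevity.
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("google/embeddinggemma-300m")

docs = [
    "Gemma 3n targets multimodal edge deployment.",
    "EmbeddingGemma produces compact text embeddings.",
]
query = "Which Gemma model is for embeddings?"

doc_embs = model.encode(docs)          # one vector per document
query_emb = model.encode(query)        # one vector for the query
scores = model.similarity(query_emb, doc_embs)   # cosine-similarity matrix
print(scores)                          # higher score = better match
```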
OpenAI has released a new paper on LLM hallucinations: "we argue that the majority of mainstream evaluations reward hallucinatory behavior. Simple modifications of mainstream evaluations can realign incentives, rewarding appropriate expressions of uncertainty rather than …
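A toy illustration of the incentive argument (my own numbers, not the paper's): under binary 1/0 grading, guessing always has non-negative expected value, so a model is never rewarded for abstaining; add a penalty for wrong answers and abstention becomes optimal below a confidence threshold.

```python
# Expected score of guessing vs. abstaining under two grading schemes.
# The penalty value of 1 is an assumed example, not taken from the paper.
def expected_score(p_correct, wrong_penalty):
    guess = p_correct * 1 + (1 - p_correct) * (-wrong_penalty)
    abstain = 0.0
    return guess, abstain

for p in (0.9, 0.5, 0.2):
    g0, _ = expected_score(p, wrong_penalty=0)   # standard 1/0 grading
    g1, _ = expected_score(p, wrong_penalty=1)   # penalize confident errors
    print(f"p={p:.1f}  binary guess EV={g0:+.2f}  "
          f"penalized guess EV={g1:+.2f}  abstain EV=+0.00")
# Under the penalty, guessing only pays when p_correct > 0.5, so the model
# is incentivized to express uncertainty instead of hallucinating.
```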
This could be:
- New Grok model
- Gemini 3.0
- A new Chinese model?

Observations:
1- The “maximally intelligent” phrase seems to be in line with the xAI culture (maximally truth-seeking)
2- It’s free and not hosted by OpenRouter, which could mean it will not be an open model (notice …
My experience too (though I wouldn't call it top raw intelligence). It's amazing how well GPT-5 Thinking can fetch things from the web. I did some rigorous testing and looked at the reasoning trace (more precisely, the summarized reasoning trace OpenAI displays), and it seems …
GPT-5 Pro is undoubtedly the current top raw intelligence model and it's mainly due to how well it searches the web. Not sure how OpenAI did it but it's got the juice.
@jiawzhao Find out more here: https://t.co/yGRYnpdifp
Introducing DeepConf: Deep Think with Confidence 🚀 First method to achieve 99.9% on AIME 2025 with open-source models! Using GPT-OSS-120B even without tools, we reached this almost-perfect accuracy while saving up to 85% generated tokens. It also delivers many strong …
DeepConf reduces output tokens by up to 85% while maintaining accuracy in open-weight reasoning models (the technique applies to majority-voting test-time scaling techniques). It does this by:
1) Using the confidence score of output tokens to weigh the quality of the model's output …
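A toy sketch of what confidence-weighted, filtered majority voting could look like; the confidence measure (mean token log-probability) and the keep ratio are my own assumptions, not the authors' code:

```python
# Confidence-weighted majority voting: score each sampled reasoning trace
# by its average token log-prob, drop the least confident traces (where the
# token savings come from), and weight the surviving votes by confidence.
import math
from collections import defaultdict

def trace_confidence(token_logprobs):
    """Mean log-prob, exponentiated, as a crude per-trace confidence."""
    return math.exp(sum(token_logprobs) / len(token_logprobs))

def weighted_vote(traces, keep_ratio=0.5):
    """traces: list of (final_answer, token_logprobs) pairs."""
    scored = sorted(
        ((trace_confidence(lp), ans) for ans, lp in traces), reverse=True
    )
    kept = scored[: max(1, int(len(scored) * keep_ratio))]  # filter weak traces
    votes = defaultdict(float)
    for conf, ans in kept:
        votes[ans] += conf                     # confidence-weighted tally
    return max(votes, key=votes.get)

traces = [
    ("42", [-0.1, -0.2, -0.1]),   # confident trace
    ("42", [-0.3, -0.2, -0.4]),
    ("17", [-2.0, -1.5, -2.2]),   # low-confidence outlier
    ("17", [-1.8, -2.1, -1.9]),
]
print(weighted_vote(traces))      # -> "42"
```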
We’re still in the early innings, but I’m very optimistic about diffusion language models. It intuitively makes sense to look at more than just one token into the future.
This is an interesting observation. It is worth noting that SpaceX also received billions of dollars in government grants and funding (but nowhere near what AI companies are raising). At the same time:
1) SpaceX wasn’t facing fierce competition from other startups.
2) The …
OpenAI has raised $64B. Anthropic has raised $28B. Insane levels of capital are being invested in these AI labs. SpaceX, building literal rockets, has only raised $12B over 20+ years.