
Ben Dickson
@bendee983
Software Engineer | Tech analyst | Thinker | Student of life | Founder of @bdtechtalks
Joined August 2015 · 5K followers · 8K following · 566 media · 9K statuses
This is the kind of question that GPT-5 Thinking is really good at answering. I gave it this same question (with some additional instructions, such as using arXiv and blogs from leading AI labs and tech firms as primary sources). It provided a brief list of tools/techniques with …
I've seen a lot of research on improving LLM agents' ability to use tools. Is there any work on LLM agents building their own tools based on the problems they face in their environments? It sounds so intuitive (though I know it is absolutely not easy to solve).
You can't make such a big assumption with such a small sample size.
I had to have an MRI scan of my leg. I sent the images to GPT-5 and Grok 4. Both made the same diagnosis in their evaluation and, upon request, even circled the abnormalities in the images. The diagnosis completely matches the doctor's findings. It's only 2025, and already the …
I literally said a good while back that real software engineers would end up cleaning up the mess left by AI vibe coding. Now, "vibe code cleanup specialist" is a thing.
Meta’s REFRAG technique, a “decoding framework tailored for RAG applications,” reportedly speeds up time-to-first-token (TTFT) in LLMs by 30.85× and extends context size by 16×. REFRAG leverages the inherent sparsity and block-diagonal attention patterns present in RAG contexts …
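For intuition, here is a toy sketch of the core idea as I understand it, not Meta's implementation: compress each retrieved chunk into a single embedding and expand only the chunks most relevant to the query back into full tokens, shrinking the decoder's input. The chunk size, dimensions, and relevance heuristic are all illustrative assumptions.

```python
# Toy sketch of the REFRAG idea (not Meta's code): represent most retrieved
# chunks by one compressed embedding each, expand only the top-k relevant
# chunks into full token rows, so the decoder sees far fewer positions.
import numpy as np

rng = np.random.default_rng(0)
CHUNK_TOKENS = 16   # tokens per retrieved chunk (assumed)
DIM = 64            # embedding dimension (assumed)

def encode_chunk(chunk_token_embs: np.ndarray) -> np.ndarray:
    """Compress a (CHUNK_TOKENS, DIM) chunk into one (DIM,) embedding."""
    return chunk_token_embs.mean(axis=0)

def build_decoder_input(query_emb, chunks, expand_top_k=2):
    """Keep the top-k most query-relevant chunks as full token rows;
    represent every other chunk by its single compressed embedding."""
    compressed = np.stack([encode_chunk(c) for c in chunks])
    scores = compressed @ query_emb                  # crude relevance score
    expand = set(np.argsort(scores)[-expand_top_k:])
    rows = []
    for i, chunk in enumerate(chunks):
        rows.append(chunk if i in expand else compressed[i][None, :])
    return np.concatenate(rows, axis=0)

query = rng.normal(size=DIM)
retrieved = [rng.normal(size=(CHUNK_TOKENS, DIM)) for _ in range(8)]
dense_input = build_decoder_input(query, retrieved)
full_len = 8 * CHUNK_TOKENS
print(f"decoder input rows: {dense_input.shape[0]} vs {full_len} uncompressed")
```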
AI accelerators, job platforms, personal devices, feature films… OpenAI is throwing everything at the wall to see what sticks. Not even the world’s leading AI lab knows which killer app or trillion-dollar market AI will unlock.
New agentic memory framework from UCL and Huawei (toy sketch below):
- Organize LLM agent trajectories into a repository of structured memory components
- Retrieve relevant memory components for new tasks to avoid repeating past mistakes
- Use a planner agent + memories to break down goal into …
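A minimal sketch of that trajectory-memory loop, under my own assumptions (the record fields and bag-of-words retrieval are illustrative, not the paper's design):

```python
# Toy trajectory memory: store finished agent runs as structured records,
# then retrieve the most similar past tasks to seed planning for a new one.
from dataclasses import dataclass, field

@dataclass
class MemoryItem:
    task: str
    steps: list
    outcome: str          # "success" or "failure"
    lesson: str           # distilled takeaway, e.g. a mistake to avoid

@dataclass
class MemoryStore:
    items: list = field(default_factory=list)

    def add(self, item: MemoryItem):
        self.items.append(item)

    def retrieve(self, task: str, k: int = 2):
        """Rank stored trajectories by word overlap with the new task."""
        query = set(task.lower().split())
        scored = sorted(
            self.items,
            key=lambda m: len(query & set(m.task.lower().split())),
            reverse=True,
        )
        return scored[:k]

store = MemoryStore()
store.add(MemoryItem(
    task="book a flight to Berlin",
    steps=["search flights", "pick date", "pay"],
    outcome="failure",
    lesson="confirm currency before paying",
))
# A planner agent would prepend these lessons to its prompt for the new task.
for m in store.retrieve("book a flight to Munich"):
    print(m.outcome, "-", m.lesson)
```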
As there is serious concern over the effect that AI answers will have on the search engine market, TPUs might end up being Google's ace in the hole.
Alphabet may be hiding a $900B crown jewel inside its walls. As AI labs look beyond Nvidia, Google’s TPUs are emerging as the go-to silicon, and a potential spin-off of TPUs plus DeepMind could redraw the AI hardware map. The newest Trillium (Gen-6) chips already see strong …
While everyone is waiting for DeepMind to drop Gemini 3.0, Google is quietly releasing a fleet of powerful and efficient small models, laying the groundwork for what could be the future of edge AI. Lots of power packed into EmbeddingGemma, a complement to the Gemma 3n series.
This compact embedding model is a key piece in a larger strategy of small language models, favoring a fleet of efficient specialist models over one large LLM.
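A minimal usage sketch, assuming EmbeddingGemma is loaded through the sentence-transformers library; the model id "google/embeddinggemma-300m" and the similarity helper are assumptions, so check the official model card for the recommended prompts:

```python
# Toy retrieval with EmbeddingGemma via sentence-transformers (>= 3.0 for
# the .similarity() helper). The model card may recommend task-specific
# query/document prompts; this sketch uses plain encode() for brevity.
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("google/embeddinggemma-300m")

docs = [
    "Gemma 3n targets multimodal edge deployment.",
    "EmbeddingGemma produces compact text embeddings.",
]
query = "Which Gemma model is for embeddings?"

doc_embs = model.encode(docs)          # one vector per document
query_emb = model.encode(query)        # one vector for the query
scores = model.similarity(query_emb, doc_embs)   # cosine-similarity matrix
print(scores)                          # higher score = better match
```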
OpenAI has released a new paper on LLM hallucinations: "we argue that the majority of mainstream evaluations reward hallucinatory behavior. Simple modifications of mainstream evaluations can realign incentives, rewarding appropriate expressions of uncertainty rather than …
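A toy illustration of the incentive argument (my own numbers, not the paper's): under binary 1/0 grading, guessing always has non-negative expected value, so a model is never rewarded for abstaining; add a penalty for wrong answers and abstention becomes optimal below a confidence threshold.

```python
# Expected score of guessing vs. abstaining under two grading schemes.
# The penalty value of 1 is an assumed example, not taken from the paper.
def expected_score(p_correct, wrong_penalty):
    guess = p_correct * 1 + (1 - p_correct) * (-wrong_penalty)
    abstain = 0.0
    return guess, abstain

for p in (0.9, 0.5, 0.2):
    g0, _ = expected_score(p, wrong_penalty=0)   # standard 1/0 grading
    g1, _ = expected_score(p, wrong_penalty=1)   # penalize confident errors
    print(f"p={p:.1f}  binary guess EV={g0:+.2f}  "
          f"penalized guess EV={g1:+.2f}  abstain EV=+0.00")
# Under the penalty, guessing only pays when p_correct > 0.5, so the model
# is incentivized to express uncertainty instead of hallucinating.
```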
This could be:
- New Grok model
- Gemini 3.0
- A new Chinese model?

Observations:
1- The “maximally intelligent” phrase seems to be in line with the xAI culture (maximally truth-seeking)
2- It’s free and not hosted by OpenRouter, which could mean it will not be an open model (notice …
My experience too (though I wouldn't call it top raw intelligence). It's amazing how well GPT-5 Thinking can fetch things from the web. I did some rigorous testing and looked at the reasoning trace (more precisely, the summarized reasoning trace OpenAI displays), and it seems …
GPT-5 Pro is undoubtedly the current top raw intelligence model and it's mainly due to how well it searches the web. Not sure how OpenAI did it but it's got the juice.
@jiawzhao Find out more here: https://t.co/yGRYnpdifp
Introducing DeepConf: Deep Think with Confidence 🚀 First method to achieve 99.9% on AIME 2025 with open-source models! Using GPT-OSS-120B even without tools, we reached this almost-perfect accuracy while saving up to 85% generated tokens. It also delivers many strong …
DeepConf reduces output tokens by up to 85% while maintaining accuracy in open-weight reasoning models (the technique applies to majority-voting test-time scaling techniques). It does this by:
1) Using the confidence score of output tokens to weigh the quality of the model's output …
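A toy sketch of what confidence-weighted, filtered majority voting could look like; the confidence measure (mean token log-probability) and the keep ratio are my own assumptions, not the authors' code:

```python
# Confidence-weighted majority voting: score each sampled reasoning trace
# by its average token log-prob, drop the least confident traces (where the
# token savings come from), and weight the surviving votes by confidence.
import math
from collections import defaultdict

def trace_confidence(token_logprobs):
    """Mean log-prob, exponentiated, as a crude per-trace confidence."""
    return math.exp(sum(token_logprobs) / len(token_logprobs))

def weighted_vote(traces, keep_ratio=0.5):
    """traces: list of (final_answer, token_logprobs) pairs."""
    scored = sorted(
        ((trace_confidence(lp), ans) for ans, lp in traces), reverse=True
    )
    kept = scored[: max(1, int(len(scored) * keep_ratio))]  # filter weak traces
    votes = defaultdict(float)
    for conf, ans in kept:
        votes[ans] += conf                     # confidence-weighted tally
    return max(votes, key=votes.get)

traces = [
    ("42", [-0.1, -0.2, -0.1]),   # confident trace
    ("42", [-0.3, -0.2, -0.4]),
    ("17", [-2.0, -1.5, -2.2]),   # low-confidence outlier
    ("17", [-1.8, -2.1, -1.9]),
]
print(weighted_vote(traces))      # -> "42"
```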
We’re still in the early innings, but I’m very optimistic about diffusion language models. It intuitively makes sense to look at more than just one token into the future.
This is an interesting observation. It is worth noting that SpaceX also received billions of dollars in government grants and funding (but nowhere near what AI companies are raising). At the same time:
1) SpaceX wasn’t facing fierce competition from other startups.
2) The …
OpenAI has raised $64B. Anthropic has raised $28B. Insane levels of capital are being invested in these AI labs. SpaceX, building literal rockets, has only raised $12B over 20+ years.