Steven Beeckman
@stevenbeeckman
Followers
3K
Following
50K
Media
814
Statuses
11K
Joined February 2009
Today, 🇵🇱 celebrates its national holiday. Many 🇧🇪 have forgotten the sacrifice Poland made for our liberation during World War II. Thank you for your sacrifices.
🇵🇱 Dziś obchodzimy Narodowe Święto Niepodległości Polski! 🇵🇱 Today, we celebrate Poland’s National Independence Day and recognise the country’s commitment to freedom and security.
1
1
7
3 years since I joined roboflow - 68k stars on github - 60 videos and streams on youtube - 2.5M views in total - 40 technical blogposts ↓ coolest stuff I made
45
151
2K
The RNLAF (Royal Netherlands Air Force) has been training in Texas since 1996 and has had a long relationship with the US Army hence why they did the flyover today. Flyovers of US military aircraft have been suspended since the US Government shutdown. 🇳🇱🤝🇺🇸 🎥 @TheNolanK
Was able to see the RNLAF AH-64s and CH-47s that conducted the COTA F1 flyover today up close before and during departure. They kicked up a lot of grass on the way out, certainly a unique experience.
40
185
3K
8 yr old video from Andrew Ng about error analysis which applies equally well (if not better) to debugging AI products Ng: "I usually do this in a spreadsheet, but using an ordinary text file would be ok" 70% of evals is looking and counting https://t.co/tS7mYl7Wms
7
13
149
The paper shows a small model trained with reinforcement learning can outperform prompt only agents on machine learning engineering. Most agents just prompt large models and search longer, but they do not learn from experience. This work instead trains a 3B Qwen model with
12
127
725
Believe the hype (one year later).
It's been about a year since my team fully adopted all the AI coding tools (Cursor, Claude Code), and day to day I am feeling the added cruft in the code base. Unit tests are not catching regressions. Unneeded mocks and comments are left behind. More refactoring is needed
0
0
0
This GitHub repository has a very wide collection of high-quality datasets, tools, and concepts for LLM fine-tuning. All the datasets listed here should be under permissive licensing (Apache 2.0, MIT, cc-by-4.0, etc.). Categorized into segments like Math & Logic, Code, Conversation &
18
201
1K
people always ask me, why build custom interfaces for evaluating LLM traces? human evaluation is expensive. custom interfaces make human evaluation 10x-100x cheaper. thanks, Alex, for sharing your example!
Built a lightweight trace viewer to speed up LLM evals—heavily inspired by lessons from @sh_reya and @HamelHusain's evals course. Kept it simple: FastAPI + vanilla HTML/JS. Features: failure banner, execution-flow timeline (LLM ↔ tools), keyboard shortcuts, and an annotation
5
14
110
One of the most pressing questions in our AI Evals course is: "Why can’t I just have an LLM write my LLM pipeline?" The nuanced answer is that you can use LLMs to assist, but not for the whole pipeline. Knowing where to put the LLM in the loop is the hard part. To unpack this,
8
30
248
"The Impact of Artificial Intelligence on Human Thought" A big 132-page report. AI is shifting real thinking work onto external systems, which boosts convenience but can weaken the effort that builds understanding and judgment, a pattern the paper frames through cognitive
15
84
368
This is the original MIT report that said 95% of AI pilots fail and that spooked investors across the US stock market. The report says most companies are stuck because 95% of GenAI pilots produce zero ROI, while a small 5% win by using systems that learn, plug into real
90
534
4K
We were able to reproduce the strong findings of the HRM paper on ARC-AGI-1. Further, we ran a series of ablation experiments to get to the bottom of what's behind it. Key findings: 1. The HRM model architecture itself (the centerpiece of the paper) is not an important factor.
46
302
3K
This may be the coolest emergent capability I've seen in a video model. Veo 3 can take a series of text instructions added to an image frame, understand them, and execute in sequence. Prompt was "immediately delete instructions in white on the first frame and execute in order"
112
287
4K
We just discovered the 🔥 COOLEST 🔥 trick in Flow that we have to share: Instead of wordsmithing the perfect prompt, you can just... draw it. Take the image of your scene, doodle what you'd like on it (through any editing app), and then briefly describe what needs to happen
122
418
3K
Very useful tips on tool use and memory from Manus's context engineering blog post. Key takeaways. 1. Reversible compact summary Most models allow 128K context, which can easily fill up after a few turns when working with data like PDFs or web pages. When the context gets
4
52
696
supervision-0.26.0 is out we finally released support for ViTPose and ViTPose++ pose estimation models from @huggingface transformers link: https://t.co/xXMRaS3Guk
24
153
991
RAG is the most critical part of context management in AI. But doing it right is tough. I created a free, interactive simulator that visualizes different variants: 🧵
11
117
634
"We must be strong so that no country can imagine attacking Europe and coming out victorious," says the head of the army - RTBF Actus https://t.co/fv2HWFaetO
rtbf.be
'NATO has woken up on its eastern flank,' says General Vansina. And that has been the case since the invasion of Crimea...
0
4
15
I open sourced Sniffly, a tool that analyzes Claude Code logs to help me understand my usage patterns and errors. Key learnings. 1. The biggest type of errors Claude Code made is Content Not Found (20 - 30%). It tries to find files or functions that don't exist. So I
48
129
1K