
Leo Dirac
@leopd
Followers
6K
Following
3K
Media
164
Statuses
5K
Building the next generation of AI vision at Groundlight. Ex-physicist, ex-google, ex-amazon.
Seattle, WA
Joined April 2007
We trained small LLMs using GRPO to use an image zoom tool to better answer visual questions.
arxiv.org
Despite tremendous recent advances in large model reasoning ability, vision-language models (VLMs) still struggle with detailed visual reasoning, especially when compute resources are limited. To...
0
0
2
RT @__sunil_kumar_: We've open-sourced a MCP that allows big models to use huggingface computer vision models as tools. This allows Clau…
0
2
0
New open source MCP server for vision! MCP will be the fabric by which LLMs communicate with other systems. While LLMs can accept images as input, they remain stubbornly stupid at answering simple visual questions. Meanwhile, Groundlight and traditional CV systems are super.
We made an open-source MCP server that turns HuggingFace zero-shot object detection pipelines into tools that Claude and others can use to locate objects or zoom (crop) to an object. Conceptually, vision capabilities as tools are complementary to VLMs.
1
2
13
RT @__sunil_kumar_: It’s pretty remarkable how many of the GRPO findings from super verifiable environments (like math) haven’t generalized…
0
6
0
Democratization of AI is one of the most powerful forces for long-term good in the world today. True democratization means not just open models & code, but code that can run without multi-million dollar hardware budgets. e.g. in a browser. Nice work Hyperparam team.
What if someone ported the entire data engineering stack to JavaScript? What new kinds of data applications could you build? Today Hyperparam is releasing a collection of open source tools for working with large datasets (e.g., Parquet files) entirely in the browser, no servers.
1
0
1
RT @WenhuChen: 🔥 How do you build a state-of-the-art Vision-Language Model with direct RL? We’re excited to introduce VL-Rethinker, a new…
0
61
0
RT @__sunil_kumar_: GRPO/reasoning enthusiasts - are you using the liger kernel? If not, I strongly suggest you give it a try! It is making…
0
14
0
RT @andrewgwils: Good luck to everyone receiving ICML reviews tomorrow!
0
2
0
RT @GroundlightAI: The last day to vote for @GroundlightAI is coming up this Sunday! We appreciate your continuous support and for making…
0
1
0
RT @Marktechpost: Groundlight Research Team Released an Open-Source AI Framework that Makes It Easy to Build Visual Reasoning Agents (with…
0
9
0
RT @__sunil_kumar_: Has anyone built MCPs that can input and output image data? I’d appreciate a reference if one exists. VLMs like Qwen2…
0
2
0
Even better practice is to randomize how long you pause for (exponential backoff with jitter) such that the expected delay increases, but each individual delay is varied. That would clearly solve this problem as one would pause longer and they'd get out of each other's way. But.
0
0
0
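A minimal sketch of what "exponential backoff with jitter" can look like in Python, for anyone who wants to see it in code. This is the common "full jitter" variant, which is an assumption here since the thread doesn't name one. The function names (`backoff_delays`, `retry`) and the defaults (`base`, `factor`, `cap`, `attempts`) are made up for illustration.

```python
import random
import time


def backoff_delays(base=0.5, factor=2.0, cap=60.0, attempts=5, rng=None):
    """Yield one randomized delay (in seconds) per retry."""
    rng = rng if rng is not None else random.Random()
    for attempt in range(attempts):
        # The ceiling grows exponentially, so the expected delay grows too.
        ceiling = min(cap, base * factor ** attempt)
        # Full jitter: draw uniformly from [0, ceiling], so two agents that
        # failed at the same moment are unlikely to retry in lockstep.
        yield rng.uniform(0.0, ceiling)


def retry(operation, attempts=6, sleep=time.sleep, **kwargs):
    """Call operation() up to `attempts` times, pausing between failures."""
    delays = backoff_delays(attempts=attempts - 1, **kwargs)
    while True:
        try:
            return operation()
        except Exception:  # a real caller would likely catch narrower errors
            delay = next(delays, None)
            if delay is None:
                raise  # out of retries: surface the last error
            sleep(delay)
```

Calling `retry(fetch_thing)` with some hypothetical `fetch_thing` would try up to six times. Without the random draw, two agents that collide would compute the same delay every round and keep colliding, which is the point of the jitter.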
Good practice for dealing with errors is always to pause before trying again, and pause longer and longer each time - this has a nice theoretical benefit that the total load from each retrying agent has a constant cap, even if the error condition never resolves. (Sum n=1..∞ of.
1
0
1
Pretty funny as an isolated anecdote, but also a hidden lesson in why to use jitter in backoff algorithms. (Maybe these robots don't even recognize their state as an error condition?)
1
0
4
RT @andrewgwils: Good research is mostly about knowing what questions to ask, not about answering questions that other people are asking.
0
59
0
RT @__sunil_kumar_: @leopd @BowenROIM @willccbb PS: we’re working on multi turn conversations and tool use. Stay tooned!
0
1
0
RT @__sunil_kumar_: We just released an open-source framework that makes it easy to build visual reasoning agents (with GRPO). https://t.…
0
124
0
TIL about using uv for python. While you _can_ install uv using pip or something like that, IMHO that's a bad idea. You're better off installing uv directly (`curl -LsSf | sh`) - because then uv will manage your different python versions and everything.
1
1
5