
Visiting Fellow, Ph.D.
@jackiefloyd
Followers: 1K
Following: 105K
Media: 1K
Statuses: 28K
AI all the time. Building. Accused of having big ideas. Fearless. Also cats. UT-Austin BS, Columbia PhD.
Houston, TX
Joined June 2008
RL fine-tuning often prematurely collapses policy entropy. We consider a general framework, called set RL, i.e. RL over a set of trajectories from a policy. We use it to incentivize diverse solutions & optimize for inference time performance. Paper:
arxiv.org
Reinforcement learning fine-tuning (RLFT) is a dominant paradigm for improving pretrained policies for downstream tasks. These pretrained policies, trained on large datasets, produce generations...
Exploration is fundamental to RL. Yet policy gradient methods often collapse: during training they fail to explore broadly, and converge into narrow, easily exploitable behaviors. The result is poor generalization, limited gains from test-time scaling, and brittleness on tasks
10
64
760
Time to fine-tune your own models instead of relying on black-box closed-source models! Not doing this is like building a software company and not writing your own software. In the time of reinforcement learning, it's become much easier and cheaper than it used to be, thanks to
Introducing Tinker: a flexible API for fine-tuning language models. Write training loops in Python on your laptop; we'll run them on distributed GPUs. Private beta starts today. We can't wait to see what researchers and developers build with cutting-edge open models!
38
78
874
This is an excellent idea and we shipped this today for https://t.co/AtkQjlgJ6c as well. Agents that accept text/markdown now get just what they need. You can also get markdown by appending .md to any URL
When Claude Code fetches Bun’s docs, Bun’s docs now send markdown instead of HTML by default. This shrinks token usage for our docs by about 10x
16
21
491
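A minimal sketch of the two ways of asking for markdown mentioned in the post above: sending an `Accept: text/markdown` header, or appending `.md` to the URL. The function name `markdown_request`, the `example.com` URL, and the `q=0.8` HTML fallback are assumptions for illustration only; how any particular docs site handles these requests is not confirmed here. The code only builds the request and sends nothing.

```python
from urllib.request import Request


def markdown_request(url: str, use_md_suffix: bool = False) -> Request:
    """Build (but do not send) a request that asks a server for markdown.

    Hypothetical helper: the name and defaults are illustrative only.
    """
    if use_md_suffix:
        # Variant from the post: append .md to the page URL.
        url = url.rstrip("/") + ".md"
    # Content negotiation: prefer markdown, fall back to HTML (assumed weights).
    return Request(url, headers={"Accept": "text/markdown, text/html;q=0.8"})


# Placeholder URL, not a real documentation site.
req = markdown_request("https://example.com/docs/install")
print(req.get_header("Accept"))
```

A server that honors the header can return the same page as markdown, which is what makes the response smaller for an agent to read.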
we're shipping the first reachy minis and I can probably leverage my cofounder status to pressure the team into prioritizing two or three community members 😅😅😅 the first version is going to be very rough/DIY so better for robotics builders who have time, interest and skills
22
16
178
TODAY WE LAUNCH SORA 2, THE WORLD'S BEST VIDEO GENERATION MODEL. feature you and your friends with raw real-world physics, putting an end to the uncanny ai vibes. let me show you how insane our model is, featuring me & sam altman:
632
586
7K
Announcing a significant upgrade to Agentic Document Extraction! LandingAI's new DPT (Document Pre-trained Transformer) accurately extracts even from complex docs. For example, from large, complex tables, which is important for many finance and healthcare applications. And a
99
587
4K
In 2023 there was a big uptick in companies marketing their products as “AI powered”. 2026 will be the year of companies marketing their products as “AI free” (a trend already underway)
82
198
1K
Something that's missing in robotics AI is the reflex from researchers and builders to share not only a video demo but also the code, datasets, policies, models or research papers for others to benefit from it and to show that what they did is not fake, staged or cherry-picked
31
21
216
.@RichardSSutton, father of reinforcement learning, doesn’t think LLMs are bitter-lesson-pilled. My steel man of Richard’s position: we need some new architecture to enable continual (on-the-job) learning. And if we have continual learning, we don't need a special training
252
625
4K
Sharing our second Connectionism research post on Modular Manifolds, a mathematical approach to refining training at each layer of the neural network
Efficient training of neural networks is difficult. Our second Connectionism post introduces Modular Manifolds, a theoretical step toward more stable and performant training by co-designing neural net optimizers with manifold constraints on weight matrices.
91
261
3K
How hot is this company? Go to X's search. Type "FactoryAI" and search. Look at just how much praise from developers this company is getting. Then come back and watch this. It is the state of the art of software engineering. The team at @FactoryAI goes deep with me today right
18
36
333
The best agents for software development are becoming the best agents for everything. Droids are the best software development agents in the world, reaching #1 on Terminal-Bench. We have raised $50M from NEA, Sequoia Capital, J.P. Morgan, Nvidia, Abstract Ventures, and other
115
147
1K
Our latest work performs sim2real dexterous grasping using end-to-end depth RL.
14
48
395