Ross Wightman (@wightmanr)
Computer Vision @ 🤗. Ex-head of Software, Firmware Engineering at a Canadian 🦄. Currently building ML/AI systems, or investing in startups that do it better.
Vancouver, BC · Joined April 2012
Followers: 22K · Following: 4K · Media: 132 · Statuses: 5K
Ross Wightman (@wightmanr) · 7 hours
RT @ntenenz: The Dayhoff Atlas! Open code. Open weights. Open datasets. Thanks @huggingface for helping to facilitate open science. ht…
huggingface.co
0 replies · 5 reposts · 0 likes
Ross Wightman (@wightmanr) · 3 days
A joint OpenCLIP (3.0.0) and timm (1.0.18) release day today. It's been a quarter since the last OC release, so what's new? PE (Perception Encoder) Core support was the headline feature. Using the timm vision encoder for the PE models, I adapted the weights from @AIatMeta so they…
2 replies · 8 reposts · 77 likes
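A minimal sketch of pulling the new PE Core weights through the OpenCLIP API. The exact architecture and pretrained tag names in the 3.0.0 release are assumptions on my part, so the sketch discovers them via the registry rather than hard-coding them:

    import open_clip

    # List (architecture, pretrained_tag) pairs and filter for PE entries.
    # Tag names are assumptions, hence discovering instead of hard-coding.
    pe_pairs = [(a, t) for a, t in open_clip.list_pretrained() if 'pe' in a.lower()]
    print(pe_pairs)

    # Load the first match for illustration; pick the model you actually want.
    arch, tag = pe_pairs[0]
    model, _, preprocess = open_clip.create_model_and_transforms(arch, pretrained=tag)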
Ross Wightman (@wightmanr) · 8 days
Oh right, and of course FlexiViT (Beyer, Lucas, et al). Runtime patch resizing is based on that, but the PyTorch impl here is fast, unlike my original literal port of the original Flax impl.
0 replies · 0 reposts · 5 likes
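For context, the core FlexiViT trick is resizing the patch-embedding kernel at runtime so one checkpoint serves many patch sizes. A simplified sketch of the idea; the paper uses a pseudo-inverse resize, and plain bilinear interpolation stands in for it here:

    import torch
    import torch.nn.functional as F

    def resize_patch_embed(w: torch.Tensor, new_size: int) -> torch.Tensor:
        # Resize a conv patch-embed kernel (out_ch, in_ch, p, p) to a new
        # patch size. Bilinear resize is a simplified stand-in for the
        # pseudo-inverse resize used in the FlexiViT paper.
        return F.interpolate(w, size=(new_size, new_size),
                             mode='bilinear', align_corners=False)

    w16 = torch.randn(768, 3, 16, 16)   # e.g. a ViT-B/16 patch-embed kernel
    w32 = resize_patch_embed(w16, 32)   # reuse the same weights at patch size 32
    print(w32.shape)                    # torch.Size([768, 3, 32, 32])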
Ross Wightman (@wightmanr) · 8 days
If you do build on this, whether through timm or borrowing bits of code, please consider a timm reference in related papers and code. NaViT (Dehghani, Mostafa, et al) and NaFlex (SigLIP-2, Tschannen, Michael, et al) are obvious foundations, but there's quite a few pieces and…
2 replies · 0 reposts · 9 likes
Ross Wightman (@wightmanr) · 8 days
My NaFlexVit implementation is getting more flexy. In the final verification stages of ROPE support, which means all timm ViT models based on the EVA model lineage (EVA, EVA02, Meta PE, Naver ROPE-ViT) can be loaded into NaFlexVit w/ support for native aspect, dynamic & variable…
6 replies · 6 reposts · 61 likes
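If the loading path mirrors the use_naflex flag from the release example further down (an assumption on my part; the actual kwargs may differ), pulling an EVA-lineage checkpoint into NaFlexVit might look roughly like this:

    import timm
    import torch

    # Hypothetical sketch: load an EVA-lineage (ROPE) checkpoint into NaFlexVit.
    # Whether the use_naflex kwarg applies to eva-module models in exactly this
    # form is an assumption; the model tag is a real EVA02 entry in timm.
    model = timm.create_model(
        'eva02_base_patch14_224.mim_in22k', pretrained=True, use_naflex=True)
    model.eval()

    # Non-square input to exercise the native-aspect support described above.
    x = torch.randn(1, 3, 224, 336)
    with torch.no_grad():
        feats = model.forward_features(x)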
Ross Wightman (@wightmanr) · 14 days
RT @ShivamDuggal4: Compression is the heart of intelligence. From Occam to Kolmogorov: shorter programs = smarter representations. Meet KARL: K…
0 replies · 62 reposts · 0 likes
Ross Wightman (@wightmanr) · 17 days
RT @Thom_Wolf: Thrilled to finally share what we've been working on for months at @huggingface 🤝 @pollenrobotics. Our first robot: Reachy Mi…
0 replies · 514 reposts · 0 likes
Ross Wightman (@wightmanr) · 24 days
The first vision transformer models in timm with ROPE (rotary position embeddings) were the EVA models by the baaivision team. Since then I've expanded the flexibility of the base model, and all subsequent ROPE models in timm call the eva module home. This includes EVA02, my own…
2 replies · 4 reposts · 71 likes
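For readers new to the technique: rotary embeddings encode position by rotating consecutive query/key feature pairs through position-dependent angles, rather than adding a learned position vector. A minimal 1-D sketch of the core rotation; the EVA/timm versions extend this to 2-D patch grids, among other details:

    import torch

    def rope_rotate(x: torch.Tensor, pos: torch.Tensor, theta: float = 10000.0):
        # Apply 1-D rotary position embedding (interleaved-pair variant).
        # x: (..., seq_len, dim) with dim even; pos: (seq_len,) positions.
        d = x.shape[-1]
        freqs = 1.0 / (theta ** (torch.arange(0, d, 2, dtype=torch.float32) / d))
        angles = pos[:, None].float() * freqs[None, :]   # (seq_len, d/2)
        cos, sin = angles.cos(), angles.sin()
        x1, x2 = x[..., 0::2], x[..., 1::2]              # consecutive pairs
        out = torch.empty_like(x)
        out[..., 0::2] = x1 * cos - x2 * sin             # rotate each pair
        out[..., 1::2] = x1 * sin + x2 * cos
        return out

    q = torch.randn(1, 8, 196, 64)                       # (batch, heads, tokens, dim)
    q_rot = rope_rotate(q, torch.arange(196))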
Ross Wightman (@wightmanr) · 30 days
RT @aaron_defazio: AdamC and corrected weight decay for other optimizers is now implemented in timm! Try it out if you want better behaved…
arxiv.org
During long-duration Large Language Model (LLM) training runs the gradient norm increases rapidly near the end of training. In this short note, we show that this increase is due to an unintended…
0 replies · 4 reposts · 0 likes
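For context on what's being corrected: the linked note traces the late-training gradient growth to weight decay interacting with the lr schedule. A rough sketch of the idea as I read it; the exact formulation belongs to the paper and the timm implementation, and the names here are mine:

    # Sketch of the corrected-weight-decay idea (my paraphrase, not timm's API).
    # AdamW's steady-state weight norm scales roughly like sqrt(lr / wd), so
    # decaying lr alone shrinks weights and inflates gradient norms. Scaling wd
    # with the schedule keeps the lr/wd ratio, and the equilibrium norm, stable.
    def corrected_wd(base_wd: float, lr: float, max_lr: float) -> float:
        return base_wd * (lr / max_lr)

    # Decoupled (AdamW-style) update with the correction applied:
    #   param -= lr * adam_update + lr * corrected_wd(base_wd, lr, max_lr) * param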
Ross Wightman (@wightmanr) · 30 days
Phew, it's been a while. timm 1.0.16 released today to provide the image encoder for Gemma 3n. Additions kept on stacking, so I haven't had a chance to finalize a release since the last one to support SigLIP-2 backbones. Lots of stuff in there:
* Gemma 3n encoder (via a…
4 replies · 12 reposts · 88 likes
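If you want to poke at the new additions, timm's model registry makes releases discoverable from Python. The wildcard patterns below are guesses at the naming, hence listing before creating anything:

    import timm

    # Naming patterns are guesses; adjust the wildcards if they come up empty.
    gemma_models = timm.list_models('*gemma*')
    siglip2_models = timm.list_models('*siglip2*')
    print(gemma_models, siglip2_models)

    # Load one of the discovered encoders (first match, for illustration).
    if gemma_models:
        model = timm.create_model(gemma_models[0], pretrained=True)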
Ross Wightman (@wightmanr) · 1 month
RT @PyTorch: torchft + TorchTitan: 1200+ failures, no checkpoints, model convergence. A Llama 3 model was trained across 300 L40S GPUs wit…
0 replies · 48 reposts · 0 likes
Ross Wightman (@wightmanr) · 1 month
And flipping it back to o3, I get a response consistently in a few seconds (instead of 10+ minutes), with output that is usually sane. Hmm.
1 reply · 0 reposts · 2 likes
Ross Wightman (@wightmanr) · 1 month
Has anyone had much luck with o3 pro? The few times I've tried to use it the last couple of days, I usually get a whole lot of time in the 'reasoning' state, and then either an 'error in stream' or 'reasoning finished' with no output whatsoever. If it does actually work, the few code…
4 replies · 0 reposts · 6 likes
Ross Wightman (@wightmanr) · 1 month
RT @ysu_nlp: 📈 Scaling may be hitting a wall in the digital world, but it's only beginning in the biological world! We trained a foundatio…
0 replies · 56 reposts · 0 likes
Ross Wightman (@wightmanr) · 2 months
RT @lschmidt3: Very excited to finally release our paper for OpenThoughts! After DataComp and DCLM, this is the third large open dataset m…
0 replies · 212 reposts · 0 likes
Ross Wightman (@wightmanr) · 2 months
RT @aaron_defazio: Why do gradients increase near the end of training? Read the paper to find out! We also propose a simple fix to AdamW t…
0 replies · 76 reposts · 0 likes
Ross Wightman (@wightmanr) · 2 months
Oh yeah, and a detail for @giffmana: a while back I asked if you scaled the batch size with seq-len. That's the default here, so batch size changes with each seq-len selected to keep utilization high. It works well; there's also loss scaling enabled to scale loss w/ the…
3 replies · 1 repost · 14 likes
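The batch-scaling detail is simple to state in code. A sketch of the scheme as described, with names of my own choosing: hold tokens-per-batch roughly constant by shrinking the batch as the sampled sequence length grows, and rescale the loss to match.

    # Sketch of seq-len-scaled batching (variable names are mine, not timm's).
    # Longer sequences get proportionally smaller batches so that
    # tokens-per-batch, and thus utilization, stays roughly constant.
    def batch_size_for(seq_len: int, base_batch: int, base_seq_len: int) -> int:
        return max(1, round(base_batch * base_seq_len / seq_len))

    for seq_len in (64, 144, 256, 576):
        bs = batch_size_for(seq_len, base_batch=256, base_seq_len=256)
        loss_scale = bs / 256   # keep gradient magnitude comparable across batches
        print(seq_len, bs, loss_scale)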
Ross Wightman (@wightmanr) · 2 months
Want to try it out? Check out the main branch of timm now. Will add some info to the README later today. One example:
python train.py /data/f/imagenet --amp -j 8 --model vit_base_patch16_224 --model-kwargs use_naflex=True --naflex-loader --naflex-max-seq-len 256
1 reply · 2 reposts · 13 likes
Ross Wightman (@wightmanr) · 2 months
timm's got a new vision transformer (NaFlexVit), and it's flexible! I've been plugging away at this for a bit, integrating ideas from FlexiViT, NaViT, and NaFlex, and finally ready to merge for initial exploration. The model supports:
* variable aspect/size images of NaFlex (see…
[image]
5 replies · 38 reposts · 229 likes
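A quick sketch of trying the new model from Python, mirroring the flags in the CLI example above. The use_naflex kwarg comes from that example; the pretrained tag choice and the assumption that plain fixed-size tensors are accepted in a fallback path are mine:

    import timm
    import torch

    # Mirrors the CLI example above (--model-kwargs use_naflex=True).
    model = timm.create_model(
        'vit_base_patch16_224', pretrained=True, use_naflex=True)
    model.eval()

    x = torch.randn(1, 3, 224, 224)   # standard input, assuming a fallback path
    with torch.no_grad():
        out = model(x)
    print(out.shape)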