
Ross Wightman
@wightmanr
Followers 22K · Following 4K · Media 132 · Statuses 5K
Computer Vision @ 🤗. Ex-head of Software & Firmware Engineering at a Canadian 🦄. Currently building ML/AI systems, or investing in startups that do it better.
Vancouver, BC
Joined April 2012
RT @ntenenz: The Dayhoff Atlas! Open code. Open weights. Open datasets. Thanks @huggingface for helping to facilitate open science. ht…
huggingface.co
A joint OpenCLIP (3.0.0) and timm (1.0.18) release day today. It's been a quarter since the last OC release, so what's new? PE (Perception Encoder) Core support was the headline feature. Using the timm vision encoder for the PE models, I adapted the weights from @AIatMeta so they…
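A rough sketch of pulling one of the new PE Core towers through timm. The `*pe_core*` wildcard is an assumption about the naming scheme, so the released names are discovered via `timm.list_models` rather than hardcoded:

```python
import timm
import torch

# Discover the PE Core variants actually shipped in this timm version,
# rather than guessing a model name ('*pe_core*' is an assumed pattern).
names = timm.list_models('*pe_core*', pretrained=True)
assert names, "no PE Core weights found in this timm version"

# num_classes=0 drops the classifier head and returns pooled features.
model = timm.create_model(names[0], pretrained=True, num_classes=0)
model.eval()

# Build a dummy input at the model's native resolution.
_, h, w = model.default_cfg['input_size']
x = torch.randn(1, 3, h, w)
with torch.no_grad():
    feats = model(x)  # pooled image embedding from the PE vision tower
print(names[0], feats.shape)
```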
RT @ShivamDuggal4: Compression is the heart of intelligence. From Occam to Kolmogorov, shorter programs = smarter representations. Meet KARL: K…
RT @Thom_Wolf: Thrilled to finally share what we've been working on for months at @huggingface 🤝 @pollenrobotics. Our first robot: Reachy Mi…
RT @aaron_defazio: AdamC and corrected weight decay for other optimizers are now implemented in timm! Try it out if you want better-behaved…
arxiv.org
During long-duration Large Language Model (LLM) training runs, the gradient norm increases rapidly near the end of training. In this short note, we show that this increase is due to an unintended...
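For reference, a sketch of the corrected decay as I read the note: standard AdamW decays parameters by lr * wd each step, and the proposed fix scales that term by lr / max_lr so the decay follows the square of the schedule. This is a paraphrase of the paper's idea, not timm's actual code; check timm's optim docs for how the release exposes it.

```python
import torch

def adamw_step_corrected(param, exp_avg, exp_avg_sq, grad, step,
                         lr, max_lr, betas=(0.9, 0.999),
                         eps=1e-8, weight_decay=0.1):
    """One AdamW step with the corrected ("AdamC"-style) weight decay.

    Sketch of my reading of the paper: the decoupled decay term becomes
    lr * (lr / max_lr) * wd instead of AdamW's lr * wd, so it decays
    with the schedule rather than staying proportional to lr alone.
    """
    beta1, beta2 = betas
    # standard Adam moment updates
    exp_avg.mul_(beta1).add_(grad, alpha=1 - beta1)
    exp_avg_sq.mul_(beta2).addcmul_(grad, grad, value=1 - beta2)

    bias1 = 1 - beta1 ** step
    bias2 = 1 - beta2 ** step
    denom = (exp_avg_sq / bias2).sqrt_().add_(eps)

    # corrected decoupled weight decay
    param.mul_(1 - lr * (lr / max_lr) * weight_decay)
    param.addcdiv_(exp_avg / bias1, denom, value=-lr)
    return param

# toy usage
p = torch.randn(10)
m, v = torch.zeros_like(p), torch.zeros_like(p)
adamw_step_corrected(p, m, v, torch.randn(10), step=1, lr=1e-3, max_lr=1e-3)
```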
RT @PyTorch: torchft + TorchTitan: 1200+ failures, no checkpoints, model convergence. A Llama 3 model was trained across 300 L40S GPUs wit…
RT @ysu_nlp: 📈 Scaling may be hitting a wall in the digital world, but it's only beginning in the biological world! We trained a foundatio…
RT @lschmidt3: Very excited to finally release our paper for OpenThoughts! After DataComp and DCLM, this is the third large open dataset m…
RT @aaron_defazio: Why do gradients increase near the end of training? Read the paper to find out! We also propose a simple fix to AdamW t…
Oh yeah, and a detail for @giffmana: a while back I asked if you scaled the batch size with seq-len. That's the default here, so batch size changes with each seq-len selected to keep utilization high. It works well; there's also loss scaling enabled to scale loss w/ the…
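A toy sketch of the scheme described above: sample a seq-len, then pick the batch size so tokens-per-step stays roughly constant. The token budget and length set here are illustrative, not values from the post, and the loss-scaling rule is cut off in the original, so it isn't reproduced.

```python
# Illustrative token budget per optimizer step (assumed, not from the post).
TOKEN_BUDGET = 524_288
SEQ_LENS = [256, 512, 1024, 2048]

def batch_size_for(seq_len: int, token_budget: int = TOKEN_BUDGET) -> int:
    # Shorter sequences get proportionally larger batches,
    # keeping accelerator utilization high.
    return max(1, token_budget // seq_len)

for sl in SEQ_LENS:
    print(f"seq_len={sl:4d} -> batch_size={batch_size_for(sl)}")
```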