lowvram
@lowvram
Followers
689
Following
6K
Media
101
Statuses
1K
too dumb for research, too smart for langchain. senior dev @ nameless megacorp, I wrap llms for a living
Joined April 2024
If you write an mcp server and plan to use it over stdio… do not write your logs to stdout, since that’s the JSON-RPC channel (stderr or a file is fine). I know, no one could be stupid enough to make that mistake… certainly not me… (quick sketch below)
0
0
1
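Concretely: in a stdio transport, stdout carries the JSON-RPC messages, so any print-style logging corrupts the stream. A minimal sketch of keeping logs out of the way, using plain Python logging and no particular MCP SDK (the server name and message shape here are just placeholders):

```python
import json
import logging
import sys

# Route logs to stderr (or a file); stdout is reserved for protocol messages.
logging.basicConfig(
    stream=sys.stderr,   # never sys.stdout for a stdio server
    level=logging.INFO,
    format="%(asctime)s %(levelname)s %(message)s",
)
log = logging.getLogger("my-mcp-server")  # hypothetical name

def send_message(msg: dict) -> None:
    """Write one JSON-RPC message to stdout; nothing else ever goes there."""
    sys.stdout.write(json.dumps(msg) + "\n")
    sys.stdout.flush()

log.info("server starting")                              # safe: stderr
send_message({"jsonrpc": "2.0", "id": 1, "result": {}})  # protocol: stdout
```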
What are the best CLI tools to hand to coding agents? ripgrep, ast-grep, sed, etc. What else is there?
0
0
0
I have reached the logical conclusion of the fragrance hyperfixation (triggered by that HackerNews post about luckyscent) - buying highly concentrated fragrance in bulk from the manufacturer
0
0
0
I’m far more excited about inference hardware improvements (and cost) than training/model architecture improvements in the near future
0
0
0
Seattle rain storms as an apartment dweller: “ah how nice, let’s enjoy the patter of rain and chill” As a new homeowner: “oh shit oh fuck the gutters oh shit check the attic is the insulation soaked through ah shit”
1
0
13
are we just rediscovering Guidance’s “token healing” now??
it’s tokenization again! 🤯 did you know tokenize(detokenize(token_ids)) ≠ token_ids? RL researchers from Agent Lightning coined the term Retokenization Drift — a subtle mismatch between what your model generated and what your trainer thinks it generated. why? because most
0
0
2
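A quick way to reproduce the drift the quoted post describes, assuming the Hugging Face transformers package and the gpt2 tokenizer (any BPE tokenizer shows the same effect): splice two separately encoded fragments, decode, re-encode, and compare.

```python
from transformers import AutoTokenizer  # assumes `transformers` is installed

tok = AutoTokenizer.from_pretrained("gpt2")

# Splice two separately-encoded fragments, the way a trainer replaying a
# generation trace from text (rather than from the original token ids) might.
ids = tok.encode("hel") + tok.encode("lo world")

text = tok.decode(ids)        # "hello world"
reencoded = tok.encode(text)  # canonical tokenization of the same string

print(ids)               # a non-canonical split of "hello world"
print(reencoded)         # the canonical split, with different ids
print(ids == reencoded)  # False: tokenize(detokenize(ids)) != ids
```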
Anecdotally, the 30BA3-VL-Instruct model is performing much better in my scenarios (agentic with many tools) than the non-VL version, in terms of correct tool selection and multi-turn execution. I just have to disable video and image in vLLM to make it fit on 2x 3090 :3 (config sketch below)
Qwen did it again! The new VL models (2B, 32B) absolutely CRUSH the old versions from April. If you were using Qwen3 32B, it might be time to upgrade 👀
1
0
5
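For anyone curious what “disable video and image in vLLM” looks like in practice, here is a rough sketch. The model id is my guess at the checkpoint mentioned above, and limit_mm_per_prompt behavior can vary by vLLM version, so treat every name here as an assumption rather than a recipe.

```python
from vllm import LLM, SamplingParams  # assumes a vLLM build with multimodal support

llm = LLM(
    model="Qwen/Qwen3-VL-30B-A3B-Instruct",       # hypothetical id; substitute yours
    tensor_parallel_size=2,                        # split across the two 3090s
    limit_mm_per_prompt={"image": 0, "video": 0},  # text-only: disallow image/video inputs
)

out = llm.generate(
    ["Which tool would you call to rename a file, and with what arguments?"],
    SamplingParams(max_tokens=128),
)
print(out[0].outputs[0].text)
```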
alias grep="echo how many times must I tell you, gpt-5, USE RIPGREP (rg)"
0
0
1
I guess you could argue it’s also assuming the reader is both human *and* able-bodied (e.g. has two hands) so it was already narrowly focused on a subset of humans (though a large subset, to be sure)
0
0
0
Some Wikipedia pages make the assumption that the reader is human, which is no longer guaranteed. E.g. the page for the right-hand rule says: “This can be seen by holding **your** hands together…” (emphasis mine)
1
0
0
wait, THAT’S what it’s about?? Why didn’t anyone tell me, I would have read it sooner
@teortaxesTex ^ guy who works at a company named after a novel that has incredibly graphic zombie sex scenes ^
0
0
1
My microwave outperforms gpt-4o, Gemini 2.5, and Claude 4 Sonnet at warming butter. Testing on Claude 4.5 and gpt-5-codex still underway
0
0
1
If I had the means (and patience) to finetune any models right now, I’d do it with the Nemotron recipes and datasets. I’m no expert, but just from perusing those datasets and the nano-v2 training formula, they’re very well thought out.
0
0
0
Trigger a millennial NLP guy who had to learn LSTM instead, in one image
0
0
3
my favorite videos on Sora are by a late-middle-aged guy with no followers whose entire profile is him inserted into wish-fulfillment scenes, like playing guitar, cooking a steak, drinking wine on the beach. Strangely wholesome
0
0
1
Anyone remember “token healing” from the guidance project? iirc it existed solely to solve the problem where spliced-together prompts, or grammar-guided outputs, result in alternate tokenizations that are out of distribution from training
Can an LM that has only ever seen the word “cat” tokenized as ␣cat understand the token sequence [␣, c, a, t]? In our NeurIPS spotlight ⭐, we show that the answer is surprisingly YES, and in fact, you can even modify the tokenization at inference time for performance gains!🧵
1
0
3
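The prompt-boundary problem in one snippet, assuming the gpt2 tokenizer from transformers (the behavior is typical of BPE tokenizers generally): a prompt that ends mid-chunk leaves the model conditioned on a token split it rarely saw in training, and token healing backs the prompt up so the model can re-emit the familiar split itself.

```python
from transformers import AutoTokenizer  # assumes `transformers` is installed

tok = AutoTokenizer.from_pretrained("gpt2")

prompt = "The link is <a href='http:"
prompt_ids = tok.encode(prompt)
full_ids = tok.encode(prompt + "//example.com'>")

# Compare how the boundary region gets split in the bare prompt vs. in running text.
print([tok.decode([i]) for i in prompt_ids[-3:]])
print([tok.decode([i]) for i in full_ids[-6:]])

# Token healing (as in guidance): trim the trailing partial token(s) off the prompt
# and constrain the first generated tokens to complete that text, so generation
# continues from a tokenization the model actually saw during training.
```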
Great interview question for any ML hire today: why did AlphaGo have both a policy AND a value network? Isn’t a value network just a policy network with a different output shape? (secretly I don’t know the answer and I just want someone to tell me)
0
0
1