
Justin T Chiu
@justintchiu
Followers: 645
Following: 5K
Media: 10
Statuses: 1K
generating code at Cohere; phd in ml from Cornell; former Child
Joined November 2011
Are code agents good at software design, i.e., building general and reusable code? We present Librarian, a new refactoring method, and MiniCode, a verifiable refactoring benchmark that requires agents to design libraries that jointly minimize code from multiple repos 🧵
4
22
149
RT @jxmnop: for the first time i am aware of, there is an entirely private subfield of AI research. every company that actually trains mode…
0
47
0
RT @wzhao_nlp: I've always been skeptical about PRMs, but being able to apply RL+reasoning changes the entire story for me. It was a fun ri…
0
18
0
RT @cgarciae88: everyone please drop what you are doing and leave a heart on JAX scaling book at the bottom of the page:
0
12
0
RT @MassCaccia: 🔥 We stress-tested today’s best AI code generators in […]. Introducing […]: 328 challenges for ve…
0
25
0
RT @stuart_sul: MoE layers can be really slow. When training our coding models @cursor_ai, they ate up 27–53% of training time. So we comp…
0
97
0
RT @_xjdr: This is the best version of this i have seen anywhere. Incredibly impressive work and everyone should read it carefully and more…
0
33
0
RT @cartesia_ai: Introducing Line by Cartesia: the modern voice agent development platform. Line was built to be code-first, because best-i…
0
54
0
RT @cHHillee: When it comes to hardware that's meant for training or inference, most think about in hardware specs like memory bandwidth ev…
0
8
0
RT @vllm_project: Have you ever felt you are developing cuda kernels and your tests often run into illegal memory access (IMA for short) an…
0
22
0
RT @ShashwatGoel7: Seems like OpenAI has been prioritising verification, hugely. We re-ran REFUTE, our code verification eval (COLM'25) o…
0
18
0
RT @stalkermustang: whoa, what a big W for OpenAI's models on the ReBench (SWE-bench but with very recent PRs, like, closed 3-8 weeks ago)…
0
1
0
RT @ying11231: The open source RL framework Slime(+SGLang) has been validated to train 300+B models with agentic, coding, and reasoning cap…
0
30
0
RT @_onionesque: Estimating a set’s size from uniform (or o/w well-defined) samples is a classical problem, with two well-studied extremes:…
arxiv.org
Let $S$ be a finite set, and $X_1,\ldots,X_n$ an i.i.d. uniform sample from $S$. To estimate the size $|S|$, without further structure, one can wait for repeats and use the birthday problem. This...
0
5
0
RT @ChangJonathanC: while we wait for gpt-5 to drop. Here is a flex attention tutorial for building a < 1000 LoC vllm from scratch. https://…
jonathanc.net
PyTorch FlexAttention tutorial: Building a minimal vLLM-style inference engine from scratch with paged attention
0
37
0
RT @leloykun: If you wanna read more about our paper on training transformers with enforced lipschitz bounds, please check out this awesome…
0
1
0
RT @allhands_ai: We evaluated GPT-5 in OpenHands and it's the new number one coding agent model for us! Using exactly the same tools and h…
0
31
0