Genghan Zhang Profile
Genghan Zhang

@zhang677

Followers: 98 · Following: 28 · Media: 2 · Statuses: 24

Joined September 2023
@zhang677
Genghan Zhang
9 months
🔍 ML library development is crucial but requires expertise in ML algorithms & architecture-specific programming languages (ASPLs). 🤖 LLM agents can enable better automation. We propose an adaptive self-improvement agentic system for generating ML libraries in STeP—a
2
6
26
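The tweet above only sketches the approach. As a rough, hypothetical illustration of what an adaptive self-improvement loop for LLM code generation can look like (this is not the STeP system itself; generate_candidate, run_tests, and the experience-bank structure are made-up stand-ins):

    # Hypothetical sketch of an adaptive self-improvement loop for LLM-based
    # code generation (illustrative only; not the actual system from the paper).
    import random

    def generate_candidate(task, examples):
        """Stand-in for an LLM call that drafts code for `task`,
        conditioned on previously successful `examples`."""
        return f"# solution for {task} informed by {len(examples)} examples"

    def run_tests(task, code):
        """Stand-in for compiling/running the candidate and checking correctness."""
        return random.random() < 0.3  # pretend ~30% of drafts pass

    def self_improve(tasks, rounds=3):
        experience_bank = []      # verified (task, code) pairs reused as demonstrations
        solved = {}
        for _ in range(rounds):
            for task in tasks:
                if task in solved:
                    continue
                demos = random.sample(experience_bank, min(4, len(experience_bank)))
                code = generate_candidate(task, demos)
                if run_tests(task, code):
                    solved[task] = code
                    experience_bank.append((task, code))  # successes feed later rounds
        return solved

    print(self_improve(["conv1d", "spmv", "softmax"]))

The key structural point is that verified solutions are fed back as demonstrations, so later rounds condition on the system's own past successes.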
@anjiangw
Anjiang Wei @ EMNLP25
5 months
We introduce CodeARC, a new benchmark for evaluating LLMs’ inductive reasoning. Agents must synthesize functions from I/O examples—no natural language, just reasoning. 📄 https://t.co/j5fFbgLjQJ 💻 https://t.co/6B7Ig1M1pG 🌐 https://t.co/AQIaeVCmT7 #LLM #Reasoning #LLM4Code #ARC
3
28
95
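As a toy illustration of the inductive-synthesis setup the tweet describes (the examples and candidate below are invented, not taken from CodeARC):

    # Toy illustration of program synthesis from I/O examples alone
    # (made-up examples; not drawn from the CodeARC benchmark itself).

    io_examples = [((2, 3), 6), ((4, 5), 20), ((0, 7), 0)]  # hidden target: multiplication

    def candidate(x, y):
        """A program an agent might propose after inspecting the examples."""
        return x * y

    def consistent(fn, examples):
        """The only feedback available: does the candidate reproduce every example?"""
        return all(fn(*inp) == out for inp, out in examples)

    print(consistent(candidate, io_examples))  # True -> candidate matches the observations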
@simonguozirui
Simon Guo
9 months
LLMs for GPU kernel🌽generation have been getting Pop🍿ular since our preview last Dec; excited to announce 📢 our full paper 📃 for KernelBench! Turns out KernelBench is quite challenging 🧠 — frontier models outperform the PyTorch Eager baseline <20% of the time. More 🧵👇
9
68
310
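Roughly speaking, a KernelBench-style evaluation checks a model-written kernel against the PyTorch eager reference for both numerical correctness and speed. A simplified sketch of such a check, with placeholder operators rather than actual benchmark tasks or the real harness:

    # Simplified sketch of a correctness + speedup check in the spirit of KernelBench
    # (placeholder operators; not an actual benchmark task or its harness).
    import time
    import torch

    def reference(x):                 # PyTorch eager baseline
        return torch.relu(x) + 1.0

    def generated(x):                 # stand-in for a model-written "optimized" kernel
        return torch.clamp(x, min=0.0) + 1.0

    x = torch.randn(4096, 4096)

    # 1) Numerical correctness against the eager reference
    assert torch.allclose(reference(x), generated(x), atol=1e-6)

    # 2) Wall-clock comparison (CPU here for simplicity; real harnesses time CUDA events)
    def bench(fn, n=20):
        t0 = time.perf_counter()
        for _ in range(n):
            fn(x)
        return (time.perf_counter() - t0) / n

    print("speedup:", bench(reference) / bench(generated))

Real timing would use warm-up runs and CUDA event timers; the point here is only the shape of the correctness-plus-speedup comparison.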
@zhang677
Genghan Zhang
9 months
As stated in their paper: "A growing subset of work in this field has also begun to target kernel writing (KernelBench( https://t.co/d4K7bJHlmu); Adaptive Self-improvement LLM Agentic System for ML Library Development( https://t.co/ue2bewOthj)) in Architecture Specific Programming
0
0
2
@zhang677
Genghan Zhang
9 months
Excited to see new automation technologies for ML library development using Architecture Specific Programming Language (ASPL). We have recently explored an adaptive self-improvement method for this task
arxiv.org
ML libraries, often written in architecture-specific programming languages (ASPLs) that target domain-specific architectures, are key to efficient ML systems. However, writing these...
@SakanaAILabs
Sakana AI
1 year
Introducing The AI Scientist: The world’s first AI system for automating scientific research and open-ended discovery! https://t.co/8wVqIXVpZJ From ideation, writing code, running experiments and summarizing results, to writing entire papers and conducting peer-review, The AI
1
0
2
@anneouyang
Anne Ouyang
9 months
New blog post from Nvidia: LLM-generated GPU kernels showing speedups over FlexAttention and achieving 100% numerical correctness on 🌽KernelBench Level 1
44
233
2K
@anneouyang
Anne Ouyang
11 months
Kernels are the kernel of deep learning. 🙃...but writing kernels sucks. Can LLMs help? 🤔 Introducing 🌽 KernelBench (Preview), a new coding benchmark designed to evaluate the ability of LLMs to generate ⚡️efficient💨 GPU kernels for optimizing neural network performance.
20
98
619
@liang_weixin
Weixin Liang
9 months
🚀 Want 2x faster pretraining for your multi-modal LLM? 🧵 Following up on Mixture-of-Transformers (MoT), we're excited to share Mixture-of-Mamba (MoM)! https://t.co/OTTpAlB4Vq 🔥 Why it matters: MoM applies modality-aware sparsity across image, text, and speech—making
0
1
19
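Here "modality-aware sparsity" refers, roughly, to giving each modality its own copy of certain parameters while keeping the rest of the computation shared. The sketch below shows only that general decoupling pattern; it is not the Mixture-of-Mamba architecture.

    # Minimal sketch of modality-aware parameter decoupling: tokens from each
    # modality are projected with modality-specific weights, while downstream
    # computation stays shared. Illustrative pattern only, not the MoM design.
    import numpy as np

    d_model = 8
    rng = np.random.default_rng(0)
    proj = {m: rng.standard_normal((d_model, d_model)) for m in ("text", "image", "speech")}

    def modality_aware_projection(tokens, modalities):
        """tokens: (seq, d_model); modalities: per-token modality labels."""
        out = np.empty_like(tokens)
        for m in set(modalities):
            idx = [i for i, lab in enumerate(modalities) if lab == m]
            out[idx] = tokens[idx] @ proj[m]     # each modality uses its own weights
        return out

    tokens = rng.standard_normal((5, d_model))
    labels = ["text", "image", "text", "speech", "image"]
    print(modality_aware_projection(tokens, labels).shape)  # (5, 8)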
@liang_weixin
Weixin Liang
9 months
📢 Can LLMs program themselves to run faster? 🏃⏱️ LLM self-taught to code for next-gen AI hardware! https://t.co/wiwgiPEpeH 1/ Programming AI accelerators is a major bottleneck in ML. Our self-improving LLM agent learns to write optimized code for new hardware, achieving 3.9x
2
6
38
@allenainie
Allen Nie (🇺🇦☮️)
9 months
Wow. Nice timing. @anjiangw and I just released a new version of our paper https://t.co/ybD8MBarMw. LLM Agents show surprising exploration/sample efficiency (almost 100x faster than UCB bandit) in optimizing system code. A good domain for coding agents🤔😁
@ycombinator
Y Combinator
9 months
AI Coding Agent for Hardware Optimized Code @sdianahu AI hardware is still constrained by software. However, with reasoning models like Deepseek R1 or OpenAI o1 and o3, AI could generate hardware-optimized code that rivals—or surpasses—human CUDA code.
3
28
128
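For background on the UCB bandit baseline mentioned above, a standard UCB1 loop looks roughly like this (generic multi-armed bandit code, unrelated to the paper's actual experimental setup):

    # Generic UCB1 multi-armed bandit, shown only as background for the baseline
    # referenced above (not the paper's experiments).
    import math
    import random

    def ucb1(reward_fns, horizon=1000):
        n_arms = len(reward_fns)
        counts = [0] * n_arms
        sums = [0.0] * n_arms
        for t in range(1, horizon + 1):
            if t <= n_arms:                       # pull each arm once to initialize
                arm = t - 1
            else:
                arm = max(range(n_arms),
                          key=lambda a: sums[a] / counts[a]
                          + math.sqrt(2 * math.log(t) / counts[a]))
            r = reward_fns[arm]()
            counts[arm] += 1
            sums[arm] += r
        return counts                             # pull counts per arm

    arms = [lambda p=p: float(random.random() < p) for p in (0.2, 0.5, 0.8)]
    print(ucb1(arms))                             # most pulls should go to the 0.8 arm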
@zhang677
Genghan Zhang
9 months
Excited to see the “self-improvement” idea also works for theorem proving, another application that requires complex reasoning with limited data
@tengyuma
Tengyu Ma
9 months
and SoTA among whole-proof generation methods on miniF2F, ProofNet, and PutnamBench, and double the previous best results on LeanWorkBook. (reposting because it seems that this table has much more views 😝)
0
0
1
@zhang677
Genghan Zhang
9 months
To learn more about STeP:
0
0
0
@zhang677
Genghan Zhang
9 months
Thanks to all the collaborators: Weixin Liang @liang_weixin, Olivia Hsu ( https://t.co/RG3r5yCSdm), and Kunle Olukotun @KunleOlukotun
0
0
0
@Jerry_XU_Jiarui
Jiarui Xu
1 year
TTT could model long sequences with linear time complexity. It's a drop-in upgrade for any sequence modeling operators like self-attention. It has been super fun to work on TTT with the amazing team! Code is available:
github.com
Official JAX implementation of Learning to (Learn at Test Time): RNNs with Expressive Hidden States - test-time-training/ttt-lm-jax
@karansdalal
Karan Dalal
1 year
I’m excited to share a project I’ve been working on for over a year, which I believe will fundamentally change our approach to language models. We’ve designed a new architecture, which replaces the hidden state of an RNN with a machine learning model. This model compresses
1
13
67
@xiaolonw
Xiaolong Wang
1 year
Cannot believe this finally happened! Over the last 1.5 years, we have been developing a new LLM architecture, with linear complexity and expressive hidden states, for long-context modeling. The following plots show our model trained from Books scale better (from 125M to 1.3B)
20
261
2K
@karansdalal
Karan Dalal
1 year
I’m excited to share a project I’ve been working on for over a year, which I believe will fundamentally change our approach to language models. We’ve designed a new architecture, which replaces the hidden state of an RNN with a machine learning model. This model compresses
50
284
2K
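Conceptually, a test-time-training (TTT) layer replaces an RNN's fixed-size hidden state with the weights of a small inner model, updated by a gradient step on each incoming token, which is why the scan stays linear in sequence length. The sketch below shows the simplest linear version of that idea; it is illustrative only and not the released JAX implementation.

    # Conceptual sketch of a TTT-style recurrence: the "hidden state" is the weight
    # matrix W of a small inner model, updated by one gradient step per token on a
    # self-supervised reconstruction loss. Illustrative only; see the official JAX
    # repo for the real architecture.
    import numpy as np

    def ttt_layer(tokens, lr=0.1):
        d = tokens.shape[1]
        W = np.zeros((d, d))                 # hidden state = inner model's weights
        outputs = []
        for x in tokens:                     # linear-time scan over the sequence
            pred = W @ x                     # inner model's reconstruction of x
            grad = np.outer(pred - x, x)     # grad of 0.5 * ||W x - x||^2 w.r.t. W
            W = W - lr * grad                # "train at test time" on this token
            outputs.append(W @ x)            # output produced with the updated state
        return np.stack(outputs)

    rng = np.random.default_rng(0)
    seq = rng.standard_normal((16, 4))
    print(ttt_layer(seq).shape)              # (16, 4)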
@zhang677
Genghan Zhang
1 year
Thanks to Olivia Hsu ( https://t.co/RG3r5yCSdm) and Fredrik Kjolstad ( https://t.co/iPVwC4ykH3) for their help and advice!
fredrikbk.com
0
0
1
@zhang677
Genghan Zhang
1 year
To generate code that assembles any output sparse tensors, you need Sparse Workspace ( https://t.co/aBSUoa8E2v). Our work extends the generality of sparse tensor algebra compilers by treating sparse tensors as a first-class concept for any tensor variable, including temporaries.
1
0
5
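The workspace idea can be illustrated with the textbook way of computing one row of a sparse-sparse matrix product: partial results are scattered into a temporary accumulator and compressed back to sparse form once the row is finished. The sketch below (a hash-map accumulator; dense arrays are also common) shows that general pattern, not the compiler transformation from the paper.

    # Textbook illustration of a workspace for sparse output assembly: one row of
    # C = A @ B with A, B stored as dict-of-rows. Partial sums are scattered into a
    # temporary (the workspace) and compressed back to sparse form at the end.

    def spgemm_row(a_row, B_rows):
        """a_row: {col: val} for one sparse row of A; B_rows: {row: {col: val}}."""
        workspace = {}                                   # temporary accumulator
        for k, a_val in a_row.items():
            for j, b_val in B_rows.get(k, {}).items():
                workspace[j] = workspace.get(j, 0.0) + a_val * b_val
        return {j: v for j, v in sorted(workspace.items()) if v != 0.0}

    A_row0 = {0: 2.0, 2: 1.0}
    B = {0: {1: 3.0}, 2: {1: -6.0, 3: 4.0}}
    print(spgemm_row(A_row0, B))                         # {3: 4.0}; the j=1 terms cancel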
@mo_tiwari
Mo Tiwari
2 years
Thrilled that a paper from my PhD, "Faster Maximum Inner Product Search in High Dimensions" has been accepted to ICML 2024! In the paper, we accelerate state-of-the-art for the Maximum Inner Product Search (MIPS) problem. MIPS is a core subroutine in systems like recommendation
5
14
80
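For reference, exact MIPS asks which database vectors have the largest inner product with a query; the brute-force baseline that faster algorithms are compared against fits in a few lines (a generic definition, not the paper's algorithm):

    # Brute-force exact Maximum Inner Product Search (MIPS), shown only to define
    # the problem; the accepted paper is about accelerating this, not this baseline.
    import numpy as np

    def mips_bruteforce(database, query, k=1):
        """Return indices of the k database vectors with the largest <v, query>."""
        scores = database @ query                  # inner product with every vector
        return np.argsort(-scores)[:k]

    rng = np.random.default_rng(0)
    db = rng.standard_normal((1000, 64))           # e.g., item embeddings in a recommender
    q = rng.standard_normal(64)                    # e.g., a user embedding
    print(mips_bruteforce(db, q, k=5))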