Genghan Zhang Profile
Genghan Zhang

@zhang677

Followers: 98 · Following: 28 · Media: 2 · Statuses: 24

Joined September 2023
@zhang677
Genghan Zhang
9 months
🔍 ML library development is crucial but requires expertise in ML algorithms & architecture-specific programming languages (ASPLs). 🤖 LLM agents can enable better automation. We propose an adaptive self-improvement agentic system for generating ML libraries in STeP—a
2
6
26
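The tweet above only sketches the approach. As a rough, hypothetical illustration of what an adaptive self-improvement loop for LLM code generation can look like (this is not the STeP system itself; generate_candidate, run_tests, and the experience-bank structure are made-up stand-ins):

    # Hypothetical sketch of an adaptive self-improvement loop for LLM-based
    # code generation (illustrative only; not the actual system from the paper).
    import random

    def generate_candidate(task, examples):
        """Stand-in for an LLM call that drafts code for `task`,
        conditioned on previously successful `examples`."""
        return f"# solution for {task} informed by {len(examples)} examples"

    def run_tests(task, code):
        """Stand-in for compiling/running the candidate and checking correctness."""
        return random.random() < 0.3  # pretend ~30% of drafts pass

    def self_improve(tasks, rounds=3):
        experience_bank = []      # verified (task, code) pairs reused as demonstrations
        solved = {}
        for _ in range(rounds):
            for task in tasks:
                if task in solved:
                    continue
                demos = random.sample(experience_bank, min(4, len(experience_bank)))
                code = generate_candidate(task, demos)
                if run_tests(task, code):
                    solved[task] = code
                    experience_bank.append((task, code))  # successes feed later rounds
        return solved

    print(self_improve(["conv1d", "spmv", "softmax"]))

The key structural point is that verified solutions are fed back as demonstrations, so later rounds condition on the system's own past successes.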
@anjiangw
Anjiang Wei @ EMNLP25
5 months
We introduce CodeARC, a new benchmark for evaluating LLMs’ inductive reasoning. Agents must synthesize functions from I/O examples—no natural language, just reasoning. 📄 https://t.co/j5fFbgLjQJ 💻 https://t.co/6B7Ig1M1pG 🌐 https://t.co/AQIaeVCmT7 #LLM #Reasoning #LLM4Code #ARC
3
28
95
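As a toy illustration of the inductive-synthesis setup the tweet describes (the examples and candidate below are invented, not taken from CodeARC):

    # Toy illustration of program synthesis from I/O examples alone
    # (made-up examples; not drawn from the CodeARC benchmark itself).

    io_examples = [((2, 3), 6), ((4, 5), 20), ((0, 7), 0)]  # hidden target: multiplication

    def candidate(x, y):
        """A program an agent might propose after inspecting the examples."""
        return x * y

    def consistent(fn, examples):
        """The only feedback available: does the candidate reproduce every example?"""
        return all(fn(*inp) == out for inp, out in examples)

    print(consistent(candidate, io_examples))  # True -> candidate matches the observations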
@simonguozirui
Simon Guo
9 months
LLMs for GPU kernel🌽generation have been getting Pop🍿ular since our preview last Dec; excited to announce 📢 our full paper 📃 for KernelBench! Turns out KernelBench is quite challenging 🧠 — frontier models outperform the PyTorch Eager baseline <20% of the time. More 🧵👇
9
68
310
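Roughly speaking, a KernelBench-style evaluation checks a model-written kernel against the PyTorch eager reference for both numerical correctness and speed. A simplified sketch of such a check, with placeholder operators rather than actual benchmark tasks or the real harness:

    # Simplified sketch of a correctness + speedup check in the spirit of KernelBench
    # (placeholder operators; not an actual benchmark task or its harness).
    import time
    import torch

    def reference(x):                 # PyTorch eager baseline
        return torch.relu(x) + 1.0

    def generated(x):                 # stand-in for a model-written "optimized" kernel
        return torch.clamp(x, min=0.0) + 1.0

    x = torch.randn(4096, 4096)

    # 1) Numerical correctness against the eager reference
    assert torch.allclose(reference(x), generated(x), atol=1e-6)

    # 2) Wall-clock comparison (CPU here for simplicity; real harnesses time CUDA events)
    def bench(fn, n=20):
        t0 = time.perf_counter()
        for _ in range(n):
            fn(x)
        return (time.perf_counter() - t0) / n

    print("speedup:", bench(reference) / bench(generated))

Real timing would use warm-up runs and CUDA event timers; the point here is only the shape of the correctness-plus-speedup comparison.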
@zhang677
Genghan Zhang
9 months
As stated in their paper: "A growing subset of work in this field has also begun to target kernel writing (KernelBench( https://t.co/d4K7bJHlmu); Adaptive Self-improvement LLM Agentic System for ML Library Development( https://t.co/ue2bewOthj)) in Architecture Specific Programming
0
0
2
@zhang677
Genghan Zhang
9 months
Excited to see new automation technologies for ML library development using Architecture Specific Programming Language (ASPL). We have recently explored an adaptive self-improvement method for this task
arxiv.org
ML libraries, often written in architecture-specific programming languages (ASPLs) that target domain-specific architectures, are key to efficient ML systems. However, writing these...
@SakanaAILabs
Sakana AI
1 year
Introducing The AI Scientist: The world’s first AI system for automating scientific research and open-ended discovery! https://t.co/8wVqIXVpZJ From ideation, writing code, running experiments and summarizing results, to writing entire papers and conducting peer-review, The AI
1
0
2
@anneouyang
Anne Ouyang
9 months
New blog post from Nvidia: LLM-generated GPU kernels showing speedups over FlexAttention and achieving 100% numerical correctness on 🌽KernelBench Level 1
44
233
2K
@anneouyang
Anne Ouyang
11 months
Kernels are the kernel of deep learning. 🙃...but writing kernels sucks. Can LLMs help? 🤔 Introducing 🌽 KernelBench (Preview), a new coding benchmark designed to evaluate the ability of LLMs to generate ⚡️efficient💨 GPU kernels for optimizing neural network performance.
20
98
619
@liang_weixin
Weixin Liang
9 months
🚀 Want 2x faster pretraining for your multi-modal LLM? 🧵 Following up on Mixture-of-Transformers (MoT), we're excited to share Mixture-of-Mamba (MoM)! https://t.co/OTTpAlB4Vq 🔥 Why it matters: MoM applies modality-aware sparsity across image, text, and speech—making
0
1
19
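Here "modality-aware sparsity" refers, roughly, to giving each modality its own copy of certain parameters while keeping the rest of the computation shared. The sketch below shows only that general decoupling pattern; it is not the Mixture-of-Mamba architecture.

    # Minimal sketch of modality-aware parameter decoupling: tokens from each
    # modality are projected with modality-specific weights, while downstream
    # computation stays shared. Illustrative pattern only, not the MoM design.
    import numpy as np

    d_model = 8
    rng = np.random.default_rng(0)
    proj = {m: rng.standard_normal((d_model, d_model)) for m in ("text", "image", "speech")}

    def modality_aware_projection(tokens, modalities):
        """tokens: (seq, d_model); modalities: per-token modality labels."""
        out = np.empty_like(tokens)
        for m in set(modalities):
            idx = [i for i, lab in enumerate(modalities) if lab == m]
            out[idx] = tokens[idx] @ proj[m]     # each modality uses its own weights
        return out

    tokens = rng.standard_normal((5, d_model))
    labels = ["text", "image", "text", "speech", "image"]
    print(modality_aware_projection(tokens, labels).shape)  # (5, 8)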
@liang_weixin
Weixin Liang
9 months
📢 Can LLMs program themselves to run faster? 🏃⏱️ LLM self-taught to code for next-gen AI hardware! https://t.co/wiwgiPEpeH 1/ Programming AI accelerators is a major bottleneck in ML. Our self-improving LLM agent learns to write optimized code for new hardware, achieving 3.9x
2
6
38
@allenainie
Allen Nie (🇺🇦☮️)
9 months
Wow. Nice timing. @anjiangw and I just released a new version of our paper https://t.co/ybD8MBarMw. LLM Agents show surprising exploration/sample efficiency (almost 100x faster than UCB bandit) in optimizing system code. A good domain for coding agents🤔😁
@ycombinator
Y Combinator
9 months
AI Coding Agent for Hardware Optimized Code @sdianahu AI hardware is still constrained by software. However, with reasoning models like Deepseek R1 or OpenAI o1 and o3, AI could generate hardware-optimized code that rivals—or surpasses—human CUDA code.
3
28
128
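For background on the UCB bandit baseline mentioned above, a standard UCB1 loop looks roughly like this (generic multi-armed bandit code, unrelated to the paper's actual experimental setup):

    # Generic UCB1 multi-armed bandit, shown only as background for the baseline
    # referenced above (not the paper's experiments).
    import math
    import random

    def ucb1(reward_fns, horizon=1000):
        n_arms = len(reward_fns)
        counts = [0] * n_arms
        sums = [0.0] * n_arms
        for t in range(1, horizon + 1):
            if t <= n_arms:                       # pull each arm once to initialize
                arm = t - 1
            else:
                arm = max(range(n_arms),
                          key=lambda a: sums[a] / counts[a]
                          + math.sqrt(2 * math.log(t) / counts[a]))
            r = reward_fns[arm]()
            counts[arm] += 1
            sums[arm] += r
        return counts                             # pull counts per arm

    arms = [lambda p=p: float(random.random() < p) for p in (0.2, 0.5, 0.8)]
    print(ucb1(arms))                             # most pulls should go to the 0.8 arm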
@zhang677
Genghan Zhang
9 months
Excited to see the “self-improvement” idea also works for theorem proving, another application that requires complex reasoning with limited data
@tengyuma
Tengyu Ma
9 months
and SoTA among whole-proof generation methods on miniF2F, ProofNet, and PutnamBench, and double the previous best results on LeanWorkBook. (reposting because it seems that this table has much more views 😝)
0
0
1
@zhang677
Genghan Zhang
9 months
To learn more about STeP:
0
0
0
@zhang677
Genghan Zhang
9 months
Thanks to all the collaborators: Weixin Liang @liang_weixin, Olivia Hsu ( https://t.co/RG3r5yCSdm), and Kunle Olukotun @KunleOlukotun
0
0
0
@Jerry_XU_Jiarui
Jiarui Xu
1 year
TTT could model long sequences with linear time complexity. It's a drop-in upgrade for any sequence modeling operators like self-attention. It has been super fun to work on TTT with the amazing team! Code is available:
github.com
Official JAX implementation of Learning to (Learn at Test Time): RNNs with Expressive Hidden States - test-time-training/ttt-lm-jax
@karansdalal
Karan Dalal
1 year
I’m excited to share a project I’ve been working on for over a year, which I believe will fundamentally change our approach to language models. We’ve designed a new architecture, which replaces the hidden state of an RNN with a machine learning model. This model compresses
1
13
67
@xiaolonw
Xiaolong Wang
1 year
Cannot believe this finally happened! Over the last 1.5 years, we have been developing a new LLM architecture, with linear complexity and expressive hidden states, for long-context modeling. The following plots show our model trained from Books scale better (from 125M to 1.3B)
20
261
2K
@karansdalal
Karan Dalal
1 year
I’m excited to share a project I’ve been working on for over a year, which I believe will fundamentally change our approach to language models. We’ve designed a new architecture, which replaces the hidden state of an RNN with a machine learning model. This model compresses
50
284
2K
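Conceptually, a test-time-training (TTT) layer replaces an RNN's fixed-size hidden state with the weights of a small inner model, updated by a gradient step on each incoming token, which is why the scan stays linear in sequence length. The sketch below shows the simplest linear version of that idea; it is illustrative only and not the released JAX implementation.

    # Conceptual sketch of a TTT-style recurrence: the "hidden state" is the weight
    # matrix W of a small inner model, updated by one gradient step per token on a
    # self-supervised reconstruction loss. Illustrative only; see the official JAX
    # repo for the real architecture.
    import numpy as np

    def ttt_layer(tokens, lr=0.1):
        d = tokens.shape[1]
        W = np.zeros((d, d))                 # hidden state = inner model's weights
        outputs = []
        for x in tokens:                     # linear-time scan over the sequence
            pred = W @ x                     # inner model's reconstruction of x
            grad = np.outer(pred - x, x)     # grad of 0.5 * ||W x - x||^2 w.r.t. W
            W = W - lr * grad                # "train at test time" on this token
            outputs.append(W @ x)            # output produced with the updated state
        return np.stack(outputs)

    rng = np.random.default_rng(0)
    seq = rng.standard_normal((16, 4))
    print(ttt_layer(seq).shape)              # (16, 4)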
@zhang677
Genghan Zhang
1 year
Thanks to Olivia Hsu ( https://t.co/RG3r5yCSdm) and Fredrik Kjolstad ( https://t.co/iPVwC4ykH3) for their help and advice!
fredrikbk.com
0
0
1
@zhang677
Genghan Zhang
1 year
To generate code that assembles any output sparse tensors, you need Sparse Workspace ( https://t.co/aBSUoa8E2v). Our work extends the generality of sparse tensor algebra compilers by treating sparse tensors as a first-class concept for any tensor variable, including temporaries.
1
0
5
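The workspace idea can be illustrated with the textbook way of computing one row of a sparse-sparse matrix product: partial results are scattered into a temporary accumulator and compressed back to sparse form once the row is finished. The sketch below (a hash-map accumulator; dense arrays are also common) shows that general pattern, not the compiler transformation from the paper.

    # Textbook illustration of a workspace for sparse output assembly: one row of
    # C = A @ B with A, B stored as dict-of-rows. Partial sums are scattered into a
    # temporary (the workspace) and compressed back to sparse form at the end.

    def spgemm_row(a_row, B_rows):
        """a_row: {col: val} for one sparse row of A; B_rows: {row: {col: val}}."""
        workspace = {}                                   # temporary accumulator
        for k, a_val in a_row.items():
            for j, b_val in B_rows.get(k, {}).items():
                workspace[j] = workspace.get(j, 0.0) + a_val * b_val
        return {j: v for j, v in sorted(workspace.items()) if v != 0.0}

    A_row0 = {0: 2.0, 2: 1.0}
    B = {0: {1: 3.0}, 2: {1: -6.0, 3: 4.0}}
    print(spgemm_row(A_row0, B))                         # {3: 4.0}; the j=1 terms cancel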
@mo_tiwari
Mo Tiwari
2 years
Thrilled that a paper from my PhD, "Faster Maximum Inner Product Search in High Dimensions" has been accepted to ICML 2024! In the paper, we accelerate state-of-the-art for the Maximum Inner Product Search (MIPS) problem. MIPS is a core subroutine in systems like recommendation
5
14
80
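For reference, exact MIPS asks which database vectors have the largest inner product with a query; the brute-force baseline that faster algorithms are compared against fits in a few lines (a generic definition, not the paper's algorithm):

    # Brute-force exact Maximum Inner Product Search (MIPS), shown only to define
    # the problem; the accepted paper is about accelerating this, not this baseline.
    import numpy as np

    def mips_bruteforce(database, query, k=1):
        """Return indices of the k database vectors with the largest <v, query>."""
        scores = database @ query                  # inner product with every vector
        return np.argsort(-scores)[:k]

    rng = np.random.default_rng(0)
    db = rng.standard_normal((1000, 64))           # e.g., item embeddings in a recommender
    q = rng.standard_normal(64)                    # e.g., a user embedding
    print(mips_bruteforce(db, q, k=5))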