
Luyu Gao
@luyu_gao
Followers
2K
Following
275
Media
13
Statuses
159
Research Scientist @MistralAI. Work on Code Agents (Devstrals). PhD candidate (on leave) @CarnegieMellon @LTIatCMU.
Joined April 2020
[1/4] Introducing HyDE, a method for building dense retrievers without supervision. HyDE zero-shot instructs GPT to generate a fictional document and re-encodes it with Contriever to search in its embedding space. Put simply, it casts the retrieval-like behavior in GPT into real retrieval.
9
74
397
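The HyDE flow described in the tweet above (generate a fictional document, re-encode it, search in the embedding space) can be sketched roughly as follows. This is a toy illustration under loud assumptions, not the authors' implementation: `generate_hypothetical_document` is a hypothetical stand-in for the zero-shot GPT call and returns a canned answer, `encode` is a bag-of-words stand-in for Contriever, and the corpus is made up for the example.

```python
import math
from collections import Counter


def generate_hypothetical_document(query: str) -> str:
    """Hypothetical stand-in for the zero-shot GPT call (assumption, not a real API)."""
    canned = {
        "how do dense retrievers work": (
            "dense retrievers encode queries and documents into vectors "
            "and rank documents by vector similarity"
        ),
    }
    # Fall back to the raw query when no canned "fictional document" exists.
    return canned.get(query.lower().strip("?"), query)


def encode(text: str) -> Counter:
    """Toy bag-of-words encoder standing in for Contriever (assumption)."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(count * b[term] for term, count in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def hyde_search(query: str, corpus: list[str], top_k: int = 1) -> list[str]:
    """Search with the encoding of the generated document, not of the query."""
    hypothetical = generate_hypothetical_document(query)
    query_vector = encode(hypothetical)
    ranked = sorted(corpus, key=lambda doc: cosine(query_vector, encode(doc)), reverse=True)
    return ranked[:top_k]


# Made-up corpus for illustration only.
corpus = [
    "A dense retriever maps documents into vectors and ranks them by similarity",
    "Sourdough bread needs a long fermentation",
    "JAX compiles numerical Python programs",
]

top_hit = hyde_search("How do dense retrievers work?", corpus)[0]
```

The point of the sketch is only the ordering of steps: the vector used for search comes from the generated document, so the comparison happens document-to-document in the encoder's space.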
RT @b_roziere: Excited to release Devstral Medium and a new version of Devstral Small! Devstral Medium reaches 61.8% on SWE-bench verified….
0
2
0
Among the papers I wrote, GradCache is one of my favorites. Very glad to see it still being useful🚀.
We trained all of the Nomic Embed models on limited compute. One trick that helped us train SoTA embeddings on 16 H100s? GradCache, a gradient checkpointing-like technique tailored for contrastive learning. I kept forgetting how it works, so I dug into the math and wrote about it
1
1
26
We released Devstral, a powerful code agent foundation model. With an Apache 2.0 license, it is now the best open source model on SWE-bench. One of the fun projects I worked on this year.
Meet Devstral, our SOTA open model designed specifically for coding agents and developed with @allhands_ai .
2
1
25
RT @MinyangTian1: SciCode is our new benchmark that challenges LMs to code solutions for scientific problems from advanced papers. The chal….
0
62
0
RT @anubha_haha: Recent studies show program-aided prompting in LLMs improves reasoning tasks, but do they "know what they know"? 🤔 Our #NA….
0
5
0
[4/4] Really want to thank all JAX authors for building such a fun framework to play with! @jekbradbury @froystig @SingularMattrix @cdleary @jakevdp @DougalMaclaurin @apaszke @zhangqiaorjc.
0
0
6
[1/4] So, I decided to seriously use JAX, and it didn't take long for me to realize its power. With just a couple hundred lines of code, you can do data and tensor parallelism on @huggingface transformers. I've created a toolkit to make this more accessible.
github.com
Supercharge huggingface transformers with model parallelism. - luyug/magix
5
17
134
Attending my first #NeurIPS conference. Excited to chat with people about retrieval, RAG, or just any other LLM phenomena. #NeurIPS2023.
0
1
34
RT @sivil_taram: 🇸🇬We will present the "Active Retrieval Augmented Generation" paper in the Poster Session 2, December 8th, 16:00 SGT. Feel….
0
1
0
RT @gneubig: I have a post-doc position open at @LTIatCMU, starting Summer or Fall 2024. If you are interested in working with me at CMU on….
docs.google.com
Graham Neubig's lab (https://www.cs.cmu.edu/~neulab/) has a post-doc position open starting Summer 2024. If you are interested, please apply through the following form. Please also feel free to get...
0
64
0
RT @tengyuma: 📢 Introducing Voyage AI @Voyage_AI_! Founded by a talented team of leading AI researchers and me 🚀🚀. We build state-of-the-….
0
95
0
RT @_akhaliq: Solving Challenging Math Word Problems Using GPT-4 Code Interpreter with Code-based Self-Verification. paper page: https://t.….
0
39
0
RT @arankomatsuzaki: Solving Challenging Math Word Problems Using GPT-4 Code Interpreter with Code-based Self-Verification. With GPT-4 Code….
0
58
0