Mohammad Shoeybi Profile
Mohammad Shoeybi

@MohammadShoeybi

Followers: 341
Following: 91
Media: 0
Statuses: 79

Director of Applied Research @NVIDIA

Santa Clara, CA
Joined May 2013
@MohammadShoeybi
Mohammad Shoeybi
2 months
Check out our detailed study on advancing math and code reasoning using SFT and RL.
@_weiping
Wei Ping
2 months
Introducing AceReason-Nemotron 1.1. Our previous release, AceReason-Nemotron-1.0, introduced a stage-wise RL recipe that was applied sequentially to math-only and code-only prompts, demonstrating both high efficiency and strong effectiveness. Here, we systematically investigate
@MohammadShoeybi
Mohammad Shoeybi
2 months
RT @ychenNLP: 📢We conduct a systematic study to demystify the synergy between SFT and RL for reasoning models. The result? We trained a 7B…
@MohammadShoeybi
Mohammad Shoeybi
2 months
RT @_albertgu: exciting to see that hybrid models maintain reasoning performance with few attention layers. benefits of linear architecture…
@MohammadShoeybi
Mohammad Shoeybi
2 months
We released reasoning models for Nemotron-H 8B and 47B. Great accuracy at 4x the inference speed.
@NVIDIAAIDev
NVIDIA AI Developer
2 months
👀 Nemotron-H tackles large-scale reasoning while maintaining speed, with 4x the throughput of comparable transformer models. ⚡ See how #NVIDIAResearch accomplished this using a hybrid Mamba-Transformer architecture and model fine-tuning ➡️
@MohammadShoeybi
Mohammad Shoeybi
3 months
Check out our recent work on advancing math and code reasoning through RL.
@NVIDIAAIDev
NVIDIA AI Developer
3 months
📣 Introducing AceReason-Nemotron: Advancing Math and Code Reasoning through Reinforcement Learning (RL). Starting from the SFT model DeepSeek-R1-Distill-Qwen-14B, our AceReason-Nemotron-14B achieves substantial improvements in pass@1 accuracy on key benchmarks through RL: AIME…
@MohammadShoeybi
Mohammad Shoeybi
3 months
RT @ctnzr: It takes great data to make a great model. We're opening the data curation pipeline for Nemotron models, and we're also posting…
@MohammadShoeybi
Mohammad Shoeybi
4 months
RT @_weiping: Introducing AceMath-RL-Nemotron-7B, an open math model trained with reinforcement learning from the SFT-only checkpoint: Deep…
@MohammadShoeybi
Mohammad Shoeybi
4 months
RT @ctnzr: Base model Nemotron-H weights have been released under a research license:
huggingface.co
@MohammadShoeybi
Mohammad Shoeybi
4 months
RT @_weiping: Introducing UltraLong-8B: We extended Llama3.1-8B-Instruct to support 1M, 2M, and 4M context windows by continuing pretrainin…
@MohammadShoeybi
Mohammad Shoeybi
5 months
RT @ctnzr: Nemotron-H: A family of Hybrid Mamba-Transformer LLMs. * Hybrid architecture means up to 3X faster at the same accuracy. * Traine…
@MohammadShoeybi
Mohammad Shoeybi
7 months
RT @_weiping: Introducing AceMath, a cutting-edge suite of math models designed to excel at solving complex math problems, complemented by…
@MohammadShoeybi
Mohammad Shoeybi
7 months
RT @_weiping: Open-sourcing the training code for NVLM-1.0 72B in:
@MohammadShoeybi
Mohammad Shoeybi
8 months
RT @_weiping: We’re at #NeurIPS 2024 in Vancouver, presenting two papers from NVIDIA on advancing state-of-the-art LLM RAG models! ChatQA:…
arxiv.org
In this work, we introduce ChatQA, a suite of models that outperform GPT-4 on retrieval-augmented generation (RAG) and conversational question answering (QA). To enhance generation, we propose a...
@MohammadShoeybi
Mohammad Shoeybi
8 months
We are very excited to release our large-scale Common Crawl-based dataset. This 6.3T-token dataset will help the community develop stronger models. Check it out!
@MarkusKliegl
Markus Kliegl
8 months
We are excited to release Nemotron-CC, our high-quality Common Crawl-based 6.3-trillion-token dataset for LLM pretraining (4.4T globally deduplicated original tokens and 1.9T synthetically generated tokens). Compared to the leading open DCLM dataset, Nemotron-CC enables to…
@MohammadShoeybi
Mohammad Shoeybi
11 months
RT @_weiping: Introducing NVLM 1.0, a family of frontier-class multimodal LLMs that achieve state-of-the-art results on vision-language tas…
@MohammadShoeybi
Mohammad Shoeybi
11 months
RT @_weiping: We are excited to release ChatQA-2 (and its training data!), 128K long-context models that also have exceptional RAG capabili…
@MohammadShoeybi
Mohammad Shoeybi
11 months
RT @_weiping: Our NV-Embed-v2 has achieved a record-breaking score of 72.31 across 56 text embedding / retrieval tasks, reclaiming the top…
@MohammadShoeybi
Mohammad Shoeybi
1 year
RT @PavloMolchanov: 🚀 40x Faster Model Training via Pruning and Distillation! Permissive Minitron-4B and Minitron-8B models! 🔗 Paper: http…
@MohammadShoeybi
Mohammad Shoeybi
1 year
RT @_weiping: Introducing ChatQA 2, a Llama3-based model with a 128K context window, designed to close the gap between open LLMs and leadin…