Shuvendu Roy @ShuvenduBikash X Profile

Shuvendu Roy

@ShuvenduBikash

Followers

94

Following

1K

Media

25

Statuses

1K

Generalizability in AI | Machine Learning Research Intern @RBCBorealis | Ph.D. Candidate (AI) @queensu | Former: Student Researcher @google; @VectorInst

Toronto, Canada

Joined December 2014

Shuvendu Roy

@ShuvenduBikash

23 hours

RT @omarsar0: A Deep Dive into RL for LLM Reasoning. Provides a roadmap for practitioners applying RL for LLM reasoning. Nice to have some….

0

99

0

Shuvendu Roy

@ShuvenduBikash

6 days

RT @omarsar0: Efficient Agents. This is a great study full of insights on how to build efficient agents. If you are looking to reduce cost….

0

188

0

Shuvendu Roy

@ShuvenduBikash

23 days

RT @TheAITimeline: 🚨 This week's top AI/ML research papers:
- Mixture-of-Recursions
- Scaling Laws for Optimal Data Mixtures
- Training Tra….

0

71

0

Shuvendu Roy

@ShuvenduBikash

29 days

RT @omarsar0: One Token to Fool LLM-as-a-Judge. Watch out for this one, devs! Semantically empty tokens, like “Thought process:”, “Solutio….

0

122

0

Shuvendu Roy

@ShuvenduBikash

1 month

RT @Kimi_Moonshot: 🚀 Hello, Kimi K2! Open-Source Agentic Model!
🔹 1T total / 32B active MoE model
🔹 SOTA on SWE Bench Verified, Tau2 & Ace….

0

1K

0

Shuvendu Roy

@ShuvenduBikash

2 months

RT @aaronmakelky: @LoicReco Link to guide for those who don’t want screen shots:

0

8

0

Shuvendu Roy

@ShuvenduBikash

2 months

RT @omarsar0: Reinforcement Pre-Training. New pre-training paradigm for LLMs just landed on arXiv! It incentivises effective next-token re….

0

91

0

Shuvendu Roy

@ShuvenduBikash

2 months

RT @natolambert: Major reasoning models so far with technical reports (focused on those w RL):. 2025-01-22 — DeepSeek R1 — .

arxiv.org

We introduce Open-Reasoner-Zero, the first open source implementation of large-scale reasoning-oriented RL training on the base model focusing on scalability, simplicity and accessibility. Through...

0

197

0

Shuvendu Roy

@ShuvenduBikash

3 months

RT @xuandongzhao: 🚀 Excited to share the most inspiring work I’ve been part of this year: "Learning to Reason without External Rewards"….

0

513

0

Shuvendu Roy

@ShuvenduBikash

3 months

RT @omarsar0: Understanding Reasoning Capabilities in LLMs. This report shares insights and tips for using reasoning models. Great read fo….

0

199

0

Shuvendu Roy

@ShuvenduBikash

4 months

RT @xhluca: DeepSeek-R1 Thoughtology: Let’s <think> about LLM reasoning. 142-page report diving into the reasoning chains of R1. It spans 9….

0

137

0

Shuvendu Roy

@ShuvenduBikash

5 months

RT @finbarrtimbers: a paper I like a lot is the "Unpacking DPO and PPO: Disentangling Best Practices for Learning from Preference Feedback"….

0

31

0

Shuvendu Roy

@ShuvenduBikash

5 months

RT @iScienceLuvr: Interesting, Microsoft recently published a short paper demonstrating RLVR (R1-style training) on medical QA datasets.….

0

44

0

Shuvendu Roy

@ShuvenduBikash

5 months

RT @iScienceLuvr: Scaling Laws of Synthetic Data for Language Models. "In this work, we systematically investigate the scaling laws of synt….

0

66

0

Shuvendu Roy

@ShuvenduBikash

5 months

RT @y0b1byte: Best explanation of what's going on under the hood in the v3/r1 release.

0

110

0

Shuvendu Roy

@ShuvenduBikash

5 months

RT @gm8xx8: DAPO: An Open-Source LLM Reinforcement Learning System at Scale. DAPO is a reinforcement learning algorithm for large-scale LLM….

0

55

0

Shuvendu Roy

@ShuvenduBikash

5 months

RT @cwolferesearch: Many recent frontier LLMs like Grok-3 and DeepSeek-R1 use a Mixture-of-Experts (MoE) architecture. To understand how it….

0

114

0

Shuvendu Roy

@ShuvenduBikash

5 months

RT @TXhunyuan: 🚀 Introducing Hunyuan-TurboS – the first ultra-large Hybrid-Transformer-Mamba MoE model! Traditional pure Transformer models….

0

222

0

Shuvendu Roy

@ShuvenduBikash

5 months

RT @_akhaliq: Alibaba just dropped R1-Omni. Explainable Omni-Multimodal Emotion Recognition with Reinforcing Learning .

0

222

0

Shuvendu Roy

@ShuvenduBikash

5 months

RT @_akhaliq: Vision-R1. Incentivizing Reasoning Capability in Multimodal Large Language Models

0

88

0