Tran Rick (@TranRick2)
Followers: 48 · Following: 277 · Media: 36 · Statuses: 2K
PhD in computer science | Technical Lead at Foxconn AI | making large neural networks run on edge devices
Taoyuan County, Taiwan
Joined January 2021
Hugging Face has released a 214-page MASTERCLASS on how to train LLMs
> it’s called The Smol Training Playbook
> and if you want to learn how to train LLMs,
> this GIFT is for you
> this training bible walks you through the ENTIRE pipeline
> covers every concept that matters from
Replies: 18 · Reposts: 219 · Likes: 1K
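As context for what a training pipeline boils down to at the smallest scale, here is a minimal next-token training step in PyTorch with Hugging Face transformers. This is a generic sketch, not material from the playbook, and the gpt2 model/tokenizer is just a stand-in.

```python
# Minimal next-token prediction training step (illustrative only;
# not taken from The Smol Training Playbook).
import torch
from torch.optim import AdamW
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
optimizer = AdamW(model.parameters(), lr=3e-4)

batch = tokenizer(["The quick brown fox jumps over the lazy dog."],
                  return_tensors="pt")
# labels == input_ids: the model shifts them internally for next-token loss
outputs = model(**batch, labels=batch["input_ids"])
outputs.loss.backward()
optimizer.step()
optimizer.zero_grad()
```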
Updated & turned my Big LLM Architecture Comparison article into a narrated video lecture. The 11 LLM architectures covered in this video:
1. DeepSeek V3/R1
2. OLMo 2
3. Gemma 3
4. Mistral Small 3.1
5. Llama 4
6. Qwen3
7. SmolLM3
8. Kimi K2
9. GPT-OSS
10. Grok 2.5
11. GLM-4.5
Replies: 40 · Reposts: 495 · Likes: 3K
RAG is not Memory for AI Agents. 5 AI memory engines to build agents that maintain long-term context and learn continuously (last 2 released just this month):
1. Zep builds and queries temporally-aware knowledge graphs that evolve with every interaction. 100% open source.
Replies: 18 · Reposts: 96 · Likes: 541
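To make "temporally-aware" concrete, here is a toy Python sketch of a memory store that records when facts become valid and invalid, so an agent can ask "what was true at time t?" rather than only "what is similar to this query?". This is a hypothetical illustration of the idea, not Zep's actual SDK or data model.

```python
# Toy temporally-aware memory (hypothetical; NOT Zep's API).
from dataclasses import dataclass
from datetime import datetime

@dataclass
class Fact:
    subject: str
    predicate: str
    obj: str
    valid_from: datetime
    valid_to: datetime | None = None  # None = still true

class TemporalMemory:
    def __init__(self):
        self.facts: list[Fact] = []

    def add(self, fact: Fact) -> None:
        # A new fact about the same subject/predicate closes out the old one
        for f in self.facts:
            if (f.subject, f.predicate) == (fact.subject, fact.predicate) and f.valid_to is None:
                f.valid_to = fact.valid_from
        self.facts.append(fact)

    def at(self, t: datetime) -> list[Fact]:
        # Return only facts valid at time t
        return [f for f in self.facts
                if f.valid_from <= t and (f.valid_to is None or t < f.valid_to)]

mem = TemporalMemory()
mem.add(Fact("user", "employer", "Acme", datetime(2023, 1, 1)))
mem.add(Fact("user", "employer", "Foxconn", datetime(2024, 6, 1)))
print(mem.at(datetime(2023, 6, 1)))  # -> the Acme fact, not the current one
```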
Web scraping will never be the same! Firecrawl just released the new v2 endpoint with 10x faster scraping and semantic crawling. Firecrawl lets you input a URL, crawl it, and convert it into clean LLM-ready data.
Replies: 20 · Reposts: 287 · Likes: 2K
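A minimal sketch of the URL-in, markdown-out flow over Firecrawl's REST API. The request shape follows the documented v1 API; the new v2 endpoint path and parameters may differ, so check the docs before relying on this.

```python
# Scrape one URL into LLM-ready markdown (v1-style endpoint; v2 may differ).
import requests

resp = requests.post(
    "https://api.firecrawl.dev/v1/scrape",          # v2 path may differ
    headers={"Authorization": "Bearer YOUR_API_KEY"},
    json={"url": "https://example.com", "formats": ["markdown"]},
)
resp.raise_for_status()
# Response shape assumed from the v1 docs: {"success": ..., "data": {"markdown": ...}}
markdown = resp.json()["data"]["markdown"]
print(markdown[:500])
```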
Stop Prompting LLMs. Start Programming LLMs. Introducing DSPy by Stanford NLP. This is why you need to learn it:
Replies: 14 · Reposts: 152 · Likes: 1K
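The "programming, not prompting" idea in one small example: in DSPy you declare a signature for what you want and the framework builds the prompt. A minimal sketch using DSPy's documented API; the model name is a placeholder.

```python
import dspy

# Any LM supported by dspy.LM works here; the model id is a placeholder.
dspy.configure(lm=dspy.LM("openai/gpt-4o-mini"))

# "question -> answer" is a declarative signature, not a handwritten prompt;
# ChainOfThought adds intermediate reasoning automatically.
qa = dspy.ChainOfThought("question -> answer")
result = qa(question="What does DSPy optimize instead of prompt strings?")
print(result.answer)
```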
GPT-5 just casually did new mathematics. Sebastien Bubeck gave it an open problem from convex optimization, something humans had only partially solved. GPT-5-Pro sat down, reasoned for 17 minutes, and produced a correct proof improving the known bound from 1/L all the way to
Claim: gpt-5-pro can prove new interesting mathematics. Proof: I took a convex optimization paper with a clean open problem in it and asked gpt-5-pro to work on it. It proved a better bound than what is in the paper, and I checked the proof; it's correct. Details below.
Replies: 979 · Reposts: 3K · Likes: 25K
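For context on where the $1/L$ in these tweets comes from (the standard smooth setting; the specific open problem is not detailed here): for an $L$-smooth function $f$ and gradient descent $x_{k+1} = x_k - \eta \nabla f(x_k)$, the descent lemma gives

```latex
\[
  f(x_{k+1}) \;\le\; f(x_k) - \eta\Bigl(1 - \tfrac{L\eta}{2}\Bigr)\,\bigl\|\nabla f(x_k)\bigr\|^{2},
\]
```

where the progress coefficient $\eta\,(1 - L\eta/2)$ is positive for $0 < \eta < 2/L$ and maximized at the classical step size $\eta = 1/L$. Results that push guarantees past $1/L$ have to argue more carefully than this one-step bound.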
curious about the training data of OpenAI's new gpt-oss models? i was too. so i generated 10M examples from gpt-oss-20b, ran some analysis, and the results were... pretty bizarre. time for a deep dive 🧵
Replies: 126 · Reposts: 514 · Likes: 6K
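A sketch of what "generating examples from gpt-oss-20b" could look like with Hugging Face transformers, using the model's public Hub id. For 10M samples you would want a batched inference server (e.g. vLLM); this only shows the idea, and assumes a GPU with enough memory.

```python
from transformers import pipeline

# openai/gpt-oss-20b is the open-weights model named in the thread
generator = pipeline("text-generation", model="openai/gpt-oss-20b",
                     device_map="auto")

samples = generator(
    "Hello",                 # short prompts probe the model's default distribution
    max_new_tokens=128,
    do_sample=True,
    temperature=1.0,
    num_return_sequences=4,
)
for s in samples:
    print(s["generated_text"][:200])
```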
Introducing Codestral Embed, the new state-of-the-art embedding model for code.
Replies: 27 · Reposts: 153 · Likes: 1K
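A minimal sketch of embedding a code snippet with Codestral Embed through the mistralai Python client. The model identifier is taken from the announcement; verify the current name against Mistral's docs.

```python
from mistralai import Mistral

client = Mistral(api_key="YOUR_API_KEY")
resp = client.embeddings.create(
    model="codestral-embed",   # name as announced; confirm in the docs
    inputs=["def add(a, b):\n    return a + b"],
)
vector = resp.data[0].embedding   # one float vector per input snippet
print(len(vector))
```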
I'm doing a podcast with @sundarpichai soon. Let me know if you have any questions / topic suggestions. The rate of AI progress has been insane. It makes me excited for the future (even more than usual 🤣) and excited to chat with leaders & engineers who are building that
Replies: 846 · Reposts: 259 · Likes: 5K
NeMo RL is now open source! It replaces NeMo-Aligner and is the toolkit we use to post-train the next generations of our models. Give it a try:
github.com
Scalable toolkit for efficient model reinforcement - NVIDIA-NeMo/RL
Replies: 5 · Reposts: 65 · Likes: 394
A must-read on RL by Google DeepMind research scientist Kevin Murphy just dropped on arXiv. It gives a clear, updated overview of deep RL and sequential decision-making, with examples.
Replies: 12 · Reposts: 140 · Likes: 998
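The core object in the sequential decision-making framing such an overview covers is the Bellman optimality equation for the action-value function (standard material, included here for context):

```latex
\[
  Q^{*}(s,a) \;=\; \mathbb{E}\bigl[r \mid s,a\bigr]
  \;+\; \gamma \sum_{s'} P(s' \mid s,a)\,\max_{a'} Q^{*}(s',a'),
\]
```

where $\gamma \in [0,1)$ discounts future reward. Deep RL approximates $Q^{*}$ (or a policy) with neural networks.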
FoxBrain #LLM debut at @NVIDIAGTC. “Amazing talk! I mean, like, top 3 of all the talks I attended this week, so congrats!” says a participant at the Q&A for #GTC25 Session Talk [S74035]: From Open Source to Frontier #AI: Build, Customize, and Extend Foundation Models @HonHai_Foxconn
Replies: 0 · Reposts: 1 · Likes: 3
🚀 Excited to Speak at NVIDIA GTC 2025: The Journey Behind FoxBrain! 🚀 Our session is tomorrow, where I’ll be sharing insights into FoxBrain, Foxconn’s first prototype foundation model! 🎉
Replies: 0 · Reposts: 0 · Likes: 0
Introducing Mistral Small 3.1. Multimodal, Apache 2.0, outperforms Gemma 3 and GPT-4o mini. https://t.co/BHLAAaKZ9w
Replies: 269 · Reposts: 1K · Likes: 8K
FoxBrain has sped up adoption of inference & AI servers, says @HonHai_Foxconn #YoungLiu at 4Q24 #Investor Call. Come see why. Thu 3/20 #GTC25 Session Talk [S74035]: From Open Source to Frontier #AI ... Foundation Models ➡️ https://t.co/3EmemGoqFY Booth 323 @NVIDIAGTC #LLM
Replies: 0 · Reposts: 3 · Likes: 5
MCP is going crazy viral right now🤯 AI apps can now instantly connect to any tool or live data. USB-C moment for AI. 10 wild examples: https://t.co/qLnMkgBA0H
Replies: 286 · Reposts: 1K · Likes: 9K
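To show what "connect to any tool or live data" means in practice, here is a minimal MCP server using the official Python SDK's FastMCP helper. The tool name and its fake return value are made up for illustration; any live data source could sit behind it.

```python
from mcp.server.fastmcp import FastMCP

mcp = FastMCP("demo-tools")

@mcp.tool()
def get_stock_price(ticker: str) -> str:
    """Return a (fake) live price -- stands in for any real data source."""
    return f"{ticker}: 123.45 USD"

if __name__ == "__main__":
    mcp.run()  # serves the Model Context Protocol (stdio transport by default)
```

Any MCP-capable client (a desktop assistant, an agent framework) can then discover and call `get_stock_price` without bespoke glue code, which is the "USB-C" point of the tweet.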
Today @cohere is very excited to introduce Command A, our new model succeeding Command R+. Command A is an open-weights 111B parameter model with a 256k context window focused on delivering great performance across agentic, multilingual, and coding use cases. 🧵
Replies: 29 · Reposts: 119 · Likes: 823
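A minimal sketch of calling Command A through Cohere's v2 chat API. The model identifier is assumed from the release naming; verify it against Cohere's docs.

```python
import cohere

co = cohere.ClientV2(api_key="YOUR_API_KEY")
resp = co.chat(
    model="command-a-03-2025",  # identifier assumed; check Cohere's docs
    messages=[{"role": "user",
               "content": "Summarize this repo's README in 3 bullets."}],
)
print(resp.message.content[0].text)
```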
HOLY SHIT, Sesame Labs just dropped CSM (Conversational Speech Model) - Apache 2.0 licensed! 💥
> Trained on 1 MILLION hours of data 🤯
> Contextually aware, emotionally intelligent speech
> Voice cloning & watermarking
> Ultra fast, real-time synthesis
> Based on Llama
Replies: 128 · Reposts: 639 · Likes: 5K
🚀 Day 6 of #OpenSourceWeek: One More Thing – DeepSeek-V3/R1 Inference System Overview
Optimized throughput and latency via:
🔧 Cross-node EP-powered batch scaling
🔄 Computation-communication overlap
⚖️ Load balancing
Statistics of DeepSeek's Online Service:
⚡ 73.7k/14.8k
github.com
Production-tested AI infrastructure tools for efficient AGI development and community-driven innovation - deepseek-ai/open-infra-index
Replies: 782 · Reposts: 1K · Likes: 9K
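A toy sketch of the computation-communication overlap idea listed above, in PyTorch. This illustrates the general technique only, not DeepSeek's EP implementation: with async collectives, the NCCL communication kernel runs on its own stream and overlaps with compute launched afterwards.

```python
import torch
import torch.distributed as dist

def overlapped_step(x: torch.Tensor, grads: torch.Tensor) -> torch.Tensor:
    # Kick off communication without blocking; NCCL work runs on its own stream
    work = dist.all_reduce(grads, async_op=True)
    # Independent compute proceeds while the all-reduce is in flight
    y = torch.relu(x @ x.T)
    # Block only at the point where the communication result is needed
    work.wait()
    return y

# Assumes the process group is already initialized, e.g.
#   dist.init_process_group("nccl")
# under torchrun with one GPU per rank.
```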
🚀 Day 4 of #OpenSourceWeek: Optimized Parallelism Strategies
✅ DualPipe - a bidirectional pipeline parallelism algorithm for computation-communication overlap in V3/R1 training. 🔗 https://t.co/GBtxSvWLT4
✅ EPLB - an expert-parallel load balancer for V3/R1. 🔗
github.com
A bidirectional pipeline parallelism algorithm for computation-communication overlap in DeepSeek V3/R1 training. - deepseek-ai/DualPipe
Replies: 445 · Reposts: 817 · Likes: 6K