Akshay πŸš€

@akshay_pachaar

Followers: 215K β€’ Following: 21K β€’ Media: 4K β€’ Statuses: 18K

Simplifying LLMs, AI Agents, RAGs and Machine Learning for you! β€’ Co-founder @dailydoseofds_ β€’ BITS Pilani β€’ 3 Patents β€’ ex-AI Engineer @ LightningAI

Learn AI Engineering πŸ‘‰
Joined July 2012
@akshay_pachaar
Akshay πŸš€
2 years
My lecture at MIT! ✨

From Physics to Linear Algebra & Machine Learning, I have learned a lot from MIT. Yesterday, I had the honour of delivering a guest lecture on the state of AI Engineering, exploring:

- Prompt Engineering
- Retrieval-Augmented Generation
- Fine-Tuning
[image]
@akshay_pachaar
Akshay πŸš€
19 hours
That's a wrap!

If you found it insightful, reshare with your network.

Find me β†’ @akshay_pachaar βœ”οΈ for more insights and tutorials on LLMs, AI Agents, and Machine Learning!
@akshay_pachaar
Akshay πŸš€
19 hours
Those were the three techniques to train one LLM using another. We discussed:

- Soft-label distillation
- Hard-label distillation
- Co-distillation

Here's the visual again for your reference πŸ‘‡
@akshay_pachaar
Akshay πŸš€
19 hours
Meta used co-distillation to train Llama 4 Scout and Maverick from Llama 4 Behemoth. Of course, during the initial stages, the soft labels of the Teacher LLM won't be accurate. That is why the Student LLM is trained on both soft labels and ground-truth hard labels.
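As a rough sketch of how such a blended objective can look (the mixing weight `alpha` and temperature `T` below are illustrative assumptions, not Meta's published recipe):

```python
import torch.nn.functional as F

def combined_loss(student_logits, teacher_logits, hard_labels, alpha=0.5, T=2.0):
    # Ground-truth signal: ordinary next-token cross-entropy.
    ce = F.cross_entropy(student_logits.flatten(0, 1), hard_labels.flatten())
    # Teacher signal: KL divergence to the teacher's softened distribution.
    kd = F.kl_div(
        F.log_softmax(student_logits / T, dim=-1),
        F.softmax(teacher_logits.detach() / T, dim=-1),
        reduction="batchmean",
    ) * T**2
    return alpha * ce + (1 - alpha) * kd
```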
@akshay_pachaar
Akshay πŸš€
19 hours
3️⃣ Co-distillation

- Start with an untrained Teacher and an untrained Student LLM.
- Generate softmax probs over the current batch from both models.
- Train the Teacher LLM on the hard labels.
- Train the Student LLM to match the softmax probs of the Teacher.

Check this visual πŸ‘‡
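The original visual isn't captured in this scrape; as a stand-in, here's a minimal PyTorch sketch of one co-distillation step, assuming `teacher` and `student` are causal LMs that return logits of shape (batch, seq, vocab). The optimizers and temperature are placeholders:

```python
import torch.nn.functional as F

def co_distillation_step(teacher, student, input_ids, labels, t_opt, s_opt, T=2.0):
    # Both models see the same batch.
    teacher_logits = teacher(input_ids)   # (B, L, V)
    student_logits = student(input_ids)   # (B, L, V)

    # 1) Teacher is trained on the ground-truth hard labels.
    t_loss = F.cross_entropy(teacher_logits.flatten(0, 1), labels.flatten())
    t_opt.zero_grad()
    t_loss.backward()
    t_opt.step()

    # 2) Student matches the teacher's (detached) softmax probabilities.
    teacher_probs = F.softmax(teacher_logits.detach() / T, dim=-1)
    student_log_probs = F.log_softmax(student_logits / T, dim=-1)
    s_loss = F.kl_div(student_log_probs, teacher_probs, reduction="batchmean") * T**2
    s_opt.zero_grad()
    s_loss.backward()
    s_opt.step()

    return t_loss.item(), s_loss.item()
```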
@akshay_pachaar
Akshay πŸš€
19 hours
2️⃣ Hard-label distillation

- Use the Teacher LLM to get the output token.
- Get the softmax probs from the Student LLM.
- Train the Student to match the Teacher's output.

DeepSeek-R1 was distilled into Qwen & Llama using this technique. Check this visual πŸ‘‡
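The visual isn't reproduced here; a minimal PyTorch sketch of the idea, assuming models that return logits of shape (batch, seq, vocab):

```python
import torch
import torch.nn.functional as F

@torch.no_grad()
def teacher_hard_labels(teacher, input_ids):
    # Frozen teacher; its argmax token at each position becomes the target.
    logits = teacher(input_ids)          # (B, L, V)
    return logits.argmax(dim=-1)         # (B, L) "hard" labels

def hard_label_loss(student, input_ids, hard_labels):
    student_logits = student(input_ids)  # (B, L, V)
    # Ordinary cross-entropy against the teacher-chosen tokens.
    return F.cross_entropy(student_logits.flatten(0, 1), hard_labels.flatten())
```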
@akshay_pachaar
Akshay πŸš€
19 hours
Say your vocab has 100k tokens and your data has 5 trillion tokens. Storing softmax probabilities over the entire vocab for each input token needs 500M GB of memory under fp8 precision. This is where we jump to our second technique πŸ‘‡
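A quick sanity check of that figure (fp8 β‰ˆ 1 byte per stored probability):

```python
vocab_size = 100_000                       # probs stored per input token
corpus_tokens = 5 * 10**12                 # 5 trillion training tokens
bytes_needed = vocab_size * corpus_tokens  # 1 byte per fp8 probability
print(bytes_needed / 10**9, "GB")          # 5e8 GB, i.e. ~500M GB
```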
@akshay_pachaar
Akshay πŸš€
19 hours
In soft-label distillation, having access to the Teacher's probabilities ensures maximum knowledge transfer. However, to obtain the probability distribution, you must have access to the Teacher’s weights. Even with access, another challenge arises.
@akshay_pachaar
Akshay πŸš€
19 hours
1️⃣ Soft-label distillation

Generate token-level softmax probabilities over the entire corpus using:

- A frozen, pre-trained Teacher LLM
- An untrained Student LLM

Train the Student LLM to match the Teacher's probabilities. Check this out πŸ‘‡
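The original visual isn't in this scrape; a minimal sketch of that matching objective in PyTorch (the temperature and the KL form are common conventions assumed here, not taken from the thread):

```python
import torch.nn.functional as F

def soft_label_loss(student_logits, teacher_logits, T=1.0):
    # Teacher's soft labels: the full distribution over the vocab.
    teacher_probs = F.softmax(teacher_logits / T, dim=-1)
    # Student is pushed toward that distribution via KL divergence.
    student_log_probs = F.log_softmax(student_logits / T, dim=-1)
    return F.kl_div(student_log_probs, teacher_probs, reduction="batchmean") * T**2
```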
@akshay_pachaar
Akshay πŸš€
19 hours
LLMs learn not only from raw text but also from other models. Google's Gemma 2 and 3, for example, were distilled from the larger Gemini model. Today we cover the three most common knowledge-distillation methods. Let's dive in! πŸš€
@akshay_pachaar
Akshay πŸš€
19 hours
How LLMs train LLMs, clearly explained (with visuals):
@akshay_pachaar
Akshay πŸš€
2 days
If you found it insightful, reshare with your network.

Find me β†’ @akshay_pachaar βœ”οΈ for more insights and tutorials on LLMs, AI Agents, and Machine Learning!
@akshay_pachaar
Akshay πŸš€
2 days
Let's build a "Chat with your Code" RAG app using Qwen3-Coder:
@akshay_pachaar
Akshay πŸš€
2 days
Finally, I'll leave you with the architecture diagram of the app we've built. Hope you enjoyed this tutorial. Stay tuned for more! πŸ₯‚
@akshay_pachaar
Akshay πŸš€
2 days
You can find all the code in this GitHub repo (don't forget to star 🌟).
@akshay_pachaar
Akshay πŸš€
2 days
Bonus!

We will use @CleanlabAI's Codex, a smart way to validate and improve your responses. We've used it to get the trustworthiness score. It integrates seamlessly with any agentic or AI chat application you're developing. Check this out πŸ‘‡
[image]
@akshay_pachaar
Akshay πŸš€
2 days
7️⃣ The chat interface

We create a UI using Streamlit to provide a chat interface for our RAG application. The code for this & all we discussed so far is shared in the next tweet! Check this out πŸ‘‡
[image]
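The visual isn't reproduced here, but a minimal Streamlit sketch of such a chat UI could look like this (the `rag_app` module holding the query engine from step 6 is a hypothetical stand-in):

```python
import streamlit as st

from rag_app import query_engine  # assumed module exposing the step-6 engine

st.title("Chat with your Code")

if "messages" not in st.session_state:
    st.session_state.messages = []

# Replay chat history on every Streamlit rerun.
for msg in st.session_state.messages:
    with st.chat_message(msg["role"]):
        st.markdown(msg["content"])

if prompt := st.chat_input("Ask about the codebase..."):
    st.session_state.messages.append({"role": "user", "content": prompt})
    with st.chat_message("user"):
        st.markdown(prompt)

    answer = str(query_engine.query(prompt))
    with st.chat_message("assistant"):
        st.markdown(answer)
    st.session_state.messages.append({"role": "assistant", "content": answer})
```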
@akshay_pachaar
Akshay πŸš€
2 days
6️⃣ Setting up a query engine

The query engine takes a query string, uses it to fetch relevant context, and combines the two using the prompt template before sending them to the LLM, which generates the final response! The LLM used here is the latest Qwen3-Coder!
[image]
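A hedged sketch of wiring this up with LlamaIndex, assuming `index` from step 4 and `qa_prompt_template` from step 5; serving Qwen3-Coder through Ollama and the model tag are assumptions, not confirmed by the thread:

```python
from llama_index.llms.ollama import Ollama

# Qwen3-Coder served locally; swap in whichever provider/tag you actually run.
llm = Ollama(model="qwen3-coder", request_timeout=120.0)

query_engine = index.as_query_engine(
    llm=llm,
    similarity_top_k=3,                   # number of chunks retrieved per query
    text_qa_template=qa_prompt_template,  # custom prompt from step 5
)

response = query_engine.query("How is the retriever implemented?")
print(response)
```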
@akshay_pachaar
Akshay πŸš€
2 days
5️⃣ Creating a prompt template

A custom prompt template is used to refine the response from the LLM and to include the context as well:
[image]
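A sketch of such a template using LlamaIndex's `PromptTemplate`; the exact wording is illustrative:

```python
from llama_index.core import PromptTemplate

# {context_str} and {query_str} are the variables LlamaIndex fills in.
qa_prompt_template = PromptTemplate(
    "Context information is below.\n"
    "---------------------\n"
    "{context_str}\n"
    "---------------------\n"
    "Given the context above and no prior knowledge, "
    "answer the query about the codebase. "
    "If the answer is not in the context, say so.\n"
    "Query: {query_str}\n"
    "Answer: "
)
```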
@akshay_pachaar
Akshay πŸš€
2 days
4️⃣ Indexing & storing

Embeddings created by the embedding model are stored in a vector store that offers fast retrieval and similarity search by creating an index over our data. We'll use a self-hosted @Milvusio vector database:
[image]
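A sketch of this step with LlamaIndex's Milvus integration; the URI and embedding dimension are assumptions for illustration:

```python
from llama_index.core import StorageContext, VectorStoreIndex
from llama_index.vector_stores.milvus import MilvusVectorStore

# Self-hosted Milvus; dim must match the embedding model's output size.
vector_store = MilvusVectorStore(
    uri="http://localhost:19530",
    dim=768,
    overwrite=True,
)
storage_context = StorageContext.from_defaults(vector_store=vector_store)

# `nodes` are the code chunks from steps 1 & 2,
# `embed_model` is the model from step 3.
index = VectorStoreIndex(
    nodes,
    storage_context=storage_context,
    embed_model=embed_model,
)
```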
@akshay_pachaar
Akshay πŸš€
2 days
3️⃣ The embedding model

An embedding is a meaningful representation of text in the form of numbers. The embedding model is responsible for creating embeddings for the document chunks & user queries. Here's how we load our embedding model:
[image]
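A sketch of loading an embedding model via LlamaIndex's HuggingFace integration; the specific model name is an assumption:

```python
from llama_index.embeddings.huggingface import HuggingFaceEmbedding

# Any sentence-embedding model works here; bge-base is a common default.
embed_model = HuggingFaceEmbedding(model_name="BAAI/bge-base-en-v1.5")

# Sanity check: embed one snippet and inspect the vector size.
vec = embed_model.get_text_embedding("def hello():\n    return 'world'")
print(len(vec))  # 768 for bge-base
```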
@akshay_pachaar
Akshay πŸš€
2 days
1️⃣ & 2️⃣ Loading the knowledge base

A knowledge base is a collection of relevant and up-to-date information that serves as a foundation for RAG. In our case, it's a GitHub repository! Here's how we chunk & parse our code base using @Llama_Index's hierarchical code parser:
[image]
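A hedged sketch of this step: the hierarchical parser the thread mentions ships as a separate LlamaIndex pack, so this uses core's `CodeSplitter` as a close stand-in, with an assumed local clone path:

```python
from llama_index.core import SimpleDirectoryReader
from llama_index.core.node_parser import CodeSplitter

# Assumes the repo was cloned locally first, e.g. `git clone <repo-url> ./repo`.
documents = SimpleDirectoryReader(
    input_dir="./repo",
    required_exts=[".py"],   # restrict to source files; adjust per language
    recursive=True,
).load_data()

# Syntax-aware chunking of the code base into nodes for indexing.
splitter = CodeSplitter(language="python", chunk_lines=40, chunk_lines_overlap=10)
nodes = splitter.get_nodes_from_documents(documents)
```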