SovitRath5 Profile
SovitRath5 (@SovitRath5)

Followers: 142 · Following: 1K · Media: 304 · Statuses: 569

Lead SWE (GenAI and LLMs) @ Indegene | Blog: https://t.co/Rq2WcIT5QC | GitHub: https://t.co/9PmVei4IoP

Joined January 2017
@SovitRath5 · 5 days
LitGPT provides access to several high-performance LLMs, with easy loading of pretrained checkpoints for inference, training, and evaluation. In this week's article, we cover an introduction to LitGPT. LitGPT – Getting Started =>
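As a quick illustration of what "easy loading" looks like, here is a minimal sketch based on LitGPT's high-level Python API. The checkpoint name is only an example, and the exact interface is covered in the article.

```python
# Minimal LitGPT loading/generation sketch (checkpoint name is illustrative).
from litgpt import LLM

llm = LLM.load("microsoft/phi-2")   # downloads/loads a pretrained checkpoint
text = llm.generate("What is self-supervised learning?", max_new_tokens=128)
print(text)
```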
@SovitRath5 · 12 days
A while back, Qwen3 was released. The models bring MoE (Mixture of Experts) variants and a unified mode for thinking and non-thinking. We discuss all of this in this week's article on DebuggerCafe. Qwen3 – Unified Models for Thinking and Non-Thinking =>
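For context, the unified mode is typically toggled through the chat template. The sketch below assumes the Hugging Face transformers workflow and the Qwen/Qwen3-8B checkpoint name; treat both as assumptions and see the article for the verified code.

```python
# Toggling Qwen3's thinking vs. non-thinking mode via the chat template.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "Qwen/Qwen3-8B"   # assumed checkpoint name
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype="auto", device_map="auto")

messages = [{"role": "user", "content": "Explain Mixture of Experts in two sentences."}]
prompt = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,
    enable_thinking=True,   # set False to use the non-thinking mode
)
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=256)
# Decode only the newly generated tokens.
print(tokenizer.decode(output[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True))
```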
@SovitRath5 · 19 days
In this week's article on DebuggerCafe, we fine-tune the Web-DINO model for person segmentation. We add a simple 2-layer convolutional decoder head on top of the frozen model to achieve this.
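A minimal sketch of the head described above: the frozen backbone's patch tokens fed into a small 2-layer convolutional decoder for binary (person vs. background) segmentation. The tensor shapes and the backbone-freezing snippet are assumptions about a typical ViT-style setup, not the article's exact code.

```python
import torch.nn as nn
import torch.nn.functional as F

class ConvSegHead(nn.Module):
    """Two-layer convolutional decoder for binary segmentation on frozen features."""

    def __init__(self, embed_dim: int, num_classes: int = 2):
        super().__init__()
        self.decoder = nn.Sequential(
            nn.Conv2d(embed_dim, 256, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(256, num_classes, kernel_size=1),
        )

    def forward(self, patch_tokens, grid_size, out_size):
        # patch_tokens: (B, N, C) patch features from the frozen backbone.
        b, n, c = patch_tokens.shape
        feats = patch_tokens.transpose(1, 2).reshape(b, c, *grid_size)
        logits = self.decoder(feats)
        # Upsample the coarse logits back to the input image resolution.
        return F.interpolate(logits, size=out_size, mode="bilinear", align_corners=False)

# Training would freeze the backbone and optimize only the head, e.g.:
# for p in backbone.parameters():
#     p.requires_grad = False
```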
@SovitRath5 · 26 days
Web-DINO models trained via the Web-SSL framework on internet-scale image data are excellent at downstream tasks. This week's article on DebuggerCafe covers training the Web-DINO model for the image classification task. Image Classification with Web-DINO =>
@SovitRath5 · 27 days
Added semantic segmentation training and image & video inference scripts to the I-JEPA downstream tasks repository. Not sure how well the results will turn out after longer training. Right now, the head is just a linear pixel classifier.
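For reference, a "linear pixel classifier" in this sense is usually just a 1x1 convolution over the frozen encoder's patch features, upsampled back to the input resolution. The sketch below illustrates that idea under assumed tensor shapes; it is not the repository's code.

```python
import torch.nn as nn
import torch.nn.functional as F

class LinearPixelClassifier(nn.Module):
    """1x1 convolution over frozen patch features, upsampled to full resolution."""

    def __init__(self, embed_dim: int, num_classes: int):
        super().__init__()
        self.classifier = nn.Conv2d(embed_dim, num_classes, kernel_size=1)

    def forward(self, patch_tokens, grid_size, out_size):
        b, n, c = patch_tokens.shape                       # (B, N, C) patch features
        feats = patch_tokens.transpose(1, 2).reshape(b, c, *grid_size)
        logits = self.classifier(feats)                    # per-patch class scores
        return F.interpolate(logits, size=out_size, mode="bilinear", align_corners=False)
```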
@SovitRath5 · 27 days
Working on downstream tasks using I-JEPA. Added image similarity code and image classification training & inference. The README contains steps for classification training. Will add semantic segmentation and image search soon.
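As an illustration of the image similarity part, a common recipe is to mean-pool the encoder's patch embeddings and compare images with cosine similarity. The sketch below assumes an `encoder` that returns (B, N, C) patch features; the repository's actual code may differ.

```python
import torch
import torch.nn.functional as F

@torch.no_grad()
def embed(encoder, pixel_values):
    """Mean-pool the encoder's (B, N, C) patch features into one embedding per image."""
    patch_tokens = encoder(pixel_values)
    return F.normalize(patch_tokens.mean(dim=1), dim=-1)

def image_similarity(encoder, image_a, image_b):
    """Cosine similarity between two preprocessed image tensors."""
    return (embed(encoder, image_a) * embed(encoder, image_b)).sum(dim=-1)
```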
@SovitRath5 · 1 month
New article on DebuggerCafe about Web-SSL: a new family of DINOv2 models trained on internet-scale image data without language supervision.
@SovitRath5 · 1 month
New article on DebuggerCafe. Inference using SmolVLM2 for image understanding, OCR, and video understanding. Getting Started with SmolVLM2 – Code Inference =>
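A rough sketch of what SmolVLM2 image-understanding inference with Hugging Face transformers can look like. The checkpoint id, auto classes, and chat-message format follow the usual VLM workflow but are assumptions here; the article contains the verified code.

```python
import torch
from transformers import AutoProcessor, AutoModelForImageTextToText

model_id = "HuggingFaceTB/SmolVLM2-2.2B-Instruct"   # assumed checkpoint id
processor = AutoProcessor.from_pretrained(model_id)
model = AutoModelForImageTextToText.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

messages = [{
    "role": "user",
    "content": [
        {"type": "image", "url": "https://example.com/sample.jpg"},   # placeholder image
        {"type": "text", "text": "Describe this image."},
    ],
}]
inputs = processor.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device, dtype=torch.bfloat16)
output = model.generate(**inputs, max_new_tokens=128)
print(processor.batch_decode(output, skip_special_tokens=True)[0])
```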
@SovitRath5 · 1 month
My Faster RCNN Training Pipeline repository now supports the YOLO annotation format along with Pascal VOC. Who will this update help? Anybody benchmarking YOLO vs Faster RCNN who doesn't want to convert their dataset format.
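For readers unfamiliar with the two formats: Pascal VOC stores absolute pixel corners (xmin, ymin, xmax, ymax), while a YOLO label line stores a class id followed by a normalized center, width, and height. A tiny illustrative converter (not the repository's code):

```python
def voc_to_yolo(xmin, ymin, xmax, ymax, img_w, img_h):
    """Convert one Pascal VOC box (absolute pixels) to YOLO format (normalized)."""
    x_center = (xmin + xmax) / 2.0 / img_w
    y_center = (ymin + ymax) / 2.0 / img_h
    width = (xmax - xmin) / img_w
    height = (ymax - ymin) / img_h
    return x_center, y_center, width, height

# A VOC box (50, 100, 250, 300) in a 500x400 image becomes
# (0.3, 0.5, 0.4, 0.5) in a YOLO label line, preceded by the class id.
print(voc_to_yolo(50, 100, 250, 300, 500, 400))
```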
@SovitRath5 · 2 months
This week's article on DebuggerCafe covers Qwen2.5-Omni, an open-source multimodal model. Qwen2.5-Omni can accept text, audio, video, and images as inputs, and it can produce audio outputs as well. Qwen2.5-Omni: An Introduction =>
@SovitRath5 · 2 months
Fine-Tuning SmolVLM for Receipt OCR => Topics covered:
* Why is receipt OCR challenging for smaller VLMs?
* The SROIE v2 dataset
* How to create the ground truth annotations for the receipt OCR using a larger VLM
* Fine-tuning and inference
@SovitRath5 · 2 months
In this week's article on DebuggerCafe, we cover Gemma 3 => The article covers the following:
* The need for Gemma 3
* Gemma 3 architecture
* Brief discussion on the benchmarks
* Inference covering text generation, image understanding, and OCR
@SovitRath5 · 2 months
In this week's article on DebuggerCafe, we cover SmolVLM. SmolVLM: Accessible Image Captioning with Small Vision Language Model => We cover:
* Image Captioning with SmolVLM
* Building a Gradio application to access SmolVLM functionality (see the sketch below)
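A minimal Gradio wrapper of the kind described above. `caption_image` is a hypothetical stand-in for the SmolVLM inference function; the article builds the actual application.

```python
import gradio as gr

def caption_image(image):
    # Placeholder: run SmolVLM on `image` and return the generated caption.
    return "caption goes here"

demo = gr.Interface(
    fn=caption_image,
    inputs=gr.Image(type="pil"),
    outputs=gr.Textbox(label="Caption"),
    title="SmolVLM Image Captioning",
)

if __name__ == "__main__":
    demo.launch()
```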
@SovitRath5 · 2 months
Gradio Application using Qwen2.5-VL => Topics covered:
* Image Captioning with Qwen2.5-VL
* Video Captioning with Qwen2.5-VL
* Object Detection with Qwen2.5-VL
* Gradio application to access the Qwen2.5-VL functionality
@SovitRath5 · 3 months
In this week's article on DebuggerCafe, we cover the Qwen2.5-VL model. Qwen2.5-VL: Architecture, Benchmarks and Inference => #DeepLearning
@SovitRath5 · 3 months
Phi-4 Mini and Phi-4 Multimodal => We cover the following components of Phi-4:
* Phi-4 Mini language model architecture
* Phi-4 multimodal model architecture
* Benchmarks
* Running inference using the Phi-4 Mini language model (see the sketch below)
#DeepLearning
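A short sketch of Phi-4 Mini inference with the transformers text-generation pipeline. The checkpoint id is an assumption; the article shows the verified inference code.

```python
# Chat-style generation with Phi-4 Mini via the transformers pipeline.
from transformers import pipeline

generator = pipeline(
    "text-generation",
    model="microsoft/Phi-4-mini-instruct",   # assumed checkpoint id
    torch_dtype="auto",
    device_map="auto",
)

messages = [{"role": "user", "content": "Summarize what a multimodal model is."}]
result = generator(messages, max_new_tokens=128)
# For chat-format input, generated_text holds the full conversation;
# the last message is the model's reply.
print(result[0]["generated_text"][-1]["content"])
```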
@SovitRath5 · 3 months
In this week's article on DebuggerCafe, we will cover the ViTPose architecture for human pose estimation. ViTPose – Human Pose Estimation with Vision Transformer => Topics covered:
* What is ViTPose?
* ViTPose architecture
* Image and video inference
@SovitRath5 · 3 months
New article on DebuggerCafe. Microsoft Autogen – An Introduction =>
* Setting up Microsoft Autogen locally
* Using Claude models
* Simple model chat
* Creating a single agent
* Creating teams of multiple agents
@SovitRath5 · 3 months
Updated notebooks with Flash Attention training and inference for receipt_ocr. All adapter weights are pushed. Now you can train with less than 10GB of VRAM in under an hour. Next:
* Publish error rates
* Larger dataset training with augmentations
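For context, enabling FlashAttention-2 at load time is usually a one-line change when the model is loaded in fp16/bf16. The sketch below uses an assumed SmolVLM checkpoint and the generic AutoModelForVision2Seq class; the notebooks contain the actual configuration.

```python
# Loading a VLM with FlashAttention-2 enabled (checkpoint id is illustrative).
import torch
from transformers import AutoModelForVision2Seq

model = AutoModelForVision2Seq.from_pretrained(
    "HuggingFaceTB/SmolVLM-256M-Instruct",   # assumed checkpoint
    torch_dtype=torch.bfloat16,              # fp16/bf16 is required for FlashAttention-2
    attn_implementation="flash_attention_2",
)
```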
@SovitRath5 · 4 months
Working on a new project: fine-tuning SmolVLM for receipt OCR. Initial results after training SmolVLM-256M. Trained adapters are pushed directly to GitHub; check the notebooks folder. #DeepLearning #HuggingFace #SmolVLM @huggingface
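Since the tweet mentions trained adapters, the fine-tuning is presumably parameter-efficient (LoRA-style). A minimal PEFT configuration of that kind might look like the sketch below; the checkpoint id, rank, alpha, and target modules are illustrative, not the project's actual values.

```python
# Illustrative LoRA setup with PEFT (checkpoint and hyperparameters are assumptions).
from transformers import AutoModelForVision2Seq
from peft import LoraConfig, get_peft_model

model = AutoModelForVision2Seq.from_pretrained("HuggingFaceTB/SmolVLM-256M-Instruct")  # assumed checkpoint
lora_config = LoraConfig(
    r=8,
    lora_alpha=16,
    lora_dropout=0.05,
    target_modules=["q_proj", "v_proj"],   # attention projections in the text backbone
)
model = get_peft_model(model, lora_config)   # only the adapter weights remain trainable
model.print_trainable_parameters()
```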