SovitRath5 Profile
SovitRath5 (@SovitRath5)

Followers: 142 · Following: 1K · Media: 304 · Statuses: 569

Lead SWE (GenAI and LLMs) @ Indegene | Blog: https://t.co/Rq2WcIT5QC | GitHub: https://t.co/9PmVei4IoP

Joined January 2017
@SovitRath5 · 5 days
LitGPT provides access to several high-performance LLMs, with easy loading of pretrained checkpoints for inference, training, and evaluation. In this week's article, we cover an introduction to LitGPT. LitGPT – Getting Started =>
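As a quick illustration of what "easy loading" looks like, here is a minimal sketch based on LitGPT's high-level Python API. The checkpoint name is only an example, and the exact interface is covered in the article.

```python
# Minimal LitGPT loading/generation sketch (checkpoint name is illustrative).
from litgpt import LLM

llm = LLM.load("microsoft/phi-2")   # downloads/loads a pretrained checkpoint
text = llm.generate("What is self-supervised learning?", max_new_tokens=128)
print(text)
```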
@SovitRath5 · 12 days
A while back, Qwen3 was released. The models bring MoE (Mixture of Experts) variants and a unified mode for thinking and non-thinking. We discuss all of this in this week's article on DebuggerCafe. Qwen3 – Unified Models for Thinking and Non-Thinking =>
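For context, the unified mode is typically toggled through the chat template. The sketch below assumes the Hugging Face transformers workflow and the Qwen/Qwen3-8B checkpoint name; treat both as assumptions and see the article for the verified code.

```python
# Toggling Qwen3's thinking vs. non-thinking mode via the chat template.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "Qwen/Qwen3-8B"   # assumed checkpoint name
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype="auto", device_map="auto")

messages = [{"role": "user", "content": "Explain Mixture of Experts in two sentences."}]
prompt = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,
    enable_thinking=True,   # set False to use the non-thinking mode
)
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=256)
# Decode only the newly generated tokens.
print(tokenizer.decode(output[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True))
```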
@SovitRath5 · 19 days
In this week's article on DebuggerCafe, we fine-tune the Web-DINO model for person segmentation. We add a simple 2-layer convolutional decoder head on top of the frozen model to achieve this.
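A minimal sketch of the head described above: the frozen backbone's patch tokens fed into a small 2-layer convolutional decoder for binary (person vs. background) segmentation. The tensor shapes and the backbone-freezing snippet are assumptions about a typical ViT-style setup, not the article's exact code.

```python
import torch.nn as nn
import torch.nn.functional as F

class ConvSegHead(nn.Module):
    """Two-layer convolutional decoder for binary segmentation on frozen features."""

    def __init__(self, embed_dim: int, num_classes: int = 2):
        super().__init__()
        self.decoder = nn.Sequential(
            nn.Conv2d(embed_dim, 256, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(256, num_classes, kernel_size=1),
        )

    def forward(self, patch_tokens, grid_size, out_size):
        # patch_tokens: (B, N, C) patch features from the frozen backbone.
        b, n, c = patch_tokens.shape
        feats = patch_tokens.transpose(1, 2).reshape(b, c, *grid_size)
        logits = self.decoder(feats)
        # Upsample the coarse logits back to the input image resolution.
        return F.interpolate(logits, size=out_size, mode="bilinear", align_corners=False)

# Training would freeze the backbone and optimize only the head, e.g.:
# for p in backbone.parameters():
#     p.requires_grad = False
```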
@SovitRath5 · 26 days
Web-DINO models trained via the Web-SSL framework on internet-scale image data are excellent at downstream tasks. This week's article on DebuggerCafe covers training the Web-DINO model for the image classification task. Image Classification with Web-DINO =>
@SovitRath5 · 27 days
Added semantic segmentation training and image & video inference scripts to the I-JEPA downstream tasks repository. Not sure how well the results will turn out after longer training. Right now, the head is just a linear pixel classifier.
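For reference, a "linear pixel classifier" in this sense is usually just a 1x1 convolution over the frozen encoder's patch features, upsampled back to the input resolution. The sketch below illustrates that idea under assumed tensor shapes; it is not the repository's code.

```python
import torch.nn as nn
import torch.nn.functional as F

class LinearPixelClassifier(nn.Module):
    """1x1 convolution over frozen patch features, upsampled to full resolution."""

    def __init__(self, embed_dim: int, num_classes: int):
        super().__init__()
        self.classifier = nn.Conv2d(embed_dim, num_classes, kernel_size=1)

    def forward(self, patch_tokens, grid_size, out_size):
        b, n, c = patch_tokens.shape                       # (B, N, C) patch features
        feats = patch_tokens.transpose(1, 2).reshape(b, c, *grid_size)
        logits = self.classifier(feats)                    # per-patch class scores
        return F.interpolate(logits, size=out_size, mode="bilinear", align_corners=False)
```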
@SovitRath5 · 27 days
Working on downstream tasks using I-JEPA. Added image similarity code and image classification training & inference. The README contains steps for classification training. Will add semantic segmentation and image search soon.
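As an illustration of the image similarity part, a common recipe is to mean-pool the encoder's patch embeddings and compare images with cosine similarity. The sketch below assumes an `encoder` that returns (B, N, C) patch features; the repository's actual code may differ.

```python
import torch
import torch.nn.functional as F

@torch.no_grad()
def embed(encoder, pixel_values):
    """Mean-pool the encoder's (B, N, C) patch features into one embedding per image."""
    patch_tokens = encoder(pixel_values)
    return F.normalize(patch_tokens.mean(dim=1), dim=-1)

def image_similarity(encoder, image_a, image_b):
    """Cosine similarity between two preprocessed image tensors."""
    return (embed(encoder, image_a) * embed(encoder, image_b)).sum(dim=-1)
```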
@SovitRath5 · 1 month
New article on DebuggerCafe about Web-SSL: a new family of DINOv2 models trained on internet-scale image data without language supervision.
@SovitRath5 · 1 month
New article on DebuggerCafe. Inference using SmolVLM2 for image understanding, OCR, and video understanding. Getting Started with SmolVLM2 – Code Inference =>
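A rough sketch of what SmolVLM2 image-understanding inference with Hugging Face transformers can look like. The checkpoint id, auto classes, and chat-message format follow the usual VLM workflow but are assumptions here; the article contains the verified code.

```python
import torch
from transformers import AutoProcessor, AutoModelForImageTextToText

model_id = "HuggingFaceTB/SmolVLM2-2.2B-Instruct"   # assumed checkpoint id
processor = AutoProcessor.from_pretrained(model_id)
model = AutoModelForImageTextToText.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

messages = [{
    "role": "user",
    "content": [
        {"type": "image", "url": "https://example.com/sample.jpg"},   # placeholder image
        {"type": "text", "text": "Describe this image."},
    ],
}]
inputs = processor.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device, dtype=torch.bfloat16)
output = model.generate(**inputs, max_new_tokens=128)
print(processor.batch_decode(output, skip_special_tokens=True)[0])
```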
@SovitRath5 · 1 month
My Faster RCNN Training Pipeline repository now supports the YOLO annotation format along with Pascal VOC. Who will this update help? Anybody benchmarking YOLO vs Faster RCNN who doesn't want to convert their dataset format.
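For readers unfamiliar with the two formats: Pascal VOC stores absolute pixel corners (xmin, ymin, xmax, ymax), while a YOLO label line stores a class id followed by a normalized center, width, and height. A tiny illustrative converter (not the repository's code):

```python
def voc_to_yolo(xmin, ymin, xmax, ymax, img_w, img_h):
    """Convert one Pascal VOC box (absolute pixels) to YOLO format (normalized)."""
    x_center = (xmin + xmax) / 2.0 / img_w
    y_center = (ymin + ymax) / 2.0 / img_h
    width = (xmax - xmin) / img_w
    height = (ymax - ymin) / img_h
    return x_center, y_center, width, height

# A VOC box (50, 100, 250, 300) in a 500x400 image becomes
# (0.3, 0.5, 0.4, 0.5) in a YOLO label line, preceded by the class id.
print(voc_to_yolo(50, 100, 250, 300, 500, 400))
```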
@SovitRath5 · 2 months
This week's article on DebuggerCafe covers Qwen2.5-Omni, an open-source multimodal model. Qwen2.5-Omni can accept text, audio, video, and images as inputs, and it can produce audio outputs as well. Qwen2.5-Omni: An Introduction =>
@SovitRath5 · 2 months
Fine-Tuning SmolVLM for Receipt OCR => Topics covered:
* Why is receipt OCR challenging for smaller VLMs?
* The SROIE v2 dataset
* How to create the ground truth annotations for the receipt OCR using a larger VLM
* Fine-tuning and inference
@SovitRath5 · 2 months
In this week's article on DebuggerCafe, we cover Gemma 3 => The article covers the following:
* The need for Gemma 3
* Gemma 3 architecture
* Brief discussion on the benchmarks
* Inference covering text generation, image understanding, and OCR
@SovitRath5 · 2 months
In this week's article on DebuggerCafe, we cover SmolVLM. SmolVLM: Accessible Image Captioning with Small Vision Language Model => We cover:
* Image Captioning with SmolVLM
* Building a Gradio application to access SmolVLM functionality (see the sketch below)
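A minimal Gradio wrapper of the kind described above. `caption_image` is a hypothetical stand-in for the SmolVLM inference function; the article builds the actual application.

```python
import gradio as gr

def caption_image(image):
    # Placeholder: run SmolVLM on `image` and return the generated caption.
    return "caption goes here"

demo = gr.Interface(
    fn=caption_image,
    inputs=gr.Image(type="pil"),
    outputs=gr.Textbox(label="Caption"),
    title="SmolVLM Image Captioning",
)

if __name__ == "__main__":
    demo.launch()
```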
@SovitRath5 · 2 months
Gradio Application using Qwen2.5-VL => Topics covered:
* Image Captioning with Qwen2.5-VL
* Video Captioning with Qwen2.5-VL
* Object Detection with Qwen2.5-VL
* Gradio application to access the Qwen2.5-VL functionality
@SovitRath5 · 3 months
In this week's article on DebuggerCafe, we cover the Qwen2.5-VL model. Qwen2.5-VL: Architecture, Benchmarks and Inference => #DeepLearning
@SovitRath5 · 3 months
Phi-4 Mini and Phi-4 Multimodal => We cover the following components of Phi-4:
* Phi-4 Mini language model architecture
* Phi-4 multimodal model architecture
* Benchmarks
* Running inference using the Phi-4 Mini language model (see the sketch below)
#DeepLearning
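A short sketch of Phi-4 Mini inference with the transformers text-generation pipeline. The checkpoint id is an assumption; the article shows the verified inference code.

```python
# Chat-style generation with Phi-4 Mini via the transformers pipeline.
from transformers import pipeline

generator = pipeline(
    "text-generation",
    model="microsoft/Phi-4-mini-instruct",   # assumed checkpoint id
    torch_dtype="auto",
    device_map="auto",
)

messages = [{"role": "user", "content": "Summarize what a multimodal model is."}]
result = generator(messages, max_new_tokens=128)
# For chat-format input, generated_text holds the full conversation;
# the last message is the model's reply.
print(result[0]["generated_text"][-1]["content"])
```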
@SovitRath5 · 3 months
In this week's article on DebuggerCafe, we will cover the ViTPose architecture for human pose estimation. ViTPose – Human Pose Estimation with Vision Transformer => Topics covered:
* What is ViTPose?
* ViTPose architecture
* Image and video inference
@SovitRath5 · 3 months
New article on DebuggerCafe. Microsoft Autogen – An Introduction =>
* Setting up Microsoft Autogen locally
* Using Claude models
* Simple model chat
* Creating a single agent
* Creating teams of multiple agents
@SovitRath5 · 3 months
Updated notebooks with Flash Attention training and inference for receipt_ocr. All adapter weights are pushed. Now you can train with less than 10GB of VRAM in under an hour. Next:
* Publish error rates
* Larger dataset training with augmentations
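For context, enabling FlashAttention-2 at load time is usually a one-line change when the model is loaded in fp16/bf16. The sketch below uses an assumed SmolVLM checkpoint and the generic AutoModelForVision2Seq class; the notebooks contain the actual configuration.

```python
# Loading a VLM with FlashAttention-2 enabled (checkpoint id is illustrative).
import torch
from transformers import AutoModelForVision2Seq

model = AutoModelForVision2Seq.from_pretrained(
    "HuggingFaceTB/SmolVLM-256M-Instruct",   # assumed checkpoint
    torch_dtype=torch.bfloat16,              # fp16/bf16 is required for FlashAttention-2
    attn_implementation="flash_attention_2",
)
```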
@SovitRath5 · 4 months
Working on a new project: fine-tuning SmolVLM for receipt OCR. Initial results after training SmolVLM-256M. Trained adapters are pushed directly to GitHub; check the notebooks folder. #DeepLearning #HuggingFace #SmolVLM @huggingface
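Since the tweet mentions trained adapters, the fine-tuning is presumably parameter-efficient (LoRA-style). A minimal PEFT configuration of that kind might look like the sketch below; the checkpoint id, rank, alpha, and target modules are illustrative, not the project's actual values.

```python
# Illustrative LoRA setup with PEFT (checkpoint and hyperparameters are assumptions).
from transformers import AutoModelForVision2Seq
from peft import LoraConfig, get_peft_model

model = AutoModelForVision2Seq.from_pretrained("HuggingFaceTB/SmolVLM-256M-Instruct")  # assumed checkpoint
lora_config = LoraConfig(
    r=8,
    lora_alpha=16,
    lora_dropout=0.05,
    target_modules=["q_proj", "v_proj"],   # attention projections in the text backbone
)
model = get_peft_model(model, lora_config)   # only the adapter weights remain trainable
model.print_trainable_parameters()
```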