Explore tweets tagged as #multimodal
@akshay_pachaar
Akshay 🚀
6 days
If you're looking for a comprehensive guide to LLM fine-tuning, check this! A free 115-page book on arXiv, covering: fundamentals of LLMs; PEFT (LoRA, QLoRA, DoRA, HFT); alignment methods (PPO, DPO, GRPO); mixture of experts (MoE); a 7-stage fine-tuning pipeline; multimodal
38
214
953
@Sumanth_077
Sumanth
8 days
All-in-One RAG System! RAG-Anything is a unified framework with a multi-stage multimodal pipeline that extends traditional RAG architectures. 100% Open Source
20
244
1K
@lukias0
Lukias.
4 days
🧠 Google just teased Gemini 3.0 🔹 Enhanced reasoning with “Deep Think” 🔹 Multimodal: video, 3D, geospatial data 🔹 Larger context windows for longer conversations 🔹 Smarter code generation Release expected Nov 12, 2025 #GoogleAI #Gemini3 #AI
5
1
18
@AnsongNi
Ansong Ni
5 days
Our team at FAIR is hiring PhD research interns for 2026 on the topics of multimodal multi-agent learning. If you are interested, feel free to DM me or directly apply using the link below! https://t.co/JrHoDAPDnP
2
32
187
@alex_prompter
Alex Prompter
13 hours
🔥 Holy shit... Apple just did something nobody saw coming. They just dropped Pico-Banana-400K, a 400,000-image dataset for text-guided image editing that might redefine multimodal training itself. Here’s the wild part: Unlike most “open” datasets that rely on synthetic
36
107
650
@RayFernando1337
Ray Fernando
6 days
This is the JPEG moment for AI. Optical compression doesn't just make context cheaper. It makes AI memory architectures viable. Training data bottlenecks? Solved: 200k pages/day on ONE GPU; 33M pages/day on 20 nodes. Every multimodal model is data-constrained. Not anymore.
111
724
6K
@rohanpaul_ai
Rohan Paul
7 days
👨‍🔧 GitHub: RAG-Anything: All-in-One RAG Framework (7.6k stars ⭐️). An all-in-one multimodal document processing RAG system built on LightRAG. You can query documents containing interleaved text, visual diagrams, structured tables, and mathematical formulations through one interface.
15
137
968
@SidharthaGarg
Sidhartha
2 days
🚀 Absolutely thrilled to share that our team, royal_recruits, ranked 4th among ~86,000 registrations in the Amazon ML Challenge. The task: predict product prices from just text and images. This was a deep dive into multimodal learning. Here's a thread on how we built it. 🧵
12
2
43
@slow_developer
Haider.
8 days
Andrej Karpathy says today's agents aren't ready to work like real coworkers or interns. They lack intelligence, can't use computers, aren't multimodal, lack continual learning, and forget what you tell them. Fixing these gaps will take about a decade.
200
302
3K
@Prithvir12
PJ
6 days
If this Karpathy interview doesn't pop the AI bubble, nothing will. 10 brutal quotes: 1. LLMs don’t work yet. They don’t have enough intelligence, they’re not multimodal enough, they can’t use computers, and they don’t remember what you tell them. They’re cognitively lacking.
316
837
6K
@bvp22294
Bhargavi Paranjape
5 days
📢 PhD Students in GenAI/RL! Our team at FAIR is hiring a Research Intern for Summer 2026 to push the boundaries of multimodal multi-agent social interaction. Learn more and apply: https://t.co/7P66mnEY97
7
48
316
@real_asli
Asli Celikyilmaz
9 days
🚀 Exciting opportunity! We are hiring research interns (current PhD students) at @Meta FAIR to advance multi-agent, multimodal AI! Work on text, audio, images & more, collaborate with top mentors, and help shape the future of AI at scale. Apply:
2
43
220
@manishkumar_dev
Manish Kumar Shah
12 hours
Dreamina 4.0 — the next-generation multimodal AI design model. From text-to-image creation and smart editing to large-scale content generation — Dreamina brings every idea to life through natural language. Your creativity, amplified. ⚡ Prompt: Turn the figure in the image
76
54
154
@HirofumiInaguma
Hirofumi Inaguma
2 days
I was impacted by the FAIR layoffs this time. I'm looking for a new position in speech, multimodal, 3D human motion, and social behavior modeling. Happy to chat in more detail :)
16
42
247
@Amey___14
Amey Sankhe
5 days
🌍✈️ Meet my Multimodal Travel Assistant - an AI agent that makes trip planning smarter! 🚀 🗺️ Creates custom travel plans 🎙️ Talks with audio replies 🎨 Generates pop-art travel images 💻 Built with GPT & Gradio #AI #MultimodalAI #GenerativeAI #TravelTech #GPT #Gradio
0
0
2
@AlexanderTong7
Alex Tong
5 days
#AITHYRA, Vienna's new Biomedical AI institute, is hiring Postdocs! Come work with us. Openings in: 🔹 Generative AI 🔹 Multimodal ML 🔹 Virology 🔹 Enzyme Function Apply by Nov 20: https://t.co/8jNpkhdw1x #PostDoc #AI #ML #Vienna #ScienceJobs
1
14
55
@luminohope
Koki Nagano
5 days
Our team at #NVIDIA Research is hiring summer 2026 interns in areas including video generative models, controllable/physically grounded (3D/4D) GenAI, and human-robot/agent interaction (e.g., multimodal LLMs). Please email me with a CV if interested.
12
38
374
@Content_VA
Mamoona
9 days
Top 10 ChatGPT Alternatives (2025) 1. Claude (Anthropic) – Smart, safe, great for long docs. 2. Google Gemini – Strong search + multimodal power. 3. Microsoft Copilot – Best for Office, writing & workflow. 4. Perplexity AI – Research + chat with real sources. 5. DeepSeek –
0
0
4
@valeoai
valeo.ai
8 days
Our recent research will be presented at #ICCV2025 @ICCVConference! We’ll present 5 papers about: 💡 self-supervised & representation learning 🌍 3D occupancy & multi-sensor perception 🧩 open-vocabulary segmentation 🧠 multimodal LLMs & explainability https://t.co/Tg0Vx3oS94
1
7
19