Explore tweets tagged as #multimodal
if you're looking for a comprehensive guide to LLM finetuning, check this! a free 115-page book on arxiv, covering:
> fundamentals of LLM
> peft (lora, qlora, dora, hft)
> alignment methods (ppo, dpo, grpo)
> mixture of experts (MoE)
> 7-stage fine-tuning pipeline
> multimodal
All-in-One RAG System! RAG-Anything is a unified framework with a multi-stage multimodal pipeline that extends traditional RAG architectures. 100% Open Source
Our team at FAIR is hiring PhD research interns for 2026 on the topics of multimodal multi-agent learning. If you are interested, feel free to DM me or directly apply using the link below! https://t.co/JrHoDAPDnP
🔥 Holy shit... Apple just did something nobody saw coming. They just dropped Pico-Banana-400K, a 400,000-image dataset for text-guided image editing that might redefine multimodal training itself. Here’s the wild part: Unlike most “open” datasets that rely on synthetic
This is the JPEG moment for AI. Optical compression doesn't just make context cheaper. It makes AI memory architectures viable. Training data bottlenecks? Solved.
- 200k pages/day on ONE GPU
- 33M pages/day on 20 nodes
- Every multimodal model is data-constrained. Not anymore.
👨‍🔧 Github: RAG-Anything: All-in-One RAG Framework. 7.6k Stars ⭐️ All-in-One Multimodal Document Processing RAG system built on LightRAG. You can query documents containing interleaved text, visual diagrams, structured tables, and mathematical formulations through one interface.
🚀Absolutely thrilled to share that our team, royal_recruits, came in rank 4 amongst ~86,000 registrations in the Amazon ML Challenge. The task: predict product prices from just text and images. This was a deep dive into multimodal learning. Here's a thread on how we built it. 🧵
Andrej Karpathy says today's agents aren't ready to work like real coworkers or interns. They lack intelligence, can't use computers, aren't multimodal, lack continual learning, and forget what you tell them. Fixing these gaps will take about a decade.
If this Karpathy interview doesn't pop the AI bubble, nothing will. 10 brutal quotes: 1. LLMs don’t work yet. They don’t have enough intelligence, they’re not multimodal enough, they can’t use computers, and they don’t remember what you tell them. They’re cognitively lacking.
📢 PhD Students in GenAI/RL! Our team at FAIR is hiring a Research Intern for Summer 2026 to push the boundaries of multimodal multi-agent social interaction. Learn more and apply: https://t.co/7P66mnEY97
🚀 Exciting opportunity! We are hiring research interns (current PhD students) at @Meta FAIR to advance multi-agent, multimodal AI! Work on text, audio, images & more, collaborate with top mentors, and help shape the future of AI at scale. Apply:
Dreamina 4.0 — the next-generation multimodal AI design model. From text-to-image creation and smart editing to large-scale content generation — Dreamina brings every idea to life through natural language. Your creativity, amplified. ⚡ Prompt: Turn the figure in the image
I was impacted by FAIR layoff this time. I'm looking for a new position on speech, multimodal, 3D human motion, and social behavior modeling. Happy to chat more details:)
🌍✈️ Meet my Multimodal Travel Assistant - an AI agent that makes trip planning smarter! 🚀
🗺️ Creates custom travel plans
🎙️ Talks with audio replies
🎨 Generates pop-art travel images
💻 Built with GPT & Gradio
#AI #MultimodalAI #GenerativeAI #TravelTech #GPT #Gradio
#AITHYRA, Vienna's new Biomedical AI institute, is hiring Postdocs! Come work with us. Openings in: 🔹 Generative AI 🔹 Multimodal ML 🔹 Virology 🔹 Enzyme Function Apply by Nov 20: https://t.co/8jNpkhdw1x
#PostDoc #AI #ML #Vienna #ScienceJobs
Our team at #NVIDIA Research is hiring summer 2026 interns in areas including Video Generative models, Controllable/Physically-grounded (3D/4D) GenAI, and human-robot/agent interaction (e.g., multimodal LLM). Please email me with a CV if interested.
Top 10 ChatGPT Alternatives (2025)
1. Claude (Anthropic) – Smart, safe, great for long docs.
2. Google Gemini – Strong search + multimodal power.
3. Microsoft Copilot – Best for Office, writing & workflow.
4. Perplexity AI – Research + chat with real sources.
5. DeepSeek –
Our recent research will be presented at #ICCV2025 @ICCVConference! We’ll present 5 papers about: 💡 self-supervised & representation learning 🌍 3D occupancy & multi-sensor perception 🧩 open-vocabulary segmentation 🧠 multimodal LLMs & explainability https://t.co/Tg0Vx3oS94