
Junting Pan @ICCV 2025
@junting9
Followers: 928 · Following: 2K · Media: 12 · Statuses: 136
Research Scientist @ Apple AIML | Prev: PhD@MMLab CUHK, Intern @AIatMeta (FAIR) and @samsungresearch. Working on foundation models.
Joined December 2013
Our MathVision benchmark has been accepted to the NeurIPS 2024 Datasets and Benchmarks Track! We show a notable performance gap between current LMMs and human performance on simple math problems with visual context. Dataset: https://t.co/vhCqwCHwSU Paper: https://t.co/GB2nyWxoGp
The Foundation Model Team @🍎Apple AI/ML is looking for a Research Intern (flexible start date) to work on Multimodal LLMs and Vision-Language. Interested? DM me to learn more!
Last year at @Apple MLR, we published a number of interesting papers, including AIM, AIMv2, and scaling laws for sparsity, native multimodal models, and data mixing. Today the team has open-sourced the training codebase we used for conducting this research! https://t.co/WNvOWMkgm3
github.com
Large multi-modal models (L3M) pre-training (apple/ml-l3m).
In this report we describe the 2025 Apple Foundation Models ("AFM"). We also introduce the new Foundation Models framework, which gives app developers direct access to the on-device AFM model. https://t.co/nEbtxuGrjD
machinelearning.apple.com
We introduce two multilingual, multimodal foundation language models that power Apple Intelligence features across Apple devices and…
Our computer vision textbook is now available for free online here: https://t.co/ERy2Spc7c2 We are working on adding some interactive components like search and (beta) integration with LLMs. Hope this is useful, and feel free to submit GitHub issues to help us improve the text!
visionbook.mit.edu
🌟Thrilled to share that SAM 2 received a Best Paper Honorable Mention at #ICLR2025, one of 6 papers recognized out of 11,000+ submissions! 👏This project was the result of amazing work by an exceptional team at @AIatMeta FAIR: @vgabeur, @YuanTingHu1, @RonghangHu,
Honorable Mentions:
Data Shapley in One Training Run. Jiachen T. Wang, et al.
SAM 2: Segment Anything in Images and Videos. Nikhila Ravi, et al.
Faster Cascades via Speculative Decoding. Harikrishna Narasimhan, et al.
I am looking for strong PhD interns to join Apple MLR in late 2024 or early 2025! Topics will broadly be around training large-scale diffusion/flow-matching models, and you'll be in the Bay Area (Cupertino/SF). Apply here: https://t.co/5gKIBnK6oP. [1/5]
Looking for a 2025 summer research intern in the Foundation Model Team at Apple AI/ML, with a focus on Multimodal LLMs / Vision-Language. PhD preferred. Apply through https://t.co/m243cnfXay and also email your resume to haoxuanyou@gmail.com! 😊
So much fun at the #AdobeMAX sneak! As a researcher, this is by far the biggest stage I have ever stepped on. Turns out it's easier for an introvert to interact with a 10k+ audience than with one person :p Grateful for all the applause, energy, and support!
I am presenting at #AdobeMAX next week! Get a sneak peek at our latest research on image composition and relighting on Oct 15th at the MAX sneak session (5:30 to 7 pm EST). Online registration (free): https://t.co/d2VCC0rFzO
SAM2 is truly amazing! I am so proud to have been a part of this incredible team!
Introducing Meta Segment Anything Model 2 (SAM 2) — the first unified model for real-time, promptable object segmentation in images & videos. SAM 2 is available today under Apache 2.0 so that anyone can use it to build their own experiences Details ➡️ https://t.co/eTTDpxI60h
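Since the post above notes that SAM 2 is released under Apache 2.0 for anyone to build on, here is a minimal sketch of promptable image segmentation with it, assuming the open-source `sam2` package from the SAM 2 release; the checkpoint path, config name, image file, and point prompt below are placeholders rather than anything taken from the post.

```python
# Minimal sketch: prompt SAM 2 with a single foreground point on one image.
# Assumes the open-source `sam2` package; paths and coordinates are placeholders.
import numpy as np
import torch
from PIL import Image

from sam2.build_sam import build_sam2
from sam2.sam2_image_predictor import SAM2ImagePredictor

CHECKPOINT = "./checkpoints/sam2_hiera_large.pt"  # placeholder: downloaded weights
MODEL_CFG = "sam2_hiera_l.yaml"                   # placeholder: matching model config

predictor = SAM2ImagePredictor(build_sam2(MODEL_CFG, CHECKPOINT))
image = np.array(Image.open("example.jpg").convert("RGB"))  # placeholder image

with torch.inference_mode():
    predictor.set_image(image)
    # One positive click at (x=500, y=375); label 1 marks it as foreground.
    masks, scores, _ = predictor.predict(
        point_coords=np.array([[500, 375]]),
        point_labels=np.array([1]),
        multimask_output=True,
    )

best_mask = masks[scores.argmax()]  # keep the highest-scoring mask proposal
```

The same predictor also accepts box prompts, and the package ships a video predictor for propagating masks across frames; this snippet only illustrates the simplest image case.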
🥁 Llama3 is out 🥁 8B and 70B models available today. 8k context length. Trained with 15 trillion tokens on a custom-built 24k GPU cluster. Great performance on various benchmarks, with Llama3-8B doing better than Llama2-70B in some cases. More versions are coming over the next
📢📢 I am looking for a student researcher to work with me and my colleagues at Google DeepMind Zürich on vision-language research. It will be a fully onsite, 24-week position in Switzerland. Reach out to me (xzhai@google.com) if interested. Bonus: amazing view🏔️👇
Glad to share that our project Relightful Harmonization: Lighting-aware portrait background replacement has been accepted to #CVPR2024. 🧵 Project page: https://t.co/XRl3tWBzVw Preprint: https://t.co/xMZspSdvxx
Muffin or Chihuahua in a multipanel image? Most people can tell, but GPT-4V struggles! Contrary to the popular belief that only experts can outperform (Multimodal) LLMs, average humans often prove more capable. Our Multipanel VQA study reveals this gap, where human accuracy
Distinguish muffins from chihuahuas in a multipanel web screenshot? No problem for humans (99% accuracy), but hard for Large Vision-Language Models (LVLMs) (39-72% accuracy)! To find out how LVLMs do and what affects their ability regarding multipanel image understanding, we
#CUHK has learned with deep sorrow about the passing of Prof Tang Xiaoou of the Department of Information Engineering. Prof Tang joined CUHK in 1998 and was one of the most influential scientists working in the AI field. Details: https://t.co/lgBOc98iPA