Alexander Toshev
@alexttoshev
Followers
1K
Following
130
Media
17
Statuses
104
Researcher in CV, ML, and Embodied AI.
San Francisco, CA
Joined July 2017
Apple presents Ferret-UI Lite Lessons from Building Small On-Device GUI Agents
1
17
87
Another great collaboration advancing Computer Use Agents here at Apple. We investigate unifying UI interactions with tool use by synthesizing appropriate data and using RL on OSWorld. This paper is a nice behind-the-scenes peek into building an agentic system.
💡 Computer-use agents (CUAs) rely exclusively on primitive actions (click, type, scroll) that require lengthy execution chains, which can be cumbersome and error-prone. How to improve this? 🔥 🔥 In our native agent UltraCUA, we advocate the idea of "hybrid action" --
0
2
34
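The "hybrid action" idea above can be pictured as a single action space that is the union of GUI primitives and higher-level tool calls, so one tool call can replace a long chain of clicks and keystrokes. Below is a minimal Python sketch under that assumption; the class names, tool names, and dispatch function are illustrative, not the UltraCUA interface.

```python
# Hypothetical sketch of a "hybrid action" space: the agent can emit either a
# low-level GUI primitive (click/type/scroll) or a call to a higher-level tool
# that wraps a multi-step routine. Names here are illustrative assumptions.
from dataclasses import dataclass
from typing import Union


@dataclass
class Click:
    x: int
    y: int


@dataclass
class Type:
    text: str


@dataclass
class Scroll:
    dx: int
    dy: int


@dataclass
class ToolCall:
    name: str        # e.g. "open_app" (hypothetical tool name)
    arguments: dict  # tool-specific parameters


# The hybrid action space is simply the union of primitives and tool calls.
HybridAction = Union[Click, Type, Scroll, ToolCall]


def execute(action: HybridAction) -> None:
    """Dispatch a hybrid action to the environment (stub for illustration)."""
    if isinstance(action, ToolCall):
        print(f"calling tool {action.name} with {action.arguments}")
    else:
        print(f"executing primitive {action}")


if __name__ == "__main__":
    # One tool call can stand in for a long chain of primitive actions.
    execute(ToolCall(name="open_app", arguments={"app": "Settings"}))
    execute(Click(x=120, y=340))
```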
If you are excited about Multimodal and Agentic Reasoning with Foundation Models, Apple ML Research has openings for Researchers, Engineers, and Interns in this area. Consider applying through the links below or feel free to send a message for more information. - Machine
jobs.apple.com
Apply for an AIML - Machine Learning Researcher, MLR job at Apple. Read about the role and find out if it’s right for you.
12
54
460
Heading to ICCV Monday - Wednesday. If you are passionate about multimodal foundation models, reasoning, and agentic capabilities, please reach out -- happy to chat. A couple of highlights: -- Talk on Embodied AI Agents at the workshop on Multi-Modal Reason for Agentic
0
3
29
If you are at CVPR, please visit our work: From Multimodal LLMs to Generalist Embodied Agents: Methods and Lessons ( https://t.co/wDAsIiGLxi), Saturday, 9:00am - 10:15am, Presentation #5, Oral Session 3, Davidson Ballroom -- World-Consistent Video Diffusion with
0
0
10
Here is an RL perspective on understanding LLMs for decision making. Are LLMs best used as: policies / rewards / transition functions ? How do you fine-tune them ? Can LLMs explore / exploit ? 🧵 Join us down this rabbit hole... (ICLR 2025 paper, done at ML Research)
2
31
169
Our single generalist agent, GEA, trained via online RL on some domains and SFT on others, achieves SOTA compared to all other generalist agents and most specialist models across all these diverse domains and thousands of tasks.
0
0
1
For example, on Habitat Pick, the generalist agent trained via SFT alone (dubbed GEA-base) achieves 57% success. If further trained via PPO, using the Habitat environment for reward generation, it reaches 83% success.
1
0
0
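That SFT-then-RL recipe can be sketched very simply: start from a policy initialized by supervised training and keep optimizing it with policy gradients against environment reward. The sketch below is a toy stand-in, using plain REINFORCE on a dummy single-step environment rather than the PPO-on-Habitat setup with a multimodal GEA policy described above.

```python
# Minimal, assumption-laden sketch of "SFT first, then online RL": continue
# training an SFT-initialized policy with a policy-gradient loss computed from
# environment reward. Vanilla REINFORCE (no baseline) on a toy environment;
# the paper's setup (PPO on Habitat) is far richer.
import torch
import torch.nn as nn


class ToyEnv:
    """Toy single-step environment: only action 3 is rewarded."""
    n_actions = 5

    def step(self, action: int) -> float:
        return 1.0 if action == 3 else 0.0


def rl_finetune(policy: nn.Module, env: ToyEnv, steps: int = 500) -> None:
    """Fine-tune an (SFT-initialized) policy with REINFORCE."""
    opt = torch.optim.Adam(policy.parameters(), lr=1e-2)
    for _ in range(steps):
        logits = policy(torch.zeros(1, 4))            # dummy observation
        dist = torch.distributions.Categorical(logits=logits)
        action = dist.sample()
        reward = env.step(action.item())              # environment reward
        loss = -(dist.log_prob(action) * reward).mean()  # policy-gradient loss
        opt.zero_grad()
        loss.backward()
        opt.step()


if __name__ == "__main__":
    # Stand-in for the SFT-initialized policy: a small linear policy head.
    policy = nn.Linear(4, ToyEnv.n_actions)
    rl_finetune(policy, ToyEnv())
```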
As a second lesson, while SFT is a great tool for adapting foundation models, it is imperative to leverage data collected through online RL for agentic domains that are somewhat removed from the original image/text applications.
1
0
1
Working with agents across multiple domains requires a uniform way of encoding these domains. In particular, for continuous action spaces, we use a unified tokenization scheme across all action parameterizations and embodiments.
1
0
1
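One common way to realize such a unified tokenization is to clip each continuous action dimension to a fixed range and map it to one of N bins drawn from a vocabulary shared across embodiments. The sketch below assumes that binning approach; the bin count and range are illustrative choices, not the values used for GEA.

```python
# Hedged sketch of a unified action tokenization: every continuous action
# value, regardless of embodiment, is clipped to a shared range and mapped to
# one of NUM_BINS discrete tokens. NUM_BINS and the range are assumptions.
import numpy as np

NUM_BINS = 256          # shared discrete vocabulary for all action dimensions
LOW, HIGH = -1.0, 1.0   # assume actions are normalized to this range


def tokenize(action: np.ndarray) -> np.ndarray:
    """Map continuous action values to integer tokens in [0, NUM_BINS - 1]."""
    clipped = np.clip(action, LOW, HIGH)
    scaled = (clipped - LOW) / (HIGH - LOW)               # -> [0, 1]
    return np.minimum((scaled * NUM_BINS).astype(int), NUM_BINS - 1)


def detokenize(tokens: np.ndarray) -> np.ndarray:
    """Map tokens back to bin-center continuous values."""
    return LOW + (tokens + 0.5) / NUM_BINS * (HIGH - LOW)


if __name__ == "__main__":
    arm_action = np.array([0.12, -0.73, 0.98])  # e.g. a manipulation action
    tokens = tokenize(arm_action)
    print(tokens, detokenize(tokens))
```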
What are important design decisions in building AI Agents? We release an empirical analysis of both data and method choices in agents across robotics, planning, UI interactions, and video games in https://t.co/4yJhNCTN2s. Joint work w colleagues at Apple ML Research. 🧵
5
16
143
Attending NeurIPS Wed -> Fri. Feel free to reach out and connect. We have open positions for Researchers. We have two posters, on Grounding of Agents ( https://t.co/ky3Gr70iti) and Data Filtering for LMs ( https://t.co/PiCWJ7wnbs), on Friday. Hope to see you there!
1
2
78
We are hiring! We are looking for accomplished and hands-on researchers to join us. Topics in Multimodal Foundation Models and AI Agents are of particular interest. Feel free to DM me if interested. https://t.co/QdC4K3KpUK
8
47
435
🚀 Exciting news! @Apple has released its own open-source LLM, DCLM-7B. Everything is open-source, including the model weights and datasets. 💡Why should you be excited? 1. The datasets and tools released as part of this research lay the groundwork for future advancements in
2
5
35
We've publicly released our DataComp-LM models: Truly open 1B and 7B models that are competitive with the state of the art (llama3, qwen2, gemma, ...) on most benchmarks, but with a public training recipe, dataset, and code! (1/3)
1
14
56
@ #cvpr Tuesday and Wednesday. Happy to chat about Embodied AI and Multimodal FMs.
0
0
1
Joint work with Andrew Szot, @bogdan_mazoure, Devon Hjelm, @harsh_092, @zsoltkira at Apple ML Research.
0
0
2
Interestingly, such tokenizers give superior performance for continuous-action Embodied AI, especially when the policy is initialized from an LLM. Hence, LLM-based policies should be designed differently from traditional ones.
1
0
1