Alexander Toshev
@alexttoshev
Followers
1K
Following
130
Media
17
Statuses
104
Researcher in CV, ML, and Embodied AI.
San Francisco, CA
Joined July 2017
Apple presents Ferret-UI Lite Lessons from Building Small On-Device GUI Agents
1
17
87
Another great collaboration advancing Computer Use Agents here at Apple. We investigate unifying UI interactions with tool use by synthesizing appropriate data and using RL on OSWorld. This paper is a nice behind-the-scenes peek into building an agentic system.
💡 Computer-use agents (CUAs) rely exclusively on primitive actions (click, type, scroll) that require lengthy execution chains, which can be cumbersome and error-prone. How to improve this? 🔥 🔥 In our native agent UltraCUA, we advocate the idea of "hybrid action" --
0
2
34
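The "hybrid action" idea above can be pictured as a single action space that is the union of GUI primitives and higher-level tool calls, so one tool call can replace a long chain of clicks and keystrokes. Below is a minimal Python sketch under that assumption; the class names, tool names, and dispatch function are illustrative, not the UltraCUA interface.

```python
# Hypothetical sketch of a "hybrid action" space: the agent can emit either a
# low-level GUI primitive (click/type/scroll) or a call to a higher-level tool
# that wraps a multi-step routine. Names here are illustrative assumptions.
from dataclasses import dataclass
from typing import Union


@dataclass
class Click:
    x: int
    y: int


@dataclass
class Type:
    text: str


@dataclass
class Scroll:
    dx: int
    dy: int


@dataclass
class ToolCall:
    name: str        # e.g. "open_app" (hypothetical tool name)
    arguments: dict  # tool-specific parameters


# The hybrid action space is simply the union of primitives and tool calls.
HybridAction = Union[Click, Type, Scroll, ToolCall]


def execute(action: HybridAction) -> None:
    """Dispatch a hybrid action to the environment (stub for illustration)."""
    if isinstance(action, ToolCall):
        print(f"calling tool {action.name} with {action.arguments}")
    else:
        print(f"executing primitive {action}")


if __name__ == "__main__":
    # One tool call can stand in for a long chain of primitive actions.
    execute(ToolCall(name="open_app", arguments={"app": "Settings"}))
    execute(Click(x=120, y=340))
```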
If you are excited about Multimodal and Agentic Reasoning with Foundation Models, Apple ML Research has openings for Researchers, Engineers, and Interns in this area. Consider applying through the links below or feel free to send a message for more information. - Machine
jobs.apple.com
Apply for an AIML - Machine Learning Researcher, MLR job at Apple. Read about the role and find out if it’s right for you.
12
54
460
Heading to ICCV Monday - Wednesday. If you are passionate about multimodal foundation models, reasoning, and agentic capabilities, please reach out -- happy to chat. A couple of highlights: -- Talk on Embodied AI Agents at the workshop on Multi-Modal Reason for Agentic
0
3
29
If you are at CVPR, please visit our work: From Multimodal LLMs to Generalist Embodied Agents: Methods and Lessons ( https://t.co/wDAsIiGLxi), Saturday, 9:00am - 10:15am, Presentation #5, Oral Session 3, Davidson Ballroom -- World-Consistent Video Diffusion with
0
0
10
Here is an RL perspective on understanding LLMs for decision making. Are LLMs best used as: policies / rewards / transition functions ? How do you fine-tune them ? Can LLMs explore / exploit ? 🧵 Join us down this rabbit hole... (ICLR 2025 paper, done at ML Research)
2
31
169
Our single generalist agent, GEA, trained via online RL on some domains and SFT on others, achieves SOTA compared to all other generalist agents and most specialist models across all these diverse domains and thousands of tasks.
0
0
1
For example, on Habitat Pick, the generalist agent trained via SFT alone (dubbed GEA-base) achieves 57% success. If further trained via PPO, using the Habitat environment for reward generation, it reaches 83% success.
1
0
0
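That SFT-then-RL recipe can be sketched very simply: start from a policy initialized by supervised training and keep optimizing it with policy gradients against environment reward. The sketch below is a toy stand-in, using plain REINFORCE on a dummy single-step environment rather than the PPO-on-Habitat setup with a multimodal GEA policy described above.

```python
# Minimal, assumption-laden sketch of "SFT first, then online RL": continue
# training an SFT-initialized policy with a policy-gradient loss computed from
# environment reward. Vanilla REINFORCE (no baseline) on a toy environment;
# the paper's setup (PPO on Habitat) is far richer.
import torch
import torch.nn as nn


class ToyEnv:
    """Toy single-step environment: only action 3 is rewarded."""
    n_actions = 5

    def step(self, action: int) -> float:
        return 1.0 if action == 3 else 0.0


def rl_finetune(policy: nn.Module, env: ToyEnv, steps: int = 500) -> None:
    """Fine-tune an (SFT-initialized) policy with REINFORCE."""
    opt = torch.optim.Adam(policy.parameters(), lr=1e-2)
    for _ in range(steps):
        logits = policy(torch.zeros(1, 4))            # dummy observation
        dist = torch.distributions.Categorical(logits=logits)
        action = dist.sample()
        reward = env.step(action.item())              # environment reward
        loss = -(dist.log_prob(action) * reward).mean()  # policy-gradient loss
        opt.zero_grad()
        loss.backward()
        opt.step()


if __name__ == "__main__":
    # Stand-in for the SFT-initialized policy: a small linear policy head.
    policy = nn.Linear(4, ToyEnv.n_actions)
    rl_finetune(policy, ToyEnv())
```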
As a second lesson, while SFT is a great tool for adapting foundation models, it is imperative to leverage data collected through online RL for agentic domains that are somewhat removed from the original image/text applications.
1
0
1
Working with agents across multiple domains requires a uniform way of encoding these domains. In particular, for continuous action spaces, we use a unified tokenization scheme across all action parameterizations and embodiments.
1
0
1
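One common way to realize such a unified tokenization is to clip each continuous action dimension to a fixed range and map it to one of N bins drawn from a vocabulary shared across embodiments. The sketch below assumes that binning approach; the bin count and range are illustrative choices, not the values used for GEA.

```python
# Hedged sketch of a unified action tokenization: every continuous action
# value, regardless of embodiment, is clipped to a shared range and mapped to
# one of NUM_BINS discrete tokens. NUM_BINS and the range are assumptions.
import numpy as np

NUM_BINS = 256          # shared discrete vocabulary for all action dimensions
LOW, HIGH = -1.0, 1.0   # assume actions are normalized to this range


def tokenize(action: np.ndarray) -> np.ndarray:
    """Map continuous action values to integer tokens in [0, NUM_BINS - 1]."""
    clipped = np.clip(action, LOW, HIGH)
    scaled = (clipped - LOW) / (HIGH - LOW)               # -> [0, 1]
    return np.minimum((scaled * NUM_BINS).astype(int), NUM_BINS - 1)


def detokenize(tokens: np.ndarray) -> np.ndarray:
    """Map tokens back to bin-center continuous values."""
    return LOW + (tokens + 0.5) / NUM_BINS * (HIGH - LOW)


if __name__ == "__main__":
    arm_action = np.array([0.12, -0.73, 0.98])  # e.g. a manipulation action
    tokens = tokenize(arm_action)
    print(tokens, detokenize(tokens))
```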
What are important design decisions in building AI Agents? We release an empirical analysis of both data and method choices in agents across robotics, planning, UI interactions, and video games in https://t.co/4yJhNCTN2s. Joint work w colleagues at Apple ML Research. 🧵
5
16
143
Attending NeurIPS Wed -> Fri. Feel free to reach out and connect. We have open positions for Researchers. We have two posters, on Grounding of Agents ( https://t.co/ky3Gr70iti) and Data Filtering for LMs ( https://t.co/PiCWJ7wnbs), on Friday. Hope to see you there!
1
2
78
We are hiring! We are looking for accomplished and hands-on researchers to join us. Topics in Multimodal Foundation Models and AI Agents are of particular interest. Feel free to DM me if interested. https://t.co/QdC4K3KpUK
8
47
435
🚀 Exciting news! @Apple has released its own open-source LLM, DCLM-7B. Everything is open-source, including the model weights and datasets. 💡Why should you be excited? 1. The datasets and tools released as part of this research lay the groundwork for future advancements in
2
5
35
We've publicly released our DataComp-LM models: Truly open 1B and 7B models that are competitive with the state of the art (llama3, qwen2, gemma, ...) on most benchmarks, but with a public training recipe, dataset, and code! (1/3)
1
14
56
@ #cvpr Tuesday and Wednesday. Happy to chat about Embodied AI and Multimodal FMs.
0
0
1
Joint work with Andrew Szot, @bogdan_mazoure, Devon Hjelm, @harsh_092, @zsoltkira at Apple ML Research.
0
0
2
Interestingly, such tokenizers give superior performance for continuous-action Embodied AI, especially when the policy is initialized from an LLM. Hence, LLM-based policies should be designed differently from traditional ones.
1
0
1