Ruoming Pang @ruomingpang profile

Ruoming Pang

@ruomingpang

Followers: 2K
Following: 823
Media: 9
Statuses: 112

Apple Foundation Models

Joined January 2010

Ruoming Pang

@ruomingpang

10 months

As Apple Intelligence is rolling out to our beta users today, we are proud to present a technical report on our Foundation Language Models that power these features on devices and cloud: 🧵.

Ruoming Pang

@ruomingpang

4 years

We (Apple AI/ML) are looking for strong engineers and researchers in NYC. If you enjoy pushing the frontier of deep learning *and* building products used by millions of users, consider joining us at:

Ruoming Pang

@ruomingpang

1 year

Earlier today at #WWDC24, we introduced Apple Intelligence, the personal intelligence system integrated deeply into iPhone, iPad, and Mac, to enable powerful capabilities across language, images, actions, and personal context. We’re excited to share more about how Apple.

Ruoming Pang

@ruomingpang

1 year

Our team is developing cutting-edge foundation models that power Apple Intelligence. Join our close-knit and fast-moving efforts as researchers and engineers in Cupertino, New York, or Seattle. Be at the heart of shaping the future of Apple Intelligence. Learn more at.

Ruoming Pang

@ruomingpang

10 months

While these LMs are not chatbots, we trained them to have general-purpose capabilities so that they can power a wide range of features including summarization, writing assistance, tool-use, and coding.

Ruoming Pang

@ruomingpang

10 months

The report describes the design and evaluation of our LLMs in detail, including architecture, data curation, pre-training and post-training recipes, optimization, feature adaptation, and evaluation results.

Ruoming Pang

@ruomingpang

4 years

Proud to present our latest results on semi-supervised learning for speech: by Yu Zhang*, Daniel Park*, and Wei Han*, et al.

Ruoming Pang

@ruomingpang

2 years

The foundation model team at Apple is looking for top talent in the Bay Area, NYC, and Seattle: Join us to push the frontier of AI and delight our users at this historic juncture.

Ruoming Pang

@ruomingpang

10 months

We designed the models to follow Apple's Responsible AI principles, to run fast and efficiently, and most importantly, to be helpful through the user-facing features. Our end-to-end evaluations show that our models are competitive against some of the best models in their classes.

Ruoming Pang

@ruomingpang

10 months

We would appreciate feedback from our users and the research community. I'd also like to take this opportunity to thank our team (including @NoughtAleph, @MrZiruiWang, @cw_aabc, and many others) and collaborators. It has been a privilege to work with you all!

Ruoming Pang

@ruomingpang

5 years

Which architecture is better at speech recognition, convolution or transformer? Check out Conformer with 1.9/3.9 on Librispeech: @anmol01gulati.

Ruoming Pang

@ruomingpang

5 years

A new SOTA on Librispeech with an end-to-end convolution model, 2.1/4.6 without external LM, 1.9/4.1 with LM: #ContextNet.

Ruoming Pang

@ruomingpang

10 months

Also calling out to a few more Apple FM team members on X: @markblee @XiangKong4 @chenqibin99 @gyin94 @bwzhang_usc @vivekrathod @yapdianang @DpacGopinath @Phyyysalis @_samwiseman.

Ruoming Pang

@ruomingpang

5 years

1.4%/2.6% on LibriSpeech with Conformer + Noisy Student + Wav2Vec:

Ruoming Pang

@ruomingpang

10 months

and @FlorisWeers @YongqiangWang2 @redjavaC.

Ruoming Pang

@ruomingpang

2 years

What's the current best way to train and serve Transformers to process and generate long sequences (say 16K-64K tokens) efficiently? An example benchmark would be text summarization measured by human preferences.

Ruoming Pang

@ruomingpang

10 months

and @taolei15949106 @jeremy_wang2013.

Ruoming Pang

@ruomingpang

2 years

@Lingling_Wei @WSJ That’s my personal experience too: “I realized the essence of being American: You’re always welcome, no matter where you were born. The U.S. is built on inclusiveness. It’s one of the biggest factors in American competitiveness and what drew me here two decades ago.”

Ruoming Pang

@ruomingpang

7 years

@elizashapiro This conclusion is flawed in a number of ways: (1) GPA is not very meaningful across schools; (2) the gap between 4.1 and 3.9 is not spelled out in terms of percentage; (3) it uses the same metric for both selection and evaluation.

Ruoming Pang

@ruomingpang

4 years

Further, the pretraining helps a wide range of ASR and non-ASR tasks.

Ruoming Pang

@ruomingpang

4 years

We can match previous SoTA with only 3% of the training data and improve the word error rate from 4.8% to 4.1% with the full data set.

Ruoming Pang

@ruomingpang

4 years

It’s well known that SSL helps when there’s limited training data. Our experiments show that it helps even with Google’s 34k-hour production data set.

Ruoming Pang

@ruomingpang

3 years

@karpathy Surprising that no one mentioned the Lingvo params:

Ruoming Pang

@ruomingpang

3 years

Amazing efficiency and scalability with Jax + GSPMD + XLA + TPU. For more details on GSPMD see by @ukoxyz

Qiao Zhang

@zhangqiaorjc

3 years

JAX + GSPMD + TPU v4 achieves (to our knowledge) the highest Model Flop Utilization (MFU) across a range of Transformer LLMs:

Ruoming Pang

@ruomingpang

4 years

My child was lucky enough to study in Ms Feurtado's math program. I hope we can preserve it for future kids! @NYCSchools @nyclabschool.

New York Post

@nypost

4 years

Parents panic as top NYC school plans to end advanced math program

Ruoming Pang

@ruomingpang

7 years

@elizashapiro On the last point, imagine that one claims that height can be used to predict academic performance. After selecting students based on height, we find that indeed they are all quite tall! A better metric would be to compare GPA from SHS in an A-B study.

Ruoming Pang

@ruomingpang

7 years

@elizashapiro The problem is not racial integration, but "academic" integration. The proposed admission criteria contain a number of bad ideas, such as assuming that middle schools across the city have comparable academic levels, even if some are screened.

Ruoming Pang

@ruomingpang

3 years

@RuiQian3 Thanks! The new link is

Ruoming Pang

@ruomingpang

7 years

@NYCMayor As a parent, I know that it's years of hard work, not test prep, that gets students into specialized high schools. Address the real achievement gaps, do not just paper over them!

Ruoming Pang

@ruomingpang

2 years

@giffmana MMLU 5-shot? GPT-4 reports 86.4 and Flan PaLM 2 at 81.2. Not sure whether the methodology is the same though.

Ruoming Pang

@ruomingpang

5 years

A simple and elegant solution from @JHYUXM, to reduce RNN-T latency:

Ruoming Pang

@ruomingpang

2 years

Chinese Embassy in the US, please reinstate visas and remove obstacles for visiting China - Sign the Petition! via @Change.

Ruoming Pang

@ruomingpang

11 years

Reform ECPA: Tell the Government to Get a Warrant http://t.co/kuedfDtubV.

Ruoming Pang

@ruomingpang

2 years

@anmol01gulati @Tim_Dettmers I think the overall lessons from flash attention also hold for TPU, since there’s a similar HBM/SRAM memory hierarchy. OTOH, the software stack is different. We rely on XLA to apply the optimization instead of writing our own kernels.
