Ruoming Pang (@ruomingpang)
Bio: Apple Foundation Models
Followers: 2K · Following: 823 · Media: 9 · Statuses: 112
Joined January 2010

Ruoming Pang (@ruomingpang) · 10 months
As Apple Intelligence is rolling out to our beta users today, we are proud to present a technical report on our Foundation Language Models that power these features on device and in the cloud: 🧵
13 replies · 195 retweets · 708 likes

Ruoming Pang (@ruomingpang) · 4 years
We (Apple AI/ML) are looking for strong engineers and researchers in NYC. If you enjoy pushing the frontier of deep learning *and* building products used by millions of users, consider joining us at:
3 replies · 18 retweets · 117 likes

Ruoming Pang (@ruomingpang) · 1 year
Earlier today at #WWDC24, we introduced Apple Intelligence, the personal intelligence system integrated deeply into iPhone, iPad, and Mac, to enable powerful capabilities across language, images, actions, and personal context. We’re excited to share more about how Apple…
4 replies · 35 retweets · 114 likes

Ruoming Pang (@ruomingpang) · 1 year
Our team is developing the cutting-edge foundation models that power Apple Intelligence. Join our close-knit and fast-moving team of researchers and engineers in Cupertino, New York, or Seattle, and be at the heart of shaping the future of Apple Intelligence. Learn more at:

Ruoming Pang (@ruomingpang) · 10 months
While these LMs are not chatbots, we trained them to have general-purpose capabilities so that they can power a wide range of features including summarization, writing assistance, tool use, and coding.
2 replies · 1 retweet · 42 likes

Ruoming Pang (@ruomingpang) · 10 months
The report describes the design and evaluation of our LLMs in detail, including architecture, data curation, pre-training and post-training recipes, optimization, feature adaptation, and evaluation results.
1 reply · 3 retweets · 32 likes

Ruoming Pang (@ruomingpang) · 4 years
Proud to present our latest results on semi-supervised learning for speech, by Yu Zhang*, Daniel Park*, Wei Han*, et al.:
2 replies · 6 retweets · 29 likes

Ruoming Pang (@ruomingpang) · 2 years
The foundation model team at Apple is looking for top talent in the Bay Area, NYC, and Seattle: join us to push the frontier of AI and delight our users at this historic juncture.
0 replies · 6 retweets · 26 likes

Ruoming Pang (@ruomingpang) · 10 months
We designed the models to follow Apple's Responsible AI principles, to run fast and efficiently, and, most importantly, to be helpful through user-facing features. Our end-to-end evaluations show that our models are competitive with some of the best models in their classes.
1 reply · 2 retweets · 23 likes

Ruoming Pang (@ruomingpang) · 10 months
We would appreciate feedback from our users and the research community. I'd also like to take this opportunity to thank our team (including @NoughtAleph, @MrZiruiWang, @cw_aabc, and many others) and collaborators. It has been a privilege to work with you all!
1 reply · 2 retweets · 21 likes

Ruoming Pang (@ruomingpang) · 5 years
Which architecture is better at speech recognition, convolution or transformer? Check out Conformer, with 1.9%/3.9% WER on LibriSpeech: @anmol01gulati
1 reply · 8 retweets · 19 likes

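For readers who want the shape of the model: a Conformer block sandwiches self-attention and a convolution module between two half-step feed-forward layers (the "macaron" structure). Below is a minimal JAX sketch of that structure; the single-head attention, bare depthwise convolution, unlearned layer norms, and tiny dimensions are simplifications for illustration, not the paper's exact modules.

```python
import jax
import jax.numpy as jnp

def layer_norm(x, eps=1e-6):
    # Normalize over the feature dimension (no learned scale/shift, for brevity).
    return (x - x.mean(-1, keepdims=True)) / jnp.sqrt(x.var(-1, keepdims=True) + eps)

def ffn(x, w1, w2):
    # Position-wise feed-forward with a swish nonlinearity.
    return jax.nn.swish(x @ w1) @ w2

def self_attention(x, wq, wk, wv):
    # Single-head attention; the real model is multi-head with relative
    # positional encoding, omitted here for brevity.
    q, k, v = x @ wq, x @ wk, x @ wv
    scores = q @ k.T / jnp.sqrt(q.shape[-1])
    return jax.nn.softmax(scores, axis=-1) @ v

def depthwise_conv(x, kernel):
    # Depthwise 1-D convolution over time: one filter per channel.
    # x: [time, dim], kernel: [width, 1, dim].
    y = jax.lax.conv_general_dilated(
        x[None],                       # add batch dim -> [1, time, dim]
        kernel,
        window_strides=(1,),
        padding="SAME",
        dimension_numbers=("NTC", "TIO", "NTC"),
        feature_group_count=x.shape[-1],
    )
    return y[0]

def conformer_block(x, p):
    # Macaron structure: two half-step FFNs around attention and convolution.
    x = x + 0.5 * ffn(layer_norm(x), p["ffn1_w1"], p["ffn1_w2"])
    x = x + self_attention(layer_norm(x), p["wq"], p["wk"], p["wv"])
    x = x + depthwise_conv(layer_norm(x), p["conv_kernel"])
    x = x + 0.5 * ffn(layer_norm(x), p["ffn2_w1"], p["ffn2_w2"])
    return layer_norm(x)

# Run one block on random features (dim 4, kernel width 3; sizes arbitrary).
key = jax.random.PRNGKey(0)
dim = 4
p = {
    "ffn1_w1": jax.random.normal(key, (dim, 4 * dim)) * 0.1,
    "ffn1_w2": jax.random.normal(key, (4 * dim, dim)) * 0.1,
    "wq": jnp.eye(dim), "wk": jnp.eye(dim), "wv": jnp.eye(dim),
    "conv_kernel": jax.random.normal(key, (3, 1, dim)) * 0.1,
    "ffn2_w1": jax.random.normal(key, (dim, 4 * dim)) * 0.1,
    "ffn2_w2": jax.random.normal(key, (4 * dim, dim)) * 0.1,
}
out = conformer_block(jax.random.normal(key, (10, dim)), p)
```
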
Ruoming Pang (@ruomingpang) · 5 years
A new SOTA on LibriSpeech with an end-to-end convolutional model, 2.1%/4.6% WER without an external LM and 1.9%/4.1% with one: #ContextNet
2 replies · 7 retweets · 19 likes

Ruoming Pang (@ruomingpang) · 5 years
1.4%/2.6% WER on LibriSpeech with Conformer + Noisy Student + Wav2Vec:
0 replies · 3 retweets · 11 likes

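For context on the recipe named here: noisy student training iterates teacher, pseudo-labels, noised student. The sketch below is a minimal illustration in which train, pseudo_label, and augment are hypothetical callables standing in for the real pipeline (which adds SpecAugment, confidence filtering, and wav2vec pre-training).

```python
def noisy_student(labeled, unlabeled, train, pseudo_label, augment, rounds=4):
    """Iterative self-training sketch with hypothetical helper callables:
    train(examples, noise), pseudo_label(model, audio), and augment."""
    teacher = train(labeled, noise=None)           # round 0: supervised teacher
    for _ in range(rounds):
        pseudo = pseudo_label(teacher, unlabeled)  # teacher labels raw audio
        # The student sees noised inputs (e.g. SpecAugment) but the teacher's
        # clean pseudo-labels, so it must learn to be robust to the noise.
        student = train(labeled + pseudo, noise=augment)
        teacher = student                          # student becomes next teacher
    return teacher
```
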
Ruoming Pang (@ruomingpang) · 2 years
What's the current best way to train and serve Transformers that process and generate long sequences (say 16K-64K tokens) efficiently? An example benchmark would be text summarization measured by human preferences.
4 replies · 0 retweets · 8 likes

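One common ingredient in answers to this question (not necessarily the best, which is what the tweet asks) is to restrict each token's attention to a local window so compute grows linearly with length. A minimal JAX sketch of a causal sliding-window mask, with the window size an arbitrary choice:

```python
import jax
import jax.numpy as jnp

def sliding_window_attention(q, k, v, window=512):
    # q, k, v: [seq, dim]. Each position attends only to itself and the
    # window - 1 positions before it, so per-token cost is O(window).
    seq = q.shape[0]
    scores = q @ k.T / jnp.sqrt(q.shape[-1])
    pos = jnp.arange(seq)
    causal = pos[:, None] >= pos[None, :]            # never attend ahead
    local = (pos[:, None] - pos[None, :]) < window   # never attend too far back
    scores = jnp.where(causal & local, scores, -jnp.inf)
    return jax.nn.softmax(scores, axis=-1) @ v

# For clarity this materializes the full [seq, seq] score matrix; a real
# implementation computes it blockwise so memory also stays O(seq * window).
q = k = v = jnp.ones((1024, 64))
out = sliding_window_attention(q, k, v)
```
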
Ruoming Pang (@ruomingpang) · 2 years
@Lingling_Wei @WSJ That’s my personal experience too: “I realized the essence of being American: You’re always welcome, no matter where you were born. The U.S. is built on inclusiveness. It’s one of the biggest factors in American competitiveness and what drew me here two decades ago.”
1 reply · 0 retweets · 6 likes

Ruoming Pang (@ruomingpang) · 7 years
@elizashapiro This conclusion is flawed in a number of ways: (1) GPA is not very meaningful across schools; (2) the gap between 4.1 and 3.9 is not spelled out in percentage terms; (3) it uses the same metric for both selection and evaluation.
1 reply · 0 retweets · 5 likes

Ruoming Pang (@ruomingpang) · 4 years
Further, the pretraining helps a wide range of ASR and non-ASR tasks.
0 replies · 0 retweets · 5 likes

Ruoming Pang (@ruomingpang) · 4 years
We can match the previous SoTA with only 3% of the training data, and improve the word error rate from 4.8% to 4.1% with the full data set.
1 reply · 0 retweets · 4 likes

Ruoming Pang (@ruomingpang) · 4 years
It’s well known that SSL helps when there’s limited training data. Our experiments show that it helps even with Google’s 34k-hour production data set.
1 reply · 0 retweets · 3 likes

Ruoming Pang (@ruomingpang) · 3 years
@karpathy Surprising that no one mentioned the Lingvo params:
0 replies · 0 retweets · 2 likes

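For readers who haven't used Lingvo: models there are configured through hierarchical params objects rather than flat flags, so an entire architecture is a reproducible config tree. The sketch below only imitates that pattern with a hypothetical Params class; it is not Lingvo's actual API.

```python
class Params:
    """A tiny stand-in for Lingvo-style hierarchical hyperparameters."""

    def __init__(self, **kwargs):
        self._values = dict(kwargs)

    def define(self, name, default, description):
        # Each hyperparameter is declared with a default and a docstring.
        self._values[name] = default

    def set(self, **kwargs):
        for name, value in kwargs.items():
            if name not in self._values:
                raise AttributeError(f"unknown hyperparameter: {name}")
            self._values[name] = value
        return self  # allow chained, fluent configuration

    def get(self, name):
        return self._values[name]

# Declare a model configuration as a tree of params, then override leaves.
encoder = Params()
encoder.define("num_layers", 17, "Number of encoder blocks.")
encoder.define("model_dim", 512, "Hidden dimension.")
model = Params(encoder=encoder)
model.get("encoder").set(num_layers=16)
```
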
Ruoming Pang (@ruomingpang) · 3 years
Amazing efficiency and scalability with JAX + GSPMD + XLA + TPU. For more details on GSPMD, see the work by @ukoxyz.

Quoting Qiao Zhang (@zhangqiaorjc) · 3 years:
"JAX + GSPMD + TPU v4 achieves (to our knowledge) the highest Model FLOPs Utilization (MFU) across a range of Transformer LLMs:"
1 reply · 0 retweets · 3 likes

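To make "JAX + GSPMD" concrete: you place arrays on a named device mesh and let the XLA/GSPMD partitioner derive the per-device program and collectives, rather than writing communication by hand. A minimal sketch using the public jax.sharding API; the 1-D mesh, shapes, and sharding choices are illustrative (real LLM training uses 2-D data/model meshes over TPU pods, and assumes the batch divides the device count).

```python
import jax
import jax.numpy as jnp
import numpy as np
from jax.sharding import Mesh, NamedSharding, PartitionSpec as P

# A 1-D mesh over whatever devices are present.
mesh = Mesh(np.array(jax.devices()), axis_names=("data",))

@jax.jit
def matmul(x, w):
    # No hand-written communication: GSPMD infers the partitioned program
    # and any needed collectives from the input shardings.
    return x @ w

x = jax.device_put(jnp.ones((8, 1024)),
                   NamedSharding(mesh, P("data", None)))  # shard batch dim
w = jax.device_put(jnp.ones((1024, 1024)),
                   NamedSharding(mesh, P(None, None)))    # replicate weights
y = matmul(x, w)

# MFU, the metric in the quoted tweet, is simply achieved model FLOPs per
# second divided by hardware peak:
#   mfu = (flops_per_step / step_time_seconds) / peak_flops_per_second
# e.g. this matmul performs 2 * 8 * 1024 * 1024 floating-point ops per step.
```
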
Ruoming Pang (@ruomingpang) · 4 years
My child was lucky enough to study in Ms. Feurtado's math program. I hope we can preserve it for future kids! @NYCSchools @nyclabschool

Quoting New York Post (@nypost) · 4 years:
"Parents panic as top NYC school plans to end advanced math program"
0 replies · 1 retweet · 2 likes

Ruoming Pang (@ruomingpang) · 7 years
@elizashapiro On the last point, imagine that one claims that height can be used to predict academic performance. After selecting students based on height, we find that indeed they are all quite tall! A better metric would be to compare GPA from SHS in an A-B study.
0 replies · 0 retweets · 2 likes

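A toy simulation (numbers invented purely for illustration) makes the flaw concrete: admit on one variable and the admitted group will look exceptional on that variable by construction, regardless of whether it predicts anything else.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000
height = rng.normal(size=n)          # the admission "metric"
performance = rng.normal(size=n)     # drawn independently of height

admitted = height > np.quantile(height, 0.97)   # admit the tallest 3%
print(height[admitted].mean())        # ~2.3: admitted students are very tall,
print(performance[admitted].mean())   # ~0.0: yet no better academically.
# "The admitted are tall" is evidence about the filter, not about the claim.
```
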
Ruoming Pang (@ruomingpang) · 7 years
@elizashapiro The problem is not racial integration but "academic" integration. The proposed admission criteria contain a number of bad ideas, such as assuming that middle schools across the city have comparable academic levels, even if some are screened.
0 replies · 0 retweets · 2 likes

Ruoming Pang (@ruomingpang) · 3 years
@RuiQian3 Thanks! The new link is
1 reply · 0 retweets · 2 likes

Ruoming Pang (@ruomingpang) · 7 years
@NYCMayor As a parent, I know that it's years of hard work, not test prep, that gets students into specialized high schools. Address the real achievement gaps; do not just paper over them!
0 replies · 0 retweets · 2 likes

Ruoming Pang (@ruomingpang) · 2 years
@giffmana MMLU 5-shot? GPT-4 reports 86.4 and Flan PaLM 2 is at 81.2. Not sure whether the methodology is the same, though.
0 replies · 0 retweets · 2 likes

Ruoming Pang (@ruomingpang) · 5 years
A simple and elegant solution from @JHYUXM to reduce RNN-T latency:
0 replies · 0 retweets · 1 like

Ruoming Pang (@ruomingpang) · 2 years
Chinese Embassy in the US, please reinstate visas and remove obstacles for visiting China. Sign the petition! (via @Change)
0 replies · 0 retweets · 1 like

Ruoming Pang (@ruomingpang) · 11 years
Reform ECPA: Tell the Government to Get a Warrant http://t.co/kuedfDtubV
0 replies · 0 retweets · 1 like

Ruoming Pang (@ruomingpang) · 2 years
@anmol01gulati @Tim_Dettmers I think the overall lessons from FlashAttention also hold for TPU, since there’s a similar HBM/SRAM memory hierarchy. OTOH, the software stack is different: we rely on XLA to apply the optimization instead of writing our own kernels.
0 replies · 0 retweets · 1 like

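To make the "rely on XLA" point concrete: on TPU, attention is typically written at the math level in jax.numpy, and fusion, tiling, and staging between HBM and on-chip memory are left to the compiler rather than to a hand-written FlashAttention-style kernel. A minimal jitted attention, with shapes chosen arbitrarily:

```python
import jax
import jax.numpy as jnp

@jax.jit
def attention(q, k, v):
    # Written at the math level: operand staging between HBM and on-chip
    # memory is handled by the XLA compiler, not by a custom kernel.
    scores = q @ k.T / jnp.sqrt(q.shape[-1])
    return jax.nn.softmax(scores, axis=-1) @ v

q = k = v = jnp.ones((128, 64))
out = attention(q, k, v)
```
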