Sinong Wang Profile
Sinong Wang

@sinongwang

Followers: 376 · Following: 12 · Media: 5 · Statuses: 15

Research Scientist at Meta Generative AI, working on LLMs, NLP, optimization, and recommendation systems.

Bellevue, WA
Joined August 2016
@sinongwang
Sinong Wang
2 years
Super excited to share that our paper won an Outstanding Paper Award at NAACL 2024. Check out our paper:
@Glaciohound
Chi Han
2 years
🎖 Excited to receive an Outstanding Paper Award at NAACL 2024 for our LM-Infinite work, "Zero-Shot Extreme Length Generalization for Large Language Models"! We extend to 200M length with no parameter updates, with downstream improvements. https://t.co/T6MSXbtWpv https://t.co/9UHksOOwfp
0 replies · 0 reposts · 3 likes
@sinongwang
Sinong Wang
2 years
Excited to share the Llama 3 preview (8B/70B), which achieves the best MMLU results among open-source models, along with preliminary results for a 405B model. Also super excited to share that we have integrated Llama 3 into Meta AI, the world's best AI assistant!
ai.meta.com
Today, we're introducing Meta Llama 3, the next generation of our state-of-the-art open source large language model. In the coming months, we expect to share new capabilities, additional model sizes,...
0 replies · 0 reposts · 1 like
@Yampeleg
Yam Peleg
2 years
Meta just dropped a banger: LLaMA 2 Long.
- Continued pretraining LLaMA on long context and studied the effects of pretraining text lengths.
- Apparently, having abundant long texts in the pretraining dataset is not the key to achieving strong performance.
- They also perform a
@arankomatsuzaki
Aran Komatsuzaki
2 years
Effective Long-Context Scaling of Foundation Models: the LLaMA 70B variant surpasses gpt-3.5-turbo-16k's overall performance on a suite of long-context tasks. https://t.co/QcyP1WXJKl
13 replies · 75 reposts · 534 likes
@sinongwang
Sinong Wang
2 years
Excited to share our latest work on long-context LLMs, which is the new foundation model behind 28 Meta AI agents. The new long-context model also achieves better performance than gpt-3.5-turbo-16k across various tasks.
@AIatMeta
AI at Meta
2 years
🆕 Effective Long-Context Scaling of Foundation Models ➡️ https://t.co/oMKlrtPB0s Another piece of research that helps us build engaging conversational experiences for our AIs and the Meta AI assistant.
0 replies · 0 reposts · 0 likes
@sinongwang
Sinong Wang
2 years
Excited to share our latest work on extending LLM context window length without fine-tuning!
@_akhaliq
AK
2 years
LM-Infinite: Simple On-the-Fly Length Generalization for Large Language Models paper page: https://t.co/pa3x0rM7pj In recent years, there have been remarkable advancements in the performance of Transformer-based Large Language Models (LLMs) across various domains. As these LLMs
0 replies · 0 reposts · 2 likes
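Neither tweet spells out how the context window gets extended without fine-tuning, so purely as a hedged illustration of one length-generalization trick in this spirit: restrict each token's attention to a handful of leading tokens plus a recent-token window, which requires no parameter updates. The function name and the `n_global`/`window` defaults below are illustrative choices, not values from the paper.

```python
import torch

def lambda_shaped_mask(seq_len: int, n_global: int = 4, window: int = 2048) -> torch.Tensor:
    """Boolean attention mask (True = may attend): each query position sees
    the first `n_global` tokens plus the most recent `window` tokens, causally.
    Illustrative sketch only; parameters are not taken from the paper."""
    q = torch.arange(seq_len).unsqueeze(1)  # query positions, shape (L, 1)
    k = torch.arange(seq_len).unsqueeze(0)  # key positions,   shape (1, L)
    causal = k <= q                         # never attend to future tokens
    recent = (q - k) < window               # local sliding-window branch
    leading = k < n_global                  # always-visible leading tokens
    return causal & (recent | leading)

if __name__ == "__main__":
    print(lambda_shaped_mask(seq_len=8, n_global=2, window=3).int())
```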
@qinyuan_ye
Qinyuan Ye
4 years
Hi #NAACL2022! Last summer we had a crazy idea of distilling transformer models into shallow, sparse, and fast models. Curious about whether and to what extent this idea works? Please come to our presentation tomorrow! 📍 Session 1D @ Elwha A ⏰ Mon 11:30-11:45
2 replies · 19 reposts · 103 likes
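The tweet only names the goal, distilling Transformers into shallow, sparse, fast students, so the snippet below is just a generic soft-label distillation loss for context; the temperature, mixing weight, and toy tensors are illustrative assumptions, not the paper's recipe.

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels,
                      temperature: float = 2.0, alpha: float = 0.5):
    """Generic soft-label distillation: KL(teacher || student) at a temperature,
    mixed with the usual cross-entropy on gold labels. Hyperparameters are
    illustrative defaults, not the paper's settings."""
    soft_teacher = F.log_softmax(teacher_logits / temperature, dim=-1)
    soft_student = F.log_softmax(student_logits / temperature, dim=-1)
    kd = F.kl_div(soft_student, soft_teacher, log_target=True,
                  reduction="batchmean") * temperature ** 2
    ce = F.cross_entropy(student_logits, labels)
    return alpha * kd + (1 - alpha) * ce

if __name__ == "__main__":
    s = torch.randn(8, 5)          # student logits (toy)
    t = torch.randn(8, 5)          # teacher logits (toy)
    y = torch.randint(0, 5, (8,))  # gold labels (toy)
    print(distillation_loss(s, t, y).item())
```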
@karthikabinav
Karthik A Sankararaman 🇮🇳🇺🇸
4 years
We wondered what happens when you align dropout in Transformers with its common Bayesian interpretation as a posterior over the weights. It turns out this largely reduces overfitting, with improvements across many apples-to-apples experiments. @sinongwang @Han_Fang_ @MetaAI
@_akhaliq
AK
4 years
BayesFormer: Transformer with Uncertainty Estimation. abs: https://t.co/0OqGgau2D2 Introduces BayesFormer, a Transformer model with dropout designed by Bayesian theory.
1 reply · 10 reposts · 65 likes
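The tweets frame dropout as an approximate posterior over the weights but don't give BayesFormer's architecture, so below is only the generic Monte-Carlo-dropout recipe that interpretation usually implies: keep dropout stochastic at inference and summarize several forward passes. The function name and toy model are illustrative, not the paper's design.

```python
import torch
import torch.nn as nn

def mc_dropout_predict(model: nn.Module, x: torch.Tensor, n_samples: int = 20):
    """Monte-Carlo dropout: keep dropout active at inference time and average
    several stochastic forward passes. Returns (mean, variance) as a crude
    uncertainty estimate. Generic sketch, not BayesFormer's exact design."""
    model.eval()
    # Re-enable only the dropout layers so other layers stay in eval mode.
    for m in model.modules():
        if isinstance(m, nn.Dropout):
            m.train()
    with torch.no_grad():
        samples = torch.stack([model(x) for _ in range(n_samples)], dim=0)
    return samples.mean(dim=0), samples.var(dim=0)

if __name__ == "__main__":
    toy = nn.Sequential(nn.Linear(16, 32), nn.ReLU(), nn.Dropout(0.1), nn.Linear(32, 3))
    mean, var = mc_dropout_predict(toy, torch.randn(4, 16))
    print(mean.shape, var.shape)
```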
@sinongwang
Sinong Wang
4 years
Prompt tuning can be instance-dependent. Thrilled to share our new work! "IDPG: An Instance-Dependent Prompt Generation Method". Check out our paper here: https://t.co/s5iWueSJqj
1 reply · 1 repost · 2 likes
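The tweet names the idea, prompts generated per input instance, without details, so this sketch only shows the general shape one would assume: a small generator maps a pooled representation of the instance to a few soft prompt vectors that are prepended to the (frozen) model's input embeddings. Module names and dimensions are illustrative, not taken from the paper.

```python
import torch
import torch.nn as nn

class InstanceDependentPromptGenerator(nn.Module):
    """Illustrative sketch: generate `prompt_len` soft prompt vectors
    conditioned on a pooled representation of the input instance."""

    def __init__(self, hidden: int = 768, prompt_len: int = 5, bottleneck: int = 64):
        super().__init__()
        self.prompt_len = prompt_len
        self.hidden = hidden
        self.generator = nn.Sequential(
            nn.Linear(hidden, bottleneck),
            nn.Tanh(),
            nn.Linear(bottleneck, prompt_len * hidden),
        )

    def forward(self, token_embeddings: torch.Tensor) -> torch.Tensor:
        # token_embeddings: (batch, seq_len, hidden)
        pooled = token_embeddings.mean(dim=1)                     # (batch, hidden)
        prompts = self.generator(pooled)                          # (batch, prompt_len * hidden)
        prompts = prompts.view(-1, self.prompt_len, self.hidden)  # (batch, prompt_len, hidden)
        # Prepend the instance-specific prompts to the input embeddings.
        return torch.cat([prompts, token_embeddings], dim=1)

if __name__ == "__main__":
    gen = InstanceDependentPromptGenerator()
    x = torch.randn(2, 10, 768)
    print(gen(x).shape)  # torch.Size([2, 15, 768])
```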
@MaxMa1987
Xuezhe Ma (Max)
4 years
Thrilled to share our #NeurIPS2021 work, "Luna: Linear Unified Nested Attention", a new linear-time transformer architecture that achieves competitive results across multiple benchmarks. Co-authors: @XiangKong4 @sinongwang @violet_zct @jonathanmay @gabema @LukeZettlemoyer
1 reply · 8 reposts · 48 likes
@sinongwang
Sinong Wang
5 years
Thrilled to share our new work! "Luna: Linear Unified Nested Attention". This is a new linear time transformer architecture achieves competitive results across multiple benchmarks. Check our our paper here: https://t.co/BNtqdTAQqH The implementation: https://t.co/US9vTjTG7T.
1 reply · 10 reposts · 38 likes
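The tweets don't describe the mechanism, so here is only a hedged reading of "linear unified nested attention": a fixed-length auxiliary sequence first attends over the input ("pack"), and the input then attends over that packed summary ("unpack"), making each step linear in sequence length. The sketch below simplifies heavily (plain multi-head attention, no extra normalization) and its names are illustrative, not the paper's.

```python
import torch
import torch.nn as nn

class NestedLinearAttention(nn.Module):
    """Simplified sketch of nested attention with a fixed-length auxiliary
    sequence: 'pack' the input into the auxiliary sequence, then 'unpack' it
    back onto the input. Both steps cost O(aux_len * seq_len), i.e. linear in
    sequence length. Not the paper's exact formulation."""

    def __init__(self, dim: int = 64, aux_len: int = 16, heads: int = 4):
        super().__init__()
        self.aux = nn.Parameter(torch.randn(aux_len, dim))  # learned auxiliary sequence
        self.pack = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.unpack = nn.MultiheadAttention(dim, heads, batch_first=True)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq_len, dim)
        p = self.aux.unsqueeze(0).expand(x.size(0), -1, -1)  # (batch, aux_len, dim)
        packed, _ = self.pack(p, x, x)                       # auxiliary attends over the input
        out, _ = self.unpack(x, packed, packed)              # input attends over the packed summary
        return out

if __name__ == "__main__":
    layer = NestedLinearAttention()
    y = layer(torch.randn(2, 100, 64))
    print(y.shape)  # torch.Size([2, 100, 64])
```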
@sinongwang
Sinong Wang
5 years
You don't need a 175B GPT-3 for few-shot learning. All you need is entailment! Check out our new preprint: https://t.co/dknCCTUMoJ In short, we propose a new method that turns a small LM into a better few-shot learner. @Han_Fang_ @MadianKhabsa @hanna_mao @gabema
3 replies · 17 reposts · 88 likes
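The tweet gives the framing, turn classification into entailment so a small LM becomes a strong few-shot learner, but not the recipe. As a loose illustration of that framing only, the snippet below scores label hypotheses against the input with an off-the-shelf NLI model via Hugging Face's zero-shot-classification pipeline; the model choice and hypothesis template are assumptions, and this is not the paper's few-shot fine-tuning procedure.

```python
# Loose illustration of the "classification as entailment" framing using an
# off-the-shelf NLI model; not the paper's actual few-shot training recipe.
from transformers import pipeline

classifier = pipeline("zero-shot-classification", model="facebook/bart-large-mnli")

text = "The movie was a complete waste of two hours."
result = classifier(
    text,
    candidate_labels=["positive", "negative"],
    hypothesis_template="This review is {}.",  # each label becomes an entailment hypothesis
)
for label, score in zip(result["labels"], result["scores"]):
    print(f"{score:.3f}  {label}")
```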
@sinongwang
Sinong Wang
6 years
SOTA in NLP is typically achieved by LM pretraining followed by fine-tuning. Our recent ACL paper shows that pretraining has diminishing returns as the number of training examples increases, and that an LSTM can be within 1 percent of BERT models. Link: https://t.co/9ZhqbUmCAF
4 replies · 56 reposts · 243 likes
@ykilcher
Yannic Kilcher 🇸🇨
6 years
The Linformer projects self-attention into a lower-dimensional space and achieves linear-time instead of quadratic resource requirements. Independent of sequence length! 💪 Watch the video here: https://t.co/ZKw66C2idf @sinongwang @belindazli @MadianKhabsa @Han_Fang_ @facebookai
11 replies · 42 reposts · 214 likes
@sinongwang
Sinong Wang
6 years
Thrilled to share our new work! "Linformer: Self-attention with Linear Complexity". We show that self-attention is low rank, and introduce a linear-time transformer that performs on par with traditional transformers. Check our here: https://t.co/yLATBD85lE
7 replies · 86 reposts · 340 likes
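The tweet states the finding, self-attention is approximately low rank, and one way that turns into linear-time attention is to project keys and values along the sequence dimension (n down to a fixed k) before the softmax, which is what this simplified single-head sketch does. Shapes, names, and the projection initialization below are illustrative, not the paper's exact configuration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class LowRankProjectedAttention(nn.Module):
    """Single-head sketch of linear-complexity attention: project keys and
    values along the sequence dimension (n -> k) before attending, so the
    score matrix is (n x k) instead of (n x n). Illustrative shapes only."""

    def __init__(self, dim: int = 64, max_len: int = 1024, k: int = 128):
        super().__init__()
        self.q = nn.Linear(dim, dim)
        self.kv = nn.Linear(dim, 2 * dim)
        # Learned projections over the sequence dimension.
        self.proj_k = nn.Parameter(torch.randn(k, max_len) / max_len ** 0.5)
        self.proj_v = nn.Parameter(torch.randn(k, max_len) / max_len ** 0.5)
        self.scale = dim ** -0.5

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, n, dim) with n <= max_len
        n = x.size(1)
        q = self.q(x)
        keys, values = self.kv(x).chunk(2, dim=-1)
        keys = torch.einsum("kn,bnd->bkd", self.proj_k[:, :n], keys)      # (batch, k, dim)
        values = torch.einsum("kn,bnd->bkd", self.proj_v[:, :n], values)  # (batch, k, dim)
        attn = F.softmax(torch.einsum("bnd,bkd->bnk", q, keys) * self.scale, dim=-1)
        return torch.einsum("bnk,bkd->bnd", attn, values)                 # (batch, n, dim)

if __name__ == "__main__":
    layer = LowRankProjectedAttention()
    print(layer(torch.randn(2, 512, 64)).shape)  # torch.Size([2, 512, 64])
```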