Sinong Wang
@sinongwang
Followers: 376 · Following: 12 · Media: 5 · Statuses: 15
Research Scientist at Meta Generative AI, working on LLMs, NLP, optimization, and recommendation systems.
Bellevue, WA
Joined August 2016
Super excited to share that our paper won an Outstanding Paper Award at NAACL 2024. Check out our paper:
Excited to receive an Outstanding Paper Award at NAACL 2024 for our LM-Infinite work, "Zero-Shot Extreme Length Generalization for Large Language Models"! We extend to 200M token lengths with no parameter updates, with downstream improvements. https://t.co/T6MSXbtWpv
https://t.co/9UHksOOwfp
Replies: 0 · Reposts: 0 · Likes: 3
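For context, LM-Infinite's core trick is a Λ-shaped attention mask: every token attends to a handful of leading tokens plus a recent window, which keeps attention well behaved at lengths far beyond training. Below is a minimal sketch of such a mask in PyTorch; the window sizes are illustrative, and the paper's additional distance ceiling on positional encodings is omitted.

import torch

def lambda_shaped_mask(seq_len, n_leading=4, window=2048):
    # Illustrative sketch of a Lambda-shaped causal mask (not the authors' code):
    # a query may attend to the first `n_leading` tokens and to keys within `window`.
    i = torch.arange(seq_len).unsqueeze(1)   # query positions (column)
    j = torch.arange(seq_len).unsqueeze(0)   # key positions (row)
    causal = j <= i
    keep = (j < n_leading) | (i - j < window)
    return causal & keep                     # True = attention allowed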
Excited to share the Llama 3 preview (8B/70B), which achieves the best MMLU results among open-source models, along with preliminary results for a 405B model. Also super excited to share that we have integrated Llama 3 into Meta AI, the world's best AI assistant!
ai.meta.com
Today, we're introducing Meta Llama 3, the next generation of our state-of-the-art open source large language model. In the coming months, we expect to share new capabilities, additional model sizes,...
Replies: 0 · Reposts: 0 · Likes: 1
Meta just dropped a banger: LLaMA 2 Long. - Continued pretraining LLaMA on long context and studied the effects of pretraining text lengths. - Apparently having abundant long texts in the pretraining dataset is not the key to achieving strong performance. - They also perform a
Effective Long-Context Scaling of Foundation Models. The LLaMA 70B variant surpasses gpt-3.5-turbo-16k's overall performance on a suite of long-context tasks. https://t.co/QcyP1WXJKl
Replies: 13 · Reposts: 75 · Likes: 534
Excited to share our latest work on long-context LLMs, the new foundation model behind 28 Meta AI agents. The new long-context model also achieves better performance than gpt-3.5-turbo-16k across various tasks.
Effective Long-Context Scaling of Foundation Models: https://t.co/oMKlrtPB0s Another piece of research that helps us build engaging conversational experiences for our AIs and the Meta AI assistant.
Replies: 0 · Reposts: 0 · Likes: 0
Excited to share our latest work on extending LLM context window length without fine-tuning!
LM-Infinite: Simple On-the-Fly Length Generalization for Large Language Models. Paper page: https://t.co/pa3x0rM7pj In recent years, there have been remarkable advancements in the performance of Transformer-based Large Language Models (LLMs) across various domains. As these LLMs
Replies: 0 · Reposts: 0 · Likes: 2
Hi #NAACL2022! Last summer we had a crazy idea of distilling transformer models into shallow, sparse, and fast models. Curious about whether and to what extent this idea works? Please come to our presentation tomorrow! Session 1D @ Elwha A, Mon 11:30-11:45
Replies: 2 · Reposts: 19 · Likes: 103
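The distillation objective behind this kind of teacher-to-student compression is the standard soft-target loss; the sketch below shows only that generic recipe, not the shallow, sparse student architecture the paper actually proposes, and the temperature and weighting are illustrative.

import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels, T=2.0, alpha=0.5):
    # Generic knowledge-distillation loss (illustrative, not the paper's exact objective):
    # KL between temperature-softened teacher and student distributions, plus hard-label CE.
    soft = F.kl_div(F.log_softmax(student_logits / T, dim=-1),
                    F.softmax(teacher_logits / T, dim=-1),
                    reduction='batchmean') * (T * T)
    hard = F.cross_entropy(student_logits, labels)
    return alpha * soft + (1.0 - alpha) * hard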
We wondered what happens when dropout in transformers is aligned with its common Bayesian interpretation as a posterior over the weights. It turns out this largely reduces overfitting, with improvements across many apples-to-apples experiments. @sinongwang @Han_Fang_ @MetaAI
BayesFormer: Transformer with Uncertainty Estimation. abs: https://t.co/0OqGgau2D2 We introduce BayesFormer, a Transformer model with dropouts designed using Bayesian theory.
Replies: 1 · Reposts: 10 · Likes: 65
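Under the Bayesian reading of dropout, keeping dropout active at test time and averaging several stochastic forward passes yields a predictive mean and an uncertainty estimate. The sketch below shows that generic Monte-Carlo-dropout procedure; BayesFormer's specific dropout placement inside the Transformer is described in the paper and is not reproduced here.

import torch

def mc_dropout_predict(model, x, n_samples=20):
    # Monte-Carlo dropout (illustrative): run a classifier several times with dropout on,
    # then report the mean prediction and its variance as an uncertainty estimate.
    # Assumes `model(x)` returns class logits.
    model.train()                      # keeps dropout layers active at inference time
    with torch.no_grad():
        probs = torch.stack([torch.softmax(model(x), dim=-1) for _ in range(n_samples)])
    return probs.mean(dim=0), probs.var(dim=0)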
Prompt tuning can be instance-dependent. Thrilled to share our new work! "IDPG: An Instance-Dependent Prompt Generation Method". Check out our paper here: https://t.co/s5iWueSJqj
Replies: 1 · Reposts: 1 · Likes: 2
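The idea of instance-dependent prompts can be illustrated with a small generator that maps a pooled representation of the input to soft prompt embeddings, which are then prepended to the token embeddings. The sketch below is a hypothetical minimal version; the class name, bottleneck size, and mean pooling are assumptions, not IDPG's exact generator.

import torch
import torch.nn as nn

class InstancePromptGenerator(nn.Module):
    # Hypothetical minimal generator of instance-dependent soft prompts.
    def __init__(self, dim, prompt_len=5, bottleneck=64):
        super().__init__()
        self.prompt_len = prompt_len
        self.net = nn.Sequential(
            nn.Linear(dim, bottleneck),
            nn.Tanh(),
            nn.Linear(bottleneck, prompt_len * dim),
        )

    def forward(self, token_embeds):                  # token_embeds: (batch, n, dim)
        pooled = token_embeds.mean(dim=1)             # crude per-instance representation
        prompts = self.net(pooled).view(-1, self.prompt_len, token_embeds.size(-1))
        return torch.cat([prompts, token_embeds], dim=1)   # prepend instance-dependent prompt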
Thrilled to share our #NeurIPS2021 work! "Luna: Linear Unified Nested Attention". This is a new linear-time transformer architecture that achieves competitive results across multiple benchmarks. co-authors: @XiangKong4 @sinongwang @violet_zct @jonathanmay @gabema @LukeZettlemoyer
Replies: 1 · Reposts: 8 · Likes: 48
Thrilled to share our new work! "Luna: Linear Unified Nested Attention". This is a new linear time transformer architecture achieves competitive results across multiple benchmarks. Check our our paper here: https://t.co/BNtqdTAQqH The implementation: https://t.co/US9vTjTG7T.
Replies: 1 · Reposts: 10 · Likes: 38
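Luna's nested attention replaces quadratic self-attention with two linear-cost steps: a fixed-length learned sequence first "packs" the input, and the input then attends to that packed context. The sketch below is a single-head illustration; the paper's normalization, multi-head handling, and causal variant are omitted.

import torch
import torch.nn as nn

def attend(q, k, v, scale):
    # Plain scaled dot-product attention.
    return torch.softmax(q @ k.transpose(-2, -1) * scale, dim=-1) @ v

class LunaAttentionSketch(nn.Module):
    def __init__(self, dim, p_len=16):
        super().__init__()
        self.scale = dim ** -0.5
        # Extra "packing" sequence of fixed length p, independent of input length n.
        self.p = nn.Parameter(torch.randn(p_len, dim))

    def forward(self, x):                                 # x: (batch, n, dim)
        p = self.p.expand(x.size(0), -1, -1)              # (batch, p, dim)
        packed = attend(p, x, x, self.scale)              # pack:   cost O(p * n)
        out = attend(x, packed, packed, self.scale)       # unpack: cost O(n * p)
        return out, packed                                # packed context feeds the next layer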
You don't need a 175B GPT-3 for few-shot learning. All you need is entailment! Check out our new preprint: https://t.co/dknCCTUMoJ In short, we propose a new method that turns a small LM into a better few-shot learner. @Han_Fang_ @MadianKhabsa @hanna_mao @gabema
Replies: 3 · Reposts: 17 · Likes: 88
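The entailment reformulation treats each candidate label as a hypothesis and asks an NLI model whether the input entails it, so a small model fine-tuned on entailment data can do few-shot classification. The sketch below assumes a Hugging Face style NLI classifier; the model, the label descriptions, and the [contradiction, neutral, entailment] label order are placeholders, not the paper's exact setup.

import torch

def classify_by_entailment(nli_model, tokenizer, text, label_descriptions):
    # Illustrative only: score each label description as a hypothesis against the input
    # and pick the one with the highest entailment probability.
    scores = []
    for hypothesis in label_descriptions:              # e.g. "This is a great movie."
        enc = tokenizer(text, hypothesis, return_tensors='pt')
        logits = nli_model(**enc).logits               # assumed order: [contradiction, neutral, entailment]
        scores.append(logits.softmax(dim=-1)[0, -1])   # entailment probability
    return int(torch.stack(scores).argmax())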
SOTA in NLP is typically achieved by LM pretraining followed by finetuning. Our recent ACL paper shows that pretraining has diminishing returns as the number of training examples increases, and that an LSTM can come within 1 percent of BERT models. Link: https://t.co/9ZhqbUmCAF
Replies: 4 · Reposts: 56 · Likes: 243
The Linformer projects self-attention into a lower-dimensional space and achieves linear-time instead of quadratic resource requirements. Independent of sequence length! Watch the video here: https://t.co/ZKw66C2idf
@sinongwang @belindazli @MadianKhabsa @Han_Fang_ @facebookai
Replies: 11 · Reposts: 42 · Likes: 214
Thrilled to share our new work! "Linformer: Self-attention with Linear Complexity". We show that self-attention is low rank, and introduce a linear-time transformer that performs on par with traditional transformers. Check our here: https://t.co/yLATBD85lE
Replies: 7 · Reposts: 86 · Likes: 340
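Since the mechanism is simple to state, here is a single-head sketch of the idea: learned projections compress the key and value sequences from length n down to a fixed k, so attention costs O(n·k) instead of O(n²). Shapes and initialization are illustrative; see the released implementation for the real multi-head version.

import torch
import torch.nn as nn

class LinformerSelfAttentionSketch(nn.Module):
    def __init__(self, dim, seq_len, k=256):
        super().__init__()
        self.scale = dim ** -0.5
        self.to_q = nn.Linear(dim, dim)
        self.to_k = nn.Linear(dim, dim)
        self.to_v = nn.Linear(dim, dim)
        # Learned projections that compress the sequence dimension n -> k.
        self.E = nn.Parameter(torch.randn(k, seq_len) / seq_len ** 0.5)
        self.F = nn.Parameter(torch.randn(k, seq_len) / seq_len ** 0.5)

    def forward(self, x):                                  # x: (batch, n, dim), n == seq_len
        q, key, v = self.to_q(x), self.to_k(x), self.to_v(x)
        key = torch.einsum('kn,bnd->bkd', self.E, key)     # (batch, k, dim)
        v = torch.einsum('kn,bnd->bkd', self.F, v)         # (batch, k, dim)
        attn = torch.softmax(q @ key.transpose(-2, -1) * self.scale, dim=-1)  # (batch, n, k)
        return attn @ v                                    # (batch, n, dim): linear in n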