Sinong Wang Profile
Sinong Wang

@sinongwang

Followers: 376 · Following: 12 · Media: 5 · Statuses: 15

Research Scientist at Meta Generative AI, working on LLMs, NLP, optimization, and recommendation systems.

Bellevue, WA
Joined August 2016
@sinongwang
Sinong Wang
2 years
Super excited to share that our paper won an Outstanding Paper Award at NAACL 2024. Check out our paper:
@Glaciohound
Chi Han
2 years
🎖 Excited to receive an Outstanding Paper Award at NAACL 2024 for our LM-Infinite work, "Zero-Shot Extreme Length Generalization for Large Language Models"! We extend to 200M length with no parameter updates, with downstream improvements. https://t.co/T6MSXbtWpv https://t.co/9UHksOOwfp
0 replies · 0 reposts · 3 likes
@sinongwang
Sinong Wang
2 years
Excited to share the Llama 3 preview (8B/70B), which achieves the best MMLU results among open-source models, along with preliminary results for a 405B model. Also super excited to share that we have integrated Llama 3 into Meta AI, the world's best AI assistant!
ai.meta.com
Today, we're introducing Meta Llama 3, the next generation of our state-of-the-art open source large language model. In the coming months, we expect to share new capabilities, additional model sizes,...
0 replies · 0 reposts · 1 like
@Yampeleg
Yam Peleg
2 years
Meta just dropped a banger: LLaMA 2 Long.
- Continued pretraining LLaMA on long context and studied the effects of pretraining text lengths.
- Apparently, having abundant long texts in the pretraining dataset is not the key to achieving strong performance.
- They also perform a
@arankomatsuzaki
Aran Komatsuzaki
2 years
Effective Long-Context Scaling of Foundation Models: the LLaMA 70B variant surpasses gpt-3.5-turbo-16k's overall performance on a suite of long-context tasks. https://t.co/QcyP1WXJKl
13 replies · 75 reposts · 534 likes
@sinongwang
Sinong Wang
2 years
Excited to share our latest work on long-context LLMs, which is the new foundation model behind 28 Meta AI agents. The new long-context model also achieves better performance than gpt-3.5-turbo-16k across various tasks.
@AIatMeta
AI at Meta
2 years
🆕 Effective Long-Context Scaling of Foundation Models ➡️ https://t.co/oMKlrtPB0s Another piece of research that helps us build engaging conversational experiences for our AIs and the Meta AI assistant.
0 replies · 0 reposts · 0 likes
@sinongwang
Sinong Wang
2 years
Excited to share our latest work on extending LLM context window length without fine-tuning!
@_akhaliq
AK
2 years
LM-Infinite: Simple On-the-Fly Length Generalization for Large Language Models paper page: https://t.co/pa3x0rM7pj In recent years, there have been remarkable advancements in the performance of Transformer-based Large Language Models (LLMs) across various domains. As these LLMs
0 replies · 0 reposts · 2 likes
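Neither tweet spells out how the context window gets extended without fine-tuning, so purely as a hedged illustration of one length-generalization trick in this spirit: restrict each token's attention to a handful of leading tokens plus a recent-token window, which requires no parameter updates. The function name and the `n_global`/`window` defaults below are illustrative choices, not values from the paper.

```python
import torch

def lambda_shaped_mask(seq_len: int, n_global: int = 4, window: int = 2048) -> torch.Tensor:
    """Boolean attention mask (True = may attend): each query position sees
    the first `n_global` tokens plus the most recent `window` tokens, causally.
    Illustrative sketch only; parameters are not taken from the paper."""
    q = torch.arange(seq_len).unsqueeze(1)  # query positions, shape (L, 1)
    k = torch.arange(seq_len).unsqueeze(0)  # key positions,   shape (1, L)
    causal = k <= q                         # never attend to future tokens
    recent = (q - k) < window               # local sliding-window branch
    leading = k < n_global                  # always-visible leading tokens
    return causal & (recent | leading)

if __name__ == "__main__":
    print(lambda_shaped_mask(seq_len=8, n_global=2, window=3).int())
```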
@qinyuan_ye
Qinyuan Ye
4 years
Hi #NAACL2022! Last summer we had a crazy idea of distilling transformer models into shallow, sparse, and fast models. Curious about whether and to what extent this idea works? Please come to our presentation tomorrow! 📍 Session 1D @ Elwha A ⏰ Mon 11:30-11:45
2 replies · 19 reposts · 103 likes
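The tweet only names the goal, distilling Transformers into shallow, sparse, fast students, so the snippet below is just a generic soft-label distillation loss for context; the temperature, mixing weight, and toy tensors are illustrative assumptions, not the paper's recipe.

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels,
                      temperature: float = 2.0, alpha: float = 0.5):
    """Generic soft-label distillation: KL(teacher || student) at a temperature,
    mixed with the usual cross-entropy on gold labels. Hyperparameters are
    illustrative defaults, not the paper's settings."""
    soft_teacher = F.log_softmax(teacher_logits / temperature, dim=-1)
    soft_student = F.log_softmax(student_logits / temperature, dim=-1)
    kd = F.kl_div(soft_student, soft_teacher, log_target=True,
                  reduction="batchmean") * temperature ** 2
    ce = F.cross_entropy(student_logits, labels)
    return alpha * kd + (1 - alpha) * ce

if __name__ == "__main__":
    s = torch.randn(8, 5)          # student logits (toy)
    t = torch.randn(8, 5)          # teacher logits (toy)
    y = torch.randint(0, 5, (8,))  # gold labels (toy)
    print(distillation_loss(s, t, y).item())
```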
@karthikabinav
Karthik A Sankararaman 🇮🇳🇺🇸
4 years
We wondered what happens when you align dropout in Transformers with its common Bayesian interpretation as a posterior over the weights. It turns out this largely reduces overfitting, with improvements across many apples-to-apples experiments. @sinongwang @Han_Fang_ @MetaAI
@_akhaliq
AK
4 years
BayesFormer: Transformer with Uncertainty Estimation. abs: https://t.co/0OqGgau2D2 Introduces BayesFormer, a Transformer model with dropout designed by Bayesian theory.
1 reply · 10 reposts · 65 likes
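The tweets frame dropout as an approximate posterior over the weights but don't give BayesFormer's architecture, so below is only the generic Monte-Carlo-dropout recipe that interpretation usually implies: keep dropout stochastic at inference and summarize several forward passes. The function name and toy model are illustrative, not the paper's design.

```python
import torch
import torch.nn as nn

def mc_dropout_predict(model: nn.Module, x: torch.Tensor, n_samples: int = 20):
    """Monte-Carlo dropout: keep dropout active at inference time and average
    several stochastic forward passes. Returns (mean, variance) as a crude
    uncertainty estimate. Generic sketch, not BayesFormer's exact design."""
    model.eval()
    # Re-enable only the dropout layers so other layers stay in eval mode.
    for m in model.modules():
        if isinstance(m, nn.Dropout):
            m.train()
    with torch.no_grad():
        samples = torch.stack([model(x) for _ in range(n_samples)], dim=0)
    return samples.mean(dim=0), samples.var(dim=0)

if __name__ == "__main__":
    toy = nn.Sequential(nn.Linear(16, 32), nn.ReLU(), nn.Dropout(0.1), nn.Linear(32, 3))
    mean, var = mc_dropout_predict(toy, torch.randn(4, 16))
    print(mean.shape, var.shape)
```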
@sinongwang
Sinong Wang
4 years
Prompt tuning can be instance-dependent. Thrilled to share our new work! "IDPG: An Instance-Dependent Prompt Generation Method". Check out our paper here: https://t.co/s5iWueSJqj
1 reply · 1 repost · 2 likes
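The tweet names the idea, prompts generated per input instance, without details, so this sketch only shows the general shape one would assume: a small generator maps a pooled representation of the instance to a few soft prompt vectors that are prepended to the (frozen) model's input embeddings. Module names and dimensions are illustrative, not taken from the paper.

```python
import torch
import torch.nn as nn

class InstanceDependentPromptGenerator(nn.Module):
    """Illustrative sketch: generate `prompt_len` soft prompt vectors
    conditioned on a pooled representation of the input instance."""

    def __init__(self, hidden: int = 768, prompt_len: int = 5, bottleneck: int = 64):
        super().__init__()
        self.prompt_len = prompt_len
        self.hidden = hidden
        self.generator = nn.Sequential(
            nn.Linear(hidden, bottleneck),
            nn.Tanh(),
            nn.Linear(bottleneck, prompt_len * hidden),
        )

    def forward(self, token_embeddings: torch.Tensor) -> torch.Tensor:
        # token_embeddings: (batch, seq_len, hidden)
        pooled = token_embeddings.mean(dim=1)                     # (batch, hidden)
        prompts = self.generator(pooled)                          # (batch, prompt_len * hidden)
        prompts = prompts.view(-1, self.prompt_len, self.hidden)  # (batch, prompt_len, hidden)
        # Prepend the instance-specific prompts to the input embeddings.
        return torch.cat([prompts, token_embeddings], dim=1)

if __name__ == "__main__":
    gen = InstanceDependentPromptGenerator()
    x = torch.randn(2, 10, 768)
    print(gen(x).shape)  # torch.Size([2, 15, 768])
```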
@MaxMa1987
Xuezhe Ma (Max)
4 years
Thrilled to share our #NeurIPS2021 work, "Luna: Linear Unified Nested Attention", a new linear-time transformer architecture that achieves competitive results across multiple benchmarks. Co-authors: @XiangKong4 @sinongwang @violet_zct @jonathanmay @gabema @LukeZettlemoyer
1 reply · 8 reposts · 48 likes
@sinongwang
Sinong Wang
5 years
Thrilled to share our new work! "Luna: Linear Unified Nested Attention". This is a new linear time transformer architecture achieves competitive results across multiple benchmarks. Check our our paper here: https://t.co/BNtqdTAQqH The implementation: https://t.co/US9vTjTG7T.
1 reply · 10 reposts · 38 likes
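The tweets don't describe the mechanism, so here is only a hedged reading of "linear unified nested attention": a fixed-length auxiliary sequence first attends over the input ("pack"), and the input then attends over that packed summary ("unpack"), making each step linear in sequence length. The sketch below simplifies heavily (plain multi-head attention, no extra normalization) and its names are illustrative, not the paper's.

```python
import torch
import torch.nn as nn

class NestedLinearAttention(nn.Module):
    """Simplified sketch of nested attention with a fixed-length auxiliary
    sequence: 'pack' the input into the auxiliary sequence, then 'unpack' it
    back onto the input. Both steps cost O(aux_len * seq_len), i.e. linear in
    sequence length. Not the paper's exact formulation."""

    def __init__(self, dim: int = 64, aux_len: int = 16, heads: int = 4):
        super().__init__()
        self.aux = nn.Parameter(torch.randn(aux_len, dim))  # learned auxiliary sequence
        self.pack = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.unpack = nn.MultiheadAttention(dim, heads, batch_first=True)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq_len, dim)
        p = self.aux.unsqueeze(0).expand(x.size(0), -1, -1)  # (batch, aux_len, dim)
        packed, _ = self.pack(p, x, x)                       # auxiliary attends over the input
        out, _ = self.unpack(x, packed, packed)              # input attends over the packed summary
        return out

if __name__ == "__main__":
    layer = NestedLinearAttention()
    y = layer(torch.randn(2, 100, 64))
    print(y.shape)  # torch.Size([2, 100, 64])
```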
@sinongwang
Sinong Wang
5 years
You don't need a 175B GPT-3 for few-shot learning. All you need is entailment! Check out our new preprint: https://t.co/dknCCTUMoJ In short, we propose a new method that turns a small LM into a better few-shot learner. @Han_Fang_ @MadianKhabsa @hanna_mao @gabema
3 replies · 17 reposts · 88 likes
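The tweet gives the framing, turn classification into entailment so a small LM becomes a strong few-shot learner, but not the recipe. As a loose illustration of that framing only, the snippet below scores label hypotheses against the input with an off-the-shelf NLI model via Hugging Face's zero-shot-classification pipeline; the model choice and hypothesis template are assumptions, and this is not the paper's few-shot fine-tuning procedure.

```python
# Loose illustration of the "classification as entailment" framing using an
# off-the-shelf NLI model; not the paper's actual few-shot training recipe.
from transformers import pipeline

classifier = pipeline("zero-shot-classification", model="facebook/bart-large-mnli")

text = "The movie was a complete waste of two hours."
result = classifier(
    text,
    candidate_labels=["positive", "negative"],
    hypothesis_template="This review is {}.",  # each label becomes an entailment hypothesis
)
for label, score in zip(result["labels"], result["scores"]):
    print(f"{score:.3f}  {label}")
```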
@sinongwang
Sinong Wang
6 years
SOTA in NLP is typically achieved by LM pretraining followed by fine-tuning. Our recent ACL paper shows that pretraining has diminishing returns as the number of training examples increases, and that an LSTM can be within 1 percent of BERT models. Link: https://t.co/9ZhqbUmCAF
4 replies · 56 reposts · 243 likes
@ykilcher
Yannic Kilcher 🇸🇨
6 years
The Linformer projects self-attention into a lower-dimensional space and achieves linear-time instead of quadratic resource requirements. Independent of sequence length! 💪 Watch the video here: https://t.co/ZKw66C2idf @sinongwang @belindazli @MadianKhabsa @Han_Fang_ @facebookai
11 replies · 42 reposts · 214 likes
@sinongwang
Sinong Wang
6 years
Thrilled to share our new work! "Linformer: Self-attention with Linear Complexity". We show that self-attention is low rank, and introduce a linear-time transformer that performs on par with traditional transformers. Check our here: https://t.co/yLATBD85lE
7 replies · 86 reposts · 340 likes
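The tweet states the finding, self-attention is approximately low rank, and one way that turns into linear-time attention is to project keys and values along the sequence dimension (n down to a fixed k) before the softmax, which is what this simplified single-head sketch does. Shapes, names, and the projection initialization below are illustrative, not the paper's exact configuration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class LowRankProjectedAttention(nn.Module):
    """Single-head sketch of linear-complexity attention: project keys and
    values along the sequence dimension (n -> k) before attending, so the
    score matrix is (n x k) instead of (n x n). Illustrative shapes only."""

    def __init__(self, dim: int = 64, max_len: int = 1024, k: int = 128):
        super().__init__()
        self.q = nn.Linear(dim, dim)
        self.kv = nn.Linear(dim, 2 * dim)
        # Learned projections over the sequence dimension.
        self.proj_k = nn.Parameter(torch.randn(k, max_len) / max_len ** 0.5)
        self.proj_v = nn.Parameter(torch.randn(k, max_len) / max_len ** 0.5)
        self.scale = dim ** -0.5

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, n, dim) with n <= max_len
        n = x.size(1)
        q = self.q(x)
        keys, values = self.kv(x).chunk(2, dim=-1)
        keys = torch.einsum("kn,bnd->bkd", self.proj_k[:, :n], keys)      # (batch, k, dim)
        values = torch.einsum("kn,bnd->bkd", self.proj_v[:, :n], values)  # (batch, k, dim)
        attn = F.softmax(torch.einsum("bnd,bkd->bnk", q, keys) * self.scale, dim=-1)
        return torch.einsum("bnk,bkd->bnd", attn, values)                 # (batch, n, dim)

if __name__ == "__main__":
    layer = LowRankProjectedAttention()
    print(layer(torch.randn(2, 512, 64)).shape)  # torch.Size([2, 512, 64])
```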