Ayush Kaushal Profile
Ayush Kaushal

@_AyushKaushal

Followers
543
Following
211
Media
4
Statuses
68

Open Source LLMs @Mila_Quebec, @umontreal, @nolanoorg · Z Fellows · Former Research @Google, @UTAustin, @IITKGP

Joined February 2020
@_AyushKaushal
Ayush Kaushal
4 months
RT @imtejas13: 🎉 Thrilled to share that our paper "Surprising effectiveness of pretraining ternary language models at scale" earned a spotl….
0
12
0
@_AyushKaushal
Ayush Kaushal
1 year
Went into this project seeking to disprove the viability of Ternary LLMs. After a few modifications, I was convinced otherwise. Check out the paper, models and scaling laws for yourself. Bonus: we also analyse GPU trends and what ternary modeling means for future accelerators.
@NolanoOrg
Nolano.ai
1 year
🚀 SpectraSuite of Ternary and FP16 LLMs 🚀. We’re thrilled to release the Spectra Suite of open ternary (TriLMs) and FP16 (FloatLMs) language models from 99M to 3.9B parameters. At billion+ parameter scale, TriLMs up to 10x smaller can match the performance of FloatLMs. 1/5
0
1
5
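A minimal sketch of the idea behind ternary (TriLM-style) weights: every weight is constrained to {-1, 0, +1} times a per-tensor scale. The threshold-and-scale rule below is a common heuristic (TWN-style) chosen for illustration, not necessarily the exact recipe used in the Spectra suite.

```python
import numpy as np

def ternarize(w: np.ndarray, delta_frac: float = 0.7):
    """Map float weights to {-1, 0, +1} * scale (illustrative heuristic).

    delta_frac * mean(|w|) is used as the zeroing threshold; the scale is the
    mean magnitude of the surviving weights. This mirrors a classic ternary
    quantization rule, not necessarily the Spectra/TriLM training procedure.
    """
    delta = delta_frac * np.abs(w).mean()
    t = np.sign(w) * (np.abs(w) > delta)          # entries in {-1, 0, +1}
    mask = t != 0
    scale = np.abs(w[mask]).mean() if mask.any() else 0.0
    return t.astype(np.int8), scale

w = np.random.randn(1024, 1024).astype(np.float32)
t, s = ternarize(w)
w_hat = s * t                                      # dequantized approximation
print("nonzero fraction:", (t != 0).mean(), "mean abs error:", np.abs(w - w_hat).mean())
```

Stored this way, each weight needs under 2 bits plus one shared scale, which is where the roughly 10x size reduction over FP16 comes from.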
@_AyushKaushal
Ayush Kaushal
1 year
The AI community in 2014 recognized cramming^ as the bottleneck to modeling language. We got transformers/attention. The AI community in 2024 will* recognize fixed, downscaled image resolution as the bottleneck to modeling vision+language. ^ * OpenAI already realized this >=2 yrs ago.
0
0
2
@_AyushKaushal
Ayush Kaushal
2 years
Excited to share our work on LLM continual pretraining. What excites me the most: continual pretraining can be used to extend a model's capabilities to new domains and languages, and when done right, it can avoid catastrophic forgetting over English (see CodeLlama and LeoLM).
@NolanoOrg
Nolano.ai
2 years
We are pleased to introduce Hi-NOLIN, the best-performing 9B Hindi-English bilingual LLM. Blog:
0
1
13
@_AyushKaushal
Ayush Kaushal
2 years
LoRD provides an alternative to quantization for LLM compression. The compressed model is differentiable and can use existing (float) GEMMs in PyTorch. It can also be combined with quantization. Monolingual code LLMs can be decomposed in one shot without the need for retraining.
@NolanoOrg
Nolano.ai
2 years
1/ Introducing LoRD: Low-Rank Decomposition of Monolingual Code LLMs for one-shot compression. Paper:
0
3
3
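A minimal sketch of what one-shot low-rank decomposition of a weight matrix looks like via truncated SVD; LoRD's actual procedure (rank selection, calibration) is in the paper, and this only shows why the result stays differentiable and runs on ordinary float GEMMs.

```python
import numpy as np

def low_rank_decompose(w: np.ndarray, rank: int):
    """One-shot truncated SVD: W (d_out x d_in) -> A (d_out x r) @ B (r x d_in).

    The factors are plain float matrices, so the compressed layer is just two
    GEMMs and remains fully differentiable. Illustrative only; see the LoRD
    paper for how ranks are actually chosen per layer.
    """
    u, s, vt = np.linalg.svd(w, full_matrices=False)
    a = u[:, :rank] * s[:rank]      # fold singular values into the left factor
    b = vt[:rank, :]
    return a, b

w = np.random.randn(1024, 1024).astype(np.float32)
a, b = low_rank_decompose(w, rank=128)
x = np.random.randn(1024).astype(np.float32)
print("params:", a.size + b.size, "vs", w.size)          # 4x fewer at rank 128
print("output error:", np.abs(w @ x - a @ (b @ x)).mean())
```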
@_AyushKaushal
Ayush Kaushal
2 years
RT @amasad: Bard can read your Replit or Github projects in less than a second and make suggestions 🤯
0
197
0
@_AyushKaushal
Ayush Kaushal
2 years
That is true. But Copilot and ChatGPT have crossed the threshold required to productize AI. We now see exponential growth in *products* using AI. AI is rapidly becoming pervasive. I remember Geoff Hinton saying GPT-2 impressed him, not ChatGPT/GPT-3.
@amasad
Amjad Masad
2 years
@paulg We’re already deep in the bowels of diminishing returns for scaling GPT. Add to that GPU shortage, things have tremendously slowed down.
1
1
9
@_AyushKaushal
Ayush Kaushal
2 years
RT @pmddomingos: ELIZA still shows through in ChatGPT. It just takes longer to see it.
0
4
0
@_AyushKaushal
Ayush Kaushal
2 years
The weirdest part of LLaMa's architecture is that it doesn't have any additive parameter terms. The MLP is missing a bias since it uses a gated FFN. Its RMSNorm, unlike LayerNorm, only has a scaling factor. But the residuals & RoPE (and the matmuls internally) still perform addition operations.
1
2
8
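A minimal sketch contrasting the two norms the tweet refers to: RMSNorm (as used in LLaMA) keeps only a multiplicative gain, while LayerNorm re-centers the input and carries both a gain and an additive bias.

```python
import numpy as np

def layer_norm(x, gamma, beta, eps=1e-5):
    # LayerNorm: subtract the mean, normalize by variance, then apply
    # a gain *and* an additive bias (the extra parameter LLaMa drops).
    mu = x.mean(-1, keepdims=True)
    var = x.var(-1, keepdims=True)
    return gamma * (x - mu) / np.sqrt(var + eps) + beta

def rms_norm(x, gamma, eps=1e-5):
    # RMSNorm: no mean subtraction and no bias -- only a scaling factor.
    rms = np.sqrt((x * x).mean(-1, keepdims=True) + eps)
    return gamma * x / rms

d = 8
x = np.random.randn(2, d).astype(np.float32)
gamma, beta = np.ones(d, np.float32), np.zeros(d, np.float32)
print(layer_norm(x, gamma, beta).shape, rms_norm(x, gamma).shape)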
@_AyushKaushal
Ayush Kaushal
2 years
This is either an intentional April Fool's prank or an unwitting error: LLaMa is not sparse. The reported 4GB RAM usage is a measurement error (check ); on my 16GB-RAM M1 CPU, it leads to poor CPU utilization & more time spent accessing memory/swap space.
@eugeneyan
Eugene Yan
2 years
This is why CS fundamentals continue to be crucial: LLaMA 30B only needs 4gb of memory if we use mmap(). Not sure why this works but one reason could be that 30B weights are sparse. Thus, lazy loading the fraction of needed weights reduces memory usage.
1
1
6
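A small illustration (not llama.cpp's code) of why mmap-ed weights make resident memory look deceptively small: pages are only faulted in when touched, so a naive RSS reading undercounts, and once every weight is actually read the OS must pull the full file through the page cache, evicting and re-reading under memory pressure. The file name and sizes below are arbitrary.

```python
import numpy as np

# Write a large weight file, then memory-map it. Until pages are touched,
# almost nothing is resident; touching all of them forces the whole file
# through the page cache (and swap, on a machine with less free RAM).
w = np.random.randn(1_000_000, 64).astype(np.float32)   # ~256 MB on disk
np.save("weights.npy", w)

m = np.load("weights.npy", mmap_mode="r")   # maps the file, reads ~nothing yet
first_rows = np.asarray(m[:1000])           # faults in only a few pages
full_sum = np.asarray(m).sum()              # now every page must be read
print(first_rows.shape, full_sum)
```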
@_AyushKaushal
Ayush Kaushal
2 years
We are making it easier to build applications on LLMs that run locally, via a Python interface to fast C++ inference. Check out:. In the next 24 hrs we will also be adding CodeGen and LLaMa/Alpaca. Let us know how we can make it easier for you to use.
@NolanoOrg
Nolano.ai
2 years
Introducing Cformers 🚀 - "Transformers with a C-backend for lightning-fast CPU inference".
🔁 Switch between SoTA models
📥 Precompressed models
⬇️ Automatic downloading
🐍 Python interface for easy use
Try it today! #Cformers #AI #LLMs #AGI
Github:
3
4
17
@_AyushKaushal
Ayush Kaushal
2 years
Results from experiments on quantizing LLMs:
- int3 GPTQ-quantized 13B LLaMa outperforms FP16 7B LLaMa.
- GPTQ may not always be better than rounding-to-nearest when the zero-offset is fixed.
- 2-bit quantization is still a long shot for 13B LLaMa, but it works better for larger models.
@NolanoOrg
Nolano.ai
2 years
🚀 4GB RAM is a lot for running int4 LLaMa, so we are compressing it further. Here's our report on reducing the size of these models. Tl;dr: up to 15% weight reduction for 7B & 30% for 13B is possible using GPTQ.
2
0
13
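A minimal sketch of the rounding-to-nearest baseline with a fixed (symmetric) zero-offset that the tweet compares GPTQ against; the per-row scaling rule and bit width are illustrative choices, not the exact setup from the report.

```python
import numpy as np

def rtn_quantize(w: np.ndarray, bits: int = 3):
    """Per-row round-to-nearest with the zero-point pinned at 0.

    The scale maps each row's max |w| onto the signed integer grid; because
    the zero-offset is fixed, quantization is a single rounding step with no
    per-column error correction (unlike GPTQ). Illustrative baseline only.
    """
    qmax = 2 ** (bits - 1) - 1                       # e.g. 3 for int3
    scale = np.abs(w).max(axis=1, keepdims=True) / qmax
    q = np.clip(np.round(w / scale), -qmax - 1, qmax).astype(np.int8)
    return q, scale

w = np.random.randn(64, 512).astype(np.float32)
q, s = rtn_quantize(w, bits=3)
print("mean quantization error:", np.abs(w - q * s).mean())
```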
@_AyushKaushal
Ayush Kaushal
2 years
Soon, personalized models more powerful than ChatGPT will reside and run locally on personal devices: every PC, tablet and smartphone. Get excited! We will be sharing more news soon.
@NolanoOrg
Nolano.ai
2 years
🚀 You can now achieve GPT-3 level performance on your Mac at 12 tokens/sec using compressed LLaMa 7B and optimized inference with just 4GB of RAM. Join our Discord for more updates: #GPT3 #ChatGPT #AGI #LLaMa
1
2
19
@_AyushKaushal
Ayush Kaushal
2 years
RT @NolanoOrg: SayIt speaks your language - now supporting Hinglish and more!
0
1
0
@_AyushKaushal
Ayush Kaushal
2 years
RT @ID_AA_Carmack: I wish more people talked in terms of distributions of outcomes. So much discourse is around “facts” as sound bites, onl….
0
79
0
@_AyushKaushal
Ayush Kaushal
2 years
Are @TechCrunch's articles written by GPT? Hallucinating facts. Claiming @sama to be a co-founder of @ycombinator instead of its president. #TechNews #Startup #AIFails
1
0
1
@_AyushKaushal
Ayush Kaushal
2 years
RT @naval: If you think the AI is sentient, you just failed the Turing Test from the other side.
0
951
0
@_AyushKaushal
Ayush Kaushal
2 years
RT @MovingToTheSun: My new favorite thing - Bing's new ChatGPT bot argues with a user, gaslights them about the current year being 2022, sa….
0
10K
0
@_AyushKaushal
Ayush Kaushal
2 years
RT @gdb: Everyone talking about the future of search, but I'm particularly excited about the future of the browser — Edge will now include….
0
275
0
@_AyushKaushal
Ayush Kaushal
2 years
RT @sundarpichai: 1/ In 2021, we shared next-gen language + conversation capabilities powered by our Language Model for Dialogue Applicatio….
0
3K
0