Ayush Kaushal Profile
Ayush Kaushal

@_AyushKaushal

Followers
543
Following
211
Media
4
Statuses
68

Open Source LLMs @Mila_Quebec, @umontreal, @nolanoorg · Z Fellows · Former Research @Google, @UTAustin, @IITKGP

Joined February 2020
@_AyushKaushal
Ayush Kaushal
4 months
RT @imtejas13: 🎉 Thrilled to share that our paper "Surprising effectiveness of pretraining ternary language models at scale" earned a spotl….
0
12
0
@_AyushKaushal
Ayush Kaushal
1 year
Went into this project seeking to disprove the viability of Ternary LLMs. After a few modifications, I was convinced otherwise. Check out the paper, models and scaling laws for yourself. Bonus: we also analyse GPU trends and what ternary modeling means for future accelerators.
@NolanoOrg
Nolano.ai
1 year
🚀 SpectraSuite of Ternary and FP16 LLMs 🚀. We’re thrilled to release the Spectra Suite of open ternary (TriLMs) and FP16 (FloatLMs) language models from 99M to 3.9B parameters. At billion+ parameter scale, TriLMs up to 10x smaller can match the performance of FloatLMs. 1/5
0
1
5
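A minimal sketch of the idea behind ternary (TriLM-style) weights: every weight is constrained to {-1, 0, +1} times a per-tensor scale. The threshold-and-scale rule below is a common heuristic (TWN-style) chosen for illustration, not necessarily the exact recipe used in the Spectra suite.

```python
import numpy as np

def ternarize(w: np.ndarray, delta_frac: float = 0.7):
    """Map float weights to {-1, 0, +1} * scale (illustrative heuristic).

    delta_frac * mean(|w|) is used as the zeroing threshold; the scale is the
    mean magnitude of the surviving weights. This mirrors a classic ternary
    quantization rule, not necessarily the Spectra/TriLM training procedure.
    """
    delta = delta_frac * np.abs(w).mean()
    t = np.sign(w) * (np.abs(w) > delta)          # entries in {-1, 0, +1}
    mask = t != 0
    scale = np.abs(w[mask]).mean() if mask.any() else 0.0
    return t.astype(np.int8), scale

w = np.random.randn(1024, 1024).astype(np.float32)
t, s = ternarize(w)
w_hat = s * t                                      # dequantized approximation
print("nonzero fraction:", (t != 0).mean(), "mean abs error:", np.abs(w - w_hat).mean())
```

Stored this way, each weight needs under 2 bits plus one shared scale, which is where the roughly 10x size reduction over FP16 comes from.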
@_AyushKaushal
Ayush Kaushal
1 year
The AI community in 2014 recognized cramming^ as the bottleneck to modeling language. We got transformers/attention. The AI community in 2024 will* recognize fixed, downscaled image resolution as the bottleneck to modeling vision+language. ^ * OpenAI already realized this >=2 yrs ago.
0
0
2
@_AyushKaushal
Ayush Kaushal
2 years
Excited to share our work on LLM continual pretraining. What excites me the most: continual pretraining can be used to extend a model's capabilities to new domains and languages, and when done right, it can avoid catastrophic forgetting over English (see CodeLlama and LeoLM).
@NolanoOrg
Nolano.ai
2 years
We are pleased to introduce Hi-NOLIN, the best-performing 9B Hindi-English bilingual LLM. Blog:
0
1
13
@_AyushKaushal
Ayush Kaushal
2 years
LoRD provides an alternative to quantization for LLM compression. The compressed model is differentiable and can use existing (float) GEMMs in PyTorch. It can also be combined with quantization. Monolingual code LLMs can be decomposed in one shot without the need for retraining.
@NolanoOrg
Nolano.ai
2 years
1/ Introducing LoRD: Low-Rank Decomposition of Monolingual Code LLMs for one-shot compression. Paper:
0
3
3
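A minimal sketch of what one-shot low-rank decomposition of a weight matrix looks like via truncated SVD; LoRD's actual procedure (rank selection, calibration) is in the paper, and this only shows why the result stays differentiable and runs on ordinary float GEMMs.

```python
import numpy as np

def low_rank_decompose(w: np.ndarray, rank: int):
    """One-shot truncated SVD: W (d_out x d_in) -> A (d_out x r) @ B (r x d_in).

    The factors are plain float matrices, so the compressed layer is just two
    GEMMs and remains fully differentiable. Illustrative only; see the LoRD
    paper for how ranks are actually chosen per layer.
    """
    u, s, vt = np.linalg.svd(w, full_matrices=False)
    a = u[:, :rank] * s[:rank]      # fold singular values into the left factor
    b = vt[:rank, :]
    return a, b

w = np.random.randn(1024, 1024).astype(np.float32)
a, b = low_rank_decompose(w, rank=128)
x = np.random.randn(1024).astype(np.float32)
print("params:", a.size + b.size, "vs", w.size)          # 4x fewer at rank 128
print("output error:", np.abs(w @ x - a @ (b @ x)).mean())
```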
@_AyushKaushal
Ayush Kaushal
2 years
RT @amasad: Bard can read your Replit or Github projects in less than a second and make suggestions 🤯
0
197
0
@_AyushKaushal
Ayush Kaushal
2 years
That is true. But Copilot and ChatGPT have crossed the threshold required to productize AI. We now see exponential growth in *products* using AI. AI is rapidly becoming pervasive. I remember Geoff Hinton saying GPT-2 impressed him, not ChatGPT/GPT-3.
@amasad
Amjad Masad
2 years
@paulg We’re already deep in the bowels of diminishing returns for scaling GPT. Add to that GPU shortage, things have tremendously slowed down.
1
1
9
@_AyushKaushal
Ayush Kaushal
2 years
RT @pmddomingos: ELIZA still shows through in ChatGPT. It just takes longer to see it.
0
4
0
@_AyushKaushal
Ayush Kaushal
2 years
The weirdest part of LLaMa's architecture is that it doesn't have any additive parameter terms. The MLP is missing a bias since it uses a gated FFN. Its RMSNorm, unlike LayerNorm, only has a scaling factor. But the residuals & RoPE (and the matmuls internally) still perform addition operations.
1
2
8
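A minimal sketch contrasting the two norms the tweet refers to: RMSNorm (as used in LLaMA) keeps only a multiplicative gain, while LayerNorm re-centers the input and carries both a gain and an additive bias.

```python
import numpy as np

def layer_norm(x, gamma, beta, eps=1e-5):
    # LayerNorm: subtract the mean, normalize by variance, then apply
    # a gain *and* an additive bias (the extra parameter LLaMa drops).
    mu = x.mean(-1, keepdims=True)
    var = x.var(-1, keepdims=True)
    return gamma * (x - mu) / np.sqrt(var + eps) + beta

def rms_norm(x, gamma, eps=1e-5):
    # RMSNorm: no mean subtraction and no bias -- only a scaling factor.
    rms = np.sqrt((x * x).mean(-1, keepdims=True) + eps)
    return gamma * x / rms

d = 8
x = np.random.randn(2, d).astype(np.float32)
gamma, beta = np.ones(d, np.float32), np.zeros(d, np.float32)
print(layer_norm(x, gamma, beta).shape, rms_norm(x, gamma).shape)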
@_AyushKaushal
Ayush Kaushal
2 years
This is either an intentional April Fool's prank or an unwitting error: LLaMa is not sparse. The reported 4GB RAM usage is a measurement error (check ); on my 16GB-RAM M1 CPU, it leads to poor CPU utilization & more time spent accessing memory/swap space.
@eugeneyan
Eugene Yan
2 years
This is why CS fundamentals continue to be crucial: LLaMA 30B only needs 4gb of memory if we use mmap(). Not sure why this works but one reason could be that 30B weights are sparse. Thus, lazy loading the fraction of needed weights reduces memory usage.
1
1
6
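A small illustration (not llama.cpp's code) of why mmap-ed weights make resident memory look deceptively small: pages are only faulted in when touched, so a naive RSS reading undercounts, and once every weight is actually read the OS must pull the full file through the page cache, evicting and re-reading under memory pressure. The file name and sizes below are arbitrary.

```python
import numpy as np

# Write a large weight file, then memory-map it. Until pages are touched,
# almost nothing is resident; touching all of them forces the whole file
# through the page cache (and swap, on a machine with less free RAM).
w = np.random.randn(1_000_000, 64).astype(np.float32)   # ~256 MB on disk
np.save("weights.npy", w)

m = np.load("weights.npy", mmap_mode="r")   # maps the file, reads ~nothing yet
first_rows = np.asarray(m[:1000])           # faults in only a few pages
full_sum = np.asarray(m).sum()              # now every page must be read
print(first_rows.shape, full_sum)
```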
@_AyushKaushal
Ayush Kaushal
2 years
We are making it easier to build applications on LLMs that run locally, via a Python interface to fast C++ inference. Check out:. In the next 24 hrs we will also be adding CodeGen and LLaMa/Alpaca. Let us know how we can make it easier for you to use.
@NolanoOrg
Nolano.ai
2 years
Introducing Cformers 🚀 - "Transformers with a C-backend for lightning-fast CPU inference".
🔁 Switch between SoTA models
📥 Precompressed models
⬇️ Automatic downloading
🐍 Python interface for easy use
Try it today! #Cformers #AI #LLMs #AGI
Github:
3
4
17
@_AyushKaushal
Ayush Kaushal
2 years
Results from experiments on quantizing LLMs:
- int3 GPTQ-quantized 13B LLaMa outperforms FP16 7B LLaMa.
- GPTQ may not always be better than rounding-to-nearest when the zero-offset is fixed.
- 2-bit quantization is still a long shot for 13B LLaMa, but it works better for larger models.
@NolanoOrg
Nolano.ai
2 years
🚀 4GB RAM is a lot for running int4 LLaMa, so we are compressing it further. Here's our report on reducing the size of these models. Tl;dr: up to 15% weight reduction for 7B & 30% for 13B is possible using GPTQ.
2
0
13
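A minimal sketch of the rounding-to-nearest baseline with a fixed (symmetric) zero-offset that the tweet compares GPTQ against; the per-row scaling rule and bit width are illustrative choices, not the exact setup from the report.

```python
import numpy as np

def rtn_quantize(w: np.ndarray, bits: int = 3):
    """Per-row round-to-nearest with the zero-point pinned at 0.

    The scale maps each row's max |w| onto the signed integer grid; because
    the zero-offset is fixed, quantization is a single rounding step with no
    per-column error correction (unlike GPTQ). Illustrative baseline only.
    """
    qmax = 2 ** (bits - 1) - 1                       # e.g. 3 for int3
    scale = np.abs(w).max(axis=1, keepdims=True) / qmax
    q = np.clip(np.round(w / scale), -qmax - 1, qmax).astype(np.int8)
    return q, scale

w = np.random.randn(64, 512).astype(np.float32)
q, s = rtn_quantize(w, bits=3)
print("mean quantization error:", np.abs(w - q * s).mean())
```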
@_AyushKaushal
Ayush Kaushal
2 years
Soon, personalized models more powerful than ChatGPT will reside and run locally on personal devices: every PC, tablet and smartphone. Get excited! We will be sharing more news soon.
@NolanoOrg
Nolano.ai
2 years
🚀 You can now achieve GPT-3 level performance on your Mac at 12 tokens/sec using compressed LLaMa 7B and optimized inference with just 4GB of RAM. Join our Discord for more updates: #GPT3 #ChatGPT #AGI #LLaMa
1
2
19
@_AyushKaushal
Ayush Kaushal
2 years
RT @NolanoOrg: SayIt speaks your language - now supporting Hinglish and more!
0
1
0
@_AyushKaushal
Ayush Kaushal
2 years
RT @ID_AA_Carmack: I wish more people talked in terms of distributions of outcomes. So much discourse is around “facts” as sound bites, onl….
0
79
0
@_AyushKaushal
Ayush Kaushal
2 years
Are @TechCrunch's articles written by GPT? Hallucinating facts. Claiming @sama to be a co-founder of @ycombinator instead of its president. #TechNews #Startup #AIFails
1
0
1
@_AyushKaushal
Ayush Kaushal
2 years
RT @naval: If you think the AI is sentient, you just failed the Turing Test from the other side.
0
951
0
@_AyushKaushal
Ayush Kaushal
2 years
RT @MovingToTheSun: My new favorite thing - Bing's new ChatGPT bot argues with a user, gaslights them about the current year being 2022, sa….
0
10K
0
@_AyushKaushal
Ayush Kaushal
2 years
RT @gdb: Everyone talking about the future of search, but I'm particularly excited about the future of the browser — Edge will now include….
0
275
0
@_AyushKaushal
Ayush Kaushal
2 years
RT @sundarpichai: 1/ In 2021, we shared next-gen language + conversation capabilities powered by our Language Model for Dialogue Applicatio….
0
3K
0