
Ayush Kaushal
@_AyushKaushal
Followers: 543 · Following: 211 · Media: 4 · Statuses: 68
Open Source LLMs @Mila_Quebec, @umontreal, @nolanoorg · Z Fellows · Former Research @Google, @UTAustin, @IITKGP
Joined February 2020
RT @imtejas13: 🎉 Thrilled to share that our paper "Surprising effectiveness of pretraining ternary language models at scale" earned a spotl….
0 · 12 · 0
Went into this project seeking to disprove the viability of ternary LLMs. After a few modifications, I was convinced otherwise. Check out the paper, models and scaling laws for yourself. Bonus: we also analyse GPU trends and what ternary modeling means for future accelerators.
🚀 SpectraSuite of Ternary and FP16 LLMs 🚀 We're thrilled to release the Spectra Suite of open ternary (TriLMs) and FP16 (FloatLMs) language models, from 99M to 3.9B parameters. At billion+ parameter scale, TriLMs up to 10x smaller can match the performance of FloatLMs. 1/5
0 · 1 · 5
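For readers unfamiliar with ternary weights, here is a minimal sketch of what a ternary representation looks like, using an absmean-style per-tensor scale. This illustrates the general idea only; it is not necessarily the exact TriLM training recipe, and `ternarize` is a hypothetical helper.

```python
import torch

def ternarize(w: torch.Tensor, eps: float = 1e-8):
    """Illustrative ternarization: map float weights to {-1, 0, +1}
    times a single per-tensor scale (absmean-style; not necessarily
    the TriLM recipe)."""
    scale = w.abs().mean().clamp(min=eps)          # per-tensor scale
    w_tern = (w / scale).round().clamp(-1, 1)      # entries in {-1, 0, +1}
    return w_tern, scale

# Storage intuition: a ternary entry needs ~1.58 bits (log2 3) versus
# 16 bits for FP16, which is roughly where the ~10x size gap comes from.
w = torch.randn(4096, 4096)
w_tern, scale = ternarize(w)
w_hat = w_tern * scale   # dequantized approximation used in the matmul
```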
Excited to share our work on LLM continual pretraining. What excites me the most:
- Continual pretraining can be used to extend a model's capabilities to new domains and languages.
- When done right, it can avoid catastrophic forgetting on English (see CodeLlama and LeoLM).
0 · 1 · 13
LoRD provides an alternative to quantization for LLM compression. The compressed model is differentiable and can use existing (float) GEMMs in PyTorch. It can also be combined with quantization. Monolingual code LLMs can be decomposed in one shot without the need for retraining.
1/ Introducing LoRD: Low-Rank Decomposition of Monolingual Code LLMs for one-shot compression. Paper:
0 · 3 · 3
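As a companion to the LoRD announcement above, here is a minimal sketch of low-rank decomposition of a single weight matrix via truncated SVD, so inference stays two plain float GEMMs. This illustrates the generic technique under an arbitrary rank choice, not LoRD's exact one-shot procedure.

```python
import torch

def low_rank_decompose(w: torch.Tensor, rank: int):
    """Approximate W (out x in) as B @ A with B: (out x rank), A: (rank x in).
    Generic truncated-SVD sketch; LoRD's actual procedure may differ."""
    U, S, Vh = torch.linalg.svd(w, full_matrices=False)
    B = U[:, :rank] * S[:rank]   # fold singular values into the left factor
    A = Vh[:rank, :]
    return B, A

# Replacing y = x @ W.T with y = (x @ A.T) @ B.T keeps everything in float
# GEMMs and stays differentiable, so it composes with quantization.
W = torch.randn(4096, 4096)
B, A = low_rank_decompose(W, rank=1024)    # 2*4096*1024 vs 4096*4096 params
x = torch.randn(8, 4096)
y_approx = (x @ A.T) @ B.T
```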
RT @amasad: Bard can read your Replit or Github projects in less than a second and make suggestions 🤯
0 · 197 · 0
That is true. But Copilot and ChatGPT have crossed the threshold required to productize AI. We now see exponential growth in *products* using AI. AI is rapidly becoming pervasive. I remember Geoff Hinton saying GPT-2 impressed him, not ChatGPT/GPT-3.
@paulg We're already deep in the bowels of diminishing returns for scaling GPT. Add to that the GPU shortage, and things have slowed down tremendously.
1 · 1 · 9
This is either an intentional April Fools' prank or an unwitting error: LLaMA is not sparse. The reported 4GB RAM usage is a measurement error (check ); on my 16GB RAM M1 CPU, it leads to poor CPU utilization and more time spent accessing memory/swap space.
This is why CS fundamentals continue to be crucial: LLaMA 30B only needs 4GB of memory if we use mmap(). Not sure why this works, but one reason could be that the 30B weights are sparse. Thus, lazily loading only the fraction of weights that are needed reduces memory usage.
1 · 1 · 6
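For context on the mmap() debate above, here is a minimal sketch of memory-mapping a weight file with numpy; the filename is hypothetical. Pages are only brought into physical RAM when touched, so resident memory can look far smaller than the full model even though nothing is sparse.

```python
import numpy as np

# Hypothetical on-disk FP16 weight file; np.memmap maps it without reading it.
weights = np.memmap("llama-30b-weights.bin", dtype=np.float16, mode="r")

# Almost no physical RAM is used yet: the OS lazily pages in only what we touch.
first_chunk = np.array(weights[:1_000_000])   # faults in ~2 MB of pages

# A full forward pass touches every weight, so everything eventually gets paged
# in; with less RAM than the model, the OS evicts and re-reads pages, which is
# the slowdown and poor CPU utilization described in the reply above.
```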
We are making it easier to build applications on LLMs that run locally, via a Python interface to fast C++ inference. Check out: In the next 24 hrs we will also be adding CodeGen and LLaMA/Alpaca. Let us know how we can make it easier for you to use.
Introducing Cformers 🚀 - "Transformers with a C-backend for lightning-fast CPU inference".
🔁 Switch between SoTA models
📥 Precompressed models
⬇️ Automatic downloading
🐍 Python interface for easy use
Try it today! #Cformers #AI #LLMs #AGI. GitHub:
3 · 4 · 17
Results from experiments on quantizing LLMs:
- int3 GPTQ-quantized 13B LLaMA outperforms FP16 7B LLaMA.
- GPTQ may not always be better than rounding-to-nearest when the zero-offset is fixed (see the sketch below).
- 2-bit quantization is still a long shot for 13B LLaMA, but it works better for larger models.
🚀 4GB RAM is a lot for running int4 LLaMA, so we are compressing it further. Here's our report on reducing the size of these models: TL;DR, up to 15% weight reduction for 7B and 30% for 13B is possible using GPTQ.
2 · 0 · 13
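To make the rounding-to-nearest baseline mentioned above concrete, here is a minimal per-row symmetric RTN sketch with the zero-offset fixed at 0. GPTQ additionally applies error-compensating weight updates, which are not shown, and the bit-width here is just an example.

```python
import torch

def rtn_quantize(w: torch.Tensor, bits: int = 3):
    """Per-row symmetric round-to-nearest: the zero-offset is fixed at 0,
    so only one scale per output row is stored."""
    qmax = 2 ** (bits - 1) - 1                        # e.g. 3 for int3
    scale = w.abs().amax(dim=1, keepdim=True) / qmax
    q = (w / scale).round().clamp(-qmax - 1, qmax)    # integer codes
    return q, scale

def dequantize(q: torch.Tensor, scale: torch.Tensor):
    return q * scale

W = torch.randn(4096, 4096)
q, s = rtn_quantize(W, bits=3)
err = (dequantize(q, s) - W).abs().mean()  # reconstruction error RTN trades off
```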
Soon, personalized models more powerful than ChatGPT will be residing and running locally on personal devices: every PC, tablet and smartphone. Get excited! We will be sharing more news soon.
🚀 You can now achieve GPT-3-level performance on your Mac at 12 tokens/sec using compressed LLaMA 7B and optimized inference, with just 4GB of RAM. Join our Discord for more updates: #GPT3 #ChatGPT #AGI #LLaMa
1 · 2 · 19
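A rough back-of-the-envelope for the 4GB figure, assuming roughly 7B parameters stored at 4 bits each; the headroom attributed to activations and the KV cache is an illustrative guess, not a measurement.

```python
params = 7e9                     # approximate LLaMA 7B parameter count
bits_per_weight = 4              # int4-compressed weights
weight_bytes = params * bits_per_weight / 8
print(f"weights alone: {weight_bytes / 2**30:.2f} GiB")   # ~3.26 GiB

# The rest of a ~4 GB budget goes to activations, the KV cache and runtime
# buffers, which is why 4-bit 7B models sit right at this limit.
```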
RT @ID_AA_Carmack: I wish more people talked in terms of distributions of outcomes. So much discourse is around “facts” as sound bites, onl….
0 · 79 · 0
Are @TechCrunch's articles written by GPT? Hallucinating facts. Claiming @sama to be a co-founder of @ycombinator instead of its president. #TechNews #Startup #AIFails
1 · 0 · 1
RT @naval: If you think the AI is sentient, you just failed the Turing Test from the other side.
0 · 951 · 0
RT @MovingToTheSun: My new favorite thing - Bing's new ChatGPT bot argues with a user, gaslights them about the current year being 2022, sa….
0 · 10K · 0
RT @gdb: Everyone talking about the future of search, but I'm particularly excited about the future of the browser — Edge will now include….
0 · 275 · 0
RT @sundarpichai: 1/ In 2021, we shared next-gen language + conversation capabilities powered by our Language Model for Dialogue Applicatio….
0 · 3K · 0