The world's best open-source chat LLM, DBRX, is now available for free on Perplexity Labs Playground, which basically has everything you need for chat, for free, with better LLMs (Haiku, DBRX, Sonar) than 3.5-turbo, the model powering free ChatGPT. Curious
@AravSrinivas
What if the gradients become very small (close to 0)? Won't we have issues like vanishing gradients and very slow learning during backpropagation? The only reason they could be using tanh is that their neural networks are not as deep as GPT models.
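For context, a minimal C sketch of the vanishing-gradient effect this question describes: chaining tanh derivatives (each at most 1) across many layers shrinks the backpropagated gradient. The depth and input value here are made up for illustration, and weight matrices are omitted.

```c
// Toy illustration of vanishing gradients through stacked tanh layers.
// Weights are omitted; we only chain the activation derivatives.
#include <stdio.h>
#include <math.h>

int main(void) {
    double x = 2.0;    // a pre-activation in tanh's saturating region
    double grad = 1.0; // gradient arriving from the loss
    for (int layer = 0; layer < 20; layer++) {
        double a = tanh(x);
        grad *= 1.0 - a * a; // d/dx tanh(x) = 1 - tanh(x)^2, always <= 1
        x = a;               // the activation feeds the next layer
    }
    // After 20 layers the gradient has shrunk by many orders of magnitude.
    printf("gradient after 20 tanh layers: %e\n", grad);
    return 0;
}
```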
A YC startup in the middle of fundraising found a way to cut their costs by 2/3. Now they don't need to raise at all. They can make it to profitability on the money they already have. Bet you can guess what will happen when they tell investors they're shutting down the raise.
Have you ever wanted to train LLMs in pure C without 245MB of PyTorch and 107MB of CPython? No? Well now you can! With llm.c:
To start, it implements GPT-2 training on CPU/fp32 in only ~1,000 lines of clean code. It compiles and runs instantly, and exactly matches the PyTorch reference implementation.
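As a flavor of what fp32 training in plain C looks like, here is a generic AdamW parameter update; llm.c trains GPT-2 with AdamW, but this sketch is my own illustration of the update rule, not code from the repo.

```c
// Generic AdamW update in fp32, plain C. Illustrative sketch of the
// update rule a dependency-free trainer runs; not llm.c's actual code.
#include <math.h>
#include <stddef.h>

void adamw_update(float *p, const float *g, float *m, float *v, size_t n,
                  float lr, float beta1, float beta2, float eps,
                  float weight_decay, int t) {
    // bias-correction factors for step t (1-indexed)
    float bc1 = 1.0f - powf(beta1, (float)t);
    float bc2 = 1.0f - powf(beta2, (float)t);
    for (size_t i = 0; i < n; i++) {
        m[i] = beta1 * m[i] + (1.0f - beta1) * g[i];        // first moment
        v[i] = beta2 * v[i] + (1.0f - beta2) * g[i] * g[i]; // second moment
        float mhat = m[i] / bc1;
        float vhat = v[i] / bc2;
        // decoupled weight decay, per AdamW
        p[i] -= lr * (mhat / (sqrtf(vhat) + eps) + weight_decay * p[i]);
    }
}
```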
After almost a decade, I have made the decision to leave OpenAI. The company’s trajectory has been nothing short of miraculous, and I’m confident that OpenAI will build AGI that is both safe and beneficial under the leadership of @sama, @gdb, @miramurati and now, under the
We have just released 🍷 FineWeb: 15 trillion tokens of high-quality web data.
We filtered and deduplicated all CommonCrawl between 2013 and 2024.
Models trained on FineWeb outperform models trained on RefinedWeb, C4, DolmaV1.6, The Pile and SlimPajama!
Nowadays it feels like people/companies are focusing more on training data to fine-tune their models for a specific use case rather than creating something new in core machine learning.
We're excited to announce that Llama 3 is available on Perplexity Labs and our API. Kudos to the team @AIatMeta for all of the hard work they put into this release. We can't wait to see what you build with it. Try it free at
OpenAI and MSFT want to build Stargate - a $100B GPU super cluster!
Great!
It’s time for Google to announce their $500B super cluster, and for Amazon to double down as well and start talking about their $300B cluster!
They need to keep up with the Joneses 🤣🤣
It’s patently absurd that Apple isn’t smart enough to make their own AI, yet is somehow capable of ensuring that OpenAI will protect your security & privacy!
Apple has no clue what’s actually going on once they hand your data over to OpenAI. They’re selling you down the river.
I saw a job post the other day. 👔
It required 4+ years of experience in FastAPI. 🤦
I couldn't apply as I only have 1.5+ years of experience since I created that thing. 😅
Maybe it's time to re-evaluate that "years of experience = skill level". ♻
@giffmana
@panopstor
@twofifteenam
Normalising outputs to the range [-1, 1] will help increase the stability of the neural network and maintain consistency; also, the gradients of tanh are strongest around zero in this case.
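A small C sketch of the two claims in this reply, with made-up numbers: min-max scaling maps values into [-1, 1], and tanh's gradient, 1 - tanh(x)^2, peaks at exactly 1 at x = 0 and decays away from it.

```c
// Sketch: min-max scale values into [-1, 1] and check tanh's gradient there.
// Toy example with invented data, only to illustrate the reply above.
#include <stdio.h>
#include <math.h>

static double tanh_grad(double x) { return 1.0 - tanh(x) * tanh(x); }

int main(void) {
    double data[] = {3.0, 7.5, 12.0, 18.0};
    int n = sizeof(data) / sizeof(data[0]);
    double lo = data[0], hi = data[0];
    for (int i = 1; i < n; i++) {
        if (data[i] < lo) lo = data[i];
        if (data[i] > hi) hi = data[i];
    }
    for (int i = 0; i < n; i++) {
        double scaled = 2.0 * (data[i] - lo) / (hi - lo) - 1.0; // now in [-1, 1]
        printf("%.1f -> %+.3f (tanh grad %.3f)\n",
               data[i], scaled, tanh_grad(scaled));
    }
    // tanh'(0) = 1.0 is the maximum; far from zero the gradient decays,
    // which is why keeping values near zero keeps learning signals strong.
    printf("tanh grad at 0: %.3f, at 5: %.5f\n", tanh_grad(0.0), tanh_grad(5.0));
    return 0;
}
```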
@karpathy
In what scenarios would the use of flash attention over naive attention yield more significant performance benefits? Also, how can we plan to incorporate flash attention into llm.c?
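For reference, here is naive single-head causal attention in C, illustrative only and not llm.c's kernel. The naive form computes all T² scores; FlashAttention produces the same output but tiles the key/value loop with an online softmax so the score matrix never has to round-trip through slow memory, and the benefit grows with sequence length and on bandwidth-bound hardware like GPUs.

```c
// Naive single-head causal attention: O(T^2 * d) work, with all T^2
// scores computed (here one row at a time). FlashAttention computes the
// same result by streaming tiles of keys/values with an online softmax,
// avoiding materializing score rows in slow memory. Illustrative sketch.
#include <math.h>
#include <stdlib.h>

// q, k, v and out are (T, d) row-major arrays.
void naive_attention(const float *q, const float *k, const float *v,
                     float *out, int T, int d) {
    float scale = 1.0f / sqrtf((float)d);
    float *scores = malloc((size_t)T * sizeof(float));
    for (int t = 0; t < T; t++) {
        // dot products of query t with every key up to t (causal mask)
        float maxs = -1e30f;
        for (int s = 0; s <= t; s++) {
            float dot = 0.0f;
            for (int i = 0; i < d; i++) dot += q[t * d + i] * k[s * d + i];
            scores[s] = dot * scale;
            if (scores[s] > maxs) maxs = scores[s];
        }
        // numerically stable softmax over the visible scores
        float sum = 0.0f;
        for (int s = 0; s <= t; s++) {
            scores[s] = expf(scores[s] - maxs);
            sum += scores[s];
        }
        // output = softmax-weighted sum of values
        for (int i = 0; i < d; i++) {
            float acc = 0.0f;
            for (int s = 0; s <= t; s++) acc += (scores[s] / sum) * v[s * d + i];
            out[t * d + i] = acc;
        }
    }
    free(scores);
}
```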