
The Saddle Point (@TheSaddlePoint)
Followers: 137 · Following: 2K · Media: 20 · Statuses: 719
No theory w/o code. No code w/o theory. Ph.D. (Statistics). (Views expressed are personal and personal only.)
Joined October 2014
RT @upperwal: EKA-Eval : A Comprehensive Evaluation Framework for Large Language Models in Indian Languages. First paper and open source co….
0
5
0
It is nothing short of robbery in plain sight by the insurance companies to torture both patients and doctors alike.
So I did a caesarean section and gave the patient an itemised bill and discharge notes. This was a month ago. The husband and my patient stay outside Vadodara. He works in United Phosphorus Ltd, and Medi Assist provides the insurance coverage. He came to me today telling me that the…
0
0
1
RT @upperwal: We are building an open source training framework (COOM) inspired by HAI-LLM from @deepseek_ai . Doing an intro call on 26th….
0
5
0
I am in that camp (I work in ML with a stats background), and a lot of the ideas we see in ML have strong roots in statistics.
Generating new samples from an unknown distribution, given a finite set of samples (a training dataset), is a fundamental statistical problem. Yet statisticians haven't touched the problem, while AI researchers have solved it (diffusion, GANs, VAEs, LLMs). 🤷♀️
0
0
1
@karpathy 4.2/n Essentially a look-up table. A dense MLP would have implemented a soft version of the look-up table (this shows, again, that all models are k-nearest-neighbour models). The Boolean MLP might be an interesting model for probing the Lottery Ticket Hypothesis further.
1
0
0
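The hard-vs-soft look-up distinction in the tweet above can be made concrete. A minimal sketch (my illustration, not the BoolGrad code): a linear layer applied to a one-hot input is an exact look-up table, and replacing the hard one-hot with a softmax over similarity scores gives the "soft look-up" a dense MLP can realise.

```python
import numpy as np

rng = np.random.default_rng(0)
table = rng.standard_normal((4, 3))  # 4 entries, each a 3-dim value

# Hard look-up: a one-hot row vector selects exactly one table row.
one_hot = np.eye(4)[2]
hard = one_hot @ table               # identical to table[2]

# Soft look-up: softmax weights over all entries (k-nearest-neighbour flavour).
scores = np.array([0.1, 0.2, 5.0, 0.3])          # similarity to each key
weights = np.exp(scores) / np.exp(scores).sum()  # peaked on entry 2
soft = weights @ table               # approaches table[2] as scores sharpen
```

As the score on one key dominates, the softmax collapses toward a one-hot and the soft look-up converges to the hard one.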
@karpathy 4.1/n The Lottery Ticket Hypothesis conjectured that, when networks are densely and randomly initialized, some sub-networks reach test accuracy comparable to the original network. In the discrete case, this would mean that some inputs are mapped exactly to the output.
1
0
0
@karpathy 3/n Initialization: draw the weights with mean around ±1 and the bias terms with mean around ±1/2.
1
0
0
2/n Based on @karpathy's micrograd, I implemented BoolGrad, a proof-of-concept implementation of the basic ideas introduced in BOLD. The implications for Deep Learning and LLMs are huge if this scales out well.
1
0
0
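For context on what the tweet builds on: micrograd is a tiny scalar reverse-mode autograd engine. A minimal sketch of its core idea (this is micrograd's pattern in miniature, not the BoolGrad code):

```python
class Value:
    """A scalar that records the ops producing it, so gradients can flow back."""

    def __init__(self, data, children=()):
        self.data = data
        self.grad = 0.0
        self._children = children
        self._grad_fn = None  # pushes self.grad into children via the chain rule

    def __add__(self, other):
        out = Value(self.data + other.data, (self, other))
        def grad_fn():
            self.grad += out.grad
            other.grad += out.grad
        out._grad_fn = grad_fn
        return out

    def __mul__(self, other):
        out = Value(self.data * other.data, (self, other))
        def grad_fn():
            self.grad += other.data * out.grad
            other.grad += self.data * out.grad
        out._grad_fn = grad_fn
        return out

    def backward(self):
        # topological sort, then apply the chain rule from the output backward
        topo, seen = [], set()
        def build(v):
            if v not in seen:
                seen.add(v)
                for c in v._children:
                    build(c)
                topo.append(v)
        build(self)
        self.grad = 1.0
        for v in reversed(topo):
            if v._grad_fn:
                v._grad_fn()

x, y = Value(2.0), Value(3.0)
z = x * y + x       # dz/dx = y + 1 = 4, dz/dy = x = 2
z.backward()
```

BoolGrad presumably replaces these real-valued ops with Boolean ones; the graph-building and backward machinery is what carries over.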
RT @AI4Code: I am delighted to announce an exciting new course ✨ CIS 7000: Large Language Models ✨ I am teaching this semester: https://t.c….
0
12
0
RT @YiMaTweets: We will prepare entirely new manuscripts, lecture notes, and teaching materials and will publicize them as soon as they ar….
0
4
0
RT @docmilanfar: There’s a single formula that makes all of your diffusion models possible: Tweedie's. Say 𝐱 is a noisy version of 𝐮 with 𝐞….
0
113
0
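The retweet above is cut off mid-statement; for reference, the standard Gaussian-noise form of Tweedie's formula it refers to is (filling in only the well-known formula, not the rest of the tweet):

```latex
% If x = u + e with e ~ N(0, sigma^2 I), and p(x) is the marginal density of x:
\mathbb{E}[\mathbf{u} \mid \mathbf{x}]
  = \mathbf{x} + \sigma^{2} \, \nabla_{\mathbf{x}} \log p(\mathbf{x})
```

The posterior mean of the clean signal is the noisy observation plus a score-function correction, which is exactly the quantity diffusion models learn to estimate.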
RT @docmilanfar: The history of Tweedie’s formula is fascinating. He sent the result privately in a letter to Robbins, who published it in….
0
3
0