The Saddle Point Profile
The Saddle Point

@TheSaddlePoint

Followers: 137 · Following: 2K · Media: 20 · Statuses: 719

No theory w/o code. No code w/o theory. Ph.D. (Statistics). (Views expressed are personal, and personal only.)

Joined October 2014
@TheSaddlePoint
The Saddle Point
1 month
RT @upperwal: EKA-Eval : A Comprehensive Evaluation Framework for Large Language Models in Indian Languages. First paper and open source co….
0
5
0
@TheSaddlePoint
The Saddle Point
4 months
It is nothing short of robbery in plain sight by the insurance companies to torture both patients and doctors alike.
@imacuriosguy
𝙍𝘼𝙅𝙀𝙎𝙃 𝙋𝘼𝙍𝙄𝙆𝙃
4 months
So I did a caesarean section and gave the patient an itemised bill and discharge notes. This was a month ago. The husband and my patient stay outside Vadodara. He works in United Phosphorus Ltd and Medi Assist gives the insurance coverage. He comes to me today telling me that the…
0
0
1
@TheSaddlePoint
The Saddle Point
5 months
RT @upperwal: We are building an open source training framework (COOM) inspired by HAI-LLM from @deepseek_ai . Doing an intro call on 26th….
0
5
0
@TheSaddlePoint
The Saddle Point
10 months
I am in that camp (I work in ML with a stats background), and a lot of the ideas we see in ML have strong roots in statistics.
@jm_alexia
Alexia Jolicoeur-Martineau
10 months
Generating new samples from an unknown distribution given a finite set of samples (training dataset) is a fundamental statistical problem. Yet statisticians haven't touched the problem, while AI researchers have solved it (diffusion, GANs, VAEs, LLMs). 🤷‍♀️
0
0
1
@TheSaddlePoint
The Saddle Point
10 months
6.3/n. Can we directly burn the model onto silicon? That is, can we take a Boolean MLP specified in PyTorch and create HDL files using the latest and greatest in VLSI technology? See the hls4ml project.
0
0
0
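For context, the standard hls4ml flow looks roughly like the sketch below. The model file, output directory, and FPGA part number are placeholders; the Keras converter is shown here (hls4ml also ships a PyTorch converter), so this is an illustration of the toolchain, not a drop-in recipe for a Boolean MLP.

```python
# A minimal sketch of the standard hls4ml flow: convert a small trained
# network into an HLS project that downstream VLSI tools can synthesize.
import hls4ml
from tensorflow import keras

model = keras.models.load_model("boolean_mlp.h5")  # hypothetical trained MLP

# Auto-generate a per-model precision/reuse configuration
config = hls4ml.utils.config_from_keras_model(model, granularity="model")

hls_model = hls4ml.converters.convert_from_keras_model(
    model,
    hls_config=config,
    output_dir="hls_boolean_mlp",          # placeholder project directory
    part="xcu250-figd2104-2L-e",           # example Xilinx part from the tutorials
)
hls_model.compile()   # C-simulation build of the generated project
hls_model.build()     # run HLS synthesis to produce the HDL
```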
@TheSaddlePoint
The Saddle Point
10 months
6.2/n. See the recent BOLD: Boolean Logic Deep Learning for one approach: networks are trained via a new mathematical principle of Boolean Variation, with considerations for chip architecture, memory hierarchy, dataflow, and arithmetic precision.
1
0
0
@TheSaddlePoint
The Saddle Point
10 months
6.1/n. At the heart of it, it is the matrix multiplication that needs to be done efficiently. What if networks had no explicit MatMul operations? See Scalable MatMul-free LLMs or Boolean Logic Deep Learning.
1
0
0
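For intuition on the MatMul-free idea, here is a toy sketch (mine, not from either paper): once the weights are restricted to {-1, 0, +1}, a matrix-vector product needs no multiplications at all, only additions and subtractions.

```python
import numpy as np

def ternary_matvec(W, x):
    """Mat-vec where W has entries in {-1, 0, +1}: each output element
    is a sum of selected inputs minus another sum -- no multiplies."""
    assert set(np.unique(W)) <= {-1, 0, 1}
    return np.array([x[row == 1].sum() - x[row == -1].sum() for row in W])

W = np.array([[1, 0, -1],
              [0, 1,  1]])
x = np.array([0.5, -2.0, 3.0])
print(ternary_matvec(W, x))   # [-2.5  1. ]
print(W @ x)                  # same result via an ordinary matmul
```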
@TheSaddlePoint
The Saddle Point
10 months
6/n. Hardware accelerators. We are seeing a surge in hardware accelerators (and downstream toolchains, including compilers and hardware-software co-design) to make both training and inference faster. GPUs are the backbone of the compute infrastructure used to train deep neural nets.
1
0
0
@TheSaddlePoint
The Saddle Point
10 months
5.2/n. If all the weights are sampled from {-1, 0, +1}, each requires log2(3) ≈ 1.58 bits. The optimal Boolean MLPs are precisely 1-bit networks, but it is easy to see that the fully specified, overparametrized Boolean MLPs are 1.58-bit, where a 0 weight models a skipped connection.
1
0
0
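A quick check of the arithmetic behind the 1.58-bit figure:

```python
import math

# A ternary weight has 3 equally likely states, so it carries log2(3) bits.
print(math.log2(3))   # 1.5849625007211562  ->  the "1.58 bits" per weight
# A binary weight {-1, +1} carries exactly 1 bit.
print(math.log2(2))   # 1.0
```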
@TheSaddlePoint
The Saddle Point
10 months
5.1/n. Pruning and 1-bit LLMs. In the context of LLMs, we see large models with parameters on the order of billions being successfully compressed/quantized without much degradation in performance. See, for example, The Era of 1-bit LLMs: All Large Language Models are in 1.58 Bits.
1
0
0
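For reference, a minimal sketch of the absmean ternary quantizer in the spirit of that paper (simplified: the epsilon and how gamma is reused for rescaling are assumptions here):

```python
import torch

def absmean_ternary(W: torch.Tensor, eps: float = 1e-8):
    """Absmean quantization in the spirit of BitNet b1.58: scale by the
    mean absolute weight, then round and clip to {-1, 0, +1}."""
    gamma = W.abs().mean()
    W_q = (W / (gamma + eps)).round().clamp_(-1, 1)
    return W_q, gamma  # gamma is kept to rescale the layer's outputs

W = torch.randn(4, 4)
W_q, gamma = absmean_ternary(W)
print(W_q)  # all entries in {-1., 0., 1.}
```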
@TheSaddlePoint
The Saddle Point
10 months
@karpathy 4.2/n. Essentially a lookup table. A dense MLP would have implemented a soft version of the lookup table (which again suggests that all models are, in essence, k-nearest neighbors). The Boolean MLP might be an interesting model for probing the Lottery Ticket Hypothesis further.
1
0
0
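To make the lookup-table point concrete, a toy enumeration (the three-input network below is hypothetical): any Boolean network over n inputs is fully described by its 2^n-entry truth table.

```python
from itertools import product

def bool_net(x1: bool, x2: bool, x3: bool) -> bool:
    """A toy Boolean 'network': a fixed composition of gates."""
    h = x1 and not x2
    return h or x3

# Enumerating all 2**3 input patterns turns the network into an
# explicit lookup table -- the "hard" version of what a dense MLP learns.
lut = {bits: bool_net(*bits) for bits in product((False, True), repeat=3)}
for bits, out in lut.items():
    print(bits, "->", out)
```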
@TheSaddlePoint
The Saddle Point
10 months
@karpathy 4.1/n. The Lottery Ticket Hypothesis conjectured that, when networks are densely and randomly initialized, some sub-networks reach test accuracy comparable to the original network. In the discrete case, this would mean that some inputs are mapped exactly to the output.
1
0
0
@TheSaddlePoint
The Saddle Point
10 months
@karpathy 3/n. Initialization: draw the weights with means around ±1, and the bias terms with means around ±1/2.
1
0
0
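One literal reading of that initialization, sketched in PyTorch with an assumed noise scale (the tweet does not specify the spread around the means, nor how the signs are chosen; both are guesses here):

```python
import torch

def pm_init(out_features: int, in_features: int, spread: float = 0.1):
    """Weights with means near +/-1 and biases near +/-1/2, with the
    sign of each mean chosen uniformly at random (an assumption)."""
    w_sign = torch.randint(0, 2, (out_features, in_features)) * 2 - 1
    W = w_sign * (1.0 + spread * torch.randn(out_features, in_features))
    b_sign = torch.randint(0, 2, (out_features,)) * 2 - 1
    b = b_sign * (0.5 + spread * torch.randn(out_features))
    return W, b
```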
@TheSaddlePoint
The Saddle Point
10 months
2/n. Based on @karpathy's micrograd, I implemented BoolGrad, a proof-of-concept implementation of the basic ideas introduced in BOLD. The implications for deep learning and LLMs are huge if this scales out well.
1
0
0
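A toy reading of what such a proof of concept might compute (not BoolGrad's actual code): take the "gradient" of a Boolean function at a Boolean point to be the signed change in the output when each input is flipped.

```python
def variation(f, args, i):
    """Signed change of Boolean f when argument i is flipped, oriented so
    that +1 means 'raising input i (False -> True) raises the output'."""
    flipped = list(args)
    flipped[i] = not flipped[i]
    delta = int(f(*flipped)) - int(f(*args))
    direction = 1 if flipped[i] else -1   # which way input i was flipped
    return direction * delta

def bool_grads(f, args):
    """One signed variation per input: a Boolean analogue of a gradient."""
    return [variation(f, args, i) for i in range(len(args))]

xor = lambda a, b: a != b
print(bool_grads(xor, (False, True)))   # [-1, 1]: raising a lowers xor here
```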
@TheSaddlePoint
The Saddle Point
10 months
1/n. BOLD: Boolean Logic Deep Learning is an interesting paper in which the authors develop a framework for defining gradients over mixed data types, including Booleans.
1
0
0
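And a toy illustration of the mixed-type idea (my own, not the paper's formalism): for a real-valued function of a Boolean and a real input, the Boolean input gets a flip-based variation while the real input keeps its ordinary partial derivative.

```python
def f(b: bool, x: float) -> float:
    # A function of one Boolean input and one real input
    return (2.0 if b else -1.0) * x

x, b = 3.0, True
var_b = f(True, x) - f(False, x)   # variation w.r.t. b: 6.0 - (-3.0) = 9.0
dfdx = 2.0 if b else -1.0          # ordinary partial derivative w.r.t. x
print(var_b, dfdx)                 # 9.0 2.0
```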
@TheSaddlePoint
The Saddle Point
11 months
RT @AI4Code: I am delighted to announce an exciting new course ✨ CIS 7000: Large Language Models ✨ I am teaching this semester: https://t.c….
0
12
0
@TheSaddlePoint
The Saddle Point
1 year
RT @YiMaTweets: We will prepare entirely new manuscripts, lecture notes, and teaching materials and will publicize them as soon as they ar….
0
4
0
@TheSaddlePoint
The Saddle Point
1 year
RT @docmilanfar: There’s a single formula that makes all of your diffusion models possible: Tweedie's. Say 𝐱 is a noisy version of 𝐮 with 𝐞….
0
113
0
@TheSaddlePoint
The Saddle Point
1 year
RT @docmilanfar: The history of Tweedie’s formula is fascinating. He sent the result privately in a letter to Robbins, who published it in….
0
3
0