The Saddle Point Profile
The Saddle Point

@TheSaddlePoint

Followers: 137 · Following: 2K · Media: 20 · Statuses: 719

No theory w/o code. No code w/o theory. Ph.D. (Statistics). (Views expressed are personal, and personal only.)

Joined October 2014
@TheSaddlePoint
The Saddle Point
1 month
RT @upperwal: EKA-Eval : A Comprehensive Evaluation Framework for Large Language Models in Indian Languages. First paper and open source co….
0
5
0
@TheSaddlePoint
The Saddle Point
4 months
It is nothing short of robbery in plain sight by the insurance companies to torture both patients and doctors alike.
@imacuriosguy
𝙍𝘼𝙅𝙀𝙎𝙃 𝙋𝘼𝙍𝙄𝙆𝙃
4 months
So I did a caesarean section and gave the patient an itemised bill and discharge notes. This was a month ago. The husband and my patient stay outside Vadodara. He works in United Phosphorus Ltd and Medi Assist gives the insurance coverage. He comes to me today telling me that the…
0
0
1
@TheSaddlePoint
The Saddle Point
5 months
RT @upperwal: We are building an open source training framework (COOM) inspired by HAI-LLM from @deepseek_ai . Doing an intro call on 26th….
0
5
0
@TheSaddlePoint
The Saddle Point
10 months
I am in that camp (I work in ML with a stats background), and a lot of the ideas we see in ML have strong roots in statistics.
@jm_alexia
Alexia Jolicoeur-Martineau
10 months
Generating new samples from an unknown distribution given a finite set of samples (training dataset) is a fundamental statistical problem. Yet statisticians haven't touched the problem, while AI researchers have solved it (diffusion, GANs, VAEs, LLMs). 🤷‍♀️
0
0
1
@TheSaddlePoint
The Saddle Point
10 months
6.3/n. Can we directly burn the model onto silicon? That is, can we take a Boolean MLP specified in PyTorch and create HDL files using the latest and greatest in VLSI technology? See the hls4ml project.
0
0
0
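For context, the standard hls4ml flow looks roughly like the sketch below. The model file, output directory, and FPGA part number are placeholders; the Keras converter is shown here (hls4ml also ships a PyTorch converter), so this is an illustration of the toolchain, not a drop-in recipe for a Boolean MLP.

```python
# A minimal sketch of the standard hls4ml flow: convert a small trained
# network into an HLS project that downstream VLSI tools can synthesize.
import hls4ml
from tensorflow import keras

model = keras.models.load_model("boolean_mlp.h5")  # hypothetical trained MLP

# Auto-generate a per-model precision/reuse configuration
config = hls4ml.utils.config_from_keras_model(model, granularity="model")

hls_model = hls4ml.converters.convert_from_keras_model(
    model,
    hls_config=config,
    output_dir="hls_boolean_mlp",          # placeholder project directory
    part="xcu250-figd2104-2L-e",           # example Xilinx part from the tutorials
)
hls_model.compile()   # C-simulation build of the generated project
hls_model.build()     # run HLS synthesis to produce the HDL
```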
@TheSaddlePoint
The Saddle Point
10 months
6.2/n. See the recent BOLD: Boolean Logic Deep Learning for one approach: networks are trained via a new mathematical principle of Boolean Variation, with considerations for chip architecture, memory hierarchy, dataflow, and arithmetic precision.
1
0
0
@TheSaddlePoint
The Saddle Point
10 months
6.1/n. At the heart of it, it is the matrix multiplication that needs to be done efficiently. What if networks had no explicit MatMul operations? See Scalable MatMul-free LLMs or Boolean Logic Deep Learning.
1
0
0
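For intuition on the MatMul-free idea, here is a toy sketch (mine, not from either paper): once the weights are restricted to {-1, 0, +1}, a matrix-vector product needs no multiplications at all, only additions and subtractions.

```python
import numpy as np

def ternary_matvec(W, x):
    """Mat-vec where W has entries in {-1, 0, +1}: each output element
    is a sum of selected inputs minus another sum -- no multiplies."""
    assert set(np.unique(W)) <= {-1, 0, 1}
    return np.array([x[row == 1].sum() - x[row == -1].sum() for row in W])

W = np.array([[1, 0, -1],
              [0, 1,  1]])
x = np.array([0.5, -2.0, 3.0])
print(ternary_matvec(W, x))   # [-2.5  1. ]
print(W @ x)                  # same result via an ordinary matmul
```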
@TheSaddlePoint
The Saddle Point
10 months
6/n. Hardware accelerators. We are seeing a surge in hardware accelerators (and downstream toolchains, including compilers and hardware-software co-design) to make both training and inference faster. GPUs are the backbone of the compute infrastructure used to train deep neural nets.
1
0
0
@TheSaddlePoint
The Saddle Point
10 months
5.2/n. If all the weights are sampled from {-1, 0, +1}, each requires log2(3) ≈ 1.58 bits. The optimal Boolean MLPs are precisely 1-bit networks, but it is easy to see that the fully specified, overparametrized Boolean MLPs are 1.58-bit, where a 0 weight models a skipped connection.
1
0
0
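A quick check of the arithmetic behind the 1.58-bit figure:

```python
import math

# A ternary weight has 3 equally likely states, so it carries log2(3) bits.
print(math.log2(3))   # 1.5849625007211562  ->  the "1.58 bits" per weight
# A binary weight {-1, +1} carries exactly 1 bit.
print(math.log2(2))   # 1.0
```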
@TheSaddlePoint
The Saddle Point
10 months
5.1/n. Pruning and 1-bit LLMs. In the context of LLMs, we see large models with parameters on the order of billions being successfully compressed/quantized without much degradation in performance. See, for example, The Era of 1-bit LLMs: All Large Language Models are in 1.58 Bits.
1
0
0
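For reference, a minimal sketch of the absmean ternary quantizer in the spirit of that paper (simplified: the epsilon and how gamma is reused for rescaling are assumptions here):

```python
import torch

def absmean_ternary(W: torch.Tensor, eps: float = 1e-8):
    """Absmean quantization in the spirit of BitNet b1.58: scale by the
    mean absolute weight, then round and clip to {-1, 0, +1}."""
    gamma = W.abs().mean()
    W_q = (W / (gamma + eps)).round().clamp_(-1, 1)
    return W_q, gamma  # gamma is kept to rescale the layer's outputs

W = torch.randn(4, 4)
W_q, gamma = absmean_ternary(W)
print(W_q)  # all entries in {-1., 0., 1.}
```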
@TheSaddlePoint
The Saddle Point
10 months
@karpathy 4.2/n. Essentially a lookup table. A dense MLP would have implemented a soft version of the lookup table (which again suggests that all models are, in essence, k-nearest neighbors). The Boolean MLP might be an interesting model for probing the Lottery Ticket Hypothesis further.
1
0
0
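To make the lookup-table point concrete, a toy enumeration (the three-input network below is hypothetical): any Boolean network over n inputs is fully described by its 2^n-entry truth table.

```python
from itertools import product

def bool_net(x1: bool, x2: bool, x3: bool) -> bool:
    """A toy Boolean 'network': a fixed composition of gates."""
    h = x1 and not x2
    return h or x3

# Enumerating all 2**3 input patterns turns the network into an
# explicit lookup table -- the "hard" version of what a dense MLP learns.
lut = {bits: bool_net(*bits) for bits in product((False, True), repeat=3)}
for bits, out in lut.items():
    print(bits, "->", out)
```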
@TheSaddlePoint
The Saddle Point
10 months
@karpathy 4.1/n. The Lottery Ticket Hypothesis conjectured that, when networks are densely and randomly initialized, some sub-networks reach test accuracy comparable to the original network. In the discrete case, this would mean that some inputs are mapped exactly to the output.
1
0
0
@TheSaddlePoint
The Saddle Point
10 months
@karpathy 3/n. Initialization: draw the weights with means around ±1, and the bias terms with means around ±1/2.
1
0
0
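One literal reading of that initialization, sketched in PyTorch with an assumed noise scale (the tweet does not specify the spread around the means, nor how the signs are chosen; both are guesses here):

```python
import torch

def pm_init(out_features: int, in_features: int, spread: float = 0.1):
    """Weights with means near +/-1 and biases near +/-1/2, with the
    sign of each mean chosen uniformly at random (an assumption)."""
    w_sign = torch.randint(0, 2, (out_features, in_features)) * 2 - 1
    W = w_sign * (1.0 + spread * torch.randn(out_features, in_features))
    b_sign = torch.randint(0, 2, (out_features,)) * 2 - 1
    b = b_sign * (0.5 + spread * torch.randn(out_features))
    return W, b
```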
@TheSaddlePoint
The Saddle Point
10 months
2/n. Based on @karpathy's micrograd, I implemented BoolGrad, a proof-of-concept implementation of the basic ideas introduced in BOLD. The implications for deep learning and LLMs are huge if this scales out well.
1
0
0
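A toy reading of what such a proof of concept might compute (not BoolGrad's actual code): take the "gradient" of a Boolean function at a Boolean point to be the signed change in the output when each input is flipped.

```python
def variation(f, args, i):
    """Signed change of Boolean f when argument i is flipped, oriented so
    that +1 means 'raising input i (False -> True) raises the output'."""
    flipped = list(args)
    flipped[i] = not flipped[i]
    delta = int(f(*flipped)) - int(f(*args))
    direction = 1 if flipped[i] else -1   # which way input i was flipped
    return direction * delta

def bool_grads(f, args):
    """One signed variation per input: a Boolean analogue of a gradient."""
    return [variation(f, args, i) for i in range(len(args))]

xor = lambda a, b: a != b
print(bool_grads(xor, (False, True)))   # [-1, 1]: raising a lowers xor here
```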
@TheSaddlePoint
The Saddle Point
10 months
1/n. BOLD: Boolean Logic Deep Learning is an interesting paper in which the authors develop a framework for defining gradients over mixed data types, including Booleans.
1
0
0
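And a toy illustration of the mixed-type idea (my own, not the paper's formalism): for a real-valued function of a Boolean and a real input, the Boolean input gets a flip-based variation while the real input keeps its ordinary partial derivative.

```python
def f(b: bool, x: float) -> float:
    # A function of one Boolean input and one real input
    return (2.0 if b else -1.0) * x

x, b = 3.0, True
var_b = f(True, x) - f(False, x)   # variation w.r.t. b: 6.0 - (-3.0) = 9.0
dfdx = 2.0 if b else -1.0          # ordinary partial derivative w.r.t. x
print(var_b, dfdx)                 # 9.0 2.0
```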
@TheSaddlePoint
The Saddle Point
11 months
RT @AI4Code: I am delighted to announce an exciting new course ✨ CIS 7000: Large Language Models ✨ I am teaching this semester: https://t.c….
0
12
0
@TheSaddlePoint
The Saddle Point
1 year
RT @YiMaTweets: We will prepare entirely new manuscripts, lecture notes, and teaching materials and will publicize them as soon as they ar….
0
4
0
@TheSaddlePoint
The Saddle Point
1 year
RT @docmilanfar: There’s a single formula that makes all of your diffusion models possible: Tweedie's. Say 𝐱 is a noisy version of 𝐮 with 𝐞….
0
113
0
@TheSaddlePoint
The Saddle Point
1 year
RT @docmilanfar: The history of Tweedie’s formula is fascinating. He sent the result privately in a letter to Robbins, who published it in….
0
3
0