
adad8m (@adad8m) · Followers: 9K · Following: 14K · Media: 1K · Statuses: 5K
A few years ago, #Singapore Prime Minister Lee Hsien Loong wrote and shared a C++ Sudoku solver, how cool is that! Not many comments in the code, but it's available here:
44
327
2K
Fit a polynomial of degree D to N points by minimising the mean squared error. Now, start increasing D and look at the test error. The error blows up around D ≈ N and starts improving *again* for D >> N. #doubleDescent #statistics #maths
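A minimal numpy sketch of the experiment (the target function, noise level, and the use of Legendre features for the minimum-norm fit are my own choices, not from the original thread):

```python
import numpy as np

rng = np.random.default_rng(0)
N = 20                                            # number of training points
x_train = np.sort(rng.uniform(-1, 1, N))
y_train = np.sin(2 * np.pi * x_train) + 0.1 * rng.normal(size=N)
x_test = np.linspace(-1, 1, 500)
y_test = np.sin(2 * np.pi * x_test)

features = np.polynomial.legendre.legvander       # degree-D polynomial features

for D in [2, 5, 10, 15, 19, 25, 50, 200]:
    # lstsq returns the minimum-norm solution in the overparametrised regime D + 1 > N
    w, *_ = np.linalg.lstsq(features(x_train, D), y_train, rcond=None)
    test_mse = np.mean((features(x_test, D) @ w - y_test) ** 2)
    print(f"D = {D:4d}   test MSE = {test_mse:.3f}")
```

The test error typically peaks near D ≈ N and comes back down for D >> N (the second descent).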
32
248
2K
What happens when you do PCA on a few Brownian trajectories. As the number of trajectories increases, the Principal Components converge to sine waves. #statistics
Interesting! I hope it's not just SVD/PCA on what is almost random noise, because then it's likely to have found similar patterns (i.e. you get sine waves when doing PCA on noise). #Statistics
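A quick numpy/matplotlib sketch of the PCA-on-Brownian-trajectories experiment (trajectory count and length are arbitrary choices):

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(1)
n_traj, n_steps = 2000, 500
# Brownian trajectories: cumulative sums of Gaussian increments, rows = trajectories
B = np.cumsum(rng.normal(size=(n_traj, n_steps)) / np.sqrt(n_steps), axis=1)

# PCA via SVD of the centred data matrix
_, _, Vt = np.linalg.svd(B - B.mean(axis=0), full_matrices=False)

t = np.linspace(0, 1, n_steps)
for k in range(3):
    plt.plot(t, Vt[k], label=f"PC {k + 1}")
# the Karhunen-Loeve eigenfunctions of Brownian motion are sin((k - 1/2) * pi * t),
# which is what the principal components converge to
plt.legend(); plt.show()
```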
47
264
2K
This Barzilai-Borwein stuff is crazy: it's just choosing a slightly clever stepsize (i.e. no preconditioning) and it does so much better than gradient descent with backtracking line search! Still blowing my mind, who has some good readings about that? #optimization #maths
The Barzilai-Borwein method selects the step size for gradient descent using a cheap approximation of the Hessian. It usually performs better than line search.
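For reference, a minimal sketch of the BB1 step size on an ill-conditioned quadratic (the test problem is my own, not from the thread):

```python
import numpy as np

rng = np.random.default_rng(0)
d = 50
# ill-conditioned quadratic f(x) = 0.5 x^T A x with condition number 1e4
Q, _ = np.linalg.qr(rng.normal(size=(d, d)))
A = Q @ np.diag(np.logspace(0, 4, d)) @ Q.T
grad = lambda x: A @ x

x = rng.normal(size=d)
g = grad(x)
x_new = x - 1e-4 * g                    # one tiny plain gradient step to initialise BB
for k in range(201):
    g_new = grad(x_new)
    s, y = x_new - x, g_new - g
    alpha = (s @ s) / (s @ y)           # Barzilai-Borwein step: a secant estimate of the Hessian scale
    x, g = x_new, g_new
    x_new = x - alpha * g
    if k % 50 == 0:
        print(f"iter {k:4d}   f = {0.5 * x @ A @ x:.3e}")
```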
13
101
1K
One of the best books to learn probability, I spent many hours on that one! What are some similar exercise books that cover a large part of (insert another topic)? #books
3) One Thousand Exercises in Probability by Geoffrey Grimmett and David Stirzaker. This is evergreen. Learning how to do Markov chains and solve the eigenvalues will never, ever not be helpful. This stuff requires more energy to read, but it keeps you sharp.
12
78
977
That's the characteristic polynomial of a random matrix M ∈ R^{300 x 300}. Guess how the matrix was generated? #maths #probability
21
85
907
Even AI doesn't believe it
BREAKING NEWS. The Royal Swedish Academy of Sciences has decided to award the 2024 #NobelPrize in Physics to John J. Hopfield and Geoffrey E. Hinton "for foundational discoveries and inventions that enable machine learning with artificial neural networks."
12
131
916
Graphical comparison between the standard (linear) #correlation and Chatterjee's "rank correlation" recently introduced in #statistics #probability @johnleibniz @_bakshay
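A small sketch of Chatterjee's coefficient (no-ties formula) next to Pearson's, on data I made up:

```python
import numpy as np
from scipy.stats import rankdata

def chatterjee_xi(x, y):
    # sort by x, rank the y's, then sum the jumps between consecutive ranks
    n = len(x)
    r = rankdata(y[np.argsort(x)])
    return 1.0 - 3.0 * np.abs(np.diff(r)).sum() / (n ** 2 - 1)

rng = np.random.default_rng(0)
x = rng.uniform(-1, 1, 2000)
noise = 0.1 * rng.normal(size=x.size)

for name, y in [("linear", x + noise), ("parabola", x ** 2 + noise)]:
    print(f"{name:9s} Pearson = {np.corrcoef(x, y)[0, 1]:+.2f}   xi = {chatterjee_xi(x, y):+.2f}")
```

Pearson is close to zero on the parabola while xi stays high, which is the point of the comparison.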
14
220
886
Glad to announce I'm joining in September 2023. Our goal is to understand the universe, regulate the use of gzip and prevent human extinction. #AI.
this is wild: kNN using a gzip-based distance metric outperforms BERT and other neural methods for OOD sentence classification. intuition: 2 texts are similar if cat-ing one to the other barely increases the gzip size. no training, no tuning, no params; this is the entire algorithm:
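The tweet's screenshot isn't reproduced here; below is a minimal sketch of the idea (gzip-based normalised compression distance plus k-NN), with a made-up toy training set:

```python
import gzip
import numpy as np

def gz_len(s: str) -> int:
    return len(gzip.compress(s.encode()))

def ncd(a: str, b: str) -> float:
    # normalised compression distance: how little does concatenating b to a add?
    ca, cb, cab = gz_len(a), gz_len(b), gz_len(a + " " + b)
    return (cab - min(ca, cb)) / max(ca, cb)

train = [("the cat sat on the mat", "animals"),
         ("dogs love chasing balls", "animals"),
         ("stocks fell sharply today", "finance"),
         ("the market rallied after the report", "finance")]

def classify(query: str, k: int = 2) -> str:
    dists = sorted((ncd(query, text), label) for text, label in train)
    labels = [label for _, label in dists[:k]]
    return max(set(labels), key=labels.count)

print(classify("a kitten plays with yarn"))
```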
13
55
797
Over the past few weeks, I've been reading some parts of this Linear Algebra #book and the author emphasises block matrix computations, which is quite different from many other textbooks and also surprisingly powerful. Recommended! #maths
A few favorite related refs: 1. Axler's preface in "Linear Algebra Done Right", suggesting a reading pace of about 1 page/hour. 2. Norvig's "Teach Yourself Programming in Ten Years" (contra Learn C++ in 24 Hours)
6
83
739
Arrived today ❤️ will try to share short #simulations as I go through it. #book #statphy #maths
10
92
721
@dieworkwear If you need a full Twitter thread to explain why something looks great, does it really look that great?
265
10
559
Finally had time to try one of these #KAN architectures. Surprised how well it worked on this low-dim regression example. For an #MLP to converge that fast, it would typically need a bit of feature engineering (e.g. Fourier features). Good stuff!
MLPs are so foundational, but are there alternatives? MLPs place activation functions on neurons, but can we instead place (learnable) activation functions on weights? Yes, we KAN! We propose Kolmogorov-Arnold Networks (KAN), which are more accurate and interpretable than MLPs. 🧵
9
51
519
The Cayley transform C(z) = (z - i)/(z + i) maps the upper complex half-plane onto the unit disk. #complexAnalysis #maths
The Möbius transform F(z) = (z - z0) / (z·conj(z0) - 1) maps the complex unit disk onto itself. It is an involution that exchanges z = 0 and z = z0, and it's beautiful. Apart from the usual Schwarz lemma, what are some cool applications of these transformations? #complexAnalysis
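A tiny numerical check of both claims (points and z0 chosen arbitrarily):

```python
import numpy as np

rng = np.random.default_rng(0)

cayley = lambda z: (z - 1j) / (z + 1j)                     # upper half-plane -> unit disk
mobius = lambda z, z0: (z - z0) / (z * np.conj(z0) - 1)    # disk automorphism from the tweet

# random points in the upper half-plane land inside the unit disk
z = rng.normal(size=1000) + 1j * np.abs(rng.normal(size=1000))
print("Cayley maps into the disk:", bool(np.all(np.abs(cayley(z)) < 1)))

# F is an involution that exchanges 0 and z0
z0, w = 0.3 + 0.4j, 0.1 - 0.5j
print("involution:", np.isclose(mobius(mobius(w, z0), z0), w))
print("swaps 0 and z0:", np.isclose(mobius(0, z0), z0), np.isclose(mobius(z0, z0), 0))
```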
4
76
499
I am a fan of the (❤️Hessian-free❤️) Levenberg–Marquardt method! Here it is on the Rosenbrock function. #optimization #math
The Barzilai-Borwein method selects the step size for gradient descent using a cheap approximation of the Hessian. It usually performs (much) better than line search.
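A bare-bones Levenberg-Marquardt sketch on the Rosenbrock function, written as a least-squares problem (damping schedule and starting point are my own choices):

```python
import numpy as np

def residuals(p):
    x, y = p
    return np.array([1 - x, 10 * (y - x ** 2)])     # Rosenbrock = ||residuals||^2

def jacobian(p):
    x, _ = p
    return np.array([[-1.0, 0.0],
                     [-20 * x, 10.0]])

p, lam = np.array([-1.5, 2.0]), 1e-3
for it in range(100):
    r, J = residuals(p), jacobian(p)
    step = np.linalg.solve(J.T @ J + lam * np.eye(2), -J.T @ r)   # damped Gauss-Newton step
    if np.sum(residuals(p + step) ** 2) < np.sum(r ** 2):
        p, lam = p + step, lam * 0.5     # accept: trust the Gauss-Newton model more
    else:
        lam *= 2.0                       # reject: fall back towards small gradient steps
    if np.sum(residuals(p) ** 2) < 1e-16:
        break
print(it, p)                             # should end near the minimiser (1, 1)
```

Only first derivatives (the Jacobian) are needed, which is the "Hessian-free" appeal.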
4
74
483
greedy search is enough.
My daughter had a nice problem in her high-school math club. Suppose you have 1000 white points and 1000 black points in the plane, no three collinear. Can you draw segments connecting them in pairs, from white to black, using each point just once, without any edges crossing?
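One way to see why greedy/local search works: uncrossing a crossing strictly shortens the total length, so a local minimum of the total length has no crossings. A small sketch (with 50 points per colour rather than 1000, to keep the quadratic-time local search quick):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 50
white, black = rng.uniform(size=(n, 2)), rng.uniform(size=(n, 2))
match = np.arange(n)                      # white i is joined to black match[i]

# 2-swap local search: swap endpoints whenever that shortens the total length
improved = True
while improved:
    improved = False
    for i in range(n):
        for j in range(i + 1, n):
            old = (np.linalg.norm(white[i] - black[match[i]])
                   + np.linalg.norm(white[j] - black[match[j]]))
            new = (np.linalg.norm(white[i] - black[match[j]])
                   + np.linalg.norm(white[j] - black[match[i]]))
            if new < old - 1e-12:
                match[i], match[j] = match[j], match[i]
                improved = True

print("total length:", np.linalg.norm(white - black[match], axis=1).sum())
# by the triangle inequality, any crossing could be removed by a swap that shortens
# the total length, so the matching found here has no crossing segments
```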
11
24
473
@PreetumNakkiran proposed looking at the JPEG size per pixel to find the correct dimensions: works very well! #statistics #maths
imagine we have a stream of pixels coming in one by one. They form a video, but we don't know the width and height of each frame. Here we try a bunch of guesses for the dimensions, before locking in on the correct values
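A self-contained sketch of the trick: synthesise one frame, flatten it into a pixel stream, then score candidate widths by compressed (JPEG) bytes per pixel; the correct width should give the smallest value because rows line up and compress well:

```python
import io
import numpy as np
from PIL import Image

rng = np.random.default_rng(0)
true_w, true_h = 320, 240
yy, xx = np.mgrid[0:true_h, 0:true_w]
frame = 128 + 100 * np.sin(xx / 15.0) * np.cos(yy / 25.0) + 5 * rng.normal(size=(true_h, true_w))
stream = frame.clip(0, 255).astype(np.uint8).ravel()       # the "unknown" pixel stream

def jpeg_bytes_per_pixel(pixels, width):
    h = len(pixels) // width
    img = Image.fromarray(pixels[: h * width].reshape(h, width))
    buf = io.BytesIO()
    img.save(buf, format="JPEG")
    return len(buf.getvalue()) / (h * width)

for w in [200, 300, 319, 320, 321, 400]:
    print(w, round(jpeg_bytes_per_pixel(stream, w), 4))     # minimum expected at w = 320
```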
7
51
423
This Kozachenko-Leonenko estimator of the #entropy is really neat! The animation below minimises the usual (Energy - Entropy) functional for a mixture of Gaussians. Thanks to @gabrielpeyre and @sp_monte_carlo for introducing me to this today. #maths
@adad8m @gabrielpeyre I believe that the entropy estimator itself is called the Kozachenko-Leonenko estimator, see e.g. and the original paper (in Russian)
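For reference, a small implementation of the k-nearest-neighbour (Kozachenko-Leonenko) entropy estimator, checked against the exact entropy of a 2D Gaussian (my own test case):

```python
import numpy as np
from scipy.spatial import cKDTree
from scipy.special import digamma, gammaln

def kl_entropy(x, k=3):
    # Kozachenko-Leonenko k-nearest-neighbour entropy estimate, in nats
    n, d = x.shape
    r = cKDTree(x).query(x, k=k + 1)[0][:, -1]              # distance to the k-th neighbour
    log_vd = (d / 2) * np.log(np.pi) - gammaln(d / 2 + 1)   # log-volume of the unit d-ball
    return digamma(n) - digamma(k) + log_vd + d * np.mean(np.log(r))

rng = np.random.default_rng(0)
x = rng.normal(size=(5000, 2))
print("estimate:", kl_entropy(x))
print("exact   :", np.log(2 * np.pi * np.e))                # entropy of a 2D standard Gaussian
```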
6
65
420
Interesting comparison of the KAN neural architecture with the standard MLP!
It's possible to rewrite a certain kind of complicated mathematical function as a combination of simpler ones. This discovery, made in 1957 by Andrey Kolmogorov (left) and Vladimir Arnold (right), is at the heart of a new network architecture that could make AI easier to study
6
57
430
Well, tbh, there is no very good reason to divide by (n-1). And I've never seen a practically relevant situation where this makes a difference.
WHY do we divide by n-1 when computing the sample variance? I've never seen this way of explaining this concept anywhere else. Read on if you want a completely new way of looking at this.
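Whatever the explanation, the unbiasedness claim itself is a two-line simulation (a quick check, not the thread's argument):

```python
import numpy as np

rng = np.random.default_rng(0)
n, trials = 5, 200_000
samples = rng.normal(0.0, 1.0, size=(trials, n))                  # true variance = 1

print("E[divide by n  ] ~", samples.var(axis=1, ddof=0).mean())   # biased low, about (n-1)/n = 0.8
print("E[divide by n-1] ~", samples.var(axis=1, ddof=1).mean())   # Bessel's correction, about 1.0
```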
35
28
378
Interesting! I hope it's not just SVD/PCA on what is almost random noise, because then it's likely to have found similar patterns (i.e. you get sine waves when doing PCA on noise). #Statistics
In 2016, researchers at the University of Adelaide tested Kurt Vonnegut's theory that "There's no reason why the simple shapes of stories can't be fed into computers." They took the emotional arcs of 1300+ novels from Project Gutenberg, turned that into data, used modern tech
12
18
379
Consider a 2-layer #neuralnet of the type F(x) = (1/N) Σ a_i φ(x - c_i) for some nonlinearity φ. During training, the empirical distribution of the weights (a_i, c_i) can be described by a cute PDE, and it's also quite interesting to visualize. #maths #deeplearning
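A rough sketch of the kind of visualisation described (the nonlinearity, the target function, and the mean-field learning-rate scaling are my assumptions):

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)
N = 500
phi = np.tanh                                   # choice of nonlinearity (not specified in the tweet)
target = lambda x: np.sin(3 * x)

a = rng.normal(size=N)
c = rng.uniform(-2, 2, size=N)
x = rng.uniform(-2, 2, size=256)
lr = 0.5

for step in range(2000):
    h = phi(x[:, None] - c[None, :])            # phi(x - c_i), shape (batch, N)
    err = h @ a / N - target(x)                 # F(x) - y
    grad_a = h.T @ err / (len(x) * N)
    grad_c = -(a / N) * ((1 - h ** 2) * err[:, None]).mean(axis=0)
    a -= lr * N * grad_a                        # learning rate scaled by N (mean-field scaling)
    c -= lr * N * grad_c

plt.scatter(c, a, s=4)                          # empirical distribution of the weights (a_i, c_i)
plt.xlabel("c_i"); plt.ylabel("a_i"); plt.show()
```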
6
43
347
Seems like a very fun #complexAnalysis book, has anyone read it? I like that it seems computational, with interesting simulations/computations to perform! #maths #book
2
39
339
Fill 5x5 matrices with +1/-1 at random and plot the eigenvalues. Repeat many times. What's that structure? #randomMatrices #maths
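The experiment itself is a few lines of numpy (the number of repetitions is arbitrary):

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)
eigs = []
for _ in range(20_000):
    M = rng.choice([-1.0, 1.0], size=(5, 5))    # 5x5 matrix filled with +1/-1 at random
    eigs.append(np.linalg.eigvals(M))
eigs = np.concatenate(eigs)

plt.scatter(eigs.real, eigs.imag, s=0.2, alpha=0.3)
plt.gca().set_aspect("equal")
plt.show()
```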
10
32
281
Interesting read: how to go from the slow matrix-multiplication #python code below that runs in 6 hours to optimized code that runs in 1 sec!
Matrix Multiplication: Optimizing the code from 6 hours to ~1 sec. "Performance Engineering is a lost art." - Charles Leiserson. I followed this lecture -
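Not the lecture's C code, but a quick way to feel the gap from Python: a textbook triple loop against the BLAS call behind `A @ B` (the naive version takes tens of seconds even at n = 300):

```python
import time
import numpy as np

n = 300
A, B = np.random.rand(n, n), np.random.rand(n, n)

def matmul_naive(A, B):
    # textbook triple loop: O(n^3) interpreted operations, no blocking, no SIMD
    n = A.shape[0]
    C = np.zeros((n, n))
    for i in range(n):
        for j in range(n):
            for k in range(n):
                C[i, j] += A[i, k] * B[k, j]
    return C

t0 = time.perf_counter(); C1 = matmul_naive(A, B); t1 = time.perf_counter()
C2 = A @ B;                                        t2 = time.perf_counter()
print(f"naive loops: {t1 - t0:.2f} s    BLAS: {t2 - t1:.5f} s")
print("same result:", np.allclose(C1, C2))
```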
6
29
276
Take a 2D circle and propagate it through a fully connected #neuralnet whose layers have 256 neurons and randomly generated weights with std = σ/√256. At each layer, visualise a (linear) 2D projection. What's the influence of σ? #maths #deeplearning
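A rough reconstruction of the experiment (the tanh nonlinearity, the depth, and the final random projection are my assumptions):

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)
width, depth = 256, 20
theta = np.linspace(0, 2 * np.pi, 500)
circle = np.stack([np.cos(theta), np.sin(theta)], axis=1)                 # the 2D circle

for sigma in [0.8, 1.0, 1.5]:
    x = circle @ rng.normal(scale=sigma / np.sqrt(2), size=(2, width))    # embed into 256 dims
    for _ in range(depth):
        W = rng.normal(scale=sigma / np.sqrt(width), size=(width, width))
        x = np.tanh(x @ W)
    proj = x @ rng.normal(size=(width, 2)) / np.sqrt(width)               # random linear 2D projection
    plt.plot(proj[:, 0], proj[:, 1], label=f"sigma = {sigma}")
plt.legend(); plt.axis("equal"); plt.show()
```

Small sigma tends to contract the circle towards a point while large sigma tends to crumple it.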
7
25
258
Recommended account to follow, with many wonderful YouTube #maths videos!
Why do we require Jacobi identity to be satisfied for a Lie bracket? In the process, we also understand intuitively why tr(AB) = tr(BA) without matrix components. Watch now:
8
17
236
The Möbius transform F(z) = (z - z0) / (z·conj(z0) - 1) maps the complex unit disk onto itself. It is an involution that exchanges z = 0 and z = z0, and it's beautiful. Apart from the usual Schwarz lemma, what are some cool applications of these transformations? #complexAnalysis
4
28
231
Anti-social dynamics
Consider 100 people walking at constant speed, always perpendicular to the direction of their closest neighbour. If there were only 2 people, they would follow the same circle forever. More complicated with 100 people. Does it stay bounded? @johncarlosbaez
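A minimal simulation of the dynamics (I rotate the direction to the nearest neighbour by +90 degrees for everyone; the tweet doesn't say which perpendicular to take):

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)
n, speed, dt, steps = 100, 1.0, 0.01, 5000
pos = rng.uniform(-5, 5, size=(n, 2))
trail = np.empty((steps, n, 2))

for t in range(steps):
    diff = pos[:, None, :] - pos[None, :, :]
    dist = np.linalg.norm(diff, axis=-1)
    np.fill_diagonal(dist, np.inf)                        # ignore self when finding the nearest
    to_nb = pos[dist.argmin(axis=1)] - pos
    to_nb /= np.linalg.norm(to_nb, axis=1, keepdims=True)
    perp = np.stack([-to_nb[:, 1], to_nb[:, 0]], axis=1)  # rotate by +90 degrees
    pos = pos + speed * dt * perp
    trail[t] = pos

for i in range(n):
    plt.plot(trail[:, i, 0], trail[:, i, 1], lw=0.3)
plt.axis("equal"); plt.show()
```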
30
26
234
Variational inference: a Gaussian hesitating between two modes.
Details: SGD to minimize KL(q || target) with the standard reparametrization trick. #optimization #maths #ML
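A minimal PyTorch sketch of that setup (the two-mode target, batch size, and learning rate are my own choices):

```python
import math
import torch

torch.manual_seed(0)

def log_target(x):
    # unnormalised log-density of a 1D mixture with modes at -3 and +3
    return torch.logsumexp(torch.stack([-0.5 * (x + 3) ** 2,
                                        -0.5 * (x - 3) ** 2]), dim=0)

mu = torch.zeros(1, requires_grad=True)
log_sigma = torch.zeros(1, requires_grad=True)
opt = torch.optim.SGD([mu, log_sigma], lr=1e-2)

for step in range(5000):
    eps = torch.randn(256)
    x = mu + torch.exp(log_sigma) * eps                    # reparametrisation trick
    log_q = -0.5 * eps ** 2 - log_sigma - 0.5 * math.log(2 * math.pi)
    loss = (log_q - log_target(x)).mean()                  # Monte Carlo estimate of KL(q || target)
    opt.zero_grad(); loss.backward(); opt.step()

print(mu.item(), torch.exp(log_sigma).item())              # q typically settles on one of the modes
```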
2
15
233
OK, it actually seems to be true that UK Nobel prize winners are more likely to be born in September: the question then is why? #Statistics
3. Small advantages compound. This is why UK Nobel Prize winners are 2x as likely to be born in September. It's not the genes - just the small headstart of being developmentally a bit ahead is very potent academically.
88
25
225
Couldn't resist reproducing @j_bertolotti's fantastic experiment. Below, the eigenmodes of N=100 coupled oscillators placed on a circle. Each oscillator is coupled to its 2 neighbours. As the masses get more random, one observes "Anderson #localization". #physics
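A rough version of the computation (the disorder model and the choice to plot the highest-frequency mode are mine):

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)
N = 100
# circular chain: each oscillator coupled to its two neighbours (periodic Laplacian)
K = 2 * np.eye(N) - np.roll(np.eye(N), 1, axis=0) - np.roll(np.eye(N), -1, axis=0)

for disorder in [0.0, 0.3, 0.9]:
    m = 1.0 + disorder * rng.uniform(-1, 1, N)             # random masses
    D = np.diag(m ** -0.5) @ K @ np.diag(m ** -0.5)        # symmetrised dynamical matrix
    w2, modes = np.linalg.eigh(D)
    plt.plot(modes[:, -1] / np.sqrt(m), label=f"disorder = {disorder}")   # highest-frequency mode
plt.legend(); plt.show()
# as the masses get more random, the high-frequency modes concentrate on a few sites
```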
2
20
198