 
            
apoorv (@_apoorvnandan)
recreational coding + ml
Joined November 2016
Followers: 5K · Following: 711 · Media: 196 · Statuses: 416
            
Using categorical inputs in neural networks sounds trivial until it's user IDs for 1 billion users and your nn.Embedding layer won't work because it needs 3TB of memory. Over the weekend, I explored how TikTok engineered for this scale in their recommendation system 🧵
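A back-of-the-envelope check of that 3TB figure, plus a sketch of the hashing trick, one common workaround for huge ID spaces (the thread doesn't say this is exactly what TikTok ships; the dimension, dtype, and table size below are illustrative):

```python
import hashlib

# Rough memory math for a dense nn.Embedding over 1B user IDs,
# assuming float32 and a 768-dim embedding (both illustrative).
num_users = 1_000_000_000
dim = 768
bytes_per_float = 4
table_bytes = num_users * dim * bytes_per_float
print(table_bytes / 1e12)  # ~3.07 TB

# The hashing trick: map the huge ID space into a fixed-size table.
# Collisions between users are tolerated in exchange for bounded memory.
TABLE_SIZE = 10_000_000  # 10M rows instead of 1B (made-up size)

def hashed_row(user_id: int) -> int:
    # Deterministic hash of the ID into [0, TABLE_SIZE).
    digest = hashlib.md5(str(user_id).encode()).digest()
    return int.from_bytes(digest[:8], "big") % TABLE_SIZE

row = hashed_row(987_654_321_012)
print(row)
```

With 10M rows the same table drops from ~3TB to ~30GB, at the cost of unrelated users sharing rows.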
          
                
              
             my fav use case for vibe coding is building control panels for my projects 
          
                
              
             My cursor workspace is slowly evolving into just two panes - the chat (60-70%) and the terminal (30-40%). No files, file explorer etc. 
          
                
              
             You can read the unrolled version here:  https://t.co/EYO7pjLZGN  And subscribe for more such breakdowns. P.S. I am looking for contract work, so if you need help with an ML/AI project, send me a DM. 
          
                
              
An interesting idea they mention for future work is extending VAEs to language modelling. It could potentially make the smart compose suggestions more appropriate and diverse!
          
                
              
             While performing beam search, they interpolate between the probability values given by the two models. Adding personalization improved ExactMatch scores as well as suggestion acceptance rates from the users. 
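A minimal sketch of that interpolation step: for each candidate next word, blend the global neural model's probability with the personal n-gram model's, then rank the candidates by the blended score. The weight `lam` and the toy distributions are made up:

```python
import math

def interpolated_logprob(p_neural, p_personal, lam=0.8):
    """Blend two next-word distributions (dicts word -> prob) and return
    log-probabilities for ranking beam candidates. lam is the weight on
    the global neural model (illustrative value)."""
    words = set(p_neural) | set(p_personal)
    scores = {}
    for w in words:
        p = lam * p_neural.get(w, 0.0) + (1 - lam) * p_personal.get(w, 0.0)
        if p > 0:
            scores[w] = math.log(p)
    return scores

# The personal model bumps "cheers" for a user who signs off that way.
p_nn = {"regards": 0.6, "cheers": 0.3, "thanks": 0.1}
p_user = {"cheers": 0.9, "thanks": 0.1}
scores = interpolated_logprob(p_nn, p_user, lam=0.5)
best = max(scores, key=scores.get)
print(best)  # "cheers": 0.5*0.3 + 0.5*0.9 = 0.6 beats "regards" at 0.3
```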
          
                
              
Now, let's talk about personalization: users have different writing styles, and there are billions of them, so they chose a lightweight n-gram language model, which is easier to train and requires less data. It's also stored efficiently as a compact weighted finite automaton.
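A toy version of such a personal model, trained as bigram counts over a user's sent mail. The production system stores it as a weighted finite automaton; plain nested dicts are enough for a sketch:

```python
from collections import defaultdict

def train_bigram(sentences):
    """Count-based bigram model: estimates P(next | prev) from a user's
    sent mail (toy data below)."""
    counts = defaultdict(lambda: defaultdict(int))
    for s in sentences:
        toks = s.lower().split()
        for prev, nxt in zip(toks, toks[1:]):
            counts[prev][nxt] += 1
    return counts

def p_next(counts, prev, word):
    total = sum(counts[prev].values())
    return counts[prev][word] / total if total else 0.0

sent_mail = ["kind regards john", "best regards john", "warm regards team"]
m = train_bigram(sent_mail)
print(p_next(m, "regards", "john"))  # 2 of the 3 continuations are "john"
```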
          
                
              
Looking to balance inference quality and latency, they went ahead with Method A of feeding inputs (faster due to smaller sequence lengths) and LSTMs (lower latency at slightly lower quality).
          
                
              
They applied beam search during inference and evaluated the models with two metrics: log perplexity and ExactMatch@N. ExactMatch@N is, for predicted phrases that are N words long, the percentage of predictions that exactly match the first N words of the ground-truth text.
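A sketch of the ExactMatch@N metric as described above (my reading of the definition; the toy predictions are made up):

```python
def exact_match_at_n(predictions, references, n):
    """ExactMatch@N: over predicted phrases that are exactly n words
    long, the fraction that match the first n words of the ground truth."""
    hits, total = 0, 0
    for pred, ref in zip(predictions, references):
        pred_toks = pred.split()
        if len(pred_toks) != n:
            continue  # only phrases of length n count toward this bucket
        total += 1
        if pred_toks == ref.split()[:n]:
            hits += 1
    return hits / total if total else 0.0

preds = ["you soon", "a great", "regards"]
refs = ["you soon and take care", "a nice day ahead", "regards john"]
print(exact_match_at_n(preds, refs, n=2))  # "you soon" hits, "a great" misses -> 0.5
```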
          
                
              
Method B: You combine all the contextual information and the prefix text into one long input sequence. This is simpler, but the sequence length is longer.
          
                
              
             Second, the model: They experimented with LSTM and transformer models, and two different methods of feeding the inputs. Method A: The input sequence is the current e-mail body. The extra context is separately encoded into one embedding and combined with the input sequence. 
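The two input-feeding schemes (Methods A and B) can be sketched as token sequences; the context features and token names below are made up:

```python
# Illustrative token sequences (names and context features are invented).
context = ["<date:monday>", "<locale:en-US>", "<subject:standup>"]
body = ["hi", "team,", "the", "meeting"]

# Method B: concatenate everything into one long sequence. Simpler,
# but the model pays for the longer input on every keystroke.
method_b_input = context + body

# Method A: the sequence is just the e-mail body; the context would be
# encoded separately into a single embedding and combined with the
# sequence encoding inside the model.
method_a_input = body

print(len(method_a_input), len(method_b_input))  # 4 7
```

The shorter Method A sequence is what makes it faster at inference time, which is part of why they chose it.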
          
                
              
They replace infrequent words and entities like personal names, URLs, e-mail addresses, phone numbers, etc. with special tokens so that the model is not exposed to them. Then, they perform word-level tokenization. The vocabulary contains the most frequent 50k English words.
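That preprocessing step might look like the following sketch; the regex patterns and the tiny vocabulary are illustrative stand-ins (production uses the 50k most frequent words and covers more entity types):

```python
import re

# Illustrative patterns; a real system covers more entity types.
PATTERNS = [
    (re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.]+\b"), "<EMAIL>"),
    (re.compile(r"https?://\S+"), "<URL>"),
    (re.compile(r"\+?\d[\d\s()-]{7,}\d"), "<PHONE>"),
]

# Toy vocabulary standing in for the 50k-word production one.
VOCAB = {"please", "call", "me", "at", "or", "mail",
         "<EMAIL>", "<URL>", "<PHONE>", "<UNK>"}

def preprocess(text):
    """Replace sensitive entities with special tokens, then word-level
    tokenize against a fixed vocabulary; out-of-vocab words become <UNK>."""
    for pat, tok in PATTERNS:
        text = pat.sub(tok, text)
    return [w if w in VOCAB else "<UNK>" for w in text.split()]

print(preprocess("please call me at +1 (555) 123-4567 or mail bob@example.com"))
# ['please', 'call', 'me', 'at', '<PHONE>', 'or', 'mail', '<EMAIL>']
```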
          
                
              
First up, preparing the data. They supplement e-mail contents with extra context:
- Date and time: helps suggest good morning/evening, happy new year, etc. at the appropriate time
- Locale of the user: helps the model distinguish between en-US and en-GB spellings
          
                
              
Challenges:
- extremely low latency: inference on almost every keystroke
- personalization at large scale (1.5B users)
- privacy: the model should never expose personal information
- high-quality suggestions in subtly different contexts
          
                
              
Making Gmail’s smart compose system sounds trivial until you’re tasked with running inference for 1.5 billion users with 90th-percentile latency under 60ms and personalization based on each user’s writing style! Here’s a breakdown of how Google approached this:
          
                
              
             nano-vllm: minimal reimplementation of vllm in 1200 lines of python 
          
                
              
if you wanna learn about neural nets, this is the most important plot you need to understand
credits: @zhaisf
          
          
                