apoorv

@_apoorvnandan

Followers
5K
Following
711
Media
196
Statuses
416

recreational coding + ml

Joined November 2016
@_apoorvnandan
apoorv
6 months
Using categorical inputs in neural networks sounds trivial until it's user IDs for 1 billion users and your nn.Embedding layer won't work because it needs 3TB of memory. Over the weekend, I explored how TikTok engineered for this scale in their recommendation system 🧵
5
27
395
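(Not from the thread itself, but for context: a common baseline for billion-scale ID features is the hashing trick, where raw IDs get hashed into a fixed-size table instead of each getting their own row. A rough PyTorch sketch below; the table size, dim, and multiplier "hashes" are made-up numbers, and TikTok's actual system reportedly goes further with collisionless embedding tables, which this does not implement.)

```python
import torch
import torch.nn as nn

class HashedEmbedding(nn.Module):
    """Hashing-trick embedding: billions of raw IDs share one fixed-size table.

    Illustrative only -- table_size, dim and the multipliers are arbitrary,
    and real systems (e.g. TikTok's Monolith) handle collisions differently.
    """
    def __init__(self, table_size=5_000_000, dim=64, multipliers=(1_000_003, 2_000_003)):
        super().__init__()
        self.table = nn.Embedding(table_size, dim)  # ~5M * 64 floats ~= 1.3 GB, not 3 TB
        self.table_size = table_size
        self.multipliers = multipliers

    def forward(self, user_ids):
        # each multiplier acts as a cheap hash into the shared table; sum the vectors
        vecs = [self.table((user_ids * m) % self.table_size) for m in self.multipliers]
        return torch.stack(vecs).sum(dim=0)

emb = HashedEmbedding()
print(emb(torch.tensor([123, 987_654_321_012])).shape)  # torch.Size([2, 64])
```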
@_apoorvnandan
apoorv
16 days
my fav use case for vibe coding is building control panels for my projects
0
0
1
@_apoorvnandan
apoorv
2 months
building a tool for testing, debugging and managing webhooks
0
0
8
@_apoorvnandan
apoorv
2 months
My cursor workspace is slowly evolving into just two panes - the chat (60-70%) and the terminal (30-40%). No files, file explorer etc.
0
0
5
@_apoorvnandan
apoorv
2 months
You can read the unrolled version here: https://t.co/EYO7pjLZGN And subscribe for more such breakdowns. P.S. I am looking for contract work, so if you need help with an ML/AI project, send me a DM.
0
0
0
@_apoorvnandan
apoorv
2 months
An interesting idea they mention for future work is extending VAEs to language modelling. It could potentially make the smart compose suggestions more appropriate and diverse!
1
0
1
@_apoorvnandan
apoorv
2 months
While performing beam search, they interpolate between the probability values given by the two models. Adding personalization improved ExactMatch scores as well as suggestion acceptance rates from the users.
1
0
0
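(Sketch of what that interpolation might look like at scoring time. `lam`, `global_probs`, and `personal_probs` are hypothetical names I'm using for illustration, not Google's actual API.)

```python
import math

def interpolated_log_prob(token, history, global_probs, personal_probs, lam=0.8):
    """Score one beam-search candidate by mixing two models' probabilities.

    global_probs(history) / personal_probs(history) are assumed to return
    dicts of next-token -> probability; lam weights the global model.
    """
    p_global = global_probs(history).get(token, 1e-12)
    p_personal = personal_probs(history).get(token, 1e-12)
    return math.log(lam * p_global + (1 - lam) * p_personal)

# toy usage: the global model prefers "regards", this user usually writes "cheers"
g = lambda h: {"regards": 0.6, "cheers": 0.1}
p = lambda h: {"cheers": 0.7}
print(interpolated_log_prob("cheers", ["best"], g, p))   # log(0.8*0.1 + 0.2*0.7) = log(0.22)
print(interpolated_log_prob("regards", ["best"], g, p))  # ~log(0.48)
```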
@_apoorvnandan
apoorv
2 months
Now, let's talk about personalization: Users can have different writing styles, and there are billions of them, so they chose a lightweight n-gram language model, which is easier to train and requires less data. It's also stored efficiently as a compact weighted finite automaton.
1
0
1
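(Rough picture of what a per-user n-gram model boils down to: a plain bigram counter below. The compact weighted-finite-automaton storage they describe is the part I'm not showing.)

```python
from collections import defaultdict

class BigramLM:
    """Tiny per-user bigram model: raw counts -> conditional probabilities."""
    def __init__(self):
        self.counts = defaultdict(lambda: defaultdict(int))

    def train(self, sentences):
        for sent in sentences:
            tokens = ["<s>"] + sent.lower().split() + ["</s>"]
            for prev, cur in zip(tokens, tokens[1:]):
                self.counts[prev][cur] += 1

    def prob(self, prev, cur):
        total = sum(self.counts[prev].values())
        return self.counts[prev][cur] / total if total else 0.0

lm = BigramLM()
lm.train(["thanks for the update", "thanks for the quick reply"])
print(lm.prob("thanks", "for"))   # 1.0
print(lm.prob("the", "update"))   # 0.5
```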
@_apoorvnandan
apoorv
2 months
Looking to balance inference quality and latency, they went ahead with Method A for feeding inputs (faster due to smaller sequence lengths) and LSTMs (lower latency at slightly lower quality).
1
0
1
@_apoorvnandan
apoorv
2 months
They applied beam search during inference and evaluated the models with two metrics: Log Perplexity and ExactMatch@N. ExactMatch@N: for a predicted phrase that is N words long, the percentage of predictions that exactly match the first N words of the ground-truth text.
1
0
1
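(As I read it, the metric amounts to: a length-N prediction counts only if it equals the first N ground-truth words. Tiny sketch:)

```python
def exact_match(predictions, ground_truths):
    """Fraction of predicted phrases that exactly match the first N words
    of their ground truth, where N is the length of each prediction."""
    hits = 0
    for pred, truth in zip(predictions, ground_truths):
        pred_words = pred.split()
        truth_words = truth.split()
        if pred_words == truth_words[: len(pred_words)]:
            hits += 1
    return hits / len(predictions)

print(exact_match(
    ["let me know if", "see you"],
    ["let me know if you need anything", "talk to you soon"],
))  # 0.5
```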
@_apoorvnandan
apoorv
2 months
Method B: You combine all contextual information along with the prefix text in one long input sequence. This is simpler but the sequence length is longer.
1
0
1
@_apoorvnandan
apoorv
2 months
Second, the model: They experimented with LSTM and transformer models, and two different methods of feeding the inputs. Method A: The input sequence is the current e-mail body. The extra context is separately encoded into one embedding and combined with the input sequence.
1
0
1
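(A rough sketch of what Method A could look like; the dimensions and the way the context embeddings are averaged and added to the token embeddings are my assumptions, not the paper's exact fusion.)

```python
import torch
import torch.nn as nn

class MethodALM(nn.Module):
    """Method A sketch: the e-mail body is the token sequence; extra context
    (date/time bucket, locale, ...) is encoded into one vector and combined
    with every token embedding. Sizes are illustrative."""
    def __init__(self, vocab_size=50_000, n_context=64, dim=256):
        super().__init__()
        self.tok_emb = nn.Embedding(vocab_size, dim)
        self.ctx_emb = nn.Embedding(n_context, dim)  # e.g. hour-of-day, locale ids
        self.lstm = nn.LSTM(dim, dim, batch_first=True)
        self.out = nn.Linear(dim, vocab_size)

    def forward(self, tokens, context_ids):
        # one embedding per context feature, averaged into a single context vector
        ctx = self.ctx_emb(context_ids).mean(dim=1, keepdim=True)  # (B, 1, dim)
        x = self.tok_emb(tokens) + ctx                             # broadcast over time
        h, _ = self.lstm(x)
        return self.out(h)                                         # next-word logits

model = MethodALM()
logits = model(torch.randint(0, 50_000, (2, 10)), torch.randint(0, 64, (2, 3)))
print(logits.shape)  # torch.Size([2, 10, 50000])
```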
@_apoorvnandan
apoorv
2 months
They replace infrequent words and entities like personal names, URLs, e-mail addresses, phone numbers, etc. by special tokens so that the model is not exposed to them. Then, they perform word level tokenization. The vocabulary contains the most frequent 50k English words.
1
0
1
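(A rough sketch of that preprocessing step; the regexes and special-token names below are illustrative, not the production anonymizer.)

```python
import re

# crude patterns -- illustrative, not the production anonymizer
PATTERNS = [
    (re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+"), "<EMAIL>"),
    (re.compile(r"https?://\S+"), "<URL>"),
    (re.compile(r"\+?\d[\d\s().-]{7,}\d"), "<PHONE>"),
]

def anonymize_and_tokenize(text, vocab):
    """Replace personal entities with special tokens, then word-tokenize,
    mapping anything outside the 50k-word vocabulary to <UNK>."""
    for pattern, token in PATTERNS:
        text = pattern.sub(token, text)
    words = text.split()
    return [w if w in vocab or w.startswith("<") else "<UNK>" for w in words]

vocab = {"call", "me", "at", "or", "mail"}
print(anonymize_and_tokenize("call me at +1 (555) 010-2345 or mail bob@example.com", vocab))
# ['call', 'me', 'at', '<PHONE>', 'or', 'mail', '<EMAIL>']
```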
@_apoorvnandan
apoorv
2 months
First up, preparing the data. They supplement e-mail contents with extra context:
- Date and time: helps suggest good morning/evening, happy new year, etc. at the appropriate time
- Locale of the user: helps the model distinguish between en-US and en-GB spellings
1
0
1
@_apoorvnandan
apoorv
2 months
Challenges:
- extremely low latency: inference on almost every keystroke
- personalization at large scale (1.5B users)
- privacy: the model should never expose personal information
- high quality suggestions in subtly different contexts
1
0
3
@_apoorvnandan
apoorv
2 months
Making Gmail’s smart compose system sounds trivial until you’re tasked with running inference for 1.5 billion users with 90th percentile latency under 60ms and personalization based on the user’s writing style! Here’s a breakdown of how Google approached this:
1
2
13
@_apoorvnandan
apoorv
2 months
nano-vllm: minimal reimplementation of vllm in 1200 lines of python
2
6
74
@_apoorvnandan
apoorv
2 months
if you wanna learn about neural nets, this is the most important plot you need to understand

credits: @zhaisf
1
1
11
@_apoorvnandan
apoorv
2 months
left: python tkinter
right: c++ SFML
0
0
0