@minilek
Jelani Nelson
2 months
Major surprise by 3 @Berkeley_EECS grad students: Meghal Gupta, @HongxunWu, Mihir Singhal. *Deterministic* eps-approximate quantiles in O(1/eps) mem. The previous best was the KLL sketch, which was randomized and used O(lglg(1/p)/eps) mem, p = fail prob 1/
You see a stream of n tokens and want to query the qth quantile, i.e. an elt bigger than qn stream elts. In the approx version, you're satisfied with returning any element at the (q +/- eps)th quantile. Algorithm dev for this problem started almost 30 yrs ago; this is a real 2/
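To make the guarantee concrete, here's a minimal offline check of what an eps-approximate quantile answer promises. (The function name `approx_ok` and the "rank = # elements <= answer" convention are my own choices for this sketch, not from the paper.)

```python
# Illustrative check of the eps-approximate quantile guarantee.
def approx_ok(stream, q, eps, answer):
    """True iff `answer` has rank within (q +/- eps) * n in the stream."""
    n = len(stream)
    rank = sum(1 for x in stream if x <= answer)
    return (q - eps) * n <= rank <= (q + eps) * n

stream = list(range(1, 101))                  # n = 100 tokens
# The exact 0.5-quantile is 50; with eps = 0.05 any element whose
# rank falls in [45, 55] is an acceptable answer.
assert approx_ok(stream, 0.5, 0.05, 50)
assert approx_ok(stream, 0.5, 0.05, 53)       # rank 53 is within [45, 55]
assert not approx_ok(stream, 0.5, 0.05, 70)   # rank 70 is not
```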
problem in industry, with solutions deployed in many companies. The most popular solution is KLL (see ). It is randomized Monte Carlo, which means there's a chance it gives the wrong answer. This new sketch uses less asymptotic mem *and* is deterministic 3/
A typical application is system performance monitoring: what is the 99.9th percentile response time for a system? Note now the tokens are ints (e.g. time in milliseconds) -- the new sketch exploits this, whereas KLL and many previous sketches operated in the comparison model. 4/
This switch away from the comparison model is necessary, since in the comparison model there is a tight deterministic Omega((1/eps) log n) mem lower bound due to Cormode and Vesely in 2020 (matched by the Greenwald-Khanna sketch from 2001). 5/
Achieving O(1/eps) words of mem is significant, b/c it's the best you could hope for even if I allow you to store the entire stream and just ask you to compress it (you'd store the elts at quantile positions j*eps*n for j=1,2,...,1/eps). No experts saw an algo like this coming 6/
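The offline baseline mentioned above can be sketched in a few lines -- sort the whole stream and keep only the 1/eps elements at ranks j*eps*n. (This is the trivial non-streaming summary that sets the memory target, not the new algorithm; function names are illustrative.)

```python
# Trivial *offline* summary: sort everything, keep the 1/eps elements
# at ranks j*eps*n for j = 1, ..., 1/eps. Not a streaming algorithm.
def offline_summary(stream, eps):
    s = sorted(stream)
    n, k = len(s), round(1 / eps)
    return [s[j * n // k - 1] for j in range(1, k + 1)]   # 1/eps words

def query(summary, q):
    # Answer a quantile query from the summary: return the stored
    # element whose rank is nearest q*n (rank error at most eps*n).
    k = len(summary)
    j = min(k, max(1, round(q * k)))
    return summary[j - 1]

stream = list(range(1, 1001))                 # n = 1000
summ = offline_summary(stream, 0.1)           # 1/eps = 10 stored elements
assert summ == [100, 200, 300, 400, 500, 600, 700, 800, 900, 1000]
assert query(summ, 0.5) == 500
```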
I recently applied for a grant where I mentioned this (amongst many) as an interesting problem to work on. Our students are too clever and solved it too quickly!😂 There are still many interesting problems left though, e.g. this new sketch is not 'fully mergeable', which is 7/
important for industry adoption (basically it means streams can be sharded across multiple servers, sketched separately, and the sketches can be merged w/ no accuracy loss, as if the stream had all been processed on the same machine). There's also an interesting lower bound .. 8/
question as to whether the (1/eps)lglg n in the lower bound can be upped to (1/eps)lg n (though (1/eps)lg U is tight, where the ints are in the range {1,...,U}). Results like this remind me of why this job is so awesome -- I get to see the future created right before my eyes! 9/
The idea of the new algorithm from 1000 feet is to combine an algorithm from 20 yrs ago (the "q-digest") with recursion. In the q-digest, you imagine the universe 1,...,U as the leaves of a perfect binary tree. To insert a new stream token x, walk toward x starting at the root 10/
and place x into the first node that isn't "full" by incrementing its count (fullness means the node count hit some threshold T). Also, whenever a node u satisfies u.count + sibling(u).count + parent(u).count <= T you "compress" by merging all counts into the parent. If you 11/
set T = eps n / lg U, the q-digest paper proves that the number of nodes stored at any time is <= (1/eps) lg U. This means the mem is U' := (1/eps) lg U words. The new algorithm basically says: these U' active nodes are a new universe, and future insertions are telling you 12/
which new universe elt's subtree to insert under. This then can be handled by a recursive sub-structure, which occasionally flushes back up to the parent. An invariant is maintained that at all levels of recursion, nodes are either completely full or completely empty, meaning 13/
the (1/eps) lg U full nodes at say the topmost level can be stored using (1/eps) lg U *bits* instead of words (the bit just tells you whether the node is full or empty, rather than storing the count). There are many details to make everything work. Read the full paper! 14/14
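For readers who want to poke at the classical building block, here is a toy q-digest along the lines of tweets 10-12. (The heap-style node indexing, the eager compress pass, and letting a leaf absorb an insert when its whole path is full are my own simplifications; real q-digest implementations differ in details.)

```python
from collections import defaultdict

class QDigest:
    """Toy q-digest over the universe {1, ..., U}, U a power of 2.

    Heap-style indexing: root = 1, children of i are 2i and 2i+1,
    and the leaf for value x is node U + x - 1.
    """
    def __init__(self, U, T):
        self.U, self.T = U, T          # T = fullness threshold
        self.count = defaultdict(int)  # node id -> count

    def insert(self, x):
        # Build the root-to-leaf path for x, then place x into the
        # first node on it that isn't full (count < T).
        leaf, path = self.U + x - 1, []
        while leaf >= 1:
            path.append(leaf)
            leaf //= 2
        for node in reversed(path):    # root first
            if self.count[node] < self.T:
                self.count[node] += 1
                return
        self.count[path[0]] += 1       # whole path full: leaf absorbs it

    def compress(self):
        # Merge u, sibling(u), parent(u) into parent(u) whenever their
        # combined count still fits under the threshold T.
        for u in sorted((n for n in list(self.count) if n > 1), reverse=True):
            p, s = u // 2, u ^ 1
            if self.count[u] + self.count[s] + self.count[p] <= self.T:
                self.count[p] += self.count[u] + self.count[s]
                self.count[u] = self.count[s] = 0

    def stored_nodes(self):
        return sum(1 for c in self.count.values() if c > 0)

qd = QDigest(U=8, T=2)                 # T = eps*n/lg U in the analysis;
for x in range(1, 9):                  # here T is just picked for illustration
    qd.insert(x)
qd.compress()
assert sum(qd.count.values()) == 8     # all 8 insertions are accounted for
assert qd.stored_nodes() == 4          # far fewer than 8 stored counters
```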
@jon_barron
Jon Barron
2 months
@minilek @Berkeley_EECS @HongxunWu I like this. Think it could be implemented in a deep learning framework efficiently? Tree-like datastructures are usually hard to work with in JAX or Pytorch. Also, is it straightforward to return the mean or something of the samples within the epsilon bin instead of a sample?
@minilek
Jelani Nelson
2 months
@jon_barron @Berkeley_EECS @HongxunWu As far as implementation, I don't know. I don't believe the authors have any implementation prepared. Note though any implementation that gets the right memory complexity would have to make use of bitpacking, storing info about diff. parts of the data structure into the same.. 1/
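To illustrate the kind of bitpacking being referred to -- storing one full/empty bit per node packed into machine words, rather than one counter word per node -- a hypothetical sketch (a Python int stands in for the packed words; names are illustrative):

```python
# Hypothetical full/empty bitpacking: under the invariant that nodes
# are either completely full or completely empty, one bit per node
# suffices instead of a whole counter word.
def pack_full_nodes(is_full):
    bits = 0
    for i, full in enumerate(is_full):  # bit i <-> node i
        if full:
            bits |= 1 << i
    return bits

def is_node_full(bits, i):
    return (bits >> i) & 1 == 1

bits = pack_full_nodes([True, False, True, True])
assert bits == 0b1101
assert is_node_full(bits, 0) and not is_node_full(bits, 1)
```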