Major surprise by 3 @Berkeley_EECS grad students: Meghal Gupta, @HongxunWu, Mihir Singhal. *Deterministic* eps-approximate quantiles in O(1/eps) mem. The previous best was the KLL sketch, which is randomized and uses O((1/eps) lglg(1/p)) mem, p = fail prob 1/
You see a stream of n tokens and want to query the qth quantile, i.e. an elt bigger than qn stream elts. In the approx version, you're satisfied with returning any element at the (q +/- eps)th quantile. Algorithm dev for this problem started almost 30 yrs ago; this is a real 2/
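To make the guarantee concrete, here's a toy Python check (my own naming, not from the paper) of what counts as a valid eps-approximate answer:

```python
def is_approx_quantile(stream, x, q, eps):
    """Check whether x is a valid eps-approximate q-quantile:
    the number of stream elements smaller than x must lie in
    [(q - eps) * n, (q + eps) * n]."""
    n = len(stream)
    rank = sum(1 for y in stream if y < x)
    return (q - eps) * n <= rank <= (q + eps) * n

stream = list(range(100))  # 0..99
# with eps = 0.05, any element of rank 45..55 answers q = 0.5
print(is_approx_quantile(stream, 50, 0.5, 0.05))  # True
print(is_approx_quantile(stream, 70, 0.5, 0.05))  # False
```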
problem in industry, with solutions deployed in many companies. The most popular solution is KLL (see ). It is randomized Monte Carlo, which means there's a chance it gives the wrong answer. This new sketch uses less asymptotic mem *and* is deterministic 3/
A typical application is system performance monitoring: what is the 99.9th percentile response time for a system? Note that now the tokens are ints (e.g. time in milliseconds) -- the new sketch exploits this, whereas KLL and many previous sketches operated in the comparison model. 4/
This switch away from the comparison model is necessary, since in the comparison model there is a tight deterministic Omega((1/eps) log n) mem lower bound due to Cormode and Vesely in 2020 (matched by the Greenwald-Khanna sketch from 2001). 5/
Achieving O(1/eps) words of mem is significant, b/c it's the best you could hope for even if I allow you to store the entire stream and just ask you to compress it (you'd store the elts at quantile positions j*eps*n for j=1,2,...,1/eps). No experts saw an algo like this coming 6/
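That "best you could hope for" offline summary is easy to sketch (toy Python, function names are mine): sort the stream and keep only the elements at ranks j*eps*n.

```python
def offline_summary(stream, eps):
    """Store only ~1/eps elements: those at quantile positions
    j*eps*n for j = 1, 2, ..., 1/eps (after sorting)."""
    s = sorted(stream)
    n = len(s)
    step = max(1, int(eps * n))
    return s[step - 1 :: step]  # ~1/eps stored elements

def query(summary, q):
    """Answer a quantile query from the summary alone; the
    returned element's true rank is within eps*n of q*n."""
    idx = min(len(summary) - 1, int(q * len(summary)))
    return summary[idx]

summary = offline_summary(range(1000), 0.1)  # keeps just 10 elements
print(query(summary, 0.5))  # 599: true rank 599, within eps*n = 100 of 500
```

Of course this needs the whole stream up front; the point of the new result is hitting the same O(1/eps) words in one streaming pass, deterministically.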
I recently applied for a grant where I mentioned this (amongst many) as an interesting problem to work on. Our students are too clever and solved it too quickly!😂 There are still many interesting problems left though, e.g. this new sketch is not 'fully mergeable', which is 7/
important for industry adoption (basically it means streams can be sharded across multiple servers, sketched separately, and the sketches can be merged w/ no accuracy loss, as if the stream had all been processed on the same machine). There's also an interesting lower bound .. 8/
question as to whether the (1/eps)lglg n in the lower bound can be upped to (1/eps)lg n (though (1/eps)lg U is tight, where the ints are in the range {1,...,U}). Results like this remind me of why this job is so awesome -- I get to see the future created right before my eyes! 9/
The idea of the new algorithm from 1000 feet is to combine an algorithm from 20 yrs ago (the "q-digest") with recursion. In the q-digest, you imagine the universe 1,...,U are leaves of a perfect binary tree. To insert a new stream token x, walk toward x starting at the root 10/
and place x into the first node that isn't "full" by incrementing its count (fullness means the node count hit some threshold T). Also, whenever a node u satisfies
u.count + sibling(u).count + parent(u).count <= T
you "compress" by merging all counts into the parent. If you 11/
set T = eps n / lg U, the q-digest paper proves that the number of nodes stored at any time is <= (1/eps) lg U. This means the mem is U' = (1/eps) lg U words. The new algorithm basically says: these new U' active nodes are a new universe, and future insertions are telling you 12/
which new universe elt's subtree to insert under. This then can be handled by a recursive sub-structure, which occasionally flushes back up to the parent. An invariant is maintained that at all levels of recursion, nodes are either completely full or completely empty, meaning 13/
the (1/eps) lg U full nodes at say the topmost level can be stored using (1/eps) lg U *bits* instead of words (the bit just tells you whether the node is full or empty, rather than storing the count). There are many details to make everything work. Read the full paper! 14/14
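For the curious, the q-digest insert/compress step described above can be sketched like this. A minimal toy version in Python, not the authors' code: the heap-style node indexing and the spill-to-leaf rule when a whole path is full are my assumptions, and the real structure grows T = eps*n/lg U as n grows, which this toy omits.

```python
class QDigest:
    """Toy q-digest over universe {1..U}, U a power of 2.
    Perfect binary tree with heap indexing: root = 1, children of v
    are 2v and 2v+1, and the leaf for value x is U + x - 1."""

    def __init__(self, U, T):
        self.U, self.T = U, T
        self.count = {}  # node index -> count (only nonzero nodes stored)

    def insert(self, x):
        leaf = self.U + x - 1
        depth = leaf.bit_length()
        # walk root -> leaf; ancestors are the bit-prefixes of leaf's index
        for d in range(depth):
            v = leaf >> (depth - 1 - d)
            if self.count.get(v, 0) < self.T:
                self.count[v] = self.count.get(v, 0) + 1
                self._compress(v)
                return
        # every node on the path is full: in this toy, spill into the leaf
        self.count[leaf] = self.count.get(leaf, 0) + 1

    def _compress(self, v):
        # whenever v.count + sibling.count + parent.count <= T,
        # merge all three counts into the parent; repeat upward
        while v > 1:
            p, sib = v // 2, v ^ 1
            total = (self.count.get(v, 0) + self.count.get(sib, 0)
                     + self.count.get(p, 0))
            if total > self.T:
                break
            self.count[p] = total
            self.count.pop(v, None)
            self.count.pop(sib, None)
            v = p
```

One sanity check: counts are only moved around, never lost, so the counts always sum to the number of insertions. The new paper's recursion then treats the active nodes themselves as a fresh universe and recurses.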
@minilek @Berkeley_EECS @HongxunWu
I like this. Think it could be implemented in a deep learning framework efficiently? Tree-like data structures are usually hard to work with in JAX or PyTorch.
Also, is it straightforward to return the mean or something of the samples within the epsilon bin instead of a sample?
@jon_barron @Berkeley_EECS @HongxunWu
As far as implementation, I don't know. I don't believe the authors have any implementation prepared. Note though any implementation that gets the right memory complexity would have to make use of bitpacking, storing info about diff. parts of the data structure into the same.. 1/