Major surprise by 3 @Berkeley_EECS grad students: Meghal Gupta, @HongxunWu, Mihir Singhal. *Deterministic* eps-approximate quantiles in O(1/eps) mem. The previous best was the KLL sketch, which is randomized and uses O((1/eps) lglg(1/p)) mem, p = fail prob 1/
You see a stream of n tokens and want to query the qth quantile, i.e. an elt bigger than qn stream elts. In the approx version, you're satisfied with returning any element at the (q +/- eps)th quantile. Algorithm dev for this problem started almost 30 yrs ago; this is a real 2/
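To make the guarantee concrete, here's a toy Python check (my own naming, not from the paper) of what counts as a valid eps-approximate answer:

```python
def is_approx_quantile(stream, x, q, eps):
    """Check whether x is a valid eps-approximate q-quantile:
    the number of stream elements smaller than x must lie in
    [(q - eps) * n, (q + eps) * n]."""
    n = len(stream)
    rank = sum(1 for y in stream if y < x)
    return (q - eps) * n <= rank <= (q + eps) * n

stream = list(range(100))  # 0..99
# with eps = 0.05, any element of rank 45..55 answers q = 0.5
print(is_approx_quantile(stream, 50, 0.5, 0.05))  # True
print(is_approx_quantile(stream, 70, 0.5, 0.05))  # False
```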
problem in industry, with solutions deployed in many companies. The most popular solution is KLL (see ). It is randomized Monte Carlo, which means there's a chance it gives the wrong answer. This new sketch uses less asymptotic mem *and* is deterministic 3/
A typical application is system performance monitoring: what is the 99.9th percentile response time for a system? Note that now the tokens are ints (e.g. time in milliseconds) -- the new sketch exploits this, whereas KLL and many previous sketches operated in the comparison model. 4/
This switch away from the comparison model is necessary, since in the comparison model there is a tight deterministic Omega((1/eps) log n) mem lower bound due to Cormode and Vesely in 2020 (matched by the Greenwald-Khanna sketch from 2001). 5/
Achieving O(1/eps) words of mem is significant, b/c it's the best you could hope for even if I allow you to store the entire stream and just ask you to compress it (you'd store the elts at quantile positions j*eps*n for j=1,2,...,1/eps). No experts saw an algo like this coming 6/
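That "best you could hope for" offline summary is easy to sketch (toy Python, function names are mine): sort the stream and keep only the elements at ranks j*eps*n.

```python
def offline_summary(stream, eps):
    """Store only ~1/eps elements: those at quantile positions
    j*eps*n for j = 1, 2, ..., 1/eps (after sorting)."""
    s = sorted(stream)
    n = len(s)
    step = max(1, int(eps * n))
    return s[step - 1 :: step]  # ~1/eps stored elements

def query(summary, q):
    """Answer a quantile query from the summary alone; the
    returned element's true rank is within eps*n of q*n."""
    idx = min(len(summary) - 1, int(q * len(summary)))
    return summary[idx]

summary = offline_summary(range(1000), 0.1)  # keeps just 10 elements
print(query(summary, 0.5))  # 599: true rank 599, within eps*n = 100 of 500
```

Of course this needs the whole stream up front; the point of the new result is hitting the same O(1/eps) words in one streaming pass, deterministically.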
I recently applied for a grant where I mentioned this (amongst many) as an interesting problem to work on. Our students are too clever and solved it too quickly!😂 There are still many interesting problems left though, e.g. this new sketch is not 'fully mergeable', which is 7/
important for industry adoption (basically it means streams can be sharded across multiple servers, sketched separately, and the sketches can be merged w/ no accuracy loss, as if the stream had all been processed on the same machine). There's also an interesting lower bound .. 8/
question as to whether the (1/eps)lglg n in the lower bound can be upped to (1/eps)lg n (though (1/eps)lg U is tight, where the ints are in the range {1,...,U}). Results like this remind me of why this job is so awesome -- I get to see the future created right before my eyes! 9/
The idea of the new algorithm from 1000 feet is to combine an algorithm from 20 yrs ago (the "q-digest") with recursion. In the q-digest, you imagine the universe 1,...,U are leaves of a perfect binary tree. To insert a new stream token x, walk toward x starting at the root 10/
and place x into the first node that isn't "full" by incrementing its count (fullness means the node count hit some threshold T). Also, whenever a node u satisfies
u.count + sibling(u).count + parent(u).count <= T
you "compress" by merging all counts into the parent. If you 11/
set T = eps n / lg U, the q-digest paper proves that the number of nodes stored at any time is <= (1/eps) lg U. This means the mem is U' = (1/eps) lg U words. The new algorithm basically says: these new U' active nodes are a new universe, and future insertions are telling you 12/
which new universe elt's subtree to insert under. This then can be handled by a recursive sub-structure, which occasionally flushes back up to the parent. An invariant is maintained that at all levels of recursion, nodes are either completely full or completely empty, meaning 13/
the (1/eps) lg U full nodes at say the topmost level can be stored using (1/eps) lg U *bits* instead of words (the bit just tells you whether the node is full or empty, rather than storing the count). There are many details to make everything work. Read the full paper! 14/14
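For the curious, the q-digest insert/compress step described above can be sketched like this. A minimal toy version in Python, not the authors' code: the heap-style node indexing and the spill-to-leaf rule when a whole path is full are my assumptions, and the real structure grows T = eps*n/lg U as n grows, which this toy omits.

```python
class QDigest:
    """Toy q-digest over universe {1..U}, U a power of 2.
    Perfect binary tree with heap indexing: root = 1, children of v
    are 2v and 2v+1, and the leaf for value x is U + x - 1."""

    def __init__(self, U, T):
        self.U, self.T = U, T
        self.count = {}  # node index -> count (only nonzero nodes stored)

    def insert(self, x):
        leaf = self.U + x - 1
        depth = leaf.bit_length()
        # walk root -> leaf; ancestors are the bit-prefixes of leaf's index
        for d in range(depth):
            v = leaf >> (depth - 1 - d)
            if self.count.get(v, 0) < self.T:
                self.count[v] = self.count.get(v, 0) + 1
                self._compress(v)
                return
        # every node on the path is full: in this toy, spill into the leaf
        self.count[leaf] = self.count.get(leaf, 0) + 1

    def _compress(self, v):
        # whenever v.count + sibling.count + parent.count <= T,
        # merge all three counts into the parent; repeat upward
        while v > 1:
            p, sib = v // 2, v ^ 1
            total = (self.count.get(v, 0) + self.count.get(sib, 0)
                     + self.count.get(p, 0))
            if total > self.T:
                break
            self.count[p] = total
            self.count.pop(v, None)
            self.count.pop(sib, None)
            v = p
```

One sanity check: counts are only moved around, never lost, so the counts always sum to the number of insertions. The new paper's recursion then treats the active nodes themselves as a fresh universe and recurses.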
@minilek @Berkeley_EECS @HongxunWu
I like this. Think it could be implemented in a deep learning framework efficiently? Tree-like data structures are usually hard to work with in JAX or PyTorch.
Also, is it straightforward to return the mean or something of the samples within the epsilon bin instead of a sample?
@jon_barron @Berkeley_EECS @HongxunWu
As far as implementation, I don't know. I don't believe the authors have any implementation prepared. Note though any implementation that gets the right memory complexity would have to make use of bitpacking, storing info about diff. parts of the data structure into the same.. 1/