Quantiles and Inverse Quantiles

Heinrich Hartmann June 27, 2014


1 Setting

2 Quantiles

3 Inverse Quantiles

4 Discussion

4.1 Scenario A

4.2 Scenario B

1 Setting

Say we are running a database and want to meet the following requirement: 99.9% of all queries shall be answered within 100ms.

In order to verify this requirement, we measure the response times of all queries and put them into a histogram graph. This graph consists of a color map that indicates how many queries have been answered at which durations over time. Such a diagram can reveal whether the majority of queries are answered in less than 100ms, but it is not possible to check the requirement directly from the color map. The requirement concerns 0.1% of the query volume, which leaves hardly any footprint in the color map.

In order to check the requirement, we have to add an analytics overlay. There are in fact two different overlays we can choose from: the quantile and the inverse quantile.

2 Quantiles

We add the 99.9% quantile Q to the histogram. This adds a curve to the diagram whose value Q(t) at a given point in time t is the smallest duration such that 99.9% of all queries at time t are answered within Q(t) ms.

The requirement is thus translated into the condition that Q(t) ≤ 100ms, which is easy to read off the graph.
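As a rough illustration of this definition, the following Python sketch computes a quantile from a list of measured response times and checks the condition Q(t) ≤ 100ms. The function name `quantile`, the nearest-rank method and the sample window are assumptions made for this example; an actual monitoring system may estimate the quantile from histogram buckets instead.

```python
import math

def quantile(durations_ms, q):
    """Smallest observed duration d such that at least a fraction q
    of the samples are <= d (nearest-rank method, assumed here)."""
    if not durations_ms:
        raise ValueError("need at least one sample")
    ordered = sorted(durations_ms)
    # 1-based rank of the sample covering a fraction q of the data;
    # the small tolerance guards against floating-point round-up.
    rank = max(1, math.ceil(q * len(ordered) - 1e-9))
    return ordered[rank - 1]

# Hypothetical window of 1000 queries: 999 fast ones and one slow one.
window = [80.0] * 999 + [200.0]
q999 = quantile(window, 0.999)
print("Q =", q999, "ms, requirement met:", q999 <= 100.0)
```

With a single slow query out of 1000, the 99.9% quantile still sits at the fast duration; a second slow query in the same window would move it to the slow duration.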

3 Inverse Quantiles

Alternatively, we can add the 100ms inverse quantile R to the graph. This also adds a curve, whose value R(t) is the ratio of queries at time t that were answered in less than 100ms. Note that the value of R(t) is always between 0 and 1. The web interface shows this value on an independent y-axis, which is suitably rescaled depending on the values of R(t).

The requirement can be translated to the condition that R(t) ≥ 0.999.
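The inverse quantile is simple to compute from raw samples, as the following Python sketch suggests. The function name `inverse_quantile` and the sample window are hypothetical. The sketch counts a query that takes exactly 100ms as meeting the threshold; this is an assumption chosen so that the check agrees with the condition Q(t) ≤ 100ms.

```python
def inverse_quantile(durations_ms, threshold_ms):
    """Fraction of samples answered within threshold_ms (always in [0, 1])."""
    if not durations_ms:
        raise ValueError("need at least one sample")
    # Assumption: a duration exactly at the threshold counts as "within".
    within = sum(1 for d in durations_ms if d <= threshold_ms)
    return within / len(durations_ms)

# Hypothetical window of 1000 queries: 999 fast ones and one slow one.
window = [80.0] * 999 + [200.0]
r = inverse_quantile(window, 100.0)
print("R =", r, "requirement met:", r >= 0.999)
```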

4 Discussion

Let's say all queries are answered in less than 100ms. In this case, Q(t) will have some value that is smaller than 100ms. The distance between Q(t) and 100ms is an indicator of how much capacity is left before the requirement is violated.

The inverse quantile R(t) will have the value 1 if all queries are below 100ms. The (very small) distance to 0.999 is another measure for the remaining capacity.

But how are these capacity measures different?

4.1 Scenario A

Let's assume that all queries at a given point in time take precisely the same time, and that a problem occurs (e.g. high server load) that causes all queries to get slower and slower simultaneously.

In this scenario, the quantile Q(t) follows the query duration and gives an early indication of an approaching problem. In contrast, the inverse quantile R(t) stays at 1 until one query (and hence every query) takes longer than 100ms, at which point it drops to 0. No warning was given: the requirement goes from fully met to completely violated in a single step.

We see that monitoring the quantile is essential to detect such a problem early.
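Scenario A can be reproduced with a small simulation. The Python sketch below is only an illustration: the helper names, the window size of 1000 queries and the chosen latency values are all assumptions, and the two helpers use a simple nearest-rank quantile and a "within threshold" ratio.

```python
import math

def quantile(samples, q):
    # Nearest-rank quantile; the tolerance guards against float round-up.
    ordered = sorted(samples)
    return ordered[max(1, math.ceil(q * len(ordered) - 1e-9)) - 1]

def inverse_quantile(samples, threshold):
    # Fraction of samples at or below the threshold.
    return sum(1 for s in samples if s <= threshold) / len(samples)

def scenario_a(latencies_ms, n_queries=1000):
    """At each time step, all n_queries take exactly the same latency."""
    rows = []
    for latency in latencies_ms:
        window = [latency] * n_queries
        rows.append((latency,
                     quantile(window, 0.999),
                     inverse_quantile(window, 100.0)))
    return rows

# Hypothetical load ramp: every query slows down at the same rate.
for latency, q, r in scenario_a([60.0, 80.0, 95.0, 105.0]):
    print(f"latency={latency:6.1f}ms  Q={q:6.1f}ms  R={r:.3f}")
```

In the printed rows, Q tracks the rising latency step by step, while R stays at 1 and then falls straight to 0 once the latency crosses 100ms.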

4.2 Scenario B

We assume again that all our queries are initially answered in 80ms. Now a problem occurs (e.g. an HDD failure or a new code path) that affects only a very small number of queries and causes them to be answered in 200ms. In this case, Q(t) stays at 80ms as long as at least 99.9% of all queries are still answered in 80ms. As soon as more than 0.1% of the queries are affected, the quantile jumps to 200ms without any early warning!

In contrast, R(t) is affected as soon as the problem occurs and drops further as the problem affects more and more queries. We see that this time R(t) gives us an early warning.
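Scenario B can be sketched in the same way. Again, the helper names, the window size of 1000 queries and the numbers of affected queries are assumptions made purely for illustration.

```python
import math

def quantile(samples, q):
    # Nearest-rank quantile; the tolerance guards against float round-up.
    ordered = sorted(samples)
    return ordered[max(1, math.ceil(q * len(ordered) - 1e-9)) - 1]

def inverse_quantile(samples, threshold):
    # Fraction of samples at or below the threshold.
    return sum(1 for s in samples if s <= threshold) / len(samples)

def scenario_b(slow_counts, n_queries=1000):
    """Most queries take 80ms; n_slow of them take 200ms instead."""
    rows = []
    for n_slow in slow_counts:
        window = [80.0] * (n_queries - n_slow) + [200.0] * n_slow
        rows.append((n_slow,
                     quantile(window, 0.999),
                     inverse_quantile(window, 100.0)))
    return rows

# Hypothetical fault that affects a growing number of queries.
for n_slow, q, r in scenario_b([0, 1, 2, 5]):
    print(f"slow queries={n_slow}  Q={q:6.1f}ms  R={r:.3f}")
```

Here R starts to fall with the first affected query, whereas Q stays at 80ms until more than 0.1% of the window is slow and then jumps directly to 200ms.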