A solution – part III (final)

So adding everything up…

We’re going to read the extents one by one into memory. As we read an extent, we’ll break it up into a set of non-overlapping extents, each holding a counter of how many overlapping extents contain it.
Having non-overlapping extents means that in fact we’re ordering the extents. In the previous example, we have A < C < B < D. This ordering allows us to use data structures that order their elements, a binary tree for instance. Each ‘side’ of the extent becomes a node in the binary tree. And each node can store the number of extents including the extent either to its right or to its left. To be able to use lower_bound, we’ll store the number of extents including the extent to the left of the node value.

As such, in our example, we’ll have a binary tree like this:

   C
 /   \
A     B
        \
         D

And the value they store are:
node_value(A) = 0 (there are no extents to the left of A, in other words in (-inifinity, A)
node_value(C) = 1 (there is only one extent overlapping the [A, C) extent
node_value(B) = 2 (there are 2 extents overlapping [C, B) extent)
node_value(D) = 1 (…)

For our extents, we will have to build a data structure like this. Let’s assume we have a number x that we want to check against our extents. Let’s assume that we have

A < C < x < B < D

The result in this case should be 2. Remember that our overlapping extents are [A, B) and [C, D) where A < C < B < D.
That means that x belongs in both [A, B) and [C, D) extent.
Let’s see how to get this result with our data structure. Let’s see what happens if we call lower_bound(x) on our data structure. From MSDN: lower_bound “Returns an iterator to the first element in a map with a key value that is equal to or greater than that of a specified key.”
So, calling lower_bound(x) will return an iterator pointing to B. If we get the value associated with node B, that is node_value(B), we get our result: 2.

I leave it as an exercise for you to write the code in your favourite language.

The lower_bound algorithm has logarithmic complexity, which is within our constraints. It might be possible to improve this solution by using hash tables which have amortized constant complexity. However, there’s one problem, hash tables are not ordered as binary trees are, as such, it is not possible to get from a node B to its immediate neighbours C and D for example. The solution presented here relies on that, as whenever a new extent is inserted in the binary tree, it needs to update the ‘overlapping counters’ of all the nodes within the extent.

If you find a better solution, please let me know.

Advertisements

Tags: , ,

One Response to “A solution – part III (final)”

  1. Tom Says:

    I found a better solution. 😉

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s


%d bloggers like this: