From charlesreid1

No edit summary
 
(20 intermediate revisions by the same user not shown)
Line 1: Line 1:
==Problem Description==
=TOC1=


Consider a set of (x,y) points that we would like to
This page covers an algorithm for implementing a Hilbert Sort, which sorts points according to the order in which they would be visited by a space-filling curve.
have access to, as quickly as possible.


A first pass approach might be: sort the points according to their x-coordinate values.
* Introduction
** Motivation - why sort via space-filling curve?
** Applications of space-filling curves
** Figures


However, this method of sorting the points is not optimal:
* Problem statement - w's
two points with nearby x values may still be on opposite sides of the grid.  
** What is the problem, focus on the inputs
** Explain that we are using an already-drawn curve as a grid, not worrying about the curve itself.
** Grid dictated by number of unique points. Can make them arbitrarily close (doubles).


Another approach is to sort points according to Euclidean distance,
* Solution
so that the nearness of points becomes a function of the coordinate values
** Solution approach and algorithmic thinking
x and y.
** Quadrant, spatial approach
** Recursive method
** Two problems
*** First problem, dividing up space and keeping track of coordinates, is going to get messy
*** Second problem, trying to keep track of rotation/reflection across multiple levels of recursion is going to get complicated
** Two solutions
*** First solution, who cares about (x,y), all they really care about is which one the curve visits first. So, if you shift (x,y) in a way that does not change the order in which the (x,y) points are visited by a Hilbert curve, then you're perfectly fine.
*** Second solution, if you're allowed to change (x,y) coordinates and rescale, you can also apply the reflections on the fly on the data in this quadrant, before passing them on to the next level of recursion. None of the levels need to know about any other levels.
** Summary - apply the first -> second recursion step to your square/quadrant. Then apply a rotation/transformation to each of the four resulting squares so that each one is re-oriented (back to "normal" position - such the the curve travels from southwest to northwest to northeast to southeast). Keep them in order in-place. Repeatedly apply this procedure.


More generally, we can preserve distances between points by sorting points
=TOC2=
along a space-filling curve.


(More information is needed here.)
* Table of Contents
 
* The Problem
** Motivation
** Peano Curves and Hilbert Curves
** Constructing the Hilbert Curve
** Problem Description
* The Solution
** Solution Analysis: Recursion
** Solution Analysis: Rotation
** Solution Algorithm
** Solution Pseudocode
** Solution Code
* References
 
==Notes==
 
Motivating problem:
* How to organize point info so that (x,y) points are organized by relative distance, without using a quadratic algorithm?
* Suppose solving problem with (x,y) points
* Set of points, transforming, etc,whatever
* Given a random point, find nearest/enclosing rectangles
* PCA
* Want nearby/neighboring points in space to be nearby in storage
 
Motivation:
* Just storing points, as rectangles, to operate on the rectangles as a set. Want points in a given rectangle to live nearby in memory
* Now, how to ensure the points on the grid, as stored, will be conducive to any particular arrangement?
 
Take one step further:
* Lots of dense, lots of local points, lots of empty spcae
* Distributing points to different nodes
* Parallel algorithm implementation
* Chopping up data and sending it off, then receiving it back
* Priority queue implementation: the closer a point is to other points, the less it contributes to the overall cost of that "chunk" of work or points
* The tradeoff: amount of intensive, in-CPU work versus amount of extensive communication overhead
* This is going a bit far though
 
Lagrangian particle simulation:
* Particlees on a grid at (x,y)
* Organize particles for a ''turbulent'' fluid-particle interaction simulation
* Locally cluster particle info in memory to farm out to processors
* How to allocate bunches of particles to a quadrant or a group of 8 processors
* Furl the 2D/3D points into a 1D line using a Hilbert Curve
* Arrange points in order visited, so linear closeness indicative of spatial closeness
* Then, can walk along that 1D array of data and snip it at chunks of X entries
 
Constructing a Hilbert Curve:
* First step: take yer square
* Second step: quadruple it
* Third step: rotate bottom L, bottom R via diagonal reflection
* Fourth step: That's yer next square.
 
=Background=
 
==Problem Setup and Motivation==
 
Consider a problem in which we are trying to store/pack up spatial data in an efficient way. We need a way of sorting the data in such a way that (x,y) data that are located in nearby regions to one another will live closer in memory to one another. This speeds up access to the data in memory.
 
If we sort by, say, x-coordinates, and let y-coordinates break ties, then we end up with points with very similar x values but dissimilar y values. These are inefficient and spread out the points in memory. (x,y) data points that are far away in space end up being neighbors in memory, and vice-versa.
 
Using Euclidean distance is another possible approach, but this does not utilize direction, and so you can again end up with points that are similar distances from a reference point, but in opposite directions, leading to data points that are far away in space becoming neighbors in memory.
 
If we imagine this problem getting much larger - for example, if the set of (x,y) data points becomes millions of points on a map - the Hilbert sort provides a way to pack up data in a way that preserves local, spatial structure. The closer two given points are in memory, the shorter the interval time between visits by the space-filling curve.


==Space Filling Curves==
==Space Filling Curves==
Line 31: Line 105:
Mathematician Giuseppe Peano was the first to discover the space filling curves.
Mathematician Giuseppe Peano was the first to discover the space filling curves.
His Peano curve is an example of a triplet curve that is scaled down by 1/3 on both
His Peano curve is an example of a triplet curve that is scaled down by 1/3 on both
sides and repeated 9 times.
sides and repeated 9 times:
 
[[Image:PeanoCurve.png|500px]]


David Hilbert then expanded on the idea with his Hilbert Curve in a paper  
David Hilbert then expanded on the idea with a new curve in a paper  
published in 1890.
published in 1890, subsequently called the Hilbert curve.


==Hilbert Curve==
==Hilbert Curve==
[[Image:HilbertCurve.png|500px]]


The Hilbert Curve is a particular space-filling curve invented by
The Hilbert Curve is a particular space-filling curve invented by
Line 43: Line 121:
influential mathematician.
influential mathematician.


Hilbert proposed the construction of a space-filling curve based on
Hilbert constructed a curve by bending a line at two points. Starting with this simple shape,
bending a line at two additional points. By starting with a squared-off U shape,
the curve is shrunk by a factor of 2, and repeated four times. Two of the curves are rotated,
the curve is repeated by shrinking it in half on both sides, and repeating it 4 times,
and each of the four curves are connected together.
connecting each of the four curves.  


The Hilbert Curve can be drawn in a region bounded by the four points (0,0), (0,S), (S,0), (S,S).
The Hilbert Curve can be drawn in a region bounded by the four points (0,0), (0,S), (S,0), (S,S).
Construct it by splitting the square into four quadrants meeting at (S/2, S/2). Recursively fill them
Construct it by splitting the square into four quadrants meeting at (S/2, S/2). Recursively fill them
with a rotated and scaled copy of the curve.
with a rotated and scaled copy of the curve.
===einer Linie auf ein Flächenstück===
From original paper: [[File:HilbertCurvePaper.pdf]]
[[Image:HilbertCurveOrig.png|500px]]
[[Image:HilbertCurveOrig2.png|500px]]
{{Quote|
Abstract (translated via Google Translate):
Peano has recently shown in the Mathematical Annals, 2 by an arithmetical observation, how the points of a line can be mapped continuously to the points of a surface part. The functions required for such a mapping can be produced more clearly by using the following geometrical view. Let us divide the line to be represented-about a straight line of the length 1-into four equal parts 1, 2, 3, 4, and the surface which we assume in the form of a square of the side length 1 Straight into 4 equal squares 1, 2, 3, 4 (Fig. 1). Secondly, we divide each of the partial sections 1, 2, 3, 4 again into 4 equal parts so that we obtain on the straight the 16 partial sections 1, 2, 3, ..., 16; At the same time, each of the 4 squares 1, 2, 3, 4 is divided into 4 equal squares, and the numbers 1, 2, ..., 16 are then written to the resulting 16 squares, That each successive square follows the previous one with one side (Fig. 2). If we think of this method, as shown in Fig. 3, the next step, it is easy to see how to assign a single definite point of the square to any given point of the line. It is only necessary to determine the partial stretches of the line to which the given point falls. The squares indicated by the same numbers are necessarily in one another and include a certain point of the surface piece in the boundary. This is the point assigned to the given point. The image thus found is unambiguous and continuous, and vice versa, each point of the square corresponds to one, two, or four points of the line. Moreover, it appears remarkable that, by a suitable modification of the partial lines in the squares, a clear and continuous representation can easily be found, the reversal of which is nowhere more than three-fold.
- David Hilbert, "Über die stetige Abbildung einer Linie auf ein Flächenstück", Mathematische Annalen Vol 38
}}
=Problem Statement=


Given (x,y) coordinate locations, sort them according to when  
Given (x,y) coordinate locations, sort them according to when  
Line 59: Line 154:
==Solution Approach==
==Solution Approach==


The Hilbert Curve sort problem can be solved by observing a simple fact:
A few keys to successfully solving the Hilbert Curve problem:
the coarsest level of quadrant organization takes precedence over
* The motive for the Hilbert Curve is that data is grouped by how close it is spatially; if a point is in the lower left quadrant, it will always be visited before another point in the upper right quadrant. This means that ''all'' points in the lower left quadrant will always be visited before ''all'' points in the upper right quadrant.
finer grained levels of quadrant organization. So, if one point is in
* The rotation is the tricky part of the problem, mainly because when we rotate the curve, we use the rotated version of the curve as the starting point for any additional calls in that quadrant. So, the reflections have to recurse with us.
the southwest corner and one point ins in the northeast corner,
* The curves are not simply rotated 90 degrees, the curves are actually reflected - flipped and rotated about a diagonal axis (going from one corner to another). This affects subsequent rotations.
then no matter what iteration it takes for the Hilbert Curve to  
 
visit those two points, we definitely know that the curve will visit
==Implementation Strategy==
the southwest corner before it visits the northeast corner.
 
The implementation strategy is, obviously, recursive. What we want to do at each level is:
* Start with the Hilbert curve contained in a square.
* Cut the square under consideration into four quadrants.
* Apply a transformation to each square so that it is re-oriented in a manner that matches our original Hilbert curve.
* Once each of those squares goes through all of its respective recursive calls, it will return a sorted list of points. Then we will know what to do - we collect each of the sorted points from each of the four quadrants in order, maintain that order, and return those sorted quadrants.
 
To nail down the details, treat the square under consideration as ranging from (0,0) to (S,S).
* Each time we cut a square into quadrants, we re-orient ourselves as to where (0,0) is located and which quadrants will be visited in which order.
* If we are in the lower left quadrant, x is below S/2 and y is below S/2, so we rotate and reflect by swapping x and y.
** X -> Y
** Y -> X
* If we are in the upper left quadrant, x is below S/2, y is above S/2, so subtract S/2 from y and we're done
** X -> X
** Y -> Y-(S/2)
* If we are in the upper right quadrant, x is above S/2, y is above S/2, so subtract S/2 from both
** X -> X - S/2
** Y -> Y - S/2
* If we are in the lower right quadrant, our x and y values are now relative to the quadrant bounding box. The distance to the top of the bounding box to the y coordinate becomes our new x coordinate, while the distance from the right of the bounding  box S to the x coordinate becomes our new y coordinate:
** X -> S/2 - Y
** Y -> S - X


This begins to take on the characteristics of a recursive method:
Recursion always requires a base case and a recursive case.
our "base case" is the simple comparison of one or no points in  
Our "base case" is the simple comparison of one or no points in  
each of our four quadrants. If we get to this base case, we know the  
each of our four quadrants. If we get to this base case, we know the  
order in which the Hilbert Curve will visit each of those points.  
order in which the Hilbert Curve will visit each of those points.  
Line 76: Line 191:
the order in which those points go with an additional level of finer granularity.
the order in which those points go with an additional level of finer granularity.


==Implementation Details==
==Bookkeeping==
 
The biggest challenge to implementing this is dealing with the rotation
of the curve for the southwest and southeast corners.
 
There are a couple of ways to handle the rotation - there's a messy conditional approach,
and a more elegant approach that transforms the (x,y) data in a bin, rather than trying to
transform the bin to fit the (x,y) data.


===Conditional Approach===
===Conditional Approach===
Line 115: Line 223:
For example, the southwest corner is always inverted and rotated
For example, the southwest corner is always inverted and rotated
90 degrees. So rather than trying to account for that inversion and rotation
90 degrees. So rather than trying to account for that inversion and rotation
when looking at the (southwest, northwest, northeast, and southeast  
when looking at the (southwest, northwest, northeast, and southeast) quadrants
of subsequent quadrants, instead we can just
transform each (x,y) point so that the order in which the (correct, transformed) Hilbert curve
would visit them is preserved as the order in which the (now transformed back to its original shape)
Hilbert curve would visit them.
 
 
 
==Examples==


[[Image:HilbertSortRotation.jpg|500px]]


[[Image:HilbertSortRotation3.jpg|800px]]


==Flags==


{{CSFlag}}


[[Category:Sorting]]


[[Category:CS]]
[[Category:CS]]
Line 127: Line 248:
[[Category:Math]]
[[Category:Math]]
[[Category:Competitive Programming]]
[[Category:Competitive Programming]]
[[Category:Final Project]]

Latest revision as of 22:16, 27 June 2017

TOC1

This page covers an algorithm for implementing a Hilbert Sort, which sorts points according to the order in which they would be visited by a space-filling curve.

  • Introduction
    • Motivation - why sort via space-filling curve?
    • Applications of space-filling curves
    • Figures
  • Problem statement - w's
    • What is the problem, focus on the inputs
    • Explain that we are using an already-drawn curve as a grid, not worrying about the curve itself.
    • Grid dictated by number of unique points. Can make them arbitrarily close (doubles).
  • Solution
    • Solution approach and algorithmic thinking
    • Quadrant, spatial approach
    • Recursive method
    • Two problems
      • First problem, dividing up space and keeping track of coordinates, is going to get messy
      • Second problem, trying to keep track of rotation/reflection across multiple levels of recursion is going to get complicated
    • Two solutions
      • First solution, who cares about (x,y), all they really care about is which one the curve visits first. So, if you shift (x,y) in a way that does not change the order in which the (x,y) points are visited by a Hilbert curve, then you're perfectly fine.
      • Second solution, if you're allowed to change (x,y) coordinates and rescale, you can also apply the reflections on the fly on the data in this quadrant, before passing them on to the next level of recursion. None of the levels need to know about any other levels.
    • Summary - apply the first -> second recursion step to your square/quadrant. Then apply a rotation/transformation to each of the four resulting squares so that each one is re-oriented (back to "normal" position - such the the curve travels from southwest to northwest to northeast to southeast). Keep them in order in-place. Repeatedly apply this procedure.

TOC2

  • Table of Contents
  • The Problem
    • Motivation
    • Peano Curves and Hilbert Curves
    • Constructing the Hilbert Curve
    • Problem Description
  • The Solution
    • Solution Analysis: Recursion
    • Solution Analysis: Rotation
    • Solution Algorithm
    • Solution Pseudocode
    • Solution Code
  • References

Notes

Motivating problem:

  • How to organize point info so that (x,y) points are organized by relative distance, without using a quadratic algorithm?
  • Suppose solving problem with (x,y) points
  • Set of points, transforming, etc,whatever
  • Given a random point, find nearest/enclosing rectangles
  • PCA
  • Want nearby/neighboring points in space to be nearby in storage

Motivation:

  • Just storing points, as rectangles, to operate on the rectangles as a set. Want points in a given rectangle to live nearby in memory
  • Now, how to ensure the points on the grid, as stored, will be conducive to any particular arrangement?

Take one step further:

  • Lots of dense, lots of local points, lots of empty spcae
  • Distributing points to different nodes
  • Parallel algorithm implementation
  • Chopping up data and sending it off, then receiving it back
  • Priority queue implementation: the closer a point is to other points, the less it contributes to the overall cost of that "chunk" of work or points
  • The tradeoff: amount of intensive, in-CPU work versus amount of extensive communication overhead
  • This is going a bit far though

Lagrangian particle simulation:

  • Particlees on a grid at (x,y)
  • Organize particles for a turbulent fluid-particle interaction simulation
  • Locally cluster particle info in memory to farm out to processors
  • How to allocate bunches of particles to a quadrant or a group of 8 processors
  • Furl the 2D/3D points into a 1D line using a Hilbert Curve
  • Arrange points in order visited, so linear closeness indicative of spatial closeness
  • Then, can walk along that 1D array of data and snip it at chunks of X entries

Constructing a Hilbert Curve:

  • First step: take yer square
  • Second step: quadruple it
  • Third step: rotate bottom L, bottom R via diagonal reflection
  • Fourth step: That's yer next square.

Background

Problem Setup and Motivation

Consider a problem in which we are trying to store/pack up spatial data in an efficient way. We need a way of sorting the data in such a way that (x,y) data that are located in nearby regions to one another will live closer in memory to one another. This speeds up access to the data in memory.

If we sort by, say, x-coordinates, and let y-coordinates break ties, then we end up with points with very similar x values but dissimilar y values. These are inefficient and spread out the points in memory. (x,y) data points that are far away in space end up being neighbors in memory, and vice-versa.

Using Euclidean distance is another possible approach, but this does not utilize direction, and so you can again end up with points that are similar distances from a reference point, but in opposite directions, leading to data points that are far away in space becoming neighbors in memory.

If we imagine this problem getting much larger - for example, if the set of (x,y) data points becomes millions of points on a map - the Hilbert sort provides a way to pack up data in a way that preserves local, spatial structure. The closer two given points are in memory, the shorter the interval time between visits by the space-filling curve.

Space Filling Curves

Space filling curves are curves that are capable of "filling up" a finite area with longer and longer distances.

We can think of space-filling curves as being parameterized on a numerical parameter that controls the amount of curvature or the degree of "packing" of the curve.

The way that space-filling curves are constructed is to create a pattern, then scale it down and repeat it, attaching the subsequent scaled-down, repeated curves.

Mathematician Giuseppe Peano was the first to discover the space filling curves. His Peano curve is an example of a triplet curve that is scaled down by 1/3 on both sides and repeated 9 times:

PeanoCurve.png

David Hilbert then expanded on the idea with a new curve in a paper published in 1890, subsequently called the Hilbert curve.

Hilbert Curve

HilbertCurve.png

The Hilbert Curve is a particular space-filling curve invented by David Hilbert, a famous mathematician who lived around the turn of the 20th century and is recognized as a universally influential mathematician.

Hilbert constructed a curve by bending a line at two points. Starting with this simple shape, the curve is shrunk by a factor of 2, and repeated four times. Two of the curves are rotated, and each of the four curves are connected together.

The Hilbert Curve can be drawn in a region bounded by the four points (0,0), (0,S), (S,0), (S,S). Construct it by splitting the square into four quadrants meeting at (S/2, S/2). Recursively fill them with a rotated and scaled copy of the curve.

einer Linie auf ein Flächenstück

From original paper: File:HilbertCurvePaper.pdf

HilbertCurveOrig.png

HilbertCurveOrig2.png


Abstract (translated via Google Translate):

Peano has recently shown in the Mathematical Annals, 2 by an arithmetical observation, how the points of a line can be mapped continuously to the points of a surface part. The functions required for such a mapping can be produced more clearly by using the following geometrical view. Let us divide the line to be represented-about a straight line of the length 1-into four equal parts 1, 2, 3, 4, and the surface which we assume in the form of a square of the side length 1 Straight into 4 equal squares 1, 2, 3, 4 (Fig. 1). Secondly, we divide each of the partial sections 1, 2, 3, 4 again into 4 equal parts so that we obtain on the straight the 16 partial sections 1, 2, 3, ..., 16; At the same time, each of the 4 squares 1, 2, 3, 4 is divided into 4 equal squares, and the numbers 1, 2, ..., 16 are then written to the resulting 16 squares, That each successive square follows the previous one with one side (Fig. 2). If we think of this method, as shown in Fig. 3, the next step, it is easy to see how to assign a single definite point of the square to any given point of the line. It is only necessary to determine the partial stretches of the line to which the given point falls. The squares indicated by the same numbers are necessarily in one another and include a certain point of the surface piece in the boundary. This is the point assigned to the given point. The image thus found is unambiguous and continuous, and vice versa, each point of the square corresponds to one, two, or four points of the line. Moreover, it appears remarkable that, by a suitable modification of the partial lines in the squares, a clear and continuous representation can easily be found, the reversal of which is nowhere more than three-fold.

- David Hilbert, "Über die stetige Abbildung einer Linie auf ein Flächenstück", Mathematische Annalen Vol 38


Problem Statement

Given (x,y) coordinate locations, sort them according to when the Hilbert Curve visits them. If the number of nodes on the grid, S, is odd, then the curve will not intersect itself and each integer point will only be visited once.

Solution Approach

A few keys to successfully solving the Hilbert Curve problem:

  • The motive for the Hilbert Curve is that data is grouped by how close it is spatially; if a point is in the lower left quadrant, it will always be visited before another point in the upper right quadrant. This means that all points in the lower left quadrant will always be visited before all points in the upper right quadrant.
  • The rotation is the tricky part of the problem, mainly because when we rotate the curve, we use the rotated version of the curve as the starting point for any additional calls in that quadrant. So, the reflections have to recurse with us.
  • The curves are not simply rotated 90 degrees, the curves are actually reflected - flipped and rotated about a diagonal axis (going from one corner to another). This affects subsequent rotations.

Implementation Strategy

The implementation strategy is, obviously, recursive. What we want to do at each level is:

  • Start with the Hilbert curve contained in a square.
  • Cut the square under consideration into four quadrants.
  • Apply a transformation to each square so that it is re-oriented in a manner that matches our original Hilbert curve.
  • Once each of those squares goes through all of its respective recursive calls, it will return a sorted list of points. Then we will know what to do - we collect each of the sorted points from each of the four quadrants in order, maintain that order, and return those sorted quadrants.

To nail down the details, treat the square under consideration as ranging from (0,0) to (S,S).

  • Each time we cut a square into quadrants, we re-orient ourselves as to where (0,0) is located and which quadrants will be visited in which order.
  • If we are in the lower left quadrant, x is below S/2 and y is below S/2, so we rotate and reflect by swapping x and y.
    • X -> Y
    • Y -> X
  • If we are in the upper left quadrant, x is below S/2, y is above S/2, so subtract S/2 from y and we're done
    • X -> X
    • Y -> Y-(S/2)
  • If we are in the upper right quadrant, x is above S/2, y is above S/2, so subtract S/2 from both
    • X -> X - S/2
    • Y -> Y - S/2
  • If we are in the lower right quadrant, our x and y values are now relative to the quadrant bounding box. The distance to the top of the bounding box to the y coordinate becomes our new x coordinate, while the distance from the right of the bounding box S to the x coordinate becomes our new y coordinate:
    • X -> S/2 - Y
    • Y -> S - X

Recursion always requires a base case and a recursive case. Our "base case" is the simple comparison of one or no points in each of our four quadrants. If we get to this base case, we know the order in which the Hilbert Curve will visit each of those points.

If we are not at the base case, if we have a large number of points to sort, we can bin together all the points in a given quadrant, and consider the order in which those points go with an additional level of finer granularity.

Bookkeeping

Conditional Approach

The first way is to look at two different levels when binning points into quadrants; that allows you to say, "if point A is in the southeast quadrant on this level, and was in the southwest quadrant on the prior level, then it should go second, but if it was in the northwest quadrant on the prior level, then it should go fourth, etc."

This is clunky and awkward to code, and you wind up with a bunch of nested if/else statements.

Transform the Data, Not the Bin

The second way is to say, okay, when we bin these (x,y) points into their respective quadrants, the only thing we care about is what order they're visited by the Hilbert Curve, and that only depends on what quadrant they're in, so we can actually change the (x,y) values to be anything we want - as long as we don't change the order in which they would be visited by the Hilbert curve. Which is to say, which quadrants they're in.

This technique can be thought of as transforming the (x,y) data associated with each point to fit the bin structure, rather than transforming the bin structure to fit the (x,y) data.

For example, the southwest corner is always inverted and rotated 90 degrees. So rather than trying to account for that inversion and rotation when looking at the (southwest, northwest, northeast, and southeast) quadrants of subsequent quadrants, instead we can just transform each (x,y) point so that the order in which the (correct, transformed) Hilbert curve would visit them is preserved as the order in which the (now transformed back to its original shape) Hilbert curve would visit them.


Examples

HilbertSortRotation.jpg

HilbertSortRotation3.jpg

Flags





See also: