## Why?

I'm writing the blog because searching has not helped me on this. Why do I need to know? well the answer is I am applying a Ramer–Douglas–Peucker algorithm to GPS tracking data.

You can look it up, but it is pretty simple to explain. Imagine you have a set of points starting from A and going to B - in my case it is GPS log data, so points on a map, every second, so lots of points. You want to reduce how many points you have without losing significant information.

Ideally all the intermediate points are on a straight line A to B, and can all be removed leaving just A and B. I practice you allow a tolerance, an allowable *deviation* from the line. If all of the intermediate points are within that tolerance you can delete them all. If not, you find the most deviant point, call it C. C is effectively a *corner* point. You apply the algorithm on A to C and C to B recursively.

In the end you have a set of points, which could just be A and B, or A, C, and B, or more. If you join the dots with straight lines you can be sure all of the removed points lay within your tolerance of that line, and so be sure any *detail*, that is more than the tolerance allowed, has not been missed.

I use this years ago on GPS data.

Before |

After |

## Point to line

The crux of the algorithm is working our the deviation from the line, what wikipedia helpfully just refers to as perpendicularDistance().

## Two dimensions

In two dimensions this is simple, and searching finds an algorithm. I used this on latitude and longitude. But this causes problems.

- Latitude and longitude are not a flat plane. This is not likely to be a real issue unless tracking a plane (pun intended) or a ship over a long distance.
- Latitude and longitude are in degrees not metres, so working out your distance is not easy. A degree North is not the same distance as a degree West and changes depending one where you are. You need trigonometry to compensate and convert to distance units.

## Three dimensions

Of course, we also have altitude, and so seems sensible to include that - why not. Algorithms for a point to a line in 3D are also easy to find, but there seem to be several different approaches. At least altitude is already in metres.

However, the GPS can provide not just lat/lon/alt, but also ECEF data. This is simple Cartesian X/Y/Z co-ordinates from centre of the Earth. This avoids any need to adjust from degrees, and makes it simple.

## Four dimensions

I have to push it one step further, don't I? If you travel in a straight line, but stop for a while, you would remove all points. By including time as a 4th dimension we create a *corner* when speed changes notably or you stop for a while.

It does mean you have to *scale* time to a distance, but that can be a simple parameter in the algorithm.

However, there is a snag - finding an algorithm for point to line in four dimensions is not simple. The only ones I could find explain it in vector maths - and even with an A in A-level maths, it was a long time ago and I really did not get on with vector maths. Nobody seems to give me a simple algebraic answer which is driving me mad.

## Making it simple

So, back to basics, I am interested in three points, and regardless of how many dimensions that is, they can be said to lie on a flat two dimensional plane. This means working out the deviation is a simple matter of considering a triangle.

So I want to find *h*. To do this I need distances *a*, *b*, and *c*. But that is easy to find as it is simple Pythagoras, regardless of how many dimensions. For two dimensions it is √((Ax-Bx)²+(Ay-By)²). For three dimensions it is √((Ax-Bx)²+(Ay-By)²+(Az-Bz)²), and so on.

Knowing *a*, *b*, and c I can use Heron's formula to find area...

*s*= ½(*a*+*b*+*c*)- 𝔸 = √(
*s*(*s*-*a*)(*s*-*b*)(*s*-*c*))

Then use the formula for area from base and height

- 𝔸 = ½
*hb* *h*= 2𝔸/*b**h = 2*√(*s*(*s*-*a*)(*s*-*b*)(*s*-*c*))/b

So it can be done with any number of dimensions.

## Faster

One of the considerations is performance when coding for a microcontroller. They can do integer add/subtract very fast, multiply reasonably fast, division slower, and floating point even slower. As it happens the ESP32 has hardware single precision floating point, which helps.

So can I simplify this?

Thankfully wikipedia helpfully expands Heron's formula to:

- 𝔸 = ¼√(4
*a*²*b*²-(*a*²+*b*²-*c*²)²)

Which gives us:

*h*= 2(¼√(4*a*²*b*²-(*a*²+*b*²-*c*²)²)/b)*h*= ½√(4*a*²*b*²-(*a*²+*b*²-*c*²)²)/b*h*² = (4*a*²*b*²-(*a*²+*b*²-*c*²)²)/4*b*²

Working our *h*² is useful as ultimately I am just comparing this distance so no need to apply a square root. I can square my reference threshold.

*a*²,

*b*², and

*c*². This means that I don't need to use a square root to work out

*a*,

*b*, and

*c*in the first place. The whole lot is simple addition, subtraction, multiplication, and just one division.

## Turning back

There is another problem with all of the perpendicular distance algorithms, including this one. The all work out a distance of a point to an *infinite* line which is defined as going through two points.

In this case point C is way off the line A-B, in fact it is

*c*off it, but

*h*is small. If

*h*is within the tolerance then it would be removed leaving only A-B.

To solve this I can use the fact that if *c*²-*b*² is > *a*² the C must be *beyond* A and I should use *a*² as the distance².

Similarly if

*a*²-

*b*² >

*c*² then C is

*beyond*B, meaning I should use

*c*² as the distance².

Fortunately I have *a*², *b*², and *c*² already, so is a simple test.

Of course, what if you have a straight line A-B-C-D and travel A-B-C-B-C-D? No points will be "outside" the A-D range... Well the use of time as a 4th dimension catches that - which means the first case would also have been caught... Even so, this simple test on end points is worth doing I think.

## C what I mean...

inline float dist2 (fix_t * A, fix_t * B)

{

float X = x (A) - x (B);

float Y = y (A) - y (B);

float Z = z (A) - z (B);

float T = t (A) - t (B);

return X * X + Y * Y + Z * Z + T * T;

}

float b2 = dist2 (A, B);

fix_t *m = NULL;

float best = 0;

for (fix_t * C = A->next; C && C != B; C = C->next)

{

float h2 = 0;

float a2 = dist2 (A, C);

if (b2 == 0)

h2 = a2; // A/B same, so distance from A

else

{

float c2 = dist2 (B, C);

if (c2 - b2 >= a2)

h2 = a2; // Off end of A

else if (a2 - b2 >= c2)

h2 = c2; // Off end of B

else

h2 = (4 * a2 * b2 - (a2 + b2 - c2) * (a2 + b2 - c2)) / (b2 * 4);

}

if (m && h2 <= best)

continue;

best = h2;

m = C;

}

I wrote some CAM software that simplified toolpaths in a similar way, but instead of using the perpendicular distance from the line, I measured the angle in the corner, and said we could omit the middle point if the angle was small enough. That would seem to solve the "turning back" problem, maybe it would be a useful idea for you.

ReplyDeleteInteresting. I have solved anyway for off end of A and B, but also by using time, even a straight line A-B-C-B-C-D would pick up that the distance and time are not clean so see a "corner" at C and second B.

ReplyDeleteI had to do this for an in car GPS at one point. Did a proper distance calc (more or less convert to radians and multiply by the radius of the earth, but that bit I copied from the internet.. took a few goes to find one that actually worked in the real world) and added in curve compensation for speed (basic trig if you treat a curve as more or less a triangle over short distances) and height change compensation (again just a triangle).. got it pretty accurate.

ReplyDelete