### Reducing the number of points

One of the challenges is managing how often we sample location, and if/how we reduce the number of points. There are a number of reasons to do this:-#### Local Storage

One reason to reduce the number of points stored is where the tracker is storing the points - either for download later or for transmission when mobile coverage comes back, etc. The local storage of a small embedded device is likely to be limited. The GPS module I am using has several ways to do this (I have yet to work out how to do other than simple*interval*). Storing every second, which it will happily do, runs out of space in a few hours.

#### Transmission

Transmitting the data over a mobile network has costs. Yes, there are SIM packages with unlimited data, but if you have a fleet of hundreds of vehicles you can do better than each of them on a package for tens of pounds a month and unlimited data. There are packages with much lower monthly costs but data usage costs, and there are foreign SIM packages with roaming for better coverage and higher data costs. Reducing the data transmission can get the costs down to pence a month. This means not only reducing the number of points but finding ways to reduce the overhead of transmission of the data itself.My plan is to pack data in to an encrypted UDP packet, with a simple resend packet sent back if the data has not arrived at the expected time. I.e. an entirely NAK based protocol so that normally it is one packet in one direction, probably every 10 minutes or so. This means really low mobile data usage.

#### Database storage

Reducing the number of points also helps with back-end storage. Hundreds of vehicles tracked every second means a lot of data. Simply increasing the interval loses useful data so you ideally need a smarter way to reduce the data.#### Display

Perhaps a small point, with computers and networks being so fast, but reducing the number of points sent to a browser to display a route can make things much more responsive.### Where to reduce points

At present I have code to reduce the points when pulling from a database and sending to a browser, with an option to also delete the removed points from the database. But ideally you want to do this in the tracker itself - that way you have all of the benefits for local storage and transmission as well.This also creates an extra advantage - if you have a means to reduce data points effectively you can start by collecting much more data in the tracker. The GPS modules I am using can do 1/10 second fixes, so why not start with this, and reduce points from there.

### How to reduce points

I pondered how to do this, and came upon with something based on removing points within a certain distance. I then found there is a standard algorithm which is basically very similar to what I came up with. The Ramer Douglas Peucker algorithm.

It recursively takes a set of points. If all intermediate points are within a specified margin of the straight line between start and end, then those intermediate points are removed. If not, the one furthest away is kept and the point from start to this, and from this to end, are recursively processed. The end result is a path of straight lines where all the removed points must have been within the specified margin of that line.

This works well, and removes points on long straight roads but keeps points driving around a roundabout!

Even setting a margin as small as 1m massively reduces the number of points, and what is even better is that you can start with say 10 times as many points (e.g. 1/10 second fixes) and still end up with similar final number - just that the points you select are more precise (i.e. the exact edge of a turn, not just nearest second).

Of course I won't have a full set of points in the tracker, I'll have to work on time periods, like 10 minutes, collect data, and then apply the algorithm. This means there will be points based purely on time as the end points for the chunks of data I process.

### Distances

One issue is that the raw data is lat/lon and not ground distances. There are ways to work out exact ground distances around the curved surface of the Earth, but a simpler way is to just assume we can map lat/lon on to a flat plane. What I do is, for each stage of the algorithm, take all lat/lon relative to the middle, and convert to metres. For latitude this is simply multiple my 111111 (yes, apparently this comes from French definition of a metre). For longitude it is 111111*cos(lat), remembering to convert to radians for cos(). This means I can then work out distance from a line in metres with minimal error. Of course as the algorithm recurses and processes smaller line segments, the error because the Earth is not flat (really, it is not!) reduces.

### 3D

For a change, I am not talking of 3D printing, but we have altitude as well. So I am including altitude in the calculations. This means that a straight road that has a peak hill gets a point kept at the top of the hill!

### 4D

Yes, 4D printing would be cool, but no, I am now including time in the algorithm. That way we get points if a vehicle stops and starts or changes speed. The only issue here is deciding how time works as a unit in the distance calculation, so I need a metres/second. I have made this small, treating a 1 second deviation from the line between two points in space and time as a small fraction of a metre, hence you have to change speed or stop for many seconds to create a point you keep. It seems to work well.

I’m using the ST901 hardware with a custom backend to collect the data. They cost under £15 and have battery backup. Data wise 2 vechicles cost me about 1mb data a month.

ReplyDeleteYou should look at the The Things Network for data connectivity. https://www.thethingsnetwork.org/community/reading/

ReplyDeleteI would love for a+a to do a tracker SIM package (SMS and data only), but I'm guessing there's not going to be much money in it?

ReplyDeleteI guess I should have looked at the product page before commenting. 2p/mb would work well for a tracker ðŸ¤¦♂️

DeleteThis sounds really interesting!

ReplyDeleteI'd love to see how the Ramer Douglas Peucker algorithm performs in built-up areas where the GPS signal suffers from multi-path interference and the point jumps around a lot. It seems like an effective way to reduce the resolution of the points when you have a good fix tho'.

As you say, when moving around roundabouts and bends, you need more points and this algorithm seem to the optimal number within your tuning parameter.

The data rate for transmission is a different tradeoff tho'. Assuming you have the minimum number of points, you *still* want to use the absolute minimum bandwidth to transmit them, even if there happen to be a lot of them in a given time.

Assuming that the change in location over time is "small" (which it is for something moving in real life, even if the fix is reasonably unstable), you can represent the points using a delta encoding.

So 1,1 ; 2,2 ; 3,3 ; 4,2 ; 5;1 becomes 1,1 ; 1,1 ; 1;1, 1,-1 ; 1,-1. This significantly reduces the distinct number of symbols in the data and makes it very amenable to something like Huffman coding ( https://en.wikipedia.org/wiki/Huffman_coding (which I now see is in GCSC Computer Science: wow!))

Huffman coding is a very effective way of reducing reasonably predictable time-series data like you have to practically nothing in terms of data. It's the (theoretical) basis of the Lempel-Ziv-Markov compression algorithms and whilst you might not have enough space in the micro for a full-on compression codec you could implement some of the techniques yourself.

You can probably get the data down so small that you will struggle to fill a whole packet over the time period in which you might like to be receiving packets for display. Filling a packet is important because you don't want a significant percentage of your data transmission to be protocol headers!

Given that you have two way communication, perhaps you can teach the server to tell the devices that are "probably" inside the currently open viewing windows to send data more frequently, trading off latency for a short term increase in the data transmission cost of a small number of the vehicles.

Anyway, just a few ideas that I hope might inspire you!

As a reference point, I have here 1.6GiB of GPS log files in NEMA format, which is a "simple" ASCII encoding consisting of digits, uppercase letters and a bit of punctuation.

DeleteDebian's lzma command (i.e. just the compression algorithm, not the more modern xz implementation) gets a tar-file full of them down to 95MiB.

Coming to this blog post late, but I've been using Owntracks for some time. it's not been massively updated recently, but the back-end looks like it does most of what you have implemented. Not as much fun as creating a better implementation from scratch, though, of course! :)

ReplyDelete