How to compress map data by 99.9998%


Unlike every other location-based game with Points of Interest, Weekend Space Command (Prototype) uses a built-in data set of about 8 million PoIs. Most of the big LBGs you can name either use an API from Google Maps or Mapbox, or maintain their own copy of OSM data, to sort out places worth going to. Most offline locative games just settle for using a grid and randomizing each square in it. So how does WSC(P) manage to fit a usable global data set in between the existing options of "everything from a server" and "nothing"?


We start with OpenStreetMap and its full dataset, planet.osm. Using PraxisMapper to convert that into a usable format, the global OSM data set is about 1.2TB in size. Goal 1 is to trim this down into a set of meaningful Places of Interest. The ideal places to go to for a game should be publicly accessible, traversable, cost nothing to visit, and have something to see, do, or learn. We cannot guarantee all of these are true for each individual place in the data set, and some places may have a different balance of these factors.

The list of categories PraxisMapper and WSC(P) use is: parks, nature reserves, universities, cemeteries, historical points, theaters, arts centers, planetariums, libraries, public bookcases ("Little Free Libraries"), aquariums, public artwork, attractions, galleries, museums, theme parks, viewpoints, zoos, named trails, concert halls, community centers, conference centers, exhibition centers, and events venues.

This is a good set of data to have available. It gives us 8.3 million places to use for our game. We won't be able to navigate the players around turn by turn, and they won't have a lot of details to orient themselves with on a local image, but as far as places that the player should be encouraged to go, this is a solid set. We won't go into why each category is or isn't included, but these should all be worth visiting at some point.

If we set up PraxisMapper to only process places in these categories, the database is reduced to 11GB of storage space. We've cut out 99% of the unnecessary data, but this is still far too large for a mobile game to have built-in. We can do better. Goal 2 is to crush this down to under 1GB. That's what I would consider the upper limit of a manageable APK file for an Android game. Smaller is better, but that's the target.

Most of the storage space right now is a bunch of lat/lon coordinate pairs down to 7 decimal places. In text form, those look like "12.3456789,-98.7654231", and average about 20 characters long written out. We can shorten those significantly if we change them from global reference points to local points in a grid. We use PlusCodes all over PraxisMapper, so it's a matter of which level of the grid to use.

After some experimenting, 6-digit PlusCodes turned out to be the right size to use as the basis for this local coordinate set. If each individual cell (pixel) is a 10-digit PlusCode, that becomes a 400x400 grid. For higher accuracy, we can use the rectangular 11-digit (1600x2000) or 12-digit (6400x10000) grids instead. For maximum drawing accuracy, we'll try converting the 12-digit PlusCodes into local coordinate points. This means the largest a single text coordinate pair can be is "6400,10000", or 10 characters, cutting the needed storage space in half. For this text format output, we will also strip out all the text tag data on each place except for the name (of which we keep only the default name entry) and its category (which will become a numeric ID). We will also favor size over readability in the property names of these JSON objects, and make them very short. Each object has 'nid' (name ID, absent if the place has no name), 'tid' (type ID, which category it's in), and 'p' (points list, separated by | characters).
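As a rough illustration of the conversion described above, the sketch below turns a global lat/lon pair into a local point inside its 6-digit PlusCode cell and packs a place into the short-named object. This is not PraxisMapper's actual code: the function names, the integer rounding, and the exact "x,y" layout inside 'p' are assumptions made for the example. It relies only on the fact that a 6-digit PlusCode cell spans 0.05 degrees on each side.

```python
import json

E7 = 10_000_000      # work in integer units of 1e-7 degrees to avoid float drift
CELL6_E7 = E7 // 20  # a 6-digit PlusCode cell is 1/20 of a degree per side

def to_local(lat, lon, cols=6400, rows=10000):
    """Return ((cell_col, cell_row), (x, y)) for a lat/lon pair.

    cols/rows default to the 12-digit grid; pass 400, 400 for the 10-digit grid.
    Hypothetical helper: the poles and the antimeridian are not handled.
    """
    lat_e7 = round((lat + 90) * E7)   # shift so all values are positive
    lon_e7 = round((lon + 180) * E7)
    cell_row, lat_rem = divmod(lat_e7, CELL6_E7)
    cell_col, lon_rem = divmod(lon_e7, CELL6_E7)
    x = lon_rem * cols // CELL6_E7    # column inside the 6-digit cell
    y = lat_rem * rows // CELL6_E7    # row inside the 6-digit cell
    return (cell_col, cell_row), (x, y)

def pack_place(type_id, latlon_points, name_id=None):
    """Build the compact object for one place (assumed layout)."""
    local = [to_local(lat, lon)[1] for lat, lon in latlon_points]
    place = {"tid": type_id, "p": "|".join(f"{x},{y}" for x, y in local)}
    if name_id is not None:           # 'nid' is absent for unnamed places
        place["nid"] = name_id
    return place

example = pack_place(3, [(0.025, 0.025), (0.0375, 0.0125)], name_id=17)
print(json.dumps(example, separators=(",", ":")))
```

Doing the math in whole 1e-7 degree units keeps the cell boundaries exact, which matters when the same place is cropped into neighboring cells.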

Saving this preserves the shape of each place perfectly, and cuts out a lot of unnecessary data. It also means we could pull in only the files needed for where the player currently is, since things are now nicely broken down into the grid PraxisMapper games use. When we run this process for our specific Place of Interest set, the end results are about 2.5GB. That gets us to 99.8% compression, and it wouldn't be tough to have a webserver hold these files for a game client to download, but it's still too big to just ship built-in. We have to get a little more clever.

But first, for the record: If you apply this same process to the full global set of data, and create files that a game client could draw directly that show OpenStreetMap-levels of detail and accuracy, you end up with 120GB of JSON data. That's a 90% drop in storage size for base data, but we're trying to get enough data in a phone-usable size. Zipped up, it gets cut down to 41GB, which could now fit on a Blu-ray disc. You could probably run a server that holds and distributes this data without a lot of work, or make a game that can load that data from files the user downloaded, but we're trying to make an offline-only game in this example.

Back to the minimized data. If we reduce the accuracy of the coordinate pairs for drawing down to the 10-digit PlusCode level, that cuts the size down a little, as the average coordinate point is now 7 characters long instead of 10. But we're still at almost 2GB after that change, so we'll have to try a new technique. We're going to abandon the idea of drawing accurate shapes, and work on a close-enough estimate instead. For this, we'll crop places to fit the current 6-digit PlusCode, store the center of each cropped place as a coordinate pair on the 10-digit PlusCode grid (again, we're now using a 400x400 grid for drawing and tracking purposes at this point), and add one more calculated value. We will get the area of the place's actual shape, and calculate the size of a circle with the same area. That radius is rounded to the nearest whole number and saved with the center point to create our estimated place. Every place, no matter how complicated its actual shape is, will now be its name and 3 whole numbers between 0 and 199. We also replace 'p' in our object with 'c' for "center" and 'r' for "radius".
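The circle estimate can be sketched as follows. This is an illustration under assumptions, not the code WSC(P) ships: the area comes from the shoelace formula over points already cropped to the cell, the "center" is taken as the middle of the bounding box, and 'c' is written as an "x,y" string. The source only says that a center and an equal-area radius are stored.

```python
import math

def polygon_area(points):
    """Shoelace formula over a ring of (x, y) grid points."""
    total = 0
    for (x1, y1), (x2, y2) in zip(points, points[1:] + points[:1]):
        total += x1 * y2 - x2 * y1
    return abs(total) / 2

def estimate_place(points, type_id, name_id=None):
    """Replace a cropped shape with a center point and an equal-area radius."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    # Assumption: the center is the middle of the bounding box.
    cx = (min(xs) + max(xs)) // 2
    cy = (min(ys) + max(ys)) // 2
    # A circle with the same area as the shape: area = pi * r^2.
    radius = round(math.sqrt(polygon_area(points) / math.pi))
    place = {"tid": type_id, "c": f"{cx},{cy}", "r": radius}
    if name_id is not None:
        place["nid"] = name_id
    return place

# A 10x10 square park on the 400x400 grid.
square = [(0, 0), (10, 0), (10, 10), (0, 10)]
print(estimate_place(square, type_id=1, name_id=4))
```

The 10x10 square has an area of 100 cells, so the equal-area circle has a radius of about 5.64, which rounds to 6.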

This plan does give us one big issue with a category we chose: trails. It's entirely reasonable to estimate the shape of a park, a building, or any closed shape as a circle. A line, however, is not at all close to correct if you make a circle that covers the entire thing. As a compromise, lines get saved as 2 places: the start and the end. Each one is treated as a single point and given a minimum radius value of 2 for gameplay purposes. In general, this works out well, since if you're going to go out to a particular trail, you'll have to hit the start or end of it at some point.
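A minimal sketch of the trail compromise, assuming the same hypothetical 'c' and 'r' layout as a string pair and a whole number. The function name and the constant are made up for the example; only the "start and end, radius 2" rule comes from the text.

```python
MIN_TRAIL_RADIUS = 2  # minimum radius given to trail endpoints for gameplay

def trail_to_places(points, type_id, name_id):
    """Turn a line of (x, y) grid points into two point-like places."""
    endpoints = (points[0], points[-1])
    return [
        {"nid": name_id, "tid": type_id, "c": f"{x},{y}", "r": MIN_TRAIL_RADIUS}
        for x, y in endpoints
    ]

trail = [(12, 40), (30, 55), (90, 61), (150, 200)]
print(trail_to_places(trail, type_id=19, name_id=8))
```

Both entries carry the same 'nid', which is one reason the names end up in a shared table.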

To do a little more number crunching, we're going to store names in a table. This handles trails as mentioned above, plus some weird edge cases where a place is a MultiPolygon in the source data and shows up multiple times in this compressed estimated data, or if there are multiple places with the same name in the area we're processing. That's more likely to happen when including retail stores than our current list, but it gets treated the same way to minimize storage used.
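One way to picture the name table is below. It is only a sketch: the function name and the choice to use the list index as the 'nid' are assumptions, since the text does not describe how the table is laid out on disk.

```python
def build_name_table(named_places):
    """named_places: list of (name or None, place dict) for one area.

    Returns (names, places) where each named place carries an 'nid'
    that is an index into names, so repeated names are stored once.
    """
    name_ids = {}
    places = []
    for name, place in named_places:
        record = dict(place)  # copy so the input is left untouched
        if name:
            # Reuse the existing ID, or assign the next free one.
            record["nid"] = name_ids.setdefault(name, len(name_ids))
        places.append(record)
    names = list(name_ids)    # dicts keep insertion order, so index == nid
    return names, places

names, places = build_name_table([
    ("River Trail", {"tid": 19, "c": "12,40", "r": 2}),
    ("River Trail", {"tid": 19, "c": "150,200", "r": 2}),
    (None, {"tid": 1, "c": "5,5", "r": 6}),
    ("City Museum", {"tid": 15, "c": "77,31", "r": 3}),
])
print(names)
print(places)
```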

At this point, we're down to just under 1GB of disk space but we're almost out of tricks. We COULD try to save this data in a binary format instead of JSON, but that adds a lot of complexity and makes the format impossible for humans to read (versus abbreviating the property names). Instead, we're just going to zip the data twice. As it turns out, this is actually a critical step in WSC for a different reason than you might expect. The Godot engine on Android scans all of the files included in its APK that aren't in the typical file formats used by Godot, and these offline data files have to be included in a way that gets them scanned because they're zipped. The game will spend a very long time scanning these files before anything happens that the user can see. If we zip each individual file, that's 26 million files to scan. Zipping up each 4-digit PlusCode's worth of files still means there are 64,800 files to check, and that takes 90 seconds or so of waiting before the game completes starting up. We zip the Cell4 files up, and then zip all of those into groups of 2-digit PlusCodes, leaving only 140 files to scan and a minimal pause before the game starts up as normal. The cost in complexity is that the code to read those files now has to unzip 2 files to get to the actual JSON data instead of 1.
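The double-zip layout and the two-step read can be sketched with Python's standard zipfile module. The game itself runs on Godot, so this is only an illustration of the idea: the file naming ("86.zip" holding "86HW.zip" holding "86HWG9.json") and the choice to store the outer archive without compression are assumptions, not the project's confirmed layout.

```python
import io
import json
import zipfile

def build_nested_zips(cells):
    """cells: {cell2: {cell4: {cell6: places}}} -> {"<cell2>.zip": bytes}."""
    archives = {}
    for cell2, cell4s in cells.items():
        outer_buf = io.BytesIO()
        # The outer zip only groups files, so it is stored uncompressed here.
        with zipfile.ZipFile(outer_buf, "w", zipfile.ZIP_STORED) as outer:
            for cell4, cell6s in cell4s.items():
                inner_buf = io.BytesIO()
                with zipfile.ZipFile(inner_buf, "w", zipfile.ZIP_DEFLATED) as inner:
                    for cell6, places in cell6s.items():
                        data = json.dumps(places, separators=(",", ":"))
                        inner.writestr(f"{cell6}.json", data)
                outer.writestr(f"{cell4}.zip", inner_buf.getvalue())
        archives[f"{cell2}.zip"] = outer_buf.getvalue()
    return archives

def read_cell6(outer_bytes, cell6):
    """Unzip twice: Cell2 archive -> Cell4 archive -> Cell6 JSON."""
    cell4 = cell6[:4]
    with zipfile.ZipFile(io.BytesIO(outer_bytes)) as outer:
        inner_bytes = outer.read(f"{cell4}.zip")
    with zipfile.ZipFile(io.BytesIO(inner_bytes)) as inner:
        return json.loads(inner.read(f"{cell6}.json"))

sample = {"86": {"86HW": {"86HWG9": [{"tid": 1, "c": "5,5", "r": 6}]}}}
archives = build_nested_zips(sample)
print(list(archives), read_cell6(archives["86.zip"], "86HWG9"))
```

Only the inner archives need to be held in memory one at a time, so the reader never has to unpack a whole 2-digit region to reach a single cell.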

The 2nd zip grouping doesn't give us any file space reduction, but the first set does remove a lot of the JSON overhead. The end result is 280MB, 99.9998% smaller than our source data. That's an average of 34 bytes to store each individual point of interest on the globe. When we started, that could have held 2 of the lat/lon coordinate pairs making up the original shape. That's all there is to it: throwing out every single byte of stuff you don't need and reusing as much as you can. Our goal was not to recreate Google Maps inside of a game; our goal was to tell the player about interesting places to go.

And all of this is open source: the resulting data, the game using it, and PraxisMapper for processing it. There's room to customize the category list and see what it looks like, or to trim it down more aggressively and see if we can fit a game with some of this data into the Google Play Store's 250MB limit.
