Loading complex data with PostgreSQL JSON functions

Problem

With an extension of initialization data, where I added definitions of NPCs and Monsters and their spawn points to about 10 game maps, I noticed that the time to start the server went up dramatically - from ~13 seconds to over 1 minute. The reason was, that my strategy to load the whole game configuration at the start at the server led to almost 2000 SQL queries. As the configuration data is not yet complete, you can already guess that if nothing had been changed, the loading time would ramp up further, probably to 10 minutes or more.

Idea

I had created a GitHub issue and came up with a nice approach. The idea is to load the whole object graph of the GameConfiguration with just one query. PostgreSQL has functions to build JSON for the query result, even very complex ones with deeply nested subqueries. So I got the idea that I could build queries based on the entity framework meta model of the GameConfiguration class. The result of the query can then get de-serialized to a GameConfiguration object and if possible, added to the change tracker of the entity framework. EntityFrameworkCore supports adding objects (and their child objects) to it's context from the outside. And we could also tell the context that it should treat this data as 'unchanged' - so we could still do changes to the object and EF Core would figure out what needs to be changed on the database to save them. Best of both worlds, if all goes well :)

Implementation

SQL Query Building

To create the query, I pass a IEntityType (meta data about the entity type) of GameConfiguration to the JsonQueryBuilder. It will programmatically build up a complex SQL query which uses PostgreSQL JSON functions. The resulting query has a size of about 18 KB, so it's too big to show it in this blog post. It looks like this:

select result."Id" "$id", result."Id", row_to_json(result) as GameConfiguration
from (
    select a."Id", a.*, (
       -- additional subqueries which return json of collections and other navigation properties
    )
    from config."GameConfiguration" a
) result;

The process to create this query is actually pretty fast (~10 ms) and we could cache the result as the query would never change after the program has been started.
The result of the query is currently a JSON string which is about 21 MB in size. I suspect there is some garbage in it, but at the moment it's fast enough - room for improvement, however. The query returns the data in a little bit under 2 seconds on my system.

De-serializing

I'm using Newtonsoft.Json in version 9 to de-serialize the JSON back to an object. One de-serialize call takes about ~600 ms, which is also not so bad if you look at this amount of data. However, we need two calls to get the final result, to resolve all references.

Byte Arrays

I needed to write a custom JsonConverter to de-serialize byte arrays. Newtonsoft.Json expects a base64-encoded string. However, PostgreSQL gives us the data in a hexadecimal encoded string.

Reference resolving

To prevent that every object is serialized over and over again, I use $ref-objects to reference these objects by their id (GUID). To resolve these objects, there is a resolver. Whenever Newtonsoft.Json finds an object with "$id" as first property, it adds it to the resolver. However, it reads the JSON just one time in forward direction, so a reference to an object which was not yet added to the resolver yet, can't be resolved. The object model of GameConfiguration contains some circular references, so this is actually happening. The current solution is, that we de-serialize the JSON two times - this leads to other problems as you will see below.

Remaining problems

Change Tracker

We need to de-serialize the JSON of the GameConfiguration twice, because the document contains circular references which can't be resolved otherwise. The first run is just there to pick up all objects. In the second run these objects are used in the IdReferenceResolver to resolve the $ref-objects.
So in the end the resulting GameConfiguration might contain objects with the same id but being a different object instance. However, the change tracker of EF Core can only keep a look at one instance per id. As a result, an exception is thrown when you try to add two different instances with the same id.
So for now, the DbContext to load the game server configuration doesn't track changes anymore. That's not so much of a problem at the moment, as we don't modify it.

One solution could be to replace all instances of the first run, which still sit in the GameConfiguration object, by the "new" one which has been created in the second run.

Newtonsoft.Json is pretty limited in this particular case. It would be nice to be able to postpone reference resolving until the whole document has been read.

Serialization of back references

See also Issue #11 - when we fix this, we can expect further performance gains and we'll be able to use it to load Account objects. These are candidates for getting very complex, too.

Summary

The JSON functions of PostgreSQL are a great way to load complex object graphs with just one roundtrip to the database. We have some minor issues left, but I'm sure we can solve them. Additionally, there is still a lot of performance to gain because the query retrieves obviously too much data.
If we manage to optimize that, we keep all benefits of a relational database and still only need one roundtrip to load complex data - so we have no reason to switch to a document database, yet :)