Skip to main content

How to Understand AppEngine Datastore Under the Hood: Part 1 - An Overview of the Underview

There are a lot of wrong perceptions about the datastore in Google AppEngine. People both familiar and foreign with AppEngine don't really understand what the datastore is. There is a deeper system underneath the nice API we are given. Understanding the guts can help us understand the skin. We may also find there are times when we must shed the skin for new clothing.

The biggest misconception about the datastore is the assumption that "kinds" are anything like "tables". You could use a set of entity kinds similar to the way you would use a set of tables, but they simply are different beasts, entirely. A table controls a strict requirement on the structure of its rows. Every entity, on the other hand, is free to hold any properties of allowed types. The published Model API is all an abstraction provided to give us a nice interface on top of an otherwise much looser foundation.

Many people would be very surprised to learn that a given kind doesn't actually require anything of its entities, but from the right angle it makes perfect sense. Meeting the kind of scalability requirements the datastore is designed for places interesting limitations. Schema changes can't get in the way when you could have such a large dataset that no operation can ever effectively operate on the entire set at once. This means what was a simple matter of ALTER TABLE in SQL is practically impossible in this new world, as the logistics behind updating and migrating potentially millions of entities to a new schema grows beyond the acceptable resources to give to a schema change. However, if we allow flexibility, we simply start creating new entities in the updated form and be sure that when we load one of the previous versions, we're prepared to use or upgrade it on the spot. For this and other reasons, allowing all entities to be free-form is the simplest direction to provide the foundation we need.

With a better understanding of our foundation we can better understand the abstractions in google.api.ext.db, with the Model subclasses most AppEngine developers know. I've seen quite a few people asking about migrating to changes in their db.Model subclasses, not understanding why or how their existing entities will change to match the newly defined properties. The behavior and how to work with it is a lot easier to understand when you view the individual entities are independent property bags, and not rows following a defined column schematic. We can also come to understand db.Expando as closer to the wire, so to speak, than its stricter Model cousin.

Perhaps a more exciting gain from this different view of the datastore is that we aren't bound by the published Model-centric API at all. In fact, we can access the underlying Entity class directly, providing us with a simple, persisted mapping object, without anything building on top of it. If we need some structure to our persistence, but the provided API simply isn't to taste, then an understanding of this layer gives us what we need to build our own variant datastore API. We may even use this understand to provide implementations compatible with previous ORM solutions, but powered by the entities and BigTable, rather than traditional SQL databases. The possibilities open up with our deeper understanding.

The more variation we have in what everyone is doing on AppEngine, the more value it has to all of us. Take this information and do some exciting. Share it and we're all reap the benefits.

Look for Part 2: The Raw Datastore API

Please vote on Reddit and/or Digg this article.


Popular posts from this blog

Why I Switched From Git to Microsoft OneDrive

I made the unexpected move with a string of recent projects to drop Git to sync between my different computers in favor of OneDrive, the file sync offering from Microsoft. Its like Dropbox, but "enterprise."

Feeling a little ashamed at what I previously would have scoffed at should I hear of it from another developer, I felt a little write up of the why and the experience could be a good idea. Now, I should emphasize that I'm not dropping Git for all my projects, just specific kinds of projects. I've been making this change in habit for projects that are just for me, not shared with anyone else. It has been especially helpful in projects I work on sporadically. More on why a little later.

So, what drove me away from Git, exactly?

On the smallest projects, like game jam hacks, I just wanted to code. I didn't want to think about revisions and commit messages. I didn't need branching or merges. I didn't even need to rollback to another version, ever. I just …

Respect and Code Reviews

Code Reviews in a development team only function best, or possible at all, when everyone approaches them with respect. That’s something I’ve usually taken for granted because I’ve had the opportunity to work with amazing developers who shine not just in their technical skills but in their interpersonal skills on a team. That isn’t always the case, so I’m going to put into words something that often exists just in assumptions.
You have to respect your code. This is first only because the nature and intent of code reviews are to safeguard the quality of your code, so even having code reviews demonstrates a baseline of respect for that code. But, maybe not everyone on the team has the same level of respect or entered a team with existing review traditions that they aren’t acquainted with.
There can be culture shock when you enter a team that’s really heavy on code reviews, but also if you enter a team or interact with a colleague who doesn’t share that level of respect for the process or…

CARDIAC: The Cardboard Computer

I am just so excited about this.

CARDIAC. The Cardboard Computer. How cool is that? This piece of history is amazing and better than that: it is extremely accessible. This fantastic design was built in 1969 by David Hagelbarger at Bell Labs to explain what computers were to those who would otherwise have no exposure to them. Miraculously, the CARDIAC (CARDboard Interactive Aid to Computation) was able to actually function as a slow and rudimentary computer. 
One of the most fascinating aspects of this gem is that at the time of its publication the scope it was able to demonstrate was actually useful in explaining what a computer was. Could you imagine trying to explain computers today with anything close to the CARDIAC?

It had 100 memory locations and only ten instructions. The memory held signed 3-digit numbers (-999 through 999) and instructions could be encoded such that the first digit was the instruction and the second two digits were the address of memory to operate on. The only re…