Skip to main content

How to Understand AppEngine Datastore Under the Hood: Part 1 - An Overview of the Underview

There are a lot of wrong perceptions about the datastore in Google AppEngine. People both familiar and foreign with AppEngine don't really understand what the datastore is. There is a deeper system underneath the nice API we are given. Understanding the guts can help us understand the skin. We may also find there are times when we must shed the skin for new clothing.

The biggest misconception about the datastore is the assumption that "kinds" are anything like "tables". You could use a set of entity kinds similar to the way you would use a set of tables, but they simply are different beasts, entirely. A table controls a strict requirement on the structure of its rows. Every entity, on the other hand, is free to hold any properties of allowed types. The published Model API is all an abstraction provided to give us a nice interface on top of an otherwise much looser foundation.

Many people would be very surprised to learn that a given kind doesn't actually require anything of its entities, but from the right angle it makes perfect sense. Meeting the kind of scalability requirements the datastore is designed for places interesting limitations. Schema changes can't get in the way when you could have such a large dataset that no operation can ever effectively operate on the entire set at once. This means what was a simple matter of ALTER TABLE in SQL is practically impossible in this new world, as the logistics behind updating and migrating potentially millions of entities to a new schema grows beyond the acceptable resources to give to a schema change. However, if we allow flexibility, we simply start creating new entities in the updated form and be sure that when we load one of the previous versions, we're prepared to use or upgrade it on the spot. For this and other reasons, allowing all entities to be free-form is the simplest direction to provide the foundation we need.

With a better understanding of our foundation we can better understand the abstractions in google.api.ext.db, with the Model subclasses most AppEngine developers know. I've seen quite a few people asking about migrating to changes in their db.Model subclasses, not understanding why or how their existing entities will change to match the newly defined properties. The behavior and how to work with it is a lot easier to understand when you view the individual entities are independent property bags, and not rows following a defined column schematic. We can also come to understand db.Expando as closer to the wire, so to speak, than its stricter Model cousin.

Perhaps a more exciting gain from this different view of the datastore is that we aren't bound by the published Model-centric API at all. In fact, we can access the underlying Entity class directly, providing us with a simple, persisted mapping object, without anything building on top of it. If we need some structure to our persistence, but the provided API simply isn't to taste, then an understanding of this layer gives us what we need to build our own variant datastore API. We may even use this understand to provide implementations compatible with previous ORM solutions, but powered by the entities and BigTable, rather than traditional SQL databases. The possibilities open up with our deeper understanding.

The more variation we have in what everyone is doing on AppEngine, the more value it has to all of us. Take this information and do some exciting. Share it and we're all reap the benefits.

Look for Part 2: The Raw Datastore API

Please vote on Reddit and/or Digg this article.

Comments

Popular posts from this blog

On Pruning Your Passions

We live in a hobby-rich world. There is no shortage of pastimes to grow a passion for. There is a shortage of one thing: time to indulge those passions. If you're someone who pours your heart into that one thing that makes your life worthwhile, that's a great deal. But, what if you've got no shortage of interests that draw your attention and you realize you will never have the time for all of them?

If I look at all the things I'd love to do with my life as a rose bush I'm tending, I realize that careful pruning is essential for the best outcome. This is a hard lesson to learn, because it can mean cutting beautiful flowers and watching the petals fall to the ground to wither. It has to be done.

I have a full time job that takes a lot of my mental energy. I have a wife and a son and family time is very important in my house. I try to read more, and I want to keep up with new developments in my career, and I'm trying to make time for simple, intentional relaxing t…

The Insidiousness of The Slow Solution

In software development, slow solutions can be worse than no progress at all. I'll even say its usually worse and if you find yourself making slow progress on a problem, consider stopping while you're a head.

Its easy to see why fast progress is better: either you solve the problem or you prove a proposed solution wrong and find a better one. Even a total standstill in pushing forward on a task or a bug or a request can force you to seek out new information or a second opinion.

Slow solutions, on the other hand, is kind of sneaky. Its insidious. Slow solution is related the Sunk Cost Fallacy, but maybe worse. Slow solutions have you constantly dripping more of your time, energy, and hope into a path that's still unproven, constantly digging a hole. Slow solutions are deceptive, because they still do offer real progress. It is hard to justify abandoning it or trying another route, because it is "working", technically.

We tend to romanticize the late night hacking…

Why I Switched From Git to Microsoft OneDrive

I made the unexpected move with a string of recent projects to drop Git to sync between my different computers in favor of OneDrive, the file sync offering from Microsoft. Its like Dropbox, but "enterprise."

Feeling a little ashamed at what I previously would have scoffed at should I hear of it from another developer, I felt a little write up of the why and the experience could be a good idea. Now, I should emphasize that I'm not dropping Git for all my projects, just specific kinds of projects. I've been making this change in habit for projects that are just for me, not shared with anyone else. It has been especially helpful in projects I work on sporadically. More on why a little later.

So, what drove me away from Git, exactly?

On the smallest projects, like game jam hacks, I just wanted to code. I didn't want to think about revisions and commit messages. I didn't need branching or merges. I didn't even need to rollback to another version, ever. I just …