Monday, August 11, 2008

How to Understand AppEngine Datastore Under the Hood: Part 2 - The Raw Datastore API

If you haven't yet read the first part of this series, feel free to start from the beginning with Part 1 - An Overview of the Underview

Every AppEngine developer is familiar with the module. In Part 1 I introduced what goes on under the hood of this API, to give everyone a better understanding of what they are taking advantage of. Now, in Part 2, I'm going to detail the actual API that is used to utilize the raw entities behind our Model instances. At this time I am unsure if anything in this API is suspect to change, but I doubt anything is subject to drastic flux and I'm fairly confident everything here is safe for actual use, as much as anything else in AppEngine.

Module: google.appengine.api.datastore

Our main focus here is the Entity class. Everything supports it, from the Get, Put, and Delete functions to the Query class. Their uses are obvious. As previous exposed, each entity is essential a property bag and will take any given properties to the datastore for storage, query, and retrieval. Now, the entity is much more flat than its abstract cousin, the Model. It stores and retrieves the values, and then its job is done. It will tell you the key of a reference, but its up to you to request the actual entity based on that key.

Here is a full round trip for creating, storing, querying, and retrieving and entity at this low-level API.

from google.appengine.api import datastore

e = datastore.Entity(kind='test')
e['name'] = 'My Test Entity'
datastore.Put([e]) # The list must be of entities of the same kind only
also_e = datastore.Query(kind='test').Get(1)

assert e == also_e
assert e is not also_e


This is a very basic overview. We'll look at the details very soon. The entity is used very much like a dictionary, with value types restricted to datastore-compatible types of str, unicode, int, float, datastore.Key, or lists of one of these types.

One detail to note is that there are no provisions in place to ensure that Entities are cached or that when loading an entity, an existing instance with the same key is reused. This means that two entities (or Models) could represent the same persisted record, and changes to one or both that conflict will meet a race condition. This is something I would like to see change in the overall Datastore API. For now, keep it in mind and consider a cache of your own.

For complete API details, look in the AppEngine SDK's copy of this module. It is not the same as what runs on the AppEngine servers, but the API matches for all the public functions and classes.

While researching this I came across an interesting detail about the keys as represented by the datastore library. Every key is basically a trio of the Kind, ID, and the application identifier. Most of us are familiar just with the hash-looking form of the entity key and know that entities have numeric IDs, but we shouldn't rely on them as strongly as the keys. A little investigation into the source reveals that every key is actually a Protocol Buffer message, and that the hash-like key we see is actually the encoded PB message in url-safe base64, containing all three components. The keys are actually full paths to individual entities, mapped by application, kind, and ID. This intrigued me to attempt loading an entity by key giving another application name (of my own), to which I received an interesting error "BadRequestError: untrusted app shell cannot access app foo's data". The interesting thing about the error is that it doesn't tell us one application cannot access another's data, but that this particular application can't access this specific other applications data. Does this mean a future feature will allow it? The possibilities here are very exciting.

Please vote on Reddit and/or Digg this article.


bd_ said...

Perhaps the GAE panel itself uses the 'trusted app shell' system to do the datastore explorer stuff.

Anonymous said...

Where does it really store this information ? a file in local hard-disk or inside google servers?

bd_ said...

@anonymous: take a look at

Also, 'local disk' would be inside a google server, remember :)

Anonymous said...

I was very clear on my question. Recently I downloaded Google App engine and I was playing with the samples. If I use http://localhost:8080 and use some features for my own developements, the date store the app engine uses much be in the same machine as localhost. OR is it doing something like web-service data storage into the google servers ? Hope my questions is clear now. I'm concerned only about the data I use it to try out some app. The deployinh in appspot is entirley a different matter altogether. I wanted to see the data file which it created on my localdisk...


Anonymous said...

oops!, I should have said I was NOT very clear on my question...

Calvin Spealman said...

When you are running the dev server it uses a low performance local store. It is not the same implementation you get running on the google servers.

Hannson said...

Nice wrap-up.

Do you have an idea of how the Entity table looks like in BigTable?

I write here about programming, how to program better, things I think are neat and are related to programming. I might write other things at my personal website.

I am happily employed by the excellent Caktus Group, located in beautiful and friendly Carrboro, NC, where I work with Python, Django, and Javascript.

Blog Archive