Skip to main content

How to Understand AppEngine Datastore Under the Hood: Part 2 - The Raw Datastore API

If you haven't yet read the first part of this series, feel free to start from the beginning with Part 1 - An Overview of the Underview

Every AppEngine developer is familiar with the module. In Part 1 I introduced what goes on under the hood of this API, to give everyone a better understanding of what they are taking advantage of. Now, in Part 2, I'm going to detail the actual API that is used to utilize the raw entities behind our Model instances. At this time I am unsure if anything in this API is suspect to change, but I doubt anything is subject to drastic flux and I'm fairly confident everything here is safe for actual use, as much as anything else in AppEngine.

Module: google.appengine.api.datastore

Our main focus here is the Entity class. Everything supports it, from the Get, Put, and Delete functions to the Query class. Their uses are obvious. As previous exposed, each entity is essential a property bag and will take any given properties to the datastore for storage, query, and retrieval. Now, the entity is much more flat than its abstract cousin, the Model. It stores and retrieves the values, and then its job is done. It will tell you the key of a reference, but its up to you to request the actual entity based on that key.

Here is a full round trip for creating, storing, querying, and retrieving and entity at this low-level API.


from google.appengine.api import datastore

e = datastore.Entity(kind='test')
e['name'] = 'My Test Entity'
datastore.Put([e]) # The list must be of entities of the same kind only
also_e = datastore.Query(kind='test').Get(1)

assert e == also_e
assert e is not also_e

datastore.Delete(e)


This is a very basic overview. We'll look at the details very soon. The entity is used very much like a dictionary, with value types restricted to datastore-compatible types of str, unicode, int, float, datastore.Key, or lists of one of these types.

One detail to note is that there are no provisions in place to ensure that Entities are cached or that when loading an entity, an existing instance with the same key is reused. This means that two entities (or Models) could represent the same persisted record, and changes to one or both that conflict will meet a race condition. This is something I would like to see change in the overall Datastore API. For now, keep it in mind and consider a cache of your own.

For complete API details, look in the AppEngine SDK's copy of this module. It is not the same as what runs on the AppEngine servers, but the API matches for all the public functions and classes.

While researching this I came across an interesting detail about the keys as represented by the datastore library. Every key is basically a trio of the Kind, ID, and the application identifier. Most of us are familiar just with the hash-looking form of the entity key and know that entities have numeric IDs, but we shouldn't rely on them as strongly as the keys. A little investigation into the source reveals that every key is actually a Protocol Buffer message, and that the hash-like key we see is actually the encoded PB message in url-safe base64, containing all three components. The keys are actually full paths to individual entities, mapped by application, kind, and ID. This intrigued me to attempt loading an entity by key giving another application name (of my own), to which I received an interesting error "BadRequestError: untrusted app shell cannot access app foo's data". The interesting thing about the error is that it doesn't tell us one application cannot access another's data, but that this particular application can't access this specific other applications data. Does this mean a future feature will allow it? The possibilities here are very exciting.

Please vote on Reddit and/or Digg this article.

Comments

bd_ said…
Perhaps the GAE panel itself uses the 'trusted app shell' system to do the datastore explorer stuff.
Anonymous said…
Where does it really store this information ? a file in local hard-disk or inside google servers?
bd_ said…
@anonymous: take a look at http://labs.google.com/papers/bigtable.html

Also, 'local disk' would be inside a google server, remember :)
Anonymous said…
I was very clear on my question. Recently I downloaded Google App engine and I was playing with the samples. If I use http://localhost:8080 and use some features for my own developements, the date store the app engine uses much be in the same machine as localhost. OR is it doing something like web-service data storage into the google servers ? Hope my questions is clear now. I'm concerned only about the data I use it to try out some app. The deployinh in appspot is entirley a different matter altogether. I wanted to see the data file which it created on my localdisk...

thanks
Anonymous said…
oops!, I should have said I was NOT very clear on my question...
Calvin Spealman said…
When you are running the dev server it uses a low performance local store. It is not the same implementation you get running on the google servers.
hannson said…
Nice wrap-up.

Do you have an idea of how the Entity table looks like in BigTable?

Popular posts from this blog

CARDIAC: The Cardboard Computer

I am just so excited about this. CARDIAC. The Cardboard Computer. How cool is that? This piece of history is amazing and better than that: it is extremely accessible. This fantastic design was built in 1969 by David Hagelbarger at Bell Labs to explain what computers were to those who would otherwise have no exposure to them. Miraculously, the CARDIAC (CARDboard Interactive Aid to Computation) was able to actually function as a slow and rudimentary computer.  One of the most fascinating aspects of this gem is that at the time of its publication the scope it was able to demonstrate was actually useful in explaining what a computer was. Could you imagine trying to explain computers today with anything close to the CARDIAC? It had 100 memory locations and only ten instructions. The memory held signed 3-digit numbers (-999 through 999) and instructions could be encoded such that the first digit was the instruction and the second two digits were the address of memory to operate on

Statement Functions

At a small suggestion in #python, I wrote up a simple module that allows the use of many python statements in places requiring statements. This post serves as the announcement and documentation. You can find the release here . The pattern is the statement's keyword appended with a single underscore, so the first, of course, is print_. The example writes 'some+text' to an IOString for a URL query string. This mostly follows what it seems the print function will be in py3k. print_("some", "text", outfile=query_iostring, sep="+", end="") An obvious second choice was to wrap if statements. They take a condition value, and expect a truth value or callback an an optional else value or callback. Values and callbacks are named if_true, cb_true, if_false, and cb_false. if_(raw_input("Continue?")=="Y", cb_true=play_game, cb_false=quit) Of course, often your else might be an error case, so raising an exception could be useful

How To Teach Software Development

How To Teach Software Development Introduction Developers Quality Control Motivation Execution Businesses Students Schools Education is broken. Education about software development is even more broken. It is a sad observation of the industry from my eyes. I come to see good developers from what should be great educations as survivors, more than anything. Do they get a headstart from their education or do they overcome it? This is the first part in a series on software education. I want to open a discussion here. Please comment if you have thoughts. Blog about it, yourself. Write about how you disagree with me. Write more if you don't. We have a troubled industry. We care enough to do something about it. We hark on the bad developers the way people used to point at freak shows, but we only hurt ourselves but not improving the situation. We have to deal with their bad code. We are the twenty percent and we can't talk to the eighty percent, by definition, so we need to impro