
How to Understand AppEngine Datastore Under the Hood: Part 2 - The Raw Datastore API

If you haven't yet read the first part of this series, feel free to start from the beginning with Part 1 - An Overview of the Underview

Every AppEngine developer is familiar with the module that provides the Model class. In Part 1 I introduced what goes on under the hood of this API, to give everyone a better understanding of what they are taking advantage of. Now, in Part 2, I'm going to detail the actual API used to work with the raw entities behind our Model instances. At this time I am unsure if anything in this API is subject to change, but I doubt anything is in drastic flux, and I'm fairly confident everything here is as safe for actual use as anything else in AppEngine.

Module: google.appengine.api.datastore

Our main focus here is the Entity class. Everything else in the module supports it, from the Get, Put, and Delete functions to the Query class. Their uses are obvious. As previously explained, each entity is essentially a property bag and will take any given properties to the datastore for storage, query, and retrieval. The entity is much flatter than its abstract cousin, the Model: it stores and retrieves the values, and then its job is done. It will tell you the key of a reference, but it's up to you to request the actual entity based on that key.

Here is a full round trip for creating, storing, querying, and retrieving an entity at this low-level API.


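from google.appengine.api import datastore

# Create an entity of kind 'test' and set a property, dictionary-style.
e = datastore.Entity(kind='test')
e['name'] = 'My Test Entity'
datastore.Put([e]) # The list must be of entities of the same kind only

# Query.Get(limit) returns a list of entities, so take the first result.
also_e = datastore.Query(kind='test').Get(1)[0]

# Same property values, but two distinct Python objects.
assert e == also_e
assert e is not also_e

datastore.Delete(e)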


This is a very basic overview. We'll look at the details very soon. The entity is used very much like a dictionary, with value types restricted to datastore-compatible types of str, unicode, int, float, datastore.Key, or lists of one of these types.
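
To make the dictionary-like behaviour concrete, here is a minimal sketch of a property bag that rejects unsupported value types. It is a toy stand-in and not the SDK's Entity class: ToyKey, ToyEntity, and ALLOWED_TYPES are hypothetical names invented for illustration, and the real class may check more than this does.

```python
class ToyKey(object):
    """Hypothetical stand-in for datastore.Key; holds only a kind and an ID."""

    def __init__(self, kind, id_):
        self.kind = kind
        self.id = id_


# Scalar types this sketch accepts (Python 2's unicode is folded into str).
ALLOWED_TYPES = (str, int, float, ToyKey)


class ToyEntity(dict):
    """A dict that only accepts datastore-style values (illustration only)."""

    def __init__(self, kind):
        dict.__init__(self)
        self.kind = kind

    def __setitem__(self, name, value):
        # A list is accepted when every element is an allowed scalar.
        values = value if isinstance(value, list) else [value]
        for item in values:
            if not isinstance(item, ALLOWED_TYPES):
                raise TypeError('unsupported property type: %s'
                                % type(item).__name__)
        dict.__setitem__(self, name, value)


e = ToyEntity('test')
e['name'] = 'My Test Entity'
e['tags'] = ['a', 'b']
e['owner'] = ToyKey('user', 42)  # a reference is stored as a key, nothing more

try:
    e['bad'] = {'nested': 'dict'}  # dicts are not a supported value type
    rejected = False
except TypeError:
    rejected = True
```

Note that e['owner'] gives back only the key. Fetching the entity it points to is a separate step, which is exactly the flatness described earlier.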

One detail to note is that there are no provisions in place to ensure that entities are cached, or that loading an entity reuses an existing instance with the same key. This means that two entities (or Models) could represent the same persisted record, and conflicting changes to one or both will produce a race condition. This is something I would like to see change in the overall Datastore API. For now, keep it in mind and consider a cache of your own.
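
One possible shape for such a cache is an identity map: a small object that remembers which instance it already handed out for each key. The sketch below is only an illustration under stated assumptions. IdentityMap and fake_load are hypothetical names, and fake_load stands in for a real datastore fetch so the example runs without the SDK.

```python
class IdentityMap(object):
    """Hands back the same object for the same key (illustrative sketch)."""

    def __init__(self, loader):
        self._loader = loader  # callable: key -> freshly loaded entity
        self._loaded = {}      # key -> the one instance we hand out

    def get(self, key):
        # Load on first request only; later requests reuse the instance.
        if key not in self._loaded:
            self._loaded[key] = self._loader(key)
        return self._loaded[key]

    def forget(self, key):
        # Drop a cached instance, e.g. after deleting the record.
        self._loaded.pop(key, None)


def fake_load(key):
    """Hypothetical loader: returns a brand-new dict on every call."""
    return {'key': key, 'name': 'My Test Entity'}


cache = IdentityMap(fake_load)
first = cache.get('k1')
second = cache.get('k1')

# Without the map, each load yields an equal but distinct object.
uncached_a = fake_load('k1')
uncached_b = fake_load('k1')
```

In a real application the loader could wrap datastore.Get. A map like this would probably need to be scoped to a single request, since it does nothing about changes made by other requests or other server instances.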

For complete API details, look in the AppEngine SDK's copy of this module. It is not the same as what runs on the AppEngine servers, but the API matches for all the public functions and classes.

While researching this I came across an interesting detail about the keys as represented by the datastore library. Every key is basically a trio of the kind, the ID, and the application identifier. Most of us are familiar just with the hash-looking form of the entity key and know that entities have numeric IDs, but we shouldn't rely on them as strongly as the keys. A little investigation into the source reveals that every key is actually a Protocol Buffer message, and that the hash-like key we see is that message encoded in URL-safe base64, containing all three components. The keys are actually full paths to individual entities, mapped by application, kind, and ID.

This prompted me to try loading an entity by key while giving another application name (of my own), for which I received an interesting error: "BadRequestError: untrusted app shell cannot access app foo's data". The interesting thing about the error is that it doesn't tell us one application cannot access another's data, but that this particular application can't access this specific other application's data. Does this mean a future feature will allow it? The possibilities here are very exciting.
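
The URL-safe base64 step can be sketched with nothing but the standard library. This is an illustration and not the SDK's code: encode_key and decode_key are hypothetical helpers, the payload is a plain byte string standing in for the serialized Protocol Buffer, and stripping the '=' padding is my assumption about how the string form is produced.

```python
import base64


def encode_key(raw_bytes):
    """Encode bytes as URL-safe base64 text with the '=' padding removed."""
    return base64.urlsafe_b64encode(raw_bytes).decode('ascii').rstrip('=')


def decode_key(text):
    """Reverse encode_key, restoring the padding base64 requires."""
    padding = '=' * (-len(text) % 4)
    return base64.urlsafe_b64decode((text + padding).encode('ascii'))


# Stand-in payload: a real key carries a serialized Protocol Buffer instead.
payload = b'myapp/test/42'

encoded = encode_key(payload)   # opaque, hash-looking text
decoded = decode_key(encoded)   # the original bytes again

# All three components survive the round trip.
app, kind, entity_id = decoded.decode('ascii').split('/')
```

The point is only that the opaque string is an encoding and not a hash: it can be reversed to recover the application, kind, and ID it was built from.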


Comments

bd_ said…
Perhaps the GAE panel itself uses the 'trusted app shell' system to do the datastore explorer stuff.
Anonymous said…
Where does it really store this information? In a file on the local hard disk, or inside Google's servers?
bd_ said…
@anonymous: take a look at http://labs.google.com/papers/bigtable.html

Also, 'local disk' would be inside a google server, remember :)
Anonymous said…
I was very clear on my question. Recently I downloaded Google App Engine and I was playing with the samples. If I use http://localhost:8080 and use some features for my own development, the datastore the App Engine uses must be on the same machine as localhost. Or is it doing something like web-service data storage on the Google servers? I hope my question is clear now. I'm concerned only about the data I use to try out some app. Deploying to appspot is an entirely different matter. I wanted to see the data file which it created on my local disk...

thanks
Anonymous said…
oops!, I should have said I was NOT very clear on my question...
Calvin Spealman said…
When you are running the dev server it uses a low performance local store. It is not the same implementation you get running on the google servers.
Hannson said…
Nice wrap-up.

Do you have an idea of what the Entity table looks like in BigTable?
