Skip to main content

How to Understand AppEngine Datastore Under the Hood: Part 2 - The Raw Datastore API

If you haven't yet read the first part of this series, feel free to start from the beginning with Part 1 - An Overview of the Underview

Every AppEngine developer is familiar with the module. In Part 1 I introduced what goes on under the hood of this API, to give everyone a better understanding of what they are taking advantage of. Now, in Part 2, I'm going to detail the actual API that is used to utilize the raw entities behind our Model instances. At this time I am unsure if anything in this API is suspect to change, but I doubt anything is subject to drastic flux and I'm fairly confident everything here is safe for actual use, as much as anything else in AppEngine.

Module: google.appengine.api.datastore

Our main focus here is the Entity class. Everything supports it, from the Get, Put, and Delete functions to the Query class. Their uses are obvious. As previous exposed, each entity is essential a property bag and will take any given properties to the datastore for storage, query, and retrieval. Now, the entity is much more flat than its abstract cousin, the Model. It stores and retrieves the values, and then its job is done. It will tell you the key of a reference, but its up to you to request the actual entity based on that key.

Here is a full round trip for creating, storing, querying, and retrieving and entity at this low-level API.


from google.appengine.api import datastore

e = datastore.Entity(kind='test')
e['name'] = 'My Test Entity'
datastore.Put([e]) # The list must be of entities of the same kind only
also_e = datastore.Query(kind='test').Get(1)

assert e == also_e
assert e is not also_e

datastore.Delete(e)


This is a very basic overview. We'll look at the details very soon. The entity is used very much like a dictionary, with value types restricted to datastore-compatible types of str, unicode, int, float, datastore.Key, or lists of one of these types.

One detail to note is that there are no provisions in place to ensure that Entities are cached or that when loading an entity, an existing instance with the same key is reused. This means that two entities (or Models) could represent the same persisted record, and changes to one or both that conflict will meet a race condition. This is something I would like to see change in the overall Datastore API. For now, keep it in mind and consider a cache of your own.

For complete API details, look in the AppEngine SDK's copy of this module. It is not the same as what runs on the AppEngine servers, but the API matches for all the public functions and classes.

While researching this I came across an interesting detail about the keys as represented by the datastore library. Every key is basically a trio of the Kind, ID, and the application identifier. Most of us are familiar just with the hash-looking form of the entity key and know that entities have numeric IDs, but we shouldn't rely on them as strongly as the keys. A little investigation into the source reveals that every key is actually a Protocol Buffer message, and that the hash-like key we see is actually the encoded PB message in url-safe base64, containing all three components. The keys are actually full paths to individual entities, mapped by application, kind, and ID. This intrigued me to attempt loading an entity by key giving another application name (of my own), to which I received an interesting error "BadRequestError: untrusted app shell cannot access app foo's data". The interesting thing about the error is that it doesn't tell us one application cannot access another's data, but that this particular application can't access this specific other applications data. Does this mean a future feature will allow it? The possibilities here are very exciting.

Please vote on Reddit and/or Digg this article.

Comments

bd_ said…
Perhaps the GAE panel itself uses the 'trusted app shell' system to do the datastore explorer stuff.
Anonymous said…
Where does it really store this information ? a file in local hard-disk or inside google servers?
bd_ said…
@anonymous: take a look at http://labs.google.com/papers/bigtable.html

Also, 'local disk' would be inside a google server, remember :)
Anonymous said…
I was very clear on my question. Recently I downloaded Google App engine and I was playing with the samples. If I use http://localhost:8080 and use some features for my own developements, the date store the app engine uses much be in the same machine as localhost. OR is it doing something like web-service data storage into the google servers ? Hope my questions is clear now. I'm concerned only about the data I use it to try out some app. The deployinh in appspot is entirley a different matter altogether. I wanted to see the data file which it created on my localdisk...

thanks
Anonymous said…
oops!, I should have said I was NOT very clear on my question...
Calvin Spealman said…
When you are running the dev server it uses a low performance local store. It is not the same implementation you get running on the google servers.
Hannson said…
Nice wrap-up.

Do you have an idea of how the Entity table looks like in BigTable?

Popular posts from this blog

On Pruning Your Passions

We live in a hobby-rich world. There is no shortage of pastimes to grow a passion for. There is a shortage of one thing: time to indulge those passions. If you're someone who pours your heart into that one thing that makes your life worthwhile, that's a great deal. But, what if you've got no shortage of interests that draw your attention and you realize you will never have the time for all of them?

If I look at all the things I'd love to do with my life as a rose bush I'm tending, I realize that careful pruning is essential for the best outcome. This is a hard lesson to learn, because it can mean cutting beautiful flowers and watching the petals fall to the ground to wither. It has to be done.

I have a full time job that takes a lot of my mental energy. I have a wife and a son and family time is very important in my house. I try to read more, and I want to keep up with new developments in my career, and I'm trying to make time for simple, intentional relaxing t…

The Insidiousness of The Slow Solution

In software development, slow solutions can be worse than no progress at all. I'll even say its usually worse and if you find yourself making slow progress on a problem, consider stopping while you're a head.

Its easy to see why fast progress is better: either you solve the problem or you prove a proposed solution wrong and find a better one. Even a total standstill in pushing forward on a task or a bug or a request can force you to seek out new information or a second opinion.

Slow solutions, on the other hand, is kind of sneaky. Its insidious. Slow solution is related the Sunk Cost Fallacy, but maybe worse. Slow solutions have you constantly dripping more of your time, energy, and hope into a path that's still unproven, constantly digging a hole. Slow solutions are deceptive, because they still do offer real progress. It is hard to justify abandoning it or trying another route, because it is "working", technically.

We tend to romanticize the late night hacking…

Why I Switched From Git to Microsoft OneDrive

I made the unexpected move with a string of recent projects to drop Git to sync between my different computers in favor of OneDrive, the file sync offering from Microsoft. Its like Dropbox, but "enterprise."

Feeling a little ashamed at what I previously would have scoffed at should I hear of it from another developer, I felt a little write up of the why and the experience could be a good idea. Now, I should emphasize that I'm not dropping Git for all my projects, just specific kinds of projects. I've been making this change in habit for projects that are just for me, not shared with anyone else. It has been especially helpful in projects I work on sporadically. More on why a little later.

So, what drove me away from Git, exactly?

On the smallest projects, like game jam hacks, I just wanted to code. I didn't want to think about revisions and commit messages. I didn't need branching or merges. I didn't even need to rollback to another version, ever. I just …