login
v2
v1

jmoiron.net

Welcome to my blog. You may view older entries via list navigation, view posts sorted by topic, or see them arranged by date.

About SQLAlchemy and Django's ORM

posted June 21st, 2009 @ 17:08:13

- tags: development

- comments: 0

For some background, I've been working with Django for a few years now and for the past year have been a developer on a rather large django application at work. I did a few projects with SQLAlchemy about 2 years ago when I was put on a project whose requirements seemed to imply the need for a database but had no web component. I know of and have used other python ORMs (notably SQLObject), but this post is about SQLA and Django's ORMs, and how I consider the differences between them to be important but also neutral.

I started a small project recently that was going to require some light querying, and have had aborted re-starts on the business logic layer using first Elixir, then SQLObject, and now going straight ext.declarative with SQLAlchemy. The project has relatively few tables, with only a couple one-to-many and a single many-to-many relationship; as such it probably represents a fairly standard use case of an ORM. I've learned a lot about these different projects in the process, but was finally able to formalize some thoughts I've had with respect to SQLA and Django's ORMs.

The major takeaway from all of it is that SQLAlchemy's ORM and Django's ORM solve different problems. They aren't completely different problems, but they are different enough that you should have a completely different approach using each.

SQLAlchemy is a toolkit for python programmers who want or need to use a database to write programs that do so. Django's ORM is a tool for python programmers to quickly model business-level data relationships and use a database for persistent storage. This is really an important distinction; the Django ORM interface is focused on the application's data model, whereas SQLAlchemy's interface is focused on the underlying data store. In Django, your application is first class; in Alchemy, it isn't. As an example, look at a simple usage of a many-to-many relationship in each:

  • Django: Book.objects.filter(author__age=27)
  • SQLAlchemy: session.query(Book).join(Book, Author).filter(Author.age==27)

Note that Django's API is entirely concerned with the relationship between the application-level abstractions 'Book' and 'Author', whereas the Alchemy version includes far more details about what is actually going to happen when executed. This example exposes the different assumptions the ORMs have to live with, as well: implicit within Django's design are decisions on the way connections are made to the database, how to evaluate queries (lazily or eagerly), whereas in SQLAlchemy these details are in plain sight.

There's a large degree of negative/positive choice tradeoff going on between the two ORMs that took a while to dawn on me. As an application developer first and a database administrator and tuner maybe 5th at best, I tend to lean towards Django's ORM as the type of thing I'd rather use. I see that as a result of familiarity, my own weaknesses, and also the strengths of the Django ORM to do a good enough job most of the time.

Of course, there are things you can't do easily with Django's ORM, like fetch all books with 2 authors, or fetch all authors whose first and last names are the same, that are relatively simple to do in SQLAlchemy. The conceptual leap, of course, is understanding how to answer these questions from the standpoint of your database, not your data model. There's a case to be made that, if you have not already made this leap, your ability to solve problems will be constrained by the tools the Django ORM gives you, and I can see some truth in the blame critics place on Django's ORM for keeping some folks unenlightened. I still think that in the end, the python programmer is better off having both of these projects to learn from, and the python world better for having both to choose from.

update: This post got a bit of action at reddit. The comments there rightly pointed out that the examples I pointed to above that were 'not easily done' via Django are fairly easy to do with the new aggregation features in Django 1.1. It was a mistake not to acknowledge the improvements in the Django ORM post-1.0. My main point wasn't to highlight a laundry list of things that are hard with Django or easy with SQLAlchemy; I'm not really qualified to do this, but I felt that a couple examples of this would perhaps illustrate the differences I saw better than mere exposition.

They also pointed to the fact that 'natural orm' features are available via SQLAlchemy. The code examples I included above are already using relations, and at least according to the latest documentation they are the canonical way to deal with these types of relationships in SQLA 0.5.x. However, the fact that there are multiple ways of doing this speaks to the main point of this post; that the focus of SQLAlchemy is to allow you to do whatever it is you want without touching raw SQL, and the focus of Django's ORM is to provide your application with an easy-to-use data model.

The fact that Django's ORM is heavily influenced by the relationships SQL is good at is to me an unavoidable impurity; you could provide a Django-orm style data model for a non RDBMS, but SQLAlchemy outside of that context is completely meaningless. Which is fine; the difference is neutral, and SQLAlchemy is meant to be the best there is within that context. In fact, I think it clearly is; the whole point is that Django's ORM is only incidentally playing in the same sandbox, and is focusing on solving a slightly different problem.

Finding images in a binary file w/ python

posted June 15th, 2009 @ 23:13:27

- tags: general tech, python

- comments: 0

asobi seksu album cover

Been having a sort of chronic problem with my phone over the past couple weeks where songs would skip. Didn't think it was too big of a deal, but transferring some more music over to it the other day somehow my pictures all disappeared. I take fairly frequent backups, but this caught a few that weren't on the memory card. I checked dmesg, saw end_request: I/O error, dev sdg, sector 48728, and knew that I was probably in for some problems.

Hoping to salvage something, I made a dump of the volume via dd if=/dev/sdg of=phonestick, and set to work trying to figure out how I could read through the dump. I came upon a helpful page with jpeg header information and grepped for the ascii JFIF marker to make sure there were some recognizable jpeg files in there.

(Un)fortunately for me, I have some experience with disgusting byte-level hackery in python, and decided to give a crack at extracting all of the jpeg images I could find. In the end, a very simple and straightforward algorithm ended up working surprisingly well:

#!/usr/bin/env python

chunk = 1048576 * 4

# http://www.obrador.com/essentialjpeg/headerinfo.htm
start_of_image = soi = '\xff\xd8\xff\xe0'
jfif_id = 'JFIF\x00'
diffie_quant_marker = '\xff\xdb'
diffie_huffman_marker = '\xff\xc4'
frame_marker = '\xff\xc0'
scan_marker = '\xff\xda'
comment_marker = '\xff\xee'
end_of_image = eoi = '\xff\xd9'

def extra_check(string):
    """An extra check to make sure we're looking at a jpeg file..."""
    return jfif_id in string[:11]

def slice_image(img):
    """Find the EOI marker assuming we are at the beginning of a jpeg file."""
    dqm_loc = img.find(diffie_quant_marker)
    dhm_loc = img.find(diffie_huffman_marker, dqm_loc)
    frm_loc = img.find(frame_marker, dhm_loc)
    smk_loc = img.find(scan_marker, frm_loc)
    com_loc = img.find(comment_marker, smk_loc)
    eoi_loc = img.find(end_of_image, com_loc)
    return img[:eoi_loc+2]

def generate_jpeg_files(f):
    """A generator that spits out strings that match jpeg files.  `f` is a
    file opened in binary mode."""
    eof = False
    s = ''
    while not eof:
        while soi not in s:
            s = f.read(chunk)
            if not s:
                eof = True
                break
        img_loc = s.find(soi)
        img = s[img_loc:]
        if len(img) < 11:
            extra = f.read(chunk)
            img += extra
            s += extra
        if not extra_check(img):
            # hmm.. it wasn't a jpeg after all, continue
            s = s[img_loc+1:]
            continue
        image = slice_image(img)
        s = s[img_loc + len(image):]
        yield image


def find_all_images(filename, threshold=None):
    f = open(filename, 'rb')
    image_generator = generate_jpeg_files(f)
    for num,img in enumerate(image_generator):
        ifile = open('potential_image_%04d.jpg' % num, 'wb')
        ifile.write(img)
        ifile.close()
        if threshold and num > threshold:
            break
    f.close()

if __name__ == '__main__':
    import optparse
    parser = optparse.OptionParser(usage='%prog [opts] filename', version='1.0')
    parser.add_option('-t', '--threshold', help='maximum number of image files to extract')
    opts, args = parser.parse_args()
    threshold = int(opts.threshold) if opts.threshold else None
    find_all_images(args[0], threshold)

You can also download the script above; I see no reason why it shouldn't run on any platform although it can't attest to its endian-safeness. The algorithm is pretty basic; it searches for the SOI marker, then iterates through the rest of the markers until it finds what looks like the EOI marker, then spits that out as a file. The chunk size can be adjusted to your needs; I wasn't going to have anything much bigger than 1 meg on there, so I wasn't too worried about having to read lots of chunks for one image.

In the end, it worked really well and found 63 images on my phone card. Unfortunately, almost all of them were from mp3 ID3 tags, and the rest were from the phone's build in themes. It's entirely possible that I deleted that directory accidentally and then nuked the storage space with my music transfer.

Recent Google Releases

posted June 9th, 2009 @ 00:21:42

- tags: general tech

- comments: 0

After a year or two of incremental updates to gmail and youtube and general software announcements or releases that were not very exciting to me; notably mondrian and app-engine, google has released a torrent of interesting software in the past week that look fairly exciting to me!

The first thing to come across my radar was Google Page Speed, a Firefox+Firebug plugin that examines a page for potential improvements in load times. My 10 minute test ride at work showed that it had a much more extensive and helpful checklist of best practices than YSlow, which tends to moan about the last 10% of what you need (separate media to a cookie-less domain, use akamai, etc) but doesn't really help for day-to-day development. Page slow also did a lot more technical lifting, like telling you what styles were not in use on a page, or detecting that you are re-sizing images with html size attributes and telling you how much you could save by serving an appropriately sized thumbnail. Probably the most useful is the timing view, which breaks down resource loads into connecting, connection established, waiting for data, transfering, etc phases, and even tracks the execution of client javascript. Not exactly a profiler, but very nice for a high level look to identify any areas that might look suspicious!

Along with page speed's documentation, Google has also released a very concise listing of web performance best practices which is probably a handy link to give to any new team member joining a web development shop that might not have a recent web background. Like most google documentation (which is underrated, I think) it reads very well but remains very technical.

The release of a branded linux version of the browser Google Chrome has finally arrived in the form of a dev channel version. Although linux support is still in its early stages, it is very nice to be getting a quality competitor to Firefox with a different javascript engine based on webkit. As of yet plugins do not work, which means no Java (vpn access) and no flash. I've used the linux dev channel version for a little over 3 days now and the only other things I've noticed is that HTTP Auth doesn't seem to work and the web developer tools also seem to be broken. Chrome seems to take almost an order of magnitude less ram than Firefox (although it is not like-for-like w/o the massive java/flash plugins loaded), and also starts up so fast it made me realize I developed a habit of looking around the room as I wait for Firefox to start.

Google also announced that they would be supporting Mercurial repositories in google code. This makes google code and bitbucket attractive destinations for code that I might wish to contribute to the community in a more active way, as creating branches on hosted sites seems to give more visibility.

Finally, Google has released something I've been waiting for for a long time: the Android scripting environment, which allows you to write programs for any android phone in Python, Lua, or a JRE-based scripting language called BeanShell that google forked back in 2007. With this and the release of the non-android Palm Pre this past saturday, suddenly my next phone selection got very interesting! If Android handset makers could get their act together, they might have a very compelling product for the very niche libre geek!

Deploying django on mod_wsgi, virtualenv

posted May 27th, 2009 @ 01:23:40

- tags: python , development

- comments: 0

With my recent release of a new version of this site, I finally made the jump from using mod_python to using mod_wsgi. This change had been a long time coming, but was made slightly less than straight forward by my desire to deploy to virtualenv environments.

read the rest of "Deploying django on mod_wsgi, virtualenv"

New version of jmoiron.net

posted May 25th, 2009 @ 12:39:50

- tags: site news

- comments: 0

As you would probably expect with how problematic and generally incomplete it feels, this blog is based upon a long and wretched history of failed or aborted blog projects. Beginning in February 2002, when I wrote far more often about completely uninteresting things (a trend that's caught on with the web at large), this blog has seen 7 past versions:

read the rest of "New version of jmoiron.net"

QuerySet Caching

posted May 18th, 2009 @ 21:48:47

- tags: python , development

- comments: 0

Spent a few hours today trying to figure out how to do QuerySet caching in django in some kind of transparent manner. We are working on a multi-tiered caching system, sort of like that which was released by the pownce guys a few weeks ago, except with a bit more there there.

read the rest of "QuerySet Caching"

Be careful with metaclasses

posted May 13th, 2009 @ 23:40:14

- tags: python , development

- comments: 0

Python metaclasses can be really intense. SQLObject's declarative metaclass base, popularized by Django's ORM, has been as useful a DSL as the python world has had in a long time. But please be careful when using metaclasses in a way that makes it difficult to extend via subclassing!

read the rest of "Be careful with metaclasses"

Plaintext Style Markup

posted April 23rd, 2009 @ 02:07:39

- tags: python , development

- comments: 0

I've been in search of the "holy grail" of plain text markup for years. Before the first django version of this site, I was writing a custom mod_python handler to run a server-page style site and as part of that writing my own wiki creole. Since then, I've used TracWiki, reStructured Text, Markdown, MediaWiki, and Textile but have been left slightly disappointed by them all.

read the rest of "Plaintext Style Markup"

Not Your Father's Music

posted March 12th, 2009 @ 23:32:51

- tags: general tech , music

- comments: 0

I don't even understand why it's notable anymore that something tragic is or isn't announced on the internet. Isn't it past time now that everyone owns up to the fact that the majority of communication for this generation and at least the next few future ones will happen over the internet?

read the rest of "Not Your Father's Music"

New York City

posted March 3rd, 2009 @ 23:49:08

- tags: life

- comments: 0

Finally moved in to my new apartment in New York City. I can see the Wintergarden Theater from my bedroom, and the ex-Lehman Building behind it. On the border of the theater district and hells kitchen, I can't really ask for a better location; a great mix of life and, from the 29th floor, peacefulness.

read the rest of "New York City"