jmoiron.net

Welcome to my blog. You may view older entries via list navigation, view posts sorted by topic, or see them arranged by date.

Early impressions on sass

posted March 9th, 2010 @ 02:27:02

- tags: web design

- comments: 0

Spent a little time with "Syntactically Awesome StyleSheets" tonight (sass). I wrote up a preliminary new design for this blog, and used all of the features in the sass tutorial. My impressions as I went through this process:

  • no virtualenv for ruby afaik :(
  • debian's haml is outdated, and sudo gem install haml doesn't seem to overwrite the /usr/bin sass
  • but installing locally was ridiculously easy and gem's output is helpful; virtualenv blows my PATH away, so I had to add a postactivate hook that adds the local ruby bin path
  • after 2 seconds I realized I rely on vim's excellent CSS highlight file to avoid mistakes (is it line-spacing or line-height...); this is the best sass.vim I could find
  • macros and variables are a breath of fresh air
  • nesting/indentation feels right at home
  • nesting stuff like :font (family, size) or :background(position, repeat, image) doesn't really feel right in the context of the rest of the document
  • my mind context-switches to css (or js) mode from python mode and it becomes very hard to turn off the urge to put ;'s at the end of lines
  • what's the deal with : and = for setting styles? if a sigil that has no css significance was chosen for variables, you could just interpolate by default
  • I realized that sigils and interpolation don't really feel right to me, and the =/+ syntax for macros is attractive but fairly arbitrary
  • love being able to do color and spatial arithmetic
  • I like it when I guess syntax (nesting div > h2,h3,h4 > ...) and it works

My experiment involved applying style to markup that I didn't want to touch, so some of sass' macroing became really convenient, since I couldn't rely on creating classes and spreading them about the markup to distribute common styles. Grabbing any colour palette off COLOURlovers and naming its colours with variables made it really easy to swap them around during the design phase; in fact, this probably saved me a lot of time. I wish those variables were available for use in firebug somehow.

As a non-ruby programmer, I can definitely feel the ruby heritage in sass. For one, it does make your CSS a lot cleaner looking. I have a real soft spot for this almost zen-like focus on beauty and self improvement I perceive in some of the ruby community's major projects. When I released django-selector last night, one of the first reactions was "What's the point?", and my first thought was (innocently) "a ruby developer would see the point."

The dark side of its ruby heritage is in its liberal use of sigils and in particular its bizarre sigil design choices. For instance, you can specify attribute families, like background-color, background-position, etc. like this:

h1
  :background
    color: #fff
    position: top left
    repeat: repeat-x

This seems okay at first (though it overloads the meaning of that nesting), but then you discover that when you want to nest CSS pseudo-classes (like a:hover or a:visited), you have to add another sigil:

a
  color: blue
  &:hover
    background-color: #eee

I can't help but wonder: if the 'attribute family' sigil had been & instead of :, wouldn't you be able to use the much more natural :hover syntax without an explicit sigil? The decision to make ! the variable sigil is also an unfortunate choice, although as far as sigils go I think it's the most attractive I've seen. If it had been something else ($ perhaps), it wouldn't overlap with vanilla CSS values like !important, and you wouldn't need to differentiate between : and = or have explicit interpolation (except perhaps to group for arithmetic).

Is johnny-cache for you?

posted March 2nd, 2010 @ 05:24:36

- tags: development, python

- comments: 7

I've been pleasantly surprised with the amount of interest in johnny-cache since Jeremy and I released it this past weekend. A lot of the comments revealed that perhaps the documentation is missing an important discussion on the repercussions of using Johnny. They are also pretty positive about the name :)

"Is johnny-cache for you?" is the most important question that is not answered by the documentation. Using Johnny is really adopting a particular caching strategy. This strategy isn't always a win; it can impact performance negatively:

  • any real database read is first a cache miss, then a cache write
  • any database write is a cache write
  • any write to any table invalidates all cache depending on that table
  • there are extra cache reads on every request to load the current generations

The major positive impact is:

  • any cached read doesn't hit your database
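To make the costs and benefits above concrete, here is a toy, in-memory sketch of generation-based query caching. It is not Johnny's code: the class name `GenerationalQueryCache`, the plain dicts standing in for memcached, and the MD5 key scheme are assumptions for illustration only. Each table gets a generation number, every cached key embeds the generations of the tables its query reads, and a write bumps the table's generation so all dependent keys simply become unreachable.

```python
import hashlib


class GenerationalQueryCache:
    """Hypothetical sketch of generational query caching (not Johnny's API)."""

    def __init__(self):
        self.store = {}        # stand-in for memcached
        self.generations = {}  # table name -> current generation number

    def _generation(self, table):
        # Looking up a table's generation is the "extra cache read" per request.
        return self.generations.setdefault(table, 0)

    def _key(self, sql, params, tables):
        # The key depends on the query AND the generation of every table it
        # reads, so bumping any one of those tables orphans this key.
        gens = ",".join(f"{t}:{self._generation(t)}" for t in sorted(tables))
        raw = f"{sql}|{params!r}|{gens}"
        return hashlib.md5(raw.encode()).hexdigest()

    def read(self, sql, params, tables, run_query):
        key = self._key(sql, params, tables)
        if key in self.store:
            return self.store[key]       # cache hit: the database is untouched
        result = run_query(sql, params)  # cache miss: a real database read...
        self.store[key] = result         # ...followed by a cache write
        return result

    def write(self, table):
        # Every write is also a cache write: bump the table's generation,
        # invalidating everything that depended on it.
        self.generations[table] = self._generation(table) + 1


calls = []

def fake_db(sql, params):
    calls.append(sql)  # record each real database hit
    return [("row",)]

cache = GenerationalQueryCache()
q = "SELECT * FROM blog_post WHERE id = %s"
cache.read(q, (1,), ["blog_post"], fake_db)  # miss: goes to the database
cache.read(q, (1,), ["blog_post"], fake_db)  # hit: served from the cache
cache.write("blog_post")                     # invalidates dependent keys
cache.read(q, (1,), ["blog_post"], fake_db)  # miss again
print(len(calls))  # 2
```

Because a query that joins several tables lists all of them in `tables`, a heavy write load on any one joined table keeps invalidating it, which is exactly the weakness discussed further down.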

This turns out to be an exceptional positive for a pretty large class of applications. Loading from memcached is going to smoke even your database's query cache with respect to latency, while giving you cheap and easy horizontal scalability. It's not often you get these two hand in hand.

Every time a query hits the cache, your database doesn't have to accept a connection, allocate cursors, examine your query, execute it, and return the result. That is a fair amount of work lifted off your database servers.

If you were using something akin to MySQL's query cache before, you can pretty much turn it off. Not only do you get that memory back for loading indexes, performing queries, etc., but you can now scale your query cache horizontally with ease.

Pre-Django 1.2, splitting db reads and db writes at the application level was a real pain. Scaling reads across a pool of RODB databases is no picnic, either. For a read-heavy application, Johnny can alleviate so much read traffic that you can potentially just scale reads in memcached. Even if you need to horizontally scale reads across an RODB pool, the servers now share a queryset cache, so reads on one slave save reads on another.

Still, writes eventually happen, and when they do, Johnny will blow away the cache depending on the table written to. The implications of this are that Johnny's effectiveness is reduced if you:

  • have "logical" write operations that hit many tables
  • write heavily to one table that is then featured in many joins
  • have very few tables

An underappreciated caveat is that the relative frequency of your writes and reads matters quite a bit. Consider a simple one-page, one-query, one-table scenario where you receive about 1 write per second. That might seem too often for Johnny to be useful, but if you serve 30 pages per second, only the first read after each write misses, so you are hitting cache over 96% of the time (29 reads out of 30).
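The arithmetic behind that 30-reads, 1-write scenario can be written as a tiny model. This is a back-of-the-envelope sketch under stated assumptions, not a measurement: it assumes writes are evenly spaced, each write invalidates the one cached query, and exactly one read (the first after a write) misses and repopulates the cache. The function name is made up for illustration.

```python
def steady_state_hit_rate(reads_per_sec: float, writes_per_sec: float) -> float:
    """Fraction of reads served from cache in a one-page, one-query, one-table model.

    Assumes each write invalidates the single cached query and the first read
    after each write is a miss; you cannot have more misses than reads.
    """
    if reads_per_sec <= 0:
        raise ValueError("reads_per_sec must be positive")
    misses = min(writes_per_sec, reads_per_sec)  # at most one miss per write
    return (reads_per_sec - misses) / reads_per_sec


print(f"{steady_state_hit_rate(30, 1):.1%}")  # 96.7%
```

Doubling traffic to 60 pages per second at the same write rate pushes the modelled hit rate toward 98%, which is why read-to-write ratio matters more than the raw write rate.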

Typical webapps are going to read far more often than they write, and serve a few pages far more often than the other pages on the site. For these apps, Johnny will probably work quite well. Even in cases where it doesn't fly, it's probably a good starting point.

But thanks to the magic of the internet, I don't have to rely solely on hypothetical and anecdotal evidence. Someone running such an application tried Johnny out and wrote a nice little blog post about his results. His chart even suggests that his application is quite write-heavy. It also looks pretty similar to what we saw when we pushed the primordial version of Johnny live last year. The post itself is pretty fascinating; the Reader's Digest version is that he already had some caching in place, but installed Johnny, set it up, and his query count still dropped pretty dramatically (illustrated). Note that it wasn't just cache hits that dropped; Johnny can cache some queries that MySQL can't, and there are other classes of queries that are impossible to cache but are easily avoided. Despite that initial positive result, he noticed that his CPU utilization and context switching increased, likely because memcached and MySQL (and perhaps even his app server) were running on the same box.

So, where to take Johnny from here? Johnny is version '0.1' not because we think it's barely ready for use, but because we felt like we released the smallest piece of software that could actually be of use.

The first improvement would be a way to allow application authors to keep Johnny from caching result sets from tables that receive very heavy write traffic, like a log table. Although monkeypatching was really the only way to achieve the level of integration and simplicity we needed, you always have to acknowledge that there will be cases where people only want to use your code some of the time, or maybe most of the time, but not all of the time. Some kind of model annotation or table blacklist might suffice here, but I want to think through this and its invalidation implications a bit more before deciding on how to do it.
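One possible shape for the table blacklist idea is a simple bypass check in front of the cache. Everything here is hypothetical: `BLACKLISTED_TABLES`, `is_cacheable`, and `cached_read` are illustrative names, not part of Johnny, and generation-based invalidation is left out so the sketch focuses only on the decision of whether to cache.

```python
from typing import Callable, Dict, Iterable, Tuple

# Hypothetical setting: tables whose result sets should never be cached,
# e.g. a heavily written log table.
BLACKLISTED_TABLES = frozenset({"app_log"})


def is_cacheable(tables: Iterable[str], blacklist: frozenset = BLACKLISTED_TABLES) -> bool:
    """A query may be cached only if it reads no blacklisted table."""
    return blacklist.isdisjoint(tables)


def cached_read(store: Dict[Tuple, object], sql: str, tables: Iterable[str],
                run_query: Callable[[str], object]) -> object:
    """Read through `store` unless the query touches a blacklisted table.

    Generation-based invalidation is omitted to keep the focus on the
    bypass decision.
    """
    tables = tuple(sorted(tables))
    if not is_cacheable(tables):
        return run_query(sql)        # bypass: always go to the database
    key = (sql, tables)
    if key not in store:
        store[key] = run_query(sql)  # miss: populate the cache
    return store[key]
```

One invalidation consequence worth thinking through: since nothing cached could depend on a blacklisted table, writes to it would not need to bump any generation, and a join that includes the log table would simply never be cached.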

Another improvement I want is increased access to the generational keys Johnny maintains. I recognize cases where you might want to use Johnny's invalidation to consistently cache higher level objects like html fragments or even entire pages. Consider something like a @invalidate_on_model(Post) decorator for an RSS feed of latest blog posts that would only have to be generated upon the first read, and invalidates automatically when the Post's table is altered (or after some optional timeout). I'm still trying to work out how to increase this idea's usefulness when you introduce pagination.
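A rough sketch of what an `@invalidate_on_model`-style decorator could look like, assuming a shared per-table generation counter like the one Johnny maintains. This is not an existing Johnny API: the module-level dicts stand in for memcached, `bump_generation` stands in for whatever Johnny does on writes, and the decorator takes a table name string rather than a model class to stay self-contained.

```python
import functools
import time

# Hypothetical stand-ins for Johnny's generation store and memcached.
_generations = {}
_cache = {}


def bump_generation(table):
    """Call on every write to `table` (Johnny does the equivalent internally)."""
    _generations[table] = _generations.get(table, 0) + 1


def invalidate_on_model(table, timeout=None):
    """Cache a function's result until `table` changes or `timeout` seconds pass."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args):
            # The current generation is part of the key, so a write to the
            # table makes old entries unreachable (memcached would evict them).
            key = (func.__qualname__, args, _generations.get(table, 0))
            hit = _cache.get(key)
            if hit is not None:
                value, stored_at = hit
                if timeout is None or time.monotonic() - stored_at < timeout:
                    return value
            value = func(*args)
            _cache[key] = (value, time.monotonic())
            return value
        return wrapper
    return decorator


@invalidate_on_model("blog_post")
def latest_posts_feed(page):
    # Placeholder for an expensive render of the RSS feed.
    return f"<rss page={page}>"
```

Positional arguments such as a page number become part of the key, so each page is cached separately and every page is invalidated by any write to the table. That sidesteps rather than solves the pagination question: a smarter scheme would avoid throwing away old pages that a new post did not actually change.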

Towards answering the question that is the title of and reason for this post, I'd like to either build in or provide separately something that utilizes Johnny's hit/miss signals to give per-page and per-table statistics about cache hits and misses.
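The statistics side could be as simple as two nested counters fed by the hit/miss notifications. The sketch below assumes some hook calls `record(page, tables, hit)` for every cached query; how that would be wired to Johnny's actual hit/miss signals is an assumption not shown here, and `CacheStats` is a hypothetical name.

```python
from collections import Counter, defaultdict


class CacheStats:
    """Aggregate cache hit/miss counts per page and per table (hypothetical hook)."""

    def __init__(self):
        self.by_page = defaultdict(Counter)   # page path -> Counter(hit=..., miss=...)
        self.by_table = defaultdict(Counter)  # table name -> Counter(hit=..., miss=...)

    def record(self, page, tables, hit):
        outcome = "hit" if hit else "miss"
        self.by_page[page][outcome] += 1
        for table in tables:
            self.by_table[table][outcome] += 1

    @staticmethod
    def hit_rate(counter):
        total = counter["hit"] + counter["miss"]
        return counter["hit"] / total if total else 0.0


stats = CacheStats()
stats.record("/", ["blog_post"], hit=True)
stats.record("/", ["blog_post", "auth_user"], hit=False)
```

Per-table numbers would point at tables that are written too often to cache well (candidates for a blacklist), while per-page numbers show where Johnny is actually earning its keep.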

Every application has its own set of circumstances and requirements, and probably its own optimal caching strategy, but if you're a perfectionist with a deadline, Johnny might just get you a whole lot of bang for fairly little buck.

Johnny Cache

posted February 28th, 2010 @ 12:11:56

- tags: python, development, web design

- comments: 20

I've been waiting a long time to write about this. Johnny Cache is now released upon the world. It's a drop-in caching library/framework for Django that will cache all of your querysets forever in a consistent and safe manner. You can install it via pip install johnny-cache.

Conceptually, Johnny Cache started when I wrote the 'Queryset Caching' post last May. That was written after the ideas for how to implement such a cache had coalesced into a plan, but before an implementation had been created. A proof of concept was developed that summer and put into production on a fairly large site. The code that went into that version was probably not releasable, but a lot of work had gone into it (and its testing suite), and I couldn't bear to start over on a clean implementation.

This January rolled around, and a few events converged that convinced me I needed to increase my open source footprint. Johnny had proven central to the scaling capability of the application where it was in use, and I felt that rewriting it for release would be a real benefit to the community. I created a repository for an MIT-licensed project and threw up the easy part: a thread-local caching mechanism similar to the locmem backend, but cleared after every request.
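The post doesn't show that first piece, so the following is only a guess at its general shape: a per-thread dictionary that is wiped when a request finishes. `RequestLocalCache` and `handle_request` are hypothetical names; in a Django project the clearing step would more naturally hang off middleware or the `request_finished` signal rather than a wrapper function.

```python
import threading
from typing import Any, Callable


class RequestLocalCache:
    """A per-thread dict cache that is emptied at the end of each request."""

    def __init__(self) -> None:
        self._local = threading.local()

    def _store(self) -> dict:
        # Each thread lazily gets its own dict, so requests served by
        # different threads never see each other's entries.
        store = getattr(self._local, "store", None)
        if store is None:
            store = self._local.store = {}
        return store

    def get(self, key: Any, default: Any = None) -> Any:
        return self._store().get(key, default)

    def set(self, key: Any, value: Any) -> None:
        self._store()[key] = value

    def clear(self) -> None:
        self._store().clear()


def handle_request(cache: RequestLocalCache, view: Callable[[], Any]) -> Any:
    """Hypothetical request wrapper: run the view, then always clear the cache."""
    try:
        return view()
    finally:
        cache.clear()
```

Clearing in a `finally` block means the cache is emptied even when the view raises, so stale entries can never leak into the next request handled by the same thread.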

After I gave Jeremy the URL to the hg repository, he banged out a nice framework for how the queryset caching mechanism would work in 1.2 (the patch point had changed), and declared it "pretty much done." It was fantastic; ~100 lines or so of clean, concise python code. And of course, it didn't work at all. Nearly 80 commits, 2000 lines of tests, fixtures, documentation, edge case handling, 1.1 support, and actual implementation later (his version handled generation keys but completely left out queryset keys based on the generations), we had something we were confident would finally work as advertised. Johnny's documentation explains what the project is, and what it does, but I want to reflect a bit more on the process of its development.

This is the first project using a distributed VCS that I've worked on with more than one developer. Both of us are really used to centralized repositories, and I don't think we quite embraced the "every change is a branch" philosophy. Beyond that, working in a distributed fashion wasn't really that helpful, because most of our machines don't have unfettered access to the internet, so it can be difficult to share revisions between us without going through the central repository. This is perhaps one of the true draws of sites like bitbucket and github. I've found that DVCS is great when developers are working in isolation (like forking beaker to write a new auth backend), but I prefer a centralized development repository when working on the same issues. Mercurial does either quite well.

This is also the first time I've used the Sphinx documentation system that python.org et al. use. I've used tools like JavaDoc and Doxygen in the past, and frankly the documentation they produce is almost always worthless, even for a library: it doesn't highlight the important pieces properly, it doesn't provide room for exposition, etc. I've shied away from Sphinx in the past for a few reasons: the startup cost seemed a bit steep (it really isn't), and there was a conceptual gap between writing the documentation and getting the results you desire. Finally, I didn't really want to write lots of extra documentation that would live outside my code.

My feelings on writing "extra" documentation have substantially changed, however, and Sphinx offers a best-of-both-worlds hybrid, with commands that will automatically pull documentation strings or automatically document modules, functions, classes, etc, but more or less leave the entire form and function of the documentation up to the actual ReST documents themselves. When I need to explain something, I explain it. When I want to include docstrings, or highlight a specific function as a method of doing some higher level action within the context of my app/library, I can do that with ease. It was a delight.

This was also the first project for which I've used a significant amount of TDD. The implementation details for Django 1.1 and Django 1.2 were radically different (1.2 had to support multiple databases, itself a major change), but in the end we wanted the software to operate in more or less the same transparent manner. Having a thorough testing suite caught tons of subtle bugs in behavior (including lots of regressions) that we might never have found otherwise. There were a few database-specific behavioral bugs that would have passed under one database but not another, and they could very well have gone unnoticed were it not so easy to set up different environments and test.

I still feel like TDD works much better when it's easy to define correctness, but perhaps any software development is much better when that's the case. The test suite for johnny-cache is extensive, and at least a third of the tests were written before the code that passes them. A culture of healthy fear grew up around the tests: where there were acknowledged holes in the testing suite, the code in those gaps was presumed to be buggy, and this defensive posture helped us find a few bugs.

Finally, the difference between "hacking" and "shipping" is pretty apparent when you do the legwork to get proper documentation written, set up the right distribution channels, and want that code to be used by people and, in some way, to represent your ideas and abilities. There are still lots of options for future development on Johnny; we've only got the basic operation of the cache up, but there are tons of things you might want to do once you start to understand how your app is utilizing Johnny.

Profiling Generalizations

posted February 25th, 2010 @ 22:00:05

- tags: development, python

- comments: 0

My friend and colleague Jeremy Self, whom I've been working with for the past few weeks on a project that will probably be released shortly, told me the interesting results of some profiling he was doing on a lazy-evaluated data structure:

read the rest of "Profiling Generalizations"

Subclassing Django's TestCase

posted February 15th, 2010 @ 13:51:28

- tags: python, development

- comments: 0

As mentioned in yet-unresolved #7835, I've been writing a sort of meta-application recently that doesn't provide any models of its own but absolutely requires models (and data) of various types to test against. I ended up adopting julien's technique, which puts the test-only application inside the tests module, and then overrides setUp and tearDown to monkey patch your settings to include the test-only application and then run a syncdb command. This approach started out working very well, but I soon ran into a fairly major problem: fixtures failed to load.

read the rest of "Subclassing Django's TestCase"

Cheesy color console output

posted January 20th, 2010 @ 00:09:34

- tags: python

- comments: 0

Since Audacious crashes in Ubuntu 9.10 when playing NSF files, I was compelled to compile a newer version for personal use, and I noticed that their build system is tricked out with very little extraneous geek-cred output and some colored output. I have been writing lots of little one-off bulk job scripts lately, and when I realized the answer to the question "Why don't I do this?" was "I don't know how", I decided to do some google jump roping and figure it out.
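For reference, the usual trick behind colored terminal output is ANSI SGR escape sequences. The sketch below is a generic illustration of that technique, not what the Audacious build system or the full post necessarily does; `colorize` and `status` are made-up helper names, and only a handful of standard color codes are included.

```python
import sys

# ANSI SGR foreground color codes understood by most Linux terminals.
COLORS = {"red": 31, "green": 32, "yellow": 33, "blue": 34}
RESET = "\033[0m"


def colorize(text, color, bold=False):
    """Wrap `text` in an ANSI color escape sequence, optionally bold."""
    prefix = f"\033[{'1;' if bold else ''}{COLORS[color]}m"
    return f"{prefix}{text}{RESET}"


def status(msg, ok=True, stream=sys.stdout):
    """Print a build-system-style status line, colored only on a real terminal."""
    label = " OK " if ok else "FAIL"
    if stream.isatty():
        label = colorize(label, "green" if ok else "red", bold=True)
    stream.write(f"[{label}] {msg}\n")


status("resized 42 images")
status("could not open foo.png", ok=False)
```

Checking `isatty()` keeps escape codes out of log files and pipes, where they would otherwise show up as garbage characters.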

read the rest of "Cheesy color console output"

On-Suspend scripts in Ubuntu 9.10

posted January 7th, 2010 @ 19:25:47

- tags: general tech

- comments: 0

My thinkpad works pretty well in Ubuntu 9.10, but one thing that is not fixed is that my wireless ceases to function when I resume. If you remove and reinsert the kernel module for it, it works again. Back in the day, you'd put this in /etc/apm/resume.d/, or more recently in /etc/acpi/resume.d/, but as with other seemingly fine technologies, Ubuntu has recently deprecated ACPI. The new location for the script is /etc/pm/sleep.d/.

read the rest of "On-Suspend scripts in Ubuntu 9.10"

Third World

posted December 15th, 2009 @ 09:14:53

- tags: life

- comments: 0

Every time I go to Asia, it gets harder and harder to come back. Landing at Newark "Liberty" International Airport via Hong Kong's airport feels like travelling to a third world backwater.

read the rest of "Third World"

iPhone wifi timeouts

posted November 21st, 2009 @ 14:33:05

- tags: general tech

- comments: 0

I finally got sick and tired of my iPhone disconnecting from wifi when locking, not actively in use, or feeling unloved, and did some research this weekend. There seem to be two issues that are not easily overcome:

read the rest of "iPhone wifi timeouts"

The joys of racing

posted November 3rd, 2009 @ 02:02:05

- tags: life, games

- comments: 0

I'm not quite sure how to initiate this discussion. I know that it was initiated for me by the recent release of the Forza 3 racing sim, but it's something that has been on my mind for years. As someone who used to prefer the cheap thrills and pure raw adrenaline of arcade racers, I know what it's like to bemoan a game filled with cars that "won't turn."

read the rest of "The joys of racing"