Saturday, July 11, 2009

Midterm Reflexions: Just About On Track

They say time flies when you're having fun... I would tend to agree! I'm about to fill out the Google midterm survey, and thought I would post some midterm reflexions here. First I guess I should mention recent accomplishments, since I didn't post last weekend.

Recent Events
First up, I ran into a blog post by Mark Stosberg that discussed some issues with cookie handling in Mojo. I contacted Mark via email, and he was kind enough to provide me with further details. Testing revealed the issue was still present. I'm not nearly as familiar with the various cookie specs, so some reading was required, but in the end I determined that the problem wasn't specific to cookies, but rather to HTTP headers in general. Of course, nowadays, one hopes that all Web developers know enough to scrub user-provided data before sticking it into things like cookies, to avoid unpleasantness. However, the (simple) fix I committed puts some armor in Mojo's boots, so to speak, and will thus make it harder for Web developers to shoot themselves in the foot.

Next up was a dark corner of the spec that threw me for a loop... It again has to do with 100-continue. I should note that I think Mojo's already fairly ahead of the game here. Of the browsers, my reading leads me to believe only Opera supports 100-continue as of yet. And, for example, Mojo's client code already implements behavior coming in JDK7. That being said, the spec allows for a problematic situation for servers. Clients are told not to wait forever to be told to continue by the server. Therefore, if a server tells a client not to continue with a request, and receives more data on the connection, it could be a new request, or it could be that the client didn't wait long enough and started sending the body of the previous request before it received the message from the server not to do this. How do you tell which is which? In most cases, it should be obvious, but unfortunately it's easy to think of a case where it's impossible for the server to make the distinction. After some reflection and some discussion on IRC, I decided to implement the "most prudent" behavior - the server should close the connection after sending the response to a declined request for a 100-continue.

But implementing this wasn't trivial, and it took me a while to figure out the most elegant way to implement the desired behavior. Which leads me into some more general observations.

On debugging & patching
One thing that's particular about this project so far is that it's been mostly patches & fixes to the Mojo codebase. I've spent a lot more time doing this than I expected at the start. I don't mind - in fact, I quite enjoy this kind of work. But that means the linecount metric is really low. It's much better to spend a bit of extra time and implement a fix in a way that really agrees with the overall architecture of the software, and often that means you can really reduce the linecount of the fix. And if you didn't write the software in the first place, you need to make extra-sure you really understand what's going on. More time, less code, yet better? I think so, because I feel an elegant, small fix is less likely to introduce new problems than a more complex, bolted-on one. The goal is good (software) architecture.

On The Tools
You need a good base to start with though - it's difficult to renovate a building that is on the verge of falling down because as soon as you start to work on something, everything falls apart. If I may toot the team horn, I have found that Mojo is well-engineered software. It was theoretically possible that I might find a problem or an issue that would require significant refactoring. That has simply not been the case so far.

I'm fairly impressed with Git and Github. I'm still a bit nervous when using rebase, but I can see how it makes things much easier for project maintainers when contributors ask them to integrate their code. However, rebasing makes it harder for me to track my own work using the network graph, as it squashes the revision history and may even change the reference point at which a branch was created. So neither you nor I can easily see how long I worked on a branch or how many commits went into it.

Perl? Well, Perl5 is Perl5 and it's been like meeting with an old friend you haven't seen in a long time - mostly a fun experience. You know there are a few character flaws, but you've learned to live with them and can re-adjust when they pop-up. I must say that I do wish Perl6 was here by now. Rakudo is moving fast so maybe/hopefully by the end of the year. There seems to be some controversy in the community as to Perl5's release schedule and support policy... It's probably not appropriate at this point for me to take sides on that.

On The Community
The people I've been interacting with on a daily basis - both from Mojo and The Perl Foundation, have been nothing but helpful and present. Sebastian Riedel, in particular, has been fantastic, answering many questions, discussing spec, perl and design issues, and constructively criticizing my submitted patches.

On My Progress
Looking at my proposal's schedule, I'd say I'm just about on track. There is one significant discrepancy. Since I've spent a lot more time hacking Mojo than I thought I would, I've actually skipped one of my deliverables - the "blackbox test suite". My mentor has told me he doesn't feel that that's a big deal and Sebastian also said I've fixed many more things in Mojo than he thought I would. So if you look at the beginning of my list of deliverable:
  1. Whitebox tests using Test::More [3] and integrating into Mojo’s current testing framework. Most tests will be concentrated on the Mojo::Message class and focus on HTTP/1.1 [4] content parsing, with emphasis on edge cases.
  2. A blackbox test suite to run against Mojo’s built-in server using an appropriate HTTP/1.1 client library (most likely libcurl [5]). Tests to include some cases of adversarial stance (e.g. deliberately malformed requests).
  3. If necessary, patches to Mojo that enable it to pass the test suites.
I would say that I've delivered #1, skipped #2, because #3 has been much more important than expected. Also, many of the things I fixed were better suited to either whitebox testing or testing using raw telnet and/or a specially designed "fake server" to generate on purpose events that would otherwise be statistically rare occurences on a real network with a real server.

So there you go. Next step: MojoX::UserAgent!

Friday, July 10, 2009

State Transitions in Mojo

Next up will be a "midterm" post, but I just want to post this first, because... well... because it was more fun on a Friday night to do this analysis. (Yes, I could also have gone out. But you know, I did that last night! Yes, really!) As you'll see, fun is obviously in the eye of the beholder, because this is unlikely to interest anyone but a very small handful of Mojo developers.

So a lot of the classes in Mojo are stateful. They inherit from the Stateful class. However, that class is pretty barebones, and "states" are fundamentally just a string property on an object. So there's no single place you can go to see the list of allowable states, and no documentation (let alone verification) of allowed state transitions. The first step in addressing this is to collect some data on the various stateful subclasses and see how they actually behave. So I instrumented the Stateful class and had it print out all state transitions. I then ran the Mojo test suite (including some non-default tests), which generated a little over 2100 state transitions. I fed these results to Graph::Easy (which I discovered thanks to a blog post on MojoX::Routes::AsGraph by Marcus Ramberg) in order to generate state diagrams. We may therefore consider this post as a kind of "documenting-on-the-run"...

Here are the results, in increasing order of complexity.

First up, Mojo::Stateful:

Simple enough. The number on the arrow indicates the number of times that particular state transition occurred during the testing. (That gives an indication of how much "exercise" the current test suite gives each state transition.) Next comes Mojo::Headers:

is still pretty simple:

And Mojo::Message::Request is barely more complicated:

At the same level of complexity, Mojo::Filter::Chunked:

One more step up are Mojo::Content:

and Mojo::Content::Multipart:

Finally, here comes the punchline: the two classes I've been working most heavily on/with (click for full size).


And Mojo::Pipeline:

There you go! Simple as pie, right?