Monday, June 29, 2009

Week 5: Learning About the Chainsaw

So last Monday, as I was doing some testing for proper handling of POST requests (or, more technically, non-idempotent requests) in a pipeline, Sebastian Riedel told me over IRC that really the highest priority now should be proper handling of unexpected 1xx responses in the client code. So I started to look at that, and it seemed a bit messy. Most HTTP implementations, including Mojo's, use a state machine premised on a simple model: you write out a request, and then you get a response. But with 1xx informational responses, the spec asks a lot of clients: they must be able to handle an arbitrary number of informational responses before the real response comes along. So each request may generate multiple responses, though only one of them is final. Worse, those responses may arrive at any time, even before you're done sending the request.
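
Concretely, the receive loop a client needs looks something like this minimal sketch (not Mojo's actual code; read_one_response and handle_informational are hypothetical helpers):

    # Keep reading responses, discarding informational (1xx) ones,
    # until the final response arrives.
    sub read_final_response {
        my ($socket) = @_;
        while (1) {
            my $res = read_one_response($socket);   # hypothetical helper
            return $res if $res->{code} >= 200;     # final response
            handle_informational($res);             # hypothetical hook
        }
    }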

I started implementing this in small steps. I first looked at the existing code for 100-Continue responses, that is, the special case where the client explicitly asks to receive an informational response before sending the main body of a request. Understanding this would likely allow me to implement the more general case in the least intrusive way possible with respect to the codebase's existing architecture. In the process, I found some small issues with that code, made it more resilient, committed my changes to a new branch and pushed that out to Github. I then moved on to the easier case of receiving an unexpected informational response: the one where it doesn't interrupt the writing out of the request. That wasn't too hard, and again, after committing, I pushed those changes out to Github. Then came a flash of inspiration on how to handle the case where the writing of a request is interrupted: where I expected to have to write a delicate piece of code enacting exceptions to the normal flow of states, I wound up with a single-line change! I was pretty happy with that. To finish, I added a POD update, a typo fix, some test cases, and one last code change once the tests revealed that the code wasn't handling multiple consecutive informational responses properly.
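
For anyone who hasn't run into 100-Continue before, the exchange looks roughly like this on the wire (the path and sizes are invented for illustration):

    POST /upload HTTP/1.1
    Host: example.com
    Content-Length: 10000
    Expect: 100-continue
                               <- client pauses, body not yet sent
    HTTP/1.1 100 Continue
                               <- client now sends the 10000-byte body
    HTTP/1.1 200 OK            <- the real, final response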

So by the time I sent a pull request, I was about six commits out on my branch. Sebastian didn't like that. Multiple commits make it harder for him to evaluate the global diff against his master branch. I used Git to generate the overall diff and nopasted it. Not enough: a merge would still import my commit history and unduly "pollute" the commit history of the master branch. So I tried to merge my changes out to a new branch that would be only one commit away from master. No go, since a merge takes the commit history with it.
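
Generating that overall diff, by the way, is a one-liner (the branch name here is illustrative):

    git diff master...my-feature-branch

The three-dot form diffs the branch against its merge base with master, so it shows only my changes.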

So after some research, the answer was to learn how to squash commits with git-rebase. I must say, doing this felt a little like learning to sculpt with a chainsaw: it feels risky, both because you "lose" your commit history and because you're messing with the revision control software in a way that seems a little dangerous. But Git is apparently fully up to the task. And as long as no one is already dependent on your branch, you can even push your changes out to Github by applying a little --force. Github will appropriately update its revision history network graph. Impressive! Maybe I'll eventually learn to trust the chainsaw...
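
For the record, the recipe boils down to something like this (again, the branch name is illustrative):

    # Replay the feature branch on top of master; in the editor that
    # opens, leave "pick" on the first commit and change the others
    # to "squash" to collapse them all into a single commit.
    git rebase -i master

    # History has now been rewritten, so a plain push is rejected;
    # --force overwrites the remote branch (only safe as long as no
    # one else has based work on it).
    git push --force origin my-feature-branch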

Anyhow, that solved the problem, and Sebastian says from now on he'll insist on squashed commits before pull requests. And since this post is already way too long, I'll just add that my other accomplishment for the week was adding a "safe post" option to change how POST requests are handled in a Pipeline object. I also had to rebase that, and it still felt pretty weird, but Git seems pretty solid...

Monday, June 22, 2009

Week 4: About last week

I was hoping to be able to update my blog this weekend, but alas, life had other plans... Anyway, last week my main achievement was to teach the client side of Mojo about (pipelined) HEAD requests. The changes required were fairly significant, because of how the Mojo architecture is layered: pipelined requests are handled by a Pipeline object; a Pipeline in turn has a bunch of Transactions; each Transaction has a Request and a Response, which are both Messages, which use Content filters. And all of these objects are Stateful. Changes had to trickle all the way down, so I also had to make sure I understood things all the way down. I could see three ways to implement my change, and after choosing one, implementing it and sending a pull request, I was pretty happy that Sebastian agreed that the method I picked made the most sense. Integrating into an open source project is not just about writing code, it is also about understanding the design sense of the main architect(s) - at least if you're interested in getting your changes merged into the mainline project. Perl philosophy may have TIMTOWTDI at its heart, but some ways to do it are more equal than others, and software developers tend to be opinionated...
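
In purely illustrative code (these accessors are hypothetical, not necessarily Mojo's actual API), the containment looks like this:

    for my $tx (@{ $pipeline->transactions }) {   # Pipeline -> Transactions
        my $req = $tx->req;   # Transaction -> Request  (a Message)
        my $res = $tx->res;   # Transaction -> Response (a Message)
        # Messages push their bodies through Content filters, and every
        # object in the chain (Pipeline, Transaction, Message) is
        # Stateful, tracking its own state.
        print $tx->state, "\n";
    }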

Now I'm thinking more and more about MojoX::UserAgent. But one thing I am worried about is that I am diverging somewhat from my initial GSoC plan. Not in the overall direction or the goal to write a UserAgent, but in some of the intermediate deliverables. This is something I will need to discuss with my mentor to see how it should be handled...

Sunday, June 14, 2009

Week 3: Over To The Client-Side

Week 3 has been a bit slow, I must say, as obligations outside of GSoC intruded upon my coding time. The level of concentration required makes it difficult to debug when one doesn't get long uninterrupted stretches of time to do it. This week, I first spent a bit of time chasing a problem that has the daemon tests hanging on Ubuntu - and probably other Debian-derived distros. I traced things down to the way dash does signal handling, but the problem is "exotic enough" - a non-default test on a particular platform (see here for something similar) - that I didn't commit a fix, and simply moved on to client-side testing.

I suspected that the client code in Mojo might exhibit issues similar to those I found on the server side when dealing with pipelined requests. Testing the client side requires a bit more setup than testing the server side. When testing the server, one can simply use telnet as a low-level client and have full control over how requests are sent to the server. Here, I had to set up a simple "fakeserver" to serve responses precisely how I wanted them sent, in order to try to trigger bugs in the client. I soon found a problem, and I also discovered that the test infrastructure had no clean way to build a test case to demonstrate the issue, so I have started building that. And that's where I'm at: a couple of commits down on a branch, with a few more to come before I think the changes should be merged back into mainline Mojo...
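
For flavour, here is a minimal sketch of a fakeserver in that spirit, using plain IO::Socket::INET rather than Mojo's test infrastructure (the port and the canned response are invented for illustration):

    use IO::Socket::INET;

    # Listen locally and hand back one canned, byte-exact response
    # per connection, regardless of what the client sends.
    my $listen = IO::Socket::INET->new(
        LocalAddr => '127.0.0.1',
        LocalPort => 3000,
        Listen    => 5,
        ReuseAddr => 1,
    ) or die "listen: $!";

    while (my $client = $listen->accept) {
        <$client>;    # read (and ignore) the request line
        print $client "HTTP/1.1 200 OK\x0d\x0a"
                    . "Content-Length: 2\x0d\x0a\x0d\x0a"
                    . "ok";
        close $client;
    }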

Friday, June 5, 2009

Week 2: Ghosts, Deeper Bugs, First Feature...

Another week gone by. This week I learned two important, though obvious, lessons. The first lesson is: if a test case fails, do not touch the test case. The second lesson is: if a test case fails, do not touch the test case. At the beginning of the week I thought there were still problems with requests using chunked transfer-encoding with trailing headers. I was able to create a failing test case and proceeded to launch another (lengthy!) bug-hunt expedition. But by the time I found a fix, I'd modified the test case ever so slightly - where I should have created other, new cases instead. When I reported my "fix" to Sebastian Riedel, the first thing he did was to try my test case and promptly tell me that it was passing on his latest master! He prefaced this news by saying "You're not going to like this..." Indeed. Had I spent hours chasing and fixing a non-existent bug? I spent some time thinking I'd gone mad, before realizing that what I'd thought was a trivial change in the test case had actually prevented the bug from being triggered. Once I realized this, Sebastian found a fix that was much more elegant than what I'd proposed. Tough start to the week, but hey, that's how you learn, right? I hope so...

I was able to redeem myself by tracking down and fixing a very difficult transient bug having to do with requests expecting a 100-Continue followed by other pipelined requests. Isolating the problem took two full days, and Sebastian commented over IRC that it doesn't get much more complicated than this, protocol-wise. Anyway, my tally for the week is:
  • always reach trailing_headers state for chunked messages
    (not my fix, but my catch);
  • problem with duplicate start line on 100-continued request with a
    following pipeline;
  • 5-second delay problem when pipelining two requests, the second
    of which includes an Expect: 100-Continue header;
  • allow applications to override the default continue handler.
The last of these is my first feature(tte). It has allowed me to start testing what happens client-side when a server doesn't reply with a 100 Continue to a client that requests one. This has in turn led me to start thinking about how I will implement Mojo::UserAgent... I've also discovered some issues with the daemon tests on my version of Ubuntu, which I've not been able to hunt down just yet. All in all, a more intense week than the first...