Monday, June 29, 2009

Week 5: Learning About the Chainsaw

So last Monday, as I was testing proper handling of POST requests (or, more technically, non-idempotent requests) in a pipeline, Sebastian Riedel told me over IRC that the highest priority now should really be proper handling of unexpected 1xx responses in the client code. So I started to look at that, and it seemed a bit messy. Most HTTP implementations, including Mojo's, use a state machine premised on the model that you write out a request and then you get a response. But with 1xx informational responses, the spec asks a lot of clients: they should be able to handle an arbitrary number of informational responses before the real response comes along. So each request may now generate multiple responses, though only one of them is final. Worse, those responses may arrive at any time, even before you're done sending the request.
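To make the problem concrete, here's a hypothetical wire exchange (not actual Mojo output) in which the server emits interim responses before the real one; a compliant client has to consume every 1xx response and keep waiting:

    C: GET /slow-resource HTTP/1.1
    C: Host: example.com
    C:
    S: HTTP/1.1 102 Processing        (informational, not final)
    S:
    S: HTTP/1.1 102 Processing        (may repeat any number of times)
    S:
    S: HTTP/1.1 200 OK                (the one final response)
    S: Content-Length: 5
    S:
    S: hello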

I started implementing this in small steps. First I looked at the existing code for 100-Continue responses, that is, the special case where the client explicitly asks to receive an informational response before sending the main body of a request. Understanding this would likely let me implement the more general case in the least intrusive way possible with respect to the codebase's existing architecture. In the process I found some small issues with that code, made it more resilient, committed my changes to a new branch, and pushed that out to GitHub. I then moved on to the easier case of receiving an unexpected informational response: the one where it doesn't interrupt the writing out of the request. That wasn't too hard, and again I committed and pushed those changes to GitHub. Then came a flash of inspiration on how to handle the case where the writing of a request is interrupted: where I expected to have to write a delicate piece of code enacting exceptions to the normal flow of states, I wound up with a single-line change! I was pretty happy with that. Add to that a POD update, a typo fix, some test cases, and one last code change when the tests revealed that the code wasn't handling multiple consecutive informational responses properly.
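For reference, the 100-Continue handshake looks roughly like this (again a hypothetical trace, not taken from Mojo): the client announces a body with an Expect header, then waits for the server's go-ahead before sending it:

    C: POST /upload HTTP/1.1
    C: Host: example.com
    C: Content-Length: 524288
    C: Expect: 100-continue
    C:
    S: HTTP/1.1 100 Continue          (the interim response the client asked for)
    S:
    C: <524288 bytes of request body>
    S: HTTP/1.1 201 Created           (the final response)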

So by the time I sent a pull request, I was about six commits out on my branch. Sebastian didn't like that: multiple commits make it harder for him to evaluate the global diff against his master branch. I used Git to generate the overall diff and nopasted it. Not enough: a merge would still import my commit history and unduly "pollute" the history of the master branch. So I tried merging my changes out to a new branch that would be only one commit away from master. No go, since a merge takes the commit history with it.
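Generating that overall diff, by the way, is a one-liner. Assuming the feature branch were named 1xx-responses (the actual branch name isn't in this post), it's something like:

    git diff master...1xx-responses

The three dots tell Git to diff the branch tip against its common ancestor with master, which is exactly what you want when master has moved on in the meantime.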

So after some research, the answer was to learn how to squash commits with git-rebase. I must say, doing this felt a little like learning to sculpt with a chainsaw: it seems risky, both because you "lose" your commit history and because you're messing with the revision control software in a way that feels dangerous. But Git is apparently fully up to the task. And as long as no one else is already depending on your branch, you can even push your changes out to GitHub by applying a little --force. GitHub will appropriately update its network graph of the revision history. Impressive! Maybe I'll eventually learn to trust the chainsaw...
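For anyone else reaching for the chainsaw, the squash went roughly like this (the branch name, again, is made up):

    git checkout 1xx-responses
    git rebase -i master
    # in the editor that pops up, leave "pick" on the first commit and
    # change the rest to "squash": six commits collapse into one

    git push --force origin 1xx-responses
    # the rewritten branch no longer fast-forwards from what's already on
    # GitHub, so a plain push is rejected; hence the --force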

Anyhow, that solved the problem, and Sebastian says that from now on he'll insist on squashed commits before pull requests. And since this post is already way too long, I'll just add that my other accomplishment for the week was adding a "safe post" option that changes how POST requests are handled in a Pipeline object. I had to rebase that one too, and it still felt weird, but Git seems solid...
