Friday, December 4, 2009

TopCoding

Since Cross Country Coaches Moss and Brandon had us running sprints, I've realized the importance of short-distance running for long-distance training. So I suppose I've always suspected that TopCoder could be worthwhile, even for those of us who don't care about who can implement bubble sort fastest. Still, it lingered at the bottom of my priority queue. I don't strongly enjoy competition so it wouldn't be first-order fun for me, and I never had trouble thinking of more interesting intellectual challenges than speed-writing fizzbuzz.

But, Niniane told me I should use TopCoder, so I updated that priority. I'm not so haughty as to disregard Coach's advice.

Well, it turns out I was wrong about TopCoder. It's not only fizzbuzz: the problems with high point values do require some care with algorithm and datastructure selection. It's also not so much of a race as I'd thought. You get no points for an incorrect solution, and even very late answers (like with 12 hours of padding on a 75 minute problem) still get 30% credit.

I think coding time is just about the least important thing in programming. I'm well aware of the pitfalls of perfectionism and the whole nuanced worse is better concept. But I've very rarely been in a real-world programming situation in which the difference between 15 minutes and 1 hour made any practical difference, aside from labor cost. On the other hand, I've frequently seen that investing a little time in a thoughtful algorithm, test coverage, and an elegant implementation can pay off hugely. Naturally, there are many different assignments to the decision-theoretic parameters here; it's not surprising that TopCoder's scores are so simplistic (and well-suited to their business model).

I'm TopCoding in Java. Of course, I found myself Greenspunning almost immediately. The TopCoder rules explicitly allow code reuse, while penalizing dead code. Still, it feels like cheating to keep a warm Emacs buffer full of map/reduce/foldr/nth/setdefault around. A newbie will have to write and test that code, losing points against the veteran TopCoder for a reason that has nothing to do with skill.

Having attacked the TopCoder measure of programmer performance, I'm ready to praise it. First, the emphasis on speed accords with interview conditions. The high pressure and short duration of an interview is not ideal for sampling anyone's performance, but it's better than any of the alternatives I know. Second, just as in athletic cross-training, I think there's value in practicing against TopCoder's measure. It works against my natural perfectionism and, because it is so different from my daily routine, it helps me to strengthen in areas that may be seldom used and hence relatively inefficient.

TopCoder's FAQs are spartan, and the UI is confusing for me. There lots of sequential number ranges and acronyms, and after some fruitless web browsing/searching, I resorted to trial and error to infer their meanings. There aren't many context menus and the key bindings don't follow my OS standards (⌘-C/⌘-X/⌘-V) or the emacs defaults. Try ctrl-C/ctrl-X/ctrl-V for copy/cut/paste. The code editor is terrible, but for interview prep I'm using it in preference to Emacs; I won't have a kill ring or isearch-forward available at the whiteboard, either.

Nobody is verbally critiquing you as you tap out your TopCoder solutions, so they provide a compiler and a very rudimentary test framework (you can feed your program arbitrary program inputs and see the result along with whatever it wrote to stdout and stderr). If interview prep is your goal, it's probably still a good idea to do some whiteboarding and paper exercises, but I think the TopCoder environment is impoverished enough that you'll have a heightened awareness of your dependence on compiler and test feedback.

TopCoder is organized as a shallow, fixed-depth hierarchy of chat rooms. The leaf nodes are each associated with three problems of varying difficulty and point value. Problem statements give the general form of input and output along with several example i/o pairs and a sentence or two about each. The lowest-valued problem statements give an algorithm in prose. The statements of higher-valued problems give fewer hints. Also, higher-valued problems tend to require more work (compared to a hypothetical lower-valued problem whose hints have been erased).

There are competition rooms and practice rooms; different rules governing your actions apply depending on the context. Specifically, in competitions, there are discrete, synchronized phases for coding, code review, and a final non-interactive test round; these activities are concurrent for practice rooms. The practice rooms are created in anticipation of competitions and disappear later. It's too bad that the number of rooms and hence the number and variety of available practice problems swings so wildly (from hundreds of problems to 3). There is an archive where you can read the problem statements and look at final stats, but solutions (including your own) and the test oracle are unavailable.

Unsurprisingly, TopCoder problems are less overtly mathematical than those of Project Euler. Whereas my main annoyance with Project Euler was tedium associated with thematic repetition (e.g., primality testing), my main annoyance with TopCoder is tedium associated with parsing the regular language of each problem's input. Project Euler emphasizes mathematical insight and your own preferred blend of elegance/efficiency/creativity/..., while TopCoder emphasizes implementation speed. To be clear, "annoyance" is far from a primary adjective I'd use for either site; I recommend them both for pass-times or as interview preparation.

The "summary" button in each chat room shows you the list of everyone's who's been there. For each person, you can see overall experience level, rank w/r/t the room, and status/earned points for each problem. By double-clicking on a score, you can read that person's solution and optionally challenge it by providing an input for which you think it fails; TopCoder will execute the solution against your input and either cancel the other person's points and reward you, or punish you for your false allegation by subtracting points from your score.

My informal survey of submitted solutions shows that most coders leave sloppy mistakes in their final submissions, reflecting the time pressure and lack of incentive for craftsmanship. There are almost no code comments, but the code usually speaks for itself. I usually remove test and debugging code. I don't revise for clarity or efficiency. It is striking to me that apparently I value conceptual focus more highly than most other TopCoders: my solutions tend to explicitly compose well-known algorithms using the modularity features of my chosen language. Being uninterested in competition, I don't plan to destructure my own code, but I wonder if this is a competitive advantage for them? I like to think that under real-world conditions, my approach yields better reusability and code clarity, but I don't really have empirical evidence for that.

If I have inspired you to try TopCoder, beware that opening any problem statement starts its timer, and closing the problem does not pause or stop the timer. I don't exactly regret having sampled problems from various rooms and point levels before starting, but I also don't care about how I rank relative to anyone else.