## Sunday, August 2, 2009

### vogon fodder

In this post, I show-and-tell about automated paper-pushing.

I spent considerable effort at work last week building something that is minimally useful. I am preoccupied by the conflict between the demands of professionalism and the idiotic/cynical mockery of professionalism that is usual (though not universal) in large organizations. But I've got to eat and I should focus on problems I can realistically solve. For now, let's accept the demands of management without question and focus on how to satisfy the demands at minimum cost to human life. In the words of Steve Jobs:

> Well, let's say you can shave 10 seconds off of the boot time. Multiply that by five million users and that's 50 million seconds, every single day. Over a year, that's probably dozens of lifetimes. So if you make it boot ten seconds faster, you've saved a dozen lives. That's really worth it, don't you think?

Your manager emails you an MSWord document and asks you to fill it out once per software deployment. It should list every affected "object"; indicate when, if ever, "User" has been notified; indicate when and by whom, if ever, "BAM" has approved; and reference the other paperwork associated with the project. What do you do?

Clearly, the right answer involves CI and LaTeX. In my case: svn, ccnet, Bugzilla, python, xslt, ruby, LaTeX, perl, and pdf.

We've been using CI with SVN to do deployments since 2004. There's a branch for each deployment target, and a CI project associated with each of those branches. The CI builds usually correspond to a single commit, but we don't insist on that; when commits come fast enough we go for speed over precision in who's "on the hook" for breaking changes. So the first problem is to find the union of affected "objects" over the set of svn commits in the deployment.

The term "object" is taken from the argot of the RPGers, who are the oldest and largest group of software people at our company. For them, "object" often refers to a printed tabular summary of data but more generally can refer to a program for any purpose. The usual custom outside the RPG tribe is to list UI elements as "objects", though sometimes OO classes or source files are listed instead. It's not worth being too careful about this, because the real goal is pro forma compliance with a vague demand for documentation. Users basically want release notes, and engineers already get precise, accurate, and efficient answers to technical questions from svn and Bugzilla. The form really just needs to exist and, at a glance, to appear normal.

Our CI system, CruiseControl.NET, has a feature called the modification writer that renders metadata about a build to XML. This provides a list of added, deleted, and textually-modified files for each commit that's new since the last build:

    <?xml version="1.0" encoding="utf-8"?>
    <ArrayOfModification xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xsd="http://www.w3.org/2001/XMLSchema">
      <Modification>
        <Type>Modified</Type>
        <FileName>ccnet-self.config</FileName>
        <FolderName>/ccnet</FolderName>
        <ModifiedTime>2009-07-24T10:23:08.162158-05:00</ModifiedTime>
        <UserName>sean.foy</UserName>
        <ChangeNumber>4413</ChangeNumber>
        <Version />
        <Comment>whitespace[Bug 50,51,3783]</Comment>
        <IssueUrl>https://bugzilla.thcg.net/buglist.cgi?bug_id=50,51,3783</IssueUrl>
      </Modification>
    </ArrayOfModification>
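Finding the union of affected "objects" over a set of commits is a small job once the modifications are in this shape. Here is a minimal Python sketch, assuming that a file path (FolderName plus FileName) is a good enough stand-in for an "object". The second file and the extra change numbers in the sample are invented for illustration.

```python
import xml.etree.ElementTree as ET

# Sample input in the shape of the modification writer's output.
# Only the first entry mirrors the listing above; the rest are made up.
SAMPLE = """<ArrayOfModification>
  <Modification>
    <Type>Modified</Type>
    <FileName>ccnet-self.config</FileName>
    <FolderName>/ccnet</FolderName>
    <ChangeNumber>4413</ChangeNumber>
  </Modification>
  <Modification>
    <Type>Added</Type>
    <FileName>Rakefile</FileName>
    <FolderName>/worksheet</FolderName>
    <ChangeNumber>4414</ChangeNumber>
  </Modification>
  <Modification>
    <Type>Modified</Type>
    <FileName>ccnet-self.config</FileName>
    <FolderName>/ccnet</FolderName>
    <ChangeNumber>4415</ChangeNumber>
  </Modification>
</ArrayOfModification>"""


def affected_objects(xml_text):
    """Return the sorted union of file paths touched by any commit."""
    root = ET.fromstring(xml_text)
    paths = set()
    for mod in root.iter("Modification"):
        folder = mod.findtext("FolderName", default="").rstrip("/")
        name = mod.findtext("FileName", default="")
        paths.add(f"{folder}/{name}")
    return sorted(paths)


def commits(xml_text):
    """Return the sorted set of svn revision numbers in the build."""
    root = ET.fromstring(xml_text)
    return sorted({int(m.findtext("ChangeNumber"))
                   for m in root.iter("Modification")})


if __name__ == "__main__":
    print(commits(SAMPLE))
    print(affected_objects(SAMPLE))
```

A file touched by several commits appears once in the result, which is all the worksheet asks for.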

We integrated SVN and Bugzilla in 2004, and so since that time we've been able to correlate commits with bugs. This gives us informal commentary about the motivations and the design considerations for each commit, plus metadata including who was involved in the process, how much time each of us spent, whether we paired or got code review, whether DBAs got involved, who approved deployment, etc. Well, at least in principle. Actually, only one or two projects have ever really taken advantage of these capabilities. But the point is that this metadata fits the purported goal of our paperwork, and it's useful and easily accessible, and it exists for at least one project. It's the best we can do, and it's pretty good. So we've got to use it.

Python is a fairly pleasant language to work in and nobody's terrified by its syntax. It also has great libraries for screen-scraping. Accordingly, we used a python program to augment ccnet's modifications xml. We read the ccnet xml and use the bugtraq standard to write back the set of relevant bugs:

    <ArrayOfModification>
      <Modification>
        <Type>Modified</Type>
        <FileName>ccnet-self.config</FileName>
        <FolderName>/ccnet</FolderName>
        <ModifiedTime>2009-07-24T10:23:08.162158-05:00</ModifiedTime>
        <UserName>sean.foy</UserName>
        <ChangeNumber>4413</ChangeNumber>
        <Version />
        <Comment>whitespace[Bug 50,51,3783]</Comment>
        <IssueUrl>https://bugzilla.thcg.net/buglist.cgi?bug_id=50,51,3783</IssueUrl>
        <database />
        <flags>
          <flag id="1882" name="QCApproved" setter="michael.jorgensen@covidien.com" status="+" />
        </flags>
      </Modification>
    </ArrayOfModification>
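The augmentation step can be sketched roughly as follows. This is not the actual bzaugment.py. It assumes the commit message pattern seen in the sample comment ("[Bug 50,51,3783]"), and it takes the Bugzilla lookup as a caller-supplied function (`lookup_flags`, a hypothetical name), so the screen-scraping itself is left out.

```python
import re
import xml.etree.ElementTree as ET

# Assumed message pattern, modeled on the sample "whitespace[Bug 50,51,3783]".
BUG_RE = re.compile(r"\[Bug\s+([\d,\s]+)\]", re.IGNORECASE)


def bug_ids(comment):
    """Extract bug numbers from a commit comment, in order, without repeats."""
    ids = []
    for group in BUG_RE.findall(comment or ""):
        for token in group.split(","):
            token = token.strip()
            if token and int(token) not in ids:
                ids.append(int(token))
    return ids


def augment(xml_text, lookup_flags):
    """Add a <flags> element to every <Modification>.

    lookup_flags is a hypothetical callable: it receives a list of bug ids
    and returns a list of attribute dicts, one per Bugzilla flag.
    """
    root = ET.fromstring(xml_text)
    for mod in root.iter("Modification"):
        flags = ET.SubElement(mod, "flags")
        for attrs in lookup_flags(bug_ids(mod.findtext("Comment"))):
            ET.SubElement(flags, "flag", attrs)
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    sample = ("<ArrayOfModification><Modification>"
              "<Comment>whitespace[Bug 50,51,3783]</Comment>"
              "</Modification></ArrayOfModification>")

    def fake_lookup(ids):
        # Stand-in for the real Bugzilla query.
        return [{"id": "1882", "name": "QCApproved", "status": "+"}] if ids else []

    print(augment(sample, fake_lookup))
```

Keeping the lookup behind a function makes the XML handling testable without a Bugzilla server.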

Now we have all the answers we need to fill out our worksheet in XML. We just need to produce MSWord output (good luck). Well, MSWord probably isn't really necessary. That's what's expected, but also we should keep in mind that the goal is pro forma compliance. This document's real use is to ward off auditors, so it just needs to exist and not look unusual. So, we want to typeset XML? Obviously we should consider DocBook, XSL-FO, and LaTeX. I looked into XSL-FO c. 2001 and DocBook c. 2002, but I know LaTeX best. I wanted to spend less than 4 hours on this little project, so I was satisfied to use LaTeX.

For processing XML, you could do worse than XSLT. It can be particularly nice as a declarative template specification: "make my input look like this". So, I used xslt to process the bz-augmented ccnet modifications xml into LaTeX:
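The same "make my input look like this" mapping can also be sketched outside XSLT. The following is a minimal Python stand-in, not the stylesheet used here. It assumes one table row per modification with four columns (the column choice is an assumption), and it escapes LaTeX's special characters so that file names and commit comments survive typesetting.

```python
import xml.etree.ElementTree as ET

# Characters LaTeX treats specially in text mode, with common escapes.
LATEX_ESCAPES = {
    "\\": r"\textbackslash{}",
    "&": r"\&", "%": r"\%", "$": r"\$", "#": r"\#", "_": r"\_",
    "{": r"\{", "}": r"\}",
    "~": r"\textasciitilde{}", "^": r"\textasciicircum{}",
}

SAMPLE = """<ArrayOfModification><Modification>
  <FileName>ccnet-self.config</FileName>
  <UserName>sean.foy</UserName>
  <ChangeNumber>4413</ChangeNumber>
  <Comment>whitespace[Bug 50,51,3783]</Comment>
</Modification></ArrayOfModification>"""


def latex_escape(text):
    """Escape one character at a time so nothing is escaped twice."""
    return "".join(LATEX_ESCAPES.get(ch, ch) for ch in text)


def render_rows(xml_text):
    """Render each <Modification> as one row of a LaTeX tabular."""
    root = ET.fromstring(xml_text)
    rows = []
    for mod in root.iter("Modification"):
        cells = [mod.findtext(tag, default="")
                 for tag in ("ChangeNumber", "UserName", "FileName", "Comment")]
        rows.append(" & ".join(latex_escape(c) for c in cells) + r" \\")
    return "\n".join(rows)


if __name__ == "__main__":
    print(render_rows(SAMPLE))
```

The rows would still need a surrounding tabular environment and preamble, which is where a template language like XSLT earns its keep.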


LaTeX is based on Don Knuth's TeX typesetting system. It's serious business: Turing-complete and designed specifically for the needs of the world's most demanding typographical experts.

For a while I thought about merging the bug and svn data here in LaTeX. My TeX-hacking skills are very weak now, so I had in mind the python package for LaTeX. As I escaped for XSL the escape for LaTeX of the escape for Python of the LaTeX output I wanted, I thought I was probably making a mistake. LaTeX is great for typesetting, and capable of general computation, but it's awkward to express the heavy lifting there (even in an embedded language). The python package is probably better used tactically, for simple comprehensions and the like -- not for elaborating significantly on the structure of the LaTeX markup.

We must compile LaTeX to obtain our output (normally PostScript or PDF). I use a perl script, latexmk, to manage the multi-phase compilation process and keep track of cleaning up the intermediate products.

So we want to combine ((((xml and Bugzilla) by python to xml) and xsl) by xslt to LaTeX) by latexmk to PDF. We need to process our inputs and intermediate products in topological order; this is a build process. I've used make before, but it's a bit crufty. I prefer NAnt, whose XML syntax is more explicit and discoverable. From what I've seen of rake, I like it better than NAnt. NAnt's a priori boundary between its implementation and your build specification is confining. Rake draws no distinction between itself and your build specification.
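To make "topological order" concrete, here is a small Python sketch of the pipeline's dependency graph. It uses the standard library's graphlib (Python 3.9 or later) in place of a build tool, and the graph is my reading of the pipeline described above rather than anything generated from the real build files.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Each product maps to the set of inputs it is built from.
DEPENDS_ON = {
    "modifications.augmented.xml": {"modifications.xml", "bzaugment.py"},
    "modifications.tex": {"modifications.augmented.xml", "modifications.xsl"},
    "modifications.pdf": {"modifications.tex"},
}


def build_order(graph):
    """Return every node so that inputs always precede their products."""
    return list(TopologicalSorter(graph).static_order())


if __name__ == "__main__":
    for step in build_order(DEPENDS_ON):
        print(step)
```

A build tool does the same ordering and additionally skips products that are already newer than their inputs.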

    # -*- mode: ruby -*-
    # Render modifications as beautiful
    # (or hideous!) PDF
    #
    # sample invocation:
    # rake latexmk=vendor/latexmk/latexmk.pl modifications.pdf

    require 'rake/clean'
    require 'rexml/document'
    require 'rexml/xpath'

    # Tool and input locations; each can be overridden on the command line.
    transform = "#{ENV['transform'] || 'vendor/saxonb9-1-0-7n/bin/Transform.exe'}"
    latexmk = "#{ENV['latexmk'] || 'vendor/latexmk/latexmk.pl'}"
    modifications_xml = "#{ENV['modifications_xml'] || '../Artifacts/modifications.xml'}"

    task :default => 'modifications.pdf'

    # Run latexmk's cleanup before rake's own clean and clobber actions.
    task :clean => ['clobberhelper'] do
    end

    task :clobber => ['clobberhelper'] do
    end

    # True if 'clobber' is among the named tasks or their prerequisites.
    def contains_clobber(ts)
      ts.each { |i|
        return true if (i == 'clobber') || contains_clobber(Rake::Task[i].prerequisites)
      }
      return false
    end

    task :clobberhelper do |t|
      # top_level_tasks lists the task names requested on the command line;
      # reading it avoids re-invoking them, as top_level() would.
      if contains_clobber(Rake.application.top_level_tasks)
        sh latexmk, '-C'
      else
        sh latexmk, '-c'
      end
    end

    file 'modifications.tex' => [modifications_xml, 'modifications.xsl'] do |t|
      # Pass every xsl:param declared in the stylesheet, preferring
      # a value from the environment over the stylesheet's default.
      xsl_params = {}
      begin
        f = File.new('modifications.xsl')
        doc = REXML::Document.new(f)
        REXML::XPath.each(doc.root, 'xsl:param', {'xsl' => 'http://www.w3.org/1999/XSL/Transform'}) do |elt|
          xsl_params[elt.attributes['name']] = ENV[elt.attributes['name']] || elt.text
        end
      ensure
        f.close unless f.nil?
      end
      xsl_args = xsl_params.inject([]){ |accum, (key, value)|
        accum << "#{key}=#{value}"
      }
      sh transform, "-s:#{modifications_xml}", '-xsl:modifications.xsl', '-o:modifications.tex', *xsl_args
    end
    CLEAN.include('modifications.tex')

    file 'modifications.pdf' => ['modifications.tex'] do |t|
      sh latexmk, '-pdf', 'modifications.tex'
    end

If you don't know about rake, one point of confusion is that rake clean conventionally means "remove intermediates" while rake clobber means "remove intermediate and final outputs".

With my use of latexmk, I encountered a technical problem in the implementation of the standard clean and clobber rake tasks: latexmk needs the .tex file to decide what to clean up, but rake wants to clean (removing the .tex) before it clobbers. I needed to let latexmk do its job before clean, and latexmk needs to know whether to clean up the PDF or just the intermediates. My solution was to add a dependency clobberhelper for the clean task to call latexmk's cleanup routine. clobberhelper searches rake's dependency graph for clobber in order to decide how to call latexmk.
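The search that clobberhelper performs is a plain reachability check. Here it is restated as a Python sketch rather than Ruby, with the task graph written out by hand. The `release` task is hypothetical, and the `seen` set is an addition of mine that guards against cycles.

```python
# Hand-written stand-in for rake's task graph: task name -> prerequisites.
PREREQS = {
    "clean": ["clobberhelper"],
    "clobber": ["clobberhelper"],
    "clobberhelper": [],
    "release": ["clobber", "modifications.pdf"],  # hypothetical task
    "modifications.pdf": ["modifications.tex"],
    "modifications.tex": [],
}


def reaches(goal, names, prerequisites, seen=None):
    """True if goal is among names or any of their transitive prerequisites."""
    seen = set() if seen is None else seen
    for name in names:
        if name == goal:
            return True
        if name in seen:
            continue
        seen.add(name)
        if reaches(goal, prerequisites.get(name, []), prerequisites, seen):
            return True
    return False


def latexmk_cleanup_flag(requested, prerequisites):
    """Pick -C (remove the PDF too) when a clobber is pending, else -c."""
    return "-C" if reaches("clobber", requested, prerequisites) else "-c"


if __name__ == "__main__":
    print(latexmk_cleanup_flag(["clean"], PREREQS))
    print(latexmk_cleanup_flag(["clobber"], PREREQS))
```

The search starts from the tasks requested on the command line, so a clobber pulled in indirectly is detected as well.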

Finally, we have a corporate commitment to .NET and so most of our projects benefit from NAnt's understanding of .NET. We want an easy way to use this motley assortment of tools with our existing NAnt build files:

    <?xml version="1.0" encoding="utf-8"?>
    <project name="sdim" default="move-worksheet">
      <property name="artifacts_dir" value="." overwrite="false" />
      <!-- nant for Windows doesn't use the cygwin path so
           you need to chdir to %cygwin_home%\bin and then
           run %cygwin_home%\path\to\python or
           install Windows python and do e.g.,
           C:\Python31\python.exe -->
      <property name="python" value="C:\Python31\python.exe" overwrite="false" />
      <property name="rake" value="C:\Ruby\bin\rake.bat" overwrite="false" />
      <property name="modifications_xml" value="modifications.xml" overwrite="false" />
      <target name="move-worksheet">
        <exec program="${python}">
          <arg line="bzaugment.py ${path::combine(artifacts_dir, modifications_xml)} -o ${path::combine(artifacts_dir, 'modifications.augmented.xml')}" />
        </exec>
        <exec program="${rake}">
          <arg line="modifications_xml=${path::combine(artifacts_dir, 'modifications.augmented.xml')}" />
        </exec>
      </target>
      <target name="clean">
        <exec program="${rake}">
          <arg line="clobber" />
        </exec>
        <delete file="${path::combine(artifacts_dir, 'modifications.augmented.xml')}" />
      </target>
    </project>

I still have some reservations about the morality of this. On the one hand, no human should waste even 30 minutes of precious life preparing ambiguous, incomplete, inaccessible and uninsightful summaries of data that's already available, accurate, complete, and easily-accessible. On the other hand, you can lead a manager to water but you can't make him drink. This solution seems to balance those two forces, but by reducing the cost of useless redundancy, am I encouraging further bureaucratic malignancy?

Voilà:

Anyway, I hope you enjoyed the tour of this motley assemblage of languages and tools.

Thanks to Josh Buedel for comments and suggestions during the preparation of this post.