Sunday, August 2, 2009

Paging for Mutable Data

Many web applications provide segmented result sets in response to queries. This approach can conserve server-side effort when users decide to reformulate a query or use partial results rather than actually attending to all the results. Even if a user will eventually demand the complete set of results, paging can smooth the demand/time graph, lowering the peaks.

It's easy to find library support for paging, but it's rare to find good support for paging mutable data. Typically the library provides a set of results given an integer offset and page size. Another way to characterize this functionality is to say that the library does integer arithmetic. I don't object to procedural abstraction, but it seems to me that this is not only simple but simplistic.
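Concretely, the kind of interface I have in mind looks something like this (hypothetical names, not any particular library):

using System.Collections.Generic;

// A typical offset-based paging interface: the only inputs are integers,
// so the library is really just doing arithmetic on offsets.
public interface IPager<T> {
    // returns the elements at positions [(pageNumber - 1) * pageSize, pageNumber * pageSize)
    IList<T> GetPage(int pageNumber, int pageSize);
}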

We can always enumerate our datasets, but I think that users don't normally care about specific offsets. I for one normally care about the whole set of results that satisfy my query. Or at least I think I do, until I see from the results that my query spec wasn't quite right. Anyway, I think we users normally formulate queries in terms of the problem domain, not in terms of sequences of integers. I don't normally care about the eleventh or the twenty-sixth result because of its ordinal rank; I just want the next element of my dataset.

So where's the problem? We know that we can set up a one-to-one correspondence between integers and countable data sets, so why not use convenient integers to specify pages? As long as the data are immutable, or users are tolerant of certain inaccuracies, we can use integers to identify pages.

But, in general, the data might be changing even as users peruse the results. Suppose I specify a query q whose predicate is satisfied by the following ordered result elements at time 0: <A, C, E, G, I, J>. At time 1, F begins to satisfy q, so the result set becomes <A, C, E, F, G, I, J>. If my page size is 2 and I was looking at the second page at time 0, what should I see when I proceed to the third page at time 2? Do I see G and I, because results_t2[(page - 1) * size] = results_t2[4] identifies G? Would the apparent duplicate result in that case be confusing or just annoying? Or should I see I and J, because I started viewing at t_0 and therefore my page request should be interpreted as results_t0[4] = I?

Suppose an item I've already seen is deleted as I page through results. Will I see the item that would have been first on the next page at all?
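Here's a minimal simulation of those two scenarios (plain lists standing in for the database; the names and helper are mine), recomputing each page from integer offsets on demand:

using System;
using System.Collections.Generic;
using System.Linq;

class OffsetPagingAnomalies {
    // naive offset paging: recompute the slice from the current data on demand
    static List<string> Page(List<string> data, int page, int size) {
        return data.Skip((page - 1) * size).Take(size).ToList();
    }

    static void Main() {
        var t0 = new List<string> { "A", "C", "E", "G", "I", "J" };
        Console.WriteLine(String.Join(",", Page(t0, 2, 2).ToArray())); // E,G

        // F begins to satisfy q: page 3 repeats G, which I just saw on page 2
        var t1 = new List<string> { "A", "C", "E", "F", "G", "I", "J" };
        Console.WriteLine(String.Join(",", Page(t1, 3, 2).ToArray())); // G,I

        // E (already seen) is deleted instead: page 3 starts at J, and I is never shown
        var t2 = new List<string> { "A", "C", "G", "I", "J" };
        Console.WriteLine(String.Join(",", Page(t2, 3, 2).ToArray())); // J
    }
}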

If we're going to avoid anomalous disappearances and false duplicates, we can't rely on naively recomputing the function from integers to results on demand. Perhaps we can store history and use a (page-number, timestamp) pair to index our pages.

If the cost of storing history is too great, there are compromises available aside from allowing both false apparent duplicates and omissions of elements that have existed throughout. We can allow repetition in order to avoid erroneous omission, or we can allow omission to avoid false repetition. Finally, if we disallow duplicates, we can use observable properties of the elements themselves as page indices. Often, (apparent) duplicates are undesirable anyway:

Summaries are often preferred over raw data:

    A (raw)        B (summary)
    pepperoni      pepperoni: 4
    mushroom       mushroom: 2
    pepperoni      cheese: 1
    mushroom
    pepperoni
    cheese
    pepperoni
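The summary column is just a grouping over the raw column; a minimal LINQ sketch of the aggregation shown above:

using System;
using System.Collections.Generic;
using System.Linq;

class ToppingSummary {
    static void Main() {
        var a = new List<string> {
            "pepperoni", "mushroom", "pepperoni", "mushroom",
            "pepperoni", "cheese", "pepperoni" };
        // column B: one row per distinct value, with its count
        foreach (var g in a.GroupBy(x => x)) {
            Console.WriteLine("{0}: {1}", g.Key, g.Count());
        }
    }
}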


In my experience, the most common scenario involves concurrent usage, mutable underlying data, and two laws of user expectation: (1) if an item satisfied the query predicate continuously from before the query was issued to after the user reached the last page of results, the user should see the item and (2) one unchanging item should appear at most once in the pages.

Where these requirements are satisfiable, we'd like to encapsulate and reuse the implementation details. The interface should work in domain terms: given a data element, a total order relation, and a window size, it should provide the next n elements.
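Here's a minimal LINQ-to-objects sketch of that interface (my own names; the code later in this post builds NHibernate criteria instead): given the last element already seen, the total order, and a window size, fetch the next n elements by value rather than by offset.

using System.Collections.Generic;
using System.Linq;

static class KeysetPaging {
    // Fetch the page after 'memento' under the ordering 'order'.
    // Passing null for 'memento' yields the first page.
    public static List<T> NextPage<T>(
        IEnumerable<T> source, IComparer<T> order, T memento, int n)
        where T : class {
        return source
            .Where(x => memento == null || order.Compare(x, memento) > 0)
            .OrderBy(x => x, order)
            .Take(n)
            .ToList();
    }
}

Because the page boundary is "everything after the memento in the total order," a concurrent insertion or deletion elsewhere in the sequence can't silently repeat or skip unseen elements the way a shifting integer offset can. The Potpourri code below expresses the same idea as NHibernate criteria over the named properties, so the database can do the filtering.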

The main idea here is pretty simple, both in conception and in implementation. Part of the motivation for this blog post though was to discuss the details of client interface design and testing.

Here's my first test design for the core functionality:


[Test]
public void paging() {
    const int count =
        (int)(
            // first page: stable + deleted
            (pageSize + 1.0) * 3.0 / (1 + 1 + 0) +
            // second page: stable + deleted + ins
            (pageSize + 1.0) * 3.0 / (1 + 1 + 0) +
            // third page: stable + ins
            (pageSize + 1.0) * 3.0 / (1 + 1 + 0));
    var items = Enumerable.Range(0, count).ToList();
    var stable =
        Enumerable.Where(
            items,
            x => x % 3 == 0).ToList();
    var inserting =
        Enumerable.Where(
            items,
            x => x % 3 == 1).ToList();
    var deleting =
        Enumerable.Where(
            items,
            x => x % 3 == 2).ToList();
    var attendance =
        Enumerable.ToDictionary(
            items,
            x => x,
            x => 0);
    try {
        foreach (var i in stable) {
            persist(i);
        }
        foreach (var i in deleting) {
            persist(i);
        }
        int timer = 0;
        foreach (var p in getPages()) {
            foreach (var i in p) {
                ++attendance[i];
            }
            if (timer == 1) {
                foreach (var ins in inserting) {
                    persist(ins);
                }
            }
            else if (timer == 2) {
                foreach (var del in deleting) {
                    delete(del);
                }
            }
            ++timer;
        }
        Assert.IsTrue(timer > 3, "There were too few pages in this test to see both kinds of concurrency anomaly.");
        Assert.IsTrue(
            Enumerable.All(
                stable,
                x => attendance[x] == 1),
            "Every item that is present throughout the process should be seen exactly once.");
        Assert.IsTrue(
            Enumerable.Any(
                deleting,
                x => attendance[x] == 0),
            "Some item deleted during this test should not have been seen. This may indicate eager retrieval of pages, which isn't feasible in general (for large datasets).");
        Assert.IsFalse(
            Enumerable.Any(
                items,
                x => attendance[x] > 1),
            "No item should be seen twice.");
        // it's OK if items inserted during the test are seen
    }
    finally {
        foreach (var i in items) {
            delete(i);
        }
    }
}


For me, the best thing about this code is its strong focus on the two key axioms. I partition all possible inputs in such a way as to demonstrate them, and then I arrange to draw samples from those partitions in what I hope is a pretty unobtrusive way. I check that I've covered the necessary cases, and then I show that the axioms hold for my sample data. getPages simulates sequential uses of the paging abstraction by a client:


protected override IEnumerable<IEnumerable<int>> getPages() {
    const int pageSize = 10;
    LocalItem memento = null;
    IEnumerable<LocalItem> p;
    do {
        var slicedCriteria =
            Potpourri.Page<LocalItem>.page(
                DetachedCriteria.For(typeof(LocalItem)),
                new [] {"itemNumber"},
                memento);
        p = LocalItem.SlicedFindAll(0, pageSize + 1, slicedCriteria);
        memento =
            Enumerable.LastOrDefault(p);
        yield return
            Potpourri.slice(
                Enumerable.Select(
                    p,
                    i => elementToInt(i)),
                0,
                pageSize);
    } while (Enumerable.Count(p) > pageSize);
}


Details: in real life this code runs against shared relational database instances, but for testing purposes I execute against SQLite for isolation. This test occurs in an abstract TestFixture with two concrete subclasses, one of whose getPages method is shown above.

The other uses the raw NHibernate windowing primitives in the naive way; that test class is marked with the IgnoreAttribute because that approach is doomed. I actually started by writing that test case as an executable demonstration of the motivation for the less obvious technique.
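For contrast, the doomed subclass's getPages is essentially offset arithmetic; a sketch of that approach (not the literal test code, but the same SlicedFindAll entry point) looks like this:

// Naive offset-based paging: concurrent inserts and deletes change what a
// given offset means between calls, producing duplicates and omissions.
protected override IEnumerable<IEnumerable<int>> getPages() {
    const int pageSize = 10;
    int offset = 0;
    IEnumerable<LocalItem> p;
    do {
        p = LocalItem.SlicedFindAll(
            offset,
            pageSize,
            DetachedCriteria.For(typeof(LocalItem))
                .AddOrder(Order.Asc("itemNumber")));
        offset += pageSize;
        yield return Enumerable.Select(p, i => elementToInt(i)).ToList();
    } while (Enumerable.Count(p) == pageSize);
}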

Here's my second test design. I think on balance it's actually less readable than the first attempt. I wonder how many people would agree?


[Test]
public void insertionSansDuplicates() {
    insertion(fetchPageByValueEquality);
}

[Test]
[ExpectedException(typeof(DuplicateDetectedException))]
public void insertionAnomalies() {
    insertion(fetchPageByOffset);
}

[Test]
public void deletionSansSkips() {
    deletion(fetchPageByValueEquality);
}

[Test]
[ExpectedException(typeof(AnomalyDetectedException))]
public void deletionAnomalies() {
    deletion(fetchPageByOffset);
}


Again, I wrote working test code both for the straightforward but broken use of the primitives and for the approach that suits my needs.

Here's how I address the two kinds of trouble that can result from trying to use offsets for paging in the face of concurrent modification:


private class AnomalyDetectedException : Exception {}
private class DuplicateDetectedException : AnomalyDetectedException {}

private void insertion(FetchPage fetchPage) {
    var q = DetachedCriteria.For<Beer>();
    var O = new [] {"ABV", "BrewDate", "Variety"};
    Paging.AddOrder(q, O);
    var seenit = new HashSet<Beer>();
    using (var factory = Persistence.makeSessionFactory()) {
        using (var s = factory.OpenSession()) {
            var page =
                fetchPage(
                    s,
                    q,
                    O,
                    null,
                    pageSize + 1);
            foreach (Beer i in Enumerable.TakeWhile(page, (elt, i) => i < pageSize)) {
                if (seenit.Contains(i)) throw new DuplicateDetectedException();
                seenit.Add(i);
            }
            var first = (Beer)page[0];
            var nouveau =
                new Beer {
                    Variety = "not" + first.Variety,
                    BrewDate = first.BrewDate,
                    ABV = first.ABV / 2};
            s.Save(nouveau);
            s.Flush();
            page =
                fetchPage(s, q, O, Enumerable.Last(page), pageSize);
            foreach (Beer i in page) {
                if (seenit.Contains(i)) throw new DuplicateDetectedException();
                seenit.Add(i);
            }
        }
    }
}

private void deletion(FetchPage fetchPage) {
    var q = DetachedCriteria.For<Beer>();
    var O = new [] {"ABV", "BrewDate", "Variety"};
    Paging.AddOrder(q, O);
    var expected = new HashSet<Object>();
    var actual = new HashSet<Object>();
    using (var factory = Persistence.makeSessionFactory()) {
        using (var s = factory.OpenSession()) {
            var page = fetchPage(s, q, O, null, pageSize);
            foreach (var i in page) expected.Add(i);
            foreach (var i in page) actual.Add(i);
            foreach (var i in fetchPage(s, q, O, Enumerable.Last(page), pageSize)) {
                expected.Add(i);
            }
            s.Delete(page[0]);
            s.Flush();
            foreach (var i in fetchPage(s, q, O, Enumerable.Last(page), pageSize)) {
                actual.Add(i);
            }

            actual.SymmetricExceptWith(expected);
            if (0 != actual.Count) throw new AnomalyDetectedException();
        }
    }
}


The test data are much less carefully prepared:


[TestFixtureSetUp]
public void setup() {
    Persistence.rebuildDB();
    using (var factory = Persistence.makeSessionFactory()) {
        using (var s = factory.OpenSession()) {
            Assert.That(s.Linq<Bottle>().Count(), Is.GreaterThan(pageSize));
            Assert.That(s.Linq<Beer>().Count(), Is.GreaterThan(pageSize));
        }
    }
}


That rebuildDB method isn't particularly geared towards the paging problem. It's really intended for use with a broader class of NHibernate-related addons.

I also have boundary test cases:


[Test]
public void empty() {
    using (var factory = Persistence.makeSessionFactory()) {
        using (var s = factory.OpenSession()) {
            var max =
                Queryable.Max(
                    s.Linq<Bottle>(),
                    i => i.Serial);
            var source =
                DetachedCriteria.For(
                    typeof(Bottle));
            Assert.IsFalse(
                Enumerable.Any(
                    Paging.Page(
                        s,
                        source,
                        new [] {"Serial"},
                        new Bottle { Serial = max + 1 },
                        pageSize)));
        }
    }
}

[Test]
public void singleton() {
    using (var factory = Persistence.makeSessionFactory()) {
        using (var s = factory.OpenSession()) {
            var max =
                Queryable.Max(
                    s.Linq<Bottle>(),
                    i => i.Serial);
            var source =
                DetachedCriteria.For(
                    typeof(Bottle)).
                    AddOrder(Order.Asc("Serial"));
            Assert.AreEqual(
                1,
                Enumerable.Count(
                    Paging.Page(
                        s,
                        source,
                        new [] {"Serial"},
                        new Bottle { Serial = max },
                        pageSize)));
        }
    }
}


When there's no concurrent data modification, the simpler, direct use of NHibernate primitives and the more complicated use of my paging abstraction give equivalent results:


[Test]
public void naturals() {
    using (var factory = Persistence.makeSessionFactory()) {
        using (var s = factory.OpenSession()) {
            var Q =
                Enumerable.Select(
                    Enumerable.Range(0, 2),
                    x => DetachedCriteria.For<Bottle>().AddOrder(Order.Asc("Serial"))).ToArray();
            Assert.AreEqual(
                Q[0].GetExecutableCriteria(s).SetMaxResults(pageSize).List<Bottle>(),
                Paging.Page<Bottle>(s, Q[1], new [] {"Serial"}, null, pageSize));
        }
    }
}

[Test]
public void beers() {
    using (var factory = Persistence.makeSessionFactory()) {
        using (var s = factory.OpenSession()) {
            var O = new [] {"ABV", "BrewDate", "Variety"};
            // ICriterion instances are mutable and don't
            // expose a copy constructor or clone method,
            // so we must make two criteria, just alike.
            var Q =
                Enumerable.Select(
                    Enumerable.Range(0, 2),
                    x => DetachedCriteria.For<Beer>()).ToArray();
            foreach (var q in Q) {
                Paging.AddOrder(q, O);
            }
            Assert.AreEqual(
                Q[0].GetExecutableCriteria(s).SetMaxResults(pageSize).List<Beer>(),
                Paging.Page(s, Q[1], O, null, pageSize));
        }
    }
}


I welcome your comments and criticism. Please see ormar for details or to publish your own code review.

vogon fodder

In this post, I show-and-tell about automated paper-pushing.

I spent considerable effort at work last week building something that is minimally useful. I am preoccupied by the conflict between the demands of professionalism and the idiotic/cynical mockery of professionalism that is usual (though not universal) in large organizations. But I've got to eat and I should focus on problems I can realistically solve. For now, let's accept the demands of management without question and focus on how to satisfy the demands at minimum cost to human life. In the words of Steve Jobs:

Well, let's say you can shave 10 seconds off of the boot time. Multiply that by five million users and that's 50 million seconds, every single day. Over a year, that's probably dozens of lifetimes. So if you make it boot ten seconds faster, you've saved a dozen lives. That's really worth it, don't you think?

Your manager emails you an MSWord document and asks you to fill it out once per software deployment. It should list every affected "object", indicate when (if ever) "User" has been notified and when and by whom (if ever) "BAM" has approved, and reference the other paperwork associated with the project. What do you do?

Clearly, the right answer involves CI and LaTeX. In my case: svn, ccnet, Bugzilla, python, xslt, ruby, LaTeX, perl, and pdf.

We've been using CI with SVN to do deployments since 2004. There's a branch for each deployment target, and a CI project associated with each of those branches. The CI builds usually correspond to a single commit, but we don't insist on that; when commits come fast enough we go for speed over precision in who's "on the hook" for breaking changes. So the first problem is to find the union of affected "objects" over the set of svn commits in the deployment.

The term "object" is taken from the argot of the RPGers, who are the oldest and largest group of software people at our company. For them, "object" often refers to a printed tabular summary of data but more generally can refer to a program for any purpose. The usual custom outside the RPG tribe is to list UI elements as "objects", but sometimes OO classes or source files are listed instead. It's not worth being too careful about this, because the real goal is pro forma compliance with a vague demand for documentation. Users basically want release notes, and engineers already get precise, accurate, and efficient answers to technical questions from svn and Bugzilla. The form really just needs to exist and, at a glance, to appear normal.

Our CI system, CruiseControl.NET, has a feature called the modification writer that renders metadata about a build to XML. It provides a list of added, deleted, and textually modified files for each commit that's new since the last build:

<?xml version="1.0" encoding="utf-8"?>
<ArrayOfModification xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xsd="http://www.w3.org/2001/XMLSchema">
<Modification>
<Type>Modified</Type>
<FileName>ccnet-self.config</FileName>
<FolderName>/ccnet</FolderName>
<ModifiedTime>2009-07-24T10:23:08.162158-05:00</ModifiedTime>
<UserName>sean.foy</UserName>
<ChangeNumber>4413</ChangeNumber>
<Version />
<Comment>whitespace
[Bug 50,51,3783]</Comment>
<IssueUrl>https://bugzilla.thcg.net/buglist.cgi?bug_id=50,51,3783</IssueUrl>
</Modification>
</ArrayOfModification>

We integrated SVN and Bugzilla in 2004, and so since that time we've been able to correlate commits with bugs. This gives us informal commentary about the motivations and the design considerations for each commit, plus metadata including who was involved in the process, how much time each of us spent, whether we paired or got code review, whether dbas got involved, who approved deployment, etc. Well, at least in principle. Actually, only one or two projects have ever really taken advantage of these capabilities. But the point is that this metadata fits the purported goal of our paperwork, and it's useful and easily accessible, and it exists for at least one project. It's the best we can do, and it's pretty good. So we've got to use it.

Python is a fairly pleasant language to work in and nobody's terrified by its syntax. It also has great libraries for screen-scraping. Accordingly, we use a python program to augment ccnet's modifications xml: it reads the ccnet xml and uses the bugtraq standard to write back the set of relevant bugs:

<ArrayOfModification>
<Modification>
<Type>Modified</Type>
<FileName>ccnet-self.config</FileName>
<FolderName>/ccnet</FolderName>
<ModifiedTime>2009-07-24T10:23:08.162158-05:00</ModifiedTime>
<UserName>sean.foy</UserName>
<ChangeNumber>4413</ChangeNumber>
<Version />
<Comment>whitespace
[Bug 50,51,3783]</Comment>
<IssueUrl>https://bugzilla.thcg.net/buglist.cgi?bug_id=50,51,3783</IssueUrl>
<database />
<flags>
<flag id="1882" name="QCApproved" setter="michael.jorgensen@covidien.com" status="+" />
</flags>
</Modification>
</ArrayOfModification>

Now we have all the answers we need to fill out our worksheet in XML. We just need to produce MSWord output (good luck). Well, MSWord probably isn't really necessary. That's what's expected, but we should also keep in mind that the goal is pro forma compliance. This document's real use is to ward off auditors, so it just needs to exist and not look unusual. So, we want to typeset XML? Obviously we should consider DocBook, XSL-FO, and LaTeX. I looked into XSL-FO c. 2001 and DocBook c. 2002, but I know LaTeX best. I wanted to spend less than 4 hours on this little project, so I was satisfied to use LaTeX.

For processing XML, you could do worse than XSLT. It can be particularly nice as a declarative template specification: "make my input look like this". So, I used xslt to process the bz-augmented ccnet modifications xml into LaTeX:

<?xml version="1.0"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="2.0" xmlns:fn="http://www.w3.org/2005/xpath-functions" xmlns:smf="http://seanfoy.thcg.net/xslt">
<xsl:output method="text" />

<xsl:param name="asr">application service request number</xsl:param>
<xsl:param name="builddate">the date these changes became effective</xsl:param>
<xsl:param name="trunkdeploymenttarget">unc path</xsl:param>
<xsl:param name="trunkdbserver">hostname</xsl:param>
<xsl:param name="trunkdbname">dbname</xsl:param>
<xsl:param name="productiondeploymenttarget">unc path</xsl:param>
<xsl:param name="productiondbserver">hostname</xsl:param>
<xsl:param name="productiondbname">dbname</xsl:param>

<xsl:function name="smf:latexlit">
<xsl:param name="raw" />
<xsl:value-of select="fn:replace(fn:replace($raw, '([$&amp;%#_{}~^\\])', '\\$1'), '\\\\', '\$\\backslash\$')" />
</xsl:function>

<xsl:template match="@*|node()">
</xsl:template>
<xsl:template match="/">
<xsl:text>
\documentclass{article}

\usepackage{hyperref}
\usepackage{longtable}
\usepackage{latexsym}

\usepackage[document]{ragged2e}
\usepackage{wordlike}
\renewcommand{\familydefault}{phv} % Arial
\sloppy
\makeatletter
\renewcommand{\subsection}{\@startsection%
{subsection}%
{1}
{0em}
{-\baselineskip}%
{0.5\baselineskip}%
{\bfseries\LARGE\sffamily}}
\makeatother

\author{Sean Foy}
\title{Sample}

\begin{document}
\section*{Internet/Intranet Production Move Request}
\textbf{ASR /Track-it WO\#:} </xsl:text><xsl:value-of select="smf:latexlit($asr)" /><xsl:text>\hfill\textbf{Move Date:}

\textbf{Requestor/ User Notification Sent:} $\Box$ \textbf{Dated:}

\subsection*{Object(s): $\Box$Web \qquad $\Box$WebMethods (WM)}
\begin{longtable}{|l|l|}\hline
</xsl:text>
<xsl:for-each select="ArrayOfModification/Modification">
<xsl:value-of select="smf:latexlit(FileName)" />
<xsl:text>&amp; -r</xsl:text><xsl:value-of select="smf:latexlit(ChangeNumber)" />
<xsl:if test="fn:not(fn:empty(IssueUrl))">
<xsl:text> (</xsl:text><xsl:value-of select="smf:latexlit(IssueUrl)" /><xsl:text>)</xsl:text>
</xsl:if>
<xsl:text> \\</xsl:text>
</xsl:for-each>
<xsl:text>\hline
\end{longtable}

\subsection*{DB Object(s):}
\begin{longtable}{|l|l|}\hline
</xsl:text>
<xsl:for-each select="ArrayOfModification/Modification[database]">
<xsl:value-of select="smf:latexlit(FileName)" />
<xsl:text>&amp; -r</xsl:text><xsl:value-of select="smf:latexlit(ChangeNumber)" />
<xsl:if test="fn:not(fn:empty(IssueUrl))">
<xsl:text> (</xsl:text><xsl:value-of select="smf:latexlit(IssueUrl)" /><xsl:text>)</xsl:text>
</xsl:if>
<xsl:text> \\</xsl:text>
</xsl:for-each>
<xsl:text>
\hline
\end{longtable}

\subsection*{Deployment Servers:}
\begin{tabular}{|l|l|l|}\hline
&amp;
Development&amp;
Production\\\hline
Web:&amp;
\textbf{UNC:} </xsl:text><xsl:value-of select="smf:latexlit($trunkdeploymenttarget)" /><xsl:text>&amp;
\textbf{UNC:} </xsl:text><xsl:value-of select="smf:latexlit($productiondeploymenttarget)" /><xsl:text>\\\hline
DB:&amp;
\textbf{Server:} </xsl:text><xsl:value-of select="smf:latexlit($trunkdbserver)" /><xsl:text> \textbf{Database:} </xsl:text><xsl:value-of select="smf:latexlit($trunkdbname)" /><xsl:text>&amp;
\textbf{Server:} </xsl:text><xsl:value-of select="smf:latexlit($productiondbserver)" /><xsl:text> \textbf{Database:} </xsl:text><xsl:value-of select="smf:latexlit($productiondbname)" /><xsl:text>\\\hline
WM:&amp;
\textbf{UNC:} &amp;
\textbf{UNC:} \\\hline
\end{tabular}

\rule{\textwidth}{0.25em}

\textbf{Move Approved by BAM}:

Signature: \underline{\makebox[2in][l]{}}

Date: \underline{\makebox[1in][l]{}}

\end{document}
</xsl:text>
</xsl:template>
</xsl:stylesheet>

LaTeX is based on Don Knuth's TeX typesetting system. It's serious business: Turing-complete and designed specifically for the needs of the world's most demanding typographical experts.

For a while I thought about merging the bug and svn data here in LaTeX. My TeX-hacking skills are very weak now, so I had in mind the python package for LaTeX (embedding Python code in the LaTeX source). As I escaped, for XSL, the escape for LaTeX of the escape for Python of the LaTeX output I wanted, I thought I was probably making a mistake. LaTeX is great for typesetting, and capable of general computation, but it's awkward to express the heavy lifting there (even in an embedded language). The python package is probably better used tactically, for simple comprehensions and the like -- not for elaborating significantly on the structure of the LaTeX markup.

We must compile the LaTeX to obtain our output (normally PostScript or PDF). I use a perl script, latexmk, to manage the multi-phase compilation process and to clean up the intermediate products.

So we want to combine ((xml and Bugzilla) by python to xml), then (that xml and the xsl) by xslt to LaTeX, then (LaTeX by latexmk to PDF). We need to process our inputs and intermediate products in topological order; this is a build process. I've used make before, but it's a bit crufty. I prefer NAnt, whose XML syntax is more explicit and discoverable. From what I've seen of rake, I like it better than NAnt. NAnt's a priori boundary between its implementation and your build specification is confining. Rake draws no distinction between itself and your build specification.

# -*- mode: ruby -*-
# Render modifications as beautiful
# (or hideous!) PDF
#
# sample invocation:
# rake latexmk=vendor/latexmk/latexmk.pl modifications.pdf

require 'rake/clean'
require 'rexml/document'
require 'rexml/xpath'

transform = "#{ENV['transform'] || 'vendor/saxonb9-1-0-7n/bin/Transform.exe'}"
latexmk = "#{ENV['latexmk'] || 'vendor/latexmk/latexmk.pl'}"

modifications_xml = "#{ENV['modifications_xml'] || '../Artifacts/modifications.xml'}"

task :default => 'modifications.pdf'

task :clean => ['clobberhelper'] do
end

task :clobber => ['clobberhelper'] do
end

def contains_clobber(ts)
ts.each { |i|
return true if (i == 'clobber') || contains_clobber(Rake::Task[i].prerequisites)
}
return false
end

task :clobberhelper do |t|
if contains_clobber(Rake::application().top_level())
sh latexmk, '-C'
else
sh latexmk, '-c'
end
end

file 'modifications.tex' => [modifications_xml, 'modifications.xsl'] do |t|
xsl_params = {}
begin
f = File.new('modifications.xsl')
doc = REXML::Document.new(f)
REXML::XPath.each(doc.root, 'xsl:param', {'xsl' => 'http://www.w3.org/1999/XSL/Transform'}) do |elt|
xsl_params[elt.attributes['name']] = ENV[elt.attributes['name']] || elt.text
end
ensure
f.close unless f.nil?
end
xsl_args = xsl_params.inject([]){ |accum, (key, value)|
accum << "#{key}=#{value}"
}
sh transform, "-s:#{modifications_xml}", '-xsl:modifications.xsl', '-o:modifications.tex', *xsl_args
end
CLEAN.include('modifications.tex')

file 'modifications.pdf' => ['modifications.tex'] do |t|
sh latexmk, '-pdf', 'modifications.tex'
end

If you don't know about rake, one point of confusion is that rake clean conventionally means "remove intermediates" while rake clobber means "remove intermediate and final outputs".

With my use of latexmk, I encountered a technical problem in the implementation of the standard clean and clobber rake tasks: latexmk needs the .tex file to decide what to clean up, but rake wants to clean (removing the .tex) before it clobbers. I needed to let latexmk do its job before clean runs, and latexmk needs to know whether to clean up the PDF or just the intermediates. My solution was to add a clobberhelper prerequisite to the clean task that calls latexmk's cleanup routine. clobberhelper searches rake's dependency graph for clobber in order to decide how to call latexmk.

Finally, we have a corporate commitment to .NET and so most of our projects benefit from NAnt's understanding of .NET. We want an easy way to use this motley assortment of tools with our existing NAnt build files:

<?xml version="1.0" encoding="utf-8"?>
<project name="sdim" default="move-worksheet">
<property name="artifacts_dir" value="." overwrite="false" />
<!-- nant for Windows doesn't use the cygwin path so
you need to chdir to %cygwin_home%\bin and then
run %cygwin_home%\path\to\python or
install Windows python and do e.g.,
C:\Python31\python.exe -->
<property name="python" value="C:\Python31\python.exe" overwrite="false" />
<property name="rake" value="C:\Ruby\bin\rake.bat" overwrite="false" />
<property name="modifications_xml" value="modifications.xml" overwrite="false" />

<target name="move-worksheet">
<exec program="${python}">
<arg line="bzaugment.py ${path::combine(artifacts_dir, modifications_xml)} -o ${path::combine(artifacts_dir, 'modifications.augmented.xml')}" />
</exec>
<exec program="${rake}">
<arg line="modifications_xml=${path::combine(artifacts_dir, 'modifications.augmented.xml')}" />
</exec>
</target>

<target name="clean">
<exec program="${rake}">
<arg line="clobber" />
</exec>
<delete file="${path::combine(artifacts_dir, 'modifications.augmented.xml')}" />
</target>
</project>

I still have some reservations about the morality of this. On the one hand, no human should waste even 30 minutes of precious life preparing ambiguous, incomplete, inaccessible, and uninsightful summaries of data that's already available, accurate, complete, and easily accessible. On the other hand, you can lead a manager to water but you can't make him drink. This solution seems to balance those two forces, but by reducing the cost of useless redundancy, am I encouraging further bureaucratic malignancy?

Voilà:


Anyway, I hope you enjoyed the tour of this motley assemblage of languages and tools.


Thanks to Josh Buedel for comments and suggestions during the preparation of this post.








23.1: new .emacs


(setq ns-command-modifier 'meta)
(if (functionp 'tool-bar-mode)
(tool-bar-mode -1))

(setq default-buffer-file-coding-system 'utf-8-unix)
(setq-default indent-tabs-mode nil)
(put 'erase-buffer 'disabled nil)

(cond
((file-exists-p "/sw/bin")
;; MacOS X + fink
(setenv "PATH" (concat (getenv "PATH") ":/sw/bin"))
(setq exec-path (append exec-path '("/sw/bin"))))
((file-exists-p "c:/")
;; Windows
(setenv "PATH" (concat (getenv "PATH") ";c:\\cygwin-1.7\\bin"))
(setq exec-path (append exec-path '("c:\\cygwin-1.7\\bin")))
(if (require 'cygwin-mount nil t)
(progn
(cygwin-mount-activate)
(setq shell-file-name "bash")
(setenv "SHELL" shell-file-name)
(setq w32-quote-process-args ?\")
;; for subprocesses invoked via the shell
;; (e.g., "shell -c command")
(setq explicit-shell-file-name shell-file-name)
(setq explicit-sh-args '("-login" "-i"))))
; (if I'm using Windows, I must be at work!)
(setq bug-reference-url-format
"https://bugzilla.thcg.net/show_bug.cgi?id=%s")
(setq bug-reference-bug-regexp
"\\(?:[Bb]ug ?#?\\|PR [a-z-+]+/\\)\\([0-9]+\\)")))

(dolist (
prog-mode-hook
'(python-mode java-mode ruby-mode js2-mode nxml-mode))
(add-hook
(car (read-from-string (concat (symbol-name prog-mode-hook) "-hook")))
(lambda ()
(setq indent-tabs-mode nil))))

(autoload 'python-mode "python-mode" "Python editing mode." t)
(add-to-list 'auto-mode-alist '("\\.py$" . python-mode))
(add-to-list 'interpreter-mode-alist '("python" . python-mode))
(add-hook 'python-mode-hook
(if (null (getenv "LC_CTYPE"))
(setenv "LC_CTYPE" "C")))

(add-to-list 'auto-mode-alist '("\\.cs$" . java-mode))

(autoload 'js2-mode "js2" nil t)
(add-to-list 'auto-mode-alist '("\\.js$" . js2-mode))

(autoload 'ruby-mode "ruby-mode" "Ruby editing mode." t)
(add-to-list 'auto-mode-alist '("\\.rb$" . ruby-mode))

(setq auto-mode-alist
(cons '("\\.\\(xml\\|xsl\\|rng\\|xhtml\\)'" . nxml-mode)
auto-mode-alist))
(add-to-list 'magic-mode-alist '("<\\?xml " . nxml-mode))

(if (load "auctex.el" t t t)
(load "preview-latex.el" nil t t))
(setq ispell-program-name "aspell")
(setq ispell-extra-args '("--sug-mode=ultra"))

Saturday, June 6, 2009

Andand.NET via monads

In my previous post, I discussed a direct implementation of andand in C#. This time, I'll show a monadic implementation.

As Justin Wick pointed out on Reg Braithwaite's blog announcing Object#andand, andand can be viewed as a special case of the Maybe Monad. I've been hearing about how useful monads are for a few years, but until recently I hadn't had occasion to use them directly. Would a monad formulation of andand be simpler and clearer to read? Would it be easier to understand, if not for everyone then at least for readers who are already familiar with the monad formalism? Here was an opportunity to find out.

For those who are totally unfamiliar with monads: they're a means of accommodating side-effects within the functional paradigm. By structuring input, output, exceptions, etc. with monads, you can restore referential transparency to functions that make use of these kinds of features. Monads are surprisingly versatile. You can build lists and arrays from monads, and in the case of arrays the use of monads makes it easy to use the type system to guarantee that client code fulfills a contract on which an efficient implementation of arrays depends. The famous applications of all the functional goodies in LINQ, such as data access and data-parallel computation, are organized around monads.
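As a warm-up, here's the shape of the idea for one concrete effect, "the value might be missing" (a toy of my own, independent of the implementation below):

using System;

static class MaybeToy {
    // Bind: run f only if there is a value; otherwise propagate "nothing".
    static int? Bind(int? m, Func<int, int?> f) {
        return m.HasValue ? f(m.Value) : null;
    }

    static void Main() {
        Func<string, int?> parse =
            s => { int i; return Int32.TryParse(s, out i) ? (int?)i : null; };
        Func<int, int?> half =
            i => i % 2 == 0 ? (int?)(i / 2) : null;
        Console.WriteLine(Bind(parse("42"), half));   // 21
        Console.WriteLine(Bind(parse("oops"), half)); // prints nothing: the failure propagated
    }
}

The functions parse and half stay referentially transparent; the plumbing that threads "might be missing" through the composition lives entirely in Bind. The Monad<T, M> type below generalizes exactly that plumbing.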

So, I refreshed my memory with Monads for Functional Programming and Confessions of a Used Programming Language Salesman, and followed some of their references: Comprehending Monads, Guarded Recursive Datatype Constructors, and Generalized Algebraic Data Types and Object-Oriented Programming. Thus fortified, I was ready to write some tests:


public class TestMonad<M> where M : MonadImpl<M>, new() {
    [Test]
    public void leftUnit() {
        var a = new Object();
        Assert.AreEqual(a, Monad.Star(Monad.Unit<Object, M>(() => a), b => Monad.Unit<Object, M>(() => b)).Just());
    }

    [Test]
    public void rightUnit() {
        var a = new Object();
        var m = Monad.Unit<Object, M>(() => a);
        Assert.AreEqual(
            Monad.Just(m),
            Monad.Just(
                Monad.Star(m, b => Monad.Unit<Object, M>(() => b))));
    }

    public Monad<String, M> an(Object o) {
        return Monad.Unit<String, M>(() => o.ToString());
    }

    public Monad<Int32, M> bo(Object o) {
        return Monad.Unit<Int32, M>(() => o.GetHashCode());
    }

    [Test]
    public void associative() {
        var m = Monad.Unit<Object, M>(() => new Object());
        Assert.AreEqual(
            Monad.Just(
                Monad.Star(
                    m,
                    a => Monad.Star(an(a), b => bo((String)b)))),
            Monad.Just(
                Monad.Star(
                    Monad.Star(m, a => an(a)),
                    b => bo(b))));
    }
}


I'm following the definitions from Monads for Functional Programming here: a Monad is a triple <M, Unit, ★> where M is a type constructor (generic type), Unit is a function taking a computation of type a to M a, and ★ takes an M a together with a function from a to M b and produces an M b.
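The three tests above check the usual monad laws from that paper. Writing ★ infix, they are:

% The monad laws exercised by leftUnit, rightUnit, and associative:
\begin{align*}
  \mathit{unit}\;a \star f                &= f\;a                                && \text{(left unit)}\\
  m \star (\lambda a.\; \mathit{unit}\;a) &= m                                   && \text{(right unit)}\\
  (m \star f) \star g                     &= m \star (\lambda a.\; f\;a \star g) && \text{(associativity)}
\end{align*}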

My use of the name "Just" here is regrettable. The Maybe monad's type constructor is conventionally defined as a simple union: Maybe a = Nothing | Just a. So, by convention, Just a takes a computation of arbitrary type a to a monad representing that computation. My function "Just" is the inverse, taking a Monad to the computation it represents. I gather that the primitive monad operators are normally encapsulated so that control of side-effects is well localized/modularized. Usually, function composition on monad values is accomplished by ★, so it's not often necessary to explicitly extract raw computations from monads. Finally, monads seem to be most popular within the Haskell community, and pattern-matching is pretty ubiquitous in Haskell programs. I think these reasons explain why (as far as I know) there's not much use and no standard name for the operation I call "Just" in my code.

Having translated the mathematical and Haskell definitions of monad into C#, I was ready to check that my translation could extend to an example application shown elsewhere:


[TestFixture]
public class TestMaybeMonad : TestMonad<MaybeMonadImpl> {
    private Monad<String, MaybeMonadImpl> lookup(string name, Dictionary<String, Monad<String, MaybeMonadImpl>> db) {
        Monad<String, MaybeMonadImpl> result;
        db.TryGetValue(name, out result);
        return result;
    }

    [Test]
    public void MailSystem() {
        //based very loosely on haskell.org's example
        // (I don't understand why their mplus is relevant)
        var fullnamedb = new Dictionary<String, Monad<String, MaybeMonadImpl>>();
        var nicknamedb = new Dictionary<String, Monad<String, MaybeMonadImpl>>();
        fullnamedb.Add("nicholas", Monad.Unit<String, MaybeMonadImpl>("n@m.com"));
        nicknamedb.Add("nick", Monad.Unit<String, MaybeMonadImpl>("nicholas"));
        var mnick = Monad.Unit<String, MaybeMonadImpl>("nick");
        Assert.AreEqual(
            "n@m.com",
            Monad.Star(
                Monad.Star(mnick, nick => lookup(nick, nicknamedb)),
                name => lookup(name, fullnamedb)).Just());
    }
}


And finally, I can restate andand in terms of monads:

private class D {}
private class C {
    public D d;
}
private class B {
    public C c;
}
private class A {
    public B b;
}

[Test]
public void andand() {
    A nil = null;
    A a = new A();
    A ab = new A() { b = new B() };
    Assert.IsNull(MaybeMonad<D>.Andand<A>(() => null));
    Assert.IsNull(MaybeMonad<D>.Andand(() => (A)null));
    Assert.AreEqual(a, MaybeMonad<D>.Andand(() => a));
    Assert.IsNull(MaybeMonad<D>.Andand(() => a.b));
    Assert.AreEqual(ab.b, MaybeMonad<D>.Andand(() => ab.b));
    Assert.IsNull(MaybeMonad<D>.Andand(() => ab.b.c));
    Func<Object> x = null;
    Func<Func<Object>> y = null;
    y = () => null;
    Assert.IsNull(MaybeMonad<D>.Andand(() => y()()));
    Assert.IsNull(MaybeMonad<D>.Andand(() => a.b.c));
    Assert.IsNull(MaybeMonad<D>.Andand(() => a.b.c.d));
}


OK then, let's see how this is implemented:

public class MonadImpl<M> where M : MonadImpl<M>, new() {
    public virtual Monad<B, M> Star<A, B>(Monad<A, M> ma, Func<A, Monad<B, M>> f) {
        return f(ma.Just());
    }
}

public class Monad<T, M> where M : MonadImpl<M>, new() {
    public M impl = new M();
    public T Just() {
        return a();
    }
    internal protected Func<T> a;
    public Monad<B, M> Star<B>(Func<T, Monad<B, M>> f) {
        return impl.Star<T, B>(this, f);
    }
    public static Monad<A, M> Unit<A>(A a) {
        return new Monad<A, M>() { a = () => a };
    }
}


A monad represents a computation of arbitrary type and some sort of side-effect whose control motivates the use of monads. Wrapping and unwrapping the computation associated with the monad is a straightforward example of parametric polymorphism. Slightly more interesting is the recursive nature of the Monad type, whose method signatures involve the type on which they are defined. Simple parametric polymorphism permits us to define methods that behave uniformly on values of any type. If we want to make use of non-universal structure on those values, we can restrict the values to a range of types that share the structure we need. Often, in a recursively-defined type, we want this range of types to itself involve a type parameter. For example, in this case we want Star to take a Monad and to return a Monad of a similar type. This requires f-bounded polymorphism, which is supported by .NET's generics; C#'s where clause not only binds type parameters, but can (re)use those parameters in the bounds.
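Here's the f-bounded pattern stripped of the monad machinery (illustrative names of my own):

// F-bounded polymorphism: the type parameter is bounded by a type that
// mentions the parameter itself, so Twin() can promise to return the
// derived type rather than the base type.
public abstract class Node<TSelf> where TSelf : Node<TSelf> {
    public abstract TSelf Twin();
}

public class Leaf : Node<Leaf> {
    public override Leaf Twin() { return new Leaf(); }
}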

Managing the side-effects is less straightforward. Here we want uniform packing/unpacking behavior across all computations, but side-effect management behavior that is peculiar to the kind of side-effect captured by the monad. We've had some support for behavioral refinement that follows subtyping relationships since the beginning of .NET, with virtual methods. But the Star function takes a monad (with behavior b and computation c) and a function from c to d and returns a monad with behavior b and computation d. It builds a value whose type is merely related and not identical to any of its type arguments. This is characteristic of Guarded Recursive Datatype Constructors. These g.r. datatypes, also known as Generalized Algebraic Data Types (GADTs), are not yet well-supported in .NET. Fortunately, the work-around shown above is sufficient for seeing the connection between monads and andand.

The Maybe monad represents "just" a computation, or "nothing." The monad structure makes it easy and convenient to treat Maybes in a uniform way, regardless of whether a particular Maybe represents "nothing" or a non-nothing computation. If you're familiar with the Null Object pattern, you'll recognize the benefit of this uniformity, and you'll see that the Maybe monad is a more general alternative to Null Object in many contexts.

So finally we're ready to see andand, implemented in terms of the MaybeMonad:

public class MaybeMonadImpl : MaybeMonadImpl<MaybeMonadImpl> {}
public class MaybeMonad<T> : Monad<T, MaybeMonadImpl> {
private static Expression andandifyme(MemberExpression me) {
var objectParameter = Expression.Parameter(me.Expression.Type, "target");
var parameters = new ParameterExpression [] {objectParameter};
//TODO: simplify by using the parameter f rather than by duplicating
// the subexpression: andandify(me.Expression) in slotValue and
// in the Just of Star of Unit below (the subexpression could occur
// solely in slotValue, and then its value could be used by
// dereferencing f in the Just of Star of Unit.
var slotValue =
andandifyinr(
Expression.Invoke(
Expression.Lambda(
Expression.MakeMemberAccess(
objectParameter,
me.Member),
parameters),
andandify(me.Expression)));
var f = Expression.Parameter(me.Expression.Type, "f");
return
Expression.Call(
typeof(Monad),
"Just",
new Type [] { me.Type, typeof(MaybeMonadImpl) },
Expression.Call(
typeof(Monad),
"Star",
new Type [] { me.Expression.Type, me.Type, typeof(MaybeMonadImpl) },
Expression.Call(
typeof(Monad),
"Unit",
new Type [] { me.Expression.Type, typeof(MaybeMonadImpl) },
andandify(me.Expression)),
Expression.Lambda(
Expression.Call(
typeof(Monad),
"Unit",
new Type [] { me.Type, typeof(MaybeMonadImpl) },
slotValue),
f)));
}
private static Expression andandifyi(InvocationExpression i) {
return
andandifyinr(
Expression.Invoke(
andandify(i.Expression),
i.Arguments));
}
private static Expression andandifyinr(InvocationExpression i) {
var l = i.Expression;
var f = Expression.Parameter(l.Type, "f");
return
Expression.Call(
typeof(Monad),
"Just",
new Type [] { i.Type, typeof(MaybeMonadImpl) },
Expression.Call(
typeof(Monad),
"Star",
new Type [] { l.Type, i.Type, typeof(MaybeMonadImpl) },
Expression.Call(
typeof(Monad),
"Unit",
new Type [] { l.Type, typeof(MaybeMonadImpl) },
l),
Expression.Lambda(
Expression.Call(
typeof(Monad),
"Unit",
new Type [] { i.Type, typeof(MaybeMonadImpl) },
i),
f)));
}
private static LambdaExpression andandifyl(LambdaExpression l) {
return
Expression.Lambda(
andandify(l.Body),
Enumerable.Select(
l.Parameters,
p => (ParameterExpression)andandify(p)).ToArray());
}
private static Expression andandify(Expression expr) {
var me = expr as MemberExpression;
if (me != null) {
return andandifyme(me);
}
var i = expr as InvocationExpression;
if (i != null) {
return andandifyi(i);
}
var l = expr as LambdaExpression;
if (l != null) {
return andandifyl(l);
}
return expr;
}
public static U Andand<U>(Expression<Func<U>> expr) {
var aexpr = (Expression<Func<U>>)andandify(expr);
try {
return (U)aexpr.Compile().DynamicInvoke();
}
catch (Exception e) {
throw new Exception(
String.Format(
"trouble evaluating {0}",
aexpr),
e);
}
}
}


By conflating null and Maybe's Nothing, we preserve the simple approach the Identity monad can use for Star: just unwrap the computation from the input monad and wrap it in the output monad. This approach doesn't work in general, because there might be state associated with the monads that we'd need to migrate.
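The MaybeMonadImpl<M> base class isn't shown here, but given that conflation, its Star override presumably looks something like this sketch (my reconstruction, not the actual Potpourri code):

// A sketch of a Star that short-circuits when the wrapped computation
// yields null, i.e. Nothing. The conflation only makes sense for
// reference types, where default(B) is null.
public class MaybeMonadImpl<M> : MonadImpl<M> where M : MonadImpl<M>, new() {
    public override Monad<B, M> Star<A, B>(Monad<A, M> ma, Func<A, Monad<B, M>> f) {
        var a = ma.Just();
        if (a == null) {
            // Nothing in, Nothing out: skip f entirely.
            return Monad<B, M>.Unit<B>(default(B));
        }
        return f(a);
    }
}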

Now, comparing the direct implementation to the monadic implementation, you might be ready to decide that monads are theoretically interesting but impractical. I think that, on one hand, the more stateful monads benefit more fully from the paradigm than the Maybe monad does; on the other hand, C# exaggerates the ceremony of monad implementation relative to Haskell. Microsoft hasn't highlighted their use of monads in LINQ or EF or PLINQ, and they haven't exposed any analog to my Monad<T, M> type, so I couldn't reuse their implementation for andand. In summary, I don't think the monadic implementation of andand carries its weight, but if I saw an opportunity to benefit from other kinds of monads in my program, the benefits might easily outweigh the amortized costs of monads.





Thursday, June 4, 2009

Andand.NET sans monads

I motivated this material in my previous post.

One way to view andand is as a syntactic transformation of the code: the effect we want is to write

a.b.c.d

and have that automatically translated to

(a != null && a.b != null && a.b.c != null) ? a.b.c.d : null


C# doesn't support syntactic abstraction in the style of Lisp macros, but later in this post we'll see that recent versions of C# give us much more flexibility than we had in 2.0 and earlier.

Before we look at the techniques in detail, I want to acknowledge that syntactic regularity is a key motivation for Ruby's Object#andand, which uses infix "andand" in imitation of the built-in guarded assignment operator &&=. We have no such operator in C#, and I am a prefix sympathizer anyway, so neither of my andand implementations shares this emphasis with the Ruby inspiration.

So, my first implementation of andand for C# used LINQ expression tree transformation. The first step is to gain the ability to manipulate the program defined by the source code whose meaning I want to reinterpret (e.g., the compound member access expression a.b.c.d shown above):

public static T Andand<T>(Expression<Func<T>> f) {
    try {
        f = (Expression<Func<T>>)aar(f);
        return f.Compile()();
    }
    catch (Exception e) {
        throw new Exception(String.Format("trouble with {0}", f), e);
    }
}

One of the main ideas in LINQ is type-directed expression quoting. Here, I've written a function that takes an Expression tree as an argument. If I apply this function to a lambda expression, the compiler will translate the lambda to a value representing the abstract syntax tree of the lambda rather than generating an anonymous delegate. So, a typical call site for Andand looks like this:


private class D {}
private class C {
    public D d;
}
private class B {
    public C c;
}
private class A {
    public B b;
}

A ab = new A();
ab.b = new B();
Assert.AreEqual(ab.b, Andand(() => ab.b));
Assert.IsNull(Andand(() => ab.b.c.d));

Once I have my expression tree, I'm ready for a transformation. Clearly from my starting example, member access expressions constitute an important case. I'd like to be able to handle some other cases though, like method chaining. I'll give you a taste of tree inspection and construction here; for complete details see andand-sans-monads at googlecode.


private static Expression aar(Expression e) {
    var me = e as MemberExpression;
    if (me != null) {
        return
            guard(me.Expression, me);
    }
    var l = e as LambdaExpression;
    if (l != null) {
        return
            Expression.Lambda(
                guard(
                    l.Body,
                    l.Body),
                Enumerable.Select(
                    l.Parameters,
                    p => p).ToArray());
    }
    var ie = e as InvocationExpression;
    if (ie != null) {
        var args =
            Enumerable.Select(
                ie.Arguments,
                (arg, i) => Expression.Parameter(arg.Type, String.Format("arg{0}", i))).ToArray();
        return
            Expression.Invoke(
                guard(
                    ie.Expression,
                    Expression.Lambda(
                        nullExpr(ie.Expression.Type),
                        args),
                    Expression.Lambda(
                        ie.Expression,
                        args)),
                Enumerable.Select(
                    ie.Arguments,
                    arg => aar(arg)));
    }
    return e;
}


It sure is a lot of work to construct expression trees "by hand" as opposed to using the type-directed quoting facility. I could be a lot more concise with Lisp quasiquote. Maybe later we'll experiment with using type-directed quotation of expression tree transformations, but for now I was more interested in exploring the connection between andand and monads.

But I have to work tomorrow morning and I'd like you to be able to read this now, so I'll save the monad and GADT discussion for my next post in this series.





Sunday, May 31, 2009

motivation for Andand.NET

Some of my followers on twitter encouraged me to post about my andand implementation, so here is the first in a series.

In brief: I wanted to replace

if (a != null && a.b != null && a.b.c != null) {
    return a.b.c.d;
}
else {
    return null;
}

by

return andand(a.b.c.d);

and I managed to learn a little bit about monads, generalized algebraic data types, and .NET on the way to achieving

return andand(() => a.b.c.d)


About a month ago I was working with an instance of a generic type:

var page = Potpourri.Page.getPage(s, count.Value);
ViewData["reverse-memento"] = page.reverseMemento.itemNumber;

This code obtains a range of rows from a database given a starting position and a window size. Simple integer offsets don't work for the general case of concurrent access by lay users to a mutable table. My approach requires getPage to know an equivalence relation on elements, but it doesn't require that getPage know about itemNumber specifically or even that there be just one field/property that's sufficient to determine equality.

So, AFAICT, there's no good way to avoid violating the Law of Demeter here.

Certainly there remains a practical problem that the next and previous page references could be undefined for any given page. In this case, reverseMemento could be null.

The awkward conditional needed to avoid NullReferenceException here brought to mind Object#andand. Informally, andand provides short-circuit behavior for chained accesses. The Ruby andand relies on Ruby metaprogramming features. The closest thing we have to a macro facility in C# is LINQ expression trees, and I hadn't had an excuse to use System.Linq.Expressions yet, so this seemed like a good opportunity.

My first implementation directly transformed expression trees, replacing member accesses by conditional member accesses that return null whenever a is null in a.b.
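The effect of that rewrite can be sketched in plain C# (a hypothetical helper of my own, just to show the shape; the real implementation manipulates expression trees at runtime):

using System;

static class AndandShape {
    // The guarded access that conceptually replaces every member access x.y:
    // if the target is null, skip the access and yield null.
    static TResult Guarded<TTarget, TResult>(TTarget target, Func<TTarget, TResult> access)
        where TTarget : class
        where TResult : class {
        return target == null ? null : access(target);
    }

    // So a.b.c is rewritten, in effect, to
    //   Guarded(Guarded(a, x => x.b), x => x.c)
    // which never dereferences a null target.
}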

My second implementation was inspired by a question about the relation between andand and the Maybe monad. I thought it would be a good idea to (re)read some papers about monads and test my understanding by implementing a few, including the Maybe monad. I also thought it would be instructive to implement the Maybe monad in terms of andand, and andand in terms of the Maybe monad.

I'll discuss the code in detail in my next post. For now, you can download a snapshot. I guess I'll migrate my existing hg repo to googlecode in time for the next post to make browsing easier.

Monday, May 18, 2009

.emacs

; Some people I know have asked about my .emacs. Voilà:

(custom-set-faces
;; custom-set-faces was added by Custom.
;; If you edit it by hand, you could mess it up, so be careful.
;; Your init file should contain only one such instance.
;; If there is more than one, they won't work right.
)

(setq default-buffer-file-coding-system 'unix)

(setenv "PATH" (concat "c:/cygwin-1.7/bin;" (getenv "PATH")))
(setq exec-path (cons "c:/cygwin-1.7/bin/" exec-path))

(require 'cygwin-mount)
(cygwin-mount-activate)

(add-to-list 'load-path "/cygdrive/c/program-files/emacs-22.3/site-lisp")

(add-to-list 'auto-mode-alist '("\\.cs$" . java-mode))
(add-hook 'java-mode-hook
(lambda ()
(setq indent-tabs-mode nil)))

(autoload 'js2-mode "js2" nil t)
(add-to-list 'auto-mode-alist '("\\.js$" . js2-mode))
(add-hook 'js2-mode-hook
(lambda ()
(setq indent-tabs-mode nil)))

(autoload 'python-mode "python-mode" "Python Mode." t)
(add-to-list 'auto-mode-alist '("\\.py\\'" . python-mode))
(add-to-list 'interpreter-mode-alist '("python" . python-mode))
(add-hook 'python-mode-hook
(lambda ()
(set (make-variable-buffer-local 'beginning-of-defun-function)
'py-beginning-of-def-or-class)
(setq outline-regexp "def\\|class ")
(setq indent-tabs-mode nil)))

(load (concat "/cygdrive/c/program-files/emacs-22.3/site-lisp/" "nxml-mode-20041004/rng-auto.el"))
(add-to-list 'auto-mode-alist '("\\.\\(xml\\|xsl\\|rng\\|xhtml\\)\\'" . nxml-mode))
(add-hook 'nxml-mode-hook
(lambda ()
(setq indent-tabs-mode nil)))
(setq magic-mode-alist
(cons '("<\\?xml" . nxml-mode)
magic-mode-alist))

(setq binary-process-input t)
(setq w32-quote-process-args ?\")
(setq shell-file-name "bash") ;; or sh if you rename your bash executable to sh.
(setenv "SHELL" shell-file-name)
;; For subprocesses invoked via the shell
;; (e.g., "shell -c command")
(setq explicit-shell-file-name shell-file-name)
(setq explicit-sh-args '("-login" "-i"))

(put 'erase-buffer 'disabled nil)