Thursday, April 22, 2010

Palindromic Pangrams

This post details a solution to a now-retired hiring problem. If you are a graph-search wiz, you will find yourself skimming; I assume readers either haven't done this before or have mostly forgotten it. Even experts might be interested in the pathology, so hopefully there's something here for everyone to enjoy.

I've been unusually quiet lately because I'm in the midst of a truly marathon interview cycle. The things I'm working on probably aren't interesting to other people, and they're taking up all of my time. But before this began I spent some time working on a hiring puzzle that I think is blog-worthy.

I couldn't share the details of my approach at the time that I was actively working on the problem, because that would hurt the company and other candidates by raising the question of plagiarism. In fact, ITA removed the problem I was working on from their web site, and when I wrote their recruiter to ask why, I found out it was because somebody else wrote about it! Now it seems the rumor that Google is acquiring ITA is confirmed and some people are talking about other hiring problems. So I'm doubly sure it's OK to post this now.

First, the problem specification:

A palindrome is a sequence of words like "lid off a daffodil" or "shallot ayatollahs" that uses the same letters reading backwards as forwards. The words need not form a meaningful or grammatical sentence. A palindromic pangram is a multi-word palindrome that includes all 26 letters of the alphabet. Find the shortest sequence of words that is both a pangram and a palindrome.

That sounded like something that could occupy me for a few hours. Of course ITA would have chosen the dictionary so that a brute-force approach wouldn't be fast enough, but with a reasonable choice of data structures and algorithms, it shouldn't be too hard. Writing up my solution would require lots of care; I imagined it would be carefully code-reviewed to see if I was worth talking to, in which case I'd spend most of the interview defending my solution and talking about how it could be improved. Maybe several days to edit for clarity and efficiency and so on. I wound up spending weeks trying, with some success, to make it run faster. Besides learning some Python, gaining some humility, and having an excuse to actually do things I'd previously only read about, I learned this lesson: the point of solving a puzzle is to GET A PHONE SCREEN. That's all. It was fun, but I won't repeat the mistake of undertaking substantial work without feedback.

If you want to play along at home, pause reading here and resume when you're happy with your own solution to compare notes.

Enough introduction. Let me show you some code along with commentary based on my notes. I started with a little brainstorming. I'll give you the main points first and then give a more thorough discussion later.

A quick note on optimization terminology: an optimization problem is normally posed with a set of constraints and an objective function. For example, the constraints for this problem specify that we are only interested in pangrams, and we are only interested in palindromes. We call solutions that satisfy the constraints feasible. The objective function grades the goodness of feasible solutions. For example, in this case our objective function is phrase length. An optimal solution is a feasible solution for which no other feasible solution is scored better by the objective function.
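    
To make the terminology concrete for this puzzle, here is a minimal sketch; it relies on the palindromeP and pangramP predicates defined later in this post.

def feasibleP(words):
    # a candidate is feasible when it satisfies both constraints
    return pangramP(words) and palindromeP(words)

def objective(words):
    # smaller is better: total letters, ignoring spaces
    return len(''.join(words))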

First observation: For each letter, some word in our pangram must contain that letter.

First idea: beam stack search word-sets.

Elaboration: Here I'm considering pangram formation as a search problem in the space of words organized by letter inclusion. Each dictionary word is a child of the empty root node. Each node has the complement of dictionary words represented by its (improper) ancestors as children. Assuming a small constant bound c on word length, classifying words will cost |words| * 26 * c = O(n). Among sibling nodes, I can prioritize words containing the rarest letter in order to avoid unfruitful branches. Even so, the six-figure branching factor gives a very big search space.
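
Here is a sketch of the sibling-ordering idea. It is my own illustration rather than the code I ended up with: words stands for whatever remains of the dictionary, and covered for the set of letters already present in the partial phrase.

import collections

def children_by_rarest_letter(words, covered):
    # count, for each letter, how many words contain it
    letter_freq = collections.Counter(l for w in words for l in set(w))
    needed = [l for l in letter_freq if l not in covered]
    if not needed:
        return []
    # expand words containing the rarest still-uncovered letter first
    rarest = min(needed, key=lambda l: letter_freq[l])
    return [w for w in words if rarest in w]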

Second observation: For each prefix p of the letter-sequence of the pangram, the reverse of p must exist in some concatenation of words.

Second idea: use a trie for reversed words to help prune the search for palindromes. The overall search tree has min(n, 26^c)! nodes, so let's only build tries for feasible word-sets.

We need a formal definition of our goal:


def palindromeP(seq):
""" Characteristic fn for palindromes.

>>> palindromeP(['a'])
True

>>> palindromeP(['a', 'b', 'a'])
True

>>> palindromeP(['ab', 'c', 'ba'])
True

>>> palindromeP(['ab', 'c', 'ab'])
False
"""
return (lambda p: p[:int(len(p)/2)] == (p[int(math.ceil(len(p)/2.0)):])[::-1])(''.join(seq))

For those who don't use Python: the triple quotes delimit a docstring, and what looks like a top-level transcript is a doctest specification showing inputs and expected outputs. The code just says that we are concerned with the concatenation of the given sequence: when the first half matches the reverse of the second half, we have a palindrome.
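
For instance, the two example phrases from the problem statement check out (a quick doctest-style illustration):

>>> palindromeP(['lid', 'off', 'a', 'daffodil'])
True

>>> palindromeP(['shallot', 'ayatollahs'])
True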

The definition of pangram depends on the letters that comprise our candidate solution:


class Histogram:
def __init__(self, seq):
self.h = collections.defaultdict(set)
for s in seq:
for i in s:
k = i.lower()
self.h[k].add(s)
def __len__(self):
return len(self.h)
def __getitem__(self, key):
return self.h[key]
def __iter__(self):
if (not hasattr(self, 'ordered_h')):
self.ordered_h = sorted(self.h.items(), key = lambda xy: -len(xy[1]))
return self.ordered_h.__iter__()
def top(self):
for x in self.__iter__():
return x[1]
def __repr__(self):
return '(Histogram %s)' % self.h.__repr__()

The histogram is a glorified dictionary mapping letters to words. It participates in Python's idiomatic iteration protocol by partially ordering its values by overall letter frequency.
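
A quick usage sketch (my own example): the letter common to the most words comes first in iteration order, so top() returns the set of words containing it.

>>> h = Histogram(['abc', 'bcd', 'cde'])
>>> sorted(h.top())
['abc', 'bcd', 'cde']

>>> len(h)  # number of distinct letters seen
5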


def pangramP(seq):
"""
Characteristic fn for pangrams.
(assumes the alphabet is limited to A-Za-z)

>>> pangramP([''])
False

>>> pangramP(['a'])
False

>>> pangramP([string.ascii_lowercase])
True

>>> pangramP(dict((x, x) for x in string.ascii_lowercase))
True
"""
if isinstance(seq, collections.KeysView):
h = seq
elif isinstance(seq, str):
h = set(seq)
elif isinstance(seq, collections.Mapping):
h = seq.keys()
elif isinstance(seq, Histogram):
h = seq
else:
h = Histogram(seq)
return len(h) == 26

Now, let's look at the search implementation. I've characterized the problem in terms of tree search, so I need a representation for nodes:


class WordSetNode:
def __init__(self, word, parent = None):
if not hasattr(word, '__iter__'):
word = str(word)
self.word = word
self.parent = parent
self.covered = dict.fromkeys(word)
if parent:
self.covered.update(self.parent.covered)
def coveredP(self, letter):
return letter in self.covered
def pangramP(self):
return pangramP(self.covered.keys())
def ancestors(self):
n = self
while (n):
yield n
n = n.parent
def phrase(self):
return reversed([n.word for n in self.ancestors() if n.word])
def phrase_string_ns(self):
if not hasattr(self, 'phrase_string_ns_cache'):
self.phrase_string_ns_cache = ''.join(self.phrase())
return self.phrase_string_ns_cache
def __repr__(self):
if not self.parent:
return "WordSetNode('%s')" % self.word
return "WordSetNode('%s', %s)" % (self.word, repr(self.parent))

The first thing you might notice is that I have backlinks from children to parents, and a very odd thing you'll notice is that I have no edges from parents to children! I'm not planning to build a tree and then search it for solutions. I want to build my tree on-demand in order to avoid needless computation and limit my memory footprint. Basically, I move from parents to children by generating nodes rather than by following a path through some pre-existing structure. The backlinks are there for convenience.
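
Stripped of all the pruning machinery described below, the expansion pattern looks roughly like this (a deliberately naive sketch; the real expand method appears later):

def naive_expand(node, dictionary):
    # children exist only while this generator is being consumed;
    # nothing is ever stored on the parent
    used = set(a.word for a in node.ancestors())
    for w in dictionary:
        if w not in used:
            yield WordSetNode(w, node)

root = WordSetNode('a')
children = list(naive_expand(root, ['a', 'lid', 'off']))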

As in the sketch above, the tree search is a quest for pangrams having the palindrome property, so nodes are viewed as contributing letter-coverage and paths through the tree represent words in the overall pangram. I cache information about the path to each node along the way.

So, about those pangrams: my idea for pruning infeasible word-sequences involved prefix trees ("tries"). This is the first time I've used them in real life, so I will remind you that prefix trees are all about paths; the element data aren't associated with particular nodes but rather with paths from the root. It's very efficient for queries of the form: which elements have x as a prefix? In this case, my trie nodes will represent letters; the trie itself represents a collection of letter-sequences:


class Trie:
def __init__(self, seq = []):
if type(seq) == type(''):
seq = [seq]
self.terminus = False
self.children = {}
for i in seq:
self.add(i)
def emptyP(self):
return not self.children
def match(self, word):
"""Find the deepest subtrie representing
a prefix of word, and the remaining suffix.

>>> Trie().match('')
(Trie(), '')

>>> Trie().match('abc')
(Trie(), 'abc')

>>> Trie('abc').match('')
(Trie(['abc']), '')

>>> Trie('abc').match('a')
(Trie(['bc']), '')

>>> Trie('abc').match('aac')
(Trie(['bc']), 'ac')

>>> Trie(['abc', 'aac']).match('achtung')
(Trie(['ac', 'bc']), 'chtung')

>>> Trie(['abc', 'aac']).match('aardvark')
(Trie(['c']), 'rdvark')
"""

if not word:
return (self, word)
if not word[0] in self.children:
return (self, word)
return self.children[word[0]].match(word[1:])
def add(self, word):
if not word:
self.terminus = True
return
subtree, remainder = self.match(word)
if not remainder:
# mark the word if it's not already marked
return subtree.add('')
subtree.children[remainder[0]] = Trie([remainder[1:]])
def search(self, w):
quarry, remnant = self.match(w)
if remnant:
return None
return quarry
def traverse(self, preorderVisit, postorderVisit, prefix = ''):
preorderVisit(self, prefix)
for c in self.children.items():
c[1].traverse(preorderVisit, postorderVisit, prefix + c[0])
postorderVisit(self, prefix)
def pre_order_nodes(self, ancestry = None):
if ancestry is None:
ancestry = []
yield (self)
for c in self.children.items():
for i in c[1].pre_order_nodes():
yield i
def words(self, prefix = ''):
if self.terminus:
yield prefix
for c in self.children.items():
for l in c[1].words(prefix + c[0]):
yield l
def __repr__(self):
W = ', '.join(repr(w) for w in sorted(self.words()))
if W:
return "Trie([%s])" % W
else:
return 'Trie()'


a trie representing {a, daffodil, lid, lug, off, offset}


As I explore my overall search tree, building up a sequence of words, I want to ensure that my sequence has palindromic symmetry. I can do this by querying a trie filled with reversed words for my current search-tree path.

The basic idea seems OK, but I haven't addressed word segmentation within the pangram. We disregard whitespace in the problem definition, so we can't limit our search to pangrams with word-length symmetry. We've got to account for "massive … sam" wherein we could reach a terminal node of the prefix trie of reversed words before exhausting letters in the current search path.


def loopyTrieSearch(t):
"""Produce a function that searches
a trie augmented with backedges from
each leaf to the root.
>>> t = Trie()
>>> s = loopyTrieSearch(t)
>>> s(t, 'unmatchable')
>>> t = Trie(['abc', 'aac', 'b'])
>>> s = loopyTrieSearch(t)
>>> s(t, '')
[Trie(['aac', 'abc', 'b'])]

>>> s(t, 'a')
[Trie(['ac', 'bc'])]

>>> s(t, 'ba')
[Trie(['ac', 'bc'])]

>>> s(functools.reduce(Trie.merge, s(t, 'b')), 'ab')
[Trie(['c'])]

>>> s(t, 'az')
>>> t = Trie(['ab', 'abc', 'd'])
>>> s = loopyTrieSearch(t)
>>> s(t, 'abd')
[Trie([''])]

>>> s(t, 'azzz')
"""
def s(t2, w):
ctx = t2 if isinstance(t2, collections.MutableSequence) else [t2]
if w == '':
return ctx
for l in w:
for i in ctx:
if i.terminus:
ctx.append(t)
break
ctx = [j[0] for j in (i.match(l) for i in ctx) if j[1] == '']
if not ctx:
return None
return ctx
return s
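
To make the word-boundary looping concrete, here is a toy run (my own example; I only assert truthiness because the return value is a list of subtries):

# reversed forms of the toy dictionary {'ab', 'b', 'a'}
rev = Trie([w[::-1] for w in ['ab', 'b', 'a']])
s = loopyTrieSearch(rev)
# 'ab' can be mirrored on the right (the right half could end '... b a'),
# crossing a word boundary after the 'a', so we get a context back
assert s(rev, 'ab')
# no reversed word can supply the trailing 'c', so this path is infeasible
assert s(rev, 'abc') is None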

From these ideas, I built a program that ran very nicely on a contrived, tiny dictionary, but when I ran it on the 200k-word ITA dictionary, my CPU fan revved to 100% and stayed there indefinitely. I should show some profiling data here.

How could I build my search tree more cheaply? Initially, to generate a node's children, I would take my letter-coverage dictionary's complement to see what letters I needed. I'd find all the words that could contribute a needed letter. For each word, I'd use my loopyTrieSearch. This tactic does avoid exploring infeasible branches, but it involves a lot of unsuccessful trie queries. What if I started by loopyTrieSearching the prospective parent's path and then considered the loopy suffixes as possible words containing a needed letter?

Consider a trie pair (A, B) with A containing "awk" and "awkward" and B containing only "awesome." The naive approach requires us to conduct separate trie searches for "awk" and "awkward," both times stopping when we fail to match the ks. The improved approach allows us to simultaneously eliminate both words as soon as we find that "aw" has no child "k." If the average branching factor in the trie over all words (26 at most) is less than the average number of words associated with each letter (200000/26), then the latter could be a superior tactic. This is the purpose of the Trie.filter method:


class Trie:
# ...
def filter(self, criteria, root = None):
"""Return the words in this trie that
match the augmentation of criteria by
the addition of backedges from terminals
to the root.

>>> sorted(w for w in d if s(T, w)) == [w for w, i in itertools.groupby(sorted(FT.filter(T)))]
True
"""
def f(words, crit, prefix = ''):
if words.terminus:
yield prefix
if words.emptyP():
return
backedge = root.children.items() if crit.terminus else ()
for c in itertools.chain(crit.children.items(), backedge):
m = words.match(c[0])
if m[1] != '': continue
for w in f(m[0], c[1], prefix + c[0]):
yield w
if not root: root = criteria
for i in f(self, criteria):
yield i
@staticmethod
def merge(a, b):
"""Merge two tries.
>>> Trie.merge(Trie(['a']), Trie(['b']))
Trie(['a', 'b'])

>>> Trie.merge(Trie(['abc']), Trie(['bcd']))
Trie(['abc', 'bcd'])

>>> Trie.merge(Trie(['abc', 'aac']), Trie(['abe']))
Trie(['aac', 'abc', 'abe'])
"""
result = Trie()
result.children = a.children.copy()
for k, v in b.children.items():
if k in result.children:
result.children[k] = Trie.merge(v, result.children[k])
else:
result.children[k] = v
result.terminus = a.terminus or b.terminus
return result

For this, I'll use a pair of tries, one reverse and one forward. My search path corresponds to paths in each of those tries; as I query in the trie pair I need never pursue a branch that's available in only one of the tries.


class TriePair:
def __init__(self, W, Wr = None):
if not Wr: Wr = W
self.fwd = W if isinstance(W, Trie) else Trie(W)
self.rev = Wr if isinstance(Wr, Trie) else Trie(w[::-1] for w in Wr)
self.s = loopyTrieSearch(self.rev)
def expand(self, node, advisoryBudget = None, peek = True):
self.setupContinuation(node)
if not node.trieContinuation: return
if peek and node.pangramP() and not node.parent.pangramP():
# don't peek in the recursive expansion
for i in self.completePalindrome(node, expand = functools.partial(self.expand, peek = False)):
yield i
W = itertools.chain(*(self.fwd.filter(c, self.rev) for c in node.trieContinuation))
awords = set(a.word for a in node.ancestors())
for w in set(W):
if w in awords: continue
n = WordSetNode(w, node)
# if pangram and \exists a word matching some trie
# path to a leaf -> generate the easy palindrome
n.setupContinuation = self.setupContinuation
yield n

I mentioned earlier that the search tree is constructed during the search process. The expand function does that work, producing the child nodes of each parent. I'll explain peeking later. It would be possible to query our search tries for the string corresponding to our path whenever we expand a node, but then each descendant must redo its ancestors' trie-search work. So, we copy the parent's search context into a cache (trieContinuation) on each child in order to avoid that extra work:


class TriePair:
# ...
def setupContinuation(self, n):
if hasattr(n, 'trieContinuation'):
return
elif self == n and hasattr(n, 'setupContinuation'):
n.setupContinuation(n)
elif n.parent and hasattr(n.parent, 'trieContinuation'):
n.trieContinuation = (
self.s(n.parent.trieContinuation,
n.word))
else:
n.trieContinuation = (
self.s(self.rev,
n.phrase_string_ns()))

Let's take a closer look at beam-stack search.

Recall that depth-first search always expands the deepest unexplored node, while breadth-first search always expands the shallowest unexplored node. Depth-first search uses memory proportional to the longest path to a leaf; breadth-first search uses memory proportional to the size of the largest layer, O(b^d), where d is the depth of the deepest layer and b is the branching factor at each node. For many problems, the better of two feasible solutions is the one closer to the root, and for these problems breadth-first search finds good solutions faster than depth-first search (which can pay an infinite penalty for unlucky early expansions).
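
For a sense of scale (illustrative numbers of my own): with branching factor b = 10 and depth d = 5, depth-first search keeps a path of about d = 5 nodes (plus their untried siblings) in memory, while the deepest breadth-first layer alone holds b^d = 100,000 nodes.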

Best-first search algorithms take an intermediate course between depth-first and breadth-first search. They choose which node to expand next by applying a heuristic. A* is a well-known such algorithm, and it's the basis for beam-stack search. The exploration order (denoted f(n) = g(n) + h(n)) takes into account both the (known) cost of each node's path from the root (g(n)) and the (estimated) cost of its path to a solution (h(n)). The estimation must be conservative in the sense that it never overestimates a potential solution's cost; this allows admissible pruning: once we've found a solution, we can safely avoid expanding any subtrees with higher cost estimates.
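
For contrast with the beam-stack variant that follows, here is a textbook-style sketch of best-first expansion order; the expand, g, h, and goalP callables are assumed to be supplied by the caller, and nothing here is specific to this puzzle.

import heapq

def best_first(start, expand, g, h, goalP):
    f = lambda n: g(n) + h(n)
    frontier = [(f(start), 0, start)]
    counter = 1  # tie-breaker so heapq never compares nodes directly
    while frontier:
        _, _, node = heapq.heappop(frontier)
        if goalP(node):
            return node
        for child in expand(node):
            heapq.heappush(frontier, (f(child), counter, child))
            counter += 1
    return None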

Best-first search can degenerate into breadth-first search in the worst case for memory use; beam-stack search addresses real-world memory constraints by enabling inadmissible pruning without compromising completeness (we will find a solution if one exists) or optimality (no solution is better than the one we find). Beam-stack search does this by keeping metadata proportional to the depth of the search that allows it to backtrack, regenerating and thoroughly exploring inadmissibly pruned nodes after it has explored the more promising portion of the graph.

Note that beam-stack search is an anytime algorithm: it strives to return a feasible solution ASAP and then to return improved solutions as they are discovered. It is also complete and optimal, so it will definitely return a solution if any exists, and the final solution it returns is guaranteed to be at least as good as any that exist.

The following graphs illustrate the progress of a beam-stack search on a small graph with a simple cost function and heuristic.
root node with bounds

Bounds on the f-cost of child nodes are associated with each layer. Initially, we consider all possible children. Whenever we generate a child, we check for solution feasibility.
first layer expansion

For each node, we can compute the (certain) g-cost of the partial solution and the (guessed) h-cost of the potential solution descendant of the node. In this example, we assume that g is phrase length (not including spaces) and h is the number of letters we need to add to produce a pangram. These are sensible, if naive, assumptions for the pangramic palindrome problem. But to keep this example small, we will cheat by using an easier feasibility predicate: we will accept any palindrome; pangrams are not required.

search progress

When memory is limited, we inadmissibly discard some children, updating the parent layer upper bound accordingly. If backtracking proves necessary, we can use the upper bound to regenerate only the inadmissibly discarded children, thus avoiding duplicated work. Assuming a beam width of 2 nodes, we partition the set of children by f-cost. In this case, a and lid survive while off and daffodil are pruned.
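
Concretely, with the naive g and h just described (my arithmetic; coverage counts distinct letters):

f('a')        = 1 + (26 - 1) = 26
f('lid')      = 3 + (26 - 3) = 26
f('off')      = 3 + (26 - 2) = 27
f('daffodil') = 8 + (26 - 6) = 28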

search progress

We are ready to expand the frontier.

search progress

The layer exceeds our memory threshold, so it's time to prune again.

search progress

We retain the most promising nodes and update the layer bound.

search progress

We are ready to expand the frontier.

increased memory bound

Our frontier exceeds our memory limit, but I think I've made the point about pruning, so let's double the limit to 4. Now we can expand again.

no further expansion is possible without backtracking

If the dictionary contains only the words {lid, off, a, daffodil}, then we are unable to expand beyond this point without duplicating words. Notice that we have yet to find any palindromes. Because we cannot go forward, we must backtrack.

backtrack

We backtrack to the deepest layer at which inadmissible pruning occurred (i.e., where the upper bound is finite). There, we update the bounds to show that we have fully explored the space below the upper bound; now we will consider the space above. We are ready to expand the frontier.

search progress

Inadmissible pruning works just the same after backtracking as before. To see this, let's tighten the memory limit back to two.

pruning

As ever, we remove the nodes with high f-cost.

search progress

We are ready to expand the frontier.

search progress

This layer exceeds our memory limit, so we need to prune.

pruning

We discard unpromising nodes.

search progress

We are ready to expand the frontier.

first candidate solution found

We find a feasible solution on the frontier: lidoffadaffodil. This is a proof by example of an upper bound on the cost of the optimal solution to our problem. Our heuristic is admissible, meaning that it never overestimates the cost of the best solution that uses a given node. Therefore, from now on, we can admissibly prune nodes whose f-cost exceeds the f-cost of our candidate solution. The memory limit might still cause us to do some inadmissible prunings in the future, and in that case we might need to backtrack. But admissible prunings never create an obligation to backtrack. If we continue from here, we will eventually exhaust the space between 27 (where we inadmissibly pruned some nodes) and 35 (where we have a solution) and determine that in fact we have an optimal solution.

So here are the parts to our cost function f:


def g(n):
# sort of like
# lambda n: sum(len(w.word) for w in n.ancestors())
# but with caching
if not hasattr(n, 'g'):
result = len(n.word)
if n.parent:
result += g(n.parent)
n.g = result
return n.g
def h(n):
"""Admissible heuristic estimate of cost
remaining from this node.

>>> h(WordSetNode('a')) == h(WordSetNode('aa'))
True

Though admissible, it's reasonably tight.
>>> minimalCost = 26 * 2 - 1
>>> inadmissibleMargin = 1
>>> n = WordSetNode('a')
>>> h(n) > minimalCost - inadmissibleMargin - len(''.join(n.phrase()))
True
"""
phrase_width = g
tie_breaker = .01 * len(n.word) + .01 * abs(hash(n.phrase_string_ns()))/float(sys.maxsize)
covered = len(n.covered)
completely = len(string.ascii_lowercase)
# The symmetry of the palindrome guarantees
# that the first half is a pangram. So,
# there's no use evaluating palindrome-dness
# until we have a pangram.
if covered < completely:
# descendent nodes must:
# cover the missing letters on the left
# (just to the right of n)
# match the missing letters on the right
# (with a possible exception at the pivot)
# match what we have so far
# TODO: improve this estimate by:
# (a) looking ahead in the trie for the missing
# letters
# (b) using the max_missing(min(len(word) | missing \in word))
# rather than 1 for the cost component due to
# missing letters
missing = completely - covered
# strategy b:
# this doesn't help: when letterCosts
# wins the max, we lose our ability to
# distinguish between nodes that help
# us gain coverage and nodes that do not.
#needed = max(missing + (missing - 1), max(letterCosts[i] for i in string.ascii_lowercase if not i in n.covered))
needed = missing + (missing - 1)
return phrase_width(n) + needed - tie_breaker
else:
return palindrome_balance(n) - tie_breaker

And here's how those functions are used, along with parameters for tuning memory use (beam_width), the upper bound on cost for solutions we're willing to accept (U), a predicate to control which intermediate solutions we yield, and a way to inject a stack (which is handy for resuming cancelled searches):


def bss(nodes, expand, g, h, beam_width, U = sys.maxsize, feasibleP = lambda n: False, stack = None):
"""Beam-stack search, a variation on
breadth-first branch and bound with
backtracking.

>>> [i for i in bss([], None, lambda n: 1, lambda n: 1, 10)]
[]

>>> [i for i in bss([1], lambda n: [], lambda n: 1, lambda n: 1, 10)]
[1]

>>> sorted([i for i in bss(range(15), lambda n: [], lambda n: 1, lambda n: 1, 10)])
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14]

>>> sorted([i for i in bss(range(5), intexpand, lambda n: 1, lambda n: 0, 3)])
[1, 2]

"""
def dedup(frontier):
# This is a warning rather than an error:
# some nodes really are equivalent though
# not identical. For example, spaces don't
# matter in pangramic palindromes.
frontier = sorted(frontier, key = lambda p: p[0])
groupedFrontier = (
itertools.groupby(
sorted(frontier,
key = lambda p: p[0]),
lambda p: p[0]))
frontier = []
for i in groupedFrontier:
elts = list(itertools.islice(i[1], 2))
if (len(elts) <= 1):
frontier.extend(elts)
else:
frontier.append(elts[0])
print('f needs nuance: these nodes have the same f-cost: %d = f(%s) = f(%s)' % (elts[0][0], ' '.join(elts[0][1].phrase()), ' '.join(elts[1][1].phrase())))
return sorted(frontier, key = lambda p: p[0])
def prune(frontier, beam_width):
if len(frontier) <= beam_width:
f_max = frontier[-1][0]
print('pruned nodes, new f_max = %s' % f_max)
return frontier, f_max
# We'll never backtrack to the nodes
# we're pruning, even if these nodes
# are never admissibly discarded.
# beam_width may be smaller than the largest
# equivalence class defined by f.
# Ideally, these classes should contain
# one node each; otherwise it's difficult
# to choose a beam width that will avoid
# splitting an equivalence class.
s = dedup(frontier)
retained, pruned = s[:beam_width], s[beam_width:]
return prune(retained, beam_width)

# 63: verbify ma pluck now sh ad go jo zax eta qat ex azo jog dahs wonk culpa my fib rev
# 65: buckrams donzel ti pyx of vug we jo hm suq us mho jew guv foxy pit lez nods mark cub
# 65: verbify na ducks to hm we jo zax al pig suq us gip lax azo jew mho tsk cud any fib rev
# 67: bonked vulgars ti hm oy cap wo jo zax if suq us fix azo jow pac yom hits rag luv de knob
# 69: verbify locks twang up mho zax ed ma jab suq us ba jam dex azo hm pug naw tsk coly fib rev
# 71: verbify lock nut spaz we jig am dex eh ta qua sau qat hexed magi jew zap stunk coly fib rev
# 76: drawknife butyls go hm op ox azo jo we vac suq us suq us cave wo jo zax op om hog sly tube fink ward
# 86: de but way xi sh on vug la ef pa me re ka jo ah coz om ta qat ta qat mozo chao jake remap feal guv noh six yaw tubed
# 92: drawknife butyls go hm op ox azo jo we vac suq usherettes setter eh suq us cave wo jo zax opo mho g sly tube fink ward
# 101: underflow machs ti by xi pe ka jo zag eve suq aimu umiaq use vega zo jake pixy bits h cam wolfr ed nu

f = lambda n: g(n) + h(n)
if not stack:
oldFrontier = [(f(i), i) for i in nodes]
for i in filter(lambda x: feasibleP(x[1]), oldFrontier):
U = min(U, g(i[1]))
yield i[1]
oldFrontier = [i for i in oldFrontier if i[0] < U]
stack = [[0, U, oldFrontier]]
j = 0
while stack:
f_min, f_max, oldFrontier = stack[-1]
print('f_min, f_max, level = %s, %s, %d' % (f_min, f_max, len(stack)))
if oldFrontier: print('sample phrase: %s' % ' '.join(oldFrontier[0][1].phrase()))
frontier = []
for i, o in enumerate(oldFrontier):
visit(o[1])
j += 1
if not j % 32: print('nodes expanded: %d' % j)
k = 0
for n in expand(o[1], advisoryBudget = f_max):
k += 1
# if not k % 8196: print('progress on frontier: %d' % k)
if feasibleP(n):
gn = g(n)
if gn >= U: continue
# g-values are integers, and f
# is admissible, so all the f-values
# above g - 1 have f-cost at least
# as high as g (i.e., they're no
# better than this solution).
U = gn - 1 + (1.0 / sys.maxsize)
f_max = U
yield n
oldFrontier[i + 1:] = filter(lambda p: p[0] < U, oldFrontier[i + 1:])
continue
p = (f(n), n)
if f_min < p[0] < f_max:
frontier.append(p)
if len(frontier) > beam_width * 2:
frontier, f_max = prune(frontier, beam_width)
# filter, sort, reduce equivalence classes to class representatives
prior_frontier = list(frontier)
frontier = filter(lambda p: p[0] <= f_max, frontier)
frontier = dedup(frontier)
if not frontier:
# backtrack
print('empty layer -> backtracking from %d to' % len(stack), end=' ')
while stack and stack[-1][1] >= U:
stack.pop()
print(len(stack), end=' ')
if stack:
frame = stack[-1]
frame[0], frame[1] = frame[1], U
frame[2] = list(filter(lambda p: p[0] < U, frame[2]))
print('')
else:
# no backtracking needed: record this layer's (possibly lowered) upper bound, then descend
stack[-1][1] = f_max
stack.append([0, U, frontier])

Now, a very obvious improvement can be had by noticing that we guarantee palindromes by construction. Once we attain the pangram property, it's entirely straightforward to complete the palindrome. I call that "peeking." Here's a way to take the short-cut at the appropriate moment during bss:


class TriePair:
# ...
def pivotH(self, n):
revCompletions = self.s(self.rev, n.phrase_string_ns())
completionWords = itertools.chain(*(t.words() for t in revCompletions))
return min(len(w) for w in completionWords) - (32 - len(n.word)) / 32 - abs(hash(n.phrase_string_ns())) / sys.maxsize / 64
@staticmethod
def rightH(n, expected_length):
return expected_length - sum(len(i) for i in n.phrase()) - abs(hash(n.phrase_string_ns())) / sys.maxsize
def rightExpand(self, n, match, len_left):
"""Restricted search space for quickly finding
pangramic palindrome completions.
>>> left = 'drawknife butyls go hm op ox azo jo we vac suq us'.split()
>>> right = 'suq us cave wo jo zax op om hog sly tube fink ward'.split()
>>> extraneous = ['opossum']
>>> palindrome = left + right
>>> tries = TriePair(palindrome + extraneous)
>>> all(tries.fwd.match(i)[1] == '' for i in palindrome)
True
>>> n = functools.reduce(lambda a, i: WordSetNode(i, a), left, None)
>>> match = ''.join(left)[::-1]
>>> rx = functools.partial(tries.rightExpand, match)
>>> x = list(rx(n))
>>> [i.word for i in x]
['suq']

>>> n = functools.reduce(lambda a, i: WordSetNode(i, a), itertools.chain(left, itertools.takewhile(lambda p: p != 'op', right)), None)
>>> m, r = tries.fwd.match('op')
>>> r == '' and sum(1 for i in m.words()) > 1
True

>>> [i.word for i in list(rx(n))]
['op']

>>> n = functools.reduce(lambda a, i: WordSetNode(i, a), palindrome[:-1], None)
>>> palindrome[-1] in (i.word for i in list(rx(n)))
True
"""
m = match[sum(len(i) for i in n.phrase()) - len_left:]
r = self.fwd.match(m)[1]
nextWordSupersequence = m[:-len(r)] if r else m
for i in self.fwd.filter(Trie([nextWordSupersequence]), Trie()):
yield WordSetNode(i, n)
def completePalindrome(self, pangram, expand, f_max = sys.maxsize, match = None, left = None):
# the pivot of the palindrome can be anywhere
# left of the last letter needed for the pangram.
# So it could be in the rightmost word of pangram,
# or it could be in words yet to be added to our
# phrase. We'll expand the pangram in the usual
# way elsewhere, so we don't need to be complete
# here. We do need to remain feasible, however.
# Our strategy here is to find a pivot that
# coincides with a word boundary, so that we can
# use the reverse of the phrase as the sequence
# of letters to appear on the right. This will
# give us a much faster way to complete a
# palindrome and establish a tighter upper bound
# for the rest of our search.
# TODO: also check for more interior pivots by
# extending keystoneP to enumerate all possible
# interior pivots rather than just the first one.
# ababa -> 1, 2, 3
if match and left:
len_match = len(match)
len_left = len(left)
rx = functools.partial(self.rightExpand, match = match, len_left = len_left)
rh = functools.partial(TriePair.rightH, expected_length = len_left + len_match)
print('building right with match = %s and left = %s' % (match, left))
for completion in bss([pangram], rx, g, rh, 1, len_left + len_match + 1/sys.maxsize, feasibleP):
yield completion
return
if keystoneP(pangram.word):
match = pangram.parent.phrase_string_ns() + pangram.word[:keystoneP(pangram.word)]
for completion in self.completePalindrome(pangram, expand, f_max, match = match, left = pangram.phrase_string_ns()):
f_max = min(f_max, g(completion))
yield completion
for left in bss([pangram], expand, g, self.pivotH, 1, f_max, lambda n: any(t.terminus for t in self.s(self.rev, n.phrase_string_ns()) + self.s(self.rev, n.phrase_string_ns()[:-1]))):
match = left.phrase_string_ns()[::-1]
for m in (match[1:], match):
for completion in self.completePalindrome(left, expand, f_max, match = m, left = match):
yield completion

In essence, it comes down to switching to very well-informed heuristics that can exploit our foreknowledge of the letter sequence we want in the second half of the phrase.

This still tied up my laptop for many hours. I had two ideas for making progress: (1) reduce the dictionary in hopes of lowering the upper bound for the search and (2) experiment with different heuristics in hopes of dynamically finding better solutions more quickly. As an example of (1), I tried excluding words with multiple occurrences of any letter. As an example of (2), I tried using a piecewise heuristic function that penalizes letter repetition early in the search, but also recognizes that the pangram may contain a "keystone" word overlapping its palindromic midpoint.

Here's how I disallowed intraword letter duplication:


def minKeystoneP(w):
"""Determine whether this word's interior
contains the palindromic pivot of a theoretically
minimal-sized pangramic palindrome.
>>> minKeystoneP('abc')
False

>>> minKeystoneP('aba')
True

>>> minKeystoneP('abaa')
False

"""
k = keystoneP(w)
if not k: return False
sides = w[:k], w[k:]
return all(len(s) == len(set(s)) for s in sides)

def mintries(d):
minKeystones = [w for w in d if minKeystoneP(w)]
nodups = [w for w in d if len(set(w)) == len(w)]
return TriePair(minKeystones + nodups)

Here's some profiling data from a run that used a mintries dictionary (stopped at the first feasible solution). I used beam_width = 512, U = 1024:

solution (65): buckrams donzel ti pyx of vug we jo hm suq us mho jew guv foxy pit lez nods mark cub
3063736602 function calls (1650738620 primitive calls) in 15627.224 CPU seconds
Ordered by: standard name
ncalls  tottime  percall  cumtime  percall  filename:lineno(function)
10.0000.00015627.22315627.223<string>:1(<module>)
3112568967.4200.000364.3520.000<string>:100(pangramP)
463188677289.5710.000289.5710.000<string>:102(ancestors)
31119396148.4730.000562.3840.000<string>:107(phrase)
31119396258.9850.000413.9110.000<string>:108(<listcomp>)
31119402137.0970.000896.4150.000<string>:109(phrase_string_ns)
183/780.0010.0000.0010.000<string>:121(__init__)
285653338160.5940.000160.5940.000<string>:129(emptyP)
-495413458/-7811441303292.676-0.0003292.676-0.000<string>:131(match)
144/390.0010.0000.0010.000<string>:155(add)
48/80.0000.0000.0000.000<string>:176(words)
1230988464/1348405768569.2050.00012100.4590.000<string>:188(_filter_r)
134840576127.8940.00012228.3530.000<string>:199(filter)
19100.0050.0000.0430.000<string>:25(palindromeP)
62480.1590.0000.9310.000<string>:260(s)
1565450.2010.0000.5280.000<string>:269(<genexpr>)
259390.1810.0000.7090.000<string>:269(<listcomp>)
20.0000.0000.0010.001<string>:303(pivotH)
40.0000.0000.0000.000<string>:305(<genexpr>)
80.0000.0000.0000.000<string>:306(<genexpr>)
510.0010.0000.0070.000<string>:307(rightH)
8720.0010.0000.0010.000<string>:309(<genexpr>)
870.0010.0000.0090.000<string>:310(rightExpand)
6330.0010.0000.0010.000<string>:337(<genexpr>)
8/30.0000.0000.3910.130<string>:343(completePalindrome)
19100.0130.0000.0160.000<string>:37(<lambda>)
60.0000.0000.0000.000<string>:377(<genexpr>)
20.0000.0000.0020.001<string>:377(<lambda>)
62620.0490.0001.0240.000<string>:383(setupContinuation)
31125244430.6040.00013368.3630.000<string>:397(expand)
217000.0370.0000.0370.000<string>:404(<genexpr>)
216034002221.9310.000356.5290.000<string>:406(<genexpr>)
63010.0750.0000.1680.000<string>:416(visit)
63010.0410.0000.0780.000<string>:419(<listcomp>)
100.0000.0000.0000.000<string>:424(<listcomp>)
93363173/62244144198.1800.000322.0370.000<string>:440(g)
1790.0020.0000.0080.000<string>:450(partition)
1790.0040.0000.0360.000<string>:458(palindrome_balance)
31118982306.3940.0001302.7800.000<string>:521(h)
3111903548.4620.000412.7540.000<string>:563(feasibleP)
11/2203.03318.45815627.2237813.611<string>:566(bss)
256/1280.1950.0010.2980.002<string>:580(prune)
1250680.0440.0000.0440.000<string>:585(<lambda>)
3112568975.2910.000286.5330.000<string>:60(pangramP)
3111903571.7540.0001642.5390.000<string>:615(<lambda>)
70.0000.0000.0020.000<string>:617(<listcomp>)
70.0000.0000.0020.000<string>:618(<lambda>)
70.0000.0000.0000.000<string>:621(<listcomp>)
6600.0000.0000.0000.000<string>:648(<lambda>)
75110.0040.0000.0040.000<string>:657(<lambda>)
69310.0020.0000.0020.000<string>:658(<lambda>)
69310.0020.0000.0020.000<string>:659(<lambda>)
450.0120.0000.0160.000<string>:659(<listcomp>)
6410.0000.0000.0000.000<string>:669(<lambda>)
10.0000.00015627.22315627.223<string>:677(profileDriver)
20.0000.0000.0000.000<string>:682(keystoneP)
31119030149.9290.000351.3390.000<string>:90(__init__)
40.0000.0000.0000.000_abcoll.py:122(__subclasshook__)
20.0000.0000.0000.000_weakrefset.py:10(__init__)
80.0000.0000.0000.000_weakrefset.py:20(__iter__)
3113195048.8030.00048.8030.000_weakrefset.py:29(__contains__)
40.0000.0000.0000.000_weakrefset.py:36(add)
3113193775.1110.000123.9140.000abc.py:118(__instancecheck__)
4/30.0000.0000.0000.000abc.py:134(__subclasscheck__)
9420.0020.0000.0060.000ascii.py:21(encode)
10.0000.0000.0000.000codecs.py:164(__init__)
10.0000.0000.0000.000codecs.py:937(getincrementalencoder)
9420.0100.0000.0160.000io.py:1031(write)
4470.0020.0000.3640.001io.py:1069(flush)
4470.0040.0000.3610.001io.py:1073(_flush_unlocked)
4470.0010.0000.3650.001io.py:1454(flush)
9420.0030.0000.0040.000io.py:1465(closed)
9420.0120.0000.4040.000io.py:1479(write)
10.0000.0000.0000.000io.py:1500(_get_encoder)
23310.0020.0000.0020.000io.py:755(closed)
63010.0050.0000.0050.000{built-in method abs}
20.0000.0000.0000.000{built-in method any}
9420.0040.0000.0040.000{built-in method ascii_encode}
19100.0010.0000.0010.000{built-in method ceil}
10.0010.00115627.22415627.224{built-in method exec}
3111903069.1970.00069.1970.000{built-in method fromkeys}
40.0000.0000.0000.000{built-in method getattr}
155614108204.2890.000204.2890.000{built-in method hasattr}
3111898217.3150.00017.3150.000{built-in method hash}
530.0000.0000.0000.000{built-in method id}
3113382176.9050.000200.8190.000{built-in method isinstance}
50.0000.0000.0000.000{built-in method issubclass}
15569546650.3740.00050.3740.000{built-in method len}
10.0000.0000.0000.000{built-in method lookup}
1790.0000.0000.0000.000{built-in method max}
40.0000.0000.0000.000{built-in method min}
69310.0020.0000.0020.000{built-in method next}
4470.0050.0000.4080.001{built-in method print}
900.0010.0000.0030.000{built-in method sum}
13890.0010.0000.0010.000{method '__enter__' of '_thread.lock' objects}
20.0000.0000.0000.000{method '__subclasses__' of 'type' objects}
40.0000.0000.0000.000{method 'add' of 'set' objects}
827570.0340.0000.0340.000{method 'append' of 'list' objects}
10.0000.0000.0000.000{method 'disable' of '_lsprof.Profiler' objects}
9420.0010.0000.0010.000{method 'extend' of 'bytearray' objects}
26306010678.3120.00078.3120.000{method 'items' of 'dict' objects}
31120994126.5320.000126.5320.000{method 'join' of 'str' objects}
3113199010.4020.00010.4020.000{method 'keys' of 'dict' objects}
310.0000.0000.0000.000{method 'pop' of 'list' objects}
31119029111.0080.000111.0080.000{method 'update' of 'dict' objects}
4470.3570.0010.3570.001{method 'write' of '_FileIO' objects}

Using the same parameters but substituting a piecewise expand function to reduce the search space, I produced the same solution in 12949.838 CPU seconds (shortly after the 7232nd node expansion):


# this function takes a TriePair to an expand function
# suitable for use with bss:
def budgetExpand(ordinarytries):
def bex(n, advisoryBudget):
if n.pangramP():
for x in ordinarytries.expand(n):
yield x
return
letterAllowance = int((advisoryBudget / 2) - g(n))
if letterAllowance < 5:
missing = len(string.ascii_lowercase) - len(n.covered)
if missing < letterAllowance:
# we might be able to afford to waste the next word
for x in ordinarytries.expand(n):
yield x
return
if missing == 2:
# find a suitable tp
for lp, t in raretries:
if not any(l in n.covered for l in lp):
for x in t.expand(n):
yield x
return
if missing <= 4:
letters = set(string.ascii_lowercase) - n.covered.keys()
if all(l in lettertries for l in letters):
for x in itertools.chain.from_iterable(t.expand(n) for l, t in lettertries.items() if not n.coveredP(l)):
yield x
return
for x in ordinarytries.expand(n):
yield x
return bex
2481633428 function calls (1172623990 primitive calls) in 12949.838 CPU seconds
Ordered by: standard name
ncalls  tottime  percall  cumtime  percall  filename:lineno(function)
10.0000.00012949.83812949.838<string>:1(<module>)
2822166456.8480.000307.6800.000<string>:100(pangramP)
396097158231.4550.000231.4550.000<string>:102(ancestors)
28206562130.0970.000464.8240.000<string>:107(phrase)
28206562210.2860.000334.7270.000<string>:108(<listcomp>)
28207750120.3520.000750.6410.000<string>:109(phrase_string_ns)
75/320.0000.0000.0010.000<string>:121(__init__)
265064474140.7580.000140.7580.000<string>:129(emptyP)
-776844655/-10419877952885.248-0.0002885.248-0.000<string>:131(match)
59/160.0000.0000.0000.000<string>:155(add)
24/40.0000.0000.0000.000<string>:176(words)
1140618203/1249585936869.4110.0009963.4100.000<string>:188(_filter_r)
124958593113.4430.00010076.8520.000<string>:199(filter)
3940.0010.0000.1450.000<string>:25(palindromeP)
72600.1620.0000.9530.000<string>:260(s)
1763640.2080.0000.5450.000<string>:269(<genexpr>)
283810.1880.0000.7330.000<string>:269(<listcomp>)
10.0000.0000.0000.000<string>:303(pivotH)
20.0000.0000.0000.000<string>:305(<genexpr>)
40.0000.0000.0000.000<string>:306(<genexpr>)
210.0000.0000.0020.000<string>:307(rightH)
3500.0000.0000.0000.000<string>:309(<genexpr>)
360.0000.0000.0030.000<string>:310(rightExpand)
2510.0000.0000.0000.000<string>:337(<genexpr>)
5/20.0000.0000.2330.117<string>:343(completePalindrome)
3940.0020.0000.1420.000<string>:37(<lambda>)
30.0000.0000.0000.000<string>:377(<genexpr>)
10.0000.0000.0010.001<string>:377(<lambda>)
91800.0520.0001.0370.000<string>:383(setupContinuation)
28215608356.8220.00011018.0650.000<string>:397(expand)
288400.0490.0000.0490.000<string>:404(<genexpr>)
183935843180.8950.000287.8570.000<string>:406(<genexpr>)
72870.0700.0000.1740.000<string>:416(visit)
72870.0450.0000.0890.000<string>:419(<listcomp>)
100.0000.0000.0000.000<string>:424(<listcomp>)
84626570/56420123168.2730.000271.3540.000<string>:440(g)
290.0000.0000.0020.000<string>:450(partition)
290.0010.0000.0080.000<string>:458(palindrome_balance)
28206428257.7130.0001093.4650.000<string>:521(h)
2820645141.0200.000348.6550.000<string>:563(feasibleP)
7/2170.54724.36412949.8386474.919<string>:566(bss)
252/1260.1930.0010.2580.002<string>:580(prune)
1271060.0480.0000.0480.000<string>:585(<lambda>)
2822166463.6210.000242.1180.000<string>:60(pangramP)
2820645063.1010.0001381.8730.000<string>:616(<lambda>)
40.0000.0000.0010.000<string>:618(<listcomp>)
40.0000.0000.0010.000<string>:619(<lambda>)
40.0000.0000.0000.000<string>:622(<listcomp>)
4970.0000.0000.0000.000<string>:649(<lambda>)
85330.0040.0000.0040.000<string>:658(<lambda>)
77840.0020.0000.0020.000<string>:659(<lambda>)
77840.0040.0000.0040.000<string>:660(<lambda>)
7590.0000.0000.0000.000<string>:670(<lambda>)
10.0000.0000.0000.000<string>:683(keystoneP)
2821369929.2990.00011048.2200.000<string>:826(bex)
351640.0180.0000.0180.000<string>:844(<genexpr>)
37790.0030.0000.0030.000<string>:852(<genexpr>)
37310.0240.0000.0320.000<string>:854(<genexpr>)
28206448126.2050.000295.1140.000<string>:90(__init__)
145760.0080.0000.0080.000<string>:98(coveredP)
40.0000.0000.0000.000_abcoll.py:122(__subclasshook__)
20.0000.0000.0000.000_weakrefset.py:10(__init__)
80.0000.0000.0000.000_weakrefset.py:20(__iter__)
2822893440.8110.00040.8110.000_weakrefset.py:29(__contains__)
40.0000.0000.0000.000_weakrefset.py:36(add)
2822892463.0550.000103.8660.000abc.py:118(__instancecheck__)
4/30.0000.0000.0000.000abc.py:134(__subclasscheck__)
63990.0140.0000.0370.000ascii.py:21(encode)
10.0000.0000.0000.000codecs.py:164(__init__)
10.0000.0000.0000.000codecs.py:937(getincrementalencoder)
63990.0660.0000.1030.000io.py:1031(write)
31900.0140.0000.6700.000io.py:1069(flush)
31900.0260.0000.6550.000io.py:1073(_flush_unlocked)
31900.0090.0000.6790.000io.py:1454(flush)
63990.0130.0000.0210.000io.py:1465(closed)
63990.0750.0000.9190.000io.py:1479(write)
10.0000.0000.0000.000io.py:1500(_get_encoder)
159880.0140.0000.0140.000io.py:755(closed)
72870.0050.0000.0050.000{built-in method abs}
9230.0030.0000.0050.000{built-in method all}
153570.0130.0000.0240.000{built-in method any}
63990.0230.0000.0230.000{built-in method ascii_encode}
3940.1390.0000.1390.000{built-in method ceil}
10.0000.00012949.83812949.838{built-in method exec}
9110.0010.0000.0010.000{built-in method from_iterable}
2820644858.5170.00058.5170.000{built-in method fromkeys}
40.0000.0000.0000.000{built-in method getattr}
141057204171.3260.000171.3260.000{built-in method hasattr}
2820642814.6060.00014.6060.000{built-in method hash}
220.0000.0000.0000.000{built-in method id}
2824172265.7300.000169.5950.000{built-in method isinstance}
50.0000.0000.0000.000{built-in method issubclass}
14117409042.9040.00042.9040.000{built-in method len}
10.0000.0000.0000.000{built-in method lookup}
290.0000.0000.0000.000{built-in method max}
20.0000.0000.0000.000{built-in method min}
77660.0020.0000.0020.000{built-in method next}
31900.0310.0000.9490.000{built-in method print}
370.0000.0000.0010.000{built-in method sum}
95890.0090.0000.0090.000{method '__enter__' of '_thread.lock' objects}
20.0000.0000.0000.000{method '__subclasses__' of 'type' objects}
40.0000.0000.0000.000{method 'add' of 'set' objects}
857970.0340.0000.0340.000{method 'append' of 'list' objects}
10.0000.0000.0000.000{method 'disable' of '_lsprof.Profiler' objects}
63990.0080.0000.0080.000{method 'extend' of 'bytearray' objects}
24419620668.3310.00068.3310.000{method 'items' of 'dict' objects}
28206877105.9380.000105.9380.000{method 'join' of 'str' objects}
282298748.7170.0008.7170.000{method 'keys' of 'dict' objects}
150.0000.0000.0000.000{method 'pop' of 'list' objects}
2820644792.2870.00092.2870.000{method 'update' of 'dict' objects}
31900.6260.0000.6260.000{method 'write' of '_FileIO' objects}

What about that penultimate case, searching up to four overlapping TriePairs, each representing the set of words containing a missing letter? How much does it help? The mintriesd superset has 35103 words, and a sample of four-combinations ranges in size from 11263 to 24428 words. But the average lettertrie has only 4784 words, so we get a noticeable speedup even though trie search cost grows only slowly with dictionary size. Without the lettertries case, we need 2656858411 function calls (1319973770 primitive calls) in 14205.551 CPU seconds.

Finally, I added to the budgetExpand repertoire with lengthtries.

The idea for lengthtries comes from the observation that it would be useful to limit the depth of trie traversals as we approach the current cost bound. With this technique I brought the first solution discovery down to 398442028 function calls in 9485.551 CPU seconds:


def budgetExpand(ordinarytries):
raretries = [(lp, TriePair([w for w in ordinarytries.fwd.words() if all(l in w for l in lp)])) for lp in 'jm vx jy fx jk gq mq jp qy bq hq pq kz jv wz hj fq qv xz kq jw fj qz wx jz kx qw jx qx jq'.split()]
hist = Histogram(ordinarytries.fwd.words())
lettertries = {l : TriePair(hist[l]) for l in [k for k, v in sorted(hist.h.items(), key = lambda i: len(i[1]))][:16]}
lengthtries = {i : TriePair((w for w in mintriesd.fwd.words() if len(w) <= i or keystoneP(w)), mintriesd.rev.words()) for i in range(1, 8)}
def bex(n, advisoryBudget):
if n.pangramP():
for x in ordinarytries.expand(n):
yield x
return
letterAllowance = int((advisoryBudget / 2) - g(n))
if letterAllowance < 5:
missing = len(string.ascii_lowercase) - len(n.covered)
if missing < letterAllowance:
# we might be able to afford to waste the next word
for x in lengthtries[letterAllowance].expand(n):
yield x
return
if missing == 2:
# find a suitable tp
for lp, t in raretries:
if not any(l in n.covered for l in lp):
for x in t.expand(n):
yield x
return
if missing <= 4:
letters = set(string.ascii_lowercase) - n.covered.keys()
if all(l in lettertries for l in letters):
for x in itertools.chain.from_iterable(t.expand(n) for l, t in lettertries.items() if not n.coveredP(l)):
yield x
return
elif letterAllowance in lengthtries:
for x in lengthtries[letterAllowance].expand(n):
yield x
return
for x in ordinarytries.expand(n):
yield x
return bex

One last idea for search space reduction is to exclude words containing covered letters:


def budgetExpand(ordinarytries):
# least-common 2-combinations of letters in d:
# [('kx', 53), ('qw', 27), ('jx', 27), ('qx', 26), ('jq', 8)]
raretries = [(lp, TriePair((w for w in ordinarytries.fwd.words() if all(l in w for l in lp)), ordinarytries.rev)) for lp in 'jm vx jy fx jk gq mq jp qy bq hq pq kz jv wz hj fq qv xz kq jw fj qz wx jz kx qw jx qx jq'.split()]
hist = Histogram(ordinarytries.fwd.words())
lettertries = {l : TriePair(hist[l], ordinarytries.rev) for l in [k for k, v in sorted(hist.h.items(), key = lambda i: len(i[1]))][:16]}
lengthtries = {i : TriePair((w for w in ordinarytries.fwd.words() if len(w) <= i or keystoneP(w)), ordinarytries.rev) for i in range(1, 8)}
def mfl(k, n):
"""Kth most frequent letter k, n -> (#, a)"""
if not hasattr(n, 'hist'):
n.hist = Histogram(n.tp.fwd.words())
lf = list(itertools.islice(n.hist, k + 1))[-1]
return (sum(1 for i in lf[1]), lf[0])
exttltries = Trie()
exttltries.tp = mintriesd
# this isn't optimal; there may be unreachable
# nodes and some of those may be more selective
# than letter-equivalent reachable nodes. But
# it won't yield incorrect results and it's
# an approximation I can build quickly.
for i in range(32):
(freq, split_char), parent = max(
(mfl(len(n.children), n), n) for n in
exttltries.pre_order_nodes() if len(n.children) <= 4)
parent.add(split_char)
subtrie = parent.search(split_char)
subtrie.tp = TriePair(Trie(w for w in parent.tp.fwd.words() if split_char not in w), parent.tp.rev)
print([x for x in exttltries.pre_order_nodes()])
print([sum(1 for y in x.tp.fwd.words()) for x in exttltries.pre_order_nodes()])
def pickTP(t, n):
if (len(t.children) == 0): return t.tp
for l in t.children:
if not l in n.covered: continue
return pickTP(t.children[l], n)
return t.tp
def bex(n, advisoryBudget):
if n.pangramP():
for x in ordinarytries.expand(n):
yield x
return
letterAllowance = int((advisoryBudget / 2) - g(n))
if letterAllowance < 5:
missing = len(string.ascii_lowercase) - len(n.covered)
if missing < letterAllowance:
# we might be able to afford to waste the next word
for x in lengthtries[letterAllowance].expand(n):
yield x
return
if missing == 2:
# find a suitable tp
for lp, t in raretries:
if not any(l in n.covered for l in lp):
for x in t.expand(n):
yield x
return
if missing <= 4:
letters = set(string.ascii_lowercase) - n.covered.keys()
if all(l in lettertries for l in letters):
for x in itertools.chain.from_iterable(t.expand(n) for l, t in lettertries.items() if not n.coveredP(l)):
yield x
return
if letterAllowance in lengthtries:
for x in lengthtries[letterAllowance].expand(n):
yield x
return
for x in pickTP(exttltries, n).expand(n):
yield x
return bex

There are unreachable TriePairs in exttltries, and some of those are smaller than letter-equivalent reachable ones. To avoid this, we should maintain the invariant that no node's letter appears among the siblings of its ancestors. Together with the greedy order for splitting nodes, this minimizes size and optimizes expected search cost. But we do pretty well even with the quick-and-dirty experiment: we find a first solution (cost 67) before the 7808th node expansion in under 4562.899 CPU seconds. A cost-65 solution is found before the 9984th node expansion in under 5141.208 CPU seconds.

I never got around to rewriting variations in search of more beautiful code. I still wonder what other tricks I've overlooked for improving the heuristic function or reducing the search space.

Rumor has it that Common Lisp has been really well optimized over the years, so I wonder how much faster this could go if rewritten in Lisp. Also, I could go twice as fast on my dual-core CPU if Python threads could actually run in parallel (just partition the input), or if I quit some other apps in order to fit two Python instances of this program in my 4 GB of RAM.

Many have posted solutions for this problem. Solutions I can recommend reading if this post has piqued your interest include Jim Vellenga's and Keoki Zee's.

Wednesday, February 10, 2010

Continuation Monad

Last month I worked on a TopCoder problem that led me to continue my exploration of monads. The problem is to sort a set of elements using comparisons whose cost is a function of the comparands. Solutions comprise a pair of methods, one of which initializes the program with the costs, and the other of which is invoked repeatedly to effect the dialog between the program and the less-than oracle.

Now choose your own adventure. Do you:

  1. Ask how a real computer scientist would solve it
  2. Watch me muddle through comparison-based sorting with variable costs
  3. Follow my continuation monad implementation in Java

Comparison-based sorting with variable costs

Sidebar: true story

I've seen a related problem in real life. Our app iDeal helped salespersons develop contract terms subject to Boolean constraints called guidelines, analogous to the Boolean trees examined in Charikar et al. We didn't actually have to buy our guideline results, but we did care about evaluation speed. There were basically two prices: low (computable on our local app server) and high (computable at a minicomputer reached by at least two SOAP hops and a batch job). Unlike the Charikar model, our cost function was not additive: it behaved like a Boolean OR, giving a result of "expensive" whenever at least one guideline couldn't be computed locally and "cheap" otherwise. Consequently, the real-life challenge was less about algorithm design and more about software engineering: given that guideline evaluation is demanded at many call sites, how can we avoid unnecessarily consulting the minicomputer?

I first thought of divide and conquer approaches to avoid redundant queries (don't ask a < c if it is already known that a < b and b < c). I thought about algorithms for the special case when C(a, b) = 1: mergesort, heapsort, and quicksort. All of these relate to binary trees, and I thought about properties a binary search tree constructed at optimal cost would have in the general case.
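
One way to realize the "don't re-ask what you already know" idea is to wrap the oracle in a cache that is kept transitively closed. This is a sketch of my own; the oracle parameter stands in for the costly less-than callback.

class CachingComparator:
    def __init__(self, oracle):
        self.oracle = oracle
        self.known = set()  # pairs (lo, hi) with lo < hi already established

    def less_than(self, a, b):
        if (a, b) in self.known:
            return True
        if (b, a) in self.known:
            return False
        result = self.oracle(a, b)
        lo, hi = (a, b) if result else (b, a)
        # add the new fact together with its transitive consequences,
        # so the set stays transitively closed after every query
        below = {x for (x, y) in self.known if y == lo} | {lo}
        above = {y for (x, y) in self.known if x == hi} | {hi}
        self.known |= {(x, y) for x in below for y in above}
        return result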

Shouldn't the root of the tree be the element for which the sum over pairs of itself with each of its descendants is minimal? Intuitively, we compare every element with the root to locate the element in the tree, and we want to minimize the cost of inserting each element by this recursive process. Not quite: this fails to account for tree shape (consider p < q < r with costs C(p, X) = 1 for all X in {p, q, r}; C(q, r) = 2). And thinking back to the special case of a constant cost function, it's clear that this root-selection heuristic has no power there. It's also often mentioned in discussions of, e.g., quicksort that it's important to have a balanced tree. But we're not looking just to balance and fill the tree; it's easy to ensure that every left child in an optimal tree is a leaf node by carefully choosing the cost function.

The shape of the tree is determined both by the sequence of queries the program makes and by the order of elements determined by the oracle. The best choice for the root is the median of the elements, so an exact solution is O(n^2) but we could think about approximate selection and average-case performance.

What about a bottom-up approach? Do our trees have optimal substructure? First: in an optimal tree, is every subtree optimal? Suppose not. Then some optimal tree has a subtree that can be rearranged for lower cost. But this doesn't affect the cost to construct the rest of the tree, so after the rearrangement the overall cost is lowered. This contradicts the supposition that our original tree was optimal with a suboptimal subtree. But second: can we efficiently combine optimal subsolutions to derive optimal solutions?

A straightforward mergesort would guarantee a minimal number of comparisons, but not necessarily a minimal-cost set of comparisons. Although a minimal tree implies minimal subtrees, not all combinations of minimal subtrees constitute (a rearrangement of) a minimal tree. To see this, imagine a cost function that does not satisfy the triangle inequality: the cheapest way to combine two trees might then involve additional vertices that are not part of either original tree.

Let g(t) = sum(C(n, n') for n in t for n' in ~t). Recursively merge the runs with the lowest g.
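In Java, that definition looks something like the sketch below; Cost here is just my stand-in for however the problem hands us the prices.


interface Cost<T> {
    long of(T a, T b);
}

// g(t): the total cost of comparing t's elements against everything outside t
static <T> long g(java.util.Collection<T> t, java.util.Collection<T> universe, Cost<T> c) {
    long total = 0;
    for (T n : t) {
        for (T nPrime : universe) {
            if (!t.contains(nPrime)) {
                total += c.of(n, nPrime);
            }
        }
    }
    return total;
}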

Claim: g-ordered mergesort yields optimal solutions. Proof sketch:

Let A and B represent two C-optimal subsolutions whose g-costs are least among all subsolutions. Claim: the combination S of A and B is C-optimal. Suppose not. Then there is a (sub)solution S' containing all the elements of A and B such that C(S) > C(S'). S' is a proper superset of S, so we can obtain it by adding elements from the complement of S. These missing elements must exist in other optimal subsolutions having higher g-costs than that of S. Also, each of these elements must have edges to each of A and B in the optimal tree of S'. Consider any subsolution G hosting one of these missing elements. Its g-cost is \sum_{g \in G}(\sum_{a \in A} C(g, a) + \sum_{b \in B} C(g, b) + \sum_{c' \in c} C(g, c')), where a, b, and c' range over our two input sets and the complement of their union, respectively. To simplify the notation, I'm going to switch now to implicit summations: the g-cost of subsolution G is c(g, a) + c(g, b) + c(g, c). The g-cost of S is c(a, c) + c(b, c) + c(s, c). The g-cost of A is c(a, b) + c(a, c). The g-cost of B is c(a, b) + c(b, c). With a little algebra and the observation that the complement of the union of A and B includes all the g \in G, we can derive a contradiction from the assumption that there exist elements outside the min-g-cost subsolutions A and B that would improve the C-cost of their combination:


c(s, a) + c(s, b) + c(s, c) > max(c(a, c) + c(a, b); c(b, c) + c(a, b))
                            > max(c(S) - c(s, c) - c(b, c) + c(a, b);
                                  c(S) - c(s, c) - c(a, c) + c(a, b))
c(s, a) + c(s, b) + 2c(s, c) - c(a, b) - c(S) > max(-c(b, c); -c(a, c))
    [substituting c(S) = c(a, c) + c(b, c) + c(s, c)]
c(s, a) + c(s, b) + c(s, c) - c(a, b) - c(a, c) - c(b, c) > -max(c(b, c); c(a, c))
c(s, a) + c(s, b) + c(s, c) - c(a, b) - c(a, c) > 0
c(s, a) + c(s, b) + c(s, c) - c(a, b) - c(b, c) > 0
c(s, a) + c(s, b) + c(s, c) > c(a, b) + c(a, c)
c(s, a) + c(s, b) + c(s, c) > c(a, b) + c(b, c)
c(a, c) includes c(a, s), so c(a, c) > c(a, s)
c(s, b) + c(s, c) > c(a, b)
c(s, a) + c(s, c) > c(a, b)
c(s, b) - c(s, a) > 0
c(s, a) - c(s, b) > 0
⟶⟵

Sadly, although this heuristic won't mislead us, it may often fail to give an answer, telling us only that more than two trees share the least g-cost.

A more dynamic heuristic could rely on probabilistic estimates of the information gain at each query. Assuming uniform random permutation, initially we should just do the cheapest comparison. As we progress, we'll develop equivalence classes: these elements could be anywhere; those elements could only be in the top 6. Comparisons involving elements in large classes or classes near the min or max yield relatively little information. We could combine estimates of value with the given cost information to get reasonable average-case performance.

I decided not to work harder to prove a tight lower bound on the solution to the problem or to find a better algorithm or heuristic. It's TopCoder after all, and speed is important. I decided instead to move on to implementation and Google for a better answer later.

So in a nutshell my solution is: initially regard every element as a subsolution. Order them by g-cost and merge the two lowest-ranked solutions. Recurse.
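As a plain, non-monadic sketch (reusing the Cost and g stand-ins from above; mergeRuns is an ordinary two-way merge that asks the oracle directly, and the java.util imports are assumed):


static <T> List<T> costAwareSort(final List<T> elements, final Cost<T> c, Comparator<T> oracle) {
    // every element starts out as its own one-element run
    List<List<T>> runs = new ArrayList<List<T>>();
    for (T e : elements) {
        runs.add(new ArrayList<T>(Collections.singletonList(e)));
    }
    while (runs.size() > 1) {
        // order the runs by g-cost and merge the two cheapest
        Collections.sort(runs, new Comparator<List<T>>() {
            public int compare(List<T> r1, List<T> r2) {
                return Long.compare(g(r1, elements, c), g(r2, elements, c));
            }
        });
        runs.add(mergeRuns(runs.remove(0), runs.remove(0), oracle));
    }
    return runs.get(0);
}

static <T> List<T> mergeRuns(List<T> a, List<T> b, Comparator<T> oracle) {
    List<T> out = new ArrayList<T>();
    int i = 0, j = 0;
    while (i < a.size() && j < b.size()) {
        out.add(oracle.compare(a.get(i), b.get(j)) <= 0 ? a.get(i++) : b.get(j++));
    }
    out.addAll(a.subList(i, a.size()));
    out.addAll(b.subList(j, b.size()));
    return out;
}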

Continuation monad

The online interaction between the TopCoder oracle and my solution raises an interesting question. The first thing that occurred to me, of course, was to write g and merge, arrange to store state in instance variables, and update that state incrementally in the two callback functions whose existence is mandated by the oracle. On second thought, I wondered whether my code could more closely resemble the nutshell description I gave above, with the state maintenance handled separately. It seemed like a good excuse to get some first-hand experience with continuation-passing style (CPS), a canonical form for programs that can be paused and restarted. The continuation monad brings CPS into a more general framework, improving modularity.
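As a tiny illustration of the style (toy code, nothing to do with the sort): a direct-style function returns its result to the caller, while the CPS version pushes the result into a continuation, and whoever holds that continuation decides when and where the computation resumes.


// direct style: the result comes back to the caller
static int lengthDirect(String s) {
    return s.length();
}

// CPS: the result is handed forward to the continuation k
static <R> R lengthCps(String s, Func1<Integer, R> k) {
    return k.f(s.length());
}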

So I broke out my old favorites, "Monads for Functional Programming" and "Comprehending Monads". I also found a couple of new sources: "Lecture notes: Continuations and call/cc" and "The Continuation Monad in Clojure". Continuations are also discussed at some length in the lambda papers, although the idea originated a decade earlier.

First, the monad preliminaries. When I implemented the Maybe monad in C#, I had the ambition to support composition of different monads, and that led to some difficulty with the type system. I also noticed how cumbersome the syntax was and what effect that had on my enthusiasm for an explicit, separate expression of a monad abstraction. And that was with C# 3, which has overtaken Java in its evolution towards greater expressive power. So this time I decided to focus strictly on the continuation monad and on the question of whether it could let me separate the oracle I/O from the specification of my sort algorithm.

Recall the monad laws:


@Test
// unit a ⋆ λb. n = n[a/b]
public void leftUnit() {
Func1<String, CMonad<Integer>> f =
new Func1<String, CMonad<Integer>>() {
public CMonad<Integer> f(String a) {
return CMonad.unit(a.length());
}
};

String a = "a";
CMonad<Integer> expected =
f.f(a);
CMonad<Integer> actual =
CMonad.map(
f,
CMonad.unit(a));

assertMEquals(expected, actual);
}

@Test
// m ⋆ λa. unit a = m
public void rightUnit() {
CMonad<Integer> m =
CMonad.unit(42);
Func1<Integer, CMonad<Integer>> id =
new Func1<Integer, CMonad<Integer>>() {
public CMonad<Integer> f(Integer x) {
return CMonad.unit(x);
}
};

assertMEquals(
m,
CMonad.map(id, m));
}

@Test
// m ⋆ (λa. n ⋆ λb. o) = (m ⋆ λa. n) ⋆ λb. o
public void associative() {
Func1<String, CMonad<Integer>> fa =
new Func1<String, CMonad<Integer>>() {
public CMonad<Integer> f(String x) {

return CMonad.unit(x.length());
}
};

Func1<Integer, CMonad<Integer>> fb =
new Func1<Integer, CMonad<Integer>>() {
public CMonad<Integer> f(Integer x) {
return CMonad.unit(x * 2);
}
};

final CMonad<String> m =
CMonad.unit("m");

final CMonad<Integer> n =
CMonad.unit(5);

final Func1<Integer, CMonad<Integer>> o =
new Func1<Integer, CMonad<Integer>>() {
public CMonad<Integer> f(Integer x) {
return CMonad.unit(x - 1);
}
};

CMonad<Integer> expected =
CMonad.map(
new Func1<String, CMonad<Integer>>() {
public CMonad<Integer> f(String x) {
return
CMonad.map(
o,
n);
}
},
m);

CMonad<Integer> actual =
CMonad.map(
o,
CMonad.map(
new Func1<String, CMonad<Integer>>() {
public CMonad<Integer> f(String x) {
return n;
}
},
m));

assertMEquals(expected, actual);
}

assertMEquals is more or less what you'd expect:


// force both computations with the identity continuation and compare the results
private <T> void assertMEquals(CMonad<T> expected, CMonad<T> actual) {
Func1<T, T> id =
new CostlySorting.Func1<T, T>() {
public T f(T x) {
return x;
}
};
T exp = expected.f(id);
T act = actual.f(id);
assertEquals(exp, act);
}

My continuation monad satisfies those:


public abstract static class CMonad<X> {
private CMonad() {}
public abstract <R> R f(Func1<X, R> k);
public static <X> CMonad<X> unit(final X x) {
return
new CMonad<X>() {
public <R> R f(Func1<X, R> k) {
return k.f(x);
}
};
}

// aka bind aka ★
public static <X, R> CMonad<R> map(final Func1<X, CMonad<R>> f, final CMonad<X> mx) {
return new CMonad<R>() {
public <S> S f(Func1<R, S> k) {
return mx.f(f).f(k);
}
};
}
}
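A value of this type is a computation waiting for its continuation; nothing runs until one is supplied. A quick usage example of my own:


CMonad<Integer> answer =
    CMonad.map(
        new Func1<Integer, CMonad<Integer>>() {
            public CMonad<Integer> f(Integer x) {
                return CMonad.unit(x * 2);
            }
        },
        CMonad.unit(21));

// supplying the final continuation forces the computation: prints 42
System.out.println(
    answer.f(
        new Func1<Integer, String>() {
            public String f(Integer x) {
                return x.toString();
            }
        }));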

call/cc seemed like a useful abstraction, so I implemented that in terms of my monad:


public Integer divideByZeroEg(final int x, final int y) {
CMonad<Integer> eg =
CMonad.callcc(
new Func1<Func1<Integer, CMonad<Integer>>, CMonad<Integer>>() {
public CMonad<Integer> f(Func1<Integer, CMonad<Integer>> esc) {
return
CMonad.map(
new Func1<Integer, CMonad<Integer>>() {
public CMonad<Integer> f(Integer z) {
return CMonad.unit(x / z);
}
},
(y == 0 ? esc.f(42) : CMonad.unit(y)));
}
});
return eg.f(
new Func1<Integer, Integer>() {
public Integer f(Integer x) {
return x;
}
});
}

@Test
public void divideByZeroEg() {
assertEquals((Object)5, divideByZeroEg(20, 4));
assertEquals((Object)42, divideByZeroEg(20, 0));
}

public static <X, Y> CMonad<X> callcc(final Func1<Func1<X, CMonad<Y>>, CMonad<X>> g) {
// callcc g = \k -> g (\x -> \k' -> k x) k
// g :: ((X -> MY) -> MX)
// g :: ((X -> (Y -> R)) -> (X -> R'))
// k :: X -> R''
// R'' = R
// R = X -> R'
return
map(
new Func1<X, CMonad<X>>() {
public CMonad<X> f(X x) {
return unit(x);
}
},
new CMonad<X>() {
public <R> R f(final Func1<X, R> k) {
CMonad<X> gresult =
g.f(
new Func1<X, CMonad<Y>>() {
public CMonad<Y> f(final X x) {
return new CMonad<Y>() {
public <S> S f(Func1<Y, S> kprime) {
return (S)k.f(x);
}
};
}
});
return gresult.f(k);
}
});
}

and then I implemented a while loop atop the continuation monad and call/cc:


@Test
public void finiteWhile() {
final int [] countdown = new int [] {3};
CMonad.cpswhile(
new Func0<Boolean>() {
public Boolean f() {
return countdown[0] > 0;
}
},
new SideEffectful() {
public void f() {
countdown[0] -= 1;
}
});
assertEquals(0, countdown[0]);
}

private static <T> CMonad<T> recurse(CMonad<T> m) {
final WhateverClosure<CMonad<T>> loop =
new WhateverClosure<CMonad<T>>();
loop.whatever =
CMonad.map(
new Func1<T, CMonad<T>>() {
public CMonad<T> f(T x) {
System.out.println("recurse");
return loop.whatever;
}
},
m);
return loop.whatever;
}

public static void cpswhile(final Func0<Boolean> loopInvariant, final SideEffectful body) {
callcc(
new Func1<Func1<Object, CMonad<Object>>, CMonad<Object>>() {
public CMonad<Object> f(final Func1<Object, CMonad<Object>> esc) {
return recurse(
new CMonad<Object>() {
public <R> R f(Func1<Object, R> f) {
System.out.println("start.f");
if (!loopInvariant.f()) {
System.out.println("done");
return
esc.f(null).f(f);
}
System.out.println("body.f");
body.f();
return f.f(null);
}
});
}
}).f(new Func1<Object, Object>() {
public Object f(Object x) {
return x;
}
});
}

public interface SideEffectful {
public void f();
}
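A few helper types get used above without being shown. From their call sites, they presumably look more or less like this (my reconstructions):


// a one-argument function object
public interface Func1<A, B> {
    B f(A a);
}

// a zero-argument function object, used for the loop test
public interface Func0<A> {
    A f();
}

// a mutable cell, used to tie the recursive knot in recurse and cpsbouncewhile
public static class WhateverClosure<T> {
    public T whatever;
}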

Why reimplement the while? Java's while keyword only composes in the predefined ways that Java's language designers cared about. I wouldn't be able to effect oracle I/O between loop iterations, except by resorting to contrived method invocations in the loop body or test. Also, because the communication mechanism is method invocation, I couldn't actually use while without also using threads. By using composable continuation monads for control flow, I hoped to express my sort algorithm separately from the management of data flow between the program and the TC oracle.

Of course, Lisp and Haskell don't always translate easily to Java:


@Test
// (with cpswhile in place of cpsbouncewhile, this would be a recipe for stack overflow)
public void infiniteWhile() {
// customize this value for your stack size/patience
int approximatelyInfinity = 1;

final Calendar finish = Calendar.getInstance();
finish.add(Calendar.SECOND, approximatelyInfinity);
//CMonad.cpsbounce(
CMonad.cpsbouncewhile(
new Func0<Boolean>() {
public Boolean f() {
return finish.after(Calendar.getInstance());
}
},
new SideEffectful() {
public void f() {
}
});
}

public static void cpsbouncewhile(final Func0<Boolean> loopInvariant, final SideEffectful body) {
CMonad<Object> mo =
callcc(
new Func1<Func1<Object, CMonad<Object>>, CMonad<Object>>() {
public CMonad<Object> f(final Func1<Object, CMonad<Object>> esc) {
final CMonad<Object> start =
new CMonad<Object>() {
public <R> R f(Func1<Object, R> f) {
System.out.println("start.f");
if (!loopInvariant.f()) {
System.out.println("done");
return
esc.f(null).f(f);
}
System.out.println("body.f");
body.f();
return f.f(null);
}
};

final WhateverClosure<CMonad<Object>> restartHolder = new WhateverClosure<CMonad<Object>>();
restartHolder.whatever =
map(
new Func1<Object, CMonad<Object>>() {
public CMonad<Object> f(Object x) {
System.out.println("restart.f");
return
esc.f(
Bounce.wrap(
restartHolder.whatever));
}
},
start);

return restartHolder.whatever;
}
});
for (; mo != null;
mo =
(CMonad<Object>)mo.f(new Func1<Object, Object>() {
public Object f(Object x) {
return x;
}
})) {
if (!(mo instanceof Bounce)) continue;
System.out.println("bouncey");
}
}
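Bounce isn't shown either. Because CMonad's constructor is private, Bounce must be another nested class alongside it, and from the way it's used it only needs to tag a computation so the driving for loop knows to keep bouncing. My best guess at it:


public abstract static class Bounce<X> extends CMonad<X> {
    // wrap a computation so the trampoline recognizes it and resumes it
    public static <X> CMonad<X> wrap(final CMonad<X> inner) {
        return new Bounce<X>() {
            public <R> R f(Func1<X, R> k) {
                return inner.f(k);
            }
        };
    }
}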

So then how'd I do? Well, the test code for while loops is fairly readable, but my goal was to write something like


// T is a priority queue of sorted runs
while (T.size() > 1) {
T.add(merge(comparator, T.remove(), T.remove()));
}

and have that automatically translated to a continuation monad representing the progress of the loop: a function from Booleans (comparison results) to pairs consisting of a value of its own type and the next pair of elements to compare.
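Spelled out as types (my sketch of the shape I was after, not code I actually wrote), that target looks something like:


// the paused sort as a value: feed in the oracle's last answer, get back either
// the finished order or the next pair to compare plus a way to continue
public interface PausedSort<T> {
    Step<T> resume(boolean leftWasSmaller);
}

public static class Step<T> {
    public final T left, right;        // the next pair to hand to the oracle (null when done)
    public final PausedSort<T> rest;   // resume here with the oracle's answer
    public final List<T> sorted;       // non-null once sorting has finished
    public Step(T left, T right, PausedSort<T> rest, List<T> sorted) {
        this.left = left;
        this.right = right;
        this.rest = rest;
        this.sorted = sorted;
    }
}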

Dealing with the while per se was not bad, but translating the body required quite a lot of fussiness. We need not only a restartable outer (while) loop but also a restartable inner (merge) loop, and we want these flattened so that the while's client code can stay oblivious to the implementation choice of functional composition in the while body. Looking at it another way, we would like to encapsulate data flow so that a caller can repeatedly supply data to meet the demands of the callee until the callee finishes.

We could use call/cc to obtain a restart wherever we need input. These restarts are relatively easy to pass around (no need for joins), but they support only one standard operation (escape). Accordingly, each adjacent caller/callee pair has to be tweaked to expect and return restarts (analogously to the monadic approach), and each caller has to branch at each invocation to either pass back a restart or move on with its own computation.

Alternatively, we could add a function representing function application within the continuation monad framework. We could tweak while so that its body argument has a result type of continuation monad; this new apply function would yield such a monad, and while could be responsible for composing that result into its own result (using the join operation, M (M A) -> M A).
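For completeness, join falls out of map for free in this encoding (my addition; it isn't part of the CMonad shown above):


// join :: M (M A) -> M A, flattening one layer of CMonad
public static <A> CMonad<A> join(final CMonad<CMonad<A>> mma) {
    return CMonad.map(
        new Func1<CMonad<A>, CMonad<A>>() {
            public CMonad<A> f(CMonad<A> ma) {
                return ma;
            }
        },
        mma);
}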

Although using monads lets us dispense with the repetitive branching at function invocations that the call/cc approach demands, we can't escape the need to reimplement Java all along the way. If the body of a while loop includes its own function delegation, we'll have to make that monadic. If we need a conditional branch, we'll have to make that monadic. Even to implement merge sort we'd wind up with multiple nested anonymous class instance creation expressions. In between scads of braces, parens, brackets, and repetitions of type names like "Func1," we'd have the names of the functions we used to replace all the Java primitives. Lacking syntactic abstraction, the resulting DSL's readability doesn't scale beyond the simple examples in this blog post.