Thursday, April 22, 2010

Palindromic Pangrams

This post details a solution to a disused hiring problem. If you are a graph search wiz, you will find yourself skimming; I assume readers either haven't done this before or have forgotten it mostly. Even experts might be interested in the pathology, so hopefully there's something here for everyone to enjoy.

I've been unusually quiet lately because I'm in the midst of a truly marathon interview cycle. The things I'm working on probably aren't interesting to other people, and they're taking up all of my time. But before this began I spent some time working on a hiring puzzle that I think is blog-worthy.

I couldn't share the details of my approach at the time that I was actively working on the problem, because that would hurt the company and other candidates by raising the question of plagiarism. In fact, ITA removed the problem I was working on from their web site, and when I wrote their recruiter to ask why, I found out it was because somebody else wrote about it! Now it seems the rumor that Google is acquiring ITA is confirmed and some people are talking about other hiring problems. So I'm doubly sure it's OK to post this now.

First, the problem specification:

A palindrome is a sequence of words like "lid off a daffodil" or "shallot ayatollahs" that uses the same letters reading backwards as forwards. The words need not form a meaningful or grammatical sentence. A palindromic pangram is a multi-word palindrome that includes all 26 letters of the alphabet. Find the shortest sequence of words that is both a pangram and a palindrome.

That sounded like something that could occupy me for a few hours. Of course ITA would have chosen the dictionary so that brute-force approach wouldn't be fast enough, but with a reasonable choice of data structures and algorithms, it shouldn't be too hard. Writing up my solution would require lots of care; I imagined it would be carefully code reviewed to see if I was worth talking to, in which case I'd spend most of the interview defending my solution and talking about how it could be improved. Maybe several days to edit for clarity and efficiency and so on. I wound up spending weeks trying with some success to make it run faster. Besides learning some python, some humility and having an excuse to actually do things I'd previously only read about, I learned this lesson: the point of solving a puzzle is to GET A PHONE SCREEN. That's all. It was fun, but I won't repeat the mistake of undertaking substantial work without feedback.

If you want to play along at home, pause reading here and resume when you're happy with your own solution to compare notes.

Enough introduction. Let me show you some code along with commentary based on my notes. I started with a little brainstorming. I'll give you the main points first and then give a more thorough discussion later.

A quick note on optimization terminology: an optimization problem is normally posed with a set of constraints and an objective function. For example, the constraints for this problem specify that we are only interested in pangrams, and we are only interested in palindromes. We call solutions that satisfy the constraints feasible. The objective function grades the goodness of feasible solutions. For example, in this case our objective function is phrase length. An optimal solution is a feasible solution for which no other feasible solution is scored better by the objective function.

First observation: For each letter, some word in our pangram must contain that letter.

First idea: beam stack search word-sets.

Elaboration: Here I'm considering pangram formation as a search problem in the space of words organized by letter inclusion. Each dictionary word is a child of the empty root node. Each node has the complement of dictionary words represented by its (improper) ancestors as children. Assuming a small constant bound c on word length, classifying words will cost |words| * 26 * c = O(n). Among sibling nodes, I can prioritize words containing the rarest letter in order to avoid unfruitful branches. Even so, the six-figure branching factor gives a very big search space.

Second observation: For each prefix p of the letter-sequence of the pangram, the reverse of p must exist in some concatenation of words.

Second idea: use a trie for reversed words to help prune the search for palindromes. The overall search tree has min(n, 26^c)! nodes, so let's only build tries for feasible word-sets.

We need a formal definition of our goal:

def palindromeP(seq):
""" Characteristic fn for palindromes.

>>> palindromeP(['a'])

>>> palindromeP(['a', 'b', 'a'])

>>> palindromeP(['ab', 'c', 'ba'])

>>> palindromeP(['ab', 'c', 'ab'])
return (lambda p: p[:int(len(p)/2)] == (p[int(math.ceil(len(p)/2.0)):])[::-1])(''.join(seq))

For those who don't use python, the triple-quotes delimit a docstring comment, and what looks like a toplevel transcript is a doctest specification showing inputs and expected outputs. The code just says that we are concerned with the concatenation of the given sequence: when the first half matches the reverse of the second half, we have a palindrome.

The definition of pangram depends on the letters that comprise our candidate solution:

class Histogram:
def __init__(self, seq):
self.h = collections.defaultdict(set)
for s in seq:
for i in s:
k = i.lower()
def __len__(self):
return len(self.h)
def __getitem__(self, key):
return self.h[key]
def __iter__(self):
if (not hasattr(self, 'ordered_h')):
self.ordered_h = sorted(self.h.items(), key = lambda xy: -len(xy[1]))
return self.ordered_h.__iter__()
def top(self):
for x in self.__iter__():
return x[1]
def __repr__(self):
return '(Histogram %s)' % self.h.__repr__()

The histogram is a glorified dictionary mapping letters to words. It participates in the python idiomatic iteration scheme by partially ordering its values by overall letter frequency.

def pangramP(seq):
Characteristic fn for pangrams.
(assumes the alphabet is limited to A-Za-z)

>>> pangramP([''])

>>> pangramP(['a'])

>>> pangramP([string.ascii_lowercase])

>>> pangramP(dict((x, x) for x in string.ascii_lowercase))
if isinstance(seq, collections.KeysView):
h = seq
elif isinstance(seq, str):
h = set(seq)
elif isinstance(seq, collections.Mapping):
h = seq.keys()
elif isinstance(seq, Histogram):
h = seq
h = Histogram(seq)
return len(h) == 26

Now, let's look at the search implementation. I've characterized the problem in terms of tree search, so I need a representation for nodes:

class WordSetNode:
def __init__(self, word, parent = None):
if not hasattr(word, '__iter__'):
word = str(word)
self.word = word
self.parent = parent
self.covered = dict.fromkeys(word)
if parent:
def coveredP(self, letter):
return letter in self.covered
def pangramP(self):
return pangramP(self.covered.keys())
def ancestors(self):
n = self
while (n):
yield n
n = n.parent
def phrase(self):
return reversed([n.word for n in self.ancestors() if n.word])
def phrase_string_ns(self):
if not hasattr(self, 'phrase_string_ns_cache'):
self.phrase_string_ns_cache = ''.join(self.phrase())
return self.phrase_string_ns_cache
def __repr__(self):
if not self.parent:
return "WordSetNode('%s')" % self.word
return "WordSetNode('%s', %s)" % (self.word, repr(self.parent))

The first thing you might notice is that I have backlinks from children to parents, and a very odd thing you'll notice is that I have no edges from parents to children! I'm not planning to build a tree and then search it for solutions. I want to build my tree on-demand in order to avoid needless computation and limit my memory footprint. Basically, I move from parents to children by generating nodes rather than by following a path through some pre-existing structure. The backlinks are there for convenience.

As in the sketch above, the tree search is a quest for pangrams having the palindrome property, so nodes are viewed as contributing letter-coverage and paths through the tree represent words in the overall pangram. I cache information about the path to each node along the way.

So, about those pangrams: my idea for pruning infeasible word-sequences involved prefix trees ("tries"). This is the first time I've used them in real life, so I will remind you that prefix trees are all about paths; the element data aren't associated with particular nodes but rather with paths from the root. It's very efficient for queries of the form: which elements have x as a prefix? In this case, my trie nodes will represent letters; the trie itself represents a collection of letter-sequences:

class Trie:
def __init__(self, seq = []):
if type(seq) == type(''):
seq = [seq]
self.terminus = False
self.children = {}
for i in seq:
def emptyP(self):
return not self.children
def match(self, word):
"""Find the deepest subtrie representing
a prefix of word, and the remaining suffix.

>>> Trie().match('')
(Trie(), '')

>>> Trie().match('abc')
(Trie(), 'abc')

>>> Trie('abc').match('')
(Trie(['abc']), '')

>>> Trie('abc').match('a')
(Trie(['bc']), '')

>>> Trie('abc').match('aac')
(Trie(['bc']), 'ac')

>>> Trie(['abc', 'aac']).match('achtung')
(Trie(['ac', 'bc']), 'chtung')

>>> Trie(['abc', 'aac']).match('aardvark')
(Trie(['c']), 'rdvark')

if not word:
return (self, word)
if not word[0] in self.children:
return (self, word)
return self.children[word[0]].match(word[1:])
def add(self, word):
if not word:
self.terminus = True
subtree, remainder = self.match(word)
if not remainder:
# mark the word if it's not already marked
return subtree.add('')
subtree.children[remainder[0]] = Trie([remainder[1:]])
def search(self, w):
quarry, remnant = self.match(w)
if remnant:
return None
return quarry
def traverse(self, preorderVisit, postorderVisit, prefix = ''):
preorderVisit(self, prefix)
for c in self.children.items():
c[1].traverse(preorderVisit, postorderVisit, prefix + c[0])
postorderVisit(self, prefix)
def pre_order_nodes(self, ancestry = None):
if ancestry is None:
ancestry = []
yield (self)
for c in self.children.items():
for i in c[1].pre_order_nodes():
yield i
def words(self, prefix = ''):
if self.terminus:
yield prefix
for c in self.children.items():
for l in c[1].words(prefix + c[0]):
yield l
def __repr__(self):
W = ', '.join(repr(w) for w in sorted(self.words()))
if W:
return "Trie([%s])" % W
return 'Trie()'

trie: {a, daffodil, lid, lug, off, offset}

a trie representing {a, daffodil, lid, lug, off, offset}

As I explore my overall search tree, building up a sequence of words, I want to ensure that that my sequence has palindromic symmetry. I can do this by querying a trie-filled-with-reverse-words for my current search tree path.

The basic idea seems OK, but I haven't addressed word segmentation within the pangram. We disregard whitespace in the problem definition, so we can't limit our search to pangrams with word-length symmetry. We've got to account for "massive … sam" wherein we could reach a terminal node of the prefix trie of reversed words before exhausting letters in the current search path.

def loopyTrieSearch(t):
"""Produce a function that searches
a trie augmented with backedges from
each leaf to the root.
>>> t = Trie()
>>> s = loopyTrieSearch(t)
>>> s(t, 'unmatchable')
>>> t = Trie(['abc', 'aac', 'b'])
>>> s = loopyTrieSearch(t)
>>> s(t, '')
[Trie(['aac', 'abc', 'b'])]

>>> s(t, 'a')
[Trie(['ac', 'bc'])]

>>> s(t, 'ba')
[Trie(['ac', 'bc'])]

>>> s(functools.reduce(Trie.merge, s(t, 'b')), 'ab')

>>> s(t, 'az')
>>> t = Trie(['ab', 'abc', 'd'])
>>> s = loopyTrieSearch(t)
>>> s(t, 'abd')

>>> s(t, 'azzz')
def s(t2, w):
ctx = t2 if isinstance(t2, collections.MutableSequence) else [t2]
if w == '':
return ctx
for l in w:
for i in ctx:
if i.terminus:
ctx = [j[0] for j in (i.match(l) for i in ctx) if j[1] == '']
if not ctx:
return None
return ctx
return s

From these ideas, I built a program that ran very nicely on a contrived, tiny, dictionary, but when I ran it on the 200kword ITA dictionary, my cpu fan revved to 100% and stayed there indefinitely. I should show some profiling data here.

How could I build my search tree more cheaply? Initially, to generate a node's children, I would take my letter-coverage dictionary's complement to see what letters I needed. I'd find all the words that could contribute a needed letter. For each word, I'd use my loopyTrieSearch. This tactic does avoid exploring infeasible branches, but it involves a lot of unsuccessful trie queries. What if I started by loopyTrieSearching the prospective parent's path, and then considering the loopy suffixes as possible words containing a needed letter?

Consider a trie pair (A, B) with A containing "awk" and "awkward" and B containing only "awesome." The naive approach requires us to conduct separate trie searches for "awk" and "awkward," both times stopping when we fail to match the ks. The improved approach allows us to simultaneously eliminate both words as soon as we find that "aw" has no child "k." If the average branching factor in the trie over all words (26 at most) is less than the average number of words associated with each letter (200000/26), then the latter could be a superior tactic. This is the purpose of the Trie.filter method:

class Trie:
# ...
def filter(self, criteria, root = None):
"""Return the words in this trie that
match the augmentation of criteria by
the addition of backedges from terminals
to the root.

>>> sorted(w for w in d if s(T, w)) == [w for w, i in itertools.groupby(sorted(FT.filter(T)))]
def f(words, crit, prefix = ''):
if words.terminus:
yield prefix
if words.emptyP():
backedge = root.children.items() if crit.terminus else ()
for c in itertools.chain(crit.children.items(), backedge):
m = words.match(c[0])
if m[1] != '': continue
for w in f(m[0], c[1], prefix + c[0]):
yield w
if not root: root = criteria
for i in f(self, criteria):
yield i
def merge(a, b):
"""Merge two tries.
>>> Trie.merge(Trie(['a']), Trie(['b']))
Trie(['a', 'b'])

>>> Trie.merge(Trie(['abc']), Trie(['bcd']))
Trie(['abc', 'bcd'])

>>> Trie.merge(Trie(['abc', 'aac']), Trie(['abe']))
Trie(['aac', 'abc', 'abe'])
result = Trie()
result.children = a.children.copy()
for k, v in b.children.items():
if k in result.children:
result.children[k] = Trie.merge(v, result.children[k])
result.children[k] = v
result.terminus = a.terminus or b.terminus
return result

For this, I'll use a pair of tries, one reverse and one forward. My search path corresponds to paths in each of those tries; as I query in the trie pair I need never pursue a branch that's available in only one of the tries.

class TriePair:
def __init__(self, W, Wr = None):
if not Wr: Wr = W
self.fwd = W if isinstance(W, Trie) else Trie(W)
self.rev = Wr if isinstance(Wr, Trie) else Trie(w[::-1] for w in Wr)
self.s = loopyTrieSearch(self.rev)
def expand(self, node, advisoryBudget = None, peek = True):
if not node.trieContinuation: return
if peek and node.pangramP() and not node.parent.pangramP():
# don't peek in the recursive expansion
for i in self.completePalindrome(node, expand = functools.partial(self.expand, peek = False)):
yield i
W = itertools.chain(*(self.fwd.filter(c, self.rev) for c in node.trieContinuation))
awords = set(a.word for a in node.ancestors())
for w in set(W):
if w in awords: continue
n = WordSetNode(w, node)
# if pangram and \exists a word matching some trie
# path to a leaf -> generate the easy palindrome
n.setupContinuation = self.setupContinuation
yield n

I mentioned earlier that the search tree is constructed during the search process. The expand function does that work, producing the child nodes of each parent. I'll explain peeking later. It would be possible to query our search tries for the string corresponding to our path whenever we expand a node, but then each descendent must redo its ancestors' trie search work. So, we copy the parent's search context into a cache (trieContinuation) on each child in order to avoid that extra work:

class TriePair:
# ...
def setupContinuation(self, n):
if hasattr(n, 'trieContinuation'):
elif self == n and hasattr(n, 'setupContinuation'):
elif n.parent and hasattr(n.parent, 'trieContinuation'):
n.trieContinuation = (
n.trieContinuation = (

Let's take a closer look at beam-stack search.

Recall that depth-first search always expands the deepest unexplored node, while breadth-first search always expands the shallowest unexplored node. Depth-first search uses memory proportional to the longest path to a leaf; breadth-first search uses memory proportional to the size of the largest layer, O(d^b) where d is the longest path to a leaf and b is the branching factor at each node. For many problems, the better of two feasible solutions is the one closer to the root, and for these problems breadth-first search finds good solutions faster than depth-first search (which can pay an infinite penalty for unlucky early expansions).

Best-first search algorithms take an intermediate course between depth-first and breadth-first search. They choose which node to expand next by applying a heuristic. A* is a well-known such algorithm, and it's the basis for beam-stack search. The exploration order (denoted f(n) = g(n) + h(n)) takes into account both the (known) cost of each node's path from the root (g(n)) and the (estimated) cost of its path to a solution (h(n)). The estimation must be conservative in the sense that it never overestimates a potential solution's cost; this allows admissible pruning: once we've found a solution, we can safely avoid expanding any subtrees with higher cost estimates.

Best-first search can degenerate into breadth-first search in the worst case for memory use; beam-stack search addresses real-world memory constraints by enabling inadmissible pruning without compromising completeness (we will find a solution if one exists) or optimality (no solution is better than the one we find). Beam-stack search does this by keeping metadata proportional to the depth of the search that allows it to backtrack, regenerating and thoroughly exploring inadmissibly pruned nodes after it has explored the more promising portion of the graph.

Note that beam-stack search is an anytime algorithm: it strives to return a feasible solution ASAP and then to return improved solutions as they discovered. It is also complete and optimal, so it will definitely return a solution if any exists and the final solution it returns is guaranteed to be at least as good as any that exist.

The following graphs illustrate the progress of a beam-stack search on a small graph with a simple cost function and heuristic.
root node with boundsBounds on the f-cost of child nodes are associated with each layer. Initially, we consider all possible children. Whenever we generate a child, we check for solution feasibility.
first layer expansion

For each node, we can compute the (certain) g-cost of the partial solution and the (guessed) h-cost of the potential solution descendent of the node. In this example, we assume that g is phrase length (not including spaces) and h is the number of letters we need to add to produce a pangram. These are sensible, if naive, assumptions for the pangramic palindrome problem. But, to keep this example small, we will cheat by using an easier feasibility predicate: we will accept any palindrome; pangrams are not required.

search progress

When memory is limited, we inadmissibly discard some children, updating the parent layer upper bound accordingly. If backtracking proves necessary, we can use the upper bound to regenerate only the inadmissibly discarded children, thus avoiding duplicated work. Assuming a beam width of 2 nodes, we partition the set of children by f-cost. In this case, a and lid survive while off and daffodil are pruned.

search progress

We are ready to expand the frontier.

search progress

The layer exceeds our memory threshold, so it's time to prune again.

search progress

We retain the most promising nodes and update the layer bound.

search progress

We are ready to expand the frontier.

increased memory bound

Our frontier exceeds our memory limit, but I think I've made the point about pruning so let's double the limit to 4. OK then we can expand again.

no further expansion is possible without backtracking

If the dictionary contains only the words {lid, off, a, daffodil} then we are unable to expand beyond this point without duplicating words. Notice that we have yet to find any palindromes. Because we cannot go forward, we must go backtrack.


We backtrack to the deepest layer at which inadmissible pruning occurred (i.e., where the upper bound is finite). There, we update the bounds to show that we have fully explored the space below the upper bound; now we will consider the space above. We are ready to expand the frontier.

search progress

Inadmissible pruning works just the same after backtracking as before. To see this, let's tighten the memory limit back to two.


As ever, we remove the nodes with high f-cost.

search progress

We are ready to expand the frontier.

search progress

This layer exceeds our memory limit, so we need to prune.


We discard unpromising nodes.

search progress

We are ready to expand the frontier.

first candidate solution found

We find a feasible solution on the frontier: lidoffadaffodil. This is a proof by example of the maximum cost of the optimal solution to our problem. Our heuristic is admissible, meaning that it never underestimates the cost of the best solution that uses a given node. Therefore, from now on, we can admissibly prune nodes whose f-cost exceed the f-cost of our candidate solution. The memory limit might still cause us to do some inadmissible prunings in the future, and in that case we might need to backtrack. But, admissible prunings never create an obligation to backtrack. If we continue from here, we will eventually exhaust the space between 27 (where we inadmissibly pruned some nodes) and 35 (where we have a solution) and determine that in fact we have an optimal solution.

So here are the parts to our cost function f:

def g(n):
# sort of like
# lambda n: sum(len(w.word) for w in n.ancestors())
# but with caching
if not hasattr(n, 'g'):
result = len(n.word)
if n.parent:
result += g(n.parent)
n.g = result
return n.g
def h(n):
"""Admissible heuristic estimate of cost
remaining from this node.

>>> h(WordSetNode('a')) == h(WordSetNode('aa'))

Though admissible, it's reasonable tight.
>>> minimalCost = 26 * 2 - 1
>>> inadmissibleMargin = 1
>>> n = WordSetNode('a')
>>> h(n) > minimalCost - inadmissibleMargin - len(''.join(n.phrase()))
phrase_width = g
tie_breaker = .01 * len(n.word) + .01 * abs(hash(n.phrase_string_ns()))/float(sys.maxsize)
covered = len(n.covered)
completely = len(string.ascii_lowercase)
# The symmetry of the palindrome guarantees
# that the first half is a pangram. So,
# there's no use evaluating palindrome-dness
# until we have a pangram.
if covered < completely:
# descendent nodes must:
# cover the missing letters on the left
# (just to the right of n)
# match the missing letters on the right
# (with a possible exception at the pivot)
# match what we have so far
# TODO: improve this estimate by:
# (a) looking ahead in the trie for the missing
# letters
# (b) using the max_missing(min(len(word) | missing \in word))
# rather than 1 for the cost component due to
# missing letters
missing = completely - covered
# strategy b:
# this doesn't help: when letterCosts
# wins the max, we lose our ability to
# distinguish between nodes that help
# us gain coverage and nodes that do not.
#needed = max(missing + (missing - 1), max(letterCosts[i] for i in string.ascii_lowercase if not i in n.covered))
needed = missing + (missing - 1)
return phrase_width(n) + needed - tie_breaker
return palindrome_balance(n) - tie_breaker

And here's how those functions are used, along with parameters for tuning memory use (beam_width), the upper bound on cost for solutions we're willing to accept (U), a predicate to control which intermediate solutions we yield, and a way to inject a stack (which is handy for resuming cancelled searches):

def bss(nodes, expand, g, h, beam_width, U = sys.maxsize, feasibleP = lambda n: False, stack = None):
"""Beam-stack search, a variation on
breadth-first branch and bound with

>>> [i for i in bss([], None, lambda n: 1, lambda n: 1, 10)]

>>> [i for i in bss([1], lambda n: [], lambda n: 1, lambda n: 1, 10)]

>>> sorted([i for i in bss(range(15), lambda n: [], lambda n: 1, lambda n: 1, 10)])
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14]

>>> sorted([i for i in bss(range(5), intexpand, lambda n: 1, lambda n: 0, 3)])
[1, 2]

def dedup(frontier):
# This is a warning rather than an error:
# some nodes really are equivalent though
# not identical. For example, spaces don't
# matter in pangramic palindromes.
frontier = sorted(frontier, key = lambda p: p[0])
groupedFrontier = (
key = lambda p: p[0]),
lambda p: p[0]))
frontier = []
for i in groupedFrontier:
elts = list(itertools.islice(i[1], 2))
if (len(elts) <= 1):
print('f needs nuance: these nodes have the same f-cost: %d = f(%s) = f(%s)' % (elts[0][0], ' '.join(elts[0][1].phrase()), ' '.join(elts[1][1].phrase())))
return sorted(frontier, key = lambda p: p[0])
def prune(frontier, beam_width):
if len(frontier) <= beam_width:
f_max = frontier[-1][0]
print('pruned nodes, new f_max = %s' % f_max)
return frontier, f_max
# We'll never backtrack to the nodes
# we're pruning, even if these nodes
# are never admissibly discarded.
# beam_width may be smaller than the largest
# equivalence class defined by f.
# Ideally, these classes should contain
# one node each; otherwise it's difficult
# to choose a beam width that will avoid
# splitting an equivalence class.
s = dedup(frontier)
retained, pruned = s[:beam_width], s[beam_width:]
return prune(retained, beam_width)

# 63: verbify ma pluck now sh ad go jo zax eta qat ex azo jog dahs wonk culpa my fib rev
# 65: buckrams donzel ti pyx of vug we jo hm suq us mho jew guv foxy pit lez nods mark cub
# 65: verbify na ducks to hm we jo zax al pig suq us gip lax azo jew mho tsk cud any fib rev
# 67: bonked vulgars ti hm oy cap wo jo zax if suq us fix azo jow pac yom hits rag luv de knob
# 69: verbify locks twang up mho zax ed ma jab suq us ba jam dex azo hm pug naw tsk coly fib rev
# 71: verbify lock nut spaz we jig am dex eh ta qua sau qat hexed magi jew zap stunk coly fib rev
# 76: drawknife butyls go hm op ox azo jo we vac suq us suq us cave wo jo zax op om hog sly tube fink ward
# 86: de but way xi sh on vug la ef pa me re ka jo ah coz om ta qat ta qat mozo chao jake remap feal guv noh six yaw tubed
# 92: drawknife butyls go hm op ox azo jo we vac suq usherettes setter eh suq us cave wo jo zax opo mho g sly tube fink ward
# 101: underflow machs ti by xi pe ka jo zag eve suq aimu umiaq use vega zo jake pixy bits h cam wolfr ed nu

f = lambda n: g(n) + h(n)
if not stack:
oldFrontier = [(f(i), i) for i in nodes]
for i in filter(lambda x: feasibleP(x[1]), oldFrontier):
U = min(U, g(i[1]))
yield i[1]
oldFrontier = [i for i in oldFrontier if i[0] < U]
stack = [[0, U, oldFrontier]]
j = 0
while stack:
f_min, f_max, oldFrontier = stack[-1]
print('f_min, f_max, level = %s, %s, %d' % (f_min, f_max, len(stack)))
if oldFrontier: print('sample phrase: %s' % ' '.join(oldFrontier[0][1].phrase()))
frontier = []
for i, o in enumerate(oldFrontier):
j += 1
if not j % 32: print('nodes expanded: %d' % j)
k = 0
for n in expand(o[1], advisoryBudget = f_max):
k += 1
# if not k % 8196: print('progress on frontier: %d' % k)
if feasibleP(n):
gn = g(n)
if gn >= U: continue
# g-values are integers, and f
# is admissible, so all the f-values
# above g - 1 have f-cost at least
# as high as g (i.e., they're no
# better than this solution).
U = gn - 1 + (1.0 / sys.maxsize)
f_max = U
yield n
oldFrontier[i + 1:] = filter(lambda p: p[0] < U, oldFrontier[i + 1:])
p = (f(n), n)
if f_min < p[0] < f_max:
if len(frontier) > beam_width * 2:
frontier, f_max = prune(frontier, beam_width)
# filter, sort, reduce equivalence classes to class representatives
prior_frontier = list(frontier)
frontier = filter(lambda p: p[0] <= f_max, frontier)
frontier = dedup(frontier)
if not frontier:
# backtrack
print('empty layer -> backtracking from %d to' % len(stack), end=' ')
while stack and stack[-1][1] >= U:
print(len(stack), end=' ')
if stack:
frame = stack[-1]
frame[0], frame[1] = frame[1], U
frame[2] = list(filter(lambda n: n[0] < end="'')

Now, a very obvious improvement can be had by noticing that we guarantee palindromes by construction. Once we attain the pangram property, it's entirely straightforward to complete the palindrome. I call that "peeking." Here's a way to take the short-cut at the appropriate moment during bss:

class TriePair:
# ...
def pivotH(self, n):
revCompletions = self.s(self.rev, n.phrase_string_ns())
completionWords = itertools.chain(*(t.words() for t in revCompletions))
return min(len(w) for w in completionWords) - (32 - len(n.word)) / 32 - abs(hash(n.phrase_string_ns())) / sys.maxsize / 64
def rightH(n, expected_length):
return expected_length - sum(len(i) for i in n.phrase()) - abs(hash(n.phrase_string_ns())) / sys.maxsize
def rightExpand(self, n, match, len_left):
"""Restricted search space for quickly finding
pangramic palindrome completions.
>>> left = 'drawknife butyls go hm op ox azo jo we vac suq us'.split()
>>> right = 'suq us cave wo jo zax op om hog sly tube fink ward'.split()
>>> extraneous = ['opossum']
>>> palindrome = left + right
>>> tries = TriePair(palindrome + extraneous)
>>> all(tries.fwd.match(i)[1] == '' for i in palindrome)
>>> n = functools.reduce(lambda a, i: WordSetNode(i, a), left, None)
>>> match = ''.join(left)[::-1]
>>> rx = functools.partial(tries.rightExpand, match)
>>> x = list(rx(n))
>>> [i.word for i in x]

>>> n = functools.reduce(lambda a, i: WordSetNode(i, a), itertools.chain(left, itertools.takewhile(lambda p: p != 'op', right)), None)
>>> m, r = tries.fwd.match('op')
>>> r == '' and sum(1 for i in m.words()) > 1

>>> [i.word for i in list(rx(n))]

>>> n = functools.reduce(lambda a, i: WordSetNode(i, a), palindrome[:-1], None)
>>> palindrome[-1] in (i.word for i in list(rx(n)))
m = match[sum(len(i) for i in n.phrase()) - len_left:]
r = self.fwd.match(m)[1]
nextWordSupersequence = m[:-len(r)] if r else m
for i in self.fwd.filter(Trie([nextWordSupersequence]), Trie()):
yield WordSetNode(i, n)
def completePalindrome(self, pangram, expand, f_max = sys.maxsize, match = None, left = None):
# the pivot of the palindrome can be anywhere
# left of the last letter needed for the pangram.
# So it could be in the rightmost word of pangram,
# or it could be in words yet to be added to our
# phrase. We'll expand the pangram in the usual
# way elsewhere, so we don't need to be complete
# here. We do need to remain feasible, however.
# Our strategy here is to find a pivot that
# coincides with a word boundary, so that we can
# use the reverse of the phrase as the sequence
# of letters to appear on the right. This will
# give us a much faster way to complete a
# palindrome and establish a tighter upper bound
# for the rest of our search.
# TODO: also check for more interior pivots by
# extending keystoneP to enumerate all possible
# interior pivots rather than just the first one.
# ababa -> 1, 2, 3
if match and left:
len_match = len(match)
len_left = len(left)
rx = functools.partial(self.rightExpand, match = match, len_left = l
rh = functools.partial(TriePair.rightH, expected_length = len_left + len_match)
print('building right with match = %s and left = %s' % (match, left))
for completion in bss([pangram], rx, g, rh, 1, len_left + len_match + 1/sys.maxsize, feasibleP):
yield completion
if keystoneP(pangram.word):
match = pangram.parent.phrase_string_ns() + pangram.word[:keystoneP(pangram.word)]
for i in self.completePalindrome(pangram, expand, f_max, match = match, left = pangram.phrase_string_ns()):
f_max = min(f_max, g(completion))
yield i
for left in bss([pangram], expand, g, self.pivotH, 1, f_max, lambda n: any(t.terminus for t in self.s(self.rev, n.phrase_string_ns()) + self.s(self.rev, n.phrase_string_ns()[:-1]))):
match = left.phrase_string_ns()[::-1]
for m in (match[1:], match):
for completion in self.completePalindrome(left, expand, f_max, match = m, left = match):
yield completion

In essence, it comes down to switching to very well-informed heuristics that can exploit our foreknowledge of the letter sequence we want in the second half of the phrase.

This still tied up my laptop for many hours. I had two ideas for making progress: (1) reduce the dictionary in hopes of lowering the upper bound for the search and (2) experiment with different heuristics in hopes of dynamically finding better solutions more quickly. As an example of (1), I tried excluding words with multiple occurrences of any letter. As an example of (2), I tried using a piecewise heuristic function that penalizes letter repetition early in the search, but also recognizes that the pangram may contain a "keystone" word overlapping its palindromic midpoint.

Here's how I disallowed intraword letter duplication:

def minKeystoneP(w):
"""Determine whether this word's interior
contains the palindromic pivot of a theoretically
minimal-sized pangramic palindrome.
>>> minKeystoneP('abc')

>>> minKeystoneP('aba')

>>> minKeystoneP('abaa')

k = keystoneP(w)
if not k: return False
sides = w[:k], w[k:]
return all(len(s) == len(set(s)) for s in sides)

def mintries(d):
minKeystones = [w for w in d if minKeystoneP(w)]
nodups = [w for w in d if len(set(w)) == len(w)]
return TriePair(minKeystones + nodups)

Here's some profiling data from a run that used a mintries dictionary (stopped at the first feasible solution). I used beam_width = 512, U = 1024:

solution (65): buckrams donzel ti pyx of vug we jo hm suq us mho jew guv foxy pit lez nods mark cub
3063736602 function calls (1650738620 primitive calls) in 15627.224 CPU seconds
Ordered by: standard name
63010.0050.0000.0050.000{built-in method abs}
20.0000.0000.0000.000{built-in method any}
9420.0040.0000.0040.000{built-in method ascii_encode}
19100.0010.0000.0010.000{built-in method ceil}
10.0010.00115627.22415627.224{built-in method exec}
3111903069.1970.00069.1970.000{built-in method fromkeys}
40.0000.0000.0000.000{built-in method getattr}
155614108204.2890.000204.2890.000{built-in method hasattr}
3111898217.3150.00017.3150.000{built-in method hash}
530.0000.0000.0000.000{built-in method id}
3113382176.9050.000200.8190.000{built-in method isinstance}
50.0000.0000.0000.000{built-in method issubclass}
15569546650.3740.00050.3740.000{built-in method len}
10.0000.0000.0000.000{built-in method lookup}
1790.0000.0000.0000.000{built-in method max}
40.0000.0000.0000.000{built-in method min}
69310.0020.0000.0020.000{built-in method next}
4470.0050.0000.4080.001{built-in method print}
900.0010.0000.0030.000{built-in method sum}
13890.0010.0000.0010.000{method '__enter__' of '_thread.lock' objects}
20.0000.0000.0000.000{method '__subclasses__' of 'type' objects}
40.0000.0000.0000.000{method 'add' of 'set' objects}
827570.0340.0000.0340.000{method 'append' of 'list' objects}
10.0000.0000.0000.000{method 'disable' of '_lsprof.Profiler' objects}
9420.0010.0000.0010.000{method 'extend' of 'bytearray' objects}
26306010678.3120.00078.3120.000{method 'items' of 'dict' objects}
31120994126.5320.000126.5320.000{method 'join' of 'str' objects}
3113199010.4020.00010.4020.000{method 'keys' of 'dict' objects}
310.0000.0000.0000.000{method 'pop' of 'list' objects}
31119029111.0080.000111.0080.000{method 'update' of 'dict' objects}
4470.3570.0010.3570.001{method 'write' of '_FileIO' objects}

Using the same parameters but substituting a piecewise expand function to reduce the search space, I produced the same solution in 12949.838 CPU seconds (shortly after the 7232nd node expansion):

# this function takes a TriePair to an expand function
# suitable for use with bss:
def budgetExpand(ordinarytries):
def bex(n, advisoryBudget):
if n.pangramP():
for x in ordinarytries.expand(n):
yield x
letterAllowance = int((advisoryBudget / 2) - g(n))
if letterAllowance < 5:
missing = len(string.ascii_lowercase) - len(n.covered)
if missing < letterAllowance:
# we might be able to afford to waste the next word
for x in ordinarytries.expand(n):
yield x
if missing == 2:
# find a suitable tp
for lp, t in raretries:
if not any(l in n.covered for l in lp):
for x in t.expand(n):
yield x
if missing <= 4:
letters = string.ascii_lowercase - n.covered.keys()
if all(l in lettertries for l in letters):
for x in itertools.chain.from_iterable(t.expand(n) for l, t in lettertries.items() if not n.coveredP(l)):
yield x
for x in ordinarytries.expand(n):
yield x
return bex
2481633428 function calls (1172623990 primitive calls) in 12949.838 CPU seconds
Ordered by: standard name
72870.0050.0000.0050.000{built-in method abs}
9230.0030.0000.0050.000{built-in method all}
153570.0130.0000.0240.000{built-in method any}
63990.0230.0000.0230.000{built-in method ascii_encode}
3940.1390.0000.1390.000{built-in method ceil}
10.0000.00012949.83812949.838{built-in method exec}
9110.0010.0000.0010.000{built-in method from_iterable}
2820644858.5170.00058.5170.000{built-in method fromkeys}
40.0000.0000.0000.000{built-in method getattr}
141057204171.3260.000171.3260.000{built-in method hasattr}
2820642814.6060.00014.6060.000{built-in method hash}
220.0000.0000.0000.000{built-in method id}
2824172265.7300.000169.5950.000{built-in method isinstance}
50.0000.0000.0000.000{built-in method issubclass}
14117409042.9040.00042.9040.000{built-in method len}
10.0000.0000.0000.000{built-in method lookup}
290.0000.0000.0000.000{built-in method max}
20.0000.0000.0000.000{built-in method min}
77660.0020.0000.0020.000{built-in method next}
31900.0310.0000.9490.000{built-in method print}
370.0000.0000.0010.000{built-in method sum}
95890.0090.0000.0090.000{method '__enter__' of '_thread.lock' objects}
20.0000.0000.0000.000{method '__subclasses__' of 'type' objects}
40.0000.0000.0000.000{method 'add' of 'set' objects}
857970.0340.0000.0340.000{method 'append' of 'list' objects}
10.0000.0000.0000.000{method 'disable' of '_lsprof.Profiler' objects}
63990.0080.0000.0080.000{method 'extend' of 'bytearray' objects}
24419620668.3310.00068.3310.000{method 'items' of 'dict' objects}
28206877105.9380.000105.9380.000{method 'join' of 'str' objects}
282298748.7170.0008.7170.000{method 'keys' of 'dict' objects}
150.0000.0000.0000.000{method 'pop' of 'list' objects}
2820644792.2870.00092.2870.000{method 'update' of 'dict' objects}
31900.6260.0000.6260.000{method 'write' of '_FileIO' objects}

What about that penultimate case, searching up to four overlapping TriePairs each representing the set of words containing a missing letter? How much does it help? The mintriesd superset has 35103 words, and a sample of four-combinations range in size from 11263 to 24428 words. But, the average size lettertrie has 4784 words, so we get a noticeable speedup even though trie search has logarithmic time complexity. Without the lettertries case, we need 2656858411 function calls (1319973770 primitive calls) in 14205.551 CPU seconds.

Finally, I added to the budgetExpand repertoire with lengthtries.

The idea for lengthtries comes from the observation that it would be useful to limit the depth of trie traversals as we approach the current cost bound. With this technique I brought the first solution discovery down to 398442028 function calls in 9485.551 CPU seconds:

def budgetExpand(ordinarytries):
raretries = [(lp, TriePair([w for w in ordinarytries.fwd.words() if all(l in w for l in lp)])) for lp in 'jm vx jy fx jk gq mq jp qy bq hq pq kz jv wz hj fq qv xz kq jw fj qz wx jz kx qw jx qx jq'.split()]
hist = Histogram(ordinarytries.fwd.words())
lettertries = {l : TriePair(hist[l]) for l in [k for k, v in sorted(hist.h.items(), key = lambda i: len(i[1]))][:16]}
lengthtries = {i : TriePair((w for w in mintriesd.fwd.words() if len(w) <= i or keystoneP(w)), mintriesd.rev.words()) for i in range(1, 8)}
def bex(n, advisoryBudget):
if n.pangramP():
for x in ordinarytries.expand(n):
yield x
letterAllowance = int((advisoryBudget / 2) - g(n))
if letterAllowance < 5:
missing = len(string.ascii_lowercase) - len(n.covered)
if missing < letterAllowance:
# we might be able to afford to waste the next word
for x in lengthtries[letterAllowance].expand(n):
yield x
if missing == 2:
# find a suitable tp
for lp, t in raretries:
if not any(l in n.covered for l in lp):
for x in t.expand(n):
yield x
if missing <= 4:
letters = string.ascii_lowercase - n.covered.keys()
if all(l in lettertries for l in letters):
for x in itertools.chain.from_iterable(t.expand(n) for l, t in lettertries.items() if not n.coveredP(l)):
yield x
elif letterAllowance in lengthtries:
for x in lengthtries[letterAllowance].expand(n):
yield x
for x in ordinarytries.expand(n):
yield x
return bex

One last idea for search space reduction is to exclude words containing covered letters:

def budgetExpand(ordinarytries):
# least-common 2-combinations of letters in d:
# [('kx', 53), ('qw', 27), ('jx', 27), ('qx', 26), ('jq', 8)]
raretries = [(lp, TriePair((w for w in ordinarytries.fwd.words() if all(l in w for l in lp)), ordinarytries.rev)) for lp in 'jm vx jy fx jk gq mq jp qy bq hq pq kz jv wz hj fq qv xz kq jw fj qz wx jz kx qw jx qx jq'.split()]
hist = Histogram(ordinarytries.fwd.words())
lettertries = {l : TriePair(hist[l], ordinarytries.rev) for l in [k for k, v in sorted(hist.h.items(), key = lambda i: len(i[1]))][:16]}
lengthtries = {i : TriePair((w for w in ordinarytries.fwd.words() if len(w) <= i or keystoneP(w)), ordinarytries.rev) for i in range(1, 8)}
def mfl(k, n):
"""Kth most frequent letter k, n -> (#, a)"""
if not hasattr(n, 'hist'):
n.hist = Histogram(
lf = list(itertools.islice(n.hist, k + 1))[-1]
return (sum(1 for i in lf[1]), lf[0])
exttltries = Trie() = mintriesd
# this isn't optimal; there may be unreachable
# nodes and some of those may be more selective
# than letter-equivalent reachable nodes. But
# it won't yield incorrect results and it's
# an approximation I can build quickly.
for i in range(32):
(freq, split_char), parent = max(
(mfl(len(n.children), n), n) for n in
exttltries.pre_order_nodes() if len(n.children) <= 4)
subtrie = = TriePair(Trie(w for w in if split_char not in w),
print([x for x in exttltries.pre_order_nodes()])
print([sum(1 for y in for x in exttltries.pre_order_nodes()])
def pickTP(t, n):
if (len(t.children) == 0): return
for l in t.children:
if not l in n.covered: continue
return pickTP(t.children[l], n)
def bex(n, advisoryBudget):
if n.pangramP():
for x in ordinarytries.expand(n):
yield x
letterAllowance = int((advisoryBudget / 2) - g(n))
if letterAllowance < 5:
missing = len(string.ascii_lowercase) - len(n.covered)
if missing < letterAllowance:
# we might be able to afford to waste the next word
for x in lengthtries[letterAllowance].expand(n):
yield x
if missing == 2:
# find a suitable tp
for lp, t in raretries:
if not any(l in n.covered for l in lp):
for x in t.expand(n):
yield x
if missing <= 4:
letters = string.ascii_lowercase - n.covered.keys()
if all(l in lettertries for l in letters):
for x in itertools.chain.from_iterable(t.expand(n) for l, t in lettertries.items() if not n.coveredP(l)):
yield x
if letterAllowance in lengthtries:
for x in lengthtries[letterAllowance].expand(n):
yield x
for x in pickTP(exttltries, n).expand(n):
yield x
return bex

There are unreachable TriePairs in exttltries, and some of those are smaller than letter-equivalent reachable ones. To avoid this, we should maintain the exttltries property that no node's letter should appear in the set of siblings of its ancestors. Together with the greedy order for splitting nodes, this minimizes size and optimizes expected search cost. But we do pretty well even with the quick and dirty experiment: we find a first solution (cost 67) before the 7808 node expansion in under 4562.899 CPU seconds. A 65 solution is found before the 9984 node expansion in under 5141.208 CPU seconds.

I never got around to rewriting variations in search of more beautiful code. I still wonder what other tricks I've overlooked for improving the heuristic function or reducing the search space.

Rumor has it that Common Lisp has been really well-optimized over the years, so I wonder how much faster this could go if rewritten in lisp. Also, I could go twice as fast on my dual-core CPU if python had threads (just partition the input) or if I quit some other apps in order to fit two python instances of this program in my 4GB of RAM.

Many have posted solutions for this problem. Solutions I can recommend reading if this post has piqued your interest include Jim Vellenga's and Keoki Zee's.

No comments:

Post a Comment