lu.cs.co.graph
Class Words

java.lang.Object
  |
  +--lu.cs.co.graph.Graph
        |
        +--lu.cs.co.graph.Words

public class Words
extends Graph

Constructors for undirected graphs whose vertices are five-letter words. A pair of vertices is joined by an edge if the distance between the corresponding words is below some threshold. Two distances are computed:

  1. The Hamming distance Hbetween two words is the number of places where they differ, so H(tombs,splat)= 5.
  2. The character distance C between two characters is their distance in the alphabet; the character distance between two words is the sum of the character distances of their letters. Thus C(tombs,splat)=5 and C(yucca,abuzz)=24+19+18+23+25=109.

Further, two words can be joined independently of their distance if they are anagrams. Thus artsy, trays, stray, and satyr form a clique.

The length of an edge is the character distance of the words it joins.

Graphs of any size up to and including 5757 may be constructed. The words fall into three categories depending on how common they are, any of these groups may be in- or excluded from the graph.

A word of caution: 5757 is a lot of nodes. Merely constructing some of the graphs takes the machine to its limits. The constructor writes a message to standard error indicating when its task is over.

Some graphs

Words(10) is a complete graph on 10 words, the length of an edge is the character distance between its incident words. Here is a TSP tour (a shortest Hamiltonian cycle) of length 266:

caved -- dater -- paper -- satyr -- whops -- nippy -- nooks -- rowdy -- shoal -- cynic -- caved

Words(0,1,0,0) contains all 5757 words. Two words are joined only if their Hamming distance is 1. The largest clique in this graph has size 13:

wills rills kills dills hills pills bills gills tills cills fills sills mills,

Words(1000,2,0,0), in which words are joined if they differ in at most 2 letters, has slightly more interesting cliques, here is one of size 16:

fazed jaded maced paled waled paged hazed lamed waxed raved maned taxed rayed waked caned caved

Words(0,0,0,3) contains all words and joins them if their character distance is at most 3. The clique number is 11:

bards barer cards cares dares carer cases bases bares barfs caret.


Fields inherited from class lu.cs.co.graph.Graph
m, n, name
 
Constructor Summary
Words(int size)
          A complete graph on a specified number of randomly chosen vertices.
Words(int size, int HammingDistance, int characterDistance, boolean allowAnagrams, boolean common, boolean advanced, boolean unusual, long seed)
          A graph on a specified number of randomly chosen vertices.
Words(int size, int HammingDistance, int characterDistance, long seed)
          A graph on a specified number of randomly chosen vertices; edges are included if their Hamming and character distances do not exceed the given thresholds.
 
Methods inherited from class lu.cs.co.graph.Graph
degree, edges, findByName, getAdjecencyMatrix, getDistanceMatrix, getEdge, getIndex, getVertex, getVertex, hasEdge, incident, incoming, indegree, index, isDirected, main, outdegree, outgoing, parseGraph, toString, vertices
 
Methods inherited from class java.lang.Object
equals, getClass, hashCode, notify, notifyAll, wait, wait, wait
 

Constructor Detail

Words

public Words(int size)
A complete graph on a specified number of randomly chosen vertices. All edges are included, their length is the character distance between the corresponing words.
Parameters:
size - the size of the graph. If this value exceeds 5757 or is zero, the value 5757 is substituted.

Words

public Words(int size,
             int HammingDistance,
             int characterDistance,
             long seed)
A graph on a specified number of randomly chosen vertices; edges are included if their Hamming and character distances do not exceed the given thresholds.
Parameters:
size - the size of the graph. If this value exceeds 5757 or is zero, the value 5757 is substituted.
HammingDistance - the maximal Hamming distance allowed between two neighbouring vertices. If this value is 0, the value 5 is substituted, trivialising this condition.
characterDistance - the maximal character distance allowed between two neighbouring vertices. If this value is 0, the value 26*HammingDistance is substituted, trivialising this condition.
seed - the seed to be used in all random choices

Words

public Words(int size,
             int HammingDistance,
             int characterDistance,
             boolean allowAnagrams,
             boolean common,
             boolean advanced,
             boolean unusual,
             long seed)
A graph on a specified number of randomly chosen vertices. Edges are included if their Hamming and character distances do not exceed the given thresholds, or the corresponding words are anagrams. Words can be in- or excluded depending on how common they are.
Parameters:
size - the size of the graph. If this value is too large or zero, the largest value consistent with the other parameters is substituted
HammingDistance - the maximal Hamming distance allowed between two neighbouring vertices. If this value is 0, the value 5 is substituted, trivialising this condition.
characterDistance - the maximal character distance allowed between two neighbouring vertices. If this value is 0, the value 26*HammingDistance is substituted, trivialising this condition.
allowAnagrams - if true, two words are joined if they are anagrams, independently of their Hamming or character distances
common - include the 3300 common words from `aback' to `zooms.'
advanced - include the 1194 advanced words from `abaci' to `zonal.'
unusual - include the 1263 unusual words from `aargh' to `zowie.'
seed - the seed to be used in all random choices