Suffix Trees

Implements generic suffix trees which operate on lists of tokens, where tokens can be of any type that can be compared with eqv?. Suffix trees support fast searches by prefix and approximate searches. Approximate searches quickly find "words" which match a query within some edit distance, and can be directly used to implement spelling correction.

Note: the documentation uses the terms "word" and "substring" with the understanding that the tree can store any lists of atoms.

Procedures

suffix-tree:make

Creates a new empty suffix tree.

suffix-tree:add-subtree! tree token subtree

suffix-tree:insert-word! tree word-tokens word

Inserts a new word into a suffix tree. tokens is a list of the word's subelements such as characters. word identifies the word. For example, in a common scenario one might insert a string-word like follows:

 (suffix-tree:insert! tree (string->list "scheme") "scheme")
 

suffix-tree:get-words tree

Returns a list of all words in a suffix tree.

suffix-tree:search tree tokens [optional: exists-only]

Searches for all words containing the given substring (tokens). If exists-only is true, will only check whether the substring is present and return #t or #f.

suffix-tree:search-prefix tree tokens [optional: exists-only]

Like suffix-tree:search, but matches only prefixes of words.

suffix-tree:search-approx tree tokens max-dist

Finds all words which match tokens up to a Levenshtein edit distance of max-dist.

(suffix-tree:get-words st)
;Value 22786: ("to" "correction" "of" "searches." "which" 
               "Approximate" "generic" "quickly" "can" "prefix" 
               "distance," "tokens," "find" "some" "match" 
               "any" "tokens" "approximate" "searches" "be" 
               "fast" "computer" "where" "Suffix" "directly" 
               "within" "spelling" "query" "trees" "operate" 
               "lists" "implement" "support" "compared" 
               "Implements" "that" "suffix" "edit" "type" 
               "words" "by" "a" "with" "eqv?." "on" "and" "used")

(suffix-tree:search-approx st (string->list "suprt") 2)
;Value 22788: ("support")

(suffix-tree:search-approx st (string->list "hype") 2)
;Value 22789: ("type")

(suffix-tree:search-approx st (string->list "element") 2)
;Value: ()

(suffix-tree:search-approx st (string->list "element") 3)
;Value 22790: ("implement")


Scheme Power Tools Documentation
(c) Maciej Pacula 2010-2011
http://mpacula.com