SoundEx Algorithms
SoundEx is a Phonetic Algorithm for indexing names by sound, as pronounced. The
goal is for homophones to be encoded to the same representation so that they can
be matched despite minor variations in spelling. The algorithm does this mainly
by encoding consonants; a vowel is encoded only when it is the first letter of a
word. SoundEx is the most widely known of all Phonetic Algorithms.
SoundEx was developed by Robert C. Russell and Margaret K. Odell and patented
in 1918 and 1922. A variation called American SoundEx was used in the 1930s for
a retrospective analysis of the US censuses from 1890 through 1920. The SoundEx
code came to prominence in the 1960s when it was the subject of several articles
in the Communications and Journal of the Association for Computing Machinery,
and especially when described in Donald Knuth's The Art of Computer
Programming.
The National Archives and Records Administration (NARA) maintains the current
rule set for the official implementation of SoundEx used by the U.S. Government.
These encoding rules are available from NARA, upon request, in the form of
General Information Leaflet 55, "Using the Census SoundEx".
King James Pure Bible Search uses an enhanced variation of the American
SoundEx for English text and has added support for variations in French, Spanish,
and German in preparation for future use with Foreign Language Bible Texts.
English SoundEx Algorithm
https://en.wikipedia.org/wiki/Soundex
https://creativyst.com/Doc/Articles/SoundEx1/SoundEx1.htm
The SoundEx code for a name can be found as follows:
1. Retain the first letter of the name and drop all other
   occurrences of a, e, i, o, u, y, h, w.
2. Replace consonants with digits as follows (after the first
   letter):
      b, f, p, v => 1
      c, g, j, k, q, s, x, z => 2
      d, t => 3
      l => 4
      m, n => 5
      r => 6
3. If two or more letters with the same number are adjacent in
   the original name (before step 1), only retain the first
   letter; also two letters with the same number separated by
   'h' or 'w' are coded as a single number, whereas such
   letters separated by a vowel are coded twice. This rule
   also applies to the first letter.
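
The rules above can be sketched in Python as follows. This is a minimal illustration, not the implementation used by King James Pure Bible Search or by NARA. The names CODES and soundex are hypothetical. Two points are assumptions not stated in the rules above: 'y' is treated like a vowel when it separates two letters with the same number, and the result is padded with zeros or truncated to the conventional form of one letter followed by three digits.

```python
# Digit classes from step 2. Letters missing from this map
# (a, e, i, o, u, y, h, w) have no code of their own.
CODES = {}
for group, digit in (("bfpv", "1"), ("cgjkqsxz", "2"), ("dt", "3"),
                     ("l", "4"), ("mn", "5"), ("r", "6")):
    for letter in group:
        CODES[letter] = digit


def soundex(name):
    """Return a SoundEx code for name, or "" if it contains no letters a-z."""
    letters = [c for c in name.lower() if "a" <= c <= "z"]
    if not letters:
        return ""

    first = letters[0]
    digits = []
    # Step 3 also applies to the first letter, so remember its code
    # even though the letter itself is kept rather than encoded.
    prev = CODES.get(first, "")

    for ch in letters[1:]:
        code = CODES.get(ch, "")
        if code:
            # Emit a digit only when it differs from the previous code.
            if code != prev:
                digits.append(code)
            prev = code
        elif ch not in "hw":
            # A vowel (or 'y', by assumption) separates duplicates,
            # so the next letter is coded even if its number repeats.
            prev = ""
        # 'h' and 'w' leave prev untouched: letters with the same
        # number on either side are coded as a single number.

    # Assumption: conventional length of one letter plus three digits.
    return (first.upper() + "".join(digits) + "000")[:4]


if __name__ == "__main__":
    for example in ("Robert", "Rupert", "Ashcraft", "Tymczak", "Pfister"):
        print(example, soundex(example))
```

With this sketch, "Robert" and "Rupert" both encode to R163, which is the kind of match the algorithm is meant to produce. "Ashcraft" encodes to A261 because the 's' and 'c' are separated only by 'h', and "Pfister" encodes to P236 because the 'f' shares a number with the first letter.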