Class ColognePhonetic
- java.lang.Object
-
- org.apache.commons.codec.language.ColognePhonetic
-
- All Implemented Interfaces:
Encoder, StringEncoder
public class ColognePhonetic extends java.lang.Object implements StringEncoder
Encodes a string into a Cologne Phonetic value.
Implements the Kölner Phonetik (Cologne Phonetic) algorithm issued by Hans Joachim Postel in 1969.
The Kölner Phonetik is a phonetic algorithm optimized for the German language. It is related to the well-known Soundex algorithm.
Algorithm
-
Step 1:
After preprocessing (conversion to upper case, transcription of Germanic umlauts, removal of non-alphabetical characters), the letters of the supplied text are replaced by their phonetic code according to the following table. (Source: Wikipedia (de): Kölner Phonetik -- Buchstabencodes)
Letter | Context | Code
A, E, I, J, O, U, Y | | 0
H | | -
B | | 1
P | not before H | 1
D, T | not before C, S, Z | 2
F, V, W | | 3
P | before H | 3
G, K, Q | | 4
C | at onset before A, H, K, L, O, Q, R, U, X | 4
C | before A, H, K, O, Q, U, X except after S, Z | 4
X | not after C, K, Q | 48
L | | 5
M, N | | 6
R | | 7
S, Z | | 8
C | after S, Z | 8
C | at onset except before A, H, K, L, O, Q, R, U, X | 8
C | not before A, H, K, O, Q, U, X | 8
D, T | before C, S, Z | 8
X | after C, K, Q | 8
Example:
"Müller-Lüdenscheidt" => "MULLERLUDENSCHEIDT" => "60550750206880022"
-
Step 2:
Collapse of all multiple consecutive code digits. Example:
"60550750206880022" => "6050750206802"
-
Step 3:
Removal of all "0" codes except at the beginning. As a result, two or more identical consecutive digits can appear in the final code when they were separated only by "0" digits before the removal. Example:
"6050750206802" => "65752682"
This class is thread-safe.
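Steps 2 and 3 of the algorithm above operate purely on the digit string, so they can be illustrated in isolation. The following is a minimal sketch, not the library's code: the class name `CologneStepsSketch` and the methods `collapseRuns` and `dropInnerZeros` are hypothetical names chosen for illustration.

```java
final class CologneStepsSketch {

    /** Step 2: keep one digit from every run of identical consecutive digits. */
    static String collapseRuns(String digits) {
        StringBuilder out = new StringBuilder(digits.length());
        for (int i = 0; i < digits.length(); i++) {
            char d = digits.charAt(i);
            // Append only when the digit differs from its predecessor.
            if (i == 0 || d != digits.charAt(i - 1)) {
                out.append(d);
            }
        }
        return out.toString();
    }

    /** Step 3: drop every '0' unless it is the very first digit. */
    static String dropInnerZeros(String digits) {
        StringBuilder out = new StringBuilder(digits.length());
        for (int i = 0; i < digits.length(); i++) {
            char d = digits.charAt(i);
            if (d != '0' || i == 0) {
                out.append(d);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(collapseRuns("8800221"));             // 8021
        System.out.println(dropInnerZeros("6050750206802"));     // 65752682
        System.out.println(dropInnerZeros("0502"));              // 052 (leading 0 kept)
        System.out.println(dropInnerZeros("606"));               // 66 (repeat appears after removal)
    }
}
```

The last call shows why identical neighbouring digits can survive in the final code: Step 3 runs after Step 2 and does not collapse again.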
- Since:
- 1.5
- See Also:
- Wikipedia (de): Kölner Phonetik (in German)
-
-
Nested Class Summary
Nested Classes
Modifier and Type | Class | Description
private class | ColognePhonetic.CologneBuffer | This class is not thread-safe; the field ColognePhonetic.CologneBuffer.length is mutable.
private class | ColognePhonetic.CologneInputBuffer |
private class | ColognePhonetic.CologneOutputBuffer |
-
Field Summary
Fields
Modifier and Type | Field | Description
private static char[] | AEIJOUY |
private static char[] | AHKLOQRUX |
private static char[] | AHOUKQX |
private static char[] | CKQ |
private static char[] | GKQ |
private static char[][] | PREPROCESS_MAP | Maps some Germanic characters to plain for internal processing.
private static char[] | SCZ |
private static char[] | SZ |
private static char[] | TDX |
private static char[] | WFPV |
-
Constructor Summary
Constructors
Constructor | Description
ColognePhonetic() |
-
Method Summary
Modifier and Type | Method | Description
private static boolean | arrayContains(char[] arr, char key) |
java.lang.String | colognePhonetic(java.lang.String text) | Implements the Kölner Phonetik algorithm.
java.lang.Object | encode(java.lang.Object object) | Encodes an "Object" and returns the encoded content as an Object.
java.lang.String | encode(java.lang.String text) | Encodes a String and returns a String.
boolean | isEncodeEqual(java.lang.String text1, java.lang.String text2) |
private java.lang.String | preprocess(java.lang.String text) | Converts the string to upper case and replaces germanic characters as defined in PREPROCESS_MAP.
-
-
-
Field Detail
-
AEIJOUY
private static final char[] AEIJOUY
-
SCZ
private static final char[] SCZ
-
WFPV
private static final char[] WFPV
-
GKQ
private static final char[] GKQ
-
CKQ
private static final char[] CKQ
-
AHKLOQRUX
private static final char[] AHKLOQRUX
-
SZ
private static final char[] SZ
-
AHOUKQX
private static final char[] AHOUKQX
-
TDX
private static final char[] TDX
-
PREPROCESS_MAP
private static final char[][] PREPROCESS_MAP
Maps some Germanic characters to plain letters for internal processing. The following characters are mapped:
- capital a, umlaut mark (Ä)
- capital u, umlaut mark (Ü)
- capital o, umlaut mark (Ö)
- small sharp s, German (ß)
-
-
Method Detail
-
arrayContains
private static boolean arrayContains(char[] arr, char key)
-
colognePhonetic
public java.lang.String colognePhonetic(java.lang.String text)
Implements the Kölner Phonetik algorithm.
In contrast to the initial description of the algorithm, this implementation does the encoding in one pass.
- Parameters:
- text - The source text to encode
- Returns:
- the corresponding encoding according to the Kölner Phonetik algorithm
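For comparison with the one-pass implementation, the three-step description of the algorithm can be followed literally. The sketch below is an illustration only, not the library's code, and may differ from it in edge cases. It assumes the input is already reduced to upper-case letters A to Z; `CologneTableSketch`, `codeAt`, `rawCodes` and `encode` are hypothetical names.

```java
final class CologneTableSketch {

    // True if c is one of the letters in set; '\0' marks a missing neighbour.
    private static boolean in(String set, char c) {
        return set.indexOf(c) >= 0;
    }

    /** Step 1 lookup for the letter at index i, following the letter/context table. */
    static String codeAt(String s, int i) {
        char c = s.charAt(i);
        char prev = i > 0 ? s.charAt(i - 1) : '\0';
        char next = i + 1 < s.length() ? s.charAt(i + 1) : '\0';
        switch (c) {
            case 'A': case 'E': case 'I': case 'J': case 'O': case 'U': case 'Y':
                return "0";
            case 'H':
                return "";                              // H has no code
            case 'B':
                return "1";
            case 'P':
                return next == 'H' ? "3" : "1";
            case 'D': case 'T':
                return in("CSZ", next) ? "8" : "2";
            case 'F': case 'V': case 'W':
                return "3";
            case 'G': case 'K': case 'Q':
                return "4";
            case 'C':
                if (i == 0) {                           // at onset
                    return in("AHKLOQRUX", next) ? "4" : "8";
                }
                if (in("SZ", prev)) {                   // after S, Z
                    return "8";
                }
                return in("AHKOQUX", next) ? "4" : "8";
            case 'X':
                return in("CKQ", prev) ? "8" : "48";
            case 'L':
                return "5";
            case 'M': case 'N':
                return "6";
            case 'R':
                return "7";
            case 'S': case 'Z':
                return "8";
            default:
                return "";                              // anything else is ignored
        }
    }

    /** Step 1 applied to the whole string. */
    static String rawCodes(String s) {
        StringBuilder raw = new StringBuilder();
        for (int i = 0; i < s.length(); i++) {
            raw.append(codeAt(s, i));
        }
        return raw.toString();
    }

    /** Steps 1 to 3 in sequence. */
    static String encode(String s) {
        // Step 2: collapse runs of the same digit via a back-reference.
        String collapsed = rawCodes(s).replaceAll("(.)\\1+", "$1");
        if (collapsed.isEmpty()) {
            return "";
        }
        // Step 3: remove every "0" except one in the first position.
        return collapsed.charAt(0) + collapsed.substring(1).replace("0", "");
    }

    public static void main(String[] args) {
        System.out.println(encode("MULLERLUDENSCHEIDT")); // 65752682
        System.out.println(encode("WIKIPEDIA"));          // 3412
    }
}
```

Running the three steps separately keeps each rule easy to check against the table, at the cost of building intermediate strings that a one-pass encoder avoids.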
-
encode
public java.lang.Object encode(java.lang.Object object) throws EncoderException
Description copied from interface: Encoder
Encodes an "Object" and returns the encoded content as an Object. The Objects here may just be byte[] or Strings depending on the implementation used.
- Specified by:
- encode in interface Encoder
- Parameters:
- object - An object to encode
- Returns:
- An "encoded" Object
- Throws:
- EncoderException - An encoder exception is thrown if the encoder experiences a failure condition during the encoding process.
-
encode
public java.lang.String encode(java.lang.String text)
Description copied from interface: StringEncoder
Encodes a String and returns a String.
- Specified by:
- encode in interface StringEncoder
- Parameters:
- text - the String to encode
- Returns:
- the encoded String
-
isEncodeEqual
public boolean isEncodeEqual(java.lang.String text1, java.lang.String text2)
-
preprocess
private java.lang.String preprocess(java.lang.String text)
Converts the string to upper case and replaces Germanic characters as defined in PREPROCESS_MAP.
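The preprocessing described here can be sketched in a few lines. This is an illustration under assumptions, not the private method itself: the class `PreprocessSketch`, its `MAP` table and the helper `lettersOnly` are hypothetical, and the replacement letters (Ä to A, Ü to U, Ö to O, ß to S) are assumed. The separate removal of non-alphabetical characters comes from the Step 1 description of the algorithm.

```java
import java.util.Locale;

final class PreprocessSketch {

    // Assumed mapping pairs {from, to}; written as Unicode escapes for Ä, Ü, Ö, ß.
    private static final char[][] MAP = {
        {'\u00C4', 'A'},
        {'\u00DC', 'U'},
        {'\u00D6', 'O'},
        {'\u00DF', 'S'},
    };

    /** Upper-cases the text, then replaces mapped characters one by one. */
    static String preprocess(String text) {
        // Note: String.toUpperCase already expands a small sharp s to "SS",
        // so the last MAP entry is not reached through this path.
        char[] chars = text.toUpperCase(Locale.GERMAN).toCharArray();
        for (int i = 0; i < chars.length; i++) {
            for (char[] pair : MAP) {
                if (chars[i] == pair[0]) {
                    chars[i] = pair[1];
                    break;
                }
            }
        }
        return new String(chars);
    }

    /** Drops everything outside A to Z, as the Step 1 description requires. */
    static String lettersOnly(String text) {
        StringBuilder out = new StringBuilder(text.length());
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (c >= 'A' && c <= 'Z') {
                out.append(c);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String mapped = preprocess("M\u00FCller-L\u00FCdenscheidt");
        System.out.println(mapped);              // MULLER-LUDENSCHEIDT
        System.out.println(lettersOnly(mapped)); // MULLERLUDENSCHEIDT
    }
}
```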
-
-