GuessLanguage Class Reference
from PyKDE5.sonnet import *
Namespace: Sonnet
Detailed Description
GuessLanguage determines the language of a given text.
GuessLanguage can distinguish between roughly 75 languages for a given string. It is based on a Perl script called Languid, originally written by Maciej Ceglowski <maciej@ceglowski.com>. His script used a two-part heuristic to determine the language. First, the text is checked for the scripts it contains; then, for each set of languages using those scripts, an n-gram frequency model of each language is compared to a model of the text. The most similar language model is assumed to be the language. If no language is found, an empty string is returned.
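The n-gram comparison step can be illustrated with a minimal pure-Python sketch. This is not Sonnet's implementation: the helper names (`trigram_ranks`, `distance`, `guess`), the tiny sample sentences, and the rank-based "out-of-place" distance are all assumptions chosen for illustration, and the script-detection step is skipped entirely.

```python
from collections import Counter

def trigram_ranks(text, top=300):
    """Map each of the most frequent character trigrams to its rank (0 = most frequent)."""
    padded = " " + " ".join(text.lower().split()) + " "
    counts = Counter(padded[i:i + 3] for i in range(len(padded) - 2))
    # Sort by descending count, then alphabetically so ties are deterministic.
    ordered = sorted(counts, key=lambda gram: (-counts[gram], gram))
    return {gram: rank for rank, gram in enumerate(ordered[:top])}

def distance(text_model, lang_model, max_penalty=300):
    """Sum of rank differences; trigrams missing from the language model cost max_penalty."""
    total = 0
    for gram, rank in text_model.items():
        if gram in lang_model:
            total += abs(rank - lang_model[gram])
        else:
            total += max_penalty
    return total

def guess(text, models):
    """Return the code of the closest model, or an empty string if nothing can be guessed."""
    if not text.strip() or not models:
        return ""
    text_model = trigram_ranks(text)
    return min(models, key=lambda code: distance(text_model, models[code]))

# Hypothetical toy training samples; real models are built from much larger corpora.
SAMPLES = {
    "en": "the quick brown fox jumps over the lazy dog and the cat sat on the mat",
    "de": "der schnelle braune fuchs springt über den faulen hund und die katze sitzt auf der matte",
}
MODELS = {code: trigram_ranks(sample) for code, sample in SAMPLES.items()}

print(guess("the dog and the cat", MODELS))
print(guess("der hund und die katze", MODELS))
```

With models this small the result is only reliable for text that closely resembles the samples; the point is the shape of the comparison, not its accuracy.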
- Since:
- 4.3
Methods | |
__init__ (self) | |
QString | identify (self, QString text, QStringList suggestions=QStringList()) |
setLimits (self, int maxItems, float minConfidence) |
Method Documentation
__init__ (self)
Constructor. Creates a new GuessLanguage instance. If text is specified, it sets the text to be checked.
- Parameters:
-
text the text that is to be checked
QString identify (self, QString text, QStringList suggestions=QStringList())
Returns the two-letter ISO 639-1 code for the language of the currently set text. A three-letter code is returned only where a two-letter code does not exist. If text isn't empty, it is set as the text to be checked.
- Parameters:
-
text the text to be identified
- Returns:
- A list of the presumed languages of the text, sorted by decreasing confidence. An empty list means the language could not be determined with the confidence required by setLimits().
setLimits (self, int maxItems, float minConfidence)
Sets limits on the number of languages returned by identify(). The confidence for each language is computed as the difference between this language and the next one on the list, normalized to the 0-1 range. A reasonable value for a fairly sure result is 0.1. The default is to return the best guess regardless of confidence, exactly as after a call to setLimits(1, 0).
- Parameters:
-
maxItems The list returned by identify() will never have more than maxItems items.
minConfidence The list will have only enough items for their summed confidence to equal or exceed minConfidence.
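How the two limits interact can be sketched in plain Python. The function below is hypothetical and is not taken from Sonnet: it assumes the candidates arrive as (code, distance) pairs sorted best first, and it assumes the gap between neighbours is normalized by the full distance range, which is only one possible reading of "normalized to 0-1 range".

```python
def apply_limits(ranked, max_items=1, min_confidence=0.0):
    """Pick leading languages until their summed confidence reaches min_confidence.

    ranked: list of (language_code, distance) pairs, smallest distance first.
    Returns an empty list if the confidence cannot be reached within max_items.
    """
    if not ranked or max_items < 1:
        return []
    # Assumed normalization: divide each gap by the overall distance range.
    span = ranked[-1][1] - ranked[0][1]
    picked = []
    total = 0.0
    for i, (code, dist) in enumerate(ranked):
        if len(picked) >= max_items:
            break
        picked.append(code)
        # Confidence of this entry: gap to the next entry, scaled to 0-1.
        if i + 1 < len(ranked) and span > 0:
            total += (ranked[i + 1][1] - dist) / span
        if total >= min_confidence:
            return picked
    return []

# Made-up distances purely for illustration.
ranked = [("en", 100), ("de", 150), ("fr", 600)]
print(apply_limits(ranked))          # default limits (1, 0): best guess only
print(apply_limits(ranked, 3, 0.5))  # needs a second entry to reach 0.5
print(apply_limits(ranked, 1, 0.5))  # cannot reach 0.5 with a single item
```

Under these assumptions a small gap between the first two candidates yields low confidence, so a higher minConfidence either pulls in more candidates or, once maxItems is exhausted, produces an empty result.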