TextCat Language Guesser
Demo
This is a demonstration of a language guesser based on the method proposed in Cavnar and Trenkle, "N-Gram-Based Text Categorization". It's implemented in Perl. You can get the programme under the GPL here. For free! No commercial version available!
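To make the idea concrete, here is a minimal Python sketch of n-gram-based categorization in the spirit of Cavnar and Trenkle. It is not the Perl programme offered on this page: the function names (`ngram_profile`, `out_of_place`, `guess_language`), the tiny training samples, and the cutoff of 300 n-grams are illustrative assumptions. Each text is reduced to a ranked list of its most frequent character n-grams, and the guess is the language whose ranked list differs least from that of the input.

```python
import re
from collections import Counter


def ngram_profile(text, max_n=5, top_k=300):
    """Return the top_k most frequent character n-grams (n = 1..max_n),
    most frequent first. Ties are broken alphabetically so the result
    is deterministic."""
    counts = Counter()
    # Simplification: tokens are runs of letters; digits and punctuation are dropped.
    for token in re.findall(r"[^\W\d_]+", text.lower()):
        padded = f"_{token}_"  # "_" marks the word boundaries
        for n in range(1, max_n + 1):
            for i in range(len(padded) - n + 1):
                counts[padded[i:i + n]] += 1
    ranked = sorted(counts, key=lambda g: (-counts[g], g))
    return ranked[:top_k]


def out_of_place(doc_profile, cat_profile):
    """Sum of rank differences between two profiles. An n-gram missing
    from the category profile costs the maximum penalty."""
    cat_rank = {gram: rank for rank, gram in enumerate(cat_profile)}
    max_penalty = len(cat_profile)
    return sum(
        abs(rank - cat_rank[gram]) if gram in cat_rank else max_penalty
        for rank, gram in enumerate(doc_profile)
    )


def guess_language(text, profiles):
    """Return the language whose profile is closest to the text's profile."""
    doc_profile = ngram_profile(text)
    return min(profiles, key=lambda lang: out_of_place(doc_profile, profiles[lang]))


# Hypothetical, very small training samples; real profiles need far more text.
SAMPLES = {
    "english": (
        "the quick brown fox jumps over the lazy dog and the cat sleeps "
        "in the house while it is raining outside this is a short sample "
        "of english text"
    ),
    "german": (
        "der schnelle braune fuchs springt über den faulen hund und die "
        "katze schläft im haus während es draußen regnet dies ist ein "
        "kurzer deutscher beispieltext"
    ),
}
PROFILES = {lang: ngram_profile(sample) for lang, sample in SAMPLES.items()}
```

Calling `guess_language("the dog sleeps in the house", PROFILES)` runs the comparison on a new sentence. With training samples this small the result is only indicative, which matches the advice on this page: the more text is available, the more reliable the guess.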
The competitors!
Type some text. The more text you provide, the more reliably the guesser works.
LIST OF LANGUAGES currently supported.
Note that some languages are supported only in certain encodings, as the suffixes show.
afrikaans
albanian
amharic-utf
arabic-iso8859_6
arabic-windows1256
armenian
basque
belarus-windows1251
bosnian
breton
bulgarian-iso8859_5
catalan
chinese-big5
chinese-gb2312
croatian-ascii
czech-iso8859_2
danish
dutch
english
esperanto
estonian
finnish
french
frisian
georgian
german
greek-iso8859-7
hawaian
hebrew-iso8859_8
hindi
hungarian
icelandic
indonesian
irish
italian
japanese-euc_jp
japanese-shift_jis
korean
latin
latvian
lithuanian
malay
marathi
middle_frisian
mingo
nepali
norwegian
persian
polish
portuguese
quechua
romanian
russian-iso8859_5
russian-koi8_r
russian-windows1251
sanskrit
scots
scots_gaelic
serbian-ascii
slovak-ascii
slovak-windows1250
slovenian-ascii
slovenian-iso8859_2
spanish
swahili
swedish
tagalog
tamil
thai
turkish
ukrainian-koi8_u
vietnamese
welsh
yiddish-utf
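
The encoding suffixes in the list above (for example russian-koi8_r versus russian-windows1251) matter if a guesser compares raw bytes, because the same word then produces entirely different n-grams in each encoding. The short Python sketch below illustrates this. Byte-level comparison is an assumption here, and `byte_bigrams` is a hypothetical helper, not part of the programme.

```python
WORD = "привет"  # Russian for "hello"
ENCODINGS = ["koi8_r", "cp1251", "iso8859_5"]  # Python codec names


def byte_bigrams(data):
    """Return the set of overlapping two-byte sequences in data."""
    return {data[i:i + 2] for i in range(len(data) - 1)}


# Assumption: the guesser sees bytes, not decoded characters.
encoded = {enc: WORD.encode(enc) for enc in ENCODINGS}

for enc, data in encoded.items():
    print(f"{enc:10s} {data.hex(' ')}")

# Bigrams shared between the KOI8-R and Windows-1251 versions of the word.
shared = byte_bigrams(encoded["koi8_r"]) & byte_bigrams(encoded["cp1251"])
print("shared bigrams:", len(shared))
```

Since the two byte sequences have no bigrams in common, a profile built from KOI8-R text would not match the same text in Windows-1251, which would explain why such languages appear once per encoding.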