PROBLEM STATEMENT
For computers it can be hard to determine in which language a given text is written. A simple way
to try to determine the language is the following: for the given text and for some sample texts,
for which we know the languages, we determine the letter frequencies and compare these.
The frequency of a letter is the total number of occurrences of that letter divided by the total
number of letters in the text. To determine this, we ignore case and non-letter characters.
Once the letter frequencies of the text and of a language are known, we can calculate the
difference between the two. This difference we define by the sum of the squared differences of the
frequencies:
The lesser this value, the closer text resembles that language. Compare text with each element of
languages and return the (0-based) index of the language that has the smallest difference with
text. In case of a tie, return the smallest index.
DEFINITION
Class:LanguageRecognition
Method:whichLanguage
Parameters:vector , string
Returns:int
Method signature:int whichLanguage(vector languages, string text)
CONSTRAINTS
-languages contains between 1 and 50 elements, inclusive.
-Each element of languages has length between 1 and 50, inclusive.
-text has length between 1 and 50, inclusive.
-Each element of languages and text consists only of characters with ASCII value between 32 and
127, inclusive.
-Each element of languages and text contains at least one letter ('A'-'Z' and 'a'-'z').
EXAMPLES
0)
{"This is an English sentence.",
"Dieser ist ein Deutscher Satz.",
"C'est une phrase Francaise.",
"Dit is een Nederlandse zin."
}
"In welke taal is deze zin geschreven?"
Returns: 3
The differences are 0.0385, 0.0377, 0.0430 and 0.0276, so the sentence is written in language 3,
Dutch. Note that Dutch is somewhat similar to German, somewhat less similar to English and not
similar to French.
1)
{"aaaaa","bbbb","ccc","dd","e"}
"xxx"
Returns: 0
In case of a tie, return the language with the smallest index.
2)
{"AABB","AaBb","A? B!","ab!@#$%"}
"ab"
Returns: 0
Ignore case and the non-letter characters.
This problem statement is the exclusive and proprietary property of TopCoder, Inc. Any unauthorized use or reproduction of this information without the prior written consent of TopCoder, Inc. is strictly prohibited. (c)2010, TopCoder, Inc. All rights reserved.