A window of letters is identified within a text sample input. If the window contains matches to reference letter sequences (RLS) contained in multiple sets of n-gram language profiles (profiles), the longest match is kept and scored for each language. The score for each language is based on frequency parameters of the matched RLS in that language's profiles. The window is incrementally shifted through the sample, and the matching and scoring are repeated on the letters within the window at each position. At the end of the sample input, the language having the highest cumulative score is identified as the sample's language. Scoring may be improved by restricting the RLS within longer profiles to full words; by using two passes, where the second pass disregards languages that did not score near the highest-scoring language during the first pass; by favoring matched RLS within profiles of complete words; by favoring longer matched RLS within profiles; and by increasing the score of a match whose RLS does not appear frequently in many languages. The profiles may be enhanced by removing an RLS if its frequency does not meet a predefined threshold and a variable threshold.
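The sliding-window matching and cumulative scoring described above can be illustrated with a minimal Python sketch. It is not the claimed implementation. The toy `PROFILES` data, the function names, the `window_size` default, the choice to anchor matches at the start of the window, and the use of the raw profile frequency as the score increment are all assumptions made for illustration. The optional refinements (two passes, word-based profiles, length weighting, rarity boosts, profile pruning) are omitted.

```python
from collections import defaultdict

# Hypothetical toy profiles: language -> {reference letter sequence: relative frequency}.
# Real profiles would be built from large corpora; these values are invented.
PROFILES = {
    "en": {"th": 0.03, "the": 0.02, "he": 0.025, "an": 0.02},
    "de": {"ch": 0.03, "der": 0.02, "er": 0.035, "ein": 0.02},
}


def longest_match(window, profile):
    """Return the longest prefix of `window` found in `profile`, or None.

    Assumption: a match must begin at the first letter of the window.
    """
    for length in range(len(window), 0, -1):
        candidate = window[:length]
        if candidate in profile:
            return candidate
    return None


def identify_language(text, profiles, window_size=3):
    """Slide a window over `text` and accumulate a score per language.

    Returns (best_language, scores); best_language is None if nothing matched.
    """
    scores = defaultdict(float)
    letters = text.lower()
    for start in range(len(letters)):
        # Shift the window one letter at a time through the sample.
        window = letters[start:start + window_size]
        for lang, profile in profiles.items():
            match = longest_match(window, profile)
            if match is not None:
                # Assumed scoring rule: add the matched RLS's frequency.
                scores[lang] += profile[match]
    if not scores:
        return None, {}
    best = max(scores, key=scores.get)
    return best, dict(scores)


if __name__ == "__main__":
    language, totals = identify_language("the other", PROFILES)
    print(language, totals)
```

Keeping only the longest match per window position and per language means a sequence such as "the" is not also counted as "th" at the same position, which is one plausible reading of "the longest match is kept".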