Kheng.info also has a reading tool that automatically segments Khmer text and annotates each word with audio and dictionary definitions.
Searching in English matches against definitions from the Headley Khmer-to-English dictionaries and against lemmas from the Babiloo open-source English-to-Khmer dictionary. Searching in Khmer returns all entries for a lemma, as well as definitions for any stems contained in that lemma and for any compounds formed from it.
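As a rough illustration of the Khmer lookup described above, the following Python sketch returns a lemma's own entries together with its stems and compounds. It assumes that a stem is any other dictionary lemma contained in the query, and that a compound is any other lemma containing the query. The function name `lookup_khmer` and the dictionary layout are hypothetical; this is not necessarily how Kheng.info stores or matches its data.

```python
def lookup_khmer(query, dictionary):
    """Look up a Khmer lemma in a {lemma: [definitions]} mapping.

    Assumption for this sketch: stems and compounds are found by plain
    substring tests, which may differ from the site's actual method.
    """
    entries = dictionary.get(query, [])
    # Stems: other lemmas that occur inside the query.
    stems = {
        lemma: defs
        for lemma, defs in dictionary.items()
        if lemma and lemma != query and lemma in query
    }
    # Compounds: other lemmas that contain the query.
    compounds = {
        lemma: defs
        for lemma, defs in dictionary.items()
        if lemma != query and query in lemma
    }
    return {"entries": entries, "stems": stems, "compounds": compounds}
```

With a small dictionary holding ធ្វើ ("to do"), ការ ("work") and ធ្វើការ ("to work"), looking up ធ្វើការ would list the first two as stems, while looking up ធ្វើ would list ធ្វើការ as a compound.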
Kheng.info also supports searching the International Phonetic Alphabet (IPA) field of Khmer entries. Go to the search page to turn on IPA search.
Regular expressions can be used in Khmer and IPA searches, but they must first be activated on the search page.
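To give a feel for what a regular-expression search over the Khmer or IPA field might look like, here is a minimal Python sketch. The entry list, the IPA transcriptions (approximate placeholders), and the function name `regex_search` are all assumptions made for illustration. The regular-expression syntax accepted by the site may differ from Python's `re` module.

```python
import re

# Hypothetical entries: (Khmer lemma, approximate IPA transcription).
ENTRIES = [
    ("ទៅ", "tɨw"),
    ("មក", "mɔɔk"),
    ("ធ្វើ", "tvəə"),
    ("ធ្វើការ", "tvəə kaa"),
]

# Position of each searchable field inside an entry tuple.
FIELDS = {"khmer": 0, "ipa": 1}


def regex_search(pattern, field="khmer"):
    """Return the entries whose chosen field matches the pattern."""
    if field not in FIELDS:
        raise ValueError(f"unknown field: {field!r}")
    compiled = re.compile(pattern)
    index = FIELDS[field]
    return [entry for entry in ENTRIES if compiled.search(entry[index])]
```

For example, `regex_search("^ធ្វើ")` selects lemmas that begin with ធ្វើ, and `regex_search("k$", field="ipa")` selects entries whose transcription ends in "k".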
Unlike English, Khmer is typically written without word dividers, so beginners often find it helpful to automate the process of segmenting text into words.
Texts are limited to a maximum of 20,000 characters. For news articles and texts that are not highly specialized, you will be able to hear audio and see definitions for nearly every word.
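One common way to segment text written without word dividers is greedy longest-match lookup against a word list. The Python sketch below illustrates that general technique only. It is an assumption that anything similar is used by Kheng.info, and the names `segment` and `MAX_CHARS`, as well as counting the limit in Unicode code points, are choices made for this example.

```python
MAX_CHARS = 20_000  # the text limit stated above, counted here in code points


def segment(text, lexicon):
    """Split text into words by greedy longest-match against a lexicon.

    Sketch only: a code point that starts no known word is emitted on its
    own, so unknown Khmer words may be split inside a character cluster.
    """
    if len(text) > MAX_CHARS:
        raise ValueError(f"text exceeds {MAX_CHARS} characters")
    max_len = max((len(word) for word in lexicon), default=1)
    tokens = []
    i = 0
    while i < len(text):
        # Try the longest possible candidate first, then shorter ones.
        for length in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + length]
            if candidate in lexicon:
                break
        else:
            candidate = text[i]  # no match: pass one code point through
        tokens.append(candidate)
        i += len(candidate)
    return tokens
```

Given a lexicon containing ខ្ញុំ ("I"), ទៅ ("go") and ផ្សារ ("market"), the unspaced sentence ខ្ញុំទៅផ្សារ would be split into those three words. Greedy matching can choose the wrong split when a long entry overlaps two shorter words, which is one reason practical segmenters often use statistical models instead.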
From this corpus, a word frequency count of each lemma in the Headley Khmer-to-English dictionaries was performed. Each lemma is associated with a frequency in the Kheng.info database, allowing the most relevant Khmer-to-English search results to be returned first. This is especially useful when searching for common words like "come" or "make", which will match against thousands of definitions in the database.
The corpus works well for its purpose, but was not compiled carefully in the way that an academic corpus would be. I attempted to balance the text between topics such as politics, travel, food, business, medicine, etc., but in general the corpus and resulting frequency list are skewed toward subjects typically discussed in newspapers and online periodicals. It is not a good approximation of spoken Khmer.
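The frequency-based ordering of search results described above can be sketched in a few lines of Python. The `Entry` class, the `search_english` function, and the plain substring match are assumptions for illustration; the actual database schema and matching rules are not described here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Entry:
    lemma: str        # Khmer headword
    definition: str   # English definition text
    frequency: int    # corpus count for the lemma (hypothetical values)


def search_english(query, entries):
    """Return entries whose definition contains the query, most frequent first."""
    needle = query.lower()
    hits = [e for e in entries if needle in e.definition.lower()]
    # sorted() is stable, so entries with equal frequency keep their input order.
    return sorted(hits, key=lambda e: e.frequency, reverse=True)
```

With this ordering, a query such as "come" would list the most frequent matching lemma at the top, rather than whichever of the many matching definitions happens to be stored first.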
If an audio recording is available, a speaker icon will appear next to the Khmer lemma. If a word has not been recorded yet, check the stems listed to the right, as it is often possible to reconstruct a word by listening to the individual stems. The audio should play in the background without moving to another page. If you have any trouble with the audio:
- Check your system settings and close any other programs that interact with your audio driver.
- Make sure your browser is compatible with HTML5 audio.