Kheng.info is an online audio dictionary for the Khmer language with over 3000 recordings. Kheng.info is backed by multiple dictionaries and a large text corpus, and supports search in English and Khmer with search results ordered by word frequency.

Kheng.info also has a reading tool that automatically segments Khmer text and annotates each word with audio and dictionary definitions.
Kheng.info supports search in English, Khmer, and IPA.

Searching in English matches against definitions from the Headley Khmer-to-English dictionaries and against lemmas from the Babiloo open-source English-to-Khmer dictionary. Searching in Khmer returns all entries for a particular lemma, as well as definitions for any stems contained in that lemma and for any compounds formed from it.
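The two search directions described above can be illustrated with a small in-memory sketch. It is not Kheng.info's implementation: the `Entry` structure, the function names, and the sample entries are hypothetical, only the Headley-style definition matching is modeled (not the Babiloo lemmas), and "compound" is simplified to "any other lemma that contains the query".

```python
import re
from dataclasses import dataclass, field


# Hypothetical entry shape; Kheng.info's actual database schema is not described.
@dataclass
class Entry:
    lemma: str                                      # Khmer headword
    definition: str                                 # English gloss
    stems: list[str] = field(default_factory=list)  # Khmer stems contained in the lemma


def search_english(entries: list[Entry], query: str) -> list[Entry]:
    """Entries whose English definition contains `query` as a whole word."""
    pattern = re.compile(rf"\b{re.escape(query)}\b", re.IGNORECASE)
    return [e for e in entries if pattern.search(e.definition)]


def search_khmer(entries: list[Entry], query: str):
    """Return (exact entries, stem entries, compound entries) for a Khmer lemma."""
    exact = [e for e in entries if e.lemma == query]
    # Stems listed on the matching entries, looked up as lemmas in their own right.
    stem_lemmas = {s for e in exact for s in e.stems}
    stems = [e for e in entries if e.lemma in stem_lemmas]
    # Simplification: any other lemma containing the query is treated as a compound.
    compounds = [e for e in entries if e.lemma != query and query in e.lemma]
    return exact, stems, compounds


entries = [
    Entry("ទឹក", "water; liquid"),
    Entry("កក", "to freeze; frozen"),
    Entry("ទឹកកក", "ice", stems=["ទឹក", "កក"]),
]
exact, stems, compounds = search_khmer(entries, "ទឹកកក")
```

Searching for "ទឹកកក" returns the entry itself plus its stems "ទឹក" and "កក", while searching for "កក" would list "ទឹកកក" among the compounds.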

Kheng.info also supports searching the International Phonetic Alphabet (IPA) field of Khmer entries. Go to the search page to turn on IPA search.

Regular expressions can be used in Khmer and IPA searches, but they must also be enabled on the search page.
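As a rough sketch of how an opt-in regular-expression search over either the Khmer lemma or the IPA field might behave, the function below treats the query literally unless regex mode is switched on. The `search` function, its parameters, and the sample data are hypothetical, and the IPA strings are illustrative placeholders rather than Kheng.info's transcriptions.

```python
import re

# Hypothetical (lemma, IPA) pairs; the transcriptions are illustrative only.
ENTRIES = [
    ("ទឹក", "tɨk"),
    ("ទឹកកក", "tɨk kɑːk"),
    ("ភ្លៀង", "pliəŋ"),
]


def search(query: str, field: str = "lemma", use_regex: bool = False,
           entries=ENTRIES) -> list[str]:
    """Return lemmas whose `field` ("lemma" or "ipa") matches `query`."""
    index = {"lemma": 0, "ipa": 1}[field]
    # With regex search switched off, special characters are matched literally.
    pattern = query if use_regex else re.escape(query)
    try:
        rx = re.compile(pattern)
    except re.error as exc:
        raise ValueError(f"invalid regular expression: {query!r}") from exc
    return [entry[0] for entry in entries if rx.search(entry[index])]
```

For example, `search("^ទឹក", use_regex=True)` finds every lemma beginning with ទឹក, and `search("ŋ$", field="ipa", use_regex=True)` finds entries whose (illustrative) IPA ends in ŋ. Without `use_regex`, the same `^` would only match a literal caret.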
The reading tool segments Khmer text and annotates each word with dictionary data and audio recordings.

Unlike English, Khmer is typically written without spaces between words, so it is often helpful for beginners to have the text segmented into words automatically.
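The page does not say which segmentation algorithm the reading tool uses. A common dictionary-based baseline is greedy longest-match ("maximal matching"): at each position, take the longest word in the lexicon that starts there. The sketch below assumes a hypothetical word list; greedy matching can mis-split ambiguous sequences, and the fallback of emitting a single code point is a simplification (a real segmenter would keep whole Khmer character clusters together).

```python
def segment(text: str, lexicon: set[str]) -> list[str]:
    """Greedy longest-match (maximal matching) segmentation against a word list."""
    max_len = max((len(w) for w in lexicon), default=1)
    words: list[str] = []
    i = 0
    while i < len(text):
        # Skip any whitespace already present in the input.
        if text[i].isspace():
            i += 1
            continue
        # Try the longest possible dictionary word starting at position i first.
        for j in range(min(len(text), i + max_len), i, -1):
            if text[i:j] in lexicon:
                words.append(text[i:j])
                i = j
                break
        else:
            # No dictionary word starts here: emit one code point and move on.
            words.append(text[i])
            i += 1
    return words


lexicon = {"ទឹក", "កក", "ទឹកកក", "ភ្លៀង"}
```

With this lexicon, `segment("ទឹកកកភ្លៀង", lexicon)` prefers the longer compound and yields `["ទឹកកក", "ភ្លៀង"]` rather than splitting ទឹកកក into ទឹក and កក.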

Texts are limited to a maximum of 20,000 characters. For news articles and texts that are not highly specialized, you will be able to hear audio and see definitions for nearly every word.
Kheng.info is built on a three-million-line Khmer text corpus, assembled mostly from news articles found on the web.

This corpus was used to count the frequency of each lemma in the Headley Khmer-to-English dictionaries. Each lemma is associated with a frequency in the Kheng.info database, allowing the most relevant Khmer-to-English search results to be returned first. This is especially useful when searching for common words like "come" or "make", which match thousands of definitions in the database.
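A minimal sketch of counting lemma frequencies and using them to order search results follows. It assumes the corpus has already been split into words (for instance by a segmenter) and counts only exact matches against the dictionary's lemmas; the function names and sample data are hypothetical, and how Kheng.info actually computes and stores its counts is not described.

```python
from collections import Counter
from typing import Iterable


def lemma_frequencies(segmented_lines: Iterable[list[str]], lemmas: set[str]) -> Counter:
    """Count how often each dictionary lemma occurs in an already-segmented corpus."""
    counts: Counter = Counter()
    for words in segmented_lines:
        counts.update(w for w in words if w in lemmas)
    return counts


def rank_by_frequency(results: list[str], freqs: Counter) -> list[str]:
    """Order matching lemmas from most to least frequent; unseen lemmas count as 0."""
    # sorted() is stable, so lemmas with equal counts keep their original order.
    return sorted(results, key=lambda lemma: freqs[lemma], reverse=True)


corpus = [["ទឹក", "ភ្លៀង"], ["ទឹក"], ["ទឹកកក", "x"]]
lemmas = {"ទឹក", "កក", "ទឹកកក", "ភ្លៀង"}
freqs = lemma_frequencies(corpus, lemmas)
```

Ranking is then just a sort on the stored count, so an English query that matches thousands of definitions can still surface the most common Khmer words first.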

The corpus is suitable for this project, but it was not compiled carefully in the way that an academic corpus would be. A rough attempt was made to balance the text between topics such as politics, travel, food, business, medicine, etc., but in general the corpus and resulting frequency list are skewed toward subjects typically discussed in newspapers and online periodicals. It is not a good approximation of spoken Khmer.
Kheng.info contains over 3000 audio recordings of the most common Khmer words. Given the diversity of sounds in Khmer and the initial difficulty of learning the script, it is extremely useful to hear a native speaker's pronunciation.

If an audio recording is available, a speaker icon will appear next to the Khmer lemma. If a word has not been recorded yet, check the stems listed to the right, as it is often possible to reconstruct a word by listening to the individual stems. The audio should play in the background without leaving the page. If you have any trouble with the audio:
  • Make sure JavaScript is turned on in your browser's settings.
  • Check your system settings and close any other programs that interact with your audio driver.
  • Make sure your browser is compatible with HTML5 audio.
The voice behind these recordings is Sinett Sun, a native Khmer speaker.
If you have any questions about or problems with kheng.info, contact me at mf88mf88@gmail.com.