Japanese Homophones
One class of words I constantly struggle with during vocabulary review is homophones: words that sound the same but have different meanings.
Japanese has a lot of them! When the words are written in kanji, it is easier to tell them apart, but for learners like me it is still easy to mix them up.
This list was generated from the JMdict dictionary database.
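I don't know exactly how the original list was produced, but a minimal sketch of pulling the readings out of the JMdict XML release could look like this. It assumes the standard JMdict layout, where each `<entry>` holds an `<ent_seq>` ID and one `<r_ele><reb>` element per kana reading. The function name `read_entries` is hypothetical.

```python
import xml.etree.ElementTree as ET

def read_entries(source):
    """Yield (ent_seq, readings) for every <entry> in a JMdict XML file.

    `source` is a file path or a binary file object. Only the kana readings
    (<r_ele><reb>) are collected, since the list groups words by reading.
    """
    # iterparse streams the file instead of building the whole tree at once
    for _event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag == "entry":
            ent_seq = elem.findtext("ent_seq")
            readings = [reb.text for reb in elem.findall("r_ele/reb")]
            yield ent_seq, readings
            elem.clear()  # release the finished entry's children to save memory
```

Streaming matters here because the full JMdict file holds over 200,000 entries. Usage would be something like `for ent_seq, readings in read_entries("JMdict_e.xml"): ...`, where the path is just a placeholder for wherever the downloaded file lives.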
The list is organized as follows:
Homophones are sorted from most to least colocations, i.e. the number of dictionary entries that share the reading (shown in parentheses).
The list shows only the readings and their colocation counts, since it would get very large and hard to read otherwise.
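The grouping and sorting described above can be sketched in a few lines. This is my own illustration, not necessarily how the list was generated: it assumes the entries have already been reduced to hypothetical `(entry_id, readings)` pairs, and the helper names `group_by_reading` and `homophone_list` are made up. The tie-breaking rule is also my choice, so readings with equal counts may appear in a different order than in the published list.

```python
from collections import defaultdict

def group_by_reading(entries):
    """Map each kana reading to the IDs of the entries that use it.

    `entries` is assumed to be an iterable of (entry_id, readings) pairs,
    a simplified, hypothetical stand-in for parsed JMdict entries.
    """
    groups = defaultdict(list)
    for entry_id, readings in entries:
        # set() so an entry that lists the same reading twice counts once
        for reading in set(readings):
            groups[reading].append(entry_id)
    return groups

def homophone_list(entries):
    """Return (reading, colocation count) pairs for readings shared by two
    or more entries, sorted from most to least entries."""
    groups = group_by_reading(entries)
    pairs = [(reading, len(ids)) for reading, ids in groups.items() if len(ids) > 1]
    # Highest count first; ties broken by reading in code point order
    pairs.sort(key=lambda pair: (-pair[1], pair[0]))
    return pairs

if __name__ == "__main__":
    sample = [
        (1, ["きかい"]),  # 機械
        (2, ["きかい"]),  # 機会
        (3, ["はし"]),    # 橋
        (4, ["はし"]),    # 箸
        (5, ["はし"]),    # 端
        (6, ["ねこ"]),    # 猫
    ]
    print(homophone_list(sample))  # [('はし', 3), ('きかい', 2)]
```

A reading that belongs to only one entry (ねこ in the sample) is not a homophone, so it is dropped before sorting.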
Click this link to see the list
Homophone Statistics
Some statistics about the generated list.
Total number of entries: 208954
Total number of homophones: 13105
Total number of entries belonging to a homophone set: 47022
Maximum number of colocations: こう (51)
Number of duplicate kanji compounds: 4443
Top 10 colocations: こう (51), かん (42), しょう (40), し (40), そう (36), こうし (32), き (31), こうき (29), こうしょう (28), せん (28)
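Most of these numbers can be computed directly from the reading groups. The sketch below makes two assumptions that the text doesn't spell out: a "homophone" is a reading shared by two or more entries, and an entry counts once toward "entries belonging to a homophone set" even if several of its readings are shared. The input is again hypothetical `(entry_id, readings)` pairs. I leave out the duplicate kanji compound count because it isn't clear how it was defined.

```python
from collections import defaultdict

def homophone_stats(entries, top_n=10):
    """Compute summary statistics for (entry_id, readings) pairs.

    Assumptions: a "homophone" is a reading shared by two or more entries,
    and an entry is counted once even if several of its readings are shared.
    """
    entries = list(entries)
    groups = defaultdict(set)
    for entry_id, readings in entries:
        for reading in readings:
            groups[reading].add(entry_id)

    homophones = {r: ids for r, ids in groups.items() if len(ids) > 1}
    # Union of all homophone sets = entries sharing a reading with another entry
    in_a_set = set().union(*homophones.values())
    ranked = sorted(homophones.items(), key=lambda item: (-len(item[1]), item[0]))

    return {
        "total_entries": len(entries),
        "total_homophones": len(homophones),
        "entries_in_homophone_set": len(in_a_set),
        "max_colocations": (ranked[0][0], len(ranked[0][1])) if ranked else None,
        "top": [(reading, len(ids)) for reading, ids in ranked[:top_n]],
    }
```

Collecting IDs in sets rather than lists is what keeps an entry with several shared readings from being counted twice in `entries_in_homophone_set`.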
Limitations
The list is generated with very little processing, so there are some limitations to keep in mind.
First of all, this list was made by a Japanese learner, not a linguist, based on things I found interesting or difficult while studying, so it may contain inaccuracies.
Some of the colocations may come from special readings of kanji compounds that are not in common use.
This list groups words only by their kana reading and does not take pitch accent into account. There are several reasons for this. First, I did not go to the trouble of finding a pitch accent dataset and matching it with the entries, and even with such a dataset the entries might not always match up, or it might not cover every variant. Second, while this is no excuse to ignore the correct pronunciation of words, many learners probably have trouble telling pitch patterns apart, especially within a sentence, so grouping purely by reading may make more sense.