25 Aug 13
00:11

Mac Dictionary Services API Tease

Basic lookup working

I am a big fan of Dictionary.app. It’s pretty handy for English, but what makes it really shine is that it has a zomg-amazing Japanese dictionary called Daijisen. What’s more, with OS X 10.8 Apple threw in German, French, Spanish, and Chinese dictionaries as well. However, after using the app for a while, I have decided that it sucks. Let me clarify: the dictionary content is great, but the app itself lacks some features that would make it so much more useful. I want search history that persists beyond a single app launch, and an interface for seeing which words I looked up when. I want to be able to export this list so I can make flash cards. I also want to search with schemes other than ‘starts with’, such as ‘contains’ or ‘ends with’, like many online dictionaries offer.
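The three search schemes differ only in where the query is allowed to match. Here is a minimal C sketch, assuming you already have the headwords in memory as UTF-8 strings (the public Dictionary Services API does not hand you such a list; `match_mode`, `headword_matches`, `filter_headwords`, and the sample words are hypothetical). Byte-wise matching is safe for valid UTF-8 because a match can never start or end in the middle of a multi-byte character, but no Unicode normalization or kana/kanji folding is attempted.

```c
#include <stddef.h>
#include <string.h>

/* The three search schemes; Dictionary.app only offers MATCH_PREFIX. */
enum match_mode { MATCH_PREFIX, MATCH_CONTAINS, MATCH_SUFFIX };

/* Returns 1 if the UTF-8 headword matches the UTF-8 query under the given mode. */
static int headword_matches(const char *headword, const char *query, enum match_mode mode)
{
    size_t hlen = strlen(headword);
    size_t qlen = strlen(query);

    switch (mode) {
    case MATCH_PREFIX:   /* "starts with" */
        return strncmp(headword, query, qlen) == 0;
    case MATCH_CONTAINS: /* query appears anywhere */
        return strstr(headword, query) != NULL;
    case MATCH_SUFFIX:   /* "ends with" */
        return qlen <= hlen && strcmp(headword + (hlen - qlen), query) == 0;
    }
    return 0;
}

/* Stores pointers to matching headwords in out (capacity max_out), in input
   order, and returns how many were stored. */
static size_t filter_headwords(const char *const *words, size_t n, const char *query,
                               enum match_mode mode, const char **out, size_t max_out)
{
    size_t count = 0;
    for (size_t i = 0; i < n && count < max_out; i++) {
        if (headword_matches(words[i], query, mode))
            out[count++] = words[i];
    }
    return count;
}
```

For example, over the words たべる, しらべる, and たべもの, an ‘ends with’ search for べる keeps たべる and しらべる, while a ‘starts with’ search for たべ keeps たべる and たべもの.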

For a while I just assumed there was no easy way to use the content that the Mac Dictionary app uses. Then I found out about the Mac Dictionary Services API, which looks promising at first glance.

I got basic lookup working pretty quickly on my BART commute to and from work. But after some tinkering it became apparent that the public API (which has only two functions for word lookup, DCSGetTermRangeInString and DCSCopyTextDefinition) is badly incomplete. You can look up a string and get a definition back from the same dictionaries that Dictionary.app uses. However, you can’t specify which dictionary, or which entry within a dictionary (e.g. for a word that has multiple definitions). This means you only get the first entry of the first dictionary that gets a hit. So I decided to spend a good part of today seeing what could be done about this.

I came across a few interesting posts that showed off some of the private API. Most of them were for simple CLI programs or for building custom dictionaries, and they used private calls such as DCSRecordCopyData() and DCSCopyAvailableDictionaries(). DCSCopyAvailableDictionaries gave me access to specific dictionaries, and by using it in conjunction with DCSGetTermRangeInString and DCSCopyRecordsForSearchString, I was able to generate a reasonable list of candidate DCSRecordRef entries from a single input word. The only thing missing was the definition for each DCSRecordRef.

I wanted to make this dictionary for my Japanese studies. In Japanese there are tons of homonyms and homophones, depending on whether or not you write the word with kanji. My googling didn’t turn up a function that would list the entries for a word, but doing an NSLog(@"%@", record) on each record from an example search for ‘いる’ showed some info about the DCSRecord structure:

lldb output:
{key = いる, headword = いる【射る】, bodyID = 111081}
{key = いる, headword = いる【要る】, bodyID = 104584}
{key = いる, headword = いる【居る】, bodyID = 163639}
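The logged description already separates the pieces that matter: the search key, the headword with its kanji spelling in 【】 brackets, and a numeric bodyID. Below is a minimal C sketch for pulling those fields out of lines copied from the console, for offline analysis only. It assumes the exact description format shown above, which is an undocumented debug string and may change between OS releases; `record_desc`, `parse_record_desc`, and `headword_spelling` are hypothetical names.

```c
#include <stdio.h>
#include <string.h>

/* Fields visible in the logged description of one DCSRecordRef. */
struct record_desc {
    char key[64];
    char headword[128];
    long body_id;
};

/* Parses a line such as "{key = いる, headword = いる【射る】, bodyID = 111081}".
   Returns 1 on success, 0 otherwise. UTF-8 multi-byte sequences never contain
   a ',' byte, so %[^,] stops at the field separator even for Japanese text. */
static int parse_record_desc(const char *line, struct record_desc *out)
{
    return sscanf(line, " {key = %63[^,], headword = %127[^,], bodyID = %ld",
                  out->key, out->headword, &out->body_id) == 3;
}

/* Copies the text between 【 and 】 (e.g. "射る" from "いる【射る】") into buf.
   Returns 1 if both brackets were found and the result fits, 0 otherwise. */
static int headword_spelling(const char *headword, char *buf, size_t size)
{
    const char *open = strstr(headword, "【");
    if (open == NULL)
        return 0;
    open += strlen("【");
    const char *close = strstr(open, "】");
    if (close == NULL)
        return 0;
    size_t n = (size_t)(close - open);
    if (n + 1 > size)
        return 0;
    memcpy(buf, open, n);
    buf[n] = '\0';
    return 1;
}
```

Run over the three lines above, headword_spelling yields 射る, 要る, and 居る, which is exactly the distinction a plain ‘いる’ lookup throws away.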

It looked like the ‘bodyID’ or ‘headword’ field of DCSRecordRef had the most specific information about each result. So I went into the framework and searched its exported symbols for functions that might do the trick:

cd /System/Library/Frameworks/CoreServices.framework/Versions/A/Frameworks/DictionaryServices.framework
nm -gU DictionaryServices|grep DCS
0000000000007b06 T _DCSActivateDictionaryPanel
000000000000914d T _DCSCopyActiveDictionaries
000000000000916f T _DCSCopyAvailableDictionaries
[...]
0000000000007e07 T _DCSRecordCopyData
0000000000007e1e T _DCSRecordCopyDataURL
0000000000007e63 T _DCSRecordGetAnchor
00000000000076ef T _DCSRecordGetAssociatedObj
0000000000007e4c T _DCSRecordGetDictionary
0000000000007ebd T _DCSRecordGetHeadword
0000000000007e91 T _DCSRecordGetRawHeadword
0000000000007f04 T _DCSRecordGetString
0000000000007e35 T _DCSRecordGetSubDictionary
0000000000007e7a T _DCSRecordGetTitle
0000000000009181 T _DCSRecordGetTypeID
00000000000076d9 T _DCSRecordSetAssociatedObj
0000000000007ea8 T _DCSRecordSetHeadword
000000000000878a T _DCSSearchSessionCreate
[...]
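Each line of that output is an address, a one-letter symbol type, and a name. With -g (external symbols only) and -U (skip undefined symbols), nm lists only symbols the framework itself defines; ‘T’ marks one in the text (code) section, i.e. a function, and the leading underscore is the Mach-O convention for C names, so _DCSRecordGetTitle is the C function DCSRecordGetTitle. A small C sketch of splitting such a line follows; `nm_symbol`, `parse_nm_line`, and `has_prefix` are hypothetical names. Note that nm gives you names only, never parameter or return types.

```c
#include <stdio.h>
#include <string.h>

/* One line of `nm -gU` output, e.g. "0000000000007e07 T _DCSRecordCopyData". */
struct nm_symbol {
    unsigned long long address; /* offset printed by nm, in hex */
    char type;                  /* 'T' = external symbol in the text (code) section */
    char name[128];             /* C-level name with the Mach-O '_' prefix removed */
};

/* Returns 1 if the line was split successfully, 0 otherwise. */
static int parse_nm_line(const char *line, struct nm_symbol *out)
{
    char raw[128];
    if (sscanf(line, "%llx %c %127s", &out->address, &out->type, raw) != 3)
        return 0;
    /* Strip the underscore Mach-O prepends to C symbol names. */
    const char *name = (raw[0] == '_') ? raw + 1 : raw;
    snprintf(out->name, sizeof out->name, "%s", name);
    return 1;
}

/* True if the symbol's C name starts with prefix, e.g. "DCSRecord". */
static int has_prefix(const struct nm_symbol *sym, const char *prefix)
{
    return strncmp(sym->name, prefix, strlen(prefix)) == 0;
}
```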

The second entry for 棺 should be ひつぎ

There were about 100 symbols that Apple didn’t feel like sharing via docs. I tried a few combinations, and it looks like DCSRecordGetTitle or DCSRecordGetRawHeadword produced the best strings to pass to DCSCopyTextDefinition. This solves the homonym problem for the most part. However, it did not work at all with heteronyms (words that are spelled the same but pronounced differently), since the headword/display title is the same. For example, for the input 棺 I need the definitions for both 棺 read as ‘kan’ and 棺 read as ‘hitsugi’, but this method gives me the ‘kan’ definition twice instead. Eventually I gave up. I hope someone else figures this out; I put the intermediate result up on GitHub. Let me know if you make any progress.
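The heteronym failure comes down to keys: once two entries share a display title, any lookup that takes only that string can reach just one of them. Here is a toy C model of this, with made-up entries and placeholder IDs (the real bodyID values for 棺 are not shown in this post). `lookup_by_title` mimics the first-match behavior of a string-only lookup, while `lookup_by_body_id` stands for an ID-based lookup that the experiments above did not turn up in the framework, so it is purely hypothetical.

```c
#include <stddef.h>
#include <string.h>

/* Toy dictionary entry: display title, an ID standing in for bodyID, and a reading. */
struct entry {
    const char *title;
    long body_id;        /* placeholder values, not real Daijisen IDs */
    const char *reading;
};

/* Hypothetical data: two entries that share the title 棺. */
static const struct entry entries[] = {
    { "棺", 1, "かん" },
    { "棺", 2, "ひつぎ" },
};
static const size_t entry_count = sizeof entries / sizeof entries[0];

/* String-keyed lookup: returns the first entry with a matching title, or NULL. */
static const struct entry *lookup_by_title(const char *title)
{
    for (size_t i = 0; i < entry_count; i++)
        if (strcmp(entries[i].title, title) == 0)
            return &entries[i];
    return NULL;
}

/* ID-keyed lookup: can reach every entry, even ones with identical titles. */
static const struct entry *lookup_by_body_id(long body_id)
{
    for (size_t i = 0; i < entry_count; i++)
        if (entries[i].body_id == body_id)
            return &entries[i];
    return NULL;
}
```

Feeding each entry’s title back into lookup_by_title returns the かん entry both times, which is the doubled ‘kan’ definition described above; only the ID-keyed version reaches ひつぎ.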

On iOS it goes without saying that you should probably avoid private APIs, since the main means of distribution is through Apple. On the Mac, distributing an app yourself is still viable, so you can use private APIs to your heart’s content. However, because the undocumented exported symbols come without function signatures, they’re pretty much just a tease unless you want to spend a lot of time figuring out what each function takes. Judging by what turns up when googling the symbol names, Dictionary Services seems relatively unexplored. But I’m mostly interested in this for making a hobby dictionary that I can nerd out on and add lots of obscure features to, while still using the Apple-provided dictionaries.