Tuesday, 24 March 2009

Yay homographs!

Natural Language Processing - it's a minefield. The NLP facet that every Google user knows about (even if they might not know the technical name) is the HOMOGRAPH. That's a word that is spelt the same way as another word with a different meaning.

Not a Homonym.

A homonym is a word that has the same spelling and the same pronunciation as another word with a different meaning - like 'left', 'right', 'current', or 'set' ('set' is the all-time winner with 21 discrete meanings in the OED!). Homographs only need to have the same spelling. Computers, after all, aren't fussed with pronunciation.

(For reference, words that have the same pronunciation but different meanings are homophones, but they don't concern NLP.)

Here's a list of homographs, and it's longer than you might think:  http://en.wikipedia.org/wiki/List_of_English_homographs

Add in the acronyms of companies, which are so prevalent on the internet, and you have a swamp. 

This is why I'm a fan of the semantic web. Until there's a reliable way for a machine to tell the difference between 'bear' and 'bear', which I don't think there ever will be (even a person requires detailed context), RDF will be the best we have.
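To make the RDF point concrete, here's a rough sketch in plain Python, with tuples standing in for triples. The example.org URIs and the Animal and Action classes are made up for illustration; only the rdfs:label and rdf:type URIs are real vocabulary. The idea is that the string 'bear' is just a label, and each meaning gets its own identifier.

```python
# Placeholder namespace: example.org and everything under it is hypothetical.
EX = "http://example.org/"
RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"

# Each triple is (subject, predicate, object).
triples = [
    (EX + "bear-animal", RDFS_LABEL, "bear"),
    (EX + "bear-animal", RDF_TYPE, EX + "Animal"),
    (EX + "bear-carry", RDFS_LABEL, "bear"),
    (EX + "bear-carry", RDF_TYPE, EX + "Action"),
]


def senses_for(label):
    """Return {subject URI: type URI} for every subject carrying this label."""
    subjects = {s for s, p, o in triples if p == RDFS_LABEL and o == label}
    return {s: o for s, p, o in triples if p == RDF_TYPE and s in subjects}


if __name__ == "__main__":
    for subject, kind in senses_for("bear").items():
        print(subject, "is a", kind)
```

The machine still can't guess which 'bear' a bare string means, but once the author has pointed at a URI there's nothing left to guess.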

TERM        Meaning            Spelling                            Pronunciation
Homonym     Different          Same                                Same
Homograph   Different          Same                                Same or Different
Homophone   Different          Different                           Same
Heteronym   Different          Same                                Different
Polyseme    Multiple meanings  Same                                Same or Different
Capitonym   Different          Different if capital first letter   Different
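The definitions above boil down to a couple of comparisons, so here's a minimal sketch of them in Python. The Sense class, the classify function and the rough phonetic keys ('bair', 'led' and so on) are all invented for this example; a real system would need a proper pronunciation dictionary. It treats a homograph as any same-spelling pair, whatever the pronunciation, and leaves polysemes out because they are one word with related meanings rather than a pair of words.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Sense:
    spelling: str
    pronunciation: str  # rough, made-up phonetic key, not real IPA
    meaning: str


def classify(a: Sense, b: Sense) -> list[str]:
    """Label the relationship between two word senses."""
    if a.meaning == b.meaning:
        return []  # same meaning: none of these terms apply
    same_spelling = a.spelling == b.spelling
    same_sound = a.pronunciation == b.pronunciation
    if same_spelling:
        # Homographs only need the same spelling; the sound decides the subtype.
        return ["homograph", "homonym" if same_sound else "heteronym"]
    if a.spelling.lower() == b.spelling.lower():
        # Spelling differs only by capitalisation.
        return ["capitonym"]
    if same_sound:
        return ["homophone"]
    return []


if __name__ == "__main__":
    print(classify(Sense("bear", "bair", "animal"), Sense("bear", "bair", "carry")))
    print(classify(Sense("lead", "led", "metal"), Sense("lead", "leed", "guide")))
    print(classify(Sense("polish", "pol-ish", "shine"), Sense("Polish", "pole-ish", "from Poland")))
```

Notice that spelling and pronunciation are trivial to compare. The meaning field is the only hard part, and that's exactly the bit the computer doesn't have.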

My favourite will always be the capitonym. From polish to Polish and march to March is a lovely semantic jump!
