Statistical machine translation. Looking at text in one language and producing equivalent text in another. You need to grok the syntax and semantics of both, a big dictionary, etc. Google has access to lots of CPU and lots of text, so they took a statistical approach using word pairs, phrases, etc.
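To make the statistical idea concrete, here's a toy sketch of the classic noisy-channel formulation such systems are based on: pick the English sentence maximizing P(english) × P(foreign | english). All the probabilities and word pairs below are made-up illustrations, not anything from Google's actual system.

```python
# Noisy-channel translation sketch: score = P(e) * P(f|e).
# All probabilities here are hypothetical, for illustration only.

# Translation model: P(french_word | english_word)
translation = {
    ("maison", "house"): 0.8,
    ("maison", "home"): 0.2,
    ("bleue", "blue"): 0.9,
    ("bleue", "sad"): 0.1,
}

# Language model: P(english_sentence)
language_model = {
    "blue house": 0.05,
    "sad house": 0.001,
    "blue home": 0.01,
    "sad home": 0.002,
}

def score(french_words, english):
    """Combine language-model and word-pair translation probabilities."""
    p = language_model.get(english, 0.0)
    for f, e in zip(french_words, english.split()):
        p *= translation.get((f, e), 0.0)
    return p

french = ["bleue", "maison"]  # "maison bleue", reordered for English
best = max(language_model, key=lambda e: score(french, e))
print(best)  # -> blue house
```

The point is that neither model needs hand-written grammar rules; both can be estimated by counting over large amounts of text, which is where having lots of CPU and data pays off.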
Example of a news story translated from Arabic to English.
Named entity extraction (people, companies, products, etc). Lots of relationships to find in the text they've got. They started with simple patterns in "easy" sentences. If the text contains "such as", they're using it. It helps them extract facts like "HP is a computer manufacturer."
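A minimal sketch of how a "such as" pattern yields facts (this is the general Hearst-pattern idea; the sentence and regex are my illustration, not Google's actual extractor):

```python
import re

# "X such as Y and Z" -> (Y, is a, X), (Z, is a, X)
# Example sentence and pattern are illustrative assumptions.
sentence = "Computer manufacturers such as HP and Dell reported earnings."

pattern = re.compile(r"(\w+(?: \w+)*) such as ((?:\w+(?:, | and )?)+)")
m = pattern.search(sentence)
category = m.group(1)                          # "Computer manufacturers"
instances = re.split(r", | and ", m.group(2))  # ["HP", "Dell"]
facts = [(inst, "is a", category) for inst in instances]
print(facts)
```

Applied over a web-scale corpus, even a pattern this simple fires often enough to accumulate a large fact database, which is presumably why they start with "easy" sentences.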
Word clusters are next. They build a Bayesian network of words and word clusters.
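The underlying intuition is distributional: words that appear in similar contexts belong in the same cluster. Here's a toy sketch using co-occurrence counts and cosine similarity; this is a simplified stand-in for illustration, not the Bayesian-network model described in the talk.

```python
from collections import defaultdict
from math import sqrt

# Tiny corpus; real systems would use web-scale text.
corpus = [
    "the cat chased the mouse",
    "the dog chased the cat",
    "stocks rose on wall street",
    "stocks fell on wall street",
]

# Co-occurrence vectors: word -> counts of words within a 2-word window
vectors = defaultdict(lambda: defaultdict(int))
for sentence in corpus:
    words = sentence.split()
    for i, w in enumerate(words):
        for j in range(max(0, i - 2), min(len(words), i + 3)):
            if j != i:
                vectors[w][words[j]] += 1

def cosine(a, b):
    """Cosine similarity of two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a if k in b)
    na = sqrt(sum(v * v for v in a.values()))
    nb = sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

# "cat" and "dog" share contexts; "cat" and "stocks" don't
print(cosine(vectors["cat"], vectors["dog"]))
print(cosine(vectors["cat"], vectors["stocks"]))
```

A clustering algorithm then just groups words whose vectors are close, which is how "george bush" and "john kerry" can end up near each other in the demo.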
On-line demo time. Interactive use of word clusters. Using "george bush" and "john kerry". Amusing results. "That's what the web says."
See Also: My Web 2.0 post archive for coverage of all the other sessions I attended.
Posted by jzawodn at October 07, 2004 11:13 AM
Efficient clustering of words has already been done.
McCallum, Andrew Kachites. "Bow: A toolkit for statistical language modeling, text retrieval, classification and clustering." 1996.