Seminar on language processing: Acquisition of 'Deep' Knowledge from Shallow Corpus Statistics

Date: December 04, 2008 (Thursday) at 09:00 to 11:00

Dekang Lin from Google is holding a seminar on language processing entitled "Acquisition of 'Deep' Knowledge from Shallow Corpus Statistics".

Many hard problems in natural language processing seem to require knowledge and inference about the real world. For example, consider the referent of the pronoun 'his' in the following sentences:
(1) John needed his friends
(2) John needed his support
(3) John offered his support
A human reader would intuitively know that 'his' in (1) and (3) is likely to refer to John, whereas it must refer to someone else in (2). Since the three sentences have exactly the same syntactic structure, the difference cannot be explained by syntax alone. The resolution of the pronoun reference in (2) seems to hinge on the fact that one never needs one's own support (since one already has it).

I will present a series of knowledge acquisition methods to show that seemingly deep linguistic or even world knowledge may be acquired with rather shallow corpus statistics. I will also discuss how the acquired knowledge can be evaluated by using it in applications.

Dekang Lin is a Staff Research Scientist at Google and was a Professor at the University of Alberta from 2000 to 2008. He served on the NAACL Executive Board in 2003-2004 and was program co-chair for ACL-2002 and EMNLP-2004. His main research interests include principle-based parsing, unsupervised learning from text, and question answering.

URL: http://research.google.com/pubs/author108.html

