QLVL Research Project sem•metrix
The sem•metrix project is situated at the border between computational linguistics and theoretical linguistics. More specifically, it confronts Word Space Models of semantic similarity with theory formation in lexical semantics.
Word meaning is one of the linguistic concepts that most elude quantitative investigation. As a result, the new usage-based movement in theoretical linguistics has so far only skirted the issue. Still, there is a tight relationship between word meaning and word use, one already noted by the likes of Firth, Harris and Wittgenstein. While this insight has given rise to the development of so-called Word Space Models in computational linguistics, theoretical linguistics has not yet embraced their possibilities.
Word Space Models try to capture the meaning of a word in terms of its use. On the basis of a large corpus, they record all contextual features that a word co-occurs with. These features may be syntactic (such as "is the subject of verb v") or more collocational (such as "co-occurs with word w"). The models then hypothesize that two words that share many contextual features will also be semantically related.
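The counting-and-comparing idea described above can be illustrated with a minimal Python sketch. It is not the sem•metrix implementation and makes several simplifying assumptions: a toy three-sentence corpus, whitespace tokenization, collocational features only (context words within two tokens on either side; syntactic features would require a parser), raw co-occurrence counts instead of the weighting schemes real models typically apply, and cosine as the similarity measure. The function names `cooccurrence_vectors` and `cosine` are hypothetical.

```python
from collections import Counter, defaultdict
from math import sqrt


def cooccurrence_vectors(sentences, window=2):
    """Map each word to a Counter of the words found within +/- `window` tokens.

    Only collocational features ("co-occurs with word w") are recorded here;
    raw counts are used, with no weighting (an assumption for illustration).
    """
    vectors = defaultdict(Counter)
    for sentence in sentences:
        tokens = sentence.lower().split()  # naive whitespace tokenization
        for i, target in enumerate(tokens):
            lo = max(0, i - window)
            hi = min(len(tokens), i + window + 1)
            for j in range(lo, hi):
                if j != i:  # a word is not its own context
                    vectors[target][tokens[j]] += 1
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse count vectors (Counters)."""
    # Counter returns 0 for missing features without inserting them.
    dot = sum(count * v[feat] for feat, count in u.items())
    norm_u = sqrt(sum(c * c for c in u.values()))
    norm_v = sqrt(sum(c * c for c in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


corpus = [
    "the plane landed at the airport",
    "the airplane landed at the airport",
    "the cat slept on the mat",
]
vecs = cooccurrence_vectors(corpus)
print(round(cosine(vecs["plane"], vecs["airplane"]), 3))  # -> 1.0
print(round(cosine(vecs["plane"], vecs["cat"]), 3))       # -> 0.333
```

In this toy corpus, plane and airplane occur with exactly the same context words and therefore receive the maximal score, while plane and cat share only the context word the.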
Such Word Space Models are pervasive in many kinds of computational-linguistic tasks, from Question Answering to Automated Essay Grading. So far, however, we have very little knowledge of what type of semantic information they model. Do they indeed discover semantic similarity, as between plane and airplane, or are they more sensitive to semantic relatedness, as between airplane and airport? What about the different types of semantic similarity, such as synonymy, hyponymy, hypernymy and cohyponymy? And how well do the various models deal with the different meanings a word may have? The importance of these questions is clear: not only will the answers give us more insight into the precise relationship between meaning and use, they will also make it possible to develop tailor-made Word Space Models for specific computational-linguistic tasks.
Further information
Sem•metrix is funded by the Research Council of the K.U.Leuven as OT 3H051085 and by the Fund for Scientific Research - Flanders (FWO).
So far, two workshops have grown out of the sem•metrix project. More are certainly to follow!
sem•metrix kick-off workshop
10 January 2007, University of Leuven, Belgium.
New Ways of Analyzing Lexical Variation
Workshop at the ICLaVE conference in Cyprus, 17-19 June 2007.
Peirsman, Yves, Simon De Deyne, Kris Heylen and Dirk Geeraerts. 2008. "The Construction and Evaluation of Word Space Models". In Proceedings of the Language Resources and Evaluation Conference (LREC 2008), Marrakech, Morocco.
Heylen, Kris, Yves Peirsman and Dirk Geeraerts. 2008. "Modelling Word Similarity. An Evaluation of Automatic Synonymy Extraction Algorithms". In Proceedings of the Language Resources and Evaluation Conference (LREC 2008), Marrakech, Morocco.
Peirsman, Yves, Kris Heylen and Dirk Speelman. 2008. "Putting things in order: First and second order context models for the calculation of semantic similarity". In Actes des 9es Journées internationales d'Analyse statistique des Données textuelles (JADT 2008), Lyon, France.
Peirsman, Yves, Kris Heylen and Dirk Speelman. 2007. "Finding semantically similar words in Dutch. Co-occurrences versus syntactic contexts". In Proceedings of the CoSMO workshop, Roskilde, Denmark, pages 9-16.