QLVL members - Sofie Van Gijsel


Sofie Van Gijsel studied Germanic Languages (English-Dutch) at the University of Leuven. In 2000, she spent a semester at the University of Coimbra (Portugal) as an Erasmus exchange student. After graduating in 2001, she took an MPhil in Linguistics at Trinity College Dublin (Ireland). As a research assistant of the FWO (Research Foundation – Flanders), she is currently preparing her PhD, which she will defend in 2007.




Locating lexical richness: a sociolectometric, corpus linguistic analysis

This PhD project sets out to investigate the distribution of lexical richness, using a corpus of spoken Dutch (CGN, Schuurman et al., 2003). Lexical richness measures have been studied both in applied linguistics (see e.g. Read, 2000) and in the context of word frequency distributions (Baayen, 2001). Yet relatively little research has specifically investigated the distribution of lexical use from a sociolectometric, variationist perspective, integrating the effect of extralinguistic parameters such as ‘register’ or ‘region’ in a multivariate analysis.

The type-token ratios (TTRs) of a stratified sample of equally sized, relatively small text chunks are analyzed statistically. The results show that the register of the subcorpora strongly determines their lexical richness, while the effect of the other factors (viz. ‘sex’, and especially ‘region’ and ‘educational level’) appears to be weaker. Following this, more detailed analyses are performed on subcorpora consisting of a single part of speech (POS). These POS-specific analyses bring out differences between Flanders and the Netherlands, as well as a consistently lower TTR for women. However, the analyses also indicate interference from both the thematic and the communicative specificity of the registers. Therefore, supplementary analyses are proposed (viz. using the number of texts in each subcorpus, analyzing keywords with a log-likelihood statistic, and calculating the number of hapaxes), which show the effect of multithematicity in the different registers on the TTR measure. To reduce the thematic bias in the subcorpora, a randomized sampling method is implemented. The results confirm the sociolectometric variation in the data, while allowing lexical richness effects to be localized in more detail.
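The basic measures named in this abstract can be illustrated with a small sketch. It is not the thesis's own implementation: whitespace tokenization, the function names and the chunk size are assumptions chosen for illustration, and the keyword statistic is the common two-corpus log-likelihood (G2) form, which may differ in detail from the variant used in the study. The multivariate analysis and the randomized sampling method are not reproduced here.

```python
import math
from collections import Counter


def chunks(tokens, size):
    """Split a token list into consecutive, non-overlapping chunks of exactly
    `size` tokens. A shorter remainder at the end is discarded so that every
    chunk has the same length, since TTR depends strongly on text length."""
    return [tokens[i:i + size] for i in range(0, len(tokens) - size + 1, size)]


def ttr(tokens):
    """Type-token ratio: number of distinct word forms / number of tokens."""
    return len(set(tokens)) / len(tokens)


def hapaxes(tokens):
    """Number of types that occur exactly once (hapax legomena)."""
    return sum(1 for freq in Counter(tokens).values() if freq == 1)


def log_likelihood(a, b, c, d):
    """Two-corpus log-likelihood (G2) keyword statistic.
    a, b: frequency of the word in corpus 1 and corpus 2;
    c, d: total number of tokens in corpus 1 and corpus 2."""
    e1 = c * (a + b) / (c + d)  # expected frequency in corpus 1
    e2 = d * (a + b) / (c + d)  # expected frequency in corpus 2
    g2 = 0.0
    if a > 0:  # a zero observed frequency contributes nothing (0 * log 0 = 0)
        g2 += a * math.log(a / e1)
    if b > 0:
        g2 += b * math.log(b / e2)
    return 2 * g2


if __name__ == "__main__":
    # Assumption: plain whitespace tokenization of a toy Dutch sentence,
    # not the CGN transcriptions or their tokenization.
    tokens = "de kat zit op de mat en de hond zit op de stoep".split()
    for chunk in chunks(tokens, 6):  # hypothetical chunk size of 6 tokens
        print(round(ttr(chunk), 3), hapaxes(chunk))  # prints "0.833 4" twice
```

Comparing TTRs only across chunks of identical length is what makes the values comparable; the hapax count and the keyword statistic then help show whether a higher TTR reflects a broader range of topics rather than a richer vocabulary.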

Representative publications

The following publications are representative of Sofie's research. The rest of her published work can be found in the QLVL publication list.

Van Gijsel, S., D. Speelman & D. Geeraerts. To appear. "A variationist, corpus linguistic analysis of lexical richness". In Proceedings of Corpus Linguistics 2005, Birmingham, UK.

Van Gijsel, S., D. Speelman & D. Geeraerts. 2006. "Locating lexical richness: a corpus linguistic, sociovariational analysis". In J.-M. Viprey et al. (eds.), Proceedings of the 8th International Conference on the statistical analysis of textual data (JADT), 961-971. Besançon, France.

Van Gijsel, S., D. Geeraerts & D. Speelman. 2004. "A functional analysis of the linguistic variation in Flemish spoken commercials". In C. Fairon, G. Purnelle & A. Dister (eds.), 7th International Conference on the statistical analysis of textual data, 1136-1144. Louvain-la-Neuve, Belgium.