Analysis of Text Non-Homogeneity Using Markers
Monika Lapėnaitė-Gedvilė
Vilnius University, Lithuania
Karolina Piaseckienė
Šiauliai University, Lithuania
Marijus Radavičius
Vilnius University, Lithuania
Published 2015-12-20
https://doi.org/10.15388/LJS.2015.13884

Keywords

statistical linguistics
over-dispersion
deviance
binomial logistic regression
functional words

How to Cite

Lapėnaitė-Gedvilė M., Piaseckienė K. and Radavičius M. (2015) “Analysis of Text Non-Homogeneity Using Markers”, Lithuanian Journal of Statistics, 54(1), pp. 92-100. doi: 10.15388/LJS.2015.13884.

Abstract

The aim of the paper is to assess the distributional non-homogeneity of texts in the usage of functional words and other linguistic units. Our empirical study is based on works of fiction recommended for schools, taken from the digital library at http://ebiblioteka.mkp.emokykla.lt. Sets of frequent word forms, called markers, are compiled, and their frequency counts in blocks of 50 successive sentences are calculated. The frequency counts of the markers show significant excess variability (overdispersion) relative to the text homogeneity model commonly assumed in linguistics. For selected markers, several kinds of hierarchical binomial logistic regression models are fitted to the block data in order to explain the observed overdispersion. The explanatory variables are the author's identifier, the block length and the frequency counts of the remaining markers.
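The abstract gives no formulas. The following is a minimal sketch of the overdispersion check it describes, assuming that the "homogeneity model" means each block's marker count y_i is Binomial(n_i, p) with one probability p shared by all blocks, where n_i is the block length in tokens. The sentence splitter, the tokenizer and the function names `block_counts` and `binomial_dispersion` are hypothetical illustrations, not the authors' code. Under homogeneity, the Pearson statistic X^2 and the deviance D are both roughly chi-square with k - 1 degrees of freedom for k blocks. A ratio X^2 / (k - 1) well above 1 therefore suggests overdispersion.

```python
import math
import re
from typing import List, Sequence, Tuple

# Hypothetical naive sentence splitter: a sentence ends at ".", "!" or "?" followed by whitespace.
_SENT_SPLIT = re.compile(r"(?<=[.!?])\s+")
# \w is Unicode-aware in Python 3, so Lithuanian letters such as "ą" or "ž" stay in words.
_WORD = re.compile(r"\w+")


def block_counts(text: str, marker: str, block_size: int = 50) -> List[Tuple[int, int]]:
    """Return (marker count y_i, block length n_i in tokens) for consecutive
    blocks of `block_size` sentences; an incomplete final block is dropped."""
    sentences = [s for s in _SENT_SPLIT.split(text.strip()) if s]
    result = []
    for start in range(0, len(sentences) - block_size + 1, block_size):
        block = " ".join(sentences[start:start + block_size]).lower()
        tokens = _WORD.findall(block)
        result.append((tokens.count(marker.lower()), len(tokens)))
    return result


def binomial_dispersion(counts: Sequence[Tuple[int, int]]) -> Tuple[float, float, int]:
    """Pearson X^2, deviance D and degrees of freedom for the homogeneity
    model y_i ~ Binomial(n_i, p) with a single pooled p for all blocks."""
    y_tot = sum(y for y, _ in counts)
    n_tot = sum(n for _, n in counts)
    p = y_tot / n_tot  # maximum-likelihood estimate of the common p
    if not 0 < p < 1:
        raise ValueError("marker must occur in some, but not all, tokens")
    pearson = 0.0
    deviance = 0.0
    for y, n in counts:
        mu = n * p  # expected count in this block under homogeneity
        pearson += (y - mu) ** 2 / (mu * (1 - p))
        # Binomial deviance contributions; 0 * log(0) terms are taken as 0.
        if y > 0:
            deviance += 2 * y * math.log(y / mu)
        if n - y > 0:
            deviance += 2 * (n - y) * math.log((n - y) / (n - mu))
    return pearson, deviance, len(counts) - 1


# Usage with made-up counts for four blocks of 500 tokens each.
x2, dev, df = binomial_dispersion([(10, 500), (30, 500), (5, 500), (25, 500)])
print(f"X^2 = {x2:.2f}, D = {dev:.2f}, df = {df}, X^2/df = {x2 / df:.2f}")
```

With these illustrative counts the dispersion ratio X^2/df is about 8.4, far above 1. If SciPy is available, `scipy.stats.chi2.sf(x2, df)` gives the corresponding p-value. This sketch only tests the homogeneity model. It does not reproduce the hierarchical logistic regressions the paper uses to explain the excess variability.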


This work is licensed under a Creative Commons Attribution 4.0 International License.
