Analysis of Text Non-Homogeneity Using Markers

Monika Lapėnaitė-Gedvilė; Karolina Piaseckienė; Marijus Radavičius

doi:10.15388/LJS.2015.13884


Monika Lapėnaitė-Gedvilė

Vilnius University, Lithuania

Karolina Piaseckienė

Šiauliai University, Lithuania

Marijus Radavičius

Vilnius University, Lithuania

Published 2015-12-20

https://doi.org/10.15388/LJS.2015.13884


Keywords

statistical linguistics
over-dispersion
deviance
binomial logistic regression
functional words

How to Cite

Lapėnaitė-Gedvilė, M., Piaseckienė, K. and Radavičius, M. (2015) “Analysis of Text Non-Homogeneity Using Markers”, Lithuanian Journal of Statistics, 54(1), pp. 92–100. doi:10.15388/LJS.2015.13884.


Abstract

The aim of the paper is to assess the distributional non-homogeneity of texts in the usage of functional words and other linguistic units. Our empirical study is based on recommended school fiction works taken from the digital library at http://ebiblioteka.mkp.emokykla.lt. Sets of frequent word forms, called markers, are constructed, and their frequency counts in blocks of 50 successive sentences are calculated. The frequency counts of the markers show significant excess variability (overdispersion) with respect to the text homogeneity model usually assumed in linguistics. For selected markers, different kinds of hierarchical binomial logistic regression models, with the author's identifier, the block length, and the frequency counts of the remaining markers as explanatory variables, are fitted to the block data in order to explain the observed overdispersion.
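
The block-count idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' code. It assumes that a text is already split into sentences and tokens, that the block length is measured in tokens, and that homogeneity means a single binomial probability of a marker occurrence shared by all blocks. The function names `block_counts` and `pearson_dispersion` and the toy marker set are hypothetical. The sketch uses the Pearson chi-square statistic divided by its degrees of freedom as a simple overdispersion check, whereas the paper's keywords point to deviance-based assessment.

```python
from typing import List, Sequence, Set, Tuple


def block_counts(
    sentences: Sequence[Sequence[str]],
    markers: Set[str],
    block_size: int = 50,
) -> List[Tuple[int, int]]:
    """Return (marker_hits, token_count) for each block of successive sentences."""
    counts = []
    for start in range(0, len(sentences), block_size):
        block = sentences[start:start + block_size]
        # Flatten the block into lower-cased tokens (assumed pre-tokenized input).
        tokens = [word.lower() for sentence in block for word in sentence]
        hits = sum(1 for word in tokens if word in markers)
        counts.append((hits, len(tokens)))
    return counts


def pearson_dispersion(counts: Sequence[Tuple[int, int]]) -> float:
    """Pearson chi-square over degrees of freedom under one common binomial rate.

    Values well above 1 suggest overdispersion relative to the
    homogeneity assumption (an illustrative check, not the paper's test).
    """
    total_hits = sum(hits for hits, _ in counts)
    total_tokens = sum(n for _, n in counts)
    if len(counts) < 2 or total_tokens == 0:
        raise ValueError("need at least two non-empty blocks")
    p = total_hits / total_tokens  # pooled marker probability
    if p <= 0.0 or p >= 1.0:
        raise ValueError("pooled probability must lie strictly between 0 and 1")
    chi2 = sum(
        (hits - n * p) ** 2 / (n * p * (1.0 - p))
        for hits, n in counts
        if n > 0
    )
    return chi2 / (len(counts) - 1)


# Toy data: four tokenized sentences, blocks of two sentences each.
# The marker set {"ir"} is a hypothetical example, not taken from the paper.
toy_sentences = [["ir", "a"], ["b", "ir"], ["a", "b"], ["c", "d"]]
toy_counts = block_counts(toy_sentences, {"ir"}, block_size=2)
toy_dispersion = pearson_dispersion(toy_counts)
print(toy_counts, toy_dispersion)
```

With real data, `block_size=50` would match the block definition in the abstract. The regression step the abstract mentions would additionally require covariates such as the author's identifier and the counts of the remaining markers, which this sketch does not attempt to model.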
