Information system of Lithuanian grammar
Articles
Daiva Šveikauskienė
The Institute of the Lithuanian Language
Published 2016-12-15
https://doi.org/10.15388/Lietkalb.2016.10.9915

Keywords

computer linguistics
grammar
morphology
morphemic
morpheme
grammatical features
information system

How to Cite

Šveikauskienė D. (2016) “Information system of Lithuanian grammar”, Lietuvių kalba, (10), pp. 1-19. doi: 10.15388/Lietkalb.2016.10.9915.

Abstract

The article presents a brief overview of studies in computational morphology for Latvian, Czech, Russian, and English, and discusses the corresponding work carried out in Lithuania in more detail. In the database developed by the Institute of Mathematics and Informatics at Vilnius University, morphemes are marked using different fonts. The database of Vytautas Magnus University marks morphemes in a particularly uninformative way, separating them from each other with hyphens. Occasionally such marking is even misleading, because the same marking is used in words that have different morphemic structures. Examples of German and Estonian morphological analysers show that such tools are suitable only for specialists, since they use many abbreviations and symbols that are not comprehensible to the general public. Probably the most comprehensive information in this respect is provided by the Russian morphological analyser. The morphological analysis tools for Lithuanian make many mistakes: some information systems contain words that do not exist in Lithuanian, e.g. *blizgėjas, while others fail to recognise many Lithuanian words, e.g. toliaregis, apyrankė, nebeatsinešdavau, and in such cases report that “this text is not in Lithuanian or it is grammatically incorrect.” The article describes an information system of Lithuanian grammar that is in its initial stages of development. It is aimed at non-professionals and therefore presents morphological information in a particularly clear and explicit fashion. Accuracy and reliability of information are among the key goals of the system, so it makes use of an error-protection tool. Words will be added to the database by fitting them into a generalised format of a Lithuanian word.
Users of the tool are provided with two types of information about words: morphological and morphemic. The morphological part provides all relevant data about the word as a whole, i.e. the part of speech the word belongs to and its grammatical features. For a noun, for example, the tool indicates its case, number, gender, etc.; for a verb, it indicates the tense, person, number, mood, and so on. In addition, the tool shows the lemma of a word and, in the case of derivatives and compounds, the underlying words. The morphemic part gives a graphic representation of the structure of a word: it shows the segmentation of the word into morphemes and also provides detailed information about each morpheme. Different types of morphemes are marked in different colours, followed by more detailed information about the relevant features of each morpheme, e.g. for suffixes: derivational, inflectional; for endings: pronominal ending, shortened ending, etc. The article presents figures with tentative results of word analysis on a test sample.
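
The two kinds of information described above (morphological data about the whole word, and morphemic data about each segment) could be held in a record along the following lines. This is only a minimal sketch in Python, not the data model of the system reported in the article: the class names `Morpheme` and `WordAnalysis`, the field names, and the `is_consistent` check are hypothetical. The analysis of nebeatsinešdavau, including the lemma chosen for it, is an illustrative assumption and is not output taken from the article.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Morpheme:
    """One segment of a word (hypothetical structure, for illustration)."""
    form: str                     # surface string of the morpheme
    kind: str                     # e.g. "prefix", "root", "suffix", "ending"
    detail: Optional[str] = None  # finer label, e.g. "inflectional"


@dataclass
class WordAnalysis:
    """Morphological data about the word plus its morphemic structure."""
    surface: str
    lemma: str
    part_of_speech: str
    features: Dict[str, str] = field(default_factory=dict)
    morphemes: List[Morpheme] = field(default_factory=list)

    def segmented(self, sep: str = "|") -> str:
        """Return the morphemes joined by a separator."""
        return sep.join(m.form for m in self.morphemes)

    def is_consistent(self) -> bool:
        """Simple sanity check: the morphemes must spell the surface form."""
        return "".join(m.form for m in self.morphemes) == self.surface


# Illustrative analysis (an assumption, not the article's output).
analysis = WordAnalysis(
    surface="nebeatsinešdavau",
    lemma="atsinešti",  # assumed lemma, chosen for the example
    part_of_speech="verb",
    features={
        "tense": "past frequentative",
        "person": "1",
        "number": "singular",
        "mood": "indicative",
    },
    morphemes=[
        Morpheme("ne", "prefix", "negation"),
        Morpheme("be", "prefix"),
        Morpheme("at", "prefix"),
        Morpheme("si", "reflexive marker"),
        Morpheme("neš", "root"),
        Morpheme("dav", "suffix", "inflectional"),
        Morpheme("au", "ending"),
    ],
)

print(analysis.segmented())
print(analysis.is_consistent())
```

Keeping the type of each morpheme as an explicit field, rather than encoding it only through separators, reflects the concern raised in the abstract that a bare hyphenated segmentation can look identical for words with different morphemic structures.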


This work is licensed under a Creative Commons Attribution 4.0 International License.
