The article examines the system-oriented and text-oriented approaches to the classification of collocations, which differ in their criteria and in the sequence of procedures applied. The methodology developed for collocation identification combines statistical analysis of texts with lemmatization and makes it possible to extract fixed two-word combinations from a corpus of Ukrainian texts automatically. The main problems in collocation identification and classification are three-word co-occurrences mistakenly identified as two-word ones, homonymy, and positional co-occurrence of words that are not syntactically related. Applying the corpus-oriented approach revealed two-word lexical, grammatical and predicative collocations relevant to modern legal discourse, as observed in a subset of the sub-corpus of Ukrainian Law Acts. Using a larger corpus and linguistically motivated filters is expected to make collocation identification and classification more effective.
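The general idea of combining lemmatization with a statistical association score can be illustrated with a minimal sketch. It is not the authors' procedure: the abstract does not name the statistical measure, so pointwise mutual information (PMI) is used here purely as an assumption, and the function `bigram_pmi`, the parameter `min_count`, the toy lemma dictionary and the sample tokens are all hypothetical and not taken from the corpus.

```python
import math
from collections import Counter


def bigram_pmi(tokens, lemma_of, min_count=2):
    """Rank adjacent lemma pairs by PMI (an assumed measure, see lead-in)."""
    # Lemmatize with a lookup table; unknown forms fall back to lowercase.
    lemmas = [lemma_of.get(t.lower(), t.lower()) for t in tokens]
    unigrams = Counter(lemmas)
    bigrams = Counter(zip(lemmas, lemmas[1:]))
    n_uni = len(lemmas)
    n_bi = max(len(lemmas) - 1, 1)

    scores = {}
    for (w1, w2), count in bigrams.items():
        if count < min_count:
            continue  # drop rare pairs, whose PMI is unreliable
        p_pair = count / n_bi
        p1 = unigrams[w1] / n_uni
        p2 = unigrams[w2] / n_uni
        scores[(w1, w2)] = math.log2(p_pair / (p1 * p2))
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


# Hypothetical toy input: two inflected variants of one legal phrase.
lemma_of = {
    "набирає": "набирати",
    "набирають": "набирати",
    "чинності": "чинність",
}
tokens = ["закон", "набирає", "чинності", "зміни", "набирають", "чинності"]
ranked = bigram_pmi(tokens, lemma_of)
```

In this toy input, lemmatization merges the two inflected variants into a single candidate pair, which is why the pair passes the frequency threshold. A sketch like this also reproduces the problems named above: it scores adjacent words regardless of syntactic relation, cannot separate homonyms, and may report a fragment of a three-word expression as a two-word one.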
This work is licensed under a Creative Commons Attribution 4.0 International License.