Models of virtual library users’ behavior analysis

In this paper, we present models for analyzing the behavior of virtual library (VL) users. Unlike the models presented in the literature, they rely only on the big data stored in the log files of virtual library servers, together with statistical methods, association rules, and recommender systems. The proposed models were implemented in R. Using them, the behavior of VL users of Lithuanian research and higher education institutions was analyzed for the first time. The results showed that the proposed models make it possible to promptly analyze how virtual library users apply advanced search filters and facets, and to provide suggestions for improving service quality.


Introduction
When different types of virtual library services are provided online, it is important to ensure high-quality services for different VL users. A review of the scientific literature revealed that the layout of facets and filters is usually configured on the basis of surveys or the intuition of librarians and administrators. Such a layout of VL search filters and facets does not fully meet the real needs of VL users, and the growing demand for service quality is insufficiently satisfied. Additional solutions need to be integrated into VL to make search and facets more efficient, so that users can find the necessary sources as quickly as possible. An analysis of search filters and facets could support recommendations on how to significantly speed up the filtering of search results and improve the quality of VL services.
User behavior is analyzed by applying classification, clustering, and visualization methods [7]. Most often, however, the analysis of users' behavior is carried out through interviews and surveys. According to [1], web log analysis makes it possible not only to identify the actions of VL users but also to accumulate detailed information, e.g., about search words, advanced search filters, and facets (filters applied to a search). The literature [8] highlights the three most popular results filters: Resource Type, Creation Date, and Topic. According to these findings, the results filters placed at the top of the VL webpage are used most frequently. Libraries are increasingly focusing on web analytics, in which data is collected automatically and reflects the actual actions of website users [4]. This approach identifies users' actions, yet the reasons for such behavior remain unclear. It is important to take into account the needs of different VL users and the facets and filters they use most frequently. It is also necessary to ensure that users can choose the facets relevant to them and that these facets are placed appropriately on the VL webpage.

The models for the analysis of VL users' behavior
The proposed models of user behavior research consist of an exploratory analysis of users' behavior, an analysis of filters and facets, and the building of recommender systems. The analysis of filters and facets determines which facets and filters are used most frequently and which are used simultaneously. To analyze advanced search filters and facets, no surveys, interviews, or additional software are needed; only log files are used. The URLs in the logs must be parsed by their parameters, and a list of the filters and facets used must be built for each search action. These data allow generating association rules that define relations between two sets of search filters and facets.
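The parsing step can be sketched in Python as follows (the paper's own implementation is in R; this is an illustrative stand-in). The log line layout (`<user_id> <url>`), the `/search` path test, and the repeated `facet` query parameter with values such as `rtype,include,books` are hypothetical assumptions; real Primo log and URL formats may differ. Each search that applied at least one facet becomes one transaction, i.e., a list of facet names.

```python
from urllib.parse import urlsplit, parse_qs

# Hypothetical URL convention (assumption): each applied facet or filter is a
# repeated "facet" query parameter whose value is "<facet_name>,include,<value>".
FACET_PARAM = "facet"


def facets_from_url(url):
    """Return the sorted facet names applied in one search request URL."""
    query = parse_qs(urlsplit(url).query)
    names = {value.split(",", 1)[0] for value in query.get(FACET_PARAM, [])}
    return sorted(names)


def transactions_from_log(lines):
    """Turn raw log lines into one facet list (transaction) per search action."""
    transactions = []
    for line in lines:
        parts = line.split(maxsplit=1)
        if len(parts) != 2:
            continue  # skip malformed lines
        _user_id, url = parts
        if "/search" not in urlsplit(url).path:
            continue  # keep only search actions
        facets = facets_from_url(url)
        if facets:  # keep only searches that applied at least one facet
            transactions.append(facets)
    return transactions


# Hypothetical log lines for illustration only
log = [
    "u1 /primo/search?query=any,contains,math&facet=rtype,include,books&facet=lang,include,lit",
    "u2 /primo/search?query=any,contains,law&facet=rtype,include,articles",
    "u3 /primo/fulldisplay?docid=123",
    "u1 /primo/search?query=any,contains,history",
]
print(transactions_from_log(log))  # [['lang', 'rtype'], ['rtype']]
```

Grouping facet values by name keeps the transactions at the level of "which facet was used", which is the granularity the association rules work with.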
The Apriori method [2] is used to identify association rules. It follows an iterative approach commonly known as level-wise search, in which frequent k-itemsets are used to explore (k + 1)-itemsets. First, the set of frequent 1-itemsets, denoted $L_1$, is determined. $L_1$ is then used to find the set of frequent 2-itemsets $L_2$, and so on, until no more frequent k-itemsets can be found. Let $I = \{i_1, i_2, \ldots, i_m\}$ denote the set of $m$ facets. A rule is defined as an implication of the form $X \Rightarrow Y$, where $X, Y \subseteq I$ [2]. The sets of facets $X$ and $Y$ are called the left-hand side (LHS) and right-hand side (RHS) of the rule [2]. Support, confidence, and lift are used to evaluate the association rules [2,6].

http://www.journals.vu.lt/LMR
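The level-wise search and the three rule measures can be sketched in Python as follows (an illustrative stand-in, not the authors' R code). It uses the standard definitions: support of $X \Rightarrow Y$ is the share of transactions containing $X \cup Y$, confidence is $\mathrm{supp}(X \cup Y)/\mathrm{supp}(X)$, and lift is $\mathrm{conf}(X \Rightarrow Y)/\mathrm{supp}(Y)$, so a lift above 1 means $X$ and $Y$ co-occur more often than they would if independent. The transactions and thresholds are toy values.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Level-wise search: frequent k-itemsets are joined into (k+1)-candidates."""
    n = len(transactions)
    sets = [frozenset(t) for t in transactions]

    def support(itemset):
        return sum(itemset <= t for t in sets) / n

    items = {i for t in sets for i in t}
    level = {frozenset([i]) for i in items if support(frozenset([i])) >= min_support}  # L1
    frequent, k = {}, 1
    while level:
        for s in level:
            frequent[s] = support(s)
        # Join step: unite pairs of frequent k-itemsets into (k+1)-candidates
        candidates = {a | b for a in level for b in level if len(a | b) == k + 1}
        # Prune step: every k-subset of a candidate must itself be frequent
        candidates = {c for c in candidates
                      if all(frozenset(sub) in level for sub in combinations(c, k))}
        level = {c for c in candidates if support(c) >= min_support}
        k += 1
    return frequent


def rules(frequent, min_confidence):
    """Generate X => Y rules as (lhs, rhs, support, confidence, lift), highest lift first."""
    out = []
    for itemset, supp in frequent.items():
        for r in range(1, len(itemset)):
            for lhs in map(frozenset, combinations(itemset, r)):
                rhs = itemset - lhs
                conf = supp / frequent[lhs]  # subsets of frequent sets are frequent
                if conf >= min_confidence:
                    out.append((set(lhs), set(rhs), supp, conf, conf / frequent[rhs]))
    return sorted(out, key=lambda rule: rule[4], reverse=True)


# Toy transactions (illustrative only, not the paper's data)
searches = [
    ["Material Type", "Language"],
    ["Material Type", "Publication Date", "Language"],
    ["Publication Date", "Language"],
    ["Material Type"],
]
frequent = apriori(searches, min_support=0.5)
for lhs, rhs, supp, conf, lift in rules(frequent, min_confidence=0.5):
    print(sorted(lhs), "=>", sorted(rhs),
          f"support={supp:.2f} confidence={conf:.2f} lift={lift:.2f}")
```

Production tools usually count supports in a single pass per level instead of rescanning all transactions per itemset; the rescanning here only keeps the sketch short.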

Fig. 1. Frequencies of advanced search filter usage.
A recommender system is proposed to suggest additional facets to VL users. Three methods are used to implement the recommender systems: Random Items, Popular Items, and Item-Based Collaborative Filtering (IBCF). The IBCF method recommends items similar to those the user prefers [5]. The predicted preference of user $i$ for item $j$ is

$$\hat{v}_{i,j} = \lambda \sum_{k} w(k, j)\, v_{i,k},$$

where $w(k, j)$ is the similarity between items $k$ and $j$, computed as the cosine similarity of their vectors of user ratings; $v_{i,k}$ is the rating of user $i$ for item $k$; and $\lambda$ is a normalization factor [3]. The Receiver Operating Characteristic (ROC) curve is used to identify the most accurate method [9]. The ROC curve plots the recommender system's probability of detection (true positive rate, TPR) against its probability of false alarm (false positive rate, FPR) [5]. The models were implemented in the R programming language.
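A minimal Python sketch of IBCF scoring on binary facet-usage data (1 = the user applied the facet). The toy matrix, the function names, and the choice $\lambda = 1$ are assumptions for illustration; the paper's R implementation may normalize differently. Item similarities $w(k, j)$ are cosine similarities between item columns, and each facet the user has not used is scored by summing $w(k, j)\, v_{i,k}$ over the facets $k$ the user already applied.

```python
import math


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def item_similarities(usage):
    """w[k][j]: cosine similarity of item columns k and j across all users."""
    columns = list(zip(*usage))
    return [[cosine(ck, cj) for cj in columns] for ck in columns]


def ibcf_score(usage, w, i, j):
    """Sum of w(k, j) * v_{i,k} over items k used by user i (lambda = 1, an assumption)."""
    return sum(w[k][j] * v for k, v in enumerate(usage[i]) if v and k != j)


def recommend(usage, w, i, top_n=3):
    """Indices of the top_n items user i has not used, ranked by IBCF score."""
    scores = {j: ibcf_score(usage, w, i, j) for j, v in enumerate(usage[i]) if not v}
    return sorted(scores, key=scores.get, reverse=True)[:top_n]


facets = ["Resource Type", "Creation Date", "Availability", "Language"]
usage = [  # toy binary matrix: rows = users, columns = facets (illustrative only)
    [1, 1, 1, 0],
    [1, 1, 0, 1],
    [1, 0, 1, 1],
    [0, 1, 0, 0],
]
w = item_similarities(usage)
print([facets[j] for j in recommend(usage, w, i=3)])
```

With a constant $\lambda$ the ranking of facets for a given user is unchanged, which is all a top-N recommendation needs; an item-dependent $\lambda$ (e.g. dividing by the sum of similarities) would turn the score into a weighted average instead.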

Analysis of VL users' behavior of Lithuanian higher education institutions
Using the proposed models, the behavior of users of the virtual libraries of Lithuanian research and higher education institutions (based on the Ex Libris Primo search and discovery tools) was analyzed for the first time. Server logs for May 2020 were analyzed; during this period they accumulated 12.3 million log events, of which 828,000 events corresponding to users' actions were selected. During the period under analysis, 51,000 unique VL users were identified. The advanced search filters and facets were analyzed to determine how frequently they are used, which filters and facets are used together, and how often their combinations occur in VL users' searches. Three advanced search filters are used in VL: Material Type, Publication Date, and Language. Figure 1 presents the frequencies of filter usage in five VLs: Vilnius University (VU), Lithuanian Academic Electronic Library (ELABA), Kaunas University of Technology (KTU), Vytautas Magnus University (VDU), and Kaunas University of Applied Sciences (KK). Only advanced searches with filters were analyzed. According to the findings, the most popular filter is Material Type. Association rules for the usage of advanced search filters were then generated and analyzed. Table 1 presents the association rules together with their support, confidence, and lift; four of the rules have a lift greater than 1. For example, the first rule shows that both of its filters are used in 10.5 percent of advanced searches. Its lift of 2 (greater than 1) shows that the occurrence of the Language filter has a positive effect on the occurrence of the Publication Date filter in advanced searches with filters.
About 30 facets are used in VL. The most frequently used facets are Resource Type, Availability, Creation Date, Language, and eLABa Institution. Other facets are used infrequently, e.g., eResource Collection, FMT, eLABa Object Type, and New Records; they occur in less than 0.7 percent of searches with facets.
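Facet usage shares of this kind, and the cut-off for rarely used facets, can be computed with a short Python sketch. The toy transactions are invented for illustration, and treating 0.7 percent as a per-facet threshold is an assumption made only to show the calculation; each transaction is the list of facets applied in one search.

```python
from collections import Counter


def facet_shares(transactions):
    """Share of searches with facets in which each facet occurs (most common first)."""
    n = len(transactions)
    counts = Counter(f for t in transactions for f in set(t))
    return {facet: c / n for facet, c in counts.most_common()}


def rare_facets(shares, threshold=0.007):
    """Facets occurring in fewer than `threshold` of searches (0.7 percent by default)."""
    return sorted(f for f, s in shares.items() if s < threshold)


# Toy data: 200 searches with facets (illustrative only, not the paper's data)
searches = ([["Resource Type"]] * 150
            + [["Availability", "Language"]] * 49
            + [["New Records"]])
shares = facet_shares(searches)
print(rare_facets(shares))  # ['New Records']
```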
Over 200 association rules were generated with a minimum confidence of 0.1 and a minimum support of 0.1; the rules with the highest lift are presented in Table 2. For example, the facets eLABa Institution, Language, and Access Rights of eLABa Object were used together in 0.1 percent of searches with facets, and the joint occurrence of eLABa Institution and Language has a positive effect on the occurrence of Access Rights of eLABa Object.
To update the VL facets, a recommender system for facets was developed using data from searches in which a user applied at least one facet. Three methods were used to develop the recommender system: Random Items, Popular Items, and IBCF. Figure 2 presents the ROC curves with the TPR and FPR values obtained with the three methods. The results showed that the IBCF method is the most accurate. Table 3 presents three examples of recommendations based on users' behavior in May 2020, i.e., which facets were used and which additional filters were recommended. For example, if a user has chosen the Resource Type results filter, three additional filters are recommended: Creation Date, Availability, and Language.
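The TPR and FPR values behind an ROC comparison of top-N recommenders can be sketched in Python as below. The paper does not spell out its evaluation protocol, so the held-out split assumed here (some of each user's facets are hidden, the candidates are the facets that could still be recommended, and N is varied to trace the curve) and all names and data are hypothetical. Ranked lists from any of the three methods can be plugged in.

```python
def roc_point(recommended, relevant, candidates):
    """TPR and FPR of one top-N list against a user's held-out (relevant) items."""
    rec, rel, cand = set(recommended), set(relevant), set(candidates)
    tp = len(rec & rel)          # recommended and actually used
    fp = len(rec - rel)          # recommended but not used
    fn = len(rel - rec)          # used but not recommended
    tn = len(cand - rec - rel)   # neither recommended nor used
    tpr = tp / (tp + fn) if tp + fn else 0.0
    fpr = fp / (fp + tn) if fp + tn else 0.0
    return tpr, fpr


def roc_curve(ranked_lists, held_out, candidates, max_n):
    """Average (FPR, TPR) over users for N = 1..max_n recommended items."""
    points = []
    for n in range(1, max_n + 1):
        pairs = [roc_point(ranked[:n], held_out[u], candidates[u])
                 for u, ranked in ranked_lists.items()]
        tpr = sum(p[0] for p in pairs) / len(pairs)
        fpr = sum(p[1] for p in pairs) / len(pairs)
        points.append((fpr, tpr))
    return points


# Toy evaluation data (illustrative only)
ranked = {"u1": ["A", "B", "C"], "u2": ["A", "B", "C"]}
held_out = {"u1": {"A"}, "u2": {"C"}}
candidates = {"u1": {"A", "B", "C", "D"}, "u2": {"A", "B", "C", "D"}}
for fpr, tpr in roc_curve(ranked, held_out, candidates, max_n=3):
    print(f"FPR={fpr:.3f} TPR={tpr:.3f}")
```

A method whose curve lies higher (more true positives for the same false positive rate) is the more accurate one, which is the sense in which the ROC curves are compared.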

Conclusions
Analysis of VL users' behavior is very important for improving the quality of services for VL users. In this paper, we have proposed two models for analyzing users' behavior that, for the first time in VL, use only behavior data collected automatically in server logs. The advantage is that the analysis of VL users' behavior requires neither user surveys nor additional software for recording users' behavior. The analysis of advanced search filters and facets showed that the behavior of VL users differs across Lithuanian higher education institutions. Association rules revealed differences and similarities in the behavior of various VL users. Based on the results of the recommender system, we suggest further developing the layout of VL facets, taking into account the behavior of different users, to match the facets to the needs of individual VL users, including a proper placement of facets on the webpage. The proposed models simplify the analysis of search filters and facets, help investigate users' behavioral patterns, and support adapting and updating the content of VL.

[2] Gianni D'Angelo, Salvatore Rampone, and Francesco Palmieri. Developing a