Ontologies and Technologies for Integrating and Accessing Digital Cultural Heritage: Lithuanian Approach

Received: 29/10/201. Accepted: 03/02/2020 Copyright © 2019 Regina Varnienė-Janssen, Albertas Šermokas. Published by Vilnius University Press. This is an Open Access article distributed under the terms of the Creative Commons Attribution Licence, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited. Informacijos mokslai ISSN 1392-0561 eISSN 1392-1487 2020, vol. 88, pp. 66–82 DOI: https://doi.org/10.15388/Im.2020.88.32


Introduction
Web technologies are key to meeting the full range of user needs in the digital age. The Recommendations 1 of the World Wide Web Consortium (W3C) identify these needs as Web for All, Web on Everything, Web for Rich Interaction, Web of Data and Services, and Web of Trust. This paper analyses two of these aspects of information on the Web: Web for Rich Interaction and Web of Trust.
Various cultural domains provide different sets of digital cultural content: descriptions (metadata) of fine art, archaeological and cultural heritage, archival documents, literature, natural history and geology, as well as collections of digital objects. Memory institutions currently seek to integrate their digital information and data resources because of the well-known advantages of such integration, which in turn raises the problem of interoperability of data and information. In this paper we describe an ontology-based approach to metadata interoperability, in particular the use of the CIDOC CRM ontology as a mediating schema within VEPIS. Interoperability is not merely a technological issue, however: semantic interoperability has to be achieved as well. The Semantic Web, which is based on Semantic Web technologies (RDF, OWL, SKOS, SPARQL, etc.), provides a language that expresses both data and rules for reasoning about data. Web languages such as RDF and OWL make it possible to create semantically rich data models. Drawing 2001), we could assume that a core ontology is a key building block for integrating information from diverse sources. The problem is that the VEPIS sources do not use any common ontology that could serve as a common ground for semantic interoperability. Within VEPIS, the role of such an ontology is performed by BAVIC, which is based on CIDOC CRM and diverse sources in such a way that relations between various entities are the main carriers of semantic information and provide semantic interoperability of VEPIS objects.
With the increasing amount of data available on the Web, users need reliable means of establishing the provenance of information in order to decide whether they can trust what they access. The paper focuses on provenance in the digital environment, which provides a critical foundation for assessing authenticity, enabling trust and allowing reproducibility. Users make trust judgments based on provenance that may or may not be explicitly offered to them. The paper analyses how provenance is managed in the digital preservation and access processes within VEPIS and determines whether it meets the W3C Provenance Incubator Group's Requirements for Provenance on the Web. The paper presents the results of research initiated at Vilnius University in 2018-2019.
Methodology of the research. We use qualitative analysis of research papers and draw on the potential of information technology methods for integrating heterogeneous collections of digital culture within integrated systems and presenting them on the Semantic Web. Another methodological tool is the Requirements for Provenance on the Web, which define three categories: the Content Category, referring to the types of information that need to be represented in a provenance record; the Management Category, referring to the mechanisms that make provenance available and accessible within a system; and the Use Category, implying that provenance records may need to accommodate a variety of uses as well as diverse users. The application of the Requirements for Provenance on the Web to a particular integrated system (VEPIS) contributes to the novelty of the research.

From metadata to an object-oriented approach
According to Oreste Signore 2 , data integration at a metadata level "<…> cannot exploit the full richness of possible associations among different information items. The association mechanism remains in mind of the user". Since a metadata vocabulary schema does not organise entities into a hierarchy, Signore argues that ontology-based metadata relationships are a very efficient way of integrating information. The value of ontologies, in particular CIDOC CRM, for integrating information from various domains and different data schemas has been presented by various authors (Crofts, Doerr & Gill, 2003; Kakali et al., 2007). According to Constancia Kakali et al. (2007), metadata are used to describe resources in terms of elements and to facilitate discovery of and access to information. Ontologies, by contrast, define entities on an abstract level with the intention to conceptualize a domain of interest; they do not provide specific elements for the description of a resource.
With the involvement of ten memory institutions within VEPIS in 2010, the number of metadata formats from source systems increased as well: at present MARC (UNIMARC, MARC 21), ESE, EAD, CDWA Lite and DC are applied, as illustrated in Fig. 1. The architecture of VEPIS relies on the approach that it is more feasible to map many metadata schemas to a single core ontology (CIDOC CRM) than to maintain mappings between numerous schemas. During the development phase of VEPIS in 2010-2012, some subfields of field 325 Reproduction Note were implemented to provide important provenance information on digital objects; however, the data were modelled on the basis of MARC 21 because their equivalents in UNIMARC did not exist in the official mapping schema at that time. The authors of this paper therefore consider it appropriate to move to the formal structure of the UNIMARC/B fields and to update the mapping of UNIMARC/B to CIDOC CRM and CRMdig in line with the updates published on the IFLA website. 3 Since the resources are tightly connected, integrating the new UNIMARC/A updates presented on the IFLA website 4 will ensure the presentation of more related data about the entities: one can move from a digital copy to other copies of the same publication, to other editions, to other works of the same author, to works about the author and so on. If the data are verified and supplemented from external sources and different author appellations are correctly associated with the relevant author, it is much easier to satisfy the user's needs. METS 5 is used for ingesting data into the VEPIS central database and for creating XML document instances. It expresses the structure of digital library objects, the associated descriptive and administrative metadata, and the names and locations of the files that comprise the digital object. The description of the object in METS serves as a linking element between the various parts of the document and its versions.
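The role of METS described above can be illustrated with a minimal sketch. It uses only the Python standard library and the public METS element names (dmdSec, fileSec, structMap); the function name, identifiers and file URLs are hypothetical, and the actual VEPIS ingest profile is not documented here, so this should be read as an assumption-laden illustration rather than the system's implementation.

```python
import xml.etree.ElementTree as ET

METS_NS = "http://www.loc.gov/METS/"
XLINK_NS = "http://www.w3.org/1999/xlink"
ET.register_namespace("mets", METS_NS)
ET.register_namespace("xlink", XLINK_NS)


def q(ns, tag):
    """Return a namespace-qualified name in ElementTree's {ns}tag form."""
    return f"{{{ns}}}{tag}"


def build_mets(object_id, title, file_urls):
    """Build a minimal METS document for one digital object (illustrative only)."""
    mets = ET.Element(q(METS_NS, "mets"), {"OBJID": object_id})

    # Descriptive metadata section; a real record would wrap UNIMARC or DC here.
    dmd = ET.SubElement(mets, q(METS_NS, "dmdSec"), {"ID": "DMD1"})
    wrap = ET.SubElement(dmd, q(METS_NS, "mdWrap"), {"MDTYPE": "OTHER"})
    xml_data = ET.SubElement(wrap, q(METS_NS, "xmlData"))
    ET.SubElement(xml_data, "title").text = title

    # File section: names and locations of the files that make up the object.
    file_sec = ET.SubElement(mets, q(METS_NS, "fileSec"))
    group = ET.SubElement(file_sec, q(METS_NS, "fileGrp"), {"USE": "master"})

    # Structural map: links the parts of the object to the files listed above.
    struct = ET.SubElement(mets, q(METS_NS, "structMap"), {"TYPE": "physical"})
    root_div = ET.SubElement(struct, q(METS_NS, "div"), {"DMDID": "DMD1"})

    for n, url in enumerate(file_urls, start=1):
        file_id = f"FILE{n}"
        file_el = ET.SubElement(group, q(METS_NS, "file"), {"ID": file_id})
        ET.SubElement(file_el, q(METS_NS, "FLocat"),
                      {"LOCTYPE": "URL", q(XLINK_NS, "href"): url})
        page = ET.SubElement(root_div, q(METS_NS, "div"), {"ORDER": str(n)})
        ET.SubElement(page, q(METS_NS, "fptr"), {"FILEID": file_id})
    return mets
```

Calling `ET.tostring(build_mets(...), encoding="unicode")` serializes the document. The structural map is what lets the description act as a linking element: each page division points to a file by its identifier.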
Regarding the mapping methodology, all schemas used by the various partners are integrated by converting descriptive metadata to UNIMARC/B and mapping it to CIDOC CRM, which functions as a universal schema allowing the aggregation of digital objects and the information related to them. 6 This enables data exchange and data integration: disparate data sources can be brought together and combined into a single stream of data. The event-centric core model CIDOC CRM, which contains 86 classes and 137 properties in RDF and OWL, is a major step from an entity-relationship to an object-oriented approach. Using CIDOC CRM instead of a set of metadata elements opens up more advanced search and resource discovery possibilities.
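The argument for a single core ontology can be made concrete with a small sketch. The crosswalk entries below are hypothetical and heavily simplified (real mappings are far richer, and VEPIS converts to UNIMARC/B first rather than mapping each schema directly); the point is only that records from different schemas end up keyed by the same CIDOC CRM paths, and that the number of mappings grows linearly rather than quadratically.

```python
# Hypothetical, simplified crosswalks: source field -> CIDOC CRM path label.
CROSSWALKS = {
    "DC": {
        "title": "E33 Linguistic Object / P102 has title",
        "creator": "E65 Creation / P14 carried out by / E39 Actor",
    },
    "UNIMARC/B": {
        "200$a": "E33 Linguistic Object / P102 has title",
        "700$a": "E65 Creation / P14 carried out by / E39 Actor",
    },
}


def to_crm(schema, record):
    """Re-key a flat source record by the CIDOC CRM path of each mapped field."""
    crosswalk = CROSSWALKS[schema]
    return {crosswalk[field]: value
            for field, value in record.items() if field in crosswalk}


def mappings_needed(n_schemas):
    """Return (hub, pairwise): mappings to one core ontology vs. directed
    mappings between every pair of schemas."""
    return n_schemas, n_schemas * (n_schemas - 1)
```

Under these assumptions, a Dublin Core record and a UNIMARC/B record describing the same book produce the same dictionary from `to_crm`, and six schemas need six mappings to the core ontology instead of thirty directed schema-to-schema mappings.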

The BAVIC Thesaurus for integrating data about Agents, Places and Time
There have been many attempts at a unified representation of names, dates and places, which have different meanings in a multicultural distributed environment, by creating uniform authority files, e.g. VIAF 7 , and standard vocabularies, e.g. TGN 8 , ULAN 9 , etc. Integration is often attempted at the metadata level, in MARC (UNIMARC, MARC 21) or DC formats. According to Martin Doerr, Jane Hunter and Carl Lagoze (2003), many metadata vocabularies are largely resource-centric and inadequately express entities such as people, places, ideas, etc. For example, UNIMARC/A provides a set of properties that are associated with a primary resource, the "library object". The values of some of these properties, e.g. "Personal name" in Field 200 of UNIMARC/A, are entities themselves, yet they have poor interoperability across other systems. Martin Doerr, Jane Hunter and Carl Lagoze (2003) 10 state that "In contrast, a core ontology provides underlying formal model tools that integrate sources and perform a variety of functions". In order to provide value-added services in VEPIS and achieve better integration of diverse cultural sources, i.e. offer abstracted information and knowledge rather than returning documents (in the manner of most current Web search engines), the CIDOC CRM based BAVIC 11 has been developed. BAVIC is based on the CIDOC CRM data model and is compatible with the VEPIS data structure. It brings together authority records created for diverse cultural domains. At this stage of development, BAVIC was built automatically, using information technologies to identify and relate information about personal names, geographical names and chronological data from diverse cultural sectors. The figure illustrates the description of a manuscript that was created by the person Stepon Batory (E82.1) in the role of Author (P14.1), who has the type King of Poland and Grand Duke of Lithuania (P2).
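The paper does not describe how name variants were matched when BAVIC was built automatically. As a purely illustrative assumption, the sketch below groups appellations from different sources under one key by stripping diacritics, case, punctuation and word order; the function names and sample data are hypothetical, and a production authority file would need far more careful disambiguation.

```python
import unicodedata
from collections import defaultdict


def name_key(name):
    """Normalize a personal name: drop diacritics, case, commas and word order."""
    decomposed = unicodedata.normalize("NFKD", name)
    plain = "".join(c for c in decomposed if not unicodedata.combining(c))
    words = plain.replace(",", " ").lower().split()
    return " ".join(sorted(words))


def build_authority(appellations):
    """Group (source, name) pairs whose normalized keys coincide."""
    groups = defaultdict(list)
    for source, name in appellations:
        groups[name_key(name)].append((source, name))
    return dict(groups)


# Hypothetical appellations of the same author from three source systems.
SAMPLE = [
    ("library", "Vaičiūnaitė, Judita"),
    ("museum", "Judita Vaiciunaite"),
    ("archive", "JUDITA VAIČIŪNAITĖ"),
]
```

With this normalization all three sample appellations fall into a single group, which is the precondition for linking descriptive metadata from different sectors to one authority entity.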
CRMdig is used for describing all stages of the creation of digital cultural heritage content. It is based on events that relate physical objects, digital objects, actors, times and places. The figure also illustrates the events within VEPIS: the class D2 Digitization Process comprises events that result in the creation of instances of D9 Data Object that represent the appearance or form of an instance of E84 Information Carrier (here, a manuscript). Class D9 comprises instances of D1 Digital Object. VEPIS integrates two metadata categories, authority metadata (BAVIC) and descriptive metadata (the metadata of digital objects linked with BAVIC), thus ensuring that semantic queries and provenance metadata refer to the versions of objects as they evolve and are modified or accessed over time. In particular, it provides for a representation of how one version (or parts thereof) was derived from another version. VEPIS also provides the derivation chain ("ByWhom"), which documents the history of the content information: its origin or source, any changes that may have taken place since it originated, and those who have had custody of it since then, as can be seen in the figure. CRMdig captures and models the requirements regarding the provenance of digital objects.
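A derivation chain of this kind can be sketched as a linked sequence of versions that is walked back to the master version. The record layout, field names and sample data below are assumptions made for illustration; they are not the CRMdig serialization used in VEPIS.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Version:
    """One version of a digital object; field names are illustrative."""
    identifier: str
    created_by: str                     # the "ByWhom" part of the chain
    event: str                          # e.g. "digitization", "format conversion"
    derived_from: Optional[str] = None  # identifier of the source version


def derivation_chain(versions, identifier):
    """Walk derived_from links from a version back to the master version."""
    by_id = {v.identifier: v for v in versions}
    chain, seen = [], set()
    current = identifier
    while current is not None:
        if current in seen:
            raise ValueError(f"cycle in derivation chain at {current}")
        seen.add(current)
        version = by_id[current]
        chain.append(version)
        current = version.derived_from
    return chain


# Hypothetical history of one digitized manuscript.
VERSIONS = [
    Version("master", "Library A", "digitization"),
    Version("tiff-2", "Library A", "format conversion", "master"),
    Version("jpeg-3", "Portal", "format conversion", "tiff-2"),
]
```

Here `derivation_chain(VERSIONS, "jpeg-3")` returns the access copy, the intermediate TIFF and finally the master version, each with the actor responsible for it.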
For example (Fig. 3), a user wishes to find the creator of the book Aitvaras ("The Kite"). In terms of UNIMARC, the user searches for records with UNIMARC/B 200 title = Aitvaras. The query is then propagated to an ontology mediator, where a set of mapping rules from UNIMARC/A to CIDOC CRM transforms it into equivalent query terms expressed as CIDOC CRM paths, such as E33 (Linguistic Object) - P102 (has title / is title of) = Aitvaras - P94 (was created by) - E65 (Creation Event) - E39 (Actor) - P131 (is identified by / identifies) - E82 (Actor Appellation) 12 = Judita Vaičiūnaitė.
The analysis of the research papers on provenance shows that provenance provides a critical foundation for assessing authenticity, enabling trust and allowing reproducibility. It is essential for decision makers who have to make trust judgments about the information they use over the Semantic Web. According to the OAIS Reference Model 13 , provenance provides information about the events that occur during the lifecycle of digital objects (related to a license holder, registration and copyright). It guarantees the authenticity of the object because the customer is informed and has a certain "knowledge base" about the digital object, which may change over time. According to Factor et al. (2009), trust is a term with many definitions and uses, but in many cases establishing trust in an object or an entity involves analyzing its origin and authenticity. Trust is related to provenance because it is derived from provenance information and is typically a subjective judgment that depends on the context and use. It can be argued that provenance is a platform for trust. According to the Requirements for Provenance on the Web, provenance encompasses the initial sources of information used as well as any entity and process involved in producing a result. That is why the representation and management of provenance data are important at every stage of the life cycle of a digital object.
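The Aitvaras query described above can be sketched as a traversal of a property path over a small in-memory set of triples. The identifiers and the triple layout are hypothetical, titles and appellations are simplified to plain strings, and the link from the creation event to the actor is assumed to be the CIDOC CRM property P14 (carried out by), which the path quoted in the text leaves implicit.

```python
# Hypothetical triples for one book; identifiers such as "work:1" are made up.
TRIPLES = [
    ("work:1", "rdf:type", "E33 Linguistic Object"),
    ("work:1", "P102 has title", "Aitvaras"),
    ("work:1", "P94i was created by", "creation:1"),
    ("creation:1", "rdf:type", "E65 Creation"),
    ("creation:1", "P14 carried out by", "actor:1"),  # assumed link
    ("actor:1", "rdf:type", "E39 Actor"),
    ("actor:1", "P131 is identified by", "Judita Vaičiūnaitė"),
]

# Assumed mapping rule: a title search is followed by this CRM property path.
CREATOR_PATH = ["P94i was created by", "P14 carried out by", "P131 is identified by"]


def objects(subject, predicate, triples=TRIPLES):
    """Return all objects of triples with the given subject and predicate."""
    return [o for s, p, o in triples if s == subject and p == predicate]


def subjects(predicate, obj, triples=TRIPLES):
    """Return all subjects of triples with the given predicate and object."""
    return [s for s, p, o in triples if p == predicate and o == obj]


def creators_by_title(title, triples=TRIPLES):
    """Resolve a title to creator appellations by following CREATOR_PATH."""
    nodes = subjects("P102 has title", title, triples)
    for predicate in CREATOR_PATH:
        nodes = [o for node in nodes for o in objects(node, predicate, triples)]
    return nodes
```

In a deployed system the same path would more plausibly be expressed as a SPARQL query against the RDF store; the list-based version only shows how a field-level search is rewritten into a chain of ontology properties.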
We analysed the provenance information within VEPIS in line with the methodology of the Requirements for Provenance on the Web, which define the Content Category, referring to the types of information that need to be represented in a provenance record; the Management Category, referring to the mechanisms that make provenance available and accessible within a system; and the Use Category, implying that provenance records may need to accommodate a variety of uses as well as diverse users.

Provenance as a basis for authenticity of the cultural content within VEPIS: results of the research
Since VEPIS is based on the OAIS 14 reference model, the Preservation Description Information (PDI) within the Archival Information Package provides information about events that occur during the lifecycle of digital objects (related to a license holder, registration and copyright). It guarantees the authenticity of the object. One of the most important insights embedded in the OAIS Reference Model is that the "Content Information" to be preserved by an archive is composed not only of a "set of bit sequences" (the "data object") but is also associated with sufficient preservation description information. However, Moreau (2009) draws attention to the fact that this approach to provenance is inherent in closed systems and states that, in the context of the Web, a broader approach is required by which chunks of provenance representation can be brought together to describe the provenance of information flowing across multiple systems 15 .
As mentioned above, the CIDOC CRM based BAVIC 16 was developed in order to provide value-added services in VEPIS and achieve better integration of sources from diverse cultural sectors, i.e. to offer abstracted information and knowledge rather than returning documents (in the manner of most current Web search engines). The research by the authors of this paper was intended to ascertain whether the provenance information in VEPIS ensures the authenticity of the objects at a sufficient level and how provenance information is represented on the Semantic Web.

Content of provenance data within VEPIS
According to the Requirements for Provenance on the Web, the first category of provenance data is Content, which refers to the structure and meaning of provenance records. Since VEPIS is based on CIDOC CRM (ISO 21127) and CRMdig, the Content Category within the system is also based on these ontologies. The first dimension of the Content Category is Object, i.e. the artefact that provenance statements are about and the possibility to refer to it. An Object on the Web is a resource, essentially an E73 Information Object (CIDOC CRM) or a D1 Digital Object (CRMdig), a subclass of E73, which can be identified with a URI (a PURL in VEPIS). The second dimension is Attribution. In VEPIS, it covers the people, organizations and other identifiable groups that contributed to the creation of the digital artefact: E21 Person, E74 Group (CIDOC CRM) and D21 Person Name (CRMdig). These attributes are directly related to the entities of BAVIC (personal names, geographical names and chronological data) and to the metadata that describe these objects; they extend the information in the metadata of VEPIS objects and play an important role in the search for objects on the Semantic Web.
According to the Requirements for Provenance on the Web, a special status should be attributed to the Versioning dimension in a provenance representation. It is often difficult to tell whether a resource has changed its version, because the representations of a resource may differ while the underlying resource should remain constant.
Justification is another dimension of the Content Category. According to the W3C Provenance Working Group, it is the justification of decisions, i.e. why and how a particular decision was made. The purpose of justification is to allow those decisions to be discussed and understood.
The Entailment dimension of the Content Category represents explanations that show how facts were derived from other facts. Some provenance information may be directly asserted by the relevant sources of some data or by actors in a process, while other information may be derived from what was asserted. A standard way of implementing Versioning, Justification and Entailment within VEPIS is the realization of the following components: Format Conversion, Data Verification and Logging Events (Varnienė-Janssen & Šermokas, 2018).
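How data verification and event logging can work together is sketched below. The required fields, the event vocabulary and the log layout are assumptions chosen for illustration; the source does not specify the rules applied by the VEPIS components.

```python
from datetime import datetime, timezone

# Hypothetical quality rule: fields every imported record must carry.
REQUIRED_FIELDS = ("identifier", "title", "creator")


def verify(record):
    """Return the required fields that are missing or empty in a record."""
    return [f for f in REQUIRED_FIELDS if not str(record.get(f) or "").strip()]


def ingest(record, provider, event_log):
    """Verify one imported record and append the outcome to the event log."""
    missing = verify(record)
    event_log.append({
        "time": datetime.now(timezone.utc).isoformat(),
        "provider": provider,
        "identifier": record.get("identifier"),
        "event": "import rejected" if missing else "import accepted",
        "missing": missing,
    })
    return not missing
```

Because every import attempt leaves a timestamped entry naming the provider and the outcome, the log itself becomes provenance: it records which version of a description entered the system, when, from whom, and why a record was rejected.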

Management of provenance data within VEPIS
According to the Requirements for Provenance on the Web, the second category of provenance data is the Management Category, which refers to the mechanisms that make provenance available and accessible within a system. The dimensions of this category and their realization within VEPIS are as follows.
Access: the ability to find the provenance for a particular artefact. Access is realized via the portal http://www.epaveldas.lt and automatic data import using the OAI-PMH protocol.
Dissemination: defining how provenance should be distributed and controlled. BAVIC and the metadata of digital objects are based on CIDOC CRM and CRMdig and are in RDF form in line with the XML schema, thus ensuring provenance-related query services.
Scale: dealing with large amounts of provenance. Scale within VEPIS has been only partially realized.
Publication within VEPIS is realized by the Publication and Access component. The interface of the portal http://www.epaveldas.lt provides the accessibility features recommended by the W3C Web Accessibility Initiative (WAI) and is intuitive, understandable and easy to use. Another means of access is automatic data import via the OAI-PMH protocol. The search of the provenance information is based on CRMdig.
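For readers unfamiliar with OAI-PMH, the sketch below builds harvesting requests using the protocol's standard ListRecords verb and its metadataPrefix, from, set and resumptionToken arguments. The endpoint address is a placeholder: the actual OAI-PMH base URL of the portal is not given in this paper.

```python
from urllib.parse import urlencode

# Placeholder endpoint; the real base URL of the repository is an assumption.
BASE_URL = "https://example.org/oai"


def list_records_url(base_url, metadata_prefix="oai_dc", from_date=None, set_spec=None):
    """Build an OAI-PMH ListRecords request for a first page of records."""
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    if from_date:
        params["from"] = from_date   # selective harvesting by datestamp
    if set_spec:
        params["set"] = set_spec     # selective harvesting by set
    return f"{base_url}?{urlencode(params)}"


def resume_url(base_url, token):
    """Build the follow-up request; resumptionToken excludes other arguments."""
    params = {"verb": "ListRecords", "resumptionToken": token}
    return f"{base_url}?{urlencode(params)}"
```

A harvester would fetch the first URL, parse the XML response, and keep calling `resume_url` with each returned resumptionToken until the repository returns an empty token.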
Dissemination: BAVIC and the metadata of digital objects are based on CIDOC CRM and CRMdig and are in RDF form in line with the XML schema, thus ensuring provenance-related query services that provide data about the creator of the object, the earlier versions of the item, the events that changed the custody of the item, the input that influenced the result, the master version of the object and the scanner/resolution of the digital object (see Fig. 3).
Scale within VEPIS has been only partially realized. BAVIC ensures formulation of queries and organizing search results and permits obtaining information about the object from all the VEPIS partners independent of media types within VEPIS. However, it does not guarantee access to information about the investigation of the object that has been carried out or its results across numerous repositories.
An example of the realization of the management of provenance data is presented in Fig. 3, which illustrates queries within VEPIS that are realized as follows: Get the creator of the object - Get the earlier versions of the item - Get the events that changed the custody of the item - Get the master version of the object - Get the scanner/resolution of the digital object - Get access to the object. On the other hand, the BAVIC thesaurus, which serves as a framework for semantic search, has to be extended with semantic relationships between the entities in order to improve searching on the Semantic Web for provenance information from heterogeneous data repositories. This issue will be addressed at later stages, after the results of the integration of BAVIC and VEPIS data have been evaluated.

Use of provenance data within VEPIS
According to the Requirements for Provenance on the Web, the third category of provenance data is the Use Category, which implies that provenance records may need to accommodate a variety of uses as well as diverse users. An important consideration is how to make provenance information understandable to its users, provide appropriate presentation and visualization, and support the comparison of artefacts according to their origin, as well as how to deal with imperfections, trust and interoperability.

Interoperability: combining provenance produced by multiple different systems. We can refer to interoperability only in the sense that VEPIS aggregates data from diverse systems and all descriptive information, including provenance data, is converted into UNIMARC; it is not interoperable as regards search.
Comparison: comparing artefacts through their provenance.
Accountability: using provenance to assign credit or blame.
Trust: using provenance to make trust judgments. Within VEPIS, several components support understanding the provenance and validating the authenticity of a preserved data object: the Component of Metadata Verification, which ensures that metadata loaded into VEPIS are checked against the requirements for quality, comprehensiveness and excellence of data, and the Component of Logging Events, which tracks the import of digitized objects from VEPIS data providers and systems supporting the OAI-PMH protocol and verifies whether information about digitized objects satisfies these requirements.
According to Moreau (2009), a powerful argument for provenance is that it can help make systems transparent so that it becomes possible to determine whether a particular use of information is appropriate under a set of rules. Such a capability helps make systems and information accountable. Our analysis showed that the dimensions Comparison and Accountability were not implemented within VEPIS. For this reason, VEPIS does not support comparing artefacts through their provenance or assigning credit or blame. Debugging is realized within VEPIS by the Component of Metadata Verification and the Component of Logging Events.

Summarizing conclusions
1. Semantic interoperability of metadata and data within the cultural domain is one of the main issues within integrated systems. In our attempt to accomplish the goal of the research, we analysed the role of CIDOC CRM as a mediating tool for integrating metadata represented in different schemas from various cultural domains of Lithuania. The authors of this paper consider that it is appropriate to update the mapping of UNIMARC/B and UNIMARC/A to CIDOC CRM and CRMdig within VEPIS in line with the updates published on the website of IFLA.
2. The creation of the BAVIC Thesaurus, which encompasses personal names, geographical names and chronological data from diverse cultural domains and was built by applying methods of information technologies, answered its purpose: it provided more semantic links and better interoperability of VEPIS objects and allowed these links to be used for searching and presenting data. Furthermore, we propose that the BAVIC Thesaurus data structure be further developed by extending semantic relationships between entities, improving the representation and management of entities of authority records on the national level, and drawing on information from international thesauri of a similar nature.
3. The qualitative analysis of the Requirements for Provenance on the Web and of the specification of VEPIS and its services allowed us to conclude that VEPIS, which is based on CIDOC CRM, CRMdig, RDF and the OAIS Reference Model as well as on the Component of Metadata Verification, the Component of Logging Events and the Component of Publication and Access, meets the main Requirements for Provenance on the Web, as it supports the following functionality:
• Provides support for three major categories of provenance: the content of provenance information, the management of provenance as it exists on the Web, and the use of provenance.
• Provides metadata and context of the digitization process referring to the master version and derivation chain. All this creates trustworthy provenance information and provides access to it by using open protocols.
• The portal www.epaveldas.lt allows querying the most relevant facts and retrieving complete descriptions encoded in this model by generic CIDOC CRM terms without the need to refer to its specific properties. The user has the possibility to identify the creator of the object, earlier versions of the item and the events that changed the custody of the item, to find out how results were derived (what input influenced the result), and to identify the master version of the object, the scanner/resolution of the digital object and information about access to the resource.
We can conclude that VEPIS satisfies the requirement defined by the W3C Provenance Incubator Group that provenance on the Web should include information about the creation and publication of Web resources and information about access to those resources, as well as activities related to their discussion, linking and reuse.

Future developments
This research has identified key directions for the development of VEPIS regarding provenance in order to ensure the representation and exploitation of provenance information on the Web.
Interoperability. At present VEPIS aggregates data from diverse systems, and all descriptive information, including provenance data, is converted into UNIMARC and mapped to CIDOC CRM; however, it is not interoperable as regards search. In order to meet the Requirements for Provenance on the Web, the BAVIC Thesaurus, which serves as a framework for semantic search, has to be extended with entities and their semantic relationships in order to improve searching on the Semantic Web for provenance information from heterogeneous data repositories.
Accountability. In order to meet the Requirements for Provenance on the Web regarding accountability, new services and functions need to be established that make it possible to compare artefacts through their provenance and to assign credit or blame.
Representation of metadata. The extensiveness of the metadata has a profound impact on the reliability of information. It is, therefore, very important to harmonize descriptive metadata regarding the provenance information of VEPIS objects.
Representation of data. In order to achieve the utmost conformance of VEPIS to the Requirements for Provenance on the Web, it is essential to coordinate the activities of all institutions related to VEPIS, ensure the extensiveness of metadata and their conformance to uniform requirements, supplement the BAVIC database with authority files and monitor these data.