An Algorithm for Data Extraction from Web Pages Based on Data Similarities
Technological Sciences
Kiril Griavev
Simona Ramanauskaitė
Published 2017-07-03
https://doi.org/10.21277/jmd.v47i1.88

Keywords

Data Extraction
Data Parsing
Data Similarity

How to Cite

Griavev, K. and Ramanauskaitė, S. (2017) “An Algorithm for Data Extraction from Web Pages Based on Data Similarities”, Jaunųjų mokslininkų darbai, 47(1), pp. 73–79. doi:10.21277/jmd.v47i1.88.

Abstract

This paper analyses the problems of extracting data from web pages and proposes a solution. The analysis showed that data-based extraction algorithms are more popular than path-based ones. We propose a new data retrieval algorithm based on the similarity between web page data and controlled data.
A prototype of the proposed algorithm was applied to the retrieval of currency exchange rate data, and its efficiency was evaluated by comparing it with other products. The results showed that the proposed algorithm, although it is better suited to retrieving constantly changing data and requires controlled data, is more efficient than other similar products.
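The abstract does not say how similarity is measured or how the controlled data are matched, so the following Python sketch shows only one possible reading of a similarity-based extractor, not the authors' algorithm. It assumes that the controlled data is a single known reference value (for example, an approximately known exchange rate). It also assumes that similarity is relative numeric closeness for numbers and the `difflib.SequenceMatcher` ratio for other strings, and that the tag path of the best-matching text node is reused to collect every value at the same position. The names `TextNodeCollector`, `similarity`, `locate_path` and `extract` are hypothetical.

```python
from difflib import SequenceMatcher
from html.parser import HTMLParser

# Elements that never have a closing tag, so they must not be pushed on the stack.
VOID_TAGS = {"br", "img", "hr", "meta", "link", "input"}


class TextNodeCollector(HTMLParser):
    """Collects (tag_path, text) pairs for every non-empty text node."""

    def __init__(self):
        super().__init__()
        self.stack = []
        self.nodes = []

    def handle_starttag(self, tag, attrs):
        if tag not in VOID_TAGS:
            self.stack.append(tag)

    def handle_endtag(self, tag):
        # Pop up to and including the matching open tag; tolerates unclosed tags.
        if tag in self.stack:
            while self.stack and self.stack.pop() != tag:
                pass

    def handle_data(self, data):
        text = data.strip()
        if text:
            self.nodes.append(("/".join(self.stack), text))


def similarity(a: str, b: str) -> float:
    """Relative numeric closeness if both values are numbers, else a string ratio in [0, 1]."""
    try:
        x, y = float(a), float(b)
    except ValueError:
        return SequenceMatcher(None, a.lower(), b.lower()).ratio()
    if x == y:
        return 1.0
    return max(0.0, 1.0 - abs(x - y) / max(abs(x), abs(y)))


def locate_path(nodes, controlled_value):
    """Return the tag path of the text node most similar to the controlled value."""
    best_path, _ = max(nodes, key=lambda node: similarity(node[1], controlled_value))
    return best_path


def extract(html, controlled_value):
    """Extract all text values that share the tag path of the best match."""
    collector = TextNodeCollector()
    collector.feed(html)
    path = locate_path(collector.nodes, controlled_value)
    return [text for p, text in collector.nodes if p == path]


page = """
<table>
  <tr><th>Currency</th><th>Rate</th></tr>
  <tr><td>USD</td><td><b>1.0843</b></td></tr>
  <tr><td>GBP</td><td><b>0.8571</b></td></tr>
</table>
"""

print(extract(page, "1.08"))  # ['1.0843', '0.8571']
```

In this sketch the controlled value only needs to be close to one rate. The page layout is then learned from where that value appears, so the other rates are found without a hand-written path. That property fits the abstract's remark that the approach suits constantly changing data but depends on having controlled data.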
