An Algorithm for Data Extraction from Web Pages Based on Data Similarities

Kiril Griavev; Simona Ramanauskaitė

doi:10.21277/jmd.v47i1.88

Technological Sciences

Published 2017-07-03

https://doi.org/10.21277/jmd.v47i1.88

Keywords

Data Extraction
Data Parsing
Data Similarity

How to Cite

Griavev, K. and Ramanauskaitė, S. (2017) “An Algorithm for Data Extraction from Web Pages Based on Data Similarities”, Jaunųjų mokslininkų darbai, 47(1), pp. 73–79. doi:10.21277/jmd.v47i1.88.

Abstract

This paper analyses the problems of extracting data from web pages and proposes a solution. The analysis showed that data-based algorithms are more popular than path-based data extraction. We propose a new data retrieval algorithm based on the similarity of web page data to controlled data.
The proposed algorithm was applied to the retrieval of currency exchange rate data, and the efficiency of its prototype was evaluated by comparing it with other products. The research showed that the proposed algorithm, although it requires controlled data and is better suited to retrieving constantly changing data, is more efficient than other similar products.
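
The abstract does not describe the algorithm's internals, so the following is only a minimal sketch of the general idea of selecting page content by its similarity to controlled data, not the authors' method. It assumes the page has already been split into text blocks keyed by a path, and that the controlled data is a list of approximate exchange rates from a trusted source. The names `extract_numbers`, `similarity`, `best_block`, the sample paths, the sample values, and the 2% tolerance are all hypothetical.

```python
import re

# Matches plain decimals written with '.' or ',' (e.g. "1,1412" or "0.879").
# Assumption: no thousands separators; "1,000.50" would be misread.
NUMBER = re.compile(r"\d+(?:[.,]\d+)?")


def extract_numbers(text):
    """Return every decimal number found in text as a float."""
    return [float(m.replace(",", ".")) for m in NUMBER.findall(text)]


def similarity(text, control_values, rel_tol=0.02):
    """Fraction of control values that have a close numeric match in text."""
    if not control_values:
        return 0.0
    numbers = extract_numbers(text)
    hits = sum(
        1
        for c in control_values
        if any(abs(n - c) <= rel_tol * abs(c) for n in numbers)
    )
    return hits / len(control_values)


def best_block(blocks, control_values, rel_tol=0.02):
    """Return (path, score) of the block most similar to the control data.

    `blocks` maps a hypothetical element path to that element's text
    and must not be empty.
    """
    scored = [
        (similarity(text, control_values, rel_tol), path)
        for path, text in blocks.items()
    ]
    score, path = max(scored)
    return path, score


# Hypothetical page blocks and controlled data (illustrative values only).
blocks = {
    "/html/body/div[1]": "Contact us: +370 600 00000",
    "/html/body/table[1]": "USD 1,1412 GBP 0,8790 PLN 4,2301",
    "/html/body/div[2]": "Published 2017-07-03",
}
control = [1.14, 0.88, 4.23]

path, score = best_block(blocks, control)
print(path, score)
```

In this sketch the table block is selected because all three control values have a number within the tolerance there, while the other blocks contain none. A similarity-driven selection of this kind does not depend on a fixed path, which is why it may tolerate layout changes, but it does need controlled data that stays close to the values on the page.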

