Similarity Estimation for HTML Code Blocks

Simona Ramanauskaitė; Kiril Griazev

doi:10.21277/jmd.v48i1.219

Technological Sciences

Simona Ramanauskaitė

Vilnius Gediminas Technical University

Kiril Griazev

Vilnius Gediminas Technical University

Published 2018-06-25

Keywords

HTML
data similarity
similarity estimation

How to Cite

Ramanauskaitė, S. and Griazev, K. (2018) “Similarity Estimation for HTML Code Blocks”, Jaunųjų mokslininkų darbai, 48(1), pp. 30–35. doi:10.21277/jmd.v48i1.219.

Abstract

Data mining from web pages is increasingly adopted in business. An analysis of the current situation shows that solutions for mining structured data from web pages exist. However, no scientific dataset for unstructured data exists that would allow new data selection methods to be created and tested. This limits research and development on unstructured web data; therefore, we propose a method for HTML code block similarity estimation. The method combines data and structure comparison and expresses the similarity of two HTML code blocks as a quantitative value.
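
To illustrate the general idea of combining structure and data comparison into one number, the following is a minimal sketch and not the method proposed in the article. The abstract does not specify how the two comparisons are computed or combined, so everything here is an assumption made for illustration: the function name `block_similarity`, the `structure_weight` parameter, the use of the tag sequence as "structure" and the visible text as "data", and the weighted average of two `difflib.SequenceMatcher` ratios.

```python
from difflib import SequenceMatcher
from html.parser import HTMLParser


class _BlockParser(HTMLParser):
    """Collects the tag sequence (structure) and text chunks (data) of a block."""

    def __init__(self):
        super().__init__()
        self.tags = []
        self.texts = []

    def handle_starttag(self, tag, attrs):
        self.tags.append(tag)

    def handle_endtag(self, tag):
        self.tags.append("/" + tag)

    def handle_data(self, data):
        # Normalize whitespace and skip chunks that are only whitespace.
        text = " ".join(data.split())
        if text:
            self.texts.append(text)


def _parse(html):
    """Return (list of tags, joined visible text) for an HTML fragment."""
    parser = _BlockParser()
    parser.feed(html)
    parser.close()
    return parser.tags, " ".join(parser.texts)


def block_similarity(html_a, html_b, structure_weight=0.5):
    """Hypothetical combined score in [0, 1] for two HTML code blocks.

    Assumption: structure similarity is the SequenceMatcher ratio of the
    tag sequences, data similarity is the ratio of the text contents, and
    the two are merged by a weighted average. The article may differ.
    """
    tags_a, text_a = _parse(html_a)
    tags_b, text_b = _parse(html_b)
    structure = SequenceMatcher(None, tags_a, tags_b).ratio()
    data = SequenceMatcher(None, text_a, text_b).ratio()
    return structure_weight * structure + (1 - structure_weight) * data
```

Under these assumptions, two blocks with identical markup and text score 1.0, while blocks that share neither tags nor text score close to 0. Setting `structure_weight` to 1.0 compares markup only, which may be useful when the blocks are expected to hold different values in the same layout.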
