Similarity Estimation for HTML Code Blocks
Technological Sciences
Simona Ramanauskaitė
Vilnius Gediminas Technical University, Lithuania
Kiril Griazev
Vilnius Gediminas Technical University
Published 2018-06-25
https://doi.org/10.21277/jmd.v48i1.219

Keywords

HTML
data similarity
similarity estimation

How to Cite

Ramanauskaitė, S. and Griazev, K. (2018) “Similarity Estimation for HTML Code Blocks”, Jaunųjų mokslininkų darbai, 48(1), pp. 30–35. doi:10.21277/jmd.v48i1.219.

Abstract

Data mining from web pages is increasingly adopted in business. An analysis of the current situation shows that solutions for mining structured data from web pages exist. However, there is no scientific dataset for unstructured data that would allow new data selection methods to be created and tested. This limits research and development on unstructured web data; therefore, we propose a method for estimating the similarity of HTML code blocks. The method combines data and structure comparison and expresses the similarity of two HTML code blocks as a quantitative value.
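
To illustrate what combining data and structure comparison into a single quantitative score can look like, the following is a minimal sketch and not the method proposed in the paper. It assumes that structure can be approximated by the sequence of opening tags, that data can be approximated by the sequence of visible words, and that the two scores are merged by a weighted average. The names `read_block`, `block_similarity` and `structure_weight` are hypothetical and introduced only for this example.

```python
from difflib import SequenceMatcher
from html.parser import HTMLParser


class _BlockReader(HTMLParser):
    """Collects opening tags (structure) and visible words (data)."""

    def __init__(self):
        super().__init__()
        self.tags = []
        self.words = []

    def handle_starttag(self, tag, attrs):
        # Assumption: attributes are ignored, only tag names count.
        self.tags.append(tag)

    def handle_data(self, data):
        # Whitespace-only text nodes contribute no words.
        self.words.extend(data.split())


def read_block(html):
    """Return (tag sequence, word sequence) for one HTML code block."""
    reader = _BlockReader()
    reader.feed(html)
    reader.close()
    return reader.tags, reader.words


def block_similarity(html_a, html_b, structure_weight=0.5):
    """Hypothetical combined score in [0, 1]; 1 means identical sequences."""
    tags_a, words_a = read_block(html_a)
    tags_b, words_b = read_block(html_b)
    # SequenceMatcher.ratio() returns 2 * matches / total length.
    structure = SequenceMatcher(None, tags_a, tags_b, autojunk=False).ratio()
    data = SequenceMatcher(None, words_a, words_b, autojunk=False).ratio()
    return structure_weight * structure + (1 - structure_weight) * data


if __name__ == "__main__":
    a = "<div><p>Price: 10 EUR</p></div>"
    b = "<div><p>Price: 12 EUR</p></div>"
    print(block_similarity(a, b))
```

In this sketch, two blocks with the same tags but entirely different text score 0.5 under the default weight; the equal weighting is an arbitrary choice made for illustration.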
