Mahdi Jafari

MSc Student


Thesis title:

Semantic based data cleaning to detect duplication via ontology


Thesis abstract:

In today's digital age, data plays a key role in business and in everyday life. The growing use of the Internet has given rise to huge data sources such as IoT devices, social networks, and medical systems, which constantly produce a flood of data. Collecting and maintaining this data is valuable only when it can be turned into useful information for improving current practice, and data analysis technologies were introduced for this purpose. However, efficient and, more importantly, reliable data analysis is possible only with high-quality data. This is why data quality is so important: it is one of the key factors in the success or failure of data-driven systems.


Improving data quality requires a dedicated process for each kind of data problem. Duplicate data is considered one of the main quality problems. Semantic duplicates are the most complex type of duplicate data, because finding them requires the machine to understand the meaning of the data; detecting them is our main goal. For this purpose, we will try to use an ontology.
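
To illustrate what a semantic duplicate is, the following minimal sketch maps each field value to a concept before comparing records. It is only an illustration under stated assumptions, not the method developed in the thesis: the dictionary TOY_ONTOLOGY, the functions canonical and find_semantic_duplicates, and the sample records are all hypothetical, and a real ontology would carry far richer relations than a synonym table.

```python
from collections import defaultdict

# Hypothetical toy ontology (an assumption for illustration only):
# each surface term maps to one canonical concept identifier.
TOY_ONTOLOGY = {
    "nyc": "city:new_york",
    "new york city": "city:new_york",
    "new york": "city:new_york",
    "physician": "occupation:doctor",
    "doctor": "occupation:doctor",
    "md": "occupation:doctor",
}


def canonical(value):
    """Map a raw value to its concept, or to its normalized text if unknown."""
    key = " ".join(value.lower().split())  # lowercase and collapse whitespace
    return TOY_ONTOLOGY.get(key, key)


def find_semantic_duplicates(records, fields):
    """Group indices of records whose chosen fields resolve to the same concepts."""
    groups = defaultdict(list)
    for index, record in enumerate(records):
        signature = tuple(canonical(record[field]) for field in fields)
        groups[signature].append(index)
    # Only groups with more than one record are duplicates.
    return [indices for indices in groups.values() if len(indices) > 1]


# Made-up sample records: the first two differ textually but mean the same.
records = [
    {"name": "Sara Ahmadi", "city": "NYC", "job": "Physician"},
    {"name": "Sara Ahmadi", "city": "New York City", "job": "Doctor"},
    {"name": "Ali Rezaei", "city": "Boston", "job": "Doctor"},
]

duplicates = find_semantic_duplicates(records, ["name", "city", "job"])
print(duplicates)  # [[0, 1]]
```

A plain string comparison would treat the first two records as distinct, since "NYC" and "New York City" share no exact match; resolving values to concepts first is what lets them be recognized as the same entity.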
