Discovering XML Conditional Dependencies for Data Quality Issues
Extensible Markup Language (XML) is emerging as the primary standard for representing and exchanging data; accounting for more than 60% of the total, it is considered the most dominant document type on the web. Nevertheless, the quality of XML data is not as good as expected. XML integrity constraints, especially XML functional dependencies (XFDs), play an important role in keeping XML datasets as consistent as possible, but their ability to solve data quality issues remains limited. The main reason is that traditional data dependencies were introduced mainly to maintain the consistency of the schema rather than that of the data. The purpose of this study is to introduce a method for discovering pattern tableaus for XML conditional dependencies, to be used for enhancing XML document consistency as part of the data quality improvement process. Conditional dependencies are new rules designed mainly for improving data instances; they extend traditional XML dependencies by enforcing pattern tableaus of semantically related constants. A set of minimal approximate conditional dependencies, namely XML conditional functional dependencies (XCFDs) and XML conditional inclusion dependencies (XCINDs), is then discovered and learned from the XML tree using a set of mining algorithms. The discovered patterns can be used as master data to detect inconsistencies, that is, values that do not agree with the majority of the dataset.
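To make the idea of a pattern tableau concrete, the following is a minimal sketch in Python. It is not the mining algorithm of this study. It assumes a flat, hypothetical customer document, and the element names, the function names (`flatten`, `discover_constant_patterns`, `find_violations`), and the support and confidence thresholds are all invented for illustration. It covers only constant patterns of a conditional functional dependency, not conditional inclusion dependencies.

```python
import xml.etree.ElementTree as ET
from collections import Counter, defaultdict

# Hypothetical sample document; element names are illustrative assumptions.
SAMPLE_XML = """
<customers>
  <customer><country>UK</country><areaCode>20</areaCode><city>London</city></customer>
  <customer><country>UK</country><areaCode>20</areaCode><city>London</city></customer>
  <customer><country>UK</country><areaCode>20</areaCode><city>London</city></customer>
  <customer><country>UK</country><areaCode>20</areaCode><city>Leeds</city></customer>
  <customer><country>US</country><areaCode>20</areaCode><city>Austin</city></customer>
</customers>
"""


def flatten(xml_text, record_tag):
    """Turn each <record_tag> element into a dict of child tag -> text."""
    root = ET.fromstring(xml_text)
    return [
        {child.tag: (child.text or "").strip() for child in record}
        for record in root.iter(record_tag)
    ]


def discover_constant_patterns(rows, lhs, rhs, min_support=3, min_confidence=0.7):
    """Collect constant patterns (lhs values -> rhs value) that hold approximately.

    A pattern is kept when its lhs values occur at least `min_support` times
    and the most frequent rhs value covers at least `min_confidence` of them.
    Both thresholds are assumed values, not taken from the study.
    """
    groups = defaultdict(Counter)
    for row in rows:
        key = tuple(row.get(attr) for attr in lhs)
        groups[key][row.get(rhs)] += 1

    tableau = {}
    for key, counts in groups.items():
        total = sum(counts.values())
        value, freq = counts.most_common(1)[0]
        if total >= min_support and freq / total >= min_confidence:
            tableau[key] = value
    return tableau


def find_violations(rows, lhs, rhs, tableau):
    """Return indexes of records that match a pattern's lhs but not its rhs."""
    violations = []
    for index, row in enumerate(rows):
        key = tuple(row.get(attr) for attr in lhs)
        if key in tableau and row.get(rhs) != tableau[key]:
            violations.append(index)
    return violations


rows = flatten(SAMPLE_XML, "customer")
tableau = discover_constant_patterns(rows, ["country", "areaCode"], "city")
violations = find_violations(rows, ["country", "areaCode"], "city", tableau)
print(tableau)
print(violations)
```

In this sketch the learned tableau plays the role of master data: a record is flagged only when it matches the condition of a pattern but disagrees with the value supported by the majority of matching records.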