
Extensible Markup Language (XML) is emerging as the primary standard for representing and exchanging data; accounting for more than 60% of the total, it is considered the most dominant document type on the web. Nevertheless, the quality of XML data is not as expected. XML integrity constraints, especially XML functional dependencies (XFDs), play an important role in keeping XML datasets as consistent as possible, but their ability to solve data quality issues remains largely unrealized. The main reason is that traditional data dependencies were introduced to maintain the consistency of the schema rather than that of the data. The purpose of this study is to introduce a method for discovering pattern tableaus for XML conditional dependencies, to be used for enhancing XML document consistency as part of the data quality improvement phases. Conditional dependencies are new rules designed mainly for improving data instances; they extend traditional XML dependencies by enforcing pattern tableaus of semantically related constants. A set of minimal approximate conditional dependencies (XML conditional functional dependencies, XCFDs, and XML conditional inclusion dependencies, XCINDs) is then discovered and learned from the XML tree using a set of mining algorithms. The discovered patterns can be used as master data to detect inconsistencies that do not conform to the majority of the dataset.
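To make the idea of an approximate conditional functional dependency concrete, the following is a minimal sketch in Python and not the mining algorithm of the study. It assumes a hypothetical `customers` document, hypothetical function names (`discover_tableau`, `find_violations`), and illustrative support and confidence thresholds. For each combination of left-hand-side values it keeps the majority right-hand-side value as a pattern row, then flags nodes that disagree with that row.

```python
import xml.etree.ElementTree as ET
from collections import Counter, defaultdict

# Hypothetical sample data; element names and values are illustrative only.
XML = """
<customers>
  <customer><country>UK</country><zip>EH4</zip><city>Edinburgh</city></customer>
  <customer><country>UK</country><zip>EH4</zip><city>Edinburgh</city></customer>
  <customer><country>UK</country><zip>EH4</zip><city>Edinburgh</city></customer>
  <customer><country>UK</country><zip>EH4</zip><city>London</city></customer>
  <customer><country>US</country><zip>10001</zip><city>New York</city></customer>
</customers>
""".strip()


def discover_tableau(root, tuple_path, lhs, rhs, min_support=2, min_confidence=0.7):
    """Build a constant pattern tableau {lhs values: majority rhs value}.

    A row is kept only if the group has at least `min_support` nodes and
    the majority value covers at least `min_confidence` of them
    (both thresholds are assumptions made for this sketch).
    """
    groups = defaultdict(Counter)
    for node in root.findall(tuple_path):
        key = tuple(node.findtext(path) for path in lhs)
        groups[key][node.findtext(rhs)] += 1

    tableau = {}
    for key, counts in groups.items():
        value, hits = counts.most_common(1)[0]
        total = sum(counts.values())
        if total >= min_support and hits / total >= min_confidence:
            tableau[key] = value
    return tableau


def find_violations(root, tuple_path, lhs, rhs, tableau):
    """Return (lhs values, observed rhs value) for nodes that break a pattern row."""
    violations = []
    for node in root.findall(tuple_path):
        key = tuple(node.findtext(path) for path in lhs)
        observed = node.findtext(rhs)
        if key in tableau and observed != tableau[key]:
            violations.append((key, observed))
    return violations


root = ET.fromstring(XML)
tableau = discover_tableau(root, "customer", ["country", "zip"], "city")
violations = find_violations(root, "customer", ["country", "zip"], "city", tableau)
```

In this sketch the tableau plays the role of master data: rows are learned from the majority of the document and then reused to flag the minority of nodes that contradict them. Groups that are too small, or too evenly split, produce no row and are therefore never flagged.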



