
Performance metrics indicate which model is better suited to which task. Researchers applying machine learning and deep learning models measure model performance through a cost function or evaluation criteria such as mean squared error (MSE) for regression, and accuracy and F1-score for classification tasks. In NLP, however, performance measurement is more complex due to variation in the ground truth and in the results obtained.
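To make the contrast concrete, here is a minimal sketch, assuming scikit-learn and NLTK are installed; the sentences, labels, and numbers are illustrative only and not taken from the study. It computes MSE, accuracy, and F1-score against a single fixed ground truth, and then a sentence-level BLEU score (Papineni et al., 2002) whose value depends on which reference translations are supplied.

```python
# Minimal sketch (assumed setup): scikit-learn for regression/classification
# metrics, NLTK for sentence-level BLEU. All data below is illustrative.
from sklearn.metrics import mean_squared_error, accuracy_score, f1_score
from nltk.translate.bleu_score import sentence_bleu, SmoothingFunction

# Regression and classification: one fixed ground truth, one unambiguous score.
y_true, y_pred = [3.0, 2.5, 4.0], [2.8, 2.7, 4.2]
print("MSE:", mean_squared_error(y_true, y_pred))

labels_true, labels_pred = [1, 0, 1, 1], [1, 0, 0, 1]
print("Accuracy:", accuracy_score(labels_true, labels_pred))
print("F1-score:", f1_score(labels_true, labels_pred))

# NLP generation: several references can all be "correct", and the BLEU score
# shifts with the reference set, which is one source of the evaluation difficulty.
hypothesis = "the cat sat on the mat".split()
single_ref = ["the cat is sitting on the mat".split()]
multi_ref = single_ref + ["a cat sat on the mat".split()]
smooth = SmoothingFunction().method1  # avoid zero scores on short sentences

print("BLEU (1 reference): ", sentence_bleu(single_ref, hypothesis, smoothing_function=smooth))
print("BLEU (2 references):", sentence_bleu(multi_ref, hypothesis, smoothing_function=smooth))
```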


References

  1. Satanjeev Banerjee and Alon Lavie. 2005. Meteor: An automatic metric for mt evaluation with improved correlation with human judgments. In Proceedings of the ACL workshop on intrinsic and extrinsic evaluation measures for machine translation and/or summarization, pages 65–72.
  2. George Doddington. 2002. Automatic evaluation of machine translation quality using n-gram co-occurrence statistics. In Proceedings of the second international conference on Human Language Technology Research, pages 138–145. Morgan Kaufmann Publishers Inc.
  3. Hideki Isozaki, Tsutomu Hirao, Kevin Duh, Katsuhito Sudoh, and Hajime Tsukada. 2010. Automatic evaluation of translation quality for distant language pairs. In Proceedings of the 2010 Conference on Empirical Methods in Natural Language Processing, pages 944–952. Association for Computational Linguistics.
  4. Guillaume Klein, Yoon Kim, Yuntian Deng, Vincent Nguyen, Jean Senellart, and Alexander M. Rush. 2018. Opennmt: Neural machine translation toolkit.
  5. Chin-Yew Lin. 2004. Rouge: A package for automatic evaluation of summaries. Text Summarization Branches Out.
  6. Nitin Madnani. 2011. ibleu: Interactively debugging and scoring statistical machine translation systems. In 2011 IEEE Fifth International Conference on Semantic Computing, pages 213–214. IEEE.
  7. Kishore Papineni, Salim Roukos, Todd Ward, and WeiJing Zhu. 2002. Bleu: a method for automatic evaluation of machine translation. In Proceedings of the 40th annual meeting on association for computational linguistics, pages 311–318. Association for Computational Linguistics.
  8. Maja Popovic. 2015. chrf: character n-gram f-score for automatic mt evaluation. In Proceedings of the Tenth Workshop on Statistical Machine Translation, pages 392–395.
  9. Matthew Snover, Bonnie Dorr, Richard Schwartz, Linnea Micciulla, and John Makhoul. 2006. A study of translation edit rate with targeted human annotation. In Proceedings of the association for machine translation in the Americas, volume 200.
  10. Yonghui Wu, Mike Schuster, Zhifeng Chen, Quoc V Le, Mohammad Norouzi, Wolfgang Macherey, Maxim Krikun, Yuan Cao, Qin Gao, Klaus Macherey, et al. 2016. Google's neural machine translation system: Bridging the gap between human and machine translation. arXiv preprint arXiv:1609.08144.
  11. Ye Qi, Devendra Sachan, Matthieu Felix, Sarguna Padmanabhan, and Graham Neubig. 2018. When and why are pre-trained word embeddings useful for neural machine translation? In HLT-NAACL.
  12. Tianyi Zhang, Varsha Kishore, Felix Wu, Kilian Q Weinberger, and Yoav Artzi. 2019. Bertscore: Evaluating text generation with bert. arXiv preprint arXiv:1904.09675.