Large-scale Text-based Video Classification using Contextual Features


  •   Zein Al Abidin Ibrahim

  •   Siba Haidar

  •   Ihab Sbeity


Video production has grown dramatically, creating a need for accurate video classification. In this work, we use deep learning to speed up video retrieval by classifying videos into categories. Each video is classified based on the text extracted from it. We trained our model using fastText, a library for efficient text classification and representation learning, and tested it on 15,000 videos. Experimental results show that our approach is efficient and performs well. The technique scales to large datasets and produces a model that can classify any video into a specific category very quickly.
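The abstract does not say how the extracted text is handed to fastText. The following is a minimal sketch that assumes each video has already been reduced to a category name and a string of extracted text (the extraction step is not shown). It writes the examples in the line format that fastText's supervised mode reads: each label is prefixed with `__label__` and followed by the example's text. The function names and the lowercase, word-character tokenization are illustrative assumptions, not the authors' preprocessing.

```python
import re

# fastText's supervised mode expects one example per line, with each label
# written as "__label__<name>" before the example's text.
LABEL_PREFIX = "__label__"


def normalize(text: str) -> str:
    """Lowercase the text and keep only runs of word characters (assumed preprocessing)."""
    tokens = re.findall(r"\w+", text.lower())
    return " ".join(tokens)


def to_fasttext_line(category: str, extracted_text: str) -> str:
    """Format one video's category and extracted text as a fastText training line."""
    # Labels cannot contain spaces, so multi-word categories are joined with underscores.
    label = LABEL_PREFIX + category.strip().lower().replace(" ", "_")
    return f"{label} {normalize(extracted_text)}"


def write_training_file(videos, path: str) -> int:
    """Write (category, extracted_text) pairs to a training file; return lines written."""
    written = 0
    with open(path, "w", encoding="utf-8") as f:
        for category, text in videos:
            # Skip videos from which no usable text was extracted.
            if not normalize(text):
                continue
            f.write(to_fasttext_line(category, text) + "\n")
            written += 1
    return written


if __name__ == "__main__":
    print(to_fasttext_line("Sports", "Goal! The match ended 2-1."))
    # prints "__label__sports goal the match ended 2 1"
```

A file produced this way could then be passed to the official fastText Python bindings, for example `fasttext.train_supervised(input="train.txt")`, and the trained model queried with `model.predict(text)`. Those calls are outside the sketch above, which covers only data preparation.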

Keywords: Deep learning, Text-based Video Classification, Contextual Information, fastText






How to Cite
Ibrahim, Z.A.A., Haidar, S. and Sbeity, I. 2019. Large-scale Text-based Video Classification using Contextual Features. European Journal of Electrical Engineering and Computer Science. 3, 2 (Apr. 2019). DOI: