IMPLEMENTATION OF HIDDEN MARKOV MODEL (HMM) FOR PARTS OF SPEECH TAGGING IN TELUGU LANGUAGE
Keywords:
Telugu, Parts-of-speech tagger, corpus, TDIL proposed Telugu tag set, HMM technique
Abstract
Part-of-speech (POS) tagging, the task of assigning the correct tag from a set of available tags to each word, is fundamental to many NLP applications such as grammar checking, speech processing, and machine translation. Tagging accuracy remains the biggest challenge. Researchers have proposed many taggers for different languages (Telugu, Tamil, Kannada, Punjabi, Hindi, Bengali, etc.) using techniques such as the Hidden Markov Model (HMM), Support Vector Machine (SVM), and Maximum Entropy (ME). One of them is an HMM-based Telugu POS tagger, which uses this statistical technique to tag Telugu words with the 630-tag set developed by Rama Sree and Kusuma Kumari (2007). This large tag set results in a data sparseness problem. To cope with this problem, this paper experiments with the reduced POS tag set (36 tags) proposed by Technology Development for Indian Languages (TDIL) to improve the tagging accuracy of the HMM-based POS tagger. Finally, the results were manually evaluated by a linguist.
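A minimal sketch of how an HMM tagger of the kind described above works, assuming bigram tag transitions, add-one (Laplace) smoothing as one simple response to data sparseness, and Viterbi decoding. The function names (`train_hmm`, `viterbi`), the toy transliterated Telugu sentences in `TRAIN`, and the tag labels (`NNP`, `NN`, `VM`) are hypothetical illustrations; they are not the paper's corpus, the TDIL tag set, or the authors' implementation.

```python
import math
from collections import Counter

# Hypothetical toy training data: transliterated words with illustrative tags.
TRAIN = [
    [("raamudu", "NNP"), ("intiki", "NN"), ("vellaadu", "VM")],
    [("sita", "NNP"), ("pustakam", "NN"), ("chadivindi", "VM")],
]


def train_hmm(tagged_sentences):
    """Estimate bigram HMM parameters with add-one smoothing.

    Returns the sorted tag list plus log-probability functions for
    transitions P(tag | prev_tag) and emissions P(word | tag).
    """
    transitions = Counter()   # (prev_tag, tag) -> count
    prev_totals = Counter()   # prev_tag -> times it was followed by a tag
    emissions = Counter()     # (tag, word) -> count
    emit_totals = Counter()   # tag -> tokens emitted by that tag
    vocab, tags = set(), set()

    for sentence in tagged_sentences:
        prev = "<s>"  # sentence-start pseudo-tag
        for word, tag in sentence:
            transitions[(prev, tag)] += 1
            prev_totals[prev] += 1
            emissions[(tag, word)] += 1
            emit_totals[tag] += 1
            vocab.add(word)
            tags.add(tag)
            prev = tag

    n_tags = len(tags)
    n_vocab = len(vocab) + 1  # +1 reserves probability mass for unknown words

    def log_trans(prev, tag):
        return math.log((transitions[(prev, tag)] + 1) / (prev_totals[prev] + n_tags))

    def log_emit(tag, word):
        return math.log((emissions[(tag, word)] + 1) / (emit_totals[tag] + n_vocab))

    return sorted(tags), log_trans, log_emit


def viterbi(words, tags, log_trans, log_emit):
    """Return the most probable tag sequence for `words` under the HMM."""
    if not words:
        return []
    # best[i][t]: highest log-probability of any path ending in tag t at position i
    best = [{t: log_trans("<s>", t) + log_emit(t, words[0]) for t in tags}]
    back = [{}]  # back[i][t]: previous tag on that best path
    for i in range(1, len(words)):
        best.append({})
        back.append({})
        for t in tags:
            prev_t = max(tags, key=lambda p: best[i - 1][p] + log_trans(p, t))
            best[i][t] = best[i - 1][prev_t] + log_trans(prev_t, t) + log_emit(t, words[i])
            back[i][t] = prev_t
    # Start from the best final tag and follow back-pointers.
    path = [max(tags, key=lambda t: best[-1][t])]
    for i in range(len(words) - 1, 0, -1):
        path.append(back[i][path[-1]])
    return list(reversed(path))


if __name__ == "__main__":
    tags, log_trans, log_emit = train_hmm(TRAIN)
    # "illu" never occurs in TRAIN; the transition probabilities decide its tag.
    print(viterbi("raamudu illu vellaadu".split(), tags, log_trans, log_emit))
    # ['NNP', 'NN', 'VM']
```

The smoothed transition estimate divides each count by how often the previous tag occurs, so a tag set with fewer, coarser tags gives each transition more training examples. This is the intuition behind replacing the 630-tag set with the 36-tag TDIL set.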
References
Ahmed, Raju S.B., Chandrasekhar Pammi V.S., Prasad M.K. (2002). "Application of multilayer perceptron network for tagging parts-of-speech". Proceedings of the Language Engineering Conference, IEEE.
Rama Sree, R.J., Kusuma Kumari, P. (2007). "Combining POS taggers for improved accuracy to create Telugu annotated texts for information retrieval". Dept. of Telugu Studies, Tirupathi, India.
Krishnapriya, V., Sreesha, P., Harithalakshmi, T.R., Archana, T.C., Vettath, J.N. (2014). "Design of a POS tagger using conditional random fields for Malayalam". 2014 First International Conference on Computational Systems and Communications (ICCSC), IEEE, pp. 370-373.
Sharma S.K., Lehal G.S. (2011). "Using HMM to Improve Accuracy of Punjabi POS Tagger". 2011 IEEE International Conference on Computer Science and Automation Engineering, Shanghai, China.
Kumar, D., Josan, G.S. (2010). "Part of speech taggers for morphologically rich Indian languages: a survey". International Journal of Computer Applications (0975-8887), pp. 1-9.
Reddy, S., Sharoff, S. (2011). "Cross language POS taggers (and other tools) for Indian languages: An experiment with Kannada using Telugu resources". Cross Lingual Information Access, p. 11.
Aniket Dalal, Kumar Nagaraj, Sawant Uma, Shelke Sandeep (2006). "Hindi Part-of-Speech Tagging and Chunking: A Maximum Entropy Approach". Proceedings of the NLPAI ML Contest Workshop, National Workshop on Artificial Intelligence.
Ankur Parikh (2009). "Part-of-Speech Tagging using Neural Network". Proceedings of ICON-2009: 7th International Conference on Natural Language Processing.
Manju, K., Soumya, S., Idicula, S.M. (2009). "Development of a POS Tagger for Malayalam - An Experience". 2009 International Conference on Advances in Recent Technologies in Communication and Computing (ARTCom'09), IEEE, pp. 709-713.
Antony P.J., Mohan S.P., Soman K.P. (2010). "SVM Based Part of Speech Tagger for Malayalam". Proceedings of the 2010 International Conference on Recent Trends in Information, Telecommunication and Computing, IEEE.
Anupam Basu, Ray Pradipta Ranjan, Harish V., Sarkar Sudeshna (2003). "Part of speech tagging and local word grouping techniques for natural language parsing in Hindi". Proceedings of the International Conference on Natural Language Processing (ICON 2003).
Arulmozhi P., Sobha L. (2006). "A Hybrid POS Tagger for a Relatively Free Word Order Language". Proceedings of MSPIL-2006, Indian Institute of Technology, Bombay.
Avinesh PVS, Gali Karthik (2007). "Part-of-speech tagging and chunking using conditional random fields and transformation based learning". Proceedings of the IJCAI Workshop on Shallow Parsing for South Asian Languages (SPSAL), pp. 21-24.
Ekbal, S. Mondal, S. Bandyopadhyay (2007). "POS Tagging using HMM and Rule-based Chunking". Proceedings of the Workshop on Shallow Parsing in South Asian Languages, International Joint Conference on Artificial Intelligence (IJCAI 2007), 6-12 January 2007, Hyderabad, India, pp. 25-28.
Ekbal, R. Haque, S. Bandyopadhyay (2007). "Bengali Part of Speech Tagging using Conditional Random Field". Proceedings of the 7th International Symposium on Natural Language Processing (SNLP-07), Thailand, pp. 131-136.
Ekbal, S. Bandyopadhyay (2008). "Part of Speech Tagging in Bengali using Support Vector Machine". Proceedings of the International Conference on Information Technology (ICIT 2008), IEEE, pp. 106-111.
Ekbal, M. Hasanuzzaman, S. Bandyopadhyay (2009). "Voted Approach for Part of Speech Tagging in Bengali". Proceedings of the 23rd Pacific Asia Conference on Language, Information and Computation (PACLIC-09), December 3-5, Hong Kong, pp. -129.
Ganesan M. (2007). "Morph and POS Tagger for Tamil" (Software). Annamalai University, Annamalai Nagar.
G. Sindhiya Binulal, Goud P.A., K.P. Soman (2009). "A SVM based approach to Telugu Parts Of Speech Tagging using SVMTool". International Journal of Recent Trends in Engineering, Vol. 1, No. 2, May 2009.
Himanshu Agrawal, Mani Anirudh (2006). "Part Of Speech Tagging and Chunking Using Conditional Random Fields". Proceedings of the NLPAI ML Contest Workshop, National Workshop on Artificial Intelligence.
Mandeep Singh Gill, Lehal G.S. (2008). "Grammar Checking System for Punjabi". Coling: Companion Volume: Posters and Demonstrations, Manchester, pp. 149-152.
Manish Shrivastava, Bhattacharyya Pushpak (2008). "Hindi POS Tagger Using Naive Stemming: Harnessing Morphological Information Without Extensive Linguistic Knowledge". Proceedings of ICON-2008: 6th International Conference on Natural Language Processing.
Navanath Saharia, Das Dhrubajyoti, Sharma Utpal, Kalita Jugal (2009). "Part of Speech Tagger for Assamese Text". Proceedings of the ACL-IJCNLP 2009 Conference Short Papers, Suntec, Singapore, pp. 33-36.
Ratnaparkhi, A. (1996). "A maximum entropy model for part-of-speech tagging". Proceedings of the Conference on Empirical Methods in Natural Language Processing, Vol. 1, pp. 133-142.
License
Copyright (c) 2022 International Education and Research Journal (IERJ)
This work is licensed under a Creative Commons Attribution 4.0 International License.