IMPLEMENTATION OF HIDDEN MARKOV MODEL (HMM) FOR PARTS OF SPEECH TAGGING IN TELUGU LANGUAGE

Authors

  • Dr. V. Suresh, Assistant Professor in IT, Anil Neerukonda Institute of Technology and Sciences (ANITS), Sangivalasa, Visakhapatnam, Andhra Pradesh, India.

Keywords:

Telugu, Parts-of-speech tagger, corpus, TDIL proposed Telugu tag set, HMM technique

Abstract

Part-of-speech (POS) tagging, which assigns the correct tag to each word from a set of available tags, is a fundamental task in NLP applications such as grammar checking, speech processing, and machine translation. Tagger accuracy remains the biggest challenge today. Researchers have proposed many taggers for different languages (Telugu, Tamil, Kannada, Punjabi, Hindi, Bengali, etc.) using techniques such as the Hidden Markov Model (HMM), Support Vector Machine (SVM), and Maximum Entropy (ME). One of them is an HMM-based Telugu POS tagger, which uses this statistical technique to tag Telugu words with the 630-tag set developed by Rama Sree and Kusuma Kumari (2007). Such a large tag set results in a data sparseness problem. To cope with this problem, this paper reports an experiment in which the reduced POS tag set of 36 tags proposed by Technology Development for Indian Languages (TDIL) is used to improve the tagging accuracy of the HMM-based POS tagger. Finally, the results were manually evaluated by a linguist.
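The HMM tagging idea summarized above can be illustrated with a minimal sketch. It assumes a first-order (bigram) HMM with add-one smoothing and Viterbi decoding; the abstract does not state the model order or smoothing used in the paper, so these are assumptions. The function names, the three-sentence romanized corpus, and the tags NN, VM, and PRP are hypothetical placeholders, not the TDIL tag set or the paper's data.

```python
import math
from collections import Counter, defaultdict

def train_hmm(tagged_sentences):
    """Count tag-to-tag transitions and tag-to-word emissions."""
    trans = defaultdict(Counter)  # trans[prev_tag][tag]
    emit = defaultdict(Counter)   # emit[tag][word]
    vocab = set()
    for sentence in tagged_sentences:
        prev = "<s>"  # sentence-start pseudo tag
        for word, tag in sentence:
            trans[prev][tag] += 1
            emit[tag][word] += 1
            vocab.add(word)
            prev = tag
    return trans, emit, sorted(emit), vocab

def log_prob(counter, key, num_outcomes):
    """Add-one smoothed log probability (smoothing choice is an assumption)."""
    return math.log((counter[key] + 1) / (sum(counter.values()) + num_outcomes))

def viterbi(words, trans, emit, tags, vocab):
    """Return the most probable tag sequence for a list of words."""
    if not words:
        return []
    n_tags = len(tags)
    n_words = len(vocab) + 1  # one extra outcome reserved for unknown words
    best = [{t: log_prob(trans["<s>"], t, n_tags)
                + log_prob(emit[t], words[0], n_words) for t in tags}]
    back = []
    for word in words[1:]:
        scores, pointers = {}, {}
        for t in tags:
            # Best previous tag for reaching tag t at this position.
            prev = max(tags, key=lambda p: best[-1][p] + log_prob(trans[p], t, n_tags))
            scores[t] = (best[-1][prev] + log_prob(trans[prev], t, n_tags)
                         + log_prob(emit[t], word, n_words))
            pointers[t] = prev
        best.append(scores)
        back.append(pointers)
    # Follow the back-pointers from the best final tag.
    tag = max(tags, key=lambda t: best[-1][t])
    path = [tag]
    for pointers in reversed(back):
        tag = pointers[tag]
        path.append(tag)
    return path[::-1]

# Hypothetical toy corpus (romanized Telugu, illustrative tags only).
corpus = [
    [("ramudu", "NN"), ("pustakam", "NN"), ("chadivadu", "VM")],
    [("sita", "NN"), ("vacchindi", "VM")],
    [("ame", "PRP"), ("pustakam", "NN"), ("chadivindi", "VM")],
]
model = train_hmm(corpus)
print(viterbi(["sita", "pustakam", "chadivadu"], *model))
```

Under this bigram assumption, a tag set of size T needs T × T transition estimates: 396,900 for 630 tags against 1,296 for 36 tags. That is one way to see why a reduced tag set can ease data sparseness when the annotated corpus is small.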

References

Ahmed, Raju S.B, Chandrasekhar Pammi V. S., Prasad M.K (2002), “Application of multilayer perceptron network for tagging parts-of-speech”, Proceedings of the Language Engineering Conference, IEEE.

Rama Sree, R. J., and P. Kusuma Kumari. 2007. Combining POS taggers for improved accuracy to create Telugu annotated texts for information retrieval. Dept. of Telugu Studies, Tirupathi, India.

Krishnapriya, V., Sreesha, P., Harithalakshmi, T. R., Archana, T. C., & Vettath, J. N. 2014. Design of a POS tagger using conditional random fields for Malayalam. IEEE 2014 First International Conference on Computational Systems and Communications (ICCSC), pp. 370-373

Sharma S.K, Lehal G.S (2011), “Using HMM to Improve Accuracy of Punjabi POS Tagger”, 2011 IEEE International Conference on Computer Science and Automation Engineering, Shanghai, China.

Kumar, D., & Josan, G. S. 2010. Part of speech taggers for morphologically rich Indian languages: a survey. International Journal of Computer Applications (0975-8887) Volume, 1-9.

Reddy, S., & Sharoff, S. 2011. Cross language POS taggers (and other tools) for Indian languages: An experiment with Kannada using Telugu resources. Cross Lingual Information Access, p. 11.

Aniket Dalal, Kumar Nagaraj, Sawant Uma, Shelke Sandeep (2006), “Hindi Part-of-Speech Tagging and Chunking: A Maximum Entropy Approach”, Proceedings of the NLPAI ML contest workshop, National Workshop on Artificial Intelligence.

Ankur Parikh (2009), “Part-Of-Speech Tagging using Neural Network”, Proceedings of ICON-2009: 7th International Conference on Natural Language Processing.

Manju, K., Soumya, S., & Idicula, S. M. 2009. Development of a POS tagger for Malayalam-an experience. IEEE International Conference on Advances in Recent Technologies in Communication and Computing, 2009. ARTCom'09. pp. 709-713.

Antony P.J, Mohan S. P., Soman K.P (2010), “SVM Based Part of Speech Tagger for Malayalam”, Proceedings of 2010 International Conference on Recent Trends in Information, Telecommunication and Computing, IEEE.

Anupam Basu, Ray, Ranjan Pradipta, Harish V. and Sarkar Sudeshna (2003), “Part of speech tagging and local word grouping techniques for natural language parsing in Hindi”, Proceedings of the International Conference on Natural Language Processing (ICON 2003).

Arulmozhi P., L. Sobha (2006), “A Hybrid POS Tagger for a Relatively Free Word Order Language”, Proceedings of MSPIL-2006, Indian Institute of Technology, Bombay.

Avinesh PVS and Gali Karthik (2007), “Part-of-speech tagging and chunking using conditional random fields and transformation based learning”, Proceedings of the IJCAI and the Workshop On Shallow Parsing for South Asian Languages (SPSAL), pp. 21–24.

A. Ekbal, S. Mondal and S. Bandyopadhyay (2007), “POS Tagging using HMM and Rule-based Chunking”, Proceedings of the Workshop on Shallow Parsing in South Asian Languages, International Joint Conference on Artificial Intelligence (IJCAI 2007), 6-12 January 2007, Hyderabad, India, pp. 25-28.

A. Ekbal, R. Haque and S. Bandyopadhyay (2007), “Bengali Part of Speech Tagging using Conditional Random Field”, Proceedings of the 7th International Symposium on Natural Language Processing (SNLP-07), Thailand, pp. 131-136.

A. Ekbal and S. Bandyopadhyay (2008), “Part of Speech Tagging in Bengali using Support Vector Machine”, Proceedings of the International Conference on Information Technology (ICIT 2008), pp. 106-111, IEEE.

A. Ekbal, M. Hasanuzzaman and S. Bandyopadhyay (2009), “Voted Approach for Part of Speech Tagging in Bengali”, Proceedings of the 23rd Pacific Asia Conference on Language, Information and Computation (PACLIC-09), December 3-5, Hong Kong, pp. -129.

Ganesan M (2007), “Morph and POS Tagger for Tamil” (Software), Annamalai University, Annamalai Nagar.

G. Sindhiya Binulal, Goud P. A, K.P. Soman (2009), “A SVM based approach to Telugu Parts Of Speech Tagging using SVMTool”, International Journal of Recent Trends in Engineering, Vol. 1, No. 2, May 2009.

Himanshu Agrawal, Mani Anirudh (2006), “Part Of Speech Tagging and Chunking Using Conditional Random Fields”, Proceedings of the NLPAI ML contest workshop, National Workshop on Artificial Intelligence.

Mandeep Singh Gill, Lehal G.S. (2008), “Grammar Checking System for Punjabi”, Coling: Companion volume, Posters and Demonstrations, pages 149–152, Manchester.

Manish Shrivastava, Bhattacharyya Pushpak (2008), “Hindi POS Tagger Using Naive Stemming: Harnessing Morphological Information Without Extensive Linguistic Knowledge”, Proceedings of ICON-2008: 6th International Conference on Natural Language Processing.

Manju K., Soumya S., Idicula S.M. (2009), “Development of a POS Tagger for Malayalam: An Experience”, Proceedings of 2009 International Conference on Advances in Recent Technologies in Communication and Computing, IEEE.

Navanath Saharia, Das Dhrubajyoti, Sharma Utpal, Kalita Jugal (2009), “Part of Speech Tagger for Assamese Text”, Proceedings of the ACL-IJCNLP 2009 Conference Short Papers, Suntec, Singapore, pp. 33–36.

Ratnaparkhi, A. 1996. A maximum entropy model for part-of-speech tagging. In Proceedings of the conference on empirical methods in natural language processing Vol. 1, pp. 133-142.

Published

15-05-2022

How to Cite

Dr. V. Suresh. (2022). IMPLEMENTATION OF HIDDEN MARKOV MODEL (HMM) FOR PARTS OF SPEECH TAGGING IN TELUGU LANGUAGE. International Education and Research Journal (IERJ), 8(5). Retrieved from http://ierj.in/journal/index.php/ierj/article/view/2485