Improving Accuracy of Named Entity Recognition on Social Media

Bongi Vijay, D.T.V Dharmajee Rao


Twitter has attracted a large number of users who share and disseminate up-to-date information, producing large volumes of data every day. However, many applications in Information Retrieval (IR) and Natural Language Processing (NLP) suffer from the noisy and short nature of tweets. In this paper, we propose a novel framework for tweet segmentation in batch mode, called HybridSeg. By splitting tweets into meaningful segments, the semantic or context information is well preserved and easily extracted by downstream applications. HybridSeg finds the optimal segmentation of a tweet by maximizing the sum of the stickiness scores of its candidate segments. The stickiness score considers the probability of a segment being a phrase in English (i.e., global context) and the probability of a segment being a phrase within the batch of tweets (i.e., local context). For the latter, we propose and evaluate two models that derive local context from the linguistic features and the term dependency in a batch of tweets, respectively. HybridSeg is also designed to learn iteratively from confident segments as pseudo feedback. Experiments on two tweet data sets show that segmentation quality is significantly improved by learning both global and local contexts, compared with using global context alone. Through analysis and comparison, we show that local linguistic features are more reliable than term dependency for learning local context.
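The segmentation objective described above, choosing the split of a tweet that maximizes the summed stickiness of its segments, can be illustrated with a short dynamic program. This is a minimal sketch under stated assumptions and not the HybridSeg implementation: the function names, the `max_len` cap on segment length, and the toy scoring table are all hypothetical, and the toy score stands in for a real stickiness function that would combine global and local context.

```python
from typing import Callable, List, Sequence, Tuple

Segment = Tuple[str, ...]


def segment_tweet(
    tokens: Sequence[str],
    stickiness: Callable[[Segment], float],
    max_len: int = 5,  # assumed cap on segment length, for illustration only
) -> List[Segment]:
    """Return the segmentation that maximizes the sum of segment stickiness."""
    n = len(tokens)
    # best[i] is the best total score for the first i tokens.
    best = [float("-inf")] * (n + 1)
    best[0] = 0.0
    # back[i] is the start index of the last segment in that best split.
    back = [0] * (n + 1)

    for end in range(1, n + 1):
        for start in range(max(0, end - max_len), end):
            score = best[start] + stickiness(tuple(tokens[start:end]))
            if score > best[end]:
                best[end] = score
                back[end] = start

    # Walk the back-pointers to recover the segments in order.
    segments: List[Segment] = []
    end = n
    while end > 0:
        start = back[end]
        segments.append(tuple(tokens[start:end]))
        end = start
    segments.reverse()
    return segments


# Hypothetical scores: a stand-in for a learned stickiness function.
TOY_PHRASE_SCORES = {("new", "york"): 2.0}


def toy_stickiness(segment: Segment) -> float:
    """Single words get a small score; unknown multi-word segments get zero."""
    if len(segment) == 1:
        return 0.5
    return TOY_PHRASE_SCORES.get(segment, 0.0)


if __name__ == "__main__":
    print(segment_tweet("i love new york".split(), toy_stickiness))
```

With the toy scores, the phrase "new york" is kept as one segment because its score exceeds the sum of the scores of its two words taken separately.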


Named Entity Recognition, Social network communication, knowledge mining, Burst Analysis.





Copyright © 2013. All rights reserved.

International Journal of Science Engineering and Advance Technology is licensed under a Creative Commons Attribution 3.0 Unported License. Based on a work at IJSEat. Permissions beyond the scope of this license may be available at