Semantic Linking for Event-Based Tweet Classification

Abstract

Social media moves fast, and nowhere does it move faster than on Twitter. Tweets may be short, but the Twitter community is large: the platform has 320 million monthly active users. Identifying which tweets are event-related and classifying them into categories is a challenging task because of the peculiarities of Twitter's language and the lack of contextual information. The ability to understand and analyze the flow of messages on Twitter is an effective way to monitor what people are thinking, which trending topics are emerging, and which major events are affecting people's lives. In this article we detect events within the Twitter stream using machine learning. Our main approach is semantic linking against DBpedia, YAGO and other ontologies, which enriches the textual information in a tweet by replacing each named entity with a more generic semantic type. We used several supervised machine learning approaches to detect events and to measure the effect of semantically linking the named entities in tweets.

Introduction

The ability to reason about and analyze messages on Twitter is an effective way to examine what people are thinking and which trending events are affecting their lives. Event detection in tweets is a non-trivial task. Automated topic and event detection, which makes it possible to monitor trending topics and how events affect people's lives, is an emerging research field, and many approaches to this task have been proposed in the literature. Processing Twitter messages is also challenging because tweets are limited to 280 characters, contain little contextual information, and often contain data that is useless for analysis, such as stop words, emojis and misspelled words. In this paper we perform named entity recognition, link the recognized entities to related concepts, and analyze the resulting tweets. We use this linking to make individual tweets semantically meaningful, and we apply several machine learning approaches to measure the benefit of our approach in terms of precision and accuracy. Event detection on Twitter is difficult because tweets carry little relevant data and most tweets are not associated with events. Furthermore, conventional text-mining strategies are not well suited to tweets because of their short length, the high number of spelling and grammatical errors, and the frequent mixing of informal and different languages. We chose supervised learning for this project and treat the problem as multiclass classification of tweet data, since each event type forms its own class. Our goal is to train machine learning algorithms such as Naive Bayes (NB) and Support Vector Machines (SVM) on the tweet dataset we created and to validate the results with a confusion matrix. Furthermore, we replace named entities in the data with their semantic types from different ontologies and measure the impact of this semantic linking on the accuracy of the different classifiers, such as Naive Bayes (NB).

Before any analysis we preprocess the collected tweets. Many tweets contain URLs that carry little information about the tweet's semantics, so we remove these URLs. Extracting tweets via Tweepy also yields many duplicate tweets, so before preprocessing them further we remove duplicate tweets from the dataset.
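As an illustration of these first two cleaning steps, the following is a minimal sketch (not the authors' actual code) that strips URLs from tweet text with a regular expression and drops duplicate tweets with pandas; the column name and the example tweets are hypothetical.

```python
import re
import pandas as pd

URL_RE = re.compile(r"https?://\S+|www\.\S+")  # matches http(s) and www URLs

def remove_urls(text: str) -> str:
    """Strip URLs and collapse the whitespace they leave behind."""
    return re.sub(r"\s+", " ", URL_RE.sub("", text)).strip()

# Hypothetical tweets as they might come back from Tweepy; retweets often
# produce exact duplicates in the collected dataset.
tweets = pd.DataFrame({"text": [
    "Amsterdam Dance Event starts tonight! https://t.co/abc123",
    "Amsterdam Dance Event starts tonight! https://t.co/abc123",
    "Huge crowd at the opening concert",
]})

tweets["text"] = tweets["text"].map(remove_urls)
tweets = tweets.drop_duplicates(subset="text").reset_index(drop=True)
print(tweets)
```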
We then apply several further preprocessing steps: removing mentions; removing Unicode special characters (UTF-8 special characters are useful for adding small extra glyphs to text, but this extra information is not useful for our analysis); removing unwanted special symbols; word segmentation; removing emoticons; removing empty or blank lines; removing stop words (high-frequency words such as "is" and "at"); spell correction; removing extra spacing; and lemmatization. After preprocessing, the dataset can be analyzed and the machine learning approaches described below can be applied.

Sequential Labeling

We label each tweet with an event category. If a tweet is collected from an arts event such as the Amsterdam Dance Event, we add the label Arts to the dataset. Our strong assumption is that every tweet crawled from an arts event belongs to the Arts category.

Named Entity Recognition, Linking and Replacement

After the preprocessing phase, we perform named entity recognition, linking and replacement through the NERD API. Several extractors are available for semantic analysis; for this project we used the Dandelion API, TextRazor and DBpedia.

Dandelion API

The Dandelion API follows a general pattern for requesting data. All requests must be sent, via GET or POST, to the API endpoint, which follows this structure: https://api.dandelion.eu/api/product/methodpath/api-version. Each request must be authenticated: the Dandelion API implements authentication via a single token parameter that identifies the caller, and currently only one token is assigned to each user. A sketch of such a request is given at the end of this section.

WordVector

WordVector is a two-layer neural network that processes text. It takes a corpus of text as input and produces word vectors as output: it first builds a vocabulary from the training text and then learns a vector representation for each word. The resulting word vectors can be used as features in many natural language processing (NLP) and machine learning methods.

Classification

After feature extraction we perform classification to predict and analyze our dataset. We use a supervised machine learning approach: we have input variables (X) and an output variable (Y), and an algorithm learns the mapping function from input to output, Y = f(X). The goal is to approximate this mapping so well that, given new input data (x), we can predict the corresponding output variable (Y). We used a Naive Bayes classifier, a Decision Tree classifier and a Support Vector Machine (SVM) classifier.

Naive Bayes Classifier

Naive Bayes classifiers are a family of classification algorithms based on Bayes' theorem. They are not a single algorithm but a group of algorithms that share a common principle: every pair of features being classified is assumed to be independent of every other. This assumption of conditional independence is rarely true in real-world applications, hence the characterization "naive"; nevertheless, the algorithm tends to perform well and to learn quickly in a variety of supervised classification problems [4]. It is a simple probabilistic classifier that estimates probabilities by counting frequencies in a given dataset. The Naive Bayes classifier rests on two assumptions.
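To make the Dandelion request pattern described above concrete, here is a hedged sketch using the requests library. The endpoint path (datatxt/nex/v1), the parameter names (text, lang, include, token) and the annotations field of the response follow Dandelion's public documentation at the time of writing and should be treated as assumptions rather than as the authors' code; the token value is a placeholder.

```python
import requests

DANDELION_NEX = "https://api.dandelion.eu/datatxt/nex/v1"  # entity-extraction endpoint (assumed)
TOKEN = "YOUR_DANDELION_TOKEN"  # single per-user token used for authentication

def extract_entities(tweet_text: str):
    """Return (surface form, DBpedia URI, semantic types) for each entity found."""
    params = {
        "text": tweet_text,
        "lang": "en",
        "include": "types",   # request ontology types so entities can later be replaced
        "token": TOKEN,
    }
    resp = requests.get(DANDELION_NEX, params=params, timeout=10)
    resp.raise_for_status()
    return [(a["spot"], a["uri"], a.get("types", []))
            for a in resp.json().get("annotations", [])]

print(extract_entities("Amsterdam Dance Event starts tonight"))
```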
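The word-vector features described above could be produced, for example, with gensim's Word2Vec implementation, which is a two-layer neural model of exactly this kind. The sketch below is an illustration under the assumption of gensim 4.x, uses toy tokenized tweets, and represents each tweet by the average of its word vectors; it is not the authors' feature-extraction code.

```python
import numpy as np
from gensim.models import Word2Vec

# Toy tokenized tweets; in practice these would be the preprocessed tweets.
tokenized_tweets = [
    ["amsterdam", "dance", "event", "starts", "tonight"],
    ["huge", "crowd", "at", "the", "opening", "concert"],
    ["new", "exhibition", "opens", "at", "the", "museum"],
]

# Build the vocabulary and learn 100-dimensional word vectors.
model = Word2Vec(sentences=tokenized_tweets, vector_size=100,
                 window=5, min_count=1, workers=2, seed=1)

def tweet_vector(tokens, model):
    """Average the vectors of in-vocabulary tokens into one feature vector per tweet."""
    vecs = [model.wv[t] for t in tokens if t in model.wv]
    return np.mean(vecs, axis=0) if vecs else np.zeros(model.vector_size)

X = np.vstack([tweet_vector(t, model) for t in tokenized_tweets])
print(X.shape)  # (3, 100): one feature row per tweet
```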
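For the classification step, a hedged scikit-learn sketch is shown below. For simplicity it uses bag-of-words counts (which MultinomialNB requires to be non-negative) rather than the word-vector features, and the tiny labelled tweets are invented purely for illustration; it shows the three classifier families named above together with accuracy and a confusion matrix.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import MultinomialNB
from sklearn.tree import DecisionTreeClassifier
from sklearn.svm import LinearSVC
from sklearn.metrics import accuracy_score, confusion_matrix

# Hypothetical labelled tweets: text -> event category.
texts = [
    "amsterdam dance event starts tonight", "dj lineup announced for the festival",
    "city marathon route closed for runners", "final score of the championship game",
    "new exhibition opens at the museum", "orchestra concert in the park tonight",
    "team wins the league title", "tickets for the dance festival sold out",
]
labels = ["Arts", "Arts", "Sports", "Sports", "Arts", "Arts", "Sports", "Arts"]

X_train, X_test, y_train, y_test = train_test_split(
    texts, labels, test_size=0.25, random_state=0, stratify=labels)

vectorizer = CountVectorizer()
Xtr = vectorizer.fit_transform(X_train)
Xte = vectorizer.transform(X_test)

for clf in (MultinomialNB(), DecisionTreeClassifier(random_state=0), LinearSVC()):
    clf.fit(Xtr, y_train)
    pred = clf.predict(Xte)
    print(type(clf).__name__, accuracy_score(y_test, pred))
    print(confusion_matrix(y_test, pred, labels=["Arts", "Sports"]))
```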
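The probability computation mentioned above can be written out explicitly. Under the conditional-independence assumption, Bayes' theorem reduces the classification rule to the standard form below (a generic statement of Naive Bayes, not a formula taken from this paper):

```latex
P(y \mid x_1, \dots, x_n) \;\propto\; P(y) \prod_{i=1}^{n} P(x_i \mid y),
\qquad
\hat{y} \;=\; \arg\max_{y} \; P(y) \prod_{i=1}^{n} P(x_i \mid y)
```

Here P(y) and P(x_i | y) are estimated by counting class and feature frequencies in the training dataset, which is exactly the frequency-counting procedure described in the Naive Bayes paragraph above.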