
doi:10.3808/jei.201500311
Copyright © 2017 ISEIS. All rights reserved

Atmospheric Environment and Quality of Life Information Extraction from Twitter with the Use of Self-Organizing Maps

M. Riga1*, M. Stocker2, M. Rönkkö2, K. Karatzas1 and M. Kolehmainen2

  1. Department of Mechanical Engineering, Aristotle University of Thessaloniki, Thessaloniki GR-54124, Greece
  2. Department of Environmental Science, University of Eastern Finland, Kuopio FI-70211, Finland

*Corresponding author. Tel: +30-2310-994359 Fax: +30-2310-994176 Email: mriga@isag.meng.auth.gr

Abstract


The emergence of Web 2.0 technologies has dramatically changed not only the way users perceive and interact on the Internet but also the way they influence a community and act in real life. With the rapid rise in the use and popularity of social media, people tend to share opinions and observations on almost any subject or event in their everyday lives. Consequently, microblogging websites have become a rich source of user-generated information. The main opportunity is to take advantage of the wisdom of the crowd and to benefit from collective intelligence in any applicable domain. To this end, we focus on the problem of mining and extracting knowledge from unstructured textual content for the atmospheric environment domain and its effect on quality of life. As our main contribution, we propose a methodology that combines unsupervised learning methods to analyze posts from Twitter and cluster textual data into concepts with semantically similar context. By applying Self-Organizing Maps and k-means clustering, we identify possible inter-relationships and patterns among the words used in tweets that can form higher-level concepts of atmospheric and health-related topics of discussion. We succeed in grouping tweets at description levels ranging from more generic to more specific, according to the selected number of clusters. Strong clusters with significant semantic relatedness among their content are revealed, and hidden relations between concepts and their related semantics are uncovered. The results highlight the potential of social media text streams as a highly valuable supplementary source of environmental information and situation awareness.

Keywords: air quality, clustering, computational intelligence, k-means, semantic analysis, self-organizing maps, text mining, twitter
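
The two-stage idea outlined in the abstract (a Self-Organizing Map followed by k-means) can be illustrated with a minimal sketch. This is not the authors' implementation. The bag-of-words representation, the 3x3 map size, the learning-rate and neighborhood schedules, and the example texts are all assumptions made for illustration. The function names (bag_of_words, train_som, kmeans, cluster_texts) are hypothetical.

```python
import numpy as np

# Invented example posts (not data from the paper).
EXAMPLE_TWEETS = [
    "smog and haze over the city today",
    "air pollution smog haze again",
    "asthma cough and sore throat",
    "cough asthma attack this morning",
]

def bag_of_words(texts):
    """Return (matrix, vocabulary) of L2-normalized term counts (assumed features)."""
    vocab = sorted({w for t in texts for w in t.lower().split()})
    index = {w: i for i, w in enumerate(vocab)}
    X = np.zeros((len(texts), len(vocab)))
    for row, t in enumerate(texts):
        for w in t.lower().split():
            X[row, index[w]] += 1.0
    norms = np.linalg.norm(X, axis=1, keepdims=True)
    return X / np.maximum(norms, 1e-12), vocab

def train_som(X, rows=3, cols=3, epochs=50, lr0=0.5, sigma0=1.5, seed=0):
    """Train a small rectangular SOM online; return prototypes (rows*cols, dim)."""
    rng = np.random.default_rng(seed)
    grid = np.array([(r, c) for r in range(rows) for c in range(cols)], dtype=float)
    # Initialize prototypes from randomly chosen input vectors.
    W = X[rng.integers(0, len(X), size=rows * cols)].copy()
    total, step = epochs * len(X), 0
    for _ in range(epochs):
        for i in rng.permutation(len(X)):
            frac = step / total
            lr = lr0 * (1.0 - frac)                  # decaying learning rate
            sigma = sigma0 * (1.0 - frac) + 0.3      # shrinking neighborhood
            bmu = np.argmin(np.linalg.norm(W - X[i], axis=1))  # best-matching unit
            d2 = np.sum((grid - grid[bmu]) ** 2, axis=1)       # grid distances
            h = np.exp(-d2 / (2.0 * sigma ** 2))               # Gaussian neighborhood
            W += lr * h[:, None] * (X[i] - W)
            step += 1
    return W

def kmeans(points, k, iters=50, seed=0):
    """Plain Lloyd's k-means; returns one cluster label per point."""
    rng = np.random.default_rng(seed)
    centers = points[rng.choice(len(points), size=k, replace=False)].copy()
    labels = np.zeros(len(points), dtype=int)
    for _ in range(iters):
        dists = np.linalg.norm(points[:, None, :] - centers[None, :, :], axis=2)
        labels = np.argmin(dists, axis=1)
        for j in range(k):
            if np.any(labels == j):  # keep the old center if a cluster is empty
                centers[j] = points[labels == j].mean(axis=0)
    return labels

def cluster_texts(texts, k=2, rows=3, cols=3):
    """Map texts to SOM units, cluster the unit prototypes, label each text."""
    X, _ = bag_of_words(texts)
    W = train_som(X, rows, cols)
    unit_labels = kmeans(W, k)
    bmus = np.argmin(np.linalg.norm(X[:, None, :] - W[None, :, :], axis=2), axis=1)
    return unit_labels[bmus]

if __name__ == "__main__":
    print(cluster_texts(EXAMPLE_TWEETS, k=2))
```

Clustering the map prototypes rather than the raw tweet vectors means that k, the number of clusters, can be varied without retraining the map. This matches the abstract's point that grouping ranges from more generic to more specific depending on the selected number of clusters.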

