Network Analysis meets Text Mining for Social Media Analysis

Speaker: Bernd Wiswedel, KNIME.com AG

Nearly every company these days, small or big, local or international, wants to make use of its social media data. Many pre-packaged applications are available to screen user sentiment or to map user circles. They rely on channel reporting tools, score-carding systems, or predictive analytic techniques (primarily text mining).

Each of these three approaches has its useful aspects, but each also has limitations. In this presentation we will discuss a fourth approach: using a predictive analytics platform (KNIME) that includes not only text mining but also network analysis and other predictive techniques such as clustering, to overcome the limitations of the previous approaches and to generate new fact-based insight.

KNIME [naim] is a user-friendly graphical workbench for the entire analysis process: data access, data transformation, initial investigation, powerful predictive analytics, visualisation and reporting. The open integration platform provides over 1000 modules (nodes), including those of the KNIME community and its extensive partner network.

This approach was first used at a major European telco. Since that data was proprietary, we replicated the work on publicly available data in order to explain the approach in detail. In this project, text mining and network analytics were combined to describe each user of a forum in terms of leadership and sentiment. Network analytics was used to calculate an authority score for each forum user, and text mining was used to measure each user's attitude in the forum. Plotting the authority/follower score against the attitude measure in a scatter plot made it easy to detect the most extreme users in terms of attitude, and then to observe their degree of influence on the other forum participants.
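To make the combination of the two scores concrete, here is a minimal Python sketch. It is not the KNIME workflow itself, and it rests on assumptions the abstract does not confirm: that the authority score is computed HITS-style (hub and authority scores on a "who replies to whom" graph) and that attitude is a simple count of positive minus negative words. The posts, user names, and word lists are all invented for illustration.

```python
from collections import defaultdict

# Hypothetical toy data: (author, user being replied to, post text).
POSTS = [
    ("bob", "alice", "great answer, thanks"),
    ("carol", "alice", "great idea, I love it"),
    ("dave", "alice", "this is bad and wrong"),
    ("alice", "bob", "thanks, good point"),
    ("dave", "bob", "terrible, just bad"),
]

# Hypothetical sentiment word lists (assumption, not from the talk).
POSITIVE = {"great", "thanks", "love", "good"}
NEGATIVE = {"bad", "wrong", "terrible"}


def normalize(scores):
    """Scale a score dict to unit Euclidean length."""
    norm = sum(v * v for v in scores.values()) ** 0.5 or 1.0
    return {n: v / norm for n, v in scores.items()}


def hits(edges, iterations=50):
    """Return (hub, authority) score dicts for directed edges (src, dst)."""
    nodes = {n for edge in edges for n in edge}
    hub = {n: 1.0 for n in nodes}
    auth = {n: 1.0 for n in nodes}
    for _ in range(iterations):
        # Authority: sum of the hub scores of users who reply to this user.
        new_auth = {n: 0.0 for n in nodes}
        for src, dst in edges:
            new_auth[dst] += hub[src]
        auth = normalize(new_auth)
        # Hub: sum of the authority scores of users this user replies to.
        new_hub = {n: 0.0 for n in nodes}
        for src, dst in edges:
            new_hub[src] += auth[dst]
        hub = normalize(new_hub)
    return hub, auth


def attitude(posts):
    """Positive minus negative word count per author."""
    score = defaultdict(int)
    for author, _, text in posts:
        for word in text.lower().replace(",", " ").split():
            if word in POSITIVE:
                score[author] += 1
            elif word in NEGATIVE:
                score[author] -= 1
    return dict(score)


edges = [(author, target) for author, target, _ in POSTS]
hub, auth = hits(edges)
att = attitude(POSTS)

# Each (authority, attitude) pair is one point of the scatter plot.
for user in sorted(auth):
    print(f"{user}: authority={auth[user]:.2f}, attitude={att.get(user, 0)}")
```

In this toy data the user with the most extreme attitude also has zero authority. That is the kind of reading the scatter plot is meant to support.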

Outlier identification, though, helps neither with automatic user characterization nor with describing the remaining, more average users. We therefore turned to traditional data analytics to define a few groups with more general user features. We identified a number of different clusters, including a very large cluster of inactive, neutral users, a smaller cluster of positive and very active users, and an even smaller cluster of negative, very active users. Different actions were then devised for the different clusters of users.
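The grouping step can be sketched in the same spirit. The abstract does not name the clustering algorithm, so plain k-means is used here purely as an assumption. The two features (number of posts, attitude score), the user values, and the starting centroids are hypothetical.

```python
# Hypothetical users as (number of posts, attitude score).
USERS = [
    (1, 0), (2, 0), (1, 1), (0, 0), (2, -1), (1, 0),  # inactive, neutral
    (20, 8), (25, 10), (22, 9),                       # active, positive
    (21, -9), (24, -8),                               # active, negative
]


def nearest(point, centroids):
    """Index of the centroid closest to point (squared Euclidean distance)."""
    return min(
        range(len(centroids)),
        key=lambda i: sum((a - b) ** 2 for a, b in zip(point, centroids[i])),
    )


def kmeans(points, centroids, iterations=20):
    """Plain k-means; returns (label per point, final centroids)."""
    for _ in range(iterations):
        groups = [[] for _ in centroids]
        for p in points:
            groups[nearest(p, centroids)].append(p)
        # Move each centroid to the mean of its group; keep it if the group is empty.
        centroids = [
            tuple(sum(axis) / len(g) for axis in zip(*g)) if g else c
            for g, c in zip(groups, centroids)
        ]
    return [nearest(p, centroids) for p in points], centroids


# Deterministic start: one seed taken from each region of the toy data.
labels, centroids = kmeans(USERS, [USERS[0], USERS[6], USERS[9]])
sizes = [labels.count(k) for k in range(3)]
print(sizes)
```

On real data the features would normally be standardized first, and the number of clusters would be chosen by inspection rather than fixed in advance.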