Volume 10 - Issue 4
Reference architecture for social networks graph analysis
- Maxim Kolomeets
Laboratory of Computer Security Problems, St. Petersburg Institute for Informatics and Automation (SPIIRAS), 199178, St. Petersburg, Russia, Université de Toulouse – IRIT, 31400, Toulouse, France, ITMO University, 197101, Saint-Petersburg, Russia
kolomeec@comsec.spb.ru
- Amira Benachour
LAAS CNRS, Université de Toulouse, 31031, Toulouse, France, University of Sciences and Technology Houari Boumediene, BP 32 El-Alia, 16111, Algiers, Algeria
abenachour@laas.fr
- Didier El Baz
LAAS CNRS, Université de Toulouse, 31031, Toulouse, France, ITMO University, 197101, Saint-Petersburg, Russia
elbaz@laas.fr
- Andrey Chechulin
Laboratory of Computer Security Problems, St. Petersburg Institute for Informatics and Automation (SPIIRAS), 199178, St. Petersburg, Russia
chechulin@comsec.spb.ru
- Martin Strecker
Université de Toulouse – IRIT, 31400, Toulouse, France
martin.strecker@irit.fr
- Igor Kotenko
Laboratory of Computer Security Problems, St. Petersburg Institute for Informatics and Automation (SPIIRAS), 199178, St. Petersburg, Russia
kotenko@comsec.spb.ru
Keywords: Social networks, graph analysis, big data, parallel computing, GPU, graph databases, data visualization, real-world networks.
Abstract
Social networks are often analyzed using graph data structures. Such graphs may have a complex
structure that makes their operational analysis difficult or even impossible. This paper discusses
the key problems that researchers face when processing big graphs in this area.
The paper proposes a reference architecture for the storage, analysis and visualization of social network
graphs, as well as a big graph processing “pipeline”. Based on this pipeline, it is possible to develop
a tool that can filter, aggregate and process big social network graphs in parallel
while taking their structure into account. The paper presents an implementation of this
pipeline that uses the OrientDB graph database for storage, parallel processing to calculate graph measures,
and the D3 library to visualize big graphs. The paper also reports experiments on calculating
the betweenness centrality of several graphs collected from the VKontakte social network.
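Betweenness centrality, the measure used in the experiments, counts how often a vertex lies on shortest paths between other vertices. The sketch below is a minimal, sequential version of Brandes' algorithm for unweighted graphs. It is not the authors' implementation; the function name `betweenness_centrality`, the adjacency-dict input and the `undirected` flag are assumptions made for illustration. Each iteration of the outer loop over source vertices is independent of the others, which is one common place to split the work across threads or GPU blocks.

```python
from collections import deque


def betweenness_centrality(adj, undirected=True):
    """Brandes' betweenness centrality for an unweighted graph (illustrative sketch).

    adj: dict mapping every vertex to a list of its neighbours (hypothetical input
    format). For an undirected graph, list each edge in both directions; scores are
    then halved because every unordered pair (s, t) is counted twice.
    """
    bc = {v: 0.0 for v in adj}
    for s in adj:  # independent per source: a natural unit of parallel work
        stack = []                      # vertices in non-decreasing distance from s
        preds = {v: [] for v in adj}    # predecessors on shortest paths from s
        sigma = dict.fromkeys(adj, 0)   # number of shortest paths from s
        sigma[s] = 1
        dist = dict.fromkeys(adj, -1)   # BFS distance from s (-1 = unvisited)
        dist[s] = 0
        queue = deque([s])
        while queue:                    # BFS phase: count shortest paths
            v = queue.popleft()
            stack.append(v)
            for w in adj[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        delta = dict.fromkeys(adj, 0.0)  # dependency of s on each vertex
        while stack:                     # back-propagation phase, farthest first
            w = stack.pop()
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1.0 + delta[w])
            if w != s:
                bc[w] += delta[w]
    if undirected:
        bc = {v: c / 2.0 for v, c in bc.items()}
    return bc


if __name__ == "__main__":
    # A tiny undirected "friendship" path a - b - c: only b lies between others.
    path = {"a": ["b"], "b": ["a", "c"], "c": ["b"]}
    print(betweenness_centrality(path))
```

Running the example prints `{'a': 0.0, 'b': 1.0, 'c': 0.0}`. The algorithm needs O(|V| + |E|) memory and O(|V|·|E|) time on unweighted graphs, which is why the computation becomes expensive on big social network graphs.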