- Athul Harilal
ST Electronics-SUTD Cyber Security Laboratory, Singapore University of Technology and Design, Singapore
athul_harilal@sutd.edu.sg
- Flavio Toffalini
ST Electronics-SUTD Cyber Security Laboratory, Singapore University of Technology and Design, Singapore
flavio_toffalini@mymail.sutd.edu.sg
- Ivan Homoliak
ST Electronics-SUTD Cyber Security Laboratory, Singapore University of Technology and Design, Singapore
ivan_homoliak@sutd.edu.sg
- John Castellanos
ST Electronics-SUTD Cyber Security Laboratory, Singapore University of Technology and Design, Singapore
john_castellanos@mymail.sutd.edu.sg
- Juan Guarnizo
ST Electronics-SUTD Cyber Security Laboratory, Singapore University of Technology and Design, Singapore
juan_guarnizo@mymail.sutd.edu.sg
- Soumik Mondal
ST Electronics-SUTD Cyber Security Laboratory, Singapore University of Technology and Design, Singapore
mondal_soumik@sutd.edu.sg
- Martin Ochoa
Department of Applied Mathematics and Computer Science, Universidad del Rosario, Bogota, Colombia
martin.ochoa@urosario.edu.co
Keywords: Journal of Wireless Mobile Networks, Ubiquitous Computing, Dependable Applications
Abstract
In this paper, we present open research questions and options for data analysis of our previously designed
dataset called TWOS: The Wolf of SUTD. Through these research questions, we illustrate the
potential use of the TWOS dataset in multiple areas of cyber security, which are not limited
to malicious insider threat detection but also include authorship verification and identification,
continuous authentication, and sentiment analysis. To support investigation of the research questions,
we present several state-of-the-art features applicable to the collected data sources, and thus
provide researchers with guidance on how to start with data analysis. The TWOS dataset was collected
during a gamified competition that was devised in order to obtain realistic instances of malicious insider
threat. The competition simulated user interactions in/among competing companies, where two
types of behaviors (normal and malicious) were incentivized. For the case of malicious behavior,
we designed two types of malicious periods that were intended to capture the behavior of two types
of insiders – masqueraders and traitors. The game involved the participation of 6 teams consisting
of 4 students each, who competed with each other for a period of 5 days. Their activities were monitored
by several data collection agents, which produced data from the mouse, keyboard, process and file-system
monitor, network traffic, email, and login/logout data sources. In total, we obtained 320 hours of
active participation that included 18 hours of masquerader data and at least two instances of traitor
data. In addition to expected malicious behaviors, students explored various defensive and offensive
strategies such as denial of service attacks and obfuscation techniques, in an effort to get ahead in
the competition. The TWOS dataset was made publicly accessible for further research purposes. In
this paper we present the TWOS dataset that contains realistic instances of insider threats based on a
gamified competition. The competition simulated user interactions in/among competing companies,
where two types of behaviors (normal and malicious) were incentivized. For the case of malicious
behavior, we designed sessions for two types of insider threats (masqueraders and traitors). The game
involved the participation of 6 teams consisting of 4 students who competed with each other for a
period of 5 days, while their activities were monitored across several heterogeneous data sources
(mouse, keyboard, process and file-system monitor, network traffic, emails and login/logout). In total,
we obtained 320 hours of active participation that included 18 hours of masquerader data and
at least two instances of traitor data. In addition to expected malicious behaviors, students explored
various defensive and offensive strategies such as denial of service attacks and obfuscation techniques,
in an effort to get ahead in the competition. Furthermore, we illustrate the potential use of
the TWOS dataset in multiple areas of cyber security, which are not limited to malicious insider threat
detection but also include areas such as authorship verification and identification, continuous authentication, and sentiment analysis. We also present several state-of-the-art features that can be extracted from
different data sources in order to guide researchers in the analysis of the dataset. The TWOS dataset
is publicly accessible for further research purposes.