Mouse Dynamics Based User Recognition Using Deep L
DOI: 10.2478/ausi-2020-0003
1 Introduction
1
https://github.com/norbertFejer/AFE_Project
Mouse dynamics biometric 41
2 Related works
Several behavioural biometrics are already implemented in operational authen-
tication systems. These methods are most often used to continuously verify
the user’s identity. On-line courses use keystroke dynamics to continuously
verify the identity of the registered users. While keystroke data may contain
sensitive personal information, such as names or passwords, mouse dynamics
do not contain sensitive data at all. In contrast to physiological biometrics,
which require the user to use a special sensor, behavioural biometric data
can usually be collected without the consent of the user.
One of the first studies regarding the performance of mouse dynamics au-
thentication was written by Gamboa and Fred [8]. They implemented a mem-
ory game as a web application and collected the mouse interactions of the
game users. Mouse interactions were segmented into so-called mouse strokes,
defined as mouse movements performed between successive clicks. A set of 63
handcrafted features was extracted from these strokes. The feature extraction
phase was followed by the learning phase which consisted of the estimation of
the probability density functions of each user interaction. The system perfor-
mance based on a sequence of 10 strokes was 11.8% EER (Equal Error Rate).
Unfortunately, this data set is not publicly available.
The first publicly available mouse data set was published in 2007 by Ahmed
and Traore [1], although it contains segmented and processed data rather than
raw data. The data set contains general computer usage mouse data of 22 users,
collected while the users performed their daily work on their computers. Raw
mouse data was segmented into three types of action: PC - point and click:
mouse movement ending in a mouse click; MM - general mouse movement;
DD - drag and drop. Histogram-based features were extracted from sequences
of consecutive mouse actions. On this data set of 22 users they reported
2.46% EER using 2000 mouse actions for user authentication. The authors
extended their data set to 48 users and published a new study on continuous
authentication based on this extended data set [2].
Shen et al. published three papers on the topic of user authentication based
on mouse dynamics [10], [11], [12]. Two data sets were also collected, one for
static (57 subjects) and one for continuous user authentication (28 subjects)
through mouse dynamics. Several machine learning algorithms and anomaly
detectors were tested. Low equal error rates (below 1% EER) were obtained
by using a large amount of mouse movement data (e.g. 30 minutes).
42 M. Antal, N. Fejér
Zheng et al. also investigated the user authentication problem in their stud-
ies [13], [14]. They proposed some novel features such as angle based metrics.
They obtained 1.3% EER using a sequence of 20 mouse actions. Unfortunately,
their data sets containing general mouse usage data are also private.
Another study was conducted by Feher et al. [6]. They also collected their
own dataset containing data from 25 subjects. Their best performance was
8.53% EER using a sequence of 30 mouse actions. All these studies were based
on classical machine learning algorithms using some handcrafted feature sets.
The first study to use deep neural networks for mouse dynamics was published
by Chong et al. [5]. They investigated one- and two-dimensional convolutional
neural networks (CNN) for mouse dynamics. While the 1D-CNN network was
trained on the time series of the mouse movement trajectory, the 2D-CNN
network was trained on images of mouse movement trajectories.
Despite the loss of time information in the case of 2D-CNN, this model out-
performed both 1D-CNN and SVM models using handcrafted features. They
extended their study [4] by considering Long Short-Term Memory (LSTM)
and hybrid CNN-LSTM networks as well. Among these models the 2D-CNN
model performed best resulting in a 0.96 average AUC (Area Under the Curve)
for the Balabit data set.
3 Methods
3.1 Data preprocessing
A mouse dynamics data set consists of several log files containing mouse events
with the following information: the x and y coordinates, the timestamp and
the type of event. Based on the type of event we distinguish mouse move,
mouse click, drag and drop, and scroll actions. A sequence of mouse
movement events usually ends in a mouse click, but some mouse movement
sequences have no closing click. A drag and drop operation performed by
a user results in a sequence of drag mouse events. All mouse events contain
the x and y coordinates of the mouse pointer with the exception of the mouse
scroll event. Therefore, scroll events were not considered.
Mouse events were segmented into sequences. A sequence ended when the time
difference between two consecutive mouse events exceeded a threshold. These
sequences were then split into fixed-size blocks. When the length of a
sequence is not a multiple of the block size, a shorter remainder is left
over. These shorter sequences can either be dropped or concatenated to obtain
full-length blocks. Both cases were evaluated in our measurements.
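
The segmentation described above can be sketched in a few lines of Python.
This is only an illustration under stated assumptions, not the authors'
implementation: the event layout (timestamp, x, y), the function names
split_sequences and to_blocks, and the choice to pool the leftovers of all
sequences before cutting extra blocks are hypothetical. The source does not
specify how the shorter sequences are concatenated.

```python
from typing import List, Tuple

# Hypothetical event layout: (timestamp, x, y). Scroll events are assumed
# to have been filtered out already, since they carry no coordinates.
Event = Tuple[float, int, int]


def split_sequences(events: List[Event], max_gap: float) -> List[List[Event]]:
    """Start a new sequence whenever the pause between events exceeds max_gap."""
    sequences: List[List[Event]] = []
    current: List[Event] = []
    for event in events:
        if current and event[0] - current[-1][0] > max_gap:
            sequences.append(current)
            current = []
        current.append(event)
    if current:
        sequences.append(current)
    return sequences


def to_blocks(sequences: List[List[Event]], block_size: int,
              concatenate_short: bool) -> List[List[Event]]:
    """Cut sequences into fixed-size blocks; drop or pool the short remainders."""
    blocks: List[List[Event]] = []
    leftovers: List[Event] = []
    for seq in sequences:
        full = len(seq) // block_size * block_size
        for i in range(0, full, block_size):
            blocks.append(seq[i:i + block_size])
        leftovers.extend(seq[full:])  # remainder shorter than one block
    if concatenate_short:
        # Assumption: remainders are joined in order and cut into extra blocks;
        # whatever is still too short at the very end is discarded.
        full = len(leftovers) // block_size * block_size
        for i in range(0, full, block_size):
            blocks.append(leftovers[i:i + block_size])
    return blocks
```

With a block size of 128, as used for the Balabit data set, the second
setting yields at least as many blocks as the first, at the cost of blocks
that span a pause in the user's activity.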
3.2 1D-CNN
One-dimensional convolutional neural networks (1D-CNNs) are used for time
series modeling. As the mouse movement sequences x(t) and y(t) are
one-dimensional time series, 1D-CNN models are well suited to modeling this
type of signal.
Our 1D-CNN architecture can be seen in Figure 1.
A tower model was used with different kernel sizes, which helped the net-
work to learn input sequences on different time scales. We used the sigmoid
activation function and a dropout layer with 0.15 probability to avoid overfit-
ting. The network was trained in Keras [9] using the Adam optimizer (learning
rate: 0.002, decay: 0.0001, loss function: binary cross-entropy). Training ran
for 16 epochs with a batch size of 32.
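
To illustrate why towers with different kernel sizes see the input on
different time scales, the following dependency-free sketch applies kernels
of several lengths to the same signal and passes each response through a
sigmoid. It is not the architecture of Figure 1: the function names, the
example kernels, and the use of a global maximum to summarise each tower are
assumptions made only for this illustration. In a trained network the kernel
weights are learned rather than fixed.

```python
import math
from typing import List, Sequence


def conv1d_valid(signal: Sequence[float], kernel: Sequence[float]) -> List[float]:
    """Slide the kernel over the signal without padding, then apply a sigmoid.

    As in most deep learning frameworks, the kernel is not flipped, so this
    is strictly a cross-correlation.
    """
    k = len(kernel)
    out = []
    for i in range(len(signal) - k + 1):
        s = sum(signal[i + j] * kernel[j] for j in range(k))
        out.append(1.0 / (1.0 + math.exp(-s)))
    return out


def tower_features(signal: Sequence[float],
                   kernels: Sequence[Sequence[float]]) -> List[float]:
    """One 'tower' per kernel: convolve, then keep the strongest response."""
    return [max(conv1d_valid(signal, kernel)) for kernel in kernels]
```

A short kernel such as [-1, 1] reacts to a change between two neighbouring
samples, while a longer kernel aggregates over a wider window of the same
block.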
common practice to use well-proven models from the published literature. This
means that both the architecture and the parameters of the model are reused.
In this study we used transfer learning in a slightly different way. As a first step
we developed our own model architecture. Thereafter we trained our model
on a large data set and saved the model. This pre-trained model was reused
for all the measurements performed on another data set. In other words, we
transferred only the learned representation, that is, the knowledge of how to
extract the features.
4 Experiments
4.1 Data sets
In this study we used two public data sets: the Balabit Mouse Dynamics
Challenge data set [7] and the DFL data set [3].
The Balabit Mouse Dynamics Challenge data set contains timing and posi-
tioning information of mouse pointers. As the authors of the data set state, it
can be used for evaluating the performance of user authentication and iden-
tification systems based on mouse dynamics. The data set contains mouse
dynamics data of 10 users, and is divided into training and test sessions where
the training sessions are much longer than the test sessions.
The DFL data set contains mouse dynamics data of 21 users (15 male and 6
female). The raw data format is similar to that of the Balabit data set, so it
also contains timing and positioning information of mouse pointers. A data
collector application installed on the users’ computers logged their mouse
dynamics data, so the acquisition of the data was uncontrolled. The
sessions of this data set are not divided into training and test sessions. The
details of the data set are available at:
https://ms.sapientia.ro/~manyi/DFL.html.
Table 1 shows the quantity of data available for training under the two
settings presented in Section 3.1. The second column of the table shows
the number of blocks available for each user of the data set when we drop the
short sequences, and the third column contains the number of blocks in the
case of concatenating the shorter sequences into full-size blocks.
Table 1: Number of blocks for each user of the Balabit data set. Each block
contains 128 mouse events.
when measuring the performance of a classifier, it is not always the best choice,
e.g. when the data set is highly imbalanced. A commonly used metric when
measuring the performance of biometric systems is the Receiver Operating
Characteristic (ROC) curve. This curve plots the true positive rate (TPR)
against the false positive rate (FPR), and the area under the curve (ROC
AUC) is often used to compare the performances of different biometric systems.
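
As a worked illustration of the ROC AUC, the sketch below computes it
directly from labels and classifier scores, using the known equivalence
between the area under the ROC curve and the probability that a randomly
chosen positive sample receives a higher score than a randomly chosen
negative one (ties count as one half). The function name roc_auc is
hypothetical, and the quadratic loop is chosen for clarity rather than speed.
The evaluation code used in the study is not shown in the text.

```python
from typing import Sequence


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Area under the ROC curve via pairwise comparison of scores.

    labels: 1 for genuine (positive) samples, 0 for impostor (negative) ones.
    scores: higher values mean the classifier is more confident in 'genuine'.
    """
    pos = [s for label, s in zip(labels, scores) if label == 1]
    neg = [s for label, s in zip(labels, scores) if label == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present to compute ROC AUC")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # a tie counts as half a correct ordering
    return wins / (len(pos) * len(neg))
```

A value of 1.0 means every genuine block is scored above every impostor
block, while 0.5 corresponds to a classifier that ranks them at random.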
From the point of view of training the models, we distinguish three cases: (i)
models trained from scratch using the training data from the Balabit data set
– PLAIN models; (ii) models using transfer learning, pre-trained on the DFL
data set – TRANSFER1 models; (iii) models initialised with transfer learning,
whose weights were then updated using the training data from the Balabit data
set – TRANSFER2 models. The last case is similar to the PLAIN one, except
that instead of starting from random weights we adjust the weights obtained
from the TRANSFER1 model.
In the case of the identification measurements, we trained a single classifier
on the training data, either balanced (the same number of blocks from each
user) or imbalanced (all the available data from each user), and then used
the same number of test blocks from each user for computing the evaluation
metrics.
In the case of the authentication measurements, we trained a separate model
for each user using the same number of positive and negative samples. In the
first case (300), we took 300 positive blocks of data from a given user, then
the same number of negative blocks was selected from the remaining users. The only
user not having 300 blocks of data is user35 (see Table 1). In order to increase
the number of training examples we used data augmentation. We added a
random noise drawn from a uniform distribution in the range [−ε, ε] to each
signal (we used ε = 0.2). Data augmentation was performed independently on
x(t) and y(t) signals. In the second case (ALL), we considered all the positive
data available from a given user, then the same number of negative data was
selected from the remaining users.
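
The construction of a per-user training set can be sketched as follows. This
is a minimal illustration under assumptions, not the authors' code: the
function names, the eps parameter name for the noise half-width (0.2 in the
text), and the dictionary layout blocks_by_user are hypothetical. The text
also does not say whether the negative blocks are drawn at random or evenly
across the remaining users, so uniform random sampling is assumed here.

```python
import random
from typing import Dict, List, Sequence, Tuple


def augment(signal: Sequence[float], eps: float, rng: random.Random) -> List[float]:
    """Add independent uniform noise from [-eps, eps] to every sample."""
    return [v + rng.uniform(-eps, eps) for v in signal]


def augment_block(xs: Sequence[float], ys: Sequence[float], eps: float,
                  rng: random.Random) -> Tuple[List[float], List[float]]:
    """Perturb the x(t) and y(t) signals independently of each other."""
    return augment(xs, eps, rng), augment(ys, eps, rng)


def balanced_training_set(blocks_by_user: Dict[str, list], user: str,
                          n_positive: int, rng: random.Random) -> Tuple[list, list]:
    """Take up to n_positive blocks of `user` and as many blocks from the others.

    Assumes the remaining users together hold at least as many blocks as
    the number of positives, otherwise rng.sample raises ValueError.
    """
    positives = blocks_by_user[user][:n_positive]
    pool = [b for u, blocks in blocks_by_user.items() if u != user for b in blocks]
    negatives = rng.sample(pool, len(positives))
    return positives, negatives
```

For a user with fewer than 300 blocks, augmented copies produced by
augment_block could be appended to the positives until the target count is
reached. For the ALL setting, n_positive would simply be the number of blocks
available for that user.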
Regardless of the measurement type we always separated 70 blocks of data
from each user for evaluating the model. Therefore, all types of training were
evaluated using the same amount of test data.
We used a single pre-trained model for transfer learning. This model was
trained on the DFL data set. Therefore, we transferred the learned data rep-
resentation from one data set to another.
All the evaluations were performed in Python 3.6.8 (Anaconda distribution)
using Keras [9].
4.4 Results
4.4.1 Biometric identification
The results shown in Table 2 were obtained using full-size blocks of mouse
events, dropping the shorter mouse event sequences (see subsection 3.1).
The measurements were repeated for the other case where the training data
included concatenations of shorter series. Table 3 shows the comparative re-
sults for the two cases.
We should also note that training the models on all the available positive
data (ALL) resulted in better performance for all types of training (see
Figure 2). Not only are the average AUC values higher, but the standard
deviations are also much lower. This means that performance differs only
slightly between users.
We compared our best results with other results obtained on the Balabit
data set using approximately the same size of mouse sequences for predicting
the authenticity of the users. The comparison is shown in Table 6. It can
be seen that our model brings a significant improvement over the 1D-CNN
model of Chong et al. [4]; moreover, it outperforms their optimized 2D-CNN
model.
5 Conclusions
In this study we proposed a novel 1D-CNN model for user authentication based
on mouse dynamics. The advantage of our model over classical machine
learning models is that there is no longer a need for ad-hoc features; the
model is able to learn the features from raw data. We also demonstrated that
transfer learning, that is, learning the data representation on an independent
large data set, could improve the performance of the authentication system.
The results show that our 1D-CNN model performs better than the other CNN
models proposed for the same task.

Figure 2: Authentication results for the Balabit dataset. Training data: 300 vs.
all. Training methods: PLAIN, TRANSFER1, TRANSFER2. Each box shows
the distribution of users’ performances (AUC) using the given training data
and method. (Axes: training type on the x-axis, AUC on the y-axis; legend:
300, ALL.)
Acknowledgements
The work of Norbert Fejér was supported by Accenture Industrial Solutions.
References
[1] A. A. E. Ahmed, I. Traore, A new biometric technology based on mouse
dynamics, IEEE Transactions on Dependable and Secure Computing 4, 3 (2007)
165–179.
[2] A. A. E. Ahmed, I. Traore, Dynamic sample size detection in continuous
authentication using sequential sampling, In Proceedings of the 27th Annual
Computer Security Applications Conference ACSAC ’11, pp. 169–176, New York,
NY, USA, 2011. ACM.