Unsupervised marine vessel trajectory prediction using LSTM network and wild bootstrapping techniques

. Increasing intensity in maritime trafﬁc pushes the requirement in better prevention-oriented incident management system. Observed regularities in data could help to predict vessel movement from previous vessels trajectory data and make further movement predictions under speciﬁc trafﬁc and weather conditions. However, the task is burden by the fact that the vessels behave differently in different geographical sea regions, sea ports, and their trajectories depends on the vessel type as well. The model must learn spatio-temporal patterns representing vessel trajectories and should capture vessel’s position relation to both space and time. The authors of the paper proposes new unsupervised trajectory prediction with prediction regions at arbitrary probabilities using two methods: LSTM prediction region learning and wild bootstrapping. Results depict that both the autoencoder-based and wild bootstrapping region prediction algorithms can predict vessel trajectory and be applied for abnormal marine trafﬁc detection by evaluating obtained prediction region in an unsupervised manner with desired prediction probability.

1 Introduction more in comparison with the same quarters in 2016. Totally, more than 90 % of cargo is transported by sea [27]. Increasing intensity in maritime traffic pushes the requirement in better prevention-oriented incident management system. One of the control technique of this complex management system is the abnormal vessel movement detection. Detection is based on predicting of vessel trajectory by navigational data sequences analysis and search of irregular, illegal, and other anomalous appearances in trajectory/navigational data [9]. The vessel traffic anomaly detection task can be defined as a task of outlier detection, where vessel traffic data are being analyzed as multiple standalone vessels positions/navigational vectors (point-based) or in trajectory-based manner, where vessel's vectors are structured to time series sequences [19]. Automated marine traffic data gathering systems returns huge vessels trajectory/navigational data sets, which are challenging for human-based analysis and traffic anomaly detection [28]. In general, marine traffic is a dynamic system, where vessel's traffic properties change in space and time. Such type of data can be defined as spatio-temporal time series. However, despite advances in prediction of spatio-temporal data with deep neural network, the authors do not propose prediction/confidence interval evaluation that is crucial for marine traffic anomaly detection with this method. Cruz at al. [7] has proposed univariate solution for LSTM prediction interval estimate by joint supervision, but is not sufficient because marine traffic is defined by multivariate coordinates and the proposed method need to be improved. Recently published works take advantages of extended LSTM (Long Short Term Memory) neural networks to learn spatio-temporal dependencies (see [12,15,20]). This paper investigates and proposes a method based on LSTM autoencoder [18] to predict vessel trajectory and evaluate prediction region. The paper extends previous authors' investigations in field of marine traffic anomaly detection [21][22][23][24][25][26]. The paper is structured as follows: Section 2 describes proposed methodology for abnormal marine traffic detection. Section 2.1 describes data preparation before trajectory patterns learning. Section 2.2 depicts LSTM network architecture used for vessel movement prediction. Next, Section 2.2 describes a method for LSTM prediction region evaluation training, custom loss functions. Then Section 2.2 description of wild bootstrapping is presented with the view to be able to compare predictions. Section 3 reveals the experimental investigation of proposed methods, depicts the parameters and other settings that were used for model results validation. Finally, Section 4 concludes the paper.
2 Experiment setup 2.1 Data set preparation Data structuring. The source of the vessel traffic data is AIS. Navigational information includes data about vessel current geographical location coordinates in WGS84 geodetic system, heading, course over ground (COG), speed over ground (SOG), maritime mobile service identity (MMSI) for unique vessel identification, navigational status, vessel type, destination port, vessel length/width, draught, callsign, name, cargo type, and estimated time of arrival (ETA).
The raw data is stored in a flat structure, where each record consists of the vessel navigational data at certain time. The data structure can be represented by X = {x 1 , x 2 , . . . , x i , . . . , x o−1 , x o }, (1) , x (2) , . . . , x (j) , . . . , where x i , i = 1, 2, . . . , o, is a single vessel navigational data record that consist x (j) , j = 1, 2, . . . , f , parameters as: vessel unique identifier MMSI, "latitude", "longitude", SOG, COG, ship type, timestamp of the data being received. Whole data set contains o number of records, where each record is set of f parameters of vessel navigational vector. Vessels send navigational data periodically, thus received data instance is stored in ordered manner by the timestamp as the data was received. This way structured raw data is complicated to work with distinctive vessels multiple navigational data that forms vessel's sea path over time. In order to develop a model that recognizes the unusual behavior of vessel's traffic, we need to train model to predict single vessel path. To achieve that, authors restructure data per unique vessel, vessel's navigational data is grouped by unique vessel identifier MMSI and ordered by the timestamp: where S(X) is a data restructuring function that restructures the navigational data vectors of each ship into matrix rows according to the ship's MMSI parameter x (MMSI) . Each row of the matrix consists of a set of navigational data vectors s 1 , s 2 , . . . , s v of an individual vessel, where v is the number of distinct vessels. The sets navigational data of each vessel are compound of the navigational vectors x (v,bv) , and b v is the number of navigational vectors of a specific vessel. The vectors of each vessel in the set are sorted by the data acquisition time parameter when x (time) (v,bv) . Vectors received earlier in time are marked with lower index number. Data cleaning. AIS systems interconnect many participants such as vessels, vessel traffic services (VTS), receiving base stations, and collection databases. The Danish maritime agency has collected in territorial waters during year 2019 more than 1 TB and 10 9 vessel navigational vectors [1]. The problem is that enormous quantity of vessels, for example, can have transponders manufacturer by different vendors communicating with AIS through VHF radio band that is sensitive to external noise and collision with other vessel data transmission.
To overcome these problems, the gathered vessel data from the AIS needs to be cleaned. All vectors that are outside of analyzed marine area has to be dropped (the selected areas of the sea traffic will be depicted in Section 3.2), all duplicated data and anchored vessel navigational vectors are removed. Short sequences of vessel vectors (paths) shorter as predefined training sequence length n are removed as well. The cleaning of vessel path gaps are depicted in the paragraph Splitting to sequences of vessels https://www.journals.vu.lt/nonlinear-analysis navigational vectors below, ensuring that the generated sequence contains successive vectors between which the time interval of acquisition ∆t = x (time) (v, bv−1) is lower than predefined threshold T ∆t , otherwise, sequences are discarded. The threshold is described in process of method tuning in Section 3 of this paper.
Filling missing values. The collected vessel traffic data may have missing features values in navigational vectors. Depending on the type of missing feature, a different strategy for filling in the missing values may be chosen. Below, authors present the variety missing feature value types and further depict the strategies that were applied to solve the missing value issue.
Static features. Features that are static and belong to same vessel (data with same MMSI). Such physical properties cannot change during a time. As example, it can be a vessel type, length, or other physical property sometimes being distorted by the radio transmission.
Dynamic features. Feature values that change in time within same vessel navigational vectors. It can be vessel location, heading, or other data from on board vessel sensors.
Partially missing values of static features. Wrong static feature values x (f ) are available at least in few of s v vessel navigational vectors. Examples of such features could be the type of vessel, the length of the vessel, or other physical parameters of the vessel. Above mentioned discrepancies happen because of the inconsistent input of data into the AIS transmission equipment.
Completely missing values of static features. Feature values of the x (f ) that are absent in the entire set of vectors s v of particular vessel.
Strongly correlated missing values of dynamic feature. Features that strongly correlate with other features within the same vessel sent data (properties of vessel are very similar, or vessel's AIS transceiver don't have information from all on-board sensors). As an example, this could be heading, which strongly correlates with course over ground (COG). Also, on some vessels, for unknown technical reasons, Heading field is not sent or is replaced with the values of COG.
Weakly correlated missing values of dynamic feature. Features that has weak or very weak correlation with other features. As example, it can be vessel rate of turn (ROT), estimated time of arrival at the port of destination, etc.
For each group of missing value types, a different value correction strategy is applied.
Partially missing static features values -missing values are filled in by searching actual value withing same vessels' navigational data. After it is found, the rest of vectors are filled with that value, otherwise, is treated as completely missing static features.
Completely missing static features values -depends on type and property of feature and results in two techniques: to fill missing values with predictive model that based on vessel trajectory and can predict missing feature in significant accuracy, that is, higher than the critical threshold value T acc . In this paper, 0.95 threshold were used. Such approach is proposed in previous work [22]; either discard the entire attribute from all data for all vessels, or if the number of missing values low and mainly relates to small amount of vessel, then we drop particular vessels from data set.
Strongly correlated missing dynamic feature values -missing features that are strongly correlated with other features may be discarded for all vessels if they are missing in significant number of vessels. Otherwise, if only 1 % of vessels are missing of such feature, then all navigational data of particular vessels that miss values is dropped from data set.
Weakly correlated missing dynamic feature values. If less than 1 % of vessels are lacking of particular feature, these vessels data is excluded from further analysis. If percent is higher that 1 %, then only that specific feature is excluded from feature list.
The minimum set of features must be maintain such as longitude, latitude, Speed over ground (SOG), Course over Ground (COG), Wind direction, Wind speed, Wave direction, and Height.
Vessel data down-sampling. A vessel's AIS transceiver sends the data every 2 to 10 seconds and depends on a vessel's speed while underway or each 3 minutes while a vessel is anchored. In practice, databases typically store data at various time intervals between subsequent registration of ship position in the AIS system. Registration interval may vary from 2 seconds to 10 minutes and depends on data provider. With the view to set up the experiment, it is necessary to set the same time interval for all the positions of all ships in same training data set. Proposed method down-samples a vessel subsequent navigational vectors to predefined interval ∆T interval . To achieve this, the nearest neighbour algorithm is applied to select nearest navigational vector (Euclidean distance) according to vessel's data vector sent to AIS timestamp.
Feature engineering. The lack of constant data intervals results in variation of sailed distance at same speed while analysing a vessel's subsequent vectors. In order to solve the issue, a new differential features are introduced. First, a new feature introduced is a time difference between sequential vectors' timestamps expressed by where (v,bv) is particular navigational data vector's time of same vessel, and x (time) (v,bv−1) is previous data vector receive time. v is particular vessel's data set number, b v is vessels position vectors number in sequence as in formula (2). Two more introduced features are extracted to express vessel movement differential in time for latitude and longitude as follows: Here x (δLat) (v,bv) and x (δLon) (v,bv) are newly constructed features based on latitude and longitude differences in subsequent vectors x In addition to that, earlier works has shown that meteorological data has significant influence to a marine traffic models [22]. This data includes information about wind direction, wind strength, swell direction, swell height, swell period, day/nigh, tide level. https://www.journals.vu.lt/nonlinear-analysis Above mentioned features are as well artificially added to each vessel data vector that were registered by AIS system. Meteorological data is taken periodically from the European Centre for Medium-Range Weather Forecasts (ECMWF) grid. ECMWF provides data in certain geographical interpolated resolution. Vessel position accuracy is much higher than meteorological data grid, thus the assignment of particular grid point to vessel navigational vector has to be accomplished. It can be several meteorological data providers. Used in this research described in Section 3. Meteorological data is assigned to a navigational vectors by using the algorithm of the nearest neighbour in locationwise and time-wise manner. At first, distance to all meteorological locations is calculated by using haversine formula [4] to WGS84 geodetic system coordinates of vessel and meteorological locations. Closest meteorological location is assigned based on calculated distances, and then closest in time forecast is picked from all forecasts for this location. This meteorological data is assigned to vessel position vector.
Splitting to sequences of vessels navigational vectors. In upcoming section, for vessel position prediction, the algorithm of artificial deep neural network (DNN) is described. In order to obtain predictions, the data must be in certain three-dimensional format, thus a sliding window approach for data slicing is applied. Slicing algorithm takes data set structured by S(X) function (2). Window processes each vessel's data set s v separately and slices x (v,1) , x (v,2) , . . . , x (v,bv) set in to sequences in length ofñ+n , whereñ is DNN output sequence (prediction) length, and n is input (for prediction) sequence length. The window slide according to predefined step of η each time producing a new sequence. Each sequence is divided into two parts. The first part in length of n is assigned to input matrix, and the second part in length ofñ is written into output matrix. The obtained matrices can be defined by expressions where χ is model's input, Y is model's output, N is number of vessel navigational vectors sequences, n -single sequence length for input,ñ -navigational vectors sequence length for output, χ and Y are matrices of vessels' navigational vector sequences formed by sliding window process while assigning a vector x (v,b) from current window position to a sequences matrices (χ, Y ). Obtained matrices for further model creation are split into subsets that are used for model training, validation, and testing by random row selection at ratio 60 : 20 : 20 percents accordingly.

Method for trajectory prediction
For vessel trajectory prediction, the paper applies the deep neural network. The deep neural network input is the previous specific vessel's navigational trajectory data, then the prediction of the vessel's subsequent position is calculated by the algorithm. If the prediction obtained by the algorithm falls within defined limit, the vessel's expected location is considered as normal, otherwise, abnormal. The more detailed scheme of the prediction evaluation is presented in Sections 3.3 and 3.4.
Long short term memory neural network. Fully connected dense artificial neural networks does not assure history retrospective. Recurrent neural networks (RNN) was proposed to overcome this challenge and to overcome practical challenges vanishing gradients when longer sequences of input data is used, the paper proposes to use long short term memory (LSTM) network [13] that performs significantly better in other applications: speech recognition [3], handwriting recognition [10], reinforcement learning [11], and many other fields. LSTM structure implements modified back-propagation approach of gradient-descent method that solves vanishing gradient problem and the network can learn complex nonlinear patterns. LSTM network architecture represents interconnected cells. LSTM cells transmit cell state c(t) that is passed to a network with minimal linear operations. Such passed information is often called LSTM cell memory. h(t) is hidden state of cell, and it is the same as cell output y(t). From previous cell h(t − 1) this cell receives hidden state, c(t − 1) cell state, and x(t) cell input. Then the cell computes what information should be kept for further calculations and what has to be forgotten [13]. By interconnecting such cells the architecture of LSTM autoencoder can be obtained.
LSTM autoencoder. An autoencoder is a type of artificial neural network used to learn efficient data encoding in an unsupervised manner [14]. A typical autoencoder is compound of three parts: encoder, latent vector, and decoder. During a training, encoder as well as decoder learn to reduce input and to reconstruct output through compressed latent vector in such way that the network input would be as close as possible to the network output. The main difference of the LSTM autoencoder is that main blocks of the network architectures are LSTM cells. This paper investigates the LSTM autoencoder by providing the sequences of vessel's navigational vector data. Encoder compresses input data X equation (5) to latent space, and decoder forecasts sequence of next vessel positions Y with limited set of spatial features, like latitude and longitude. In this paper, authors use a multivariate multistep LSTM autoencoder shown in Fig. 1. The main parts of a proposed LSTM autoencoder are: input layer, encoder layers, a vector of encoded latent representation, decoder layers, and an output/reconstructed sequence layer. The input layer receives structured navigational vectors sequences χ defined by (5) where L l s is the loss function for l-type of model, N -number of training sequences in the training data set. LSTM prediction region learning.In order to determine the abnormal vessel traffic, authors investigate the ability to check real vessels navigational vectors against predicted ones by the model in two-dimensional space. The assumption is as follows: if vessel's true position vector lays outside of prediction region (multivariate case of prediction interval), it is interpreted as abnormal vessel movement, and all vessel traffic vectors that results inside the prediction region are interpreted as normal vessel movement. The authors investigate and compare two methods for prediction region calculation. The first method, based on LSTM autoencoder training, is described further, and the second, based on a wild bootstrapping method, is described in the subsequent Section 2.2.
LSTM autoencoder in typical configuration calculates only most like-hood forecast (crisp). In order to determine prediction regions, a method proposed by Cruz at al. [7] is used with modification to support multivariate and multistep-type LSTM networks. Prediction region is composed of an upper and a lower bounds in which the prediction/ reconstruction output is found with certain probability α [16]. The region is learned by training two LSTM autoencoders with combined classical MSE loss function (6) with the second metric of region loss function as presented in [6]. The specific loss function for upper and lower bounds is defined as follows: where L upper and L lower is specific loss function for upper and lower bounds, respectively, ReLU is rectified linear unit function defined by ReLU(x) = 0 for x < 0, x for x 0.
As presented in paper [6], data points Y i,j larger than L upper applie a cost equivalent to the squared difference between the real data point and its upper bound prediction/ reconstruction in accordance with Eq. (7). Likewise, data points Y i,j lower than L lower is penalized as defined in Eq. (8). Data points Y i,j that are in prediction region (below upper and above lover bounds) has no cost with a help with ReLU function (9). In combination of the upper and lower loss functions, a higher loss value is applied for Y i,j points that are outside of the prediction region. This regions are learnt by using the same target input data during training process. The overall loss function is defined as the weighted sum of the MSE (6), and the region loss functions (7), (8) -for upper and lower bounds, respectively [6,7]: L lower total = L lower s + λL lower , where L upper total is overall upper loss function, L lower total -overall lower loss function, λa tuneable parameter that represents the relative importance of proposed classical/common and region loss functions [7]. The crisp model's output is learned by using only a MSE loss function (6): where L crisp total is loss function for crisp model. With these loss functions, the minimization of prediction region area is achieved. If functions is not applied, the prediction region loss functions (L ) increase the region area introducing a trade off between the number of points that fall into the region and its area, which can be regulated by modifying the parameter λ in Eqs. (10) and (11).
With the view to evaluate the quality of the prediction region, two indicators were used. The first is the prediction region coverage probability (PICP) that quantify the number of measured values that fall within the region defined by the model [7] and modified to support multivariate features and multistep predictions: The second metric is prediction region normalized average width (PINAW) that is used to measure the area of the region [7] also modified for multistep and multivariate features: where R is the distance between the maximum and minimum values measurements max(Ŷ upper g,i,j −Ŷ lower g,i,j ) in the data set [6,7]. The algorithm to train the network iterative way is shown [6]. In this approach, the λ parameter is increased iteratively to force a wider region area in each of the iteration as the coverage probability increases. In each iteration, the PICP is estimated by Eq. (13) [6]. When the desired coverage probability α has been achieved, the algorithm stops the λ parameter increment. Few more iterations are calculated using fixed λ parameter in order to compensate the random initialization of the initial algorithm weights [6]. Wild bootstrap prediction region. The main advantages of the bootstrapping techniques is that it does not require to make any assumptions on the distribution of the data set being investigated. Traditionally, bootstrap method resamples the initial data to produce more data samples that could be used in repetitive experiment. However, instead of generating bootstrap samples that consist of resampling the original data or residuals, the wild bootstraps combine the data with random variables drawn from a known distribution to form a bootstrap sample. Bootstrap usage in this paper can be summarised by applied further steps: (i) Prepare data as described in Section 2.1. (ii) Calculate data set's variance for every feature type. (iii) Generate multivariate normal random variables while keeping the same dimension, the mean equal to zero, and the variance the same as that of the input data. (iv) Element-wise sum the initial data set with the newly generated, i.e. add noise to the data with mean and variance calculated from initial data set. After the application of the proposed above scheme, the matrix with predicted values is obtained. Then, as point predicted value, the mean vector of k replicates is chosen for each feature and each prediction step. Thus 100(1 − α) % prediction region for the mean (average predicted value) of a p-dimensional normal distribution is the ellipsoid determined for unknown µ such that (see [2]) x i,j,r -the mean vector for each of the feature j ∈ {1, . . . , f } at each prediction step r, S -sample covariance matrix, and F p, k−p (1 − α) is an 1 − α-level critical value of a Fisher distribution with p and k − p degrees of freedom.

Experiments
With the view to validate and test prediction approaches being presented, the experiments were performed with real marine traffic data. The data was prepared as described above. Further in this section, the experiment setup is being described together with the obtained results. The LSTM networks were trained, and wild bootstrapping technique was applied, predictions regions calculated.

Description of the data sets
AIS data set. The AIS data set for this research are obtained from Danish Maritime Authority [1] and contains historical AIS maritime vessel traffic in Danish waters data from 2006 year to 2020 year. Records of this data sets contain single vessel navigational vector and presented by Eq. (1). Single record has such data fields: Timestamp -timestamp from the AIS base station; Type of mobile describes what type of target this message is received from; MMSI -MMSI number of vessel; Latitude -latitude of message report (e.g. 57,8794); Longitude -longitude of message report (e.g. 17,9125); Navigational status -navigational status from AIS message if available, e.g.: "Engaged in fishing", "Under way using engine", and etc.; ROT -rot of turn from AIS message if available; SOG -speed over ground from AIS message if available; COG -course over ground from AIS message if available; Heading -heading from AIS message if available; IMO -vessel identifier provided by International Maritime Organization; Callsign -callsign of the vessel; Name -name of the vessel; Ship type -describes the AIS ship type of this vessel; Cargo type -Type of cargo from the AIS message; Width -width of the vessel; Length -lenght of the vessel; Type of position fixing device -type of positional fixing device from the AIS message; Draught -draugth field from AIS message; Destinationdestination from AIS message; ETA -estimated time of arrival if available; Data source type -data source type, e.g. AIS.
Meteorological data set. Meteorological data set obtained from World Weather Online service API [17] (application programming interface) in the European Centre for Medium-Range Weather Forecasts (ECMWF) grid. This data contains information about wind direction, wind strength, swell direction, swell height, swell period, day/nigh, tide level. Meteorological data is provided periodically in 3 hours periods. Data collected in period from 2019 November 1 to 2020 June 31.

Data preprocessing
For method validation, the data time interval is taken from 2019 November 1 till 2019 November 30, and geographical region -between 54.2620−54.8292 • of latitude and 10.6897−12.9694 • of longitude. The region was chosen in order to have intense marine traffic and variety of different vessel types. Both vessel AIS navigational and meteorological data is registered and used in experiment from time period and geographical region described above with no duplicates.
In earlier research performed by the authors, was found that different vessels types introduce different traffic patterns [26]. Each vessel type has unique traffic patterns. By taking into account that each vessel acts according different pattern and the navigational data sets representing each vessel type are unbalanced, authors decided to investigate separate models for each vessel type. Authors selected "Cargo" vessel type because of largest data amount compared to other vessel types. Data was stored in Ψ defined by (1) data structure, containing number of navigational vectors r for that time period is 7331756, and number of initial parameters f was 22. Then data set restructured per unique vessel identifier MMSI and ordered by the timestamp by structure defined in (1). The preparation of the data resulted in 1144 distinct vessels v. The predefined calibrated parameter ∆T interval is chosen 2 minutes. This parameter can be calibrated based on awareness requirements of Vessel Traffic Service (VTS). This research is based on assumption that anomaly detection will performed in middle range of vessel activity, that is, average 30 % of activity in region of interest. In average, the vessels pass this "Fehmarnbelt" region in 5-10 hours. Minimum time to detect trajectory anomaly required is 1.6−2.4 hours. If n = 50, then 2 minutes falls in this range. During down-sampling, the nearest neighbour algorithm was used to select down sampled values.
As described in Section 2.1, the vessels with very few navigational vectors emerge because of noise introduced by the nature of AIS, and those vessels that just entered region of interest. All vessels with vectors that are shorter than n +ñ and for this case is chosen n = 50,ñ = 50, are removed. This made because we are not interested in vessel that are leaving geographical region of interest. Another issue related with the data is speed over the ground. It is observed that some data contains vectors with 0 knots speed. As we are interested in only moving vessel new position in space predictions, vectors having SOG equal to 0 are removed from the data. Only moored or anchored vessels has 0 value of SOG. Since this paper analyzes anomaly of moving vessel behaviour, the moored or anchored is removed from data set to balance data set accordingly.
After data cleaning, the filling missing values and removing unnecessary features takes part. Fields "Type of mobile", "Type of position fixing device", and "Data source type" are removed because contains data only related to data transmission, but to vessel traffic. Fields "IMO" and "Callsign" are removed as they represents unique identifier of the vessel same as MMSI. Fields "ROT", "Cargo type", and "Destination" is removed because majority of vessels are missing these data. Then all missing values are filled based on description in Section's 2.1 Filling missing values paragraph. Additional features were engineered by applying Eqs. (3), (4) to express vessel movement differential in time for latitude and longitude, then each vessel navigational vector enriched with meteorological information as described in Section 2.1.
Splitting to sequences of vessels navigational vectors performed by window approach described in Section 2.1 under paragraph Splitting to sequences of vessels navigational vectors. Authors took data structure as described by Eq. (2) and spited the data with moving window by applying step µ = 50 to set of sequences X and Y in n = 50, n = 50. During splitting, algorithm follows a few additional rules: firstly, one sequence must contain only subsequential vectors of same vessel; secondly, if time difference of two subsequent vectors ∆t = ψ time v,bv − ψ time v, bv−1 is larger than predefined threshold T ∆t = 10 min, the sequence is discarded. This step allows to filter sequences with region of interest crossing vessels. After data split, total number of sequences obtained is N total = 26214 that further was randomly shuffled and divided to training/validation and testing data subsets while keeping ratio 80 : 20 (N = 20971 to N test = 5243) accordingly. The test data set is kept untouched until final evaluation of model. The training/validation data set is split randomly before each training epoch in similar ratio 80 : 20 to training/validation set (N train = 16777, N validation = 4194).

Prediction region estimation
LSTM autoencoder network is applied as that described in Section 2.2. The input shape of network set as input sequence length n = 50, number of features f = 18, and batch size of 512. The first LSTM layer has 128 units and forms 50 × 128 sequential output that connected with the second LSTM layer with 16 units and forms not sequential output. The encoded latent representation vector has 16 units. Third LSTM layer has 16 units and last has 128 units. Output layer size is being used two-dimensionalñ = 50 and f = 2. For the network output, only spatial features latitude and longitude are used. For all LSTM cells, TANH activation function were applied. To ensure cross validation and regularization before each epoch an initial training/validation, data set is randomly divided into training and validation data sets with ratio 80 : 20 percents of training/validation data. Before each epoch, the data set instances are reshuffled and samples assigned to training and validation nonoverlapping data sets. 300 epoch are set for network training. As an optimiser, the Adam algorithm [8] is used with calibrated parameters: learning rate α Adam = 0.001, exponential decay rate for the first moment β 1 = 0.9, exponential decay rate for the secondmoment β 2 = 0.999, and = 10 −8 . LSTM networks were configured with different loss functions: crisp network has configured to use loss function (12); lower and upper bounds LSTM networks configured by loss function (11) and (10) accordingly. All the models were trained by the algorithm described in Section 2.2, and results are shown in Table 1. Initial value of tuneable parameter λ = 10 is chosen experimentally and were increased in each step by ∆λ = 5 while reached PICP = α = 0.95. Figure 2 depicts change of PICP while training networks at different value of λ. When PICP reaches desired value α = 0.95, the trained model for each specific network with that particular λ value are chosen for further investigation. It is worth to note that training upper and lower models start with slow progress and only after 60th epoch loss starts to drop significantly. After all three models types (crisp, lower, upper) were trained, prediction part take place. In order to predict and evaluate traffic anomaly, a timed sequence of navigational vectors are taken for that vessel. This sequence prepared as described in Section 2.1. The first n = 50 prepared vectors forms input X of the models. Lower and upper models produce prognosesŶ upper andŶ lower . A vessel movement considered as normal when true value Y of is between Y upper > Y >Ŷ lower , and otherwise, considered as abnormal. As the alternative to the LSTM prediction region learning method, the wild bootstrapping method was tested as well. Method was described in Section 2.2. The steps were repeated k = 100 times.

Results
Trained models were tested on test data set. In Fig. 3 obtained by lower bound model at time steps 25, 52. Red dashed lines depict boundaries of prediction region for particular movement forecast. Blue ellipse bounds prediction region calculated by wild bootstrap method. Trained models were tested on test data set. Figure 4 shows randomly selected cases of normal vessel trajectory.   shown with green, blue stars and red doted rectangle. True value lay inside the rectangle and the vessel trajectory should be considered as normal according to LSTM prediction model. If true value lay inside blue ellipse, then the vessel trajectory considered as normal by wild bootstrap method. Both black triangles is in red rectangle, but only one is inside blue ellipse. This situation illustrates the case when the LSTM prediction region method indicates traffic as normal while bootstrapping method as abnormal. It is observed that the narrow marine traffic area is the smaller region models learns and https://www.journals.vu.lt/nonlinear-analysis  wise versa. This is seen in Fig. 3(d), where the first prediction region is smaller because it is on vessel's routes junction. In opposite, if vessels routes splits up, the prediction region become wider by covering almost all possible routes of the specific vessel type. Figures 4 depict abnormal vessel traffic cases. Both methods, LSTM prediction region and wild bootstrapping has classified those as abnormal. Figure 4(a) shows anomalous case where a cargo vessel unexpectedly turned around by changing direction 180 degrees due captains decision to return port for repair a engine malfunction. The first 50 vessel's navigational vectors are given into model's input. The model by using both methods had predicted regions where true position of the vessel is expected. Because vessel made sharp change in direction, the true values were outside of predicted regions by both methods. The same figure 4(a) shows that true 25th and 50th vessel positions (black triangles) are outside prediction regions by LSTM region prediction method (red rectangles) and wild bootstrapping method (blue ellipses). The methods classify such vessel traffic as abnormal because it does not fall in α = 95 % prediction region. Figure 4(b) depicts a drifting cargo ship due of broken engine. It is actual navigational vectors are outside of prediction region. Figures 4(c) and 4(d) show other abnormal cases. The first is unexpected turn to minor port, and second unplanned stop due engine failure. Table 1 summarizes results of marine traffic prediction and its evaluation. LSTM prediction region method performs better with the parameter α = 0.95 for both training and test data set. Wild bootstrapping performs much worse as set α. Also, due to nature of algorithm, this method cannot be trained and tested on only test data set, it needs whole set to calculate predictions.

Conclusions
This paper investigates vessel movement prediction and prediction evaluation techniques that can be applied for traffic abnormality detection. The literature review revealed that the most algorithms of trajectory prediction are supervised or semisupervised. The authors of the paper proposes new unsupervised trajectory point prediction and prediction regions at the arbitrary probability. The paper depicts two methods -the LSTM prediction region learning and the wild bootstrapping. The LSTM method is based on learning of prediction region with the view to reach required confidence level by learning the parameters of custom loss function. The prediction region is defined by multivariate LSTM models according to different loss functions for learning the upper and lower bounds that produce prediction region of trajectory point in the shape of hyperrectangle. By investigating experimentally it was observed that the 95 % LSTM prediction region is wider than that obtained by wild bootstrapping technique. And it is assumed that the traffic outside the prediction region is abnormal.
The second proposed method is based on statistical wild bootstrap approach that estimates 95 % prediction region. Nevertheless, during the testing, it was noticed that only 83 % of true vessel trajectory point values are inside wild bootstrapping prediction region. Thus, the method provides narrower confidence regions than those obtained by LSTM. The approach is recommended to use where strict control of marine traffic is required such as sea ports, seaport surroundings, or other sea regions with limitations induced by designated geographical locations.
Results shows that both the LSTM and wild bootstrapping algorithms for estimation of prediction regions can be used for abnormal marine traffic detection. The experiments with the data shows that algorithms, by evaluating prediction region, can detect different types of abnormal marine traffic such as: vessel slowdown, turn around, sharp direction change, unplanned stop, traffic not on seaway, etc. in an unsupervised manner.