Electroencephalogram spike detection and classification by diagnosis with convolutional neural network

This work presents a convolutional neural network (CNN) based methodology for electroencephalogram (EEG) classification by diagnosis: benign childhood epilepsy with centrotemporal spikes (rolandic epilepsy) (Group I) and structural focal epilepsy (Group II). Manual classification of these groups is sometimes difficult, especially when no clinical record is available, thus presenting a need for an algorithm for automatic classification. The presented algorithm has the following steps: (i) EEG spike detection by a morphological filter based algorithm; (ii) classification of EEG spikes using preprocessed EEG signal data from all channels in the vicinity of the detected spike; (iii) application of a majority rule classifier to all EEG spikes from a single patient. Classification based on the majority rule allows us to achieve 80% average accuracy, even though a single spike yields only 58% accuracy.


Introduction
Machine learning (ML) methods are used in many applications [19] ranging from financial market forecasting [16] to biometric security [2], speech recognition [6], image classification [22] and many others. These applications of ML algorithms include medical signals and images [18]. The object of this study was the application of ML based methodologies in neurology, in particular, to EEG. The main goal of this study is to propose an improved methodology for EEG classification by diagnosis over previous studies [12].
EEG is an electrophysiological monitoring method to record the electrical activity signals of the brain. EEGs recorded in patients diagnosed with various forms of epilepsy contain characteristic EEG spikes (see Section 2). These spikes can be detected with various computer vision [4,9,15] and other methodologies. In this study, EEGs are classified by using automatically detected EEG spikes obtained by applying morphological filter based methods.
EEG analysis for patient diagnosis (for example, epilepsy type) classification is an under-explored topic in modern computer science [12]. There are some studies that classify ill vs healthy patients [8] or alcoholic vs nonalcoholic subjects [14].
This work is a follow-up publication by the same group of authors. In our previous study [12], the framework for EEG classification was developed. In this study, we present a substantially reworked and better performing version of the algorithm, which has the following features:
• EEG spikes are detected using a morphological filter [4,9,15] (see Sections 3.2 and 3.3).
• Instead of EEG spike feature extraction (upslope and downslope [10,12], corresponding to the steepness of the EEG spike), the whole EEG signal fragment in which the spike was detected is passed to the CNN classifier, thus eliminating the need for EEG spike parameter calculation.
• Data from all 21 EEG channels (the standard channels of the 10-20 international EEG system) are used instead of the single channel in which the EEG spike was detected.
• Object wise standard scaling of each individual EEG spike data is performed before the actual classification.
• Instead of the artificial neural network (ANN) based classifier [12], a CNN based classifier is used to classify EEGs by diagnosis.
• Instead of requiring a fixed number of EEG spikes, any number of EEG spikes can be used in the majority rule classification of an EEG.
Our experience shows that EEGs of patients with the diagnoses defined in Section 2 can be classified using the perceptron based ANN with classification accuracy of around 75% (with the current dataset [12]). One of the goals of this study was to change and improve the algorithm presented in reference [12] in terms of accuracy and other classifier quality metrics. Since CNN based classifiers perform well in other signal processing applications, like speech recognition [6], the hypothesis was raised that a CNN based classifier could prevail over a feed-forward ANN and other classifiers [11] in the task of classifying EEGs by diagnosis.
CNN is a well-established method of image processing, analysis and classification. However, to the best of our knowledge, this is the first attempt to use CNN for classification by diagnosis of the patient groups under investigation. The use of CNN for matrix-like (or image-like) EEG signal snapshot input has been suggested before [5][6][7]. The hypothesis was confirmed by achieving classification accuracy of 58% for the single spike classification and 80% with majority rule classification in our study, as well as solid performance according to other classification metrics (see Section 4).
This article has the following structure: EEG data and problem of our research are described in Section 2; the algorithm for EEG classification is outlined in Section 3; experimental results and discussion of algorithm, as well as its performance metrics, are presented in Section 4; conclusions are presented in Section 5.

EEG data
The EEG database employed in this study was obtained from the Department of Neurology, Children's Hospital, Affiliate of Vilnius University Hospital Santaros Klinikos. All the EEGs were recorded using the 10-20 international EEG system with a 256 Hz sampling frequency. Only EEGs containing recordings of all 21 standard signals of the 10-20 system were included in the analysis. The data set contains 216 patients (children 3 to 17 years old). Two patient groups were included in the data set:
• Group I consists of patients with benign childhood epilepsy with centrotemporal spikes (frequently referred to as BCECT, rolandic epilepsy or self-limited epilepsy with centrotemporal spikes) exhibiting the EEG telltale sign, the benign epileptiform discharge (BED, hereafter referred to as a spike), which frequently appears as the so-called rolandic discharge [3]. It is mostly represented by a sharp wave or spike detected over the centrotemporal brain regions (T3, T4, C3, C4 EEG regions). Other regions, such as central, centroparietal or centrofrontal, can also be involved [1].
• Group II is a set of patients diagnosed with structural focal epilepsy: patients with dysplastic brain lesion, cerebral palsy, gliosis etc., following ILAE (International League Against Epilepsy) criteria [20]. Only patients with EEGs containing the visually similar EEG pattern (benign epileptiform discharges) were included.
The data set cases were distributed unevenly across the diagnoses: about 73% of EEGs were in Group I, while the remaining 27% were in Group II. Since our dataset was imbalanced, we present accuracy as well as other metrics discussed in Section 4.
Although, in some cases, Group I and Group II EEGs are obviously different, only the cases that were difficult for neurologists to distinguish visually, with no access to clinical records, were classified in this study. This is the main reason our dataset was unbalanced: Group II EEGs that are similar to Group I EEGs are rarer. The exact diagnosis for each EEG was known from the patient's clinical record.
Each patient was strictly assigned either to the training or to the testing pool of the classification algorithm.

Algorithm for EEG classification by diagnosis
The algorithm had two main steps of EEG classification [12]:
1. EEG spike detection (see Section 3.2).
2. Classification by machine learning based classifiers (see Section 3.4).

Preprocessing
The preprocessing of EEG signals was done in two steps in this work. The first step was taken before the EEG morphological filter was run (see Section 3.2): the utility frequency (50 Hz, the electric grid frequency in Europe) was removed with an FFT based band removal filter that removes frequencies from 49 Hz to 51 Hz. The second step, object wise standard scaling (or standardisation), was performed before the classification step.
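The two preprocessing steps can be illustrated with a minimal sketch. It assumes NumPy is available, that the band removal simply zeroes the FFT bins between 49 Hz and 51 Hz, and that object wise scaling means standardising each spike snapshot as a whole (rather than each channel separately); the function names `remove_band` and `standardise_snapshot` are hypothetical and are not taken from the original implementation.

```python
import numpy as np

FS = 256.0  # sampling frequency in Hz, as stated for this dataset


def remove_band(signal, fs=FS, low=49.0, high=51.0):
    """Zero the FFT bins between `low` and `high` Hz and transform back."""
    spectrum = np.fft.rfft(signal)
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / fs)
    spectrum[(freqs >= low) & (freqs <= high)] = 0.0
    return np.fft.irfft(spectrum, n=len(signal))


def standardise_snapshot(snapshot):
    """Scale one spike snapshot (channels x samples) to zero mean, unit variance.

    Assumption: the mean and standard deviation are taken over the whole
    snapshot, which is one possible reading of "object wise" scaling.
    """
    snapshot = np.asarray(snapshot, dtype=float)
    std = snapshot.std()
    if std == 0.0:
        return snapshot - snapshot.mean()
    return (snapshot - snapshot.mean()) / std


# Synthetic demonstration: a 10 Hz rhythm contaminated by 50 Hz interference.
t = np.arange(0, 2.0, 1.0 / FS)
clean = np.sin(2 * np.pi * 10 * t)
noisy = clean + 0.5 * np.sin(2 * np.pi * 50 * t)
filtered = remove_band(noisy)
```

In this synthetic case both components fall on exact FFT bins, so `filtered` matches `clean` up to rounding error; on real recordings the removal is only as sharp as the frequency resolution of the analysed segment.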

EEG spike detection
The EEG spikes are detected by a morphological filter based algorithm (for details, see [4,9,10,15]). The premise of operation of the morphological filter is that normal brain activity (e.g., brain rhythms) is filtered out, while abnormal brain activity (e.g., EEG spikes) is retained [4]. Any values of the filtered signal that are higher than the detection limit are considered to be spike candidates [9]. The spike detection algorithm is implemented employing a combination of morphological filters and operations. The operations used to detect spikes can be expressed through morphological grey erosion and dilation.
We employ the following notation: the signal in the EEG channel under investigation is denoted by $f(t)$, the structuring element by $g(t)$, and the reflection of the structuring element by $g^s(t) = g(-t)$; $D$ denotes the domain of the signal $f(t)$. Then erosion is

$$(f \ominus g^s)(t) = \min_{\tau \in D} \{ f(\tau) - g^s(t - \tau) \}. \qquad (1)$$

Dilation can be defined as

$$(f \oplus g^s)(t) = \max_{\tau \in D} \{ f(\tau) + g^s(t - \tau) \}. \qquad (2)$$

Employing expressions (1) and (2), opening and closing operators can be defined (here $\oplus g$ and $\ominus g$ denote the same operations with $g$ in place of $g^s$). Opening:

$$(f \circ g)(t) = [(f \ominus g^s) \oplus g](t). \qquad (3)$$

The closing operator is defined as

$$(f \bullet g)(t) = [(f \oplus g^s) \ominus g](t). \qquad (4)$$

EEG spikes can exhibit both positive and negative amplitudes, thus both open-closing and close-opening operations are needed to compensate for that. Employing formulas (3) and (4), these operators can be defined. Open-closing:

$$\mathrm{OC}(f(t)) = f(t) \circ g_1(t) \bullet g_2(t). \qquad (5)$$

Close-opening is defined as

$$\mathrm{CO}(f(t)) = f(t) \bullet g_1(t) \circ g_2(t). \qquad (6)$$

Both OC and CO have an impact of the same absolute value, but of different signs, on the average value of the signal. Thus, to eliminate the change, we average the values of Eqs. (5) and (6):

$$\mathrm{OCCO}(f(t)) = \frac{\mathrm{OC}(f(t)) + \mathrm{CO}(f(t))}{2}. \qquad (7)$$

Expression (7) denotes the value of the morphological filter. In order to apply it, we still need to define the structuring elements $g_1(t)$ and $g_2(t)$ employed in Eqs. (5) and (6). They are given by Eq. (8), where $k_i$ is a coefficient used in optimisation (see Section 3.3) with a default value of 1, and $a_i$ and $b_i$ are defined by Eq. (9), in which $W$ is an array of EEG signal arc lengths [4]. Since brain activity of the patient changes with time, the coefficients defined in Eq. (9) need to be recalculated every $t_r = 5$ s.

Every part of the EEG that exceeds a certain detection limit $L$, given by Eq. (10), is considered to be an EEG spike candidate. In Eq. (10), $k_L$ is the coefficient used for optimisation (see Section 3.3) with a default value of 1, and $f_{\mathrm{filtered}}$ is the filtered signal, which can be defined as

$$f_{\mathrm{filtered}}(t) = f(t) - \mathrm{OCCO}(f(t)) \qquad (11)$$

(see Fig. 1 for visualisation of OC, CO and OCCO filter operation).
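The chain of erosion, dilation, opening, closing and OCCO averaging can be sketched in a few lines. The example below is an illustration under simplifying assumptions rather than the detection code used in this study: it uses a flat, symmetric structuring element of 9 samples for both filter stages (so reflection has no effect), treats the background-subtracted signal as the filtered signal, and takes the detection limit as a plain number; all function names are hypothetical.

```python
import math


def erode(f, g):
    """Grey erosion with a symmetric structuring element g of odd length."""
    h, n = len(g) // 2, len(f)
    return [min(f[t + u] - g[u + h] for u in range(-h, h + 1) if 0 <= t + u < n)
            for t in range(n)]


def dilate(f, g):
    """Grey dilation with a symmetric structuring element g of odd length."""
    h, n = len(g) // 2, len(f)
    return [max(f[t + u] + g[u + h] for u in range(-h, h + 1) if 0 <= t + u < n)
            for t in range(n)]


def opening(f, g):
    return dilate(erode(f, g), g)


def closing(f, g):
    return erode(dilate(f, g), g)


def occo(f, g1, g2):
    """Average of open-closing and close-opening (background estimate)."""
    oc = closing(opening(f, g1), g2)
    co = opening(closing(f, g1), g2)
    return [(a + b) / 2 for a, b in zip(oc, co)]


def spike_candidates(f, g1, g2, limit):
    """Return indices where the background-subtracted signal exceeds `limit`."""
    background = occo(f, g1, g2)
    filtered = [x - b for x, b in zip(f, background)]
    return [t for t, v in enumerate(filtered) if v > limit], filtered


# Synthetic demonstration: a slow rhythm with one narrow spike at sample 100.
signal = [math.sin(2 * math.pi * t / 128) for t in range(256)]
for offset, amplitude in ((-1, 2.5), (0, 5.0), (1, 2.5)):
    signal[100 + offset] += amplitude

flat = [0.0] * 9  # assumption: flat element instead of the fitted one
candidates, filtered = spike_candidates(signal, flat, flat, limit=1.0)
```

Because the spike is narrower than the structuring element, both OC and CO remove it while keeping the slow rhythm, so the difference between the signal and the filter output is large only around sample 100.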

Optimisation of the parameters of the EEG spike detection algorithm
As noted in Section 3.2, the EEG spike detection algorithm has some constants (e.g., in Eqs. (9) and (10)) that were introduced in previous studies [4,15]. Some of them were optimised manually in [9]. However, this study has a different goal compared to these previous studies [4,9,15]: instead of just detecting spikes, we aim to classify EEGs by diagnosis. This means that different metrics (e.g., accuracy, specificity and sensitivity) of the EEG spike detection algorithm might be important, hence the need to optimise the algorithm with respect to these metrics. For mathematical convenience of optimisation, several coefficients were introduced: $k_1$ and $k_2$ in Eq. (8), $k_L$ in Eq. (10) and $k_r$ (the value by which $t_r$ is multiplied). The default starting value of all these coefficients was 1.
Since multiple experiments were done with various fitness functions (accuracy, sensitivity, specificity) and their combinations, no mathematical properties of the fitness function can be guaranteed. It can be presumed that the fitness function is discontinuous since the $k_r$ and $k_L$ values cannot be negative. Furthermore, each evaluation of the fitness function is time and resource consuming. For these reasons, a genetic algorithm (GA) was employed in order to optimise the parameters mentioned.
A genetic representation of an individual can be written in the following way: $[k_1, k_2, k_L, k_r]$. The initial values were generated randomly using the normal (Gaussian) distribution with mean $\mu = 1$ and variance $\sigma^2 = 1$. This value generation gave us a selection of new genetic individuals scattered around the known good solution of $[1, 1, 1, 1]$.
Crossover was implemented by splitting two individuals at a randomly chosen index, swapping the second parts and recombining both individuals. Mutation was implemented by modifying a random property of an individual using the normal distribution with mean equal to the current value and variance $\sigma^2 = 1$. Elitism of the selection was applied by carrying over the best 10% of individuals of the current selection to the next one.
Due to the high computational cost of evaluating the fitness function of an individual, a population size of 100 individuals was selected. The probability of mutation was 2%. The GA was terminated after 10 populations did not improve the best found solution. For each fitness function, the GA was run 5 times in order to ensure that it arrived at the same (within margin of error) solution. The results are presented in Table 1.
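The GA operators described above can be summarised in the following sketch. Several details are assumptions made only for illustration: parents are drawn from the better half of the population (the parent selection scheme is not specified here), a cap on the number of generations is added as a safeguard, and `toy_fitness` is a cheap stand-in for the expensive evaluation of the spike detection and classification pipeline.

```python
import random


def random_individual(rng):
    """[k1, k2, kL, kr] drawn from N(1, 1) around the known solution [1, 1, 1, 1]."""
    return [rng.gauss(1.0, 1.0) for _ in range(4)]


def crossover(a, b, rng):
    """Split both parents at a random index and swap the second parts."""
    cut = rng.randrange(1, len(a))
    return a[:cut] + b[cut:], b[:cut] + a[cut:]


def mutate(individual, rng, probability=0.02):
    """With the given probability, redraw one gene from N(current value, 1)."""
    individual = list(individual)
    if rng.random() < probability:
        i = rng.randrange(len(individual))
        individual[i] = rng.gauss(individual[i], 1.0)
    return individual


def next_generation(population, fitness, rng, elite_fraction=0.10):
    ranked = sorted(population, key=fitness, reverse=True)
    n_elite = max(1, int(len(population) * elite_fraction))
    new_population = [list(ind) for ind in ranked[:n_elite]]  # elitism
    parents = ranked[: len(population) // 2]  # assumption: truncation selection
    while len(new_population) < len(population):
        p1, p2 = rng.sample(parents, 2)
        for child in crossover(p1, p2, rng):
            if len(new_population) < len(population):
                new_population.append(mutate(child, rng))
    return new_population


def run_ga(fitness, pop_size=100, patience=10, max_generations=1000, seed=0):
    """Stop after `patience` generations without improvement of the best solution."""
    rng = random.Random(seed)
    population = [random_individual(rng) for _ in range(pop_size)]
    best, stale = max(population, key=fitness), 0
    for _ in range(max_generations):  # safeguard, not part of the description
        if stale >= patience:
            break
        population = next_generation(population, fitness, rng)
        candidate = max(population, key=fitness)
        if fitness(candidate) > fitness(best):
            best, stale = candidate, 0
        else:
            stale += 1
    return best


def toy_fitness(individual):
    """Hypothetical stand-in: peak at [1.5, 1.5, 1.5, 1.5]."""
    return -sum((k - 1.5) ** 2 for k in individual)


best = run_ga(toy_fitness)
```

A real fitness function would additionally have to reject or penalise individuals with negative $k_L$ or $k_r$; how this was handled is not stated, so it is left out of the sketch.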
We determined that for EEG classification by diagnosis, the Min(sensitivity, specificity) fitness function displayed optimal classification results (82% accuracy, see Section 4). We speculate that the reason this metric works best is that it enforces both high sensitivity (many EEG spikes are detected) and high specificity (a large proportion of the candidate spikes detected are EEG spikes). The high sensitivity fitness function resulted in 52% accuracy (of the majority rule voting classifier), and the high specificity fitness function resulted in 79% accuracy.

CNN architecture
Despite the fact that each spike was identified on a single specific channel, we assumed that important diagnostic features might be reflected in or supported by the adjacent channels. The dataset of 216 patients was constructed by selecting a region of 38 data points on either side of each spike position. In total, 51,627 spike snapshots were identified, each featuring 21 stacked EEG channels (the standard number of channels of the international 10-20 EEG system) and 77 data points wide (a 21 × 77 matrix for each spike) (see Fig. 2). We concluded that a width of 77 data points is optimal due to the following reasoning: an EEG spike can be no longer than 200 ms, but in order to guarantee that the whole spike with the nearest surrounding signal is within the window, it was widened to 300 ms. Since the sampling rate of all EEGs is 256 Hz, this corresponds to 256 Hz · 0.3 s = 76.8 ≈ 77 signal elements.
While trying to investigate the hypothesis of diagnostic invariance and to solve the binary classification problem, we decided to use the CNN classifier. Multiple known CNN architectures were tried [17], and a VGG-16 [21] type architecture displayed the best values of classification metrics. The architecture was further optimised manually, with the resulting architecture presented in Table 2. Training and validation metrics of accuracy and loss for a single cross-validation iteration are presented in Fig. 3. The early stopping technique was employed in order to reduce overfitting and achieve the highest detection rate possible. Figure 3 shows that CNN performance saturates after 300 iterations, thus a hard cap of 300 training epochs was implemented.
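The window arithmetic and snapshot extraction can be illustrated as follows. This is a sketch under assumptions: the EEG is represented as a list of 21 equally long channels, the spike position is a sample index, and spikes too close to the edge of the recording are skipped (edge handling is not described here); `extract_snapshot` is a hypothetical helper name.

```python
FS = 256          # sampling rate in Hz
HALF_WIDTH = 38   # samples taken on either side of the spike position
N_CHANNELS = 21   # standard channels of the 10-20 system


def snapshot_width(fs=FS, window_s=0.3):
    """Number of samples in a 300 ms window: 256 * 0.3 = 76.8, rounded to 77."""
    return round(fs * window_s)


def extract_snapshot(eeg, spike_index, half_width=HALF_WIDTH):
    """Cut a (channels x 77) snapshot centred on the spike from every channel.

    Returns None when the window does not fit inside the recording
    (assumption: such spikes are simply skipped).
    """
    start = spike_index - half_width
    stop = spike_index + half_width + 1
    if start < 0 or stop > len(eeg[0]):
        return None
    return [channel[start:stop] for channel in eeg]


# Synthetic recording: channel c, sample t holds the value 1000 * c + t.
eeg = [[1000 * c + t for t in range(500)] for c in range(N_CHANNELS)]
snapshot = extract_snapshot(eeg, spike_index=100)
```

With 38 samples on either side of the spike plus the spike sample itself, the window contains 2 · 38 + 1 = 77 samples, which agrees with the 300 ms estimate.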

Experiments and results
Experiments with the classifier and their results are described in this section.
In order to investigate the validity of the classifier created, a leave one patient out analysis of the single spike EEG classifier was performed. The original data set with P patients was taken, and P training subsets were derived from it in the following way: a different patient was removed from each "leave one patient out" training set, giving us P training sets. Then the algorithm was trained with each of the P training data sets and tested on the corresponding left out patient, resulting in the same (within margin of statistical error on average) results as shown in Table 4. The results of this analysis also imply that the results of this study can be generalised to a new dataset of patients. Figure 4 shows the ROC (receiver operating characteristic) graph for the leave one patient out validation summary results. Table 3 shows performance metric results. Table 4 presents the confusion matrix.
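The construction of the P training sets can be sketched as a simple generator. The data layout (a list of `(patient_id, snapshot, label)` tuples) and the function name are assumptions chosen for illustration; the point is only that all spikes of the held out patient go to the test set and none of them remain in the training set.

```python
def leave_one_patient_out(spikes):
    """Yield (held_out_id, train, test) for every patient in the data set.

    `spikes` is a list of (patient_id, snapshot, label) tuples.
    """
    patient_ids = sorted({patient_id for patient_id, _, _ in spikes})
    for held_out in patient_ids:
        train = [s for s in spikes if s[0] != held_out]
        test = [s for s in spikes if s[0] == held_out]
        yield held_out, train, test


# Toy data: three patients with a different number of spikes each.
toy_spikes = [
    ("p1", "snapshot-a", 0),
    ("p1", "snapshot-b", 0),
    ("p2", "snapshot-c", 1),
    ("p3", "snapshot-d", 0),
]
folds = list(leave_one_patient_out(toy_spikes))
```

Splitting by patient rather than by spike prevents spikes of the same patient from appearing on both sides of the split, which would otherwise inflate the measured accuracy.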
The results show that a single EEG spike cannot be decisively classified (58% accuracy) as belonging to either Group I or Group II. Thus, the majority rule voting classifier was proposed. Each detected spike belonging to a patient was classified using the CNN. Each classification result of 0.5 or below was counted as a vote for assigning the patient to Group I, and each result above 0.5 was counted as a vote for assigning the patient to Group II. Figure 5 demonstrates the voting results against the real diagnosis of the patient. This led to a significant improvement: the average classification accuracy reached 80%, which was a 7% increase over previous studies, or 82% (9% increase) if patients having fewer than 100 spikes are excluded from the analysis as in previous studies [12].
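The voting rule can be written down directly. In the sketch below, `spike_scores` stands for the CNN outputs of all spikes of one patient; the tie-breaking rule (a tie is assigned to Group I) and the error raised for a patient without spikes are assumptions, since neither case is specified here.

```python
def majority_vote(spike_scores, threshold=0.5):
    """Return 1 (Group I) or 2 (Group II) from per-spike CNN outputs.

    A score of `threshold` or below is a vote for Group I, a score above
    it is a vote for Group II. Assumption: ties go to Group I.
    """
    if not spike_scores:
        raise ValueError("at least one spike is required to classify an EEG")
    votes_group_2 = sum(1 for score in spike_scores if score > threshold)
    votes_group_1 = len(spike_scores) - votes_group_2
    return 2 if votes_group_2 > votes_group_1 else 1


# Hypothetical per-spike outputs for two patients.
patient_a = majority_vote([0.2, 0.7, 0.4, 0.1, 0.6])   # three of five votes for Group I
patient_b = majority_vote([0.9, 0.8, 0.3])             # two of three votes for Group II
```

The function accepts any number of spikes, which reflects the fact that a fixed number of spikes per EEG is not required by this approach.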
A high accuracy value does not necessarily represent high quality of classification. Therefore, additional investigation is needed to accurately evaluate the quality of the CNN majority rule classifier. This is crucial since our dataset is unbalanced: patients belonging to Group II are much rarer than patients from Group I (see Section 2). Figure 5 shows that the majority rule classifier is highly likely to classify both Group I and Group II EEGs correctly (81% and 79%, respectively). More metrics are presented in Tables 4 and 5.
The proposed methodology had a further advantage over the ANN based classifier proposed in previous studies: a fixed number of spikes in each EEG was no longer required in order to classify an EEG by diagnosis, since each EEG spike was classified separately by the CNN and the final classification result was based on the majority rule over all EEG spikes classified. However, a higher number of EEG spikes was still preferred, since rejecting EEGs with fewer than 100 spikes produced an average accuracy of 82%.
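Why accuracy alone is insufficient for an unbalanced dataset can be seen from a short worked example. The counts below are hypothetical (100 patients split 73/27, mirroring the proportions reported in Section 2), and treating Group II as the positive class is an assumption made only for this illustration.

```python
def classification_rates(tp, fn, tn, fp):
    """Accuracy, sensitivity (TPR), specificity (TNR) and balanced accuracy."""
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return {
        "accuracy": (tp + tn) / (tp + fn + tn + fp),
        "sensitivity": sensitivity,
        "specificity": specificity,
        "balanced_accuracy": (sensitivity + specificity) / 2,
    }


# Trivial baseline that assigns every patient to Group I:
# all 27 Group II patients are missed, all 73 Group I patients are correct.
baseline = classification_rates(tp=0, fn=27, tn=73, fp=0)
```

Such a baseline reaches 73% accuracy while detecting no Group II patient at all (sensitivity 0, balanced accuracy 0.5), which is why the true positive and true negative rates are reported alongside accuracy.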
Another important difference was that the approach to spike data collection was changed in this study as compared to our previous work [12]. In our previous study, the upslope and downslope of spikes were extracted [12], while in this study, a preprocessed EEG snippet containing signals from all 21 EEG channels (from the vicinity of an EEG spike) was classified. This change led to degraded performance of the ANN based classifier [13], with accuracy of about 59%, which was 14% worse than the original algorithm [12]. The CNN based classifier with majority rule prevails over both of these results using the new approach, with 80% average accuracy and both high TPR and TNR (see Fig. 5 and Table 5). It should be pointed out that the approach presented in the current study takes into consideration nonlinear features of EEG spikes, which were omitted (through linearisation by upslope and downslope fitting) in [12]. This result was achieved due to the fact that many classification errors of the CNN classifier are spike specific, but not EEG specific. Figure 5 is the histogram of the average vote results of the majority rule. It demonstrates that almost all EEGs had some spikes classified incorrectly; however, 80% of EEGs on average had the majority of spikes classified correctly, leading to correct classification by the majority rule classifier.

Conclusions
In this study, we presented an improved (compared to [12]) algorithm for EEG classification by diagnosis. Classification accuracy of 80% (or 82% if EEGs with fewer than 100 spikes are not analysed, as in previous studies) between EEGs of Group I and Group II patients was achieved, compared to 75% achieved by [12] with this dataset. The proposed methodology has a further advantage over our previous research: it can classify EEGs by diagnosis with any number of spikes without the CNN based classifier needing to be retrained. This, combined with the high accuracy achieved by the proposed methodology, leads to the conclusion that the new methodology prevails over previous ones [11,12]. Leave one patient out validation results suggest that classification quality should remain the same with bigger datasets.
The improvement of accuracy was achieved when all of the following parts of the algorithm [12] were reworked in the study:
• A CNN classifier was used instead of ANN [12] or other classifiers [11];
• The majority rule classifier was employed;
• EEG spike parameter evaluation (e.g., upslope and downslope [12]) was omitted, and EEG spike data was employed instead;
• Data from all 21 channels of a standard 10-20 system EEG were considered in the classification process, instead of just the channel where the spike was detected [11][12][13].
These results confirm the initial hypothesis that EEG spikes contain information that allows us to classify the epilepsy type. An EEG spike presents some nonlinear formation (see Figs. 1(a) and 2). Mathematically, this formation can be simplified by approximating it with two or more piecewise linear functions, e.g., the upslope and downslope dealt with in our previous study [12]. However, the current analysis suggests that in order to maximise epilepsy type classification accuracy (up to 10% gain can be achieved), the nonlinearity of EEG spikes should be taken into account.
Although EEG spike detection inaccuracies are mitigated by the CNN based majority rule classifier, one could assume that it is possible to improve EEG classification accuracy even more by revamping EEG spike detection (achieving higher sensitivity and specificity of EEG spike detection).
In this study, only cases that were difficult for human neurologists to distinguish visually were analysed (see Section 2). We therefore hypothesise that a similar methodology could be applied to classify other types of epilepsy.