An Enhanced Classification Model for Likelihood of Zero-Day Attack Detection and Estimation

DOI: http://dx.doi.org/10.24018/ejece.2021.5.4.350 | Vol 5 | Issue 4 | August 2021

Abstract — The growing threat to sensitive information stored in computer systems and devices is becoming alarming, a result of the proliferation of malware created daily to cause zero-day attacks. Malware whose signatures are known can easily be detected and blocked; it is the unknown malware that is most dangerous. In this paper a zero-day vulnerability model based on deep reinforcement learning is presented. The technique employs a Monte Carlo Based Pareto Rule (Deep-RL-MCB-PR) approach that exploits a reward learning and training feature with sparse feature generation and adaptive multi-layered recurrent prediction for the detection and subsequent mitigation of zero-day threats. The new model was applied to the Kyoto benchmark dataset for intrusion detection systems and compared to an existing system that uses multilayer protection and a rule-based ranking (RBK) approach to estimate the likelihood of a zero-day attack. Experiments were performed using the dataset, and simulation results show that the Deep-RL-MCB-PR technique produced a classification accuracy of about 67.77%. When the dataset was magnified, the classification accuracy rose to about 75.84%. These results account for a better error response when compared to the RBK technique.


I. INTRODUCTION
A zero-day vulnerability represents a serious potential threat to many organizations, particularly in modern society. In a zero-day vulnerability, a hole in software that is unknown to the vendor remains undetected for a period that extends beyond the expected near-instantaneous detection. This security hole is typically exploited by hackers before the vendor even becomes aware of it or hurries to fix it; such an exploit is referred to as a zero-day attack. Since the major concern for network administrators is to prevent attacks before they occur, zero-day attacks that pass undetected through conventional defences for a long period further expose the network to harmful agents and make the administrator's job very difficult [1].
Adverse impacts of zero-day attacks include data theft, unauthorised control or account takeover, reputation damage, loss of production and productivity, and financial loss. Hackers can exploit the vulnerability to take unauthorised control of, and access to, a network, website, server, program or any other system [2]. In most cases, if the attack goes public, whether or not a patch for the vulnerability exists, it can seriously harm the organisation's brand reputation. It sends out a public message that cybersecurity measures are not in place and that the organisation's data and systems are highly susceptible to breaches.
Most existing approaches use datasets captured from honeypot traffic to generate an attack graph. This allows the use of very large datasets of malware samples, including all the dataset features, which makes the set of possible distinct vulnerabilities grow exponentially. Analysing malware samples with the full dataset, including all features, poses a problem known as the "curse of dimensionality" when analysing and organising data in high-dimensional space [3], [4]: as the size of the dataset increases, the set of possible outcomes increases exponentially. Other limitations of existing models include the use of a single-pass detection phase, which does not unravel hidden patterns and exposes the network to zero-day vulnerability distribution (probably due to the nature of the data feed). Also, malware is only analysed to discover whether it can cause a zero-day exploit, without prioritizing the level of exploit it is capable of causing. This is what this paper seeks to do.
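The curse of dimensionality mentioned above can be illustrated with a minimal sketch (all values here are synthetic): as the number of features grows, pairwise distances between random points grow and concentrate, which degrades distance-based analysis of full-feature malware datasets.

```python
import math
import random

def mean_pairwise_distance(n_points, n_dims, seed=0):
    """Average Euclidean distance between random points in the unit hypercube."""
    rng = random.Random(seed)
    pts = [[rng.random() for _ in range(n_dims)] for _ in range(n_points)]
    dists = []
    for i in range(n_points):
        for j in range(i + 1, n_points):
            d = math.sqrt(sum((a - b) ** 2 for a, b in zip(pts[i], pts[j])))
            dists.append(d)
    return sum(dists) / len(dists)

# As dimensionality grows, points drift apart and distances concentrate,
# which is why analysing malware samples with every dataset feature is hard.
for dims in (2, 10, 100, 1000):
    print(dims, round(mean_pairwise_distance(50, dims), 2))
```

This is why the model described later filters features before analysis rather than operating on the full feature set.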
A typical enterprise uses firewalls, antivirus software and intrusion detection systems to secure its IT infrastructure. These offer good first-level protection but, despite their best efforts, cannot protect the infrastructure against zero-day attacks and are not very effective at keeping up with the rapidly increasing variety of malware. In this paper, a technique based on deep reinforcement learning and sparse feature generation is proposed. The system is applied to the modelling of exploits using a readily available network intrusion dataset, with the aim of accurately determining the possibility that a new feature instance of network access in the dataset is zero-day vulnerable. In addition to identifying unknown vulnerabilities, the unknown malware files are ranked by the priority of the harm they can cause, so that the threat can easily be traced and contained.

II. RELATED WORKS
Different works in the literature have previously studied how malware is analysed as a means of discovering a zero-day attack. [5] proposed a supervised learning approach employing several data mining techniques to detect and classify zero-day malware based on the frequency of Windows API calls. Eight classifiers were used to develop a machine learning framework: Naïve Bayes (NB), k-Nearest Neighbour (KNN), the Sequential Minimal Optimization (SMO) algorithm with four different kernels (SMO-Normalized Polykernel, SMO-Polykernel, SMO-Puk, and SMO-Radial Basis Function (RBF)), Backpropagation Neural Networks, and the J48 decision tree algorithm. The system proved better than similar signature-free techniques that detect polymorphic and unknown malware based on analysis of Windows APIs. [6] designed Honeyfarm, a hybrid scheme that combines anomaly and signature detection with honeypots, applying the advantages of existing detection techniques to develop an effective defence against Internet worms. The system works on three levels. At the first level, signature-based detection filters known worm attacks. At the second level, an anomaly detector detects any deviation from normal behaviour. At the last level, honeypots are deployed to detect zero-day attacks: low-interaction honeypots track attacker activities while high-interaction honeypots analyse new attacks and vulnerabilities.
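One of the classifiers used in [5], k-Nearest Neighbour over API call frequency vectors, can be sketched as follows. The feature vectors, API names and labels here are invented for illustration; the original framework used a full WEKA-style pipeline.

```python
import math
from collections import Counter

def knn_predict(train, labels, query, k=3):
    """Classify a Windows API call frequency vector by majority vote of its
    k nearest training vectors (Euclidean distance)."""
    dists = sorted(
        (math.dist(vec, query), lab) for vec, lab in zip(train, labels)
    )
    votes = Counter(lab for _, lab in dists[:k])
    return votes.most_common(1)[0][0]

# Hypothetical frequency vectors over three API calls:
# [CreateFile, WriteProcessMemory, RegSetValue]
train = [[10, 0, 1], [12, 1, 0], [2, 9, 8], [1, 11, 7]]
labels = ["benign", "benign", "malware", "malware"]
print(knn_predict(train, labels, [3, 10, 6]))  # → malware
```

The other classifiers in the framework (NB, SMO variants, J48) consume the same frequency-vector representation.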
[7] also proposed a hybrid technique, Suspicious Traffic Filter (STF), for detecting zero-day polymorphic worms. The two techniques combined were signature-based and anomaly-based, the latter falling into the category of behaviour-based detection. Their technique first tries to detect zero-day polymorphic worms and then tries to quarantine them. STF observes all network traffic between an edge network and the Internet; the traffic is passed simultaneously to both Honeynet and IDS/IPS (Intrusion Detection System/Intrusion Prevention System) sensors through a port-mirroring switch.
[8] proposed a hybrid real-time zero-day attack detection and analysis system which combines anomaly-based, behaviour-based, and signature-based detection techniques to analyse zero-day attacks in real time. It is a layered architecture in which all layers work together in parallel. The system was implemented and evaluated against standard metrics such as True Positive Rate (TPR), False Positive Rate (FPR), F-Measure, Total Accuracy (ACC) and the Receiver Operating Characteristic (ROC) curve. The overall results were very promising: a detection rate of nearly 98% with a 0.02 false positive rate and, in the worst case, a detection rate of 89% with a 0.03 false positive rate.
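The metrics named above follow directly from confusion-matrix counts; a minimal sketch (the counts below are invented to mirror the reported ~98% detection rate and 0.02 FPR, not taken from [8]):

```python
def detection_metrics(tp, fp, tn, fn):
    """Standard IDS evaluation metrics from confusion-matrix counts."""
    tpr = tp / (tp + fn)                    # True Positive Rate (detection rate)
    fpr = fp / (fp + tn)                    # False Positive Rate
    acc = (tp + tn) / (tp + fp + tn + fn)   # Total Accuracy
    precision = tp / (tp + fp)
    f_measure = 2 * precision * tpr / (precision + tpr)  # F-Measure
    return {"TPR": tpr, "FPR": fpr, "ACC": acc, "F": f_measure}

# Hypothetical counts close to the best case reported above:
m = detection_metrics(tp=980, fp=20, tn=980, fn=20)
print({k: round(v, 3) for k, v in m.items()})
```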
[6] in their paper "Zero-day attacks defence technique for protecting against unknown vulnerabilities" proposed ZDAR (Zero Day Attack Remedy). Their approach senses the organization's network and monitors the behavioural activity of a zero-day exploit at every stage of its life cycle; it detects zero-day attacks using feature extraction and transformation by sensing suspicious network connections that do not match known attack signatures at run time. The feature transformation module discovers the suspicious connections and differentiates between the behaviour of known attacks and anomalous activities. The anomaly detection technique is then used to discover anomalies and to identify types of zero-day attack using an assigned anomaly score.
[9] in their work "Malware Detection Using Machine Learning and Deep Learning" used supervised and unsupervised learning for malware classification. The purpose of their work was to detect and classify various malware that can cause zero-day attacks using different machine learning algorithms and deep learning models. Classification models were built using two approaches, namely random forests and deep learning. The results of their experiments show that random forests outperformed the deep learning models, achieving the highest accuracy of 99.7%; random forests also produced the second highest accuracy with no feature reduction. Among the different deep learning models, DNN-3L and DNN-7L, both combined with AE-1L, attained an accuracy of 98.99%. [10] proposed an approach to file signature generation for differentiating between malware of known and unknown families. In their work "DeepOrigin: End-to-End Deep Learning for Detection of New Malware Families", they introduced a method for visualizing the signatures in a low-dimensional space for improved malware analysis. Using an extensive dataset consisting of thousands of variants of malicious files, their approach achieved 97.7% accuracy when classifying between known and unknown malware families.
[11] in their work "Transfer Learning for Image-Based Malware Classification" used image analysis to detect and classify malware. To do this, they converted executable files to images and applied image recognition using deep learning models. To train these models, they employed transfer learning, relying on models pre-trained on large image datasets. The performance of this technique was compared with the k-Nearest Neighbours machine learning technique; the deep learning models generalised better, outperforming k-NN in simulated zero-day experiments. [12] designed a scalable approach towards the discovery of unknown vulnerabilities: a hybrid architecture framework for zero-day attack detection based on Rule-Based Ranking (RBK). Their proposed solution consists of three phases, namely a zero-day attack path analyser, a risk analyser and a physical layer.
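The executable-to-image conversion step used in [11] can be sketched as follows; this is only the preprocessing stage (byte values rendered as grayscale pixels), with the pre-trained deep model that consumes the image left out. The width and padding choices here are assumptions, not the paper's exact settings.

```python
def bytes_to_image(data, width=16):
    """Render an executable's raw bytes as a width-pixel-wide grayscale image
    (a list of rows of 0-255 values). Image-based malware classifiers feed
    this representation to a pre-trained vision model."""
    rows = [list(data[i:i + width]) for i in range(0, len(data), width)]
    if rows and len(rows[-1]) < width:
        rows[-1].extend([0] * (width - len(rows[-1])))  # zero-pad last row
    return rows

# A 40-byte "file" becomes a 3x16 grayscale image (last row zero-padded):
img = bytes_to_image(bytes(range(40)), width=16)
print(len(img), len(img[0]))  # → 3 16
```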
The literature reviewed all presents malware as a threat involved in a zero-day attack without indicating which malware samples pose a higher or lower level of risk. It is, however, important to identify the malware with higher risk so that more attention can be focused on combating it. Our approach therefore uses a deep reinforcement learning technique to assess the risk level and prioritize the zero-day binaries with respect to the likelihood of an attack.

III. MATERIALS AND METHODS
The model presented in this paper is similar to the work of [13] but with some key modifications. The existing system is primarily based on Rule-Based Ranking (RBK), which follows from simple decision trees or logic. In the existing technique, a decision rule is used to filter out unknown malicious packets and update the database. In a data mining paradigm, this amounts to using a trained model to locate a malicious class based on novel test data; this is, of course, achieved with some initial training datasets and predefined malicious rule checks.
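The decision-rule filtering idea of the existing RBK technique can be sketched as follows. The rules, feature names and thresholds below are hypothetical stand-ins for the predefined malicious rule checks; [13] does not publish its rule set in this form.

```python
KNOWN_MALICIOUS_RULES = [
    # Hypothetical (feature, predicate) decision rules:
    ("dst_port", lambda v: v in {4444, 31337}),  # ports often used by backdoors
    ("payload_entropy", lambda v: v > 7.5),      # packed/encrypted payload
]

def rbk_filter(packet, database):
    """Sketch of the RBK idea: a decision rule flags a malicious packet
    and the database of known-malicious traffic is updated with it."""
    for feature, predicate in KNOWN_MALICIOUS_RULES:
        if feature in packet and predicate(packet[feature]):
            database.append(packet)  # update the known-malicious database
            return "malicious"
    return "unknown"  # passed to later stages for zero-day analysis

db = []
print(rbk_filter({"dst_port": 4444, "payload_entropy": 3.1}, db))  # → malicious
print(rbk_filter({"dst_port": 80, "payload_entropy": 3.1}, db))    # → unknown
```

Packets labelled "unknown" by such rules are exactly the candidates the new model subjects to deeper analysis.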
The concept of attack rank is derived from the PageRank principle, an idea earlier proposed in [14]. In this paper, the concept is adapted to dataset processing as shown in Fig. 1. In the concept model, the attack paths are captured as anomalies or vulnerabilities, while nodes are the corresponding exploited protocol feature units. By default, all nodes (Attacker and System Host Protocol nodes) are assigned probabilities equal to 1/N, where N is the total number of nodes. A particular Node-Host-Attacker probability can then be identified by taking the ratio of the number of interconnections from Attacker nodes to the respective Host node to the total number of possible instances of these interconnections. For example, to determine the probability of Node Attacker-TCP (Node-A: TCP), we sum the Node 1-4 and Node 3-4 interconnections: PR(Node-A: TCP) = (no. of links from Node-A to Node TCP)/(total no. of links scanned) = 5/7, which indicates a highly vulnerable node. This model comes after the input states have been filtered and the User Nodes (based on their behavioural responses) are identified as anomalies.
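The worked example above reduces to a simple ratio; a minimal sketch, with a hypothetical link list chosen to reproduce PR(Node-A: TCP) = 5/7:

```python
def attack_rank(links_to_host, total_links_scanned):
    """Node-Host-Attacker probability as described above: the number of
    observed Attacker->Host-protocol links over all links scanned."""
    return len(links_to_host) / total_links_scanned

# Hypothetical interconnections from the Node 1-4 and Node 3-4 paths (5 links):
node_a_to_tcp = ["n1-4a", "n1-4b", "n1-4c", "n3-4a", "n3-4b"]
pr = attack_rank(node_a_to_tcp, total_links_scanned=7)
print(round(pr, 3))  # → 0.714, a high chance of a vulnerable node
```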
The dataset used in the experimentation was obtained from the Kyoto University benchmark data for real-time Intrusion Detection Systems (IDS) and is available at [15]. It contained 1000 anomalous instances and was adapted for anomaly detection by separating a sample of known (existing) and unknown (new) exploits from the population of labelled exploits. A sample of the anomalous dataset is shown in Fig. 2.
Fig. 3 shows the architecture of the system. It is composed of three layers. The Zero-day Attack Path Generator emulates acts of intrusion and their consequent sparse detection using snort rules and the PB-MCFR; it creates the zero-day attack path (new exploit index) scenarios for subsequent analysis by the Risk Analyser. The Risk Analyser Layer creates a host-centric graph of the zero-day attack paths generated in the previous layer. Based on the incoming and outgoing exploit indexes (host-centric attack paths), an exploit-centric graph is generated and passed to a modified attack rank algorithm via a Deep-RL block. The function of the Deep-RL block is to reinforce the scores of the paths whose exploits re-occur and then predict the score at the next time step. The modified attack rank algorithm uses an ascending-order quick-sort to rank the scores computed by the Deep-RL stage, followed by a search for the matching indexes. The matching indexes are then preserved in a cellular matrix from which the actual matching exploit features may be decoded. The Dataset Layer simply serves as a container from which the intrusion (zero-day exploit) features are obtained.
The key components of the new system architecture are the Pareto-Based Monte Carlo Filtering Rule (PB-MCFR) and the Deep Reinforcement Learning block, which are modifications made within the Zero-day Attack Path Generator layer and the Risk Analyser layer, respectively.
The PB-MCFR is used to reduce the dimensionality of the dataset for accurate analysis; while the Deep-RL is used to predict the index of the new exploit feature and reinforce the corresponding detected signal based on the frequency of occurrence.

The key processes in the model include highly sparse filtering, performed optimally to extract unknown anomalous paths in the network using the Pareto-Based Monte Carlo Filtering Rule (PB-MCFR). This allows only 20% of the network traffic data to be randomly analysed at a time; for a finite number of trial runs (the Monte Carlo observation space), a filtering of unknown anomalous paths is performed. A ranking is then determined automatically via deep reinforcement predictive learning with numeric encoding: if a known attack is detected, its corresponding numeric encoding label is incremented. Over a set of such instances, a deep reinforcement learner with multiple layers of representation adaptively estimates in advance which attack path is most likely to be featured in the network graph; sorting is then performed to build an ordered representation of the mined graph. The deep reinforcement predictive learning procedure for zero-day detection is shown in Algorithm 1.
Table I shows the indexed representation of the simulation input data. Seventeen (17) instances of the input data were obtained from the dataset collected, and the expected anomaly state for each instance was recorded. The indices with ids 6, 8, 9, 10, 11, and 12 contain instances whose anomaly state is not known; these unknown anomalies are responsible for the zero-day attacks. The remaining instances contain known anomalous data, which is expected to be filtered by the snort filtering.
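The PB-MCFR filtering step described above can be sketched as follows. The traffic records, the signature field, and the choice of which ids lack a known signature are invented for illustration; only the 20% Pareto fraction and the finite Monte Carlo trial count come from the text.

```python
import random

def pb_mcfr(traffic, trials=20, pareto_fraction=0.2, seed=1):
    """Sketch of the Pareto-Based Monte Carlo Filtering Rule: in each trial
    only 20% of the traffic records are randomly sampled, and sampled records
    with no matching known signature are flagged as unknown anomalous paths."""
    rng = random.Random(seed)
    sample_size = max(1, int(len(traffic) * pareto_fraction))
    unknown = set()
    for _ in range(trials):  # the Monte Carlo observation space
        for record in rng.sample(traffic, sample_size):
            if record["signature"] is None:  # no known snort match
                unknown.add(record["id"])
    return unknown

# Hypothetical 17-instance feed where ids 6, 8 and 9 have no known signature:
traffic = [{"id": i, "signature": None if i in (6, 8, 9) else "known"}
           for i in range(1, 18)]
print(sorted(pb_mcfr(traffic)))
```

Because each trial sees only a random 20% of the feed, several trials are needed before all unknown paths are likely to have been observed, which is why the Monte Carlo trial count matters (see also the observation in the conclusion).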

IV. EXPERIMENTS AND RESULTS
In this experiment, the conditioning threshold is varied in steps of 10 from 50 to 100, with the number of simulation runs kept at 20. The initial threshold value of 50 was chosen after prior experiments found it to be a realistic threshold setting for the zero-day analysis routine. The threshold is varied incrementally in order to identify when its value no longer has any impact on the prediction performance for the unknown anomaly. For each scenario, the detected new-exploit reinforced indices and their corresponding anomaly correctness scores (1 for correct, 0 for incorrect) are shown in Table II.
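The sweep harness for this experiment can be sketched as follows. The scoring function here is a hypothetical stand-in that plateaus above a threshold of 70, mirroring the reported pattern; only the sweep range (50-100 in steps of 10) and the 20 runs per setting come from the text.

```python
def sweep_thresholds(score_fn, start=50, stop=100, step=10, runs=20):
    """Vary the conditioning threshold incrementally, average the anomaly
    correctness score over a fixed number of simulation runs per setting,
    and report the mean score for each threshold."""
    results = {}
    for threshold in range(start, stop + 1, step):
        scores = [score_fn(threshold, run) for run in range(runs)]
        results[threshold] = sum(scores) / len(scores)
    return results

# Hypothetical scorer: performance stops changing once the threshold passes 70.
demo = sweep_thresholds(lambda threshold, run: 1.0 if threshold <= 70 else 0.9)
print(demo)
```

Comparing adjacent thresholds in the returned dictionary shows where further increases stop affecting the prediction performance.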
As shown in Table II, when the threshold value is 50 units, the model predicts accurate estimates of the anomalous pattern instances; that is, the malware instances capable of causing an attack can be accurately identified. As the threshold is increased beyond 70 units, the table shows no significant change in the prediction pattern; the range of expected performance is therefore deemed narrow, because no substantial changes are noticeable. Fig. 4 shows the simulation interface of the zero-day detection application, which displays the comparison of the simulation procedure and output for both the existing and the new models. The procedure is as follows: launch the application program; upload the training and testing datasets using the buttons shown in the application; initiate the Deep-RL application with the "DEEP-RL ZDAY" button, which displays its output in the accompanying "RESULT SUMMARY" window; and execute the existing model (the Attack-Rank algorithm) by clicking the "STANDARD Z-DAY" button, whose outputs are displayed in its own "RESULTS SUMMARY" window. This procedure is repeated for 20 trial runs, taking particular note of the reported metric, i.e., classification accuracy.
In the experiment, the Deep-RL model is compared to the existing system based on the influence of the Monte Carlo Based Pareto Rule (MCB-PR) for the base dataset and a magnified version of the dataset. To build the magnified dataset, the original dataset was duplicated 5 times.
The interfaces shown in Fig. 5-7 are screenshots of the results of launching the application program, showing the comparison interfaces of both the existing and new models. Fig. 5 displays the classification accuracy and runtime values of both the proposed and existing models using the base dataset for the first iteration. Figs. 6 and 7 show the same values (CA and runtime) for the second and third iterations, respectively. The values indicated in the screenshots are extracted into Table III.

TABLE I: INDEXED REPRESENTATION AND EXPECTED ANOMALY STATE
Index id:               1      2      3      4      5      6        7      8-12     13-17
Expected Anomaly State: Known  Known  Known  Known  Known  Unknown  Known  Unknown  Known

TABLE II: INDEXED REPRESENTATION FOR THRESHOLDS AND EXPECTED ANOMALY (other values shown are as the experiments were conducted)

The results based on the classification accuracy (CA) and runtime values of both the proposed and existing systems for the magnified dataset are shown in Table IV. Table III shows the mean classification accuracy for the proposed Deep-RL-MCB-PR model, as can be traced from Fig. 5-7 for the first three simulation runs. Using the base dataset, the Deep-RL-MCB-PR performed better, producing a mean classification accuracy of 67.77% against 53.10% for the existing model. For the magnified dataset, the mean classification accuracy of the new model, as shown in Table IV, is 75.84%, while the existing model again produced 53.10%. This implies that the new model more accurately predicts a malware-based zero-day attack. These results indicate that the prediction performance of the new model is about 3 times better than the existing model, while the run-time of the existing model is roughly 2 times better. Fig. 7 shows the graph of classification accuracy for both the new and existing models. It clearly shows that the Deep-RL model performs better, as all its bars have higher peaks than those of the existing model. It can be concluded that a deep-learning approach to zero-day detection and prioritization is better than a rule-based approach.

V. CONCLUSION
A zero-day vulnerability refers to a 'hole' in a software or network system that is not yet known to the user, or for which the manufacturer has not yet released a patch. Malware plays a very important role in leveraging these vulnerabilities to cause zero-day attacks, and as such it cannot be underestimated. In this paper, we have analysed the operations of malware with regard to how they target hardware and software systems in order to cause an attack. From the results obtained, it has been shown that the Deep-RL zero-day exploit technique based on the LSTM is dependent on the threshold setting and the depth of the LSTM learning network. The experimental results show that a correct prediction pattern sequence can be obtained when the threshold setting is about 50. Conversely, as the threshold value grows beyond 70, the prediction results tend to narrow, because no significant changes were noticed. Greater depth in the learning network does not necessarily improve the log-likelihood error response; rather, too much depth may degrade performance.
From the experiments performed, it was observed that the prediction performance improves as the number of Monte Carlo trial runs is increased, but there is a limit to the desired pattern change.