Electronic devices with networking capabilities make up the critical network infrastructure on which governments, businesses, academic institutions and human social interaction depend on a daily basis. Rapidly advancing technology increasingly affects every aspect of daily life, from simple tasks such as sending an encrypted e-mail at work and operating a smart heating system at home, to satellite communications controlling overseas military operations (Stanfield, 2016). These developments have transformed the face and pace of business by increasing the speed of accessing personal information, buying and receiving goods and communicating globally with ease (McGrath, 2008). However, these digital assets are now prone to attack, and with the advantages comes added pressure to maintain adequate security levels.
The past two decades have seen rapid growth in the number of digital devices and a subsequent rise in human-digital dependency. This has shaped a digitalised society with an increasing demand for network-ready devices, as seen in the arrival of the Internet of Things (IoT). Individuals and businesses are progressively storing and sending vast amounts of valuable and sensitive data, trusting that it will remain safe from cyber threats. This increase in value has shifted the incentive behind cyber-attacks from the pursuit of public notoriety, as sought by hacktivist groups such as Anonymous, to the pursuit of financial profit or political gain (Wandhofer, 2015). This has resulted in diverse network attacks being committed across multiple platforms and on various scales.
Global business value and personal information are rapidly migrating into digital form across open, globally interconnected technology platforms. As this happens, the number and complexity of cyber-attacks increase, and so does the global risk they pose to businesses that rely on digital technology. This argument is supported by the UK’s Minister for Digital and Culture, Matt Hancock, who stated during his speech at the Institute of Directors conference (2017) that:
“It’s absolutely crucial UK industry is protected against this (cyber) threat – because our economy is a digital economy… We know the scale of the threat is significant: one in three small firms and 65% of large businesses are known to have experienced a cyber-breach or attack in the past year. Of those large firms breached, a quarter were known to have been attacked at least once per month… My message today is clear: if you’re not concentrating on cyber, you are courting chaos and catering to criminals.” (Hancock, 2017)
This statement supports the demand for increased awareness and action towards business cybersecurity on which this dissertation is based.
During his speech, Hancock went on to warn businesses of the potential destruction that cyber-attacks could inflict, highlighting the apparent gap between awareness and action. Government research indicates that only half of businesses currently take action to address cyber risks (Hancock, 2017). Hancock introduced the UK National Cyber Security Strategy (2016-2021), which emphasises the need for increased levels of security for critical national infrastructures such as train networks, national health services and the national energy grid (HM Government, 2016). This warning has become increasingly relevant to governments and businesses the world over in light of the recent WannaCry ransomware attack, which is discussed later in this dissertation.
In a bid to ensure the safety of large organisations, such as those operating critical national infrastructures, the UK government has proposed new plans under which organisations will soon face fines of up to £17m or 4% of their global turnover should they fail to protect themselves from cyber-attacks (BBC News, 2017). These fines are significantly larger than those currently imposed by the Information Commissioner’s Office (ICO) for serious data breaches. This is a clear indication of the rise in data security concerns and raises the question: how can organisations begin to implement the necessary IT security measures to adequately protect their data at all times?
Digitally stored data is of vital importance to the world’s most powerful organisations, such as high street banks and hospitals. Ensuring that the systems which contain this data remain protected should be of paramount importance. Organisations should now be able to better understand the ecosystem and prevent imminent attacks by implementing the necessary security measures. Quantitatively predicting attacks should be a part of an organisation’s risk management process (Jaganathan, Cherurveettil, & Sivashanmugam, 2014).
Change is the one constant within the cyber-threat landscape that enables attackers to stay ahead. Network attacks evolve and transform faster than many security specialists can anticipate, making them a constant threat. Highly skilled attackers are continuously developing new methods to exploit systems and expose software vulnerabilities. These vulnerabilities are subsequently bought and sold on the dark web by underground groups for profit. Corresponding attack tools will take advantage of these novel vulnerabilities, aiming to compromise unsuspecting hosts for an attacker to ultimately control. Certain malware attacks will reach unsuspecting hosts via spam emails containing malicious attachments or links. Upon clicking the attachment or link, the user will unknowingly cause the malware to install and the host to be compromised, thus creating a zombie machine or ‘bot’. The attacker will use a large group of zombie machines, collectively known as a botnet, to perform simultaneous cyber-attacks e.g. Distributed Denial-of-Service (DDoS), distributed stealthy scans or mass sending of spam emails (Zhou, Leckie, & Karunasekera, 2010).
Botnets have various uses but are most commonly exploited by cyber criminals for sending vast amounts of spam emails across large networks. According to BotLab research, botnet generated spam accounts for 85% of the 100+ billion spam emails sent each day (John, Moshchuck, Gribble, & Krishnamurthy, 2009). Not only is spam a nuisance for users, it also causes extensive damage when used to propagate phishing campaigns. Such campaigns are used for identity theft and to further spread viruses so as to compromise more hosts, increasing the size and power of the botnet (John, Moshchuck, Gribble, & Krishnamurthy, 2009).
Figure 1.1, published in the Citi bank cyber threats strategy report, shows the trend of cyber-attacks from 1980 to 2014 (Wandhofer, 2015). The chart indicates that as attack sophistication increases over time, the amount of knowledge required to launch the attacks decreases. Cyber threats are therefore becoming more sophisticated, accessible and threatening each day (Wandhofer, 2015). The decrease in required intruder knowledge suggests that attacks are increasingly likely to be generated from compromised botnet zombie hosts using automated attack tools.
Figure 1.1 – Attack sophistication vs. Intruder technical knowledge
According to Symantec, botnets are also used as a core tool for administering compromised hosts that are then rented to third parties for malicious purposes (Symantec Corporation, 2014). Additionally, Symantec discovered that attacks are becoming more specific and targeted at acquiring particular sensitive data, demonstrating increased levels of sophistication. This is in addition to random virus attacks or worm propagation (Symantec Corporation, 2014).
To better understand the impending future developments of cyber-attacks and the risks they pose to the business sector, the next part of the dissertation will analyse some key milestone attacks throughout the years.
One of the first-generation network attacks dates back to 1988, when Robert Tappan Morris released the Morris worm. This was a self-replicating computer program that was originally written with the good intention of gauging the size of the Internet's predecessor of the time, the ARPANET (Radware, 2017). However, once it was released the Morris worm caused denial of service (DoS) on 6,000 of the 60,000 ARPANET-connected machines. The worm spread by exploiting known vulnerabilities in the UNIX sendmail, finger and rsh/rexec services and by successfully guessing weak passwords (Radware, 2017).
Although unintentional, the Morris worm caused extensive damage through the way it handled machines that were already infected. The worm checked whether a target machine had already been infected and, if it had, re-infected it 1 in 7 times. This recurrence of infection ensured that a user could not avoid a Morris worm infection by creating a fake Morris worm process, i.e. by pretending that their machine was already infected. Consequently, the Morris worm infected target machines multiple times, causing more processes to run than the machine could handle. Once too many processes were running, they drained all computing resources and the machine began to malfunction (Radware, 2017). It was this spreading mechanism and multiple re-infection process that transformed Robert Morris’ potentially harmless experiment into a powerful denial-of-service attack.
In 1991, the United States v. Morris court case saw the first conviction under the 1986 Computer Fraud and Abuse Act (US Congress, 1986). Robert Morris received a three-year probation sentence, 400 hours of community service and a $10,000 fine (Radware, 2017). Despite the conviction, Morris’ code contained no commands that were intended to harm any computer on which it ran. The commands were only intended to exploit vulnerabilities that permitted the code to replicate and spread.
Publicly demonstrating exploitable vulnerabilities, or penetration testing, has become a well-recognised practice in the field of computer security. By writing a program that publicly exploited the above-mentioned vulnerabilities, Morris effectively forced them to be fixed, and today he might be considered a white-hat hacker. However, whilst the program was not designed to cause harm, for example by stealing documents or information, it did cause the Internet of the time to slow down drastically and eventually cease. In today’s world this would result in the loss of business and money for those who rely on the Internet to operate, such as e-commerce companies.
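The effect of the 1 in 7 re-infection rule can be illustrated with a short calculation. The following is a minimal sketch under a deliberately simplified model, which is an assumption made here and not a description of the worm's actual code: the first infection attempt on a host always succeeds, and every later attempt starts one additional worm process with probability 1/7. The function name is hypothetical.

```python
from fractions import Fraction

# The "1 in 7" re-infection figure quoted in the text.
REINFECT_PROBABILITY = Fraction(1, 7)


def expected_worm_processes(attempts: int) -> Fraction:
    """Expected number of worm processes on one host after `attempts` attempts.

    Simplified model (an assumption): the first attempt always infects the
    host, and each later attempt adds one more process with probability 1/7.
    """
    if attempts <= 0:
        return Fraction(0)
    return 1 + (attempts - 1) * REINFECT_PROBABILITY
```

Under this model the expected process count grows linearly with the number of attempts, which is consistent with the account above of hosts eventually running more worm processes than they could handle.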
Stuxnet is a 500-kilobyte computer worm that was discovered in June 2010 after it was used to sabotage Iran’s uranium enrichment program (Landesman, 2016). Stuxnet had also spread across 14 other industrial sites in Iran before its detection. The way in which Stuxnet originally infiltrated Iran’s nuclear facilities remains unconfirmed; however, theories suggest that it was either introduced to the target environment on an infected USB drive, or installed onto intercepted industrial control system equipment that was bound for Iran (Clayton, 2014). The Stuxnet worm targets the industrial control systems most commonly found in critical infrastructure facilities such as power plants, gas lines and water treatment facilities (Landesman, 2016).
The role of Stuxnet is to programmatically alter the Programmable Logic Controllers (PLCs) that are found within the industrial control systems of these facilities. The PLCs control and automate industrial tasks such as maintaining temperature and regulating flow rates to maintain pressure (Landesman, 2016). To ensure security, many of the hardware devices found within industrial control systems are not connected to the internet or to other networks. To overcome this, Stuxnet incorporates a number of sophisticated methods of propagation (Landesman, 2016). The end goal is for the worm to reach and infect the STEP 7 project files that are used to program PLC devices.
Whilst the programmable logic controller is a machine-language device that is not Windows-based, the Stuxnet worm initially targets computers that are running Windows operating systems (Landesman, 2016). Stuxnet secretly navigates through Windows computers with the objective of gaining access to the systems that manage the programmable logic controllers, upon which it delivers its payload (Landesman, 2016).
In order to reprogram the PLC, Stuxnet first locates and infects STEP 7 project files, which are used by Siemens SIMATIC WinCC. This is a supervisory control and data acquisition (SCADA) and human-machine interface (HMI) system that is used to program the programmable logic controllers (Landesman, 2016). The Stuxnet worm completes various routine checks in order to identify the correct PLC model, as machine-level instructions vary across different PLC devices.
Once the target PLC has been identified and infected, Stuxnet intercepts and spies on all data flowing through the programmable logic controller. It then uses the gathered information to take control of the centrifuges, causing them to spin into failure, all whilst providing false feedback to outside controllers to ensure that they do not discover any malfunction until it is too late (Kushner, 2013).
Whilst no one has taken responsibility for Stuxnet, it is believed to have been created covertly by the United States and Israel to attack Iran’s nuclear facilities. This claim was made by Edward Snowden, the former CIA contractor turned whistle-blower and international human rights defender, who confirmed during an interview that the National Security Agency and Israel had co-written the Stuxnet worm (Snowden, 2013).
While Iran has never released details regarding the exact damage caused by Stuxnet, it is estimated that the worm destroyed 984 uranium-enriching centrifuges. According to estimates from Stanford University research, this constituted a 30% decrease in enrichment efficiency (Holloway, 2015).
The Stuxnet code remains available to download, allowing white-hat and black-hat hackers alike to access, analyse and edit it, which makes it one of the most dangerous and readily available pieces of malware.
A more recent example of a large-scale data breach is the one that targeted global e-mail provider Yahoo. In December 2016, security analysts were investigating a separate Yahoo cyber-attack from 2014 in which over 500 million Yahoo accounts were compromised. It was then that they uncovered a far larger attack that had gone unnoticed during 2013. In a statement, Yahoo claimed that:
“…an un-authorized third party, in August 2013, stole data associated with more than one billion user accounts.”(Yahoo, 2016)
The accused third parties have since been identified as 22-year-old Canadian Karim Baratov, along with three Russian nationals Dmitry Aleksandrovich Dokuchaev, 33, Igor Anatolyevich Sushchin, 43, and Alexsey Alexseyevich Belan, 29 (Mining Awareness, 2017).
Baratov et al. were able to access and steal user account information such as names, email addresses, telephone numbers, dates of birth, passwords and encrypted or unencrypted security questions and answers (Yahoo, 2016). It has since been revealed that the hackers accessed the user accounts by launching a sophisticated cookie forging attack, in which the user’s password is not needed to gain access to their account (Yahoo! Inc, 2017). Yahoo has stated that the cookie forgery was made possible by a security flaw within Yahoo’s mail service (Yahoo! Inc, 2017).
Cookie forgery involves an attacker faking the authentication of a legitimate user in order to unlawfully gain access to the user’s account. When a cookie is created, it stores a small amount of data from a website that the user has visited. On later visits, the cookie reminds the website that the user has previously visited and that their session identifier has accessed it, saving the user the time they would otherwise spend re-entering usernames and passwords (Khan, 2017).
The session identifier is a unique, temporary, server-controlled code tied to the duration of the user’s session, i.e. from when the user starts the session until it ends. To forge the cookie and gain illegitimate access to accounts, an attacker must first hack into the email system and decipher the cookie generation process (Khan, 2017). This can be achieved by utilising one of the following methods:
- Exploiting a flaw in the cookie formation and cryptography application (Khan, 2017).
- Exploiting a flaw in the comparison logic of the server (Khan, 2017).
By exploiting a flaw in the cookie formation and cryptography application an attacker is able to guess or create a valid cookie that impersonates a legitimate user (Khan, 2017). By exploiting a flaw in the comparison logic of the server an attacker can manipulate the server’s comparison logic into accepting invalid cookies as valid ones (Khan, 2017).
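Both weaknesses can be made concrete with a defensive sketch. The example below is not Yahoo's cookie scheme, which is not described in the sources; it is a minimal illustration, under stated assumptions, of how a server could make cookies hard to guess (by attaching a keyed HMAC-SHA256 tag to the session identifier) and how it could avoid lax comparison logic (by requiring an exact, constant-time match of the tag). The key value, the cookie layout and the function names are hypothetical.

```python
import hashlib
import hmac

# Hypothetical server-side secret; a real deployment would load this securely.
SECRET_KEY = b"hypothetical-server-side-secret"


def sign_session(session_id: str, key: bytes = SECRET_KEY) -> str:
    """Return a cookie value of the form '<session_id>.<hex HMAC-SHA256 tag>'."""
    tag = hmac.new(key, session_id.encode("utf-8"), hashlib.sha256).hexdigest()
    return f"{session_id}.{tag}"


def verify_cookie(cookie: str, key: bytes = SECRET_KEY) -> bool:
    """Accept a cookie only if its tag matches the tag recomputed by the server."""
    session_id, separator, tag = cookie.rpartition(".")
    if not separator or not session_id:
        return False  # malformed cookie: no session identifier or no tag
    expected = hmac.new(key, session_id.encode("utf-8"), hashlib.sha256).hexdigest()
    # Exact, constant-time comparison: no prefix matches, no type coercion.
    return hmac.compare_digest(expected.encode("utf-8"), tag.encode("utf-8"))
```

In this sketch an attacker who does not know the key cannot produce a valid tag for a chosen session identifier. The same sketch also shows why theft of the server's proprietary code and secrets is so damaging: anyone holding the key can call `sign_session` for any identifier.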
It was through the security flaw in Yahoo’s cookie formation and cryptography application that attackers were able to decipher and exploit the company’s cookie generation process (Khan, 2017). In reference to this, Yahoo released the following statement in their 2016 annual report, which was filed with the US Securities and Exchange Commission (SEC):
“Based on the investigation, we believe an unauthorized third party accessed the company’s proprietary code to learn how to forge certain cookies…”
“…The outside forensic experts have identified approximately 32 million user accounts for which they believe forged cookies were used or taken in 2015 and 2016. We believe that some of this activity is connected to the same state-sponsored actor believed to be responsible for the 2014 security incident.”(Yahoo! Inc, 2017)
Since this incident has come to light, it has affected Yahoo’s deal with the global communication company Verizon (Verizon, 2017). Verizon is currently in the process of buying Yahoo for $4.8 billion, but since receiving the news regarding the recent hacks they have stated the following:
“We will review the impact of this new development before reaching any final conclusions.” (Yahoo, 2016)
The 2014 attack, which was disclosed in September of 2016, has already threatened to derail the deal or result in a sale price reduction of $250 million (Moritz, Sherman, & Womack, 2017). Not only has Yahoo faced a substantial financial loss here but their reputation has also been damaged by this colossal data breach.
The frequent attacks on Yahoo systems have created concerns from regulators and prompted lawsuits (Moritz, Sherman, & Womack, 2017). If the appropriate cyber-attack prevention methods had been implemented as part of a strong Yahoo cybersecurity policy, these vulnerabilities could have been anticipated and avoided.
The WannaCry ransomware attack spread across the globe during May 2017. The WannaCry ransomware crypto worm spread rapidly and aggressively, targeting computers running Microsoft Windows operating systems. The worm encrypted files and displayed an on-screen ransom note demanding a payment of $300 in Bitcoin cryptocurrency in return for the files’ decryption (Symantec Security Response, 2017).
Following its launch on 12 May, WannaCry reportedly infected over 230,000 computers in over 150 countries within 24 hours (BBC News, 2017). Symantec described WannaCry as being far more dangerous than other common ransomware types of the time (Symantec Corporation, 2017). This is due to its ability to spread itself across organisation networks by exploiting critical vulnerabilities found in Windows computers. Microsoft patched these vulnerabilities in March 2017 (Microsoft, 2017); however, a flaw remained in the Windows Server Message Block (SMB) service, which is used by Windows computers to share files and printers across local networks (Brenner, 2017). EternalBlue is the name of the exploit that was released as part of a series of leaks in April 2017. The hacker group Shadow Brokers is believed to be behind the exploit leak and claims to have stolen the data from the Equation cyber espionage group (Symantec Security Response, 2017).
WannaCry’s initial source of infection within an organisation network remains unconfirmed, as the analysis did not reveal any of the traits commonly associated with ransomware attacks, such as outlook.exe files associated with phishing emails containing malicious attachments or hyperlinks (Brenner, 2017). Instead, all that was discovered was a compromised Windows SMB driver. This suggests that the WannaCry ransomware was remotely executed as the first stage of the following three-pronged assault strategy (Brenner, 2017):
- Execution of malicious code from remote location
- Gained advanced user privileges
- Lateral movement
- Unpacking files
- Environment preparation
- Payload execution
- Encryption of documents
- Deletion of known local backup files
- Ransom note GUI display
The WannaCry worm spread by generating random IP addresses and then sending malicious Server Message Block packets to the remote hosts at those addresses (Brenner, 2017).
Whilst it is still unclear how many organisations in total were affected by the WannaCry ransomware, it is clear that it severely affected many National Health Service hospitals in the UK, resulting in delayed operations and patients being turned away from their local GP surgeries (Martin, Kinross, & Hankin, 2017). This highlights the poor state of the cybersecurity infrastructure within the NHS, and its failure to acknowledge this type of security as a fundamental matter of patient safety (Martin, Kinross, & Hankin, 2017).
This attack has demonstrated how fragmented the governance of cybersecurity is within the NHS, even though effective cybersecurity requires good governance. In order to build resilience against cyber threats, the NHS must reduce vulnerabilities and implement efficient cybersecurity policies to reduce the risk of future security breaches.
A Distributed Denial of Service (DDoS) attack attempts to make a website or online service unavailable by overwhelming it with traffic from multiple sources (Whittaker, 2016). Attackers tend to target a variety of important resources, such as high street banks, and create a major challenge by ensuring that important information cannot be published or accessed. It is not uncommon for attackers to issue a DDoS ransom notice, requesting that the target pay a fee to prevent an attack from taking place or to stop an attack that is already under way.
DDoS attacks can be broadly divided into two types: application layer attacks and network layer attacks (Incapsula, 2017).
- Application layer DDoS attacks include HTTP floods, slow attacks, zero day attacks, and those targeting vulnerabilities in operating systems, web applications and communication protocols (Incapsula, 2017). Made up of apparently legitimate requests, their scale is usually measured in requests per second (RPS). The goal of the attack is to overwhelm a target application with more requests than the application can handle. This creates high processing and memory usage and causes the application to crash (Incapsula, 2017).
- Network layer DDoS attacks include UDP floods, SYN floods, DNS amplification and IP fragmentation. These attacks are typically performed by botnets which create high-capacity traffic bombardments that are measured in gigabits per second (Gbps). The goal of these attacks is to consume the target’s upstream bandwidth, subsequently saturating the network (Incapsula, 2017).
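The two units of measurement mentioned above, requests per second and gigabits per second, can be illustrated with a small monitoring sketch. The code below is a minimal example, assuming a single monitored application and a fixed threshold chosen by the operator; the class name, function name and threshold are hypothetical and are not taken from the cited sources. It counts requests inside a sliding time window to estimate RPS, and converts a byte count over an interval into Gbps.

```python
from collections import deque


class RequestRateMonitor:
    """Sliding-window estimate of requests per second (RPS)."""

    def __init__(self, threshold_rps: float, window_seconds: float = 1.0):
        self.threshold_rps = threshold_rps
        self.window_seconds = window_seconds
        self._timestamps = deque()

    def record(self, timestamp: float) -> bool:
        """Record one request; return True if the rate now exceeds the threshold."""
        self._timestamps.append(timestamp)
        cutoff = timestamp - self.window_seconds
        # Discard requests that have fallen out of the window.
        while self._timestamps and self._timestamps[0] <= cutoff:
            self._timestamps.popleft()
        rate = len(self._timestamps) / self.window_seconds
        return rate > self.threshold_rps


def gigabits_per_second(total_bytes: int, seconds: float) -> float:
    """Convert a byte count observed over `seconds` into Gbps."""
    return total_bytes * 8 / seconds / 1e9
```

A fixed threshold of this kind is only a starting point: application layer floods are built from apparently legitimate requests, so a real deployment would need to combine rate information with other signals.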
A recent example of a high-profile DDoS attack was that on the BBC in January 2016. The attack was conducted by The New World Hackers, a self-proclaimed hacktivist group who described it as ‘a test of power’ (Whittaker, 2016). The attack targeted the BBC’s entire domain, including its on-demand television and radio iPlayer services. The BBC domain was down for more than three hours and continued to have residual issues for the remainder of the day (Whittaker, 2016).
The leader of the group responsible claimed that the attack reached 602 Gbps in size and that the traffic came from an Amazon cloud service. If accurate, this would make it the largest DDoS attack recorded to date, and an extremely rare one due to the level of skill required to sustain it.
The Internet of Things (IoT) consists of internet-enabled embedded computing devices such as smart TVs, CCTV cameras and smart home heating systems, which nowadays are ubiquitous (Ahmed & Hyoungshick, 2017). IoT brings many advantages for society, such as improved machine-to-machine communication between physical devices and improvements in the speed and quality of global communications (Sannapureddy, 2015). IoT provides monitoring for many different purposes, such as measuring how much electricity a household uses on a daily basis via smart home meters. The devices encourage optimum utilisation of energy and resources (Sannapureddy, 2015). With the use of IoT, this information is more readily available, which saves time and money as decisions can be made in real time.
The sudden shifts between desktop PC, mobile and IoT have brought increased security risks, as securing IoT systems has proven to be challenging. This is due to the multiple points of vulnerability that exist, which have been highlighted by recent hacks and security breaches (Ahmed & Hyoungshick, 2017). The most recent incident was the network layer DDoS attack carried out in October 2016, targeting servers operated by the DNS provider Dyn (Dyn, Inc, 2016). This attack caused major websites such as Twitter, Spotify, Reddit, Netflix and Amazon.com to become unreachable. The DDoS attack was carried out by a botnet that was made up entirely of IoT devices. An attack of this type was made possible by the exponential growth of IoT devices around the globe, which resulted in unprecedented traffic volumes that were impossible for the Dyn servers to manage (Ahmed & Hyoungshick, 2017).
A number of factors contributed to the success of a DDoS attack of this magnitude. As the number of IoT devices continues to grow at an astounding rate, IoT networks continue to be poorly managed and there is a distinct lack of stringent security measures. This is alarming considering growth predictions from technology research company Gartner Inc., which expects there to be over 26 billion connected devices by 2020 (Gartner, 2015). Additionally, the tremendous volumes of traffic observed on enterprise networks, along with the memory and complexity constraints associated with network devices, make it extremely difficult for Anomaly Detection Systems (ADS) to examine every packet thoroughly (Ahmed & Hyoungshick, 2017). Consequently, packet flow sampling approaches are implemented to reduce the amount of data to be analysed. Whilst this helps to decrease the near-impossible workload, it also allows malicious traffic from compromised IoT devices to bypass inspection on its way to the target servers or hosts (Ahmed & Hyoungshick, 2017). Furthermore, the discovery of the Mirai malware has presented a major security threat. This malware continuously scans the internet for IoT devices that are protected by factory default usernames and passwords, a list of which is available in the Mirai source code (Krebs, 2016).
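The sampling trade-off described above can be quantified with a simple model. The sketch below assumes independent random sampling in which each packet is inspected with probability 1/N. This is an assumption made for illustration; real samplers may instead be deterministic or flow-aware. Under this assumption, a flow of k packets escapes inspection entirely with probability (1 - 1/N)^k. The function name is hypothetical.

```python
def miss_probability(flow_packets: int, sampling_interval: int) -> float:
    """Probability that no packet of a flow is sampled.

    Assumes each packet is sampled independently with probability
    1 / sampling_interval (a simplifying assumption).
    """
    if flow_packets < 0 or sampling_interval < 1:
        raise ValueError("flow_packets must be >= 0 and sampling_interval >= 1")
    return (1.0 - 1.0 / sampling_interval) ** flow_packets
```

With 1-in-1000 sampling, for example, `miss_probability(10, 1000)` is above 0.99, so a compromised device that sends only a handful of packets is almost never inspected, even though the combined traffic from many such devices may be very large.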
The Mirai botnet was found to be behind a wave of major DDoS attacks during 2016 and consisted of various IoT devices such as infected household routers and security cameras, both of which are low-powered and poorly secured (Ashford, 2017). Using botnets like Mirai, attackers can turn relatively benign devices and software into cyber-weapons that can be used to devastating effect (Ashford, 2017). According to the most recently issued Symantec security threat report (Symantec Corporation, 2017), cyber attackers reached new levels of motivation during 2016. Overwhelming attacks were carried out including multi-million-dollar virtual bank heists and overt attempts to disrupt the US electoral process by state-sponsored groups. Additionally, some of the largest distributed denial of service (DDoS) attacks to date were recorded, all of which were driven by a botnet consisting of Internet of Things (IoT) devices.
By implementing more dynamic cybersecurity methods, organisations can assess risks and update policies accordingly and in real time. This saves considerable time and money by ensuring that the network and the data stored within it remain safe and secure at all times.
Successfully combating colossal, sophisticated cyber-attacks requires advancement on numerous fronts, including intrusion detection, attack modelling and prediction. The overall aim of this study is to analyse existing security methods employed by businesses and to propose a more dynamic approach towards cybersecurity. The results can be used to improve security policy writing by ensuring a consistently up-to-date security document. The main objective is to uncover ways that may help organisations implement the necessary IT security measures to adequately protect their data at all times. This is achieved through extensive research into existing security methods and in-depth analysis of the literature that tests these methods.
After analysis of intrusion detection techniques, attack modelling and spam filtering, the study proposes a more preventative machine learning approach towards cybersecurity. The study also presents a ‘feed-back loop’ system where the big data processed by the machine learning prediction algorithms can be used by businesses for a more dynamic cybersecurity strategy.
Chapter 2 examines existing cybersecurity methods that can be implemented within organisations. Such methods can be used to gain knowledge of past and present attacks with a view to predicting and preventing future attacks.
Chapter 3 covers machine learning and the associated studies.
Chapter 4 covers cybersecurity.
Chapter 5 covers dynamic cybersecurity and proposes feeding the big data produced by machine learning back to analysts to aid security policy writing, generating a more dynamic cybersecurity environment that will enhance cyber protection.
Chapter 6 presents the conclusion and recommendations for future work.
Varying network attacks can be captured and observed using Intrusion Detection Systems. The overall objective of an Intrusion Detection System is to differentiate malicious traffic activity from normal traffic activity whilst concurrently generating alerts about suspicious activity for network security analysts to analyse. Due to the ever-changing cyber environment, continuous efforts are made towards the advancement of anomaly-based and signature-based intrusion detection systems (Kumar, Chandak, & Dewanjee, 2014). Anomaly-based detection methods model the normal behaviour of users or systems and report events or observations which do not conform to this expected behaviour pattern. Signature-based detection methods, by contrast, maintain a database of known malicious signatures and detect malicious actions by utilising pattern matching techniques. Anomaly-based methods can be extremely effective in detecting novel attacks but typically suffer from high numbers of false positives. Signature-based methods have more accurate detection rates for known signatures but can be unsuccessful against novel attacks. Details of different intrusion detection methods can be found in many research papers, for example Mallissery, Prabhu, & Ganiga (2011).
Installation of an Intrusion Detection System (IDS) is generally the first line of defence for a company’s network security. Whilst a firewall works to keep out malicious attacks, an IDS is present to detect suspicious or nefarious activity. Intrusion detection is considered a challenging area of study due to the variety of normal behaviour types found in an ever-changing cyber environment, as well as evolving attacks and novel vulnerabilities. Whilst intrusion detection methods continuously change and improve, analysis of the results continues to be difficult for analysts due to the number of heterogeneous alerts created. As a result, alert correlation is used to group IDS alerts into sets, where each set corresponds to a higher-level description of an attack.
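The contrast between the two detection approaches can be sketched in a few lines. The example below is a toy illustration under stated assumptions and not a production IDS: the signature list is hypothetical, the anomaly detector models "normal" behaviour as the mean and sample standard deviation of a single numeric feature (such as requests per minute), and the z-score threshold of 3 is an arbitrary illustrative choice.

```python
from statistics import mean, stdev
from typing import Sequence

# Hypothetical byte patterns standing in for a database of known signatures.
KNOWN_BAD_SIGNATURES = (b"/bin/sh", b"cmd.exe")


def signature_match(payload: bytes,
                    signatures: Sequence[bytes] = KNOWN_BAD_SIGNATURES) -> bool:
    """Signature-based detection: flag payloads containing a known pattern."""
    return any(signature in payload for signature in signatures)


def is_anomalous(value: float, baseline: Sequence[float],
                 z_threshold: float = 3.0) -> bool:
    """Anomaly-based detection: flag values far from the baseline behaviour.

    `baseline` needs at least two observations of normal behaviour.
    """
    mu = mean(baseline)
    sigma = stdev(baseline)
    if sigma == 0:
        return value != mu
    return abs(value - mu) / sigma > z_threshold
```

The sketch reflects the trade-off described above: `signature_match` never fires on a payload whose pattern is not yet in the database, whereas `is_anomalous` can flag previously unseen behaviour but will also flag any legitimate deviation from the baseline.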
Alert correlation tasks can be categorised into the following four stages (Sadoddin & Ghorban, 2006):
- Normalisation (Norm.)
- Aggregation (Agg.)
- Correlation (Corr.)
- Strategy Analysis (SA)
Normalisation converts the IDS alerts collected from the heterogeneous detection sensors into a common format. Aggregation is then used to combine alerts with the same root cause, for example alerts that originate from the same source IP address or attack the same target. The correlation stage takes the aggregated alerts and maps them into an attack scenario. The scenario is used for attack strategy analysis to determine the attacker’s intention (Sadoddin & Ghorban, 2006).
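The first two stages can be illustrated with a minimal Python sketch. The alert fields, the sensor-specific key names (such as `src` versus `source_ip`) and the example addresses are assumptions made for illustration only; they are not taken from the cited works, and real sensors use richer formats.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Alert:
    """Common (normalised) alert format; the field names are assumptions."""
    alert_id: int
    name: str
    sensor: str
    source: str
    target: str


def normalise(raw: dict) -> Alert:
    """Map a sensor-specific record onto the common format.

    Different sensors are assumed to use different key names for the
    same information, e.g. 'src' versus 'source_ip'.
    """
    return Alert(
        alert_id=int(raw["id"]),
        name=str(raw.get("name") or raw.get("signature", "unknown")),
        sensor=str(raw["sensor"]),
        source=str(raw.get("src") or raw.get("source_ip", "unknown")),
        target=str(raw.get("dst") or raw.get("target_ip", "unknown")),
    )


def aggregate(alerts):
    """Group alert IDs that share the same (source, target) pair."""
    groups = defaultdict(list)
    for alert in alerts:
        groups[(alert.source, alert.target)].append(alert.alert_id)
    return dict(groups)


# Hypothetical raw alerts from two sensors with different schemas.
raw_alerts = [
    {"id": 1, "name": "Port scan", "sensor": "N1",
     "src": "192.0.2.10", "dst": "10.0.0.1"},
    {"id": 2, "signature": "Port scan", "sensor": "N2",
     "source_ip": "192.0.2.10", "target_ip": "10.0.0.1"},
    {"id": 3, "name": "IIS exploit", "sensor": "N1",
     "src": "192.0.2.99", "dst": "10.0.0.1"},
]

alert_groups = aggregate(normalise(r) for r in raw_alerts)
for key, ids in alert_groups.items():
    print(key, ids)
```

Grouping on the (source, target) pair is only one possible aggregation key; a real engine would typically also consider time windows and alert types.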
An attack scenario example is given in (Valeur, Vigna, Kruegel, & Kemmerer, 2004), and Table 2.1 shows the 7 alerts that were generated during the attack. The network has the following 4 heterogeneous intrusion detection sensors:
- Network sensors – N1, N2
- Host-based sensor – H
- Application-based sensor – A
As seen in Table 2.1, the attacker (IP: 184.108.40.206) initiates the attack by performing a port scan against 10.0.0.1 and discovers a vulnerability in the Apache server (Alert ID 2, Alert ID 3). During the scan, a separate worm attack from IP 220.127.116.11 attempts to exploit a Microsoft IIS vulnerability but fails (Alert ID 1).
Table 2.1 – Alert correlation of an Attack Scenario example
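| Alert ID | Name | Sensor | Start/End | Source | Target |
|---|---|---|---|---|---|
| 1 | IIS Exploit | N1 | 12.0/12.0 | 18.104.22.168 | 10.0.0.1, port 80 |
| 4 | Apache Exploit | N1 | 22.0/2.0 | 22.214.171.124 | 10.0.0.1, port 80 |
| 5 | Bad Request | A | 22.1/22.1 | | Local host, Apache |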
Once the scan is complete, the attacker successfully performs an Apache buffer overflow exploit (Alert ID 4, Alert ID 5) and gains user privilege access to the server. The attacker then uses a local exploit against the Linux system administration tool linuxconf and elevates their privileges, becoming root. This exploit is executed twice with different parameters before being successful (Alert ID 6, Alert ID 7).
Effective alert correlation would ideally group together Alert ID 2 and Alert ID 3 as malicious scanning, Alert ID 4 and Alert ID 5 as vulnerability exploit attempts, and Alert ID 6 and Alert ID 7 as Linux privilege escalation. Alert ID 1 would be listed as irrelevant. Once the aggregation of alerts is complete, the malicious scanning, vulnerability exploit attempts and privilege escalation should be correlated and presented to the security analysts for investigation. This example indicates that, after alert correlation, analysts can process high-level attack descriptions more effectively than individual alerts. Analysing attacks at the level of these descriptions supports a triage-style approach and could help analysts respond quickly to the more dangerous attacks by reducing the time spent on potentially irrelevant alerts.
The methods commonly used in correlation engines consist of similarity-based clustering and causal relationship-based reasoning (Sadoddin & Ghorban, 2006). The attack scenario models can be pre-defined or automatically learnt from raw data. Uncertainties are dealt with by Bayesian networks, which will be discussed later on in this dissertation. Table 2.2 shows a selection of demonstrative alert correlation approaches.
Table 2.2 – Summary of alert correlation methodologies
| | Stages covered | Technique |
|---|---|---|
| | Aggregation, Correlation | Neural Networks, Decision Trees |
| | Normalisation, Aggregation, Correlation | Examine source-target relationship |
| | Aggregation, Correlation | Pre/Post condition-based correlation |
| | Correlation | Bayesian networks based inference |
| | Correlation | Data mining for frequent patterns |
The correlation process combines the components of raw IDS alert data and creates attack scenarios, thus providing better situational awareness to security analysts. However, as previously discussed, the ever-changing cyber environment (e.g. novel exploits, new IDSs and software patches) makes alert correlation challenging. Additionally, attack-scenario based approaches are accurate but do not scale to large numbers of varied and novel attacks, a limitation shared with signature-based intrusion detection, where an IDS is only as good as its signature library. Attackers continuously come up with sophisticated ways to evade IDS detection, using techniques like obfuscation, fragmentation, Denial of Service, and application hijacking (SANS Institute, 2003). Whilst it is impossible to create an IDS product that can prevent all of these attack types, organisations should incorporate adaptable security policies which can evolve like the attacks they are created to defend against, creating a more dynamic approach to writing cybersecurity policy.
Botnet detection and host clustering are useful tools for analysts to better understand the scale of larger attacks. Going beyond examining the attack action level and alert correlations, host clustering allows the grouping of large numbers of hosts that share similar behaviour whether that be malicious or legitimate.
Passive internet backbone traffic or malicious traffic is the typical input for host clustering (Xu, Zhang, & Bhattacharyya, 2008). The generated outputs are host clusters that present similar behaviour patterns, which are useful visual aids for anomaly detection and attack profiling. Malicious behaviours including DDoS attacks, worms and scanning can be detected using clustering methods, examples of which can be found in (Kong, et al., 2016). The zombie hosts that make up a botnet will typically display behaviours such as scan activity, spam activity, binary downloads and exploit activities. Through observation of host clusters that exhibit these behaviours, analysts can better understand the structure of a botnet.
The differences between host clustering methods can be understood as choices of which attributes form a multidimensional data point and which objective function is used to cluster those points (Du, 2014). Figure 2.1 shows an example of a data point taken from (Xu, Zhang, & Bhattacharyya, 2008) representing an attacking host, described by several features relevant to the security context. Features that are typically used are source IP and destination IP statistics from flow information, and TCP/IP protocol, source port and destination port information. This example represents one attacking host and shows the relative uncertainty (RU) of the following three features:
- source port – RU(srcPort)
- destination port – RU(dstPort)
- destination IP – RU(dstIP)
These features are taken into account in order to characterise an attacker’s behaviour.
Figure 2.1 Data point for host clustering example
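As a rough illustration of how such a data point could be computed, the sketch below treats relative uncertainty as normalised Shannon entropy, i.e. the entropy of the observed values divided by the maximum entropy attainable. This definition, the function name `relative_uncertainty` and the example flows are assumptions made here for illustration and may differ in detail from the formulation in the cited work.

```python
import math
from collections import Counter


def relative_uncertainty(values, value_space_size):
    """Normalised Shannon entropy of the observed values, in [0, 1].

    Assumed definition: H(X) / log2(min(value_space_size, m)), where m is
    the number of observations. Returns 0.0 when no uncertainty is possible.
    """
    m = len(values)
    max_states = min(value_space_size, m)
    if max_states <= 1:
        return 0.0
    counts = Counter(values)
    # H(X) = sum over observed values of p * log2(1 / p), with p = c / m.
    entropy = sum((c / m) * math.log2(m / c) for c in counts.values())
    return entropy / math.log2(max_states)


# Hypothetical flows from one attacking host: (srcPort, dstPort, dstIP).
flows = [
    (40001, 445, "10.0.0.1"),
    (40002, 445, "10.0.0.2"),
    (40003, 445, "10.0.0.3"),
    (40004, 445, "10.0.0.4"),
]

ru_src_port = relative_uncertainty([f[0] for f in flows], 2 ** 16)
ru_dst_port = relative_uncertainty([f[1] for f in flows], 2 ** 16)
ru_dst_ip = relative_uncertainty([f[2] for f in flows], 2 ** 32)
print(ru_src_port, ru_dst_port, ru_dst_ip)
```

In this made-up example the destination port never changes while the source port and destination IP are different in every flow, which is the kind of pattern a host scanning many targets for one service might produce.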
Once the features have been selected, the next step is to cluster the data points so as to optimise an objective function, where different objective functions group data points according to different principles (Du, 2014). For example, K-means clustering is an iterative method which aims to minimise the distance between data points and their cluster centre. Alternatively, spectral clustering is a graph cut technique used to separate points that are distinct from each other (Du, 2014).
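The K-means idea can be sketched in a few lines of Python. The version below is a plain implementation of the standard assign-then-update iteration with a fixed, hand-picked initialisation. The host feature vectors are hypothetical (RU(srcPort), RU(dstPort), RU(dstIP)) triples invented for the example, not data from the cited studies.

```python
import math


def kmeans(points, centres, iterations=10):
    """Plain k-means starting from the given initial centres."""
    centres = [tuple(c) for c in centres]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centre.
        labels = [
            min(range(len(centres)), key=lambda k: math.dist(p, centres[k]))
            for p in points
        ]
        # Update step: move each centre to the mean of its assigned points.
        for k in range(len(centres)):
            members = [p for p, label in zip(points, labels) if label == k]
            if members:  # keep the old centre if a cluster is empty
                centres[k] = tuple(
                    sum(dim) / len(members) for dim in zip(*members)
                )
    return labels, centres


# Hypothetical hosts described by (RU(srcPort), RU(dstPort), RU(dstIP)).
hosts = [
    (0.95, 0.05, 0.90),
    (0.90, 0.10, 0.95),
    (0.92, 0.02, 0.88),
    (0.10, 0.90, 0.05),
    (0.05, 0.95, 0.10),
    (0.08, 0.85, 0.02),
]

# Initial centres are chosen by hand here; random restarts are more usual.
host_labels, host_centres = kmeans(hosts, [hosts[0], hosts[3]])
print(host_labels)
print(host_centres)
```

Each resulting cluster groups hosts with a similar behaviour profile, which an analyst would then inspect and interpret.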
Host clustering can analyse vast amounts of attacking sources by grouping together similar attack sources and extracting patterns of clusters. Whilst this process may be ineffective for discovering new attacks that might utilise multiple attack sources, the clustering analysis can be extremely helpful when handling overwhelming numbers of alerts. Further, the overwhelming attack data gathered contains various feature types. This has led to challenges in data extraction, comprehension and prediction of various attacks and the intentions of the attack strategies.
In the 1990s, Fred Cohen addressed the problem of lacking modelling and simulation research for the field of information protection. Cohen stated that this could be due to the extreme complexity of the cyber-attack and defence problem. In his work, Cohen created attack simulations capable of providing meaningful results (Cohen, Simulating Cyber Attacks, Defences and Consequences, 1999). His exhaustive work in network protection and computer viruses ultimately created one of the pioneering network attack model frameworks. Cohen used cause and effect models to deduce 37 threat behaviours, 94 attacks both physical, e.g. fire damage, and cyber, e.g. Trojan horse, and 140 defence mechanisms (Cohen, Information System Attacks: A Preliminary Classification Scheme, 1997). His works have provided a comprehensive understanding of different types of cyber-attacks and the effect they have on networked systems.
Utilising captured attack data to improve the security and survivability of systems should be an ongoing business task. Historically, businesses and governments have been reluctant to disclose information relating to attacks on their systems for fear of losing public confidence or fear that other attackers would exploit the same or similar vulnerabilities. As new vulnerabilities are discovered and novel attacks created, security analysts work against the clock to patch vulnerabilities and repair the damage caused. Whilst attack modelling can be used for uncovering system vulnerabilities, probabilistic modelling is used to predict the probability of an attack occurring.
Bayesian networks are commonly used to model uncertainty in a network security environment. This is because the conditional dependency fits perfectly with pre/post condition and attack scenarios. Work carried out by (Qin & Lee, 2004) applies Bayesian networks for predicting high-level goals of attacks. Utilising Bayesian networks to model high-level attack intentions requires mapping specific alerts to attack categories. Figure 2.2 is an example of a Bayesian network model being applied to the security context.
In this simplified attack scenario, 4 random variables represent the following stages of an attack:
- B – Install backdoor on the system
- C – Compromise application
- M – Monitor confidential transactions
- S – Successfully obtain confidential data
The conditional dependencies of the 4 variables are displayed below in Figure 2.2, a directed Bayesian network example for predicting attacks.
This example illustrates that the success of compromising the application (C) and the success of monitoring the confidential transactions (M) are independent given their parent which is successfully installing the backdoor (B). Meanwhile, the overall success of obtaining confidential data (S) depends on the joint success of C and M.
As well as the structure of the model, the parameters it represents, i.e. the conditional probabilities, could potentially be derived from security knowledge. In particular, a complete model will specify P(B), P(C|B), P(M|B) and P(S|C, M). According to the conditional dependencies, the joint distribution P(B, C, M, S) can then be factorised as P(B)P(C|B)P(M|B)P(S|C, M). Using this simplified joint probability, analysts can perform inference on any events of interest.
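To make the factorisation concrete, the following sketch performs inference by brute-force enumeration over the four binary variables. All probability values are hypothetical numbers chosen purely for illustration; they do not come from the cited work or from any real security data.

```python
from itertools import product

# Hypothetical parameters (assumptions for illustration only).
P_B = 0.3                                  # P(B = true)
P_C_GIVEN_B = {True: 0.8, False: 0.1}      # P(C = true | B)
P_M_GIVEN_B = {True: 0.7, False: 0.05}     # P(M = true | B)
P_S_GIVEN_CM = {                           # P(S = true | C, M)
    (True, True): 0.9,
    (True, False): 0.2,
    (False, True): 0.1,
    (False, False): 0.0,
}


def bernoulli(p_true, value):
    """Probability of a binary value given the probability of 'true'."""
    return p_true if value else 1.0 - p_true


def joint(b, c, m, s):
    """P(B, C, M, S) = P(B) P(C|B) P(M|B) P(S|C, M)."""
    return (
        bernoulli(P_B, b)
        * bernoulli(P_C_GIVEN_B[b], c)
        * bernoulli(P_M_GIVEN_B[b], m)
        * bernoulli(P_S_GIVEN_CM[(c, m)], s)
    )


def probability(query, evidence=None):
    """P(query | evidence) by enumeration; both are dicts like {'S': True}."""
    evidence = evidence or {}
    numerator = denominator = 0.0
    for b, c, m, s in product([True, False], repeat=4):
        world = {"B": b, "C": c, "M": m, "S": s}
        if any(world[k] != v for k, v in evidence.items()):
            continue
        p = joint(b, c, m, s)
        denominator += p
        if all(world[k] == v for k, v in query.items()):
            numerator += p
    return numerator / denominator


p_success = probability({"S": True})
p_backdoor_given_success = probability({"B": True}, {"S": True})
print(p_success, p_backdoor_given_success)
```

Enumeration is only practical for very small networks such as this one; larger models need dedicated inference algorithms.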
However, many challenges exist when using Bayesian networks for cyber-attack modelling and predictions. The fundamental problem is the assumption of the Bayesian model. In contrast to other fields, the Bayesian model structure and parameters have significantly high uncertainty for multistage cyber-attacks (Du, 2014). There is little ground truth that exists for the varying stages a cyber-attack goes through, so training using up-to-date multistage data is almost impossible. Manually specifying the structure and parameters for a large network scenario could easily generate errors (Du, 2014).
Based on the limitations of scalability, Bayesian networks are typically used for predicting the high-level goal of an attack, such as whether the intention of the attacker is to compromise an application account and password (Qin & Lee, 2004). Methods for predicting attacks with more detailed intentions, such as predicting the attacker’s next target or service, include sequence modelling (Soldo & Markopoulou, 2010).
Despite the continual advances in communication technology, email has remained an ever-popular method of communication since Ray Tomlinson introduced networked email in the early 1970s. The vast expansion and continual use of email systems are due to them being open, free and flexible. This means that anyone with internet access and an active email account can communicate in a cost-effective, simple way without authentication. Although these features make business and personal communication extremely easy, they can also be considered a system vulnerability.
Email systems are continuously exploited by cybercriminals. Consequently, email attacks and phishing campaigns are now an everyday occurrence. This has created an era in which unsolicited emails account for more than half of global email traffic (SpamTitan, 2017). With such a colossal amount of spam in circulation, implementing an email filtering system is less of an option for businesses and more of a necessity. Email filters process incoming and outgoing emails and determine whether each is legitimate or malicious before deciding how it should be handled.
An email filtering system uses a set of rules to distinguish legitimate emails from spam emails. There are various types of spam filters available, all of which use different scanning mechanisms to achieve this (Aslinger, 2013). These include:
- Content filters – Content filters review the content within the email to determine whether it is spam (Aslinger, 2013).
- Header filters – Header filters analyse the email header in search of falsified information. Header analysis is performed to uncover useful information about where an email originates and the path it has taken (Aslinger, 2013).
- Blacklist spam filters – Blacklist filters block all emails sent from senders on a blacklist of known spammers, for example by blocking IP addresses that are known to regularly send spam (Aslinger, 2013).
- Rule-based spam filters – Rule-based filters block spam based on user-defined criteria such as a specific sender address or wording found in the subject line or body of the email (Aslinger, 2013).
- Permission filters – Permission filters require the email message sender to be initially approved by its recipient. This prevents all mail from unknown/unapproved sources (Aslinger, 2013).
- Challenge-response (C/R) spam filters – C/R filters automatically issue a reply to the sender, challenging them to enter a passcode in order to gain authentication to send the email (Aslinger, 2013).
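Two of the simpler mechanisms above, blacklist and rule-based filtering, can be sketched as follows. The `Email` structure, the blacklist entries and the blocked words are all hypothetical and chosen only to show the control flow, not to describe any particular product.

```python
from dataclasses import dataclass

# Hypothetical configuration, for illustration only.
BLACKLISTED_SENDERS = {"spammer@example.com"}
BLOCKED_SUBJECT_WORDS = {"congratulations", "winner"}


@dataclass
class Email:
    sender: str
    subject: str
    body: str = ""


def filter_email(message: Email) -> str:
    """Return 'spam' or 'inbox' using a blacklist check, then a rule check."""
    # Blacklist filter: reject known spam senders outright.
    if message.sender.strip().lower() in BLACKLISTED_SENDERS:
        return "spam"
    # Rule-based filter: reject subjects containing user-defined words.
    subject = message.subject.lower()
    if any(word in subject for word in BLOCKED_SUBJECT_WORDS):
        return "spam"
    return "inbox"


print(filter_email(Email("spammer@example.com", "Hello")))
print(filter_email(Email("friend@example.org", "Congratulations, you won!")))
print(filter_email(Email("friend@example.org", "Meeting notes")))
```

Static rules like these are easy to reason about, but they also block any legitimate message that happens to match a rule.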
Although routinely implemented by businesses and email providers, email filtering systems often fall short and can generate large numbers of false positives, resulting in legitimate emails being directed to spam email folders.
Bayes’ theorem can be successfully applied to email content filtering systems as a means of reducing the number of false positives that are so often generated. Bayesian spam filters work by calculating the probability of an email being spam based on the contents of that email (Tschabitscher, 2016). Unlike the usual content-based email filters, Bayesian spam filtering learns from both spam and legitimate emails. This creates a robust, adaptive and efficient spam defence mechanism that returns a very small number of false positives.
The underlying principle behind Bayesian spam filters is the Naïve Bayes Classifier, shown relative to the problem in the equation below:
This probabilistic method categorises spam by initially representing an email as an array x = (x₁, x₂, x₃, …, xₙ), where x₁, …, xₙ are the values of characteristics X₁, …, Xₙ, such as typical spam-related words, e.g. “Congratulations!”. If characteristic Xᵢ exists within the email message, then let Xᵢ = 1. If characteristic Xᵢ does not exist, then let Xᵢ = 0 (Androutsopoulos, Koutsias, Chandrinos, Paliouras, & Spyropoulos, 2000).
While considering all possible characteristics, in order to find the words that are most significant in detecting spam, a variable called mutual information is determined for each candidate word ‘X’ (Androutsopoulos, Koutsias, Chandrinos, Paliouras, & Spyropoulos, 2000). The equation below is used for calculating mutual information:
$$MI(X; C) = \sum_{x \in \{0,1\},\; c \in \{\text{spam},\,\text{legitimate}\}} P(X = x, C = c) \cdot \log \frac{P(X = x, C = c)}{P(X = x) \cdot P(C = c)}$$
Where C is a variable that represents a category, for example, spam or legitimate.
The mutual information score measures how strongly the presence or absence of a word is associated with the email category, based on how frequently the word appears in each category of a training set (Androutsopoulos, Koutsias, Chandrinos, Paliouras, & Spyropoulos, 2000). Here, the training set is a collection of thousands of emails that have been categorised as either spam or legitimate. By calculating the mutual information for words found in email messages assigned to both email categories, the spam filter generates spam probabilities which are modified accordingly for the user. Words that have the highest mutual information are chosen and used as the characteristics in detecting the likelihood that an email is spam (Androutsopoulos, Koutsias, Chandrinos, Paliouras, & Spyropoulos, 2000).
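A small sketch of the mutual information calculation is given below. It estimates the probabilities as relative frequencies over a tiny, made-up training set and uses a base-2 logarithm. Both the data and the choice of base are assumptions for illustration, since the text above does not fix them.

```python
import math


def mutual_information(emails, labels, word):
    """MI(X; C) for the binary attribute X = 'word occurs in the email'."""
    n = len(emails)
    joint = {}
    for words, label in zip(emails, labels):
        key = (word in words, label)
        joint[key] = joint.get(key, 0) + 1
    mi = 0.0
    # Cells with zero count contribute nothing and are simply absent.
    for (x, c), count in joint.items():
        p_xc = count / n
        p_x = sum(v for (x2, _), v in joint.items() if x2 == x) / n
        p_c = sum(v for (_, c2), v in joint.items() if c2 == c) / n
        mi += p_xc * math.log2(p_xc / (p_x * p_c))
    return mi


# Hypothetical training set: each email is reduced to its set of words.
training_emails = [
    {"congratulations", "won", "prize", "hello"},
    {"congratulations", "click"},
    {"meeting", "tomorrow", "hello"},
    {"project", "meeting"},
]
training_labels = ["spam", "spam", "legitimate", "legitimate"]

mi_congratulations = mutual_information(
    training_emails, training_labels, "congratulations"
)
mi_hello = mutual_information(training_emails, training_labels, "hello")
print(mi_congratulations, mi_hello)
```

In this toy data "congratulations" appears only in spam and so scores highly, whereas "hello" appears equally often in both categories and carries no information about the category.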
After the spam-characterising words have been selected, Bayes’ rule can be applied to calculate the probability of an email belonging to a category c, either spam or legitimate. The naïve form of the classifier assumes that the appearances of individual words within an email’s content are independent events given the category. Realistically, some words are more likely to be found together than others, such as “Congratulations” and “Won”. Nevertheless, the assumption is justifiable based on studies of the Naïve Bayesian Classifier (Langley, Iba, & Thompson, 1992), which have found it to be surprisingly effective despite the fact that its independence assumption is usually over-simplistic (Androutsopoulos, Koutsias, Chandrinos, Paliouras, & Spyropoulos, 2000).
To find the probability that an email belongs to category c (for example, spam), given the array x = (x₁, x₂, x₃, …, xₙ), Bayes’ theorem is applied and gives the following equation:
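$$P(C = c \mid X = x) = \frac{P(C = c) \cdot \prod_{i=1}^{n} P(X_i = x_i \mid C = c)}{\sum_{k \in \{\text{spam},\,\text{legitimate}\}} P(C = k) \cdot \prod_{i=1}^{n} P(X_i = x_i \mid C = k)}$$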
Here P(Xᵢ | C) (the probability that the email has a particular characteristic value given its category) and P(C) (the prior probability of the category, e.g. that an email is spam) are relative frequencies estimated from the training set. For the entire email, represented by the array x, these probabilities are multiplied together (Androutsopoulos, Koutsias, Chandrinos, Paliouras, & Spyropoulos, 2000).
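The posterior calculation can be sketched in Python as follows. The training emails and vocabulary are made up for illustration. The sketch also applies add-one (Laplace) smoothing to the relative frequencies so that an unseen word does not force a probability of zero; this smoothing is an assumption added here and is not part of the description above.

```python
def train(emails, labels, vocabulary):
    """Estimate P(C) and P(X_i = 1 | C) with add-one (Laplace) smoothing."""
    prior, likelihood = {}, {}
    for c in sorted(set(labels)):
        docs = [words for words, label in zip(emails, labels) if label == c]
        prior[c] = len(docs) / len(emails)
        likelihood[c] = {
            w: (sum(w in d for d in docs) + 1) / (len(docs) + 2)
            for w in vocabulary
        }
    return prior, likelihood


def posterior(words, prior, likelihood):
    """P(C = c | X = x) for every category c, via Bayes' theorem."""
    scores = {}
    for c in prior:
        score = prior[c]
        for w, p_present in likelihood[c].items():
            # x_i = 1 if the word is present in the email, otherwise 0.
            score *= p_present if w in words else 1.0 - p_present
        scores[c] = score
    total = sum(scores.values())  # the denominator of Bayes' theorem
    return {c: s / total for c, s in scores.items()}


# Hypothetical training data: each email is reduced to its set of words.
nb_emails = [
    {"congratulations", "won", "prize"},
    {"congratulations", "click"},
    {"meeting", "tomorrow"},
    {"project", "meeting"},
]
nb_labels = ["spam", "spam", "legitimate", "legitimate"]
nb_vocabulary = ["congratulations", "won", "meeting"]

nb_prior, nb_likelihood = train(nb_emails, nb_labels, nb_vocabulary)
nb_result = posterior({"congratulations", "won", "today"}, nb_prior, nb_likelihood)
print(nb_result)
```

For long vocabularies, practical implementations usually sum log-probabilities instead of multiplying raw probabilities, which avoids numerical underflow.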
The final phase of Naïve Bayes classification is determining a threshold for classification. Misidentifying a legitimate email as spam is generally a more severe problem than mistakenly permitting spam through the filter as legitimate. Assuming that the former error is λ times costlier than the latter, an email can be sorted into the spam category if:
$$\frac{P(C = \text{spam} \mid X = x)}{P(C = \text{legitimate} \mid X = x)} > \lambda$$
Given an email represented by the array x, the above equation states that, in order for the email to be transferred to the spam folder, the ratio of the probability that it is spam to the probability that it is legitimate must be greater than λ. Modern spam filters utilise the Naïve Bayesian technique and produce sufficiently accurate results to prevent inboxes being filled with spam.
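Because there are only two categories, the ratio test can be rewritten as a threshold on the spam probability alone. The short derivation below uses only the fact that the two posterior probabilities sum to one; the symbol p and the example values of λ are introduced here purely for illustration.

```latex
% Let p = P(C = spam | X = x), so that P(C = legitimate | X = x) = 1 - p.
\frac{p}{1 - p} > \lambda
\;\Longleftrightarrow\;
p > \lambda (1 - p)
\;\Longleftrightarrow\;
p > \frac{\lambda}{1 + \lambda}

% Worked examples:
%   \lambda = 1  gives  p > 0.5   (both errors are equally costly)
%   \lambda = 9  gives  p > 0.9   (blocking legitimate mail is 9 times costlier)
```

A larger λ therefore demands stronger evidence before an email is moved to the spam folder, trading more missed spam for fewer blocked legitimate emails.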
Intrusion detection systems and Bayesian spam filters can be extremely successful methods of obtaining attack data, which business analysts will later examine and use to determine future threats. The outputs from these activities are subsequently used by businesses to inform policy making within the IT operational risk department. Whilst these methods are useful for patching system vulnerabilities and monitoring incoming spam, an ideal security environment would involve preventing these attacks at the outset, rather than responding to attacks as and when they occur. A more resilient security system would learn from past attacks and improve the system accordingly to prevent future attacks from being successful.
The next part of the dissertation will discuss the concept of machine learning and how it can be used in cybersecurity.
Machine learning is a branch of artificial intelligence that allows computer systems to learn from examples, data, and experience through processes that do not require them to be explicitly programmed (Rouse, 2017). The basic principle behind machine learning is to build algorithms that can obtain input data, perform statistical analysis on that data and predict an output value within an acceptable range (Rouse, 2017). Machine learning focuses on the development of computer programs that can adapt when exposed to new data, creating systems that automatically improve with experience.
Through machine learning, software can learn from past observations in order to make inferences about future behaviour and educated guesses about the outcomes of new scenarios. Examples include thermostats that self-adjust to optimise heating based on past user preference, self-driving vehicles that customise journeys based on passenger location, and advertising agencies looking to keep relevant advertisements displayed for individual users. Facebook, for instance, uses machine learning to personalise a user’s feed based on past likes. Because of its wide-ranging uses, machine learning has found a niche in all aspects of daily life (Kanal, 2017).
In order to better understand the concept of machine learning, close attention should be paid to the input data that makes it possible. Take, for example, the email spam detection algorithm discussed previously in this dissertation. Where generic spam filters simply blacklist certain addresses and allow other mail through, machine learning enhances the process by comparing verified spam emails with verified legitimate emails using a Naïve Bayes classifier (Kanal, 2017). This process observes specific features that are present more frequently in one category (spam) than in the other (legitimate). For example, specific words like “Congratulations!”, hyperlinks to known malicious websites and virus-laden attachments are typical features indicative of spam emails (Kanal, 2017). This process of automatically deducing a label for an email as either spam or legitimate is called classification.
Classification is one of the major applications of machine learning, of which there are broadly two types of technique, namely supervised learning and unsupervised learning (Kanal, 2017). The difference between the techniques lies in the data that they accept, i.e. the input.
Supervised learning refers to algorithms that are provided with a set of labelled training data, with the task being to learn what differentiates the labels. Here, the term labelled refers to training examples where each example is a pair consisting of an input object and a desired output value. For instance, the above-mentioned email example is a scenario where the task is to differentiate between two labels, spam and legitimate, and where the algorithm has been trained with examples of both spam mail and legitimate mail. Other scenarios can contain far more labels, such as the image recognition algorithms used by Google Image search, which have learnt to accurately distinguish thousands of objects. Additionally, modern facial recognition algorithms surpass the performance of humans (Kanal, 2017). Upon learning what makes a category unique, a supervised algorithm can process new, unlabelled data and apply a correct label to it. However, choosing an appropriate training dataset is crucial for ensuring sound results. If the training data contains only pictures of cows and sheep, but the new photo is a horse, then the algorithm will not be able to assign it a proper label.
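The dependence on the training labels can be shown with a very small supervised classifier. The sketch below uses a nearest-centroid rule, chosen here only because it is short; the animal measurements (height in cm, weight in kg) are invented for the example.

```python
import math


def fit_centroids(samples, labels):
    """Learn one centroid (mean feature vector) per label."""
    sums, counts = {}, {}
    for features, label in zip(samples, labels):
        acc = sums.setdefault(label, [0.0] * len(features))
        for i, value in enumerate(features):
            acc[i] += value
        counts[label] = counts.get(label, 0) + 1
    return {
        label: tuple(v / counts[label] for v in acc)
        for label, acc in sums.items()
    }


def predict(centroids, features):
    """Assign the label whose centroid is closest to the new example."""
    return min(centroids, key=lambda label: math.dist(features, centroids[label]))


# Hypothetical labelled training data: (height_cm, weight_kg).
animal_samples = [(140, 700), (145, 750), (70, 80), (75, 90)]
animal_labels = ["cow", "cow", "sheep", "sheep"]
animal_centroids = fit_centroids(animal_samples, animal_labels)

print(predict(animal_centroids, (72, 82)))    # resembles the sheep examples
print(predict(animal_centroids, (160, 500)))  # a horse-like example
```

The horse-like example is still forced into one of the two known labels, because the classifier can only ever answer with a label it saw during training.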
Unsupervised learning refers to algorithms that are provided with unlabelled training data, with the task being to infer the categories itself. In other words, unsupervised algorithms are provided with input data but not provided with a desired output. In some instances, labelled data is rare and the task of labelling data can be very difficult. For example,
By learning what makes each category unique, the algorithm can then be presented with new, unlabelled data and apply a correct label. Note the criticality in choosing a representative training dataset; if the training data contains only dogs and cats, but the new photo is a fish, the algorithm will have no way of knowing the proper label.
During 2016, advancements in artificial intelligence witnessed the arrival of self-driving cars, language translation and big data. However, according to Malwarebytes’ State of Malware report, that same year also witnessed the rise of ransomware, botnets and attack vectors as popular methods of malware attack. Cybercriminals are continually expanding their methods of attack through such means as attached scripts to phishing emails (Malwarebytes, 2017).
In efforts to complement the skills and capabilities of a human analyst, machine learning is being employed by businesses to handle and process colossal volumes of data, with hopes of providing a more forceful deterrent (Kanal, 2017).
Machine learning can be used to combat this by learning from past attacks
The term cybersecurity refers to the protection of information systems’ hardware and software infrastructure, the data stored on them, and the service they provide from unauthorised personnel who wish to exploit it or cause harm (HM Government, 2016).
Utilising technology to reach business objectives attracts risk. Within the business risk management environment, a risk has the potential to either damage or advance business objectives, such as reputation (NCSC, 2016). Therefore, cybersecurity has been consistently considered a top investment priority for both small and large businesses.
Operating with adequate cybersecurity is now more critical for ensuring business success than it has ever been. Cisco’s 2017 annual report revealed that during 2016 there was a 40% increase in security incidents and a 125% increase in the number of detected cyber-vulnerabilities (Cisco, 2017). These figures are striking, and the overwhelming force of cyber threats does not show any signs of slowing. This is evident from the earlier mentioned WannaCry ransomware attack, which almost brought the NHS to a standstill.
Whilst this can be perceived as a lack of adequate cybersecurity within the NHS infrastructure, the detected cyber-vulnerability figures reported by Cisco suggest that the lack of adequate security is in fact a global issue. Therefore, changes should be made to the way in which businesses defend against attacks in order to minimise the damage. This could be achieved through proactively building a defence mechanism to identify and contain such incidents.