Over the last few decades, we have seen the dramatic rise of the Internet and its applications, to the point where they have become a critical part of our lives. Internet security has therefore become more and more important to those who use the Internet for work, business, entertainment, or education.
Most attacks and malicious activities on the Internet are carried out by malicious software (malware), which includes viruses, trojans, worms, and botnets. Botnets have become a main source of malicious activities across the Internet, such as scanning and distributed denial-of-service (DDoS) attacks.
1.2 Botnet Largest Security Threat
A bot is a piece of software code, or malware, that runs automatically on a compromised machine without the user’s permission. The bot code is usually written by criminal groups. The term “bot” also refers to the compromised computers in the network. A botnet is essentially a network of bots that are under the control of an attacker (BotMaster). Figure 1.1 illustrates a typical structure of a botnet.
A bot usually takes advantage of sophisticated malware techniques. For example, a bot may use a keylogger to record private user information such as passwords, and it hides its existence in the system. More importantly, a bot can distribute itself across the Internet to increase its scale and form a bot army. Recently, attackers have used compromised Web servers to infect those who visit the websites through drive-by downloads. Currently, a botnet typically contains thousands of bots, but there are cases in which a botnet contains several million bots.
Bots differentiate themselves from other kinds of malware, such as worms, by their ability to receive commands from the attacker remotely. The attacker, also called the botherder, controls bots through different protocols and structures. The Internet Relay Chat (IRC) protocol is the earliest and still the most commonly used C&C channel at present. HTTP is also used because the HTTP protocol is permitted in most networks. Centralized botnets were very successful in the past, but botherders now use decentralized structures to avoid the single-point-of-failure problem.
Unlike earlier malware such as worms, which were probably used mostly for entertainment, botnets are used for real financial abuse. Botnets can cause many problems, some of which are listed below:
i. Click fraud. A botmaster can easily profit by forcing the bots to click on advertisements for personal or commercial gain.
ii. Spam production. The majority of email on the Internet is spam.
iii. DDoS attacks. A bot army can be commanded to begin a distributed denial-of-service attack against any machine.
iv. Phishing. Botnets are widely used to host malicious phishing sites. Criminals usually send spam messages to deceive users into visiting their forged web sites, so that they can obtain users’ critical information such as usernames and passwords.
1.3 Botnet in-Depth
Nowadays, the most serious manifestation of advanced malware is the Botnet. To distinguish Botnets from other kinds of malware, the concepts behind Botnets have to be understood. For a better understanding of Botnets, two important terms, Bot and BotMaster, are defined below from another point of view.
Bot – Bot is short for robot and is also called a Zombie. It is a type of malware installed on a compromised computer, which can then be controlled remotely by the BotMaster to execute orders through received commands. After the Bot code has been installed on a compromised computer, the computer becomes a Bot or Zombie. In contrast to existing malware such as viruses and worms, whose main activities focus on attacking the infected host, bots can receive commands from the BotMaster and are used as a distributed attack platform.
BotMaster – The BotMaster, also known as the BotHerder, is a person or a group of persons who control remote Bots.
Botnets – Botnets are networks consisting of a large number of Bots. Botnets are created by the BotMaster to set up a private communication infrastructure that can be used for malicious activities such as Distributed Denial-of-Service (DDoS) attacks, sending large amounts of spam or phishing mail, and other nefarious purposes [26, 27, 28]. Bots infect a person’s computer in many ways.
Bots usually spread across the Internet by looking for vulnerable and unprotected computers to infect. When they find an unprotected computer, they infect it and then send a report to the BotMaster. The Bots stay hidden until they are instructed by their BotMaster to perform an attack or task. Other ways in which attackers infect computers on the Internet with Bots include sending email and using malicious websites, but the most common way is scanning the Internet for vulnerable and unprotected computers. The activities associated with a Botnet can be classified into three phases: (1) Searching – searching for vulnerable and unprotected computers. (2) Dissemination – the Bot code is distributed to the computers (targets), so the targets become Bots. (3) Sign-on – the Bots connect to the BotMaster and become ready to receive command and control traffic.
The main difference between Botnets and other kinds of malware is the existence of a Command-and-Control (C&C) infrastructure. The C&C allows Bots to receive commands and malicious capabilities, as directed by the BotMaster. BotMasters must ensure that their C&C infrastructure is sufficiently robust to manage thousands of distributed Bots across the globe, as well as to resist any attempts to shut down the Botnets. However, detection and mitigation techniques against Botnets have improved [30,31], and attackers are in turn continually improving their approaches to protect their Botnets. The first generation of Botnets used IRC (Internet Relay Chat) channels as their Command-and-Control (C&C) centers. The centralized C&C mechanism of such Botnets made them vulnerable to being detected and disabled. Therefore, a new generation of Botnets that can hide their C&C communication has emerged: Peer-to-Peer (P2P) based Botnets. P2P Botnets do not suffer from a single point of failure, because they do not have centralized C&C servers. Attackers have accordingly developed a range of strategies and techniques to protect their C&C infrastructure.
Therefore, considering the C&C function gives a better understanding of Botnets and helps defenders design proper detection or mitigation techniques. According to the C&C channel, we categorize Botnets into three different topologies: a) Centralized; b) Decentralized; and c) Hybrid. In Section 1.4, these topologies are analyzed and the protocols currently used in each model are considered.
1.4 Botnet Topologies
According to the Command-and-Control (C&C) channel, Botnet topologies are categorized into three different models: the Centralized model, the Decentralized model, and the Hybrid model.
1.4.1 Centralized Model
The oldest type of topology is the centralized model. In this model, one central point is responsible for exchanging commands and data between the BotMaster and the Bots. The BotMaster chooses a host (usually a high-bandwidth computer) to be the central point, the Command-and-Control (C&C) server, for all the Bots. The C&C server runs certain network services such as IRC or HTTP. The main advantage of this model is low message latency, which lets the BotMaster easily coordinate the Botnet and launch attacks.
Since all connections pass through the C&C server, the C&C server is the critical point in this model; in other words, it is the weak point. If somebody manages to discover and eliminate the C&C server, the entire Botnet becomes worthless and ineffective. This is the main drawback of the model. Many modern centralized Botnets therefore keep a list of IP addresses of alternative C&C servers, to be used if a C&C server is discovered and taken offline.
Since IRC and HTTP are the two common protocols that C&C servers use for communication, we consider Botnets in this model based on IRC and HTTP. Figure 1.2 shows the basic communication architecture of the Centralized model. There are two central points that forward commands and data between the BotMaster and his Bots.
1.4.1.1 Botnets based on IRC
IRC is a form of real-time Internet text messaging or synchronous conferencing. The IRC protocol is based on the client-server model and can be used on many computers in distributed networks. The advantages that have made the IRC protocol widely used for remote communication in Botnets are: (i) low-latency communication; (ii) anonymous real-time communication; (iii) support for both group (many-to-many) and private (one-to-one) communication; (iv) simple setup; (v) simple commands (the basic commands are connecting to servers, joining channels, and posting messages in channels); and (vi) great flexibility in communication. Therefore, the IRC protocol is still the most popular protocol used in Botnet communication.
In this model, BotMasters can command all of their Bots, or command only a few of them using one-to-one communication. The C&C server runs an IRC service that is the same as any other standard IRC service. Most of the time, the BotMaster creates a channel on the IRC server to which all the bots connect, and through which each connected bot is instructed to carry out the BotMaster’s commands. Figure 1.3 shows that there is one central IRC server that forwards commands and data between the BotMaster and his Bots.
Puri presented the procedure and mechanism of IRC-based Botnets, as shown in Figure 1.4.
Bot infection and control process:
i. The attacker tries to infect the targets with Bots.
ii. After the Bot is installed on the target machine, it tries to connect to the IRC server. Meanwhile, a random nickname is generated that identifies the bot in the attacker’s private channel.
iii. The Bot sends a request to the DNS server, which dynamically maps the IRC server’s IP address.
iv. The Bot joins the private IRC channel set up by the attacker and waits for instructions from the attacker. Most of these private IRC channels are set to encrypted mode.
v. The attacker sends attack instructions in the private IRC channel.
vi. The attacker connects to the private IRC channel and sends the authentication password.
vii. Bots receive instructions and launch attacks such as DDoS attacks.
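From a defender's point of view, the sign-on steps above leave a visible trace: many distinct hosts joining the same channel. The following is a minimal sketch of that observation, assuming IRC traffic has already been captured as text lines of the form ":nick!user@host JOIN #channel". The function names and the threshold of three hosts are illustrative assumptions and are not taken from Puri or from any particular detection tool.

```python
from collections import defaultdict


def hosts_per_channel(irc_lines):
    """Map each channel to the set of client hosts seen joining it."""
    joins = defaultdict(set)
    for line in irc_lines:
        parts = line.strip().split()
        # Expected shape (assumption): ":nick!user@host JOIN #channel"
        if len(parts) >= 3 and parts[0].startswith(":") and parts[1].upper() == "JOIN":
            prefix = parts[0][1:]
            host = prefix.split("@", 1)[1] if "@" in prefix else prefix
            channel = parts[2].lstrip(":")
            joins[channel].add(host)
    return joins


def suspicious_channels(irc_lines, min_hosts=3):
    """Return channels joined by at least min_hosts distinct hosts (hypothetical threshold)."""
    per_channel = hosts_per_channel(irc_lines)
    return sorted(ch for ch, hosts in per_channel.items() if len(hosts) >= min_hosts)
```

A count of this kind is only a first filter, since popular legitimate channels also attract many hosts; it would normally be combined with other evidence such as randomly generated nicknames.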
1.4.1.2 Botnet based on HTTP
HTTP is another well-known protocol used by Botnets. Because the use of the IRC protocol within Botnets became well known, Internet security researchers paid more attention to monitoring IRC traffic in order to detect Botnets. Consequently, attackers started to use the HTTP protocol as a Command-and-Control communication channel to make Botnets more difficult to detect. The main advantage of using HTTP is that Botnet traffic is hidden in normal web traffic, so it can easily pass through firewalls and avoid IDS detection. Firewalls usually block incoming and outgoing traffic on ports that are not needed, which usually include the IRC port.
1.4.2 Decentralized model
Due to the major disadvantage of the Centralized model, namely its central Command-and-Control (C&C) server, attackers tried to build another Botnet communication topology that is harder to discover and destroy. Hence, they sought a model in which the communication system does not depend heavily on a few selected servers, so that even discovering and destroying a number of Bots does not disable the Botnet.
As a result, attackers take advantage of Peer-to-Peer (P2P) communication as a Command-and-Control (C&C) pattern, which is much harder to shut down. The P2P-based C&C model is expected to be used considerably in Botnets in the future, and Botnets that use it pose a much bigger challenge for network defense.
In the P2P model, as shown in Fig. 1.6, there is no centralized point for communication. Each Bot has connections to some other Bots of the same Botnet, and Bots act as both clients and servers. A new Bot must know the addresses of some Bots in the Botnet in order to join it. If some Bots in the Botnet are taken offline, the Botnet can still continue to operate under the control of the BotMaster.
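The difference in resilience between the two topologies can be illustrated with a small graph experiment. The sketch below is an abstract connectivity model only: the node names, the star graph, and the five-node ring are hypothetical examples chosen for illustration, not measurements of any real network.

```python
from collections import deque


def reachable(adjacency, start, removed=frozenset()):
    """Return the set of nodes reachable from start, ignoring removed nodes."""
    if start in removed:
        return set()
    seen = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for peer in adjacency[node]:
            if peer not in removed and peer not in seen:
                seen.add(peer)
                queue.append(peer)
    return seen


# Centralized: every node talks only to the central server "c2".
star = {"c2": ["n1", "n2", "n3", "n4"],
        "n1": ["c2"], "n2": ["c2"], "n3": ["c2"], "n4": ["c2"]}

# Decentralized: each node keeps links to two peers (a simple ring).
ring = {"n1": ["n2", "n5"], "n2": ["n1", "n3"], "n3": ["n2", "n4"],
        "n4": ["n3", "n5"], "n5": ["n4", "n1"]}

if __name__ == "__main__":
    print(reachable(star, "n1", removed={"c2"}))  # only n1 itself remains
    print(reachable(ring, "n1", removed={"n3"}))  # the other three peers are still reachable
```

Removing the single central node isolates every remaining node in the star, whereas removing one node from the ring leaves the rest connected, which is the property that makes takedown of a P2P structure harder.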
P2P Botnets aim to remove or hide the central point of failure, which is the main weakness and vulnerability of the Centralized model. Some P2P Botnets are only partly decentralized, and some are completely decentralized. Botnets that are completely decentralized allow a BotMaster to insert a command at any Bot. Since P2P Botnets usually allow commands to be injected at any node in the network, authentication of commands becomes essential to prevent other nodes from injecting unauthorized commands.
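For analysts studying such protocols, the principle behind command authentication is ordinary digital signing: a message is accepted only if its signature verifies under a known public key. The following is a toy illustration using textbook RSA with deliberately tiny parameters. The primes, exponent, and function names are assumptions chosen for readability; this is neither secure nor the scheme of any specific Botnet.

```python
import hashlib

# Toy textbook-RSA parameters (assumption: far too small for any real use).
P, Q = 61, 53
N = P * Q                            # public modulus
E = 17                               # public exponent
D = pow(E, -1, (P - 1) * (Q - 1))    # private exponent (needs Python 3.8+)


def digest(message: bytes) -> int:
    """Hash the message and reduce it into the range of the toy modulus."""
    return int.from_bytes(hashlib.sha256(message).digest(), "big") % N


def sign(message: bytes) -> int:
    """Produce a signature using the private exponent."""
    return pow(digest(message), D, N)


def verify(message: bytes, signature: int) -> bool:
    """Accept the message only if the signature matches under the public key."""
    return pow(signature, E, N) == digest(message)
```

A node holding only `N` and `E` can check a message but cannot produce a valid signature for a new one, which is why injected messages from third parties are rejected.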
For a better understanding of this model, some characteristics and important features of well-known P2P Botnets are listed below:
Slapper: Allows the routing of commands to distinct nodes. Uses public-key and private-key cryptography to authenticate commands. BotMasters sign commands with a private key, and only those nodes that have the corresponding public key can verify the commands. Two important weak points are: (a) its list of known Bots contains all (or almost all) of the Botnet, so a single captured Bot would expose the entire Botnet to defenders; and (b) its sophisticated communication mechanism produces a lot of traffic, making it vulnerable to monitoring via network flow analysis.
Sinit: This Bot uses random searching to discover other Bots to communicate with. This can result in easy detection due to the extensive probing traffic.
Nugache: Its weakness is its reliance on a seed list of 22 IP addresses during its bootstrap process.
Phatbot: Uses Gnutella cache servers for its bootstrap process, which can easily be shut down. In addition, its WASTE P2P protocol has a scalability problem across a large network.
Storm worm: It uses the Overnet P2P protocol to control compromised hosts. The communication protocol of this Bot can be divided into five steps, as described below:
i. Connect to Overnet – Bots try to join the Overnet network. Each Bot initially carries a hard-coded binary file that includes the IP addresses of P2P-based Botnet nodes.
ii. Search and Download Secondary Injection URL – The Bot uses hard-coded keys to search for and download the URL on the Overnet network.
iii. Decrypt Secondary Injection URL – Compromised hosts use a hard-coded key to decrypt the URL.
iv. Download Secondary Injection – Compromised hosts attempt to download the secondary injection from a server (probably a web server). It could be infected files, updated files, or a list of P2P nodes.
1.4.3 Hybrid model
The Bots in the Hybrid Botnet are categorized into two groups:
1) Servant Bots – Bots in the first group are called servant Bots because they behave as both clients and servers. They have static, routable IP addresses and are accessible from the entire Internet.
2) Client Bots – Bots in the second group are called client Bots since they do not accept incoming connections. This group contains the remaining Bots, including: (a) Bots with dynamically assigned IP addresses; (b) Bots with non-routable IP addresses; and (c) Bots behind firewalls that cannot be reached from the global Internet.
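The two-group split above amounts to a simple decision rule, which can be useful when labelling hosts in captured traffic. The sketch below assumes that the three properties (static address, routability, firewall) are already known for each host; the function name and the use of private address ranges as a stand-in for "non-routable" are illustrative assumptions.

```python
import ipaddress


def classify_peer(ip: str, has_static_ip: bool, behind_firewall: bool) -> str:
    """Label a host as 'servant' or 'client' following the two groups above."""
    address = ipaddress.ip_address(ip)
    # Assumption: private-range addresses are treated as non-routable.
    routable = not address.is_private
    if has_static_ip and routable and not behind_firewall:
        return "servant"
    return "client"


if __name__ == "__main__":
    print(classify_peer("8.8.8.8", has_static_ip=True, behind_firewall=False))
    print(classify_peer("192.168.1.10", has_static_ip=True, behind_firewall=False))
```

Only hosts that satisfy all three conditions can accept incoming connections from the whole Internet; every other host falls into the client group.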
1.5 Background of the Problem
Botnets, which are controlled remotely by BotMasters, can launch huge denial-of-service attacks and various infiltration attacks, and can be used to spread spam and conduct other malicious activities. While bot army activity has so far been limited to criminal activity, their potential for causing large-scale damage to the entire Internet is immeasurable. Therefore, Botnets are one of the most dangerous types of network-based attack today, because they involve the use of very large, synchronized groups of hosts for their malicious activities.
Botnets obtain their power from their size, both in their increasing bandwidth and in their reach. As mentioned before, Botnets can cause severe network disruptions through huge denial-of-service attacks, and the threat of such disruption can cost enterprises large sums in extortion fees. Botnets are also used to harvest sensitive personal, corporate, or government information for sale on a thriving organized crime market.
1.6 Statement of the Problem
Recently, botnets have begun using a new type of command-and-control (C&C) communication that is totally decentralized: they use peer-to-peer style communication. Tracking the origin and activity of such botnets is much more complicated because of the peer-to-peer communication infrastructure.
Combating botnets is usually a matter of discovering their weak point: their central point of command, or C&C server. This is typically an IRC network to which all bots connect. With the P2P method, however, there is no central point of command. In P2P networks, each bot searches for other peers to connect to, which can receive or broadcast commands through the network. Therefore, an accurate detection and mitigation method is required to prevent or stop such dangerous networks.
1.7 Research Questions
a. What are the main differences between centralized and decentralized botnets?
b. What is the best and most efficient general, extensible solution for detecting non-specific Peer-to-Peer botnets?
1.8 Objectives of the Study
i. To develop a network-based framework for Peer-to-Peer botnet detection based on common behavior in network communication.
ii. To study the behavior of bots and recognize behavioral similarities across multiple bots in order to develop the framework mentioned above.
1.9 Scope of the Study
The project scope is limited to developing algorithms pertaining to our proposed framework. These algorithms are used for reducing traffic by filtering it, classifying the traffic of interest, monitoring traffic, and detecting malicious activities.
1.10 Significance of the study
Peer-to-Peer botnets are one of the most sophisticated types of cyber crime today. They give attackers full control of many computers around the world, which are exploited for malicious purposes such as spreading viruses and worms, spam distribution, and DDoS attacks. Therefore, studying the behavior of P2P botnets and developing a technique that can detect them is important and in high demand.
Understanding the Botnet Command-and-Control (C&C) is a critical part of recognizing how best to protect against the overall botnet threat. The C&C channels used by Botnets often determine the type and degree of action an enterprise can take in either blocking or shutting down a botnet, and the probability of success.
It is also evident that attackers have been trying for years to move away from Centralized C&C channels, and have achieved some success using Decentralized (P2P) C&C channels over the last five or so years. Therefore, in this chapter we have defined a classification for better understanding of Botnet C&C channels, which comprises the Centralized, Decentralized, and Hybrid models, and have tried to evaluate the recognized protocols in each of them. Understanding the communication topologies of Botnets is essential to precisely identify, detect, and mitigate the ever-increasing Botnet threat.
Previously, the majority of botnets used IRC (Internet Relay Chat) as the communication protocol for their Command and Control (C&C) mechanism. Therefore, many researchers tried to develop botnet detection schemes based on the analysis of IRC traffic. As a result, attackers developed more sophisticated botnets, such as Storm worm and Nugache, that use P2P networks for their C&C infrastructure. In response to this shift, researchers have proposed various botnet detection models that target P2P infrastructure.
One key advantage of both IRC and HTTP Botnets is the use of a central Command and Control. This characteristic provides the attacker with very well-organized communication. However, this asset is also considered a main disadvantage for the attacker. The threat of the Botnet can be reduced, and possibly eliminated, if the central C&C is taken over or taken down. The method that is starting to emerge is a P2P structure for Botnet communication. P2P botnets have no centralized point. Every node in a P2P botnet behaves as both client and server. If any point in the network is shut down, the botnet can still continue its operation.
The Storm botnet is one of the most prominent recent P2P botnets. It customized the Overnet P2P file-sharing application, which is based on the Kademlia distributed hash table algorithm, and exploits it for its C&C infrastructure. Recently, many researchers, especially in the anti-virus community, as well as the electronic media, have concentrated on the Storm worm [56,57].
2.2 Background and History
A peer-to-peer network is a network of computers in which any computer can behave as both a client and a server.
Some definitions of peer-to-peer networks also require that there be no form of centralized coordination. The definition used here is more relaxed, because the attacker may be interested in hybrid architectures.
Table 2.1 shows a summary of some well-known bots and P2P protocols. The timeline ranges from the first bot, EggDrop, to the recently released Storm Worm P2P bot. EggDrop, which appeared many years ago, was the first non-malicious bot and is known as one of the first IRC bots. GTBot, which has many variants, is another well-known malicious bot; its variants are based on the IRC client mIRC.exe.
Later, P2P protocols began to be used for Botnet activities. Napster was one of the first services to use P2P communication. Napster built a platform that allowed all peers to find each other and share files with each other in the network. In this service, the file index was kept on a centralized server, so it was not a completely P2P system. All peers had to upload an index of their files to the centralized server, and a peer looking for a file held by other peers had to search the centralized server. If it found the file it was looking for, it could then connect directly to the peer holding it and download the file. Since Napster was shut down after its service was ruled illegal, many other P2P services have focused on avoiding the same fate.
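The centralized-index design described above can be summarized in a few lines. The following sketch models only the lookup logic (publish an index, search it, learn which peers hold a file); the class and method names are hypothetical and no networking is involved.

```python
from collections import defaultdict


class CentralIndex:
    """A central server that knows which peer holds which file (lookup only)."""

    def __init__(self):
        self._holders = defaultdict(set)

    def publish(self, peer, filenames):
        """A peer uploads the list of files it is willing to share."""
        for name in filenames:
            self._holders[name].add(peer)

    def search(self, filename):
        """Return the peers holding the file; the transfer itself is peer-to-peer."""
        return sorted(self._holders.get(filename, set()))


if __name__ == "__main__":
    index = CentralIndex()
    index.publish("peerA", ["song.mp3", "notes.txt"])
    index.publish("peerB", ["song.mp3"])
    print(index.search("song.mp3"))
```

Because every search passes through the one index, removing that server stops all lookups, which is why such a design is not considered fully peer-to-peer.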
A few years after Napster, the Gnutella protocol appeared as the first completely P2P service. After Gnutella, as shown in Table 2.1, many other P2P protocols were released, such as Kademlia and Chord. These two newer P2P services use a distributed hash table as the method for finding information in the peer-to-peer network.
Agobot is another malicious P2P bot that appeared recently and became widespread because of its good design and modular code base. Nowadays, many researchers are concentrating on P2P bots, and it is anticipated that P2P bots will reach the stage at which Centralized botnets are no longer used.
Table 2.1: P2P based Botnets
2.3 Peer-to-Peer Overlay Networks
Overlay networks fall into two categories: structured and unstructured. In the first category, each node can connect to at most X peers, subject to conditions on the identifiers of the nodes it may connect to. In the unstructured type, there is no specified limit on the number of peers a node can connect to, and no condition on which peers it may connect to. Overnet is a good example of a structured P2P network, and Gnutella is a good example of an unstructured P2P network.
2.3.1 Brief overview of Overnet
Overnet is one of the popular file-sharing networks. Its design uses a distributed hash table (DHT) algorithm called Kademlia. Each node generates a 128-bit ID for joining the network, which it also sends to other nodes to introduce itself. Each node in the network stores information about other nodes in order to route query messages.
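Kademlia-style routing rests on one idea: the distance between two identifiers is their bitwise XOR, and queries are forwarded to the known nodes closest to the target. The sketch below illustrates this with 128-bit IDs; the function names and the choice of `k = 2` are illustrative assumptions rather than Overnet's actual parameters.

```python
import secrets

ID_BITS = 128


def make_node_id() -> int:
    """Generate a random 128-bit node identifier."""
    return secrets.randbits(ID_BITS)


def xor_distance(a: int, b: int) -> int:
    """Kademlia's distance metric: the bitwise XOR of two identifiers."""
    return a ^ b


def closest_nodes(target: int, known_ids, k: int = 2):
    """Return the k known identifiers closest to the target under XOR distance."""
    return sorted(known_ids, key=lambda node_id: xor_distance(node_id, target))[:k]


if __name__ == "__main__":
    known = [make_node_id() for _ in range(8)]
    print(closest_nodes(make_node_id(), known))
```

Each routing step hands the query to nodes that are closer to the target than the current one, so a lookup converges without any node needing a complete view of the network.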
2.3.2 Brief overview of Gnutella
Gnutella is an unstructured file-sharing network. In this network, when a node n wants to connect to a node m, it uses a Ping message to inform the other node of its presence. When node m receives the Ping message, it forwards it to its neighboring nodes and also sends a Pong message back to the sender of the Ping, node n. This exchange among nodes lets them learn about each other.
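The Ping/Pong exchange can be modelled as a bounded flood over the neighbor graph. The sketch below is a simplified simulation in which each forwarded Ping carries a hop limit; the hop-limit parameter `ttl` and the adjacency-dictionary representation are assumptions made for illustration.

```python
from collections import deque


def flood_ping(neighbours, origin, ttl):
    """Return the nodes that receive the Ping (and so answer with a Pong), in order."""
    seen = {origin}
    queue = deque([(origin, ttl)])
    responders = []
    while queue:
        node, remaining = queue.popleft()
        if remaining == 0:
            continue  # hop limit reached: this node does not forward further
        for peer in neighbours[node]:
            if peer in seen:
                continue  # each node handles a given Ping only once
            seen.add(peer)
            responders.append(peer)
            queue.append((peer, remaining - 1))
    return responders


if __name__ == "__main__":
    line = {"n": ["m"], "m": ["n", "p"], "p": ["m", "q"], "q": ["p"]}
    print(flood_ping(line, "n", ttl=2))
```

With a hop limit of 2 on the four-node chain, node n learns about m and p but not q, which shows how the limit bounds both the traffic generated and the part of the network a node discovers.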
2.4 Botnet Detection
To compare existing botnet detection techniques, the different methods are described in this section, and the disadvantages of each method are then discussed.
2.4.1 Honeypot-based tracking
A honeypot can be used to collect bots in order to analyze their behavior and signatures, and also to track botnets. However, honeypots have several limitations. The most important is the limited scale of exploit activity that they can track. In addition, a honeypot cannot capture bots that use propagation methods other than scanning, such as spam. Finally, it can only report on infections of machines that were deliberately placed in the network as trap systems; it cannot report on other computers in the network that are infected with a bot but are not dedicated trap machines. In general, therefore, with this technique we have to wait until a bot in the network infects our trap system before we can track or analyze it.
2.4.2 Intrusion detection systems
Intrusion detection techniques can be divided into two categories: host-based and network-based solutions. Host-based techniques are used for recognizing malware binaries such as viruses. A good example of this type is anti-virus detection systems. However, anti-virus tools are good mainly at virus detection. Their most important disadvantages are that bots can easily evade detection by changing their signatures, because the detection system cannot keep its database consistently up to date, and that bots can disable anti-virus tools on the system to protect themselves from detection.
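The evasion problem is easy to see with the simplest possible signature, an exact hash of a known sample. The sketch below is a deliberately naive scanner written for illustration; real anti-virus signatures are more elaborate, and the class name and sample byte strings are hypothetical.

```python
import hashlib


def fingerprint(data: bytes) -> str:
    """Exact-match signature: the SHA-256 digest of the whole sample."""
    return hashlib.sha256(data).hexdigest()


class SignatureScanner:
    """Flags only byte-for-byte copies of samples already in its database."""

    def __init__(self, known_samples):
        self._signatures = {fingerprint(sample) for sample in known_samples}

    def is_known(self, data: bytes) -> bool:
        return fingerprint(data) in self._signatures


if __name__ == "__main__":
    scanner = SignatureScanner([b"sample-binary-v1"])
    print(scanner.is_known(b"sample-binary-v1"))  # exact copy is recognized
    print(scanner.is_known(b"sample-binary-v2"))  # trivially modified copy is missed
```

Any change to the sample produces a different fingerprint, so the database must be updated for every variant, which is the maintenance burden described above.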
Network-based intrusion detection is another method used in the field of botnet detection. Snort and Bro are two well-known signature-based detection systems in current use. They use a database of signatures of known malicious activities to detect botnets or other malware. If we want to use this technique for botnet detection, we have to keep updating the database, recognizing each new piece of malware quickly in order to create a signature for it and add it to the database. To address this problem, researchers have recently been using anomaly-based IDSs, which detect malicious activities based on the behavior of the malware.
2.4.3 BotHunter: Dialog correlation-based Botnet detection
This technique developed an evidence-trail approach for detecting successful bot infections through the communication patterns that occur during the infection process. In this strategy, bot infection patterns are modeled so that the whole infection process of a botnet in the network can be recognized. All behaviors that occur during bot infection, such as target scanning, C&C establishment, binary downloading, and outbound propagation, have to be modeled by this method. The method gathers an evidence trail of the connected infection process for each internal machine and then looks for a threshold combination of sequences that satisfies the condition for bot infection.
BotHunter uses Snort with two added anomaly-detection components: SLADE (Statistical payLoad Anomaly Detection Engine) and SCADE (Statistical scan Anomaly Detection Engine). SCADE produces internal and external scan detection warnings that are weighted for criticality toward malware scanning patterns. SLADE performs byte-distribution payload anomaly detection on incoming packets, providing a complementary non-signature approach to inbound exploit detection [32].
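The idea behind byte-distribution payload anomaly detection can be sketched in a few lines: build a profile of how often each byte value occurs in normal payloads, then score new payloads by how far their distribution departs from it. The code below is a simplified single-byte (1-gram) illustration with a Manhattan distance; it is not SLADE's actual algorithm, and the function names and example payloads are assumptions.

```python
from collections import Counter


def byte_distribution(payload: bytes):
    """Relative frequency of each of the 256 byte values in the payload."""
    counts = Counter(payload)
    total = len(payload) or 1
    return [counts.get(value, 0) / total for value in range(256)]


def anomaly_score(profile, payload: bytes) -> float:
    """Manhattan distance between the profile and the payload's distribution (0 to 2)."""
    observed = byte_distribution(payload)
    return sum(abs(p - o) for p, o in zip(profile, observed))


if __name__ == "__main__":
    profile = byte_distribution(b"GET /index.html HTTP/1.1")
    print(anomaly_score(profile, b"GET /about.html HTTP/1.1"))  # small: similar text
    print(anomaly_score(profile, bytes([0x90]) * 24))           # near the maximum of 2
```

A payload made of byte values never seen in the profile scores close to 2, while ordinary requests score low, so a threshold on this score can raise an alert without any signature for the specific exploit.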
SLADE uses an n-gram payload examination of traffic that carries typical malware intrusions. SCADE performs port scan analysis for incoming and outgoing traffic. BotHunter links scan and intrusion alarms that indicate a host has been infected. When a sufficient sequence of alerts is found to match BotHunter’s infection dialog model, a comprehensive report is created that captures all the related events and participants that play a role in the infection dialog. This method provides some important features:
i. This technique concentrates on malware detection through IDS-driven dialog correlation. The model captures the essential network processes that occur during a successful bot infection.
ii. This technique has one IDS-independent dialog correlation engine and three bot-specific sensors. It can automatically produce a report of the whole bot detection, including the infecting agent, the identity of the infected computer, and the source of the Command and Control centre.
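The port scan analysis attributed to SCADE above can be approximated by a very simple heuristic: a source that fails to connect to many distinct targets is probably scanning. The following sketch shows that heuristic only; the record layout, function name, and threshold of five failed targets are hypothetical and do not reproduce SCADE's weighting scheme.

```python
from collections import defaultdict


def scan_suspects(connection_attempts, min_failed_targets=5):
    """Flag sources with many distinct failed (destination, port) targets.

    connection_attempts: iterable of (src, dst, dst_port, succeeded) tuples.
    """
    failed_targets = defaultdict(set)
    for src, dst, port, succeeded in connection_attempts:
        if not succeeded:
            failed_targets[src].add((dst, port))
    return sorted(src for src, targets in failed_targets.items()
                  if len(targets) >= min_failed_targets)


if __name__ == "__main__":
    attempts = [("10.0.0.9", "192.0.2.%d" % i, 445, False) for i in range(6)]
    attempts.append(("10.0.0.2", "192.0.2.1", 80, True))
    print(scan_suspects(attempts))
```

Counting distinct targets rather than raw failures keeps a host that repeatedly retries one unavailable server from being flagged as a scanner.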
2.4.3.1 Bot infection sequences
Understanding the bot infection life cycle is a challenging task for future network protection. The major task in this area is differentiating between successful bot infections and background exploit attempts. To this end, analysis of the two-way dialog flow between internal hosts and external hosts (the Internet) is needed. In a well-designed network that uses filtering at the gateway, the threat of direct exploitation is limited. However, contemporary malware families are highly flexible in their ability to attack vulnerable hosts through email attachments, infected P2P media, and drive-by download infections.
2.4.3.2 Modeling the infection dialog process
The bot propagation model is driven by an analysis of outbound communication traffic that reveals the behavior of the relevant botnet. Incoming scan and exploit alarms are not enough to declare a successful malware infection, since it is assumed that a steady stream of scan and exploit signals will be observed from the egress monitor.
Figure 2.1 shows the bot infection process in BotHunter, which is used for evaluating network flows through eight stages. This model is similar to the model that Rajab et al. presented for IRC detection. The model that they proposed includes early initial scanning, a preceding observation that takes the form of IP sweeps targeting vulnerable ports. Figure 2.1 is not intended to impose a strict ordering on the infection events that occur during bot infection.
The important issue here is that bot dialog analysis has to be robust to the absence of some dialog events and must not require strict sequencing of the order in which the dialog is conducted. One solution to the problem of sequence order and missing events is to use a weighted event threshold system that captures the smallest essential sparse sequences of events under which a bot profile declaration can be triggered. For instance, it is possible to apply a weighting and threshold scheme to the appearance of each event, such that a minimum set of events is required before a bot detection is declared.
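A weighted event threshold of this kind can be sketched as follows. The event names follow the behaviors mentioned earlier in this section, but the weights and the threshold are hypothetical values chosen for illustration, not BotHunter's actual parameters.

```python
# Hypothetical weights: outbound evidence counts more than inbound noise.
EVENT_WEIGHTS = {
    "inbound_scan": 0.2,
    "inbound_exploit": 0.3,
    "binary_download": 0.5,
    "cc_communication": 0.5,
    "outbound_scan": 0.5,
}
THRESHOLD = 1.0  # hypothetical declaration threshold


def infection_score(observed_events) -> float:
    """Sum the weights of the distinct events seen for one internal host."""
    return sum(EVENT_WEIGHTS.get(event, 0.0) for event in set(observed_events))


def is_infected(observed_events, threshold: float = THRESHOLD) -> bool:
    """Declare an infection once the combined evidence reaches the threshold."""
    return infection_score(observed_events) >= threshold


if __name__ == "__main__":
    print(is_infected(["inbound_scan", "inbound_exploit"]))      # inbound only
    print(is_infected(["cc_communication", "binary_download"]))  # two outbound signs
```

Because the score depends only on which events were seen, not on their order or on any single event being present, the decision tolerates missing or reordered dialog events, which is the robustness requirement stated above.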
2.4.3.3 Design and implementation
In this part, more attention is devoted to designing a passive network monitoring system that is capable of identifying the bidirectional warning signs when internal hosts are infected with b