Chapter 1 Intelligent Software Agent
1.1 Intelligent Agent
An Agent can be defined as follows: “An Agent is a software thing that knows how to do things that you could probably do yourself if you had the time” (Ted Selker of IBM Almaden Research Centre). Another definition is: “A piece of software which performs a given task using information gleaned from its environment to act in a suitable manner so as to complete the task successfully. The software should be able to adapt itself based on changes occurring in its environment, so that a change in circumstances will still yield the intended results” (G. W. Lecky-Thompson).
The notion of an Intelligent Agent can be divided into a weak notion and a strong notion.
Table 1.1 shows the properties of both notions.
Reactivity & Proactivity
Intelligence refers to the ability of the agent to capture and apply domain-specific knowledge and processing to solve problems. An Intelligent Agent uses knowledge, information and reasoning to take reasonable actions in pursuit of a goal. It must be able to recognise events, determine the meaning of those events and then take actions on behalf of a user. One central element of intelligent behaviour is the ability to adapt or learn from experience. Any Agent that can learn has an advantage over one that cannot. Adding learning or adaptive behaviour to an intelligent agent elevates it to a higher level of ability. In order to construct an Intelligent Agent, we have to use the following topics of Artificial Intelligence:
- Knowledge Representation
- Learning 
The functionality of a mobile agent is illustrated in Figure 1.1. Computer A and Computer B are connected via a network. In step 1, a mobile Agent is dispatched from Computer A towards Computer B; in the meantime, its execution on Computer A is suspended. In step 2, the mobile Agent is on the network with its state and code. In step 3, the mobile Agent reaches its destination, Computer B, where it resumes its execution.
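The three steps above can be sketched in Python under some simplifying assumptions. The class `CounterAgent` and the functions `dispatch` and `receive` are hypothetical names introduced only for illustration, the "network" is just a byte string passed between two variables, and only the agent's state is moved: the sketch assumes the agent's code is already present on both hosts, which a real mobile agent platform would not require.

```python
import json


class CounterAgent:
    """Toy agent: sums a list of numbers and remembers how far it has got."""

    def __init__(self, items, position=0, total=0):
        self.items = list(items)
        self.position = position
        self.total = total

    def run(self, steps):
        """Process at most `steps` items, then pause so the agent can be moved."""
        end = min(self.position + steps, len(self.items))
        while self.position < end:
            self.total += self.items[self.position]
            self.position += 1

    def finished(self):
        return self.position >= len(self.items)


def dispatch(agent):
    """Steps 1 and 2: suspend the agent and serialise its state for the network."""
    state = {"items": agent.items, "position": agent.position, "total": agent.total}
    return json.dumps(state).encode("utf-8")


def receive(payload):
    """Step 3: rebuild the agent on the destination host so it can resume."""
    return CounterAgent(**json.loads(payload.decode("utf-8")))


# Computer A starts the work, then dispatches the agent.
agent_on_a = CounterAgent([1, 2, 3, 4, 5])
agent_on_a.run(steps=2)
payload = dispatch(agent_on_a)

# Computer B receives the bytes and resumes from the saved position.
agent_on_b = receive(payload)
agent_on_b.run(steps=10)
```

The agent on Computer B continues from the saved position rather than starting again, which is the property the figure describes.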
1.1.3 Strengths and Weaknesses
Many researchers are now developing methods for improving the technology, with more standardisation and better programming environments that may allow mobile agents to be used in products.
The more intelligent an application becomes, the more unpredictable and uncontrollable it also becomes. The main drawback of mobile agents is the security risk involved in using them.
The following table shows the major strengths and weaknesses of Agent technology:
Strengths:
- Overcoming Network Latency
- Reducing Network traffic
- Asynchronous Execution and Autonomy
- Operating in Heterogeneous Environments
- Robust and Fault-tolerant Behavior
Weaknesses:
- Lack of Applications
The following are the major and most widely applicable areas of Mobile Agents:
- Distributed Computing: Mobile Agents can be applied in a network using free resources for their own computations.
- Collecting data: A mobile Agent travels around the net. On each computer it processes the data and sends the results back to the central server.
- Software Distribution and Maintenance: Mobile agents could be used to distribute software in a network environment or to do maintenance tasks.
- Mobile agents and Bluetooth: Bluetooth is a technology for short range radio communication. Originally, the companies Nokia and Ericsson came up with the idea. Bluetooth has a nominal range of 10 m, or 100 m with increased power.
- Mobile agents as Pets: Mobile agents are the ideal pets. Imagine something like creatures. What if you could have some pets wandering around the internet, choosing where they want to go, leaving you if you don’t care about them or coming to you if you handle them nicely? People would buy such things, wouldn’t they?
- Mobile agents and offline tasks:
1. Mobile agents could be used for offline tasks in the following way:
a- An Agent is sent out over the internet to do some task.
b- The Agent performs its task while the home computer is offline.
c- The Agent returns with its results.
2. Mobile agents could be used to simulate a factory:
a- Machines in the factory are agent driven.
b- Agents provide realistic data for a simulation, e.g. uptimes and efficiencies.
c- Simulation results are used to improve real performance or to plan better production lines. [10]
1.3 Life Cycle
An intelligent and autonomous Agent has properties like Perception, Reasoning and Action, which form the life cycle of an Agent as shown in Figure 1.2.
The agent perceives the state of its environment and integrates the perception into its knowledge base, which is used to derive the next action, which is then executed. This generic cycle is a useful abstraction, as it provides a black-box view of the Agent and encapsulates specific aspects. The first step is Agent initialisation. The Agent then starts to operate and may stop and start again depending on the environment and the tasks it is trying to accomplish. After the Agent has finished all the required tasks, it ends in the completing state. Table 1.3 shows these states.
Name of Step
Performs one-time setup activities.
Start its job or task.
Stops jobs, saves intermediate results, joins all threads and stops.
Performs one-time termination activities.
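The life cycle described above can be sketched as a small state machine. This is only an illustration: the state names (`CREATED`, `INITIALISED`, `RUNNING`, `STOPPED`, `COMPLETED`), the class `LifecycleAgent` and the transition table are assumptions, since the step names of Table 1.3 are not reproduced here. The sketch allows an agent to stop and start again before it completes.

```python
from enum import Enum


class AgentState(Enum):
    CREATED = "created"
    INITIALISED = "initialised"
    RUNNING = "running"
    STOPPED = "stopped"
    COMPLETED = "completed"


# Assumed transition table: an agent may stop and start repeatedly,
# and can only complete from the stopped state.
TRANSITIONS = {
    AgentState.CREATED: {AgentState.INITIALISED},
    AgentState.INITIALISED: {AgentState.RUNNING},
    AgentState.RUNNING: {AgentState.STOPPED},
    AgentState.STOPPED: {AgentState.RUNNING, AgentState.COMPLETED},
    AgentState.COMPLETED: set(),
}


class LifecycleAgent:
    def __init__(self):
        self.state = AgentState.CREATED
        self.log = []

    def _move(self, target, note):
        """Change state only if the transition table allows it."""
        if target not in TRANSITIONS[self.state]:
            raise ValueError(f"cannot go from {self.state.name} to {target.name}")
        self.state = target
        self.log.append(note)

    def initialise(self):
        self._move(AgentState.INITIALISED, "one-time setup")

    def start(self):
        self._move(AgentState.RUNNING, "job started")

    def stop(self):
        self._move(AgentState.STOPPED, "job stopped, intermediate results saved")

    def complete(self):
        self._move(AgentState.COMPLETED, "one-time termination")
```

Calling `initialise()`, `start()`, `stop()`, `start()`, `stop()` and `complete()` in that order walks through every state; calling `start()` after completion raises `ValueError`.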
1.4 Agent Oriented Programming (AOP)
AOP is a programming technique that deals with objects which have an independent thread of control and can be initiated. The three main components of AOP are elaborated below.
a- Object: Data and computation are grouped together in a single structural unit called an ‘Object’. Every Agent looks like an object.
b- Independent Thread of control: When the developed Agent, which is an object, is deployed on the Boga server, it runs as an independent thread. This distinguishes an Agent from an ordinary object.
c- Initiation: This concerns the execution plan of an Agent. Once deployed, the Agent can be initiated from the server for execution.
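The three components can be illustrated with a minimal Python sketch. It uses the standard `threading` and `queue` modules and has nothing to do with the Boga server itself; the class `ThreadedAgent` and its squaring task are hypothetical, chosen only to show an object that owns its thread of control and is initiated from outside.

```python
import queue
import threading


class ThreadedAgent(threading.Thread):
    """An object (data plus behaviour) that runs in its own thread of control."""

    def __init__(self, name, inbox, outbox):
        super().__init__(name=name)
        self.inbox = inbox
        self.outbox = outbox

    def run(self):
        # Runs independently of the caller until a None sentinel arrives.
        while True:
            task = self.inbox.get()
            if task is None:
                break
            self.outbox.put((self.name, task * task))


inbox, outbox = queue.Queue(), queue.Queue()
agent = ThreadedAgent("squarer", inbox, outbox)
agent.start()                 # initiation: the host starts the agent
for number in (2, 3):
    inbox.put(number)
inbox.put(None)               # ask the agent to finish
agent.join()
results = [outbox.get() for _ in range(2)]
```

An ordinary object would only act when one of its methods is called; here the agent keeps working on its own thread between calls.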
1.5 Network paradigms
This section illustrates traditional distributed computing paradigms such as the Simple Network Management Protocol (SNMP) and Remote Procedure Call (RPC).
Simple Network Management Protocol is a standard for gathering statistical data about network traffic and the behavior of network components. It is an application layer protocol that sits on top of the TCP/IP stack, and it comprises a set of protocols for managing complex networks. It enables network administrators to manage network performance, find and solve network problems and plan for network growth. It is basically a request/response type of protocol that communicates management information between two types of SNMP entities: Managers (Applications) and Agents.
Agents: They are compliant devices that store data about themselves in a Management Information Base (MIB) and return this data to SNMP requesters. Each SNMP agent maintains a local database of information relevant to network management, which is known as the Management Information Base. An agent has the following properties: it implements the full SNMP protocol, stores and retrieves managed data as defined by the Management Information Base, and can asynchronously signal an event to the manager.
Manager (Application): It issues queries to get information about the status, configuration and performance of external network devices. A manager has the following properties: it is implemented as a Network Management Station (NMS), implements the full SNMP protocol, and is able to query Agents, get responses from Agents, set variables in Agents and acknowledge asynchronous events from Agents. Figure 1.3 illustrates an interaction between a manager and an Agent.
The agent is software that enables a device to respond to manager requests to view or update MIB data and to send traps reporting problems or significant events. It receives messages and sends a response back. An Agent does not have to wait for an order to act: if a serious problem arises or a significant event occurs, it sends a TRAP (a message that reports a problem or a significant event) to the manager (software in a network management station that enables the station to send requests to view or update MIB variables, and to receive traps from an agent). The Manager software in the management station sends messages to the Agent and receives traps and responses. SNMP uses the User Datagram Protocol (UDP, a simple protocol that enables an application to send individual messages to other applications; delivery is not guaranteed, and messages need not be delivered in the same order as they were sent) to carry its messages. Finally, there is an application that enables the end user to control the manager software and view network information.
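The manager/agent interaction described above can be imitated in a few lines of Python. This is an in-memory simulation, not an SNMP implementation: the classes `SimAgent` and `SimManager` are hypothetical, messages are plain dictionaries rather than encoded SNMP packets, and no UDP socket is opened. The variable names in the toy MIB and the status strings are chosen for illustration only.

```python
class SimAgent:
    """Holds a small MIB and answers get/set requests from a manager."""

    def __init__(self, mib, trap_sink):
        self.mib = dict(mib)
        self.trap_sink = trap_sink  # callable that receives trap messages

    def handle(self, request):
        name = request["name"]
        if name not in self.mib:
            return {"status": "noSuchName", "name": name}
        if request["op"] == "set":
            self.mib[name] = request["value"]
        return {"status": "noError", "name": name, "value": self.mib[name]}

    def raise_trap(self, event):
        """Send an unsolicited message; nothing comes back, so it is unacknowledged."""
        self.trap_sink({"trap": event})


class SimManager:
    """Queries agents, sets their variables and collects their traps."""

    def __init__(self):
        self.traps = []

    def on_trap(self, message):
        self.traps.append(message["trap"])

    def get(self, agent, name):
        return agent.handle({"op": "get", "name": name})

    def set(self, agent, name, value):
        return agent.handle({"op": "set", "name": name, "value": value})


manager = SimManager()
router = SimAgent({"sysName": "router-1", "ifOperStatus": "up"}, manager.on_trap)

reply = manager.get(router, "sysName")          # request/response
manager.set(router, "sysName", "edge-router")   # manager updates a variable
router.mib["ifOperStatus"] = "down"             # something happens on the device
router.raise_trap("linkDown")                   # agent reports it without being asked
```

The get and set calls follow the request/response pattern, while the trap travels in the opposite direction without a reply.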
Table 1.4 comprises the Strengths and Weaknesses of SNMP.
Strengths:
- Its design and implementation are simple.
- Due to its simple design it can be expanded, and the protocol can be updated to meet future needs.
- All major vendors of internetwork hardware, such as bridges and routers, design their products to support SNMP, making it very easy to implement.
Weaknesses:
- It may not be suitable for the management of truly large networks because of the performance limitations of polling.
- It is not well suited for retrieving large volumes of data, such as an entire routing table.
- Its traps are unacknowledged and may not be delivered.
- It provides only trivial authentication.
- It does not support explicit actions.
- Its MIB model is limited (it does not support management queries based on object types or values).
- It does not support manager-to-manager communications.
- The information it deals with is neither detailed nor well-organized enough to deal with expanding modern networking requirements.
- It uses UDP as a transport protocol. Complex policy updates require a sequence of updates, and a reliable transport protocol, such as TCP, allows the policy update to be conducted over a shared state between the managed device and the management station.
A remote procedure call (RPC) is a protocol that allows a computer program running on one host to cause code to be executed on another host without the programmer needing to explicitly code for this. When the code in question is written using object-oriented principles, RPC is sometimes referred to as remote invocation or remote method invocation. It is a popular and powerful technique for constructing distributed, client-server based applications. An RPC is initiated by the caller (client) sending a request message to a remote system (the server) to execute a certain procedure using the arguments supplied. A result message is returned to the caller. It is based on extending the notion of conventional or local procedure calling, so that the called procedure need not exist in the same address space as the calling procedure. The two processes may be on the same system, or they may be on different systems with a network connecting them. By using RPC, programmers of distributed applications avoid the details of the interface with the network. The transport independence of RPC isolates the application from the physical and logical elements of the data communications mechanism and allows the application to use a variety of transports. Distributed computing using RPC is illustrated in Figure 1.4.
Local procedures are executed on Machine A; the remote procedure is actually executed on Machine B. The program executing on Machine A will wait until Machine B has completed the operation of the remote procedure and then continue with its program logic. The remote procedure may have a return value that the continuing program may use immediately.
The RPC mechanism intercepts calls to a procedure, and the following happens:
- It packages the name of the procedure and the arguments to the call and transmits them over the network to the remote machine where the RPC server is running. This is called “Marshalling”.
- The RPC server decodes the name of the procedure and the parameters. This is called “Unmarshalling”.
- It makes the actual procedure call on the server (remote) machine.
- It packages the returned value and output parameters and transmits them over the network back to the machine that made the call, where they are unmarshalled.
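The steps above can be traced with a small Python sketch. It is a toy under stated assumptions: JSON stands in for the wire format, the "network" is a direct function call, and the names `PROCEDURES`, `marshal_call`, `server_handle` and `remote_call` are hypothetical rather than part of any RPC library.

```python
import json


def add(a, b):
    """The procedure that actually runs on the server side."""
    return a + b


# Procedures the toy server is willing to execute, keyed by name.
PROCEDURES = {"add": add}


def marshal_call(name, args):
    """Client side: package the procedure name and arguments as bytes."""
    return json.dumps({"proc": name, "args": args}).encode("utf-8")


def server_handle(request_bytes):
    """Server side: unmarshal the request, call the procedure, marshal the result."""
    request = json.loads(request_bytes.decode("utf-8"))
    result = PROCEDURES[request["proc"]](*request["args"])
    return json.dumps({"result": result}).encode("utf-8")


def remote_call(name, *args, transport=server_handle):
    """Client stub: looks like a local call, but goes through the transport."""
    reply_bytes = transport(marshal_call(name, list(args)))
    return json.loads(reply_bytes.decode("utf-8"))["result"]


total = remote_call("add", 2, 3)
```

The caller only sees `remote_call("add", 2, 3)`; the packaging, transmission and decoding are hidden in the stub, which is the point of the technique. Replacing `transport` with a function that sends bytes over a socket would not change the calling code.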
1.6 Comparison between Agent technology and network paradigms
Conventional network management is based on SNMP and is often run in a centralised manner. Although the centralised management approach gives network administrators the flexibility of managing the whole network from a single place, it is prone to information bottlenecks, excessive processing load on the manager and heavy usage of network bandwidth.
Intelligent Agents for network management tend to monitor and control networked devices on site and consequently save manager capacity and network bandwidth. Intelligent Agents are used because of their major advantages, e.g. asynchronous, autonomous and heterogeneous operation, which the other two contemporary technologies, SNMP and RPC, lack. The table below shows the comparison between the intelligent agent and its contemporary technologies:
More Autonomous but less than Agent
Network Load Management
Heavy usage of Network Bandwidth
Load on Network traffic and heavy usage of bandwidth
Reduce Network traffic and latency
Packet size Network
Only address can be sent for request and data on reply
Only address can be sent for request and data on reply
Code and execution state can be moved around network. (only code in case of weak mobility)
This is not for this purpose
Network delays and information bottleneck at centralised management station
It gives flexibility to analyse the managed nodes locally
Indeed, Agents, mobile or intelligent, provide a new paradigm of computer interaction and so give developers new options for designing applications based on computer connectivity.
Chapter 2 Learning Paradigms
2.1 Knowledge Discovery in Databases (KDD) and Information Retrieval (IR)
KDD is defined as “the nontrivial process of identifying valid, novel, potentially useful and ultimately understandable patterns in data” (Fayyad, Piatetsky-Shapiro and Smyth (1996)). The closely related process of IR is defined as “the methods and processes for searching relevant information out of information systems that contain extremely large numbers of documents” (Rocha (2001)).
KDD and IR are, in fact, highly complex processes that are strongly affected by a wide range of factors. These factors include the needs and information-seeking characteristics of system users, the tools and methods used to search and retrieve, the structure and size of the data set or database, and the nature of the data itself. The result has been an increasing number of organizations that possess very large and continually growing databases but only elementary tools for KDD and IR. Two major research areas have developed in response to this problem:
- Data warehousing:
It is defined as: “Collecting and ‘cleaning’ transactional data to make it available for online analysis and decision support”. (Fayyad 2001, p.30)
- Data Mining:
It is defined as: “The application of specific algorithms to a data set for purpose of extracting data patterns”. (Fayyad p. 28)
2.2 Data Mining
Data mining is a statistical term. In Information Technology it is defined as the discovery of useful summaries of data.
2.2.1 Applications of Data Mining
The following are examples of the use of data mining technology:
- Patterns of traveller behavior mined: These are used to manage the sale of discounted seats on planes and rooms in hotels.
- Diapers and beer: The observation that customers who buy diapers are more likely than average to buy beer allowed supermarkets to place beer and diapers near each other, knowing that many customers would walk between them. Placing potato chips between the two increased sales of all three items.
- Skycat and Sloan Sky Survey: Clustering sky objects by their radiation levels in different bands allowed astronomers to distinguish between galaxies, nearby stars, and many other kinds of celestial objects.
- Comparison of genotypes of people: Comparing the genotypes of people with and without a condition allowed the discovery of a set of genes that together account for many cases of diabetes. This sort of mining will become much more important as the human genome is constructed.
2.2.2 Communities of Data Mining
As data mining has become recognised as a powerful tool, several different communities have laid claim to the subject:
- Artificial Intelligence (AI) where it is called “Machine Learning”
- Researchers in clustering algorithms
- Visualisation researchers
- Databases: When the data is large and the computations are very complex, data mining can be thought of as algorithms for executing very complex queries on non-main-memory data.
2.2.3 Stages of data mining process
The following are the stages of the data mining process, sometimes called the life cycle of data mining, as shown in Figure 2.1:
1- Data gathering: Data warehousing, web crawling.
2- Data cleansing: Eliminate errors and/or bogus data, e.g. Patient’s fever = 125 °C.
3- Feature extraction: Obtaining only the interesting attributes of the data, e.g. “date acquired” is probably not useful for clustering celestial objects as in Skycat.
4- Pattern extraction and discovery: This is the stage that is often thought of as “data mining” and is where we shall concentrate our efforts.
5- Visualisation of the data.
6- Evaluation of results: Not every discovered fact is useful, or even true! Judgment is necessary before following the software’s conclusions.
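The cleansing and feature extraction stages can be illustrated with a short Python sketch. The records are invented for illustration, and the plausibility bounds in `PLAUSIBLE_FEVER_C` as well as the function names `cleanse` and `extract_features` are assumptions rather than anything prescribed by the stages above.

```python
# Assumed bounds for a plausible body temperature in degrees Celsius.
PLAUSIBLE_FEVER_C = (30.0, 45.0)


def cleanse(rows, low=PLAUSIBLE_FEVER_C[0], high=PLAUSIBLE_FEVER_C[1]):
    """Data cleansing: drop records whose temperature cannot be real."""
    return [row for row in rows if low <= row["fever_c"] <= high]


def extract_features(rows, keep):
    """Feature extraction: keep only the attributes of interest."""
    return [{name: row[name] for name in keep} for row in rows]


# Invented sample records; the second one carries a bogus reading.
raw = [
    {"patient": "A", "fever_c": 38.5, "date_acquired": "day 1"},
    {"patient": "B", "fever_c": 125.0, "date_acquired": "day 1"},
    {"patient": "C", "fever_c": 36.9, "date_acquired": "day 2"},
]

clean = cleanse(raw)
features = extract_features(clean, keep=["patient", "fever_c"])
```

The bogus reading is removed before any pattern extraction, and the acquisition date is dropped because it is assumed to carry no useful signal for the later stages.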
2.3 Machine Learning
There are five major techniques of machine learning in Artificial Intelligence (AI), which are discussed in the following sections.
2.3.1 Supervised Learning
It relies on a teacher that provides the input data as well as the desired solution. The learning agent is trained by showing it examples of the problem state or attributes along with the desired output or action. The learning agent makes a prediction based on the inputs, and if the output differs from the desired output, the agent is adjusted or adapted to produce the correct output. This process is repeated over and over until the agent learns to make accurate classifications or predictions. Historical data from databases, sensor logs or trace logs is often used as training or example data. An example of a supervised learning algorithm is the ‘Decision Tree’, where there is a pre-specified target variable.
2.3.2 Unsupervised Learning
It depends on input data only and makes no demands on knowing the solution. It is used when the learning agent needs to recognize similarities between inputs or to identify features in the input data. The data is presented to the Agent, and it adapts so that it partitions the data into groups. This process continues until the Agent places the inputs in the same groups on successive passes over the data. An unsupervised learning algorithm performs a type of feature detection where important common attributes in the data are extracted. An example of an unsupervised learning algorithm is the K-Means Clustering algorithm.
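The "repeat until the groups stop changing" idea can be sketched with a plain Python version of K-Means. This is a minimal illustration, not a tuned implementation: the function name `kmeans` is hypothetical, the first `k` points are used as initial centroids (an assumption made to keep the run deterministic), and distance is squared Euclidean distance.

```python
def kmeans(points, k, max_passes=100):
    """Partition `points` (tuples of floats) into k groups."""
    centroids = [points[i] for i in range(k)]  # naive, deterministic start
    assignment = None
    for _ in range(max_passes):
        # Assign every point to its nearest centroid.
        new_assignment = [
            min(
                range(k),
                key=lambda c: sum((p - q) ** 2 for p, q in zip(point, centroids[c])),
            )
            for point in points
        ]
        # Stop when a pass over the data leaves every point in the same group.
        if new_assignment == assignment:
            break
        assignment = new_assignment
        # Move each centroid to the mean of the points assigned to it.
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:
                centroids[c] = tuple(sum(axis) / len(members) for axis in zip(*members))
    return assignment, centroids


points = [(1.0, 1.0), (5.0, 5.0), (1.2, 0.8), (5.2, 4.9)]
assignment, centroids = kmeans(points, k=2)
```

No labels are supplied: the grouping emerges from the distances between inputs alone, and the loop ends when two successive passes agree.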
2.3.3 Reinforcement Learning
It is a kind of supervised learning in which the feedback is more general. In addition, there are two more techniques in machine learning: on-line learning and off-line learning.
2.3.4 On-line and Off-line Learning
On-line learning means that the agent is adapting while it is working. Off-line learning involves saving data while the agent is working and using the data later to train the agent.
In an intelligent agent context, this means that data is gathered from situations that the agents have experienced. This data is then augmented with information about the desired agent response to build a training data set. Once this data set is ready, it can be used to modify the behaviour of agents. Any two or more of these approaches can be combined in one system.
In order to develop a Learning Intelligent Agent (LIAgent), we will combine unsupervised learning with supervised learning. We will test LIAgents on the Iris dataset, the Vote dataset about polls in the USA, and two medical datasets, namely Breast and Diabetes. See Appendix A for all four datasets.
2.4 Supervised Learning (Decision Tree ID3)
Decision trees and decision rules are data mining methodologies applied in many real-world applications as a powerful solution to classification problems. The goal of supervised learning is to create a classification model, known as a classifier, which will predict, from the values of its available input attributes, the class of some entity (a given sample). In other words, classification is the process of assigning a discrete label value (class) to an unlabeled record, and a classifier is a model (a result of classification) that predicts one attribute, the class of a sample, when the other attributes are given.
In doing so, samples are divided into pre-defined groups. For example, a simple classification might group customer billing records into two specific classes: those who pay their bills within thirty days and those who take longer than thirty days to pay. Different classification methodologies are applied today in almost every discipline where the task of classification, because of the large amount of data, requires automation of the process. Examples of classification methods used as a part of data-mining applications include classifying trends in financial markets and identifying objects in large image databases.
A particularly efficient method for producing classifiers from data is to generate a decision tree. The decision-tree representation is the most widely used logic method. There is a large number of decision-tree induction algorithms described primarily in the machine-learning and applied-statistics literature. They are supervised learning methods that construct decision trees from a set of input-output samples. A typical decision-tree learning system adopts a top-down strategy that searches for a solution in a part of the search space. It guarantees that a simple, but not necessarily the simplest tree will be found. A decision tree consists of nodes, where attributes are tested. The outgoing branches of a node correspond to all the possible outcomes of the test at the node. 
Decision trees use information theory to determine where to split data sets in order to build classifiers and regression trees. Decision tree algorithms perform induction on data sets, generating classifiers and prediction models. The algorithm examines the data set and uses information theory to determine which attribute contains the most information on which to base a decision. This attribute is then used in a decision node to split the data set into groups, based on the value of that attribute. At each subsequent decision node, the data set is split again. The result is a decision tree, a collection of nodes. The leaf nodes represent a final classification of the record. ID3 is an example of a decision tree algorithm. It is a kind of supervised learning. We used ID3 in order to print the decision rules as its output.
2.4.1 Decision Tree
Decision trees are powerful and popular tools for classification and prediction. The attractiveness of decision trees is due to the fact that, in contrast to neural networks, decision trees represent rules. Rules can readily be expressed so that humans can understand them or even directly used in a database access language like SQL so that records falling into a particular category may be retrieved. Decision tree is a classifier in the form of a tree structure, where each node is either:
- Leaf node: indicates the value of the target attribute (class) of examples, or
- Decision node: specifies some test to be carried out on a single attribute value, with one branch and sub-tree for each possible outcome of the test.
Decision tree induction is a typical inductive approach to learn knowledge on classification.
The key requirements to do mining with decision trees are:
- Attribute value description: Object or case must be expressible in terms of a fixed collection of properties or attributes. This means that we need to discretise continuous attributes, or this must have been provided in the algorithm.
- Predefined classes (target attribute values): The categories to which examples are to be assigned must have been established beforehand (supervised data).
- Discrete classes: A case does or does not belong to a particular class, and there must be more cases than classes.
- Sufficient data: Usually hundreds or even thousands of training cases. A decision tree is constructed by looking for regularities in data.
2.4.2 ID3 Algorithm
J. Ross Quinlan originally developed ID3 at the University of Sydney. He first presented ID3 in 1975; the algorithm is described in the journal Machine Learning, vol. 1, no. 1 (1986). ID3 is based on the Concept Learning System (CLS) algorithm.
Input: (R: a set of non-target attributes,
C: the target attribute,
2.4.3 Functionality of ID3
ID3 searches through the attributes of the training instances and extracts the attribute that best separates the given examples. If the attribute perfectly classifies the training set then ID3 stops; otherwise it recursively operates on the m (where m = number of possible values of an attribute) partitioned subsets to get their “best” attribute.
The algorithm uses a greedy search, that is, it picks the best attribute and never looks back to reconsider earlier choices. If the dataset has no attribute that can be used for the decision, the result will be misclassification of the data.
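Entropy is a measure of the homogeneity of a set of examples $S$. For a two-class problem, where $p_p$ and $p_n$ are the proportions of positive and negative examples in $S$:
$$\mathrm{Entropy}(S) = -p_p \log_2 p_p - p_n \log_2 p_n \qquad (1)$$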
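Equation (1) and the recursive search of Section 2.4.3 can be sketched in Python as follows. This is a hedged illustration rather than the implementation used in this work: "best" is taken to mean highest information gain, which is the criterion usually paired with entropy in ID3 but is an assumption here, the tree is stored as nested dictionaries, and the function names and the four-row table are invented for the example.

```python
import math
from collections import Counter


def entropy(labels):
    """Equation (1), generalised to any number of classes."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total) for n in Counter(labels).values())


def information_gain(rows, attribute, target):
    """Entropy of the whole set minus the weighted entropy after the split."""
    remainder = 0.0
    for value in {row[attribute] for row in rows}:
        subset = [row[target] for row in rows if row[attribute] == value]
        remainder += len(subset) / len(rows) * entropy(subset)
    return entropy([row[target] for row in rows]) - remainder


def best_attribute(rows, attributes, target):
    return max(attributes, key=lambda a: information_gain(rows, a, target))


def id3(rows, attributes, target):
    """Greedy, recursive tree construction; never revisits an earlier choice."""
    labels = [row[target] for row in rows]
    if len(set(labels)) == 1:          # perfectly classified: stop
        return labels[0]
    if not attributes:                 # nothing left to test: majority class
        return Counter(labels).most_common(1)[0][0]
    best = best_attribute(rows, attributes, target)
    remaining = [a for a in attributes if a != best]
    return {
        best: {
            value: id3([r for r in rows if r[best] == value], remaining, target)
            for value in sorted({r[best] for r in rows})
        }
    }


# Invented training data for illustration only.
rows = [
    {"outlook": "sunny", "windy": "no", "play": "yes"},
    {"outlook": "sunny", "windy": "yes", "play": "yes"},
    {"outlook": "rain", "windy": "no", "play": "no"},
    {"outlook": "rain", "windy": "yes", "play": "no"},
]
tree = id3(rows, ["outlook", "windy"], "play")
```

In this table `outlook` separates the classes perfectly while `windy` carries no information, so the greedy search picks `outlook` at the root and stops after one split.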
2.4.4 Decision Tree Representation
A decision tree is an arrangement of tests that prescribes an appropriate test at every step in an analysis. It classifies instances by sorting them down the tree from the root node to some leaf node, which provides the classification of the instance. Each node in the tree specifies a test of some attribute of the instance, and each branch descending from that node corresponds to one of the possible values for this attribute. This is illustrated in Figure 2.3. The decision rules can also be obtained from ID3 in if-then-else form, which can be used for decision support systems and classification.
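Both uses of a tree described above, sorting an instance down to a leaf and reading the tree off as rules, can be sketched in Python. The nested-dictionary representation, the example tree `TREE` and the function names `classify` and `rules` are assumptions made for illustration; the tree is written by hand and is not the output of any experiment in this work.

```python
# Hand-written example tree: each decision node maps one attribute
# to a dictionary of {value: subtree or class label}.
TREE = {
    "outlook": {
        "sunny": {"humidity": {"high": "no", "normal": "yes"}},
        "overcast": "yes",
        "rain": {"windy": {"yes": "no", "no": "yes"}},
    }
}


def classify(tree, instance):
    """Sort an instance down the tree from the root to a leaf."""
    while isinstance(tree, dict):
        attribute = next(iter(tree))
        tree = tree[attribute][instance[attribute]]
    return tree


def rules(tree, conditions=()):
    """Turn every root-to-leaf path into one if-then rule."""
    if not isinstance(tree, dict):
        clause = " and ".join(f"{a} = {v}" for a, v in conditions) or "true"
        return [f"if {clause} then {tree}"]
    attribute = next(iter(tree))
    found = []
    for value, subtree in tree[attribute].items():
        found.extend(rules(subtree, conditions + ((attribute, value),)))
    return found


label = classify(TREE, {"outlook": "sunny", "humidity": "normal"})
rule_list = rules(TREE)
```

Each leaf yields exactly one rule, so the number of rules equals the number of leaves, and the rules are as readable as the tree itself.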
Given m attributes, a decision tree may have a maximum height of m. 
2.4.5 Challenges in decision tree
The following are the issues in learning decision trees:
- Determining how deeply to grow the decision tree.
- Handling continuous attributes.
- Choosing an appropriate attribute selection measure.
- Handling training data with missing attribute values.
- Handling attributes with differing costs, and
- Improving computational efficiency.
2.4.6 Strengths and Weaknesses
The following are the strengths and weaknesses of decision trees:
Strengths:
- It generates understandable rules.
- It performs classification without requiring much computation.
- It is suitable to handle both continuous and categorical variables.
- It provides a clear indication of which fields are most important for prediction or classification.
Weaknesses:
- It is less appropriate for estimation tasks where the goal is to predict the value of a continuous attribute.
- It is prone to errors in classification problems with many classes and a relatively small number of training examples.
- It can be computationally expensive to train. At each node, each candidate splitting field must be sorted before its best split can be found. Pruning algorithms can also be expensive since many candidate sub-trees must be formed and compared.
- It does not handle non-rectangular regions well. It only examines a single field at a time. This leads to rectangular classification boxes that may not correspond well with the actual distribution of records in the decision space.
Decision trees are generally suited to problems with the following characteristics:
a. Instances are described by a fixed set of attributes (e.g., temperature) and their values (e.g., hot).
b. The easiest situation for decision tree learning occurs when each attribute takes on a small number of disjoint possible values (e.g., hot, mild, cold).
c. Extensions to the basic algorithm allow handling real-valued attributes as well (e.g., a floating point temperature).
d. A decision tree assigns a classification to each example.
i- The simplest case exists when there are only two possible classes (Boolean classification).
ii- Decision tree methods can also be easily extended to learning functions with more than two possible output values.
e. A more substantial extension allows learning target functions with real-valued outputs, although the application of decision trees in this setting is less common.
f. Decision tree methods can be used even when some training examples have unknown values (e.g., humidity is known for only a fraction of the examples). 
Learned functions are either represented by a decision tree or re-represented as sets of if-then rules to improve readability.
2.5 Unsupervised Learning (K-Means Clustering)
Cluster analysis is a set of methodologies for automatic classification of samples into a number of groups using a measure of association, so that the samples in one group are similar and samples belonging to different groups are not similar. The inpu