### CHAPTER 1 : INTRODUCTION

### BACKGROUND

In today’s dynamic retail environment, customers are offered with a tremendous range of choices and their loyalty is increasingly becoming transitory due to the severe impact of competitors’ actions on existing relationships (Reinartz and Kumar, 2000). This increased competition to satisfy the diverse needs of the customer, forces the traditional production and selling focus of the retailers towards customer relationships.

In the context of retail supermarket, this has resulted in large investments in retail information systems to collect the shoppers’ data to understand the customer shopping behaviour (Brijs.T et al 2001). Several tools and technologies of data warehousing, data mining, and other customer relationship management (CRM) techniques are exploited to manage and analyse this data. Especially through data mining, simply means extracting knowledge from large amounts of data which helps the organisations to find the patterns and trends in their customers’ data, and then to drive improved customer relationships (Rygielski, Wang and Yen, 2002).

According to Witten & Frank, (2005), some data mining techniques include decision trees (DT), artificial neural networks (ANN), genetic algorithms (GA), association rules (AR), etc., are usually used to solve problems related with customers in various fields like engineering, science, finance and business. In retail supermarket domain, data mining can be applied to identify useful customer behaviour patterns from large amounts of customer and transaction data (Giudici & Passerone, 2002). Consequently, the discovered information can be used to support better decision-making in retail marketing. Data mining techniques have been mostly adopted to make predictions and describe behaviours.

During the past decade, there has been an array of significant developments in data mining techniques. Some of these developments are implemented in customized service (Chen et al, 2005) which is vital in retail markets to develop customer relationship. Therefore, this research focuses to provide customised service to distinct customer segments in retail supermarkets, by implementing data mining techniques with the help of data mining tools.

### Related Work

Researchers proposed various approaches to mine sales transaction data of a retail supermarket to improve customer relationships. Previously, the customer behavioural variables such as (RFM) Recency-Frequency-Monetary variables are associated with demographic variables to predict customer purchase behaviour (Chen et al, 2005). Current research improved significantly, as Business Intelligence tools and advanced data mining algorithms are implemented to analyse the data in a much more reformed way.

Liao et al, (2008), proposed a methodology based on Apriori and K-means algorithms to mine the customer knowledge from household customers for product and brand extension in retailing. Bottcher et al, (2009), presented an approach which aimed to mine the changing customer segments in dynamic market through deriving frequent itemsets as representations of customer segments at different points of time, which are then analysed for changes.

### Problem Definition

Effective management of sales transaction data is as important as any other asset for a retail supermarket store. The sales transaction data usually contains great amount of information distributed through numerous transactions.

This study focuses on applying data mining techniques to analyse the sales transaction data of a retail supermarket store and suggests recommendations to provide customised service to defined customer segments. This research specifically uses two data mining techniques namely clustering and association rule discovery. The research starts with identifying different customer segments based on their purchase frequencies, in order to find out the differences in their purchase behaviour. The definition of behaviour in retail supermarket domain covers different meanings. For example, retailers often distinguish between light, medium and heavy users or weekday or weekend customers etc (Brijs et al, 2001). In this research, the differences will be discovered by identifying frequently purchased items for each customer segment and comparing their combinations. The retailer may use this information to customize his offer towards those segments and also to further examine the underlying relationships between those items for purposes of pricing, product placement or promotions.

### AIM & OBJECTIVES

The aim of this research is to provide customised service to defined customer segments in a retail supermarket, by implementing data mining techniques on sales transaction data with the help of data mining tools.

### OBJECTIVES

- To conduct a critical review of the literature and present the current research within the discipline.
- Obtain the customer sales transaction dataset, in order to apply the data mining algorithms.
- Based on the literature review, select the appropriate data mining approach to pre-process the dataset and to implement the algorithms on the pre-processed data.
- Analyse the results obtained from the data mining algorithms and propose recommendations to provide customised service.
- Draw conclusions, discuss the limitations of this research and suggest the areas of future research.

### Research Approach

This research follows the quantitative methodology by obtaining the dataset and analysing the data with data mining tools. The dataset for analysis was obtained from ABC retail supermarket store, Canada, which was available online (http://www.statsci.org/datasets.html). The data required for this project is selected and loaded onto data mining tools SPSS (Statistical Package for the Social Sciences) and Weka, the tools selected for this research to mine the data. The data mining algorithms that are selected for this study are k-means algorithm for Clustering and Apriori algorithm for association rule mining, the reason behind the choice of these algorithms is justified in the literature review. These algorithms are implemented on the dataset with SPSS and Weka. The results obtained from these algorithms needs to be justified with the help of charts, tables and graphs. Microsoft Excel is used to plot the charts, tables and graphs. Finally, the recommendations are made based on the analysis of results.

### Dissertation Outline

This chapter presents the essence of this dissertation, highlighting the aim and objectives of this research. The rest of this dissertation is structured as follows

Chapter 2 provides a comprehensive literature review of different aspects relating to the research topic under study.

Chapter 3 discuss in detail about the research methods and the data analysis techniques followed, in order to achieve the aim of this research.

Chapter 4 presents the analysis of the results obtained from the application of data mining algorithms on the data and provides recommendations.

Chapter 5 summarises the entire project and gives insights on limitations of this research and points out the areas of future research.

### CHAPTER 2 : LITERATURE REVIEW

### Introduction

This chapter provides a critical review of literature addressing the application of data mining in retail supermarkets. It begins with an introduction to data mining, followed by its evolution and applications in today’s business world. Then explore the role of data mining in retail supermarkets to improve customer relationships, followed by a discussion about the typical data mining approach. It also discusses the techniques and algorithms implied in this project and the reason for their choice.

### Data Mining: An Introduction

The word ‘mining’ means extracting something useful or valuable, such as mining gold from the earth (Lappas, 2007).The importance of mining is growing continuously, especially in the business world. Data mining is a process of finding interesting patterns in databases for decision-making. It is one of the fast growing and most prominent fields, which can provide a significant advantage to an organization by exploiting the vast databases (Rygielski, Wang and Yen, 2002). Finding patterns in business data is not new; traditionally business analysts use statistical approach. The computer revolution and huge databases ranging from few Giga Bytes to Tera Bytes changed this scenario. For e.g. companies like Wal-Mart stores huge amount of sales transaction data, which can be used to analyze the customer buying patterns and make predictions(Bose and Mahapatra, 2001). Data warehousing technology has enabled the companies to store huge amount of data from multiple sources under a unified schema.

Data mining has been considered to be a tool of business intelligence for knowledge discovery (Wang & Wang, 2008). Many people consider data mining as Knowledge Discovery from Data (KDD), but it is actually a part of the larger process called “knowledge discovery” which describes the steps that must be taken to secure the desired results (Han and Jiawei, 2006). Typical data mining process implicates various iterative steps; the first step is the selection of appropriate data from a single database or multiple source systems followed by cleaning and preprocessing for consistency. The data is then analyzed to find patterns and correlations in the data. This approach compliments the other data analysis techniques like statistics, OLAP (On-line analytical processing) etc, (Bose and Mahapatra, 2001). Every organization follows a different data mining and modelling process to achieve their business imperatives.

### The Evolution of data mining

It all started with the need to store the data in computers and improve the access to it for decision-making. Today the technology enables the users to access and navigate the real time data.

At the beginning of 1960s, the data was collected for the purpose of making simple calculations to answer the business questions like the total average revenue for a specific period of time. In 1980s & 1990s the usage of data warehouses to store data in a structured format emerged, policies regarding the format of data to be used in an organization were implemented (Therling.K, 1998). The data warehouses extended to be multi-dimensional that facilitates the stakeholder to drilldown and navigate through the data.

Nowadays, online analytic tools assist to retrieve the data real-time. Now computers can query data from past to until the current. In recent years many technologies like statistics, AI (Artificial Intelligence) and machine learning have been evolving as core sectors in data mining field(Rygielski, Wang and Yen, 2002). So these technologies combined with relational database systems with data integration provide potential knowledge from the data.

### Data mining applications

Data mining can be implied in many fields depending on the aim of the company. Some of the main areas in today’s business world where data mining is applied are as follows (Apte.C. et al, 2002):

- Finance
- Telecom
- Marketing
- Web analysis
- Insurance
- Retail
- Medicine

### Data mining for CRM in retail supermarkets

Swift (2001) defined CRM as an “Enterprise approach to understanding and influencing customer behaviour through meaningful communications in order to improve customer acquisition, customer retention, customer loyalty, and customer profitability”. According to research by the American management association “It costs three to five times as much to acquire a new customer than to retain the existing one” and is especially evident in services sector (Ennew & Binks, 1996). Therefore it is very important to create a good relationship with the existing and new customer rather than expanding the customer base.

A large number of companies are adopting various tools and strategies to enhance a more effective CRM, in order to gain an in-depth understanding about their customers. Data mining is a powerful new technique, which helps the companies to mine the patterns, trends and correlations in their large amounts of customer, product, or data, to drive improved customer relationships. It is one of the well-known tools given to customer relationship management (CRM) (Giudici & Passerone, 2002). In the context of retail supermarket these patterns not only assists the retailers to offer high quality products and service to their customers, but also helps them to understand the changes in customer needs.

### Data mining applications for CRM in retail supermarkets

Data mining improves customer relationship in retail supermarket, which is a wide area of research interest. Depending on the retailers’ objective, there are various application areas in which data mining can be applied to enhance customer relationship management. Some of the major data mining applications in retail supermarket, identified from literature are as follows:

- Cross-selling (Brijs et al 1999, Feng and Tsang, 1999)
- Product recommendation (Shih and Liu 2005, Li et al 2009)
- Customer behaviour modelling (Baydar.C 2003, Cadez, 2001)
- Shelf space allocation (Chen and Lin 2007, Chen et al 2006)
- Catalogue segmentation (Ester et al,2004, Lin and Hong, 2006)
- Direct marketing (Bhattacharyya, 1999, Prinzie and Poel, 2005)
- Prize optimization (Chen et al 2008, Kitts and Hetherington, 2005)

### THE DATA MINING PROCESS

Ivancsy & Vajk, (2006), defined the three main stages involved in the data mining process which are: (i) preprocessing, (ii) pattern discovery, (iii) pattern analysis/interpretation.

### Preprocessing

Famili .A, (1997), defined data preprocessing as “all the actions taken before the actual data analysis process starts. It is essentially a transformation T that transforms the raw real world data vectors Xik, to a set of new data vectors Yij”.

Yij = T (Xik)

Such that:

- Yij preserves the “valuable information” in Xik,
- Yij eliminates at least one of the problems in Xik and
- Yij is more useful than Xik.

In the above relation:

i=1… n where n = number of objects,

j=1… m where m = number of features after preprocessing,

k=1. . . l where l = number of attributes/features before preprocessing, and in general, m ? l.

The most common data used for mining the purchase behaviour in retail supermarket is customer and transaction data (Giudici and Passerone, 2002).

With a huge collection of customers’ sales transaction data available in the databases, it is necessary to pre-process the data and extract the useful information from it. In the context of retail supermarkets Pinto et al, (2006), suggested four key tasks in data preprocessing, they are data selection, data cleaning, data transformation, and data understanding.

The first preprocessing task is data selection. Here the subset of the data is identified on which pattern discovery is to be performed. This task is especially helpful in solving the problem of large amounts of data through precisely evaluating and categorizing the data into much smaller datasets. Computational requirements necessary for data analysis and manipulation are also hugely reduced by preprocessing large datasets through data selection techniques like clustering or vector quantization (Famili .A, 1997).

The second is data cleaning where basic operations include removing noise and handling missing data (Fayyad et al, 1996). Other issues regarding the data quality like errors and insufficient attributes which may complicate data analysis are also addressed in data cleaning. In most cases missing attribute values are replaced by attribute mean but traditionally, if more than 20% of attribute values are missing, the entire record is eliminated (Famili .A, 1997). To handle the outliers and noise data, techniques like binning (partitioning the sorted attribute values into bins), clustering and regression are applied.

The next preprocessing task is data transformation. “The application of each data mining algorithm requires the presence of data in a mathematically feasible format” (Crone et al, 2006). Inaccuracies in the measurements of input or incorrect feeding of data to the data mining algorithm could cause various problems. Since, operations such as normalization, aggregation, generalization and attribute construction are performed. Normalization deals with scaling the attribute value into a specific range, whereas aggregation and generalization refers to the summary of data in terms of numeric and nominal attributes. Attribute construction handles the replacement or addition of new attributes based on the existing attributes (Markov.Z and Larose.T.D, 2007).

Once issues regarding the data are solved and the data are prepared, understanding the nature of data would be useful in many ways. According to Famili .A, (1997), the majority of the data analysis tools have some limitations regarding the data characteristics; therefore, it is important to recognize these characteristics for appropriate setup of data analysis process. He further pointed out that techniques like visualization and principal component analysis are useful for better understanding the data.

### Pattern discovery

Fayyad et al, (1996), defined that core of the process is the application of specific data-mining methods for pattern discovery and extraction. Pattern discovery is the key stage of the process in this research, which is where the data is mined. Once the data is pre-processed, and the irrelevant information is eradicated, it is then used for mining, using data mining techniques to discover patterns. However, it is not the intent of this paper to describe all the available algorithms and techniques derived from these fields.

This research focuses on two main data mining methods that to helps to mine the data and find patterns. They are Clustering and Association. The reason behind choosing these rules is justified below.

### Clustering

Clustering can be defined as a technique to group together a set of items having similar characteristics (Kuo et.al, 2002). In retail domain, cluster analysis is a common tool to segment the customers on the basis of their similarity on a chosen segmentation base or set of bases (Stewart.D.W and Girish.P., 1983). The actual choice for one or a combination of these bases largely depends on the business question under study (Wind, Y., 1978).

Segmentation can be done on the basis of various variables/bases, such as 1) general or product-specific, and 2) observable or non-observable as classified by wedel M and Kamakura (2000).

General bases for segmentation are independent of products, services or circumstances, whereas product-specific bases for segmentation are related to the product, the customer or the circumstances. Observable segmentation bases can be measured directly, whereas non-observable bases must be inferred. The combination of classification of segmentation bases is shown below.

Twedt, D.W., (1967) as cited in Engel.J.F et.al, (1972), stated that the existence of huge amounts of transaction data in retail supermarket domain provides a great impetus for segmentation on the basis of purchase frequencies. Segmentation based on this divides customers into groups on their intensity of buying a product(s), such as light, medium and heavy buyers. According to Brijs.T, (2002), if customers are classified by their purchase frequency, these segments could then be treated differently in terms of marketing communication (pricing, promotion, product recommendation etc.) to achieve greater return of investment (ROI) and customer satisfaction. Therefore, in this research clustering is employed to segment the customers into various clusters on the basis of their similarity in purchase frequency.

Several algorithms have been proposed in the literature for clustering, such as ISODATA, CLARA, CLARANS, ScaleKM, P-CLUSTER, DBSCAN, Ejcluster, BIRCH and GRIDCLUS (Kanungo.T. et al, 2002). It is not the objective of this research to use all these algorithms for clustering. However, as discussed earlier, k-means clustering algorithm would be used to cluster and its justification is given below.

### k-Means Clustering Algorithm

The K-means has been considered as one of the most effective algorithms in producing good clustering results for many practical applications (Alsabti et.al, 1998). The main reason behind this is, when clustering is done for the purpose of data reduction, the goal is not to find the best partitioning, but simply needs a reasonable consolidation of N data points into k clusters, and, if necessary, some efficient way to improve the quality of the initial partitioning (Faber, 1994). Therefore, k-means algorithm proves to be very effective in data reduction and produces a good clustering output.

The k-means algorithm clusters the data that are similar into various clusters namely Cluster 0, Cluster 1 to Cluster n (Kanungo et.al, 2002). Provided a set of n data points in real d dimensional space (Rd) and an integer k, the aim is to determine k points in Rd, called the centers, so as to minimize the mean squared distance from each data point to its nearest center. This measure is often called as squared-error distortion (Jain & Dubes, 1988).

The diagram below illustrates the standard k-means algorithm. It shows the results during two iterations in the partitioning of nine two-dimensional data points into two well separated clusters. Points in cluster 1 are shown in red, points in cluster 2 are shown in black; data points are denoted by open circles and reference points by filled circles. Clusters are indicated by dashed lines. The iteration converges quickly to the correct clustering; even there was a bad initial choice of reference points.

Lloyd’s algorithm is another popular version for K-means clustering which requires about the same amount of computation for a single pass through all the data points, or a single iteration, like the standard K-means algorithm (Faber, 1994). Lloyd’s algorithm is similar to standard k-means algorithm, except when the cluster centroids are chosen as reference points in subsequent partition; the centroids are adjusted both during and after each partition. However, the k-means algorithm constantly updates the clusters and requires comparatively less iterations than Lloyd’s algorithm, thus, k- means algorithm is considerably faster. This is the key reason that leads to the selection of k-means algorithm, since it can group the customers which have similar purchase frequency into different clusters in less iterations. However, Faber, (1994), pointed two major drawbacks to this algorithm. Firstly, it is computationally inefficient for large datasets. Secondly- although the algorithm will always produce the desired number of clusters, the centroids of these clusters may not be particularly representative of the data.

### Association Rules

Association rule discovery was proposed to find all rules in a basket data to analyze how items purchased by customer in a shop are related (Gery & Haddad, 2003). The rule refers to the discovery of attribute value associations that occur frequently together within a given data set (Han & Kamber, 2001). It is typically used for market basket analysis to discover rules of the form x% of customers who buy item A and B, also buy item C (Zaiane, 2001) and is an implication of the form (A, B) èC.

Some of the key definitions drawn from literature that characterize association rule technique are provided below (Agarwal, Imielinski and Swami, 1993).

Itemset (i) – Set of items that contain in a single transaction (e.g. {milk, sugar, curd})

Support (s) – The support expresses the percentage of transactions in the data that contain both the items in the antecedent and the consequent of the rule.

Confidence (c) – Confidence estimates the conditional probability of B given A, i.e. P (B |A) and it can be calculated as Confidence (c) =s (A & B) / s (A).

Association rule discovery typically involves a two phased sequential methodology (Brijs T., 2002).

### Finding frequent itemsets

The first phase involves looking for so-called frequent itemsets, i.e. itemsets for which the support in the database equals or exceeds the minimum support threshold set by the user. This is computationally the most complex phase because of the number of possible combinations of items that need to be tested for their support.

### Generating association rules

Once all frequent itemsets are known, the discovery of association rules is comparatively straightforward. The general scheme is that, if ABCD and AB are frequent itemsets, then it can be calculated whether the rule AB è CD holds with sufficient confidence by computing the ratio confidence = s (ABCD) / s (AB). If the confidence of the rule equals or exceeds the minconf threshold set by the user, then it is a valid rule. For an itemset of size k, there are potentially 2k-2 confident rules.

Association rules can help to discover frequently purchased combinations of products within a customer segment and provide customised service by promoting certain products or product combinations to the defined segments (Brijs T. et al, 2001). Therefore, in this research, frequent itemsets for each customer cluster will be generated and their combinations are compared to identify the differences in purchase behaviour to provide customised service.

Traditionally, support and confidence are used in association rule discovery, but Aggarwal & Yu, (1998), criticized this support-confidence framework for association rule discovery for the following main reasons.

- First of all, setting good values for the support and confidence parameters in association rule mining is critical. For example, setting the support threshold too low will lead to the generation of more frequent itemsets. But even if they would be statistically significant, their support is usually too low to have a significant influence.
- On the other hand, setting the support threshold too high increases the probability of finding insignificant relations and of missing some important associations between items.

Further Agarwal & Yu, (1998); Brin et al., (1998), as cited in Brijs.T,(2003), introduced the lift (also called interest) measure to overcome the disadvantage of confidence in not taking the baseline frequency of the consequent into account.

Lift/Interest (l) – Lift is computed as the confidence of the rule divided by the support of the right-hand-side (RHS). In other words, lift is the ratio of the probability that A and B occur together to the multiple of the two individual probabilities for A and B.

Lift (l) = s (A & B) / s (A).s (B)

In order to perform predictive analysis, it is useful to discover interesting patterns in the given dataset that serve as the base for future trends. The best and most popular algorithm used for this analysis is called the Apriori algorithm (Varde et.al, 2004).

### Apriori Algorithm

The Apriori algorithm was proposed by Agarwal et.al, (1994) (Varde et.al, 2004). The algorithm finds frequent items in a given data set using the anti-monotone constraint (Petrucelli et.al, 1999), as cited in Varde et.al, 2004).

It works under the principle that ‘all subsets of a frequent itemset must also be frequent’. In other words, if at least one subset of an itemset is not frequent, the itemset can never be frequent anymore. This principle simplifies the discovery of frequent itemsets significantly because for some itemsets, it can be determined that they can never be frequent before checking their support against the data anymore. This is the key reason to select this algorithm, since the association rules for the items can be discovered more quickly and efficiently.

- Given a data set, the problem of association rule mining is to generate all rules that have support and confidence greater than a user-specified minimum support and minimum confidence respectively.
- Candidate sets having k items can be generated by joining large sets having k-1 items, and deleting those that contain a subset that is not large (where large refers to support above minimum support).
- Frequent sets of items with minimum support form the basis for deriving association rules with minimum confidence. For A è B to hold with confidence C, C% of the transactions having A must also have B.

Though the algorithm is very efficient in association rule mining, it has certain drawbacks, found by Margahny & Shakour, (2006).

- After discovering the 4-frequent itemsets this algorithm needs extra data structure and methods to process, since the further itemsets can be obtained by different ways.
- This method is fast only while handling small data.

There are several tools available for clustering and association rule mining such as ARMiner, Clementine (SPSS), Enterprise Miner (SAS), Intelligent Miner (IBM), Decision Series (NeoVista). To mine association rules, WEKA is used, which is a collection of machine learning algorithms for data mining tasks and SPSS statistics 17.0 for clustering. WEKA is an open source software available online and very efficient in mining large datasets, where as SPSS statistics 17.0 is a statistical analysis package available at Brunel university computer labs.

### Pattern Analysis

Pattern analysis means understanding the results obtained by the algorithms and drawing conclusions. This is the last phase in data mining process, where the uninteresting rules or patterns from the set found in the pattern discovery phase are filtered out (Cooley et.al, 2000). The uninteresting patterns are filtered out by applying appropriate methodologies on the results and produce some interesting statistical patterns.

### SUMMARY

This chapter discussed the concept of data mining, its evolution and applications in today’s business world. Then, it provided an overview regarding the role of data mining in retail supermarkets to improve customer relationships, followed by a discussion about the typical data mining approach. It also discussed the techniques and algorithms implied in this project and the reason for their choice. The following chapter will explain about the research approach followed in this dissertation.

### CHAPTER 3 : RESEARCH APPROACH

### Introduction

This chapter will discuss about the research approach employed in this project. It starts with a discussion about the research and literature review methods, followed by the data collection and justification of data mining approach on the data.

### Research Methods

The research approach depends upon the objectives and aim of the study, as it assists the researcher to elicit appropriate responses. Boyatzis (1998) defines research methods as taxonomic procedure used for problem solving where, first data is collected based on the research question, hypotheses are stated, data analysis is carried out using appropriate techniques, results are interpreted and conclusions are derived. According to Hussey et al (1997), research methods can be distinguished in two types they are Qualitative and Quantitative approach. Oates (2006) says that, quantitative research method is the data or evidence on numbers whereas qualitative research method includes all non-numeric.

In this research, quantitative research methodology is used. Quantitative study makes use of the numeric data that has been collected from a group of people interested in the subject area which is then analysed and interprete