Chapter 2. Background and Related Work
During the last decade the amount of literature published in the field of eLearning has grown noticeably, as has the diversity in attitudes and viewpoints of people who work on this subject. The general background presented here with regard to eLearning includes the definition, details of different types and the concept of quality. Information quality within information systems (IS), web mining and information extraction techniques are the main areas on which the supporting literature is focused. However, an in-depth explanation of each branch of these research fields is outside the scope of this literature review. The literature presented here concentrates on the subtopics of these large research areas which are directly applicable to this research.
The structure of this chapter is divided into three main parts: a general view of eLearning including definitions of eLearning, an overview of eLearning types and the concept of quality in eLearning; information quality (IQ) within ISs; and information extraction methods. Each section includes a number of subsections which address the factors that are relevant to this research.
In this part of the literature review, we focus on eLearning by providing a discussion about the definitions of eLearning, eLearning types and the concept of quality in eLearning. Moreover, in this section we lay the foundation for the general concept of quality in eLearning upon which the research will be based. This section also presents a discussion about the relationships between technology, users and content in an eLearning context.
The term eLearning is used in the literature and in business to describe many fields, such as online learning, web-based training, distance learning, distributed learning, virtual learning, or technology-based training. During recent decades, eLearning has been defined in many different ways. In any publication in the field of eLearning, it is important to ensure that the author’s understanding matches that of the majority of readers; therefore, the specific definition used should be stated first. Moreover, to reach a clearer understanding of what eLearning is, in this part of the thesis we present several definitions of eLearning found in the literature.
In general, most of the definitions of the term eLearning are used to express the exploitation of technologies which can be used to deliver learning (or learning materials) in an electronic format, most likely via the World Wide Web (WWW). Psaromiligkos and Retalis consider eLearning to be the systems which utilise the WWW as a delivery medium for static learning resources, such as instructional files, or as an interface onto interactive
The previous definitions look at eLearning in general; in more detail, eLearning can be in the form of courses or in the form of modules and smaller learning materials – it also could take various forms. Romiszowski takes these details into account and summarises the definitions encountered in the literature in a way that emphasises that eLearning can be a solitary, individual activity, or a collaborative group activity. It also suggests that both synchronous and asynchronous interactive forms can be engaged. Naidu also takes into consideration the differences in the forms of interaction when trying to formulate a general definition of eLearning:
“… educational processes that utilize information and communications technology to mediate asynchronous as well as synchronous learning and teaching activities.”
The position adopted in this research is that eLearning entails the technology used to distribute the learning materials, the quality of these materials, and the interaction with learners. The definition of eLearning used in this research addresses these dimensions in terms of:
“… the use of new multimedia technologies and the Internet to improve the quality of learning by facilitating access to resources and services as well as remote exchange and collaborations”
As mentioned earlier, eLearning takes many different forms and includes numerous types of systems. In the extant literature, eLearning types are defined along two main axes: the user context (individuals, groups or a community of users) and users’ engagement and interactivity.
Romiszowski takes these details into account and summarises the definitions encountered in the following table, which emphasises that eLearning can be a solitary, individual activity, or a collaborative group activity. It also suggests that both synchronous and asynchronous interactive forms can be engaged.
Looking more deeply at the forms of interactivity used in eLearning systems, there are two main types of eLearning, asynchronous and synchronous, depending on the learning and teaching activities. Synchronous eLearning environments require tutors and learners, or online classmates, to be online at the same time so that live interactions can take place between them. By contrast, Doherty describes an Asynchronous Learning Network (ALN) as a variety of eLearning systems which distribute learning materials and concepts in one direction at a time. Moreover, Spencer and Hiltz describe an ALN as a place where learners can interact with learning materials, tutors and other learners through the WWW at different times and from different places.
The focus of this research will be on a case where students log in to and use the system independently of other students and staff members, and where asynchronous methods are used for learning content, quality management and delivery. This case fits firmly into the general definition of the asynchronous eLearning environment.
Quality Concept in ELearning
The definition of eLearning adopted in this thesis represents three fundamental dimensions: technology, access and quality. The focus in this research will be on quality, which is considered a crucial issue for education in general, and for eLearning in particular. This section of the literature review will discuss concepts of quality in eLearning generally, and highlight the importance of content as the most critical factor for the overall quality.
Currently, there are two recognised challenges in eLearning: the demand for overall interoperability and the request for (high) quality. However, quality cannot be expressed and set by a simple definition, since in itself quality is a very abstract notion. In fact, it is much easier to notice the absence of quality than its presence.
Despite efforts to reach a comprehensive, universal definition of quality in eLearning, there is still a fundamental ambiguity surrounding the issue. One position is to consider quality as an evaluation of excellence, a stance which is primarily adopted by universities and education institutions. For example, in universities quality teaching and learning are promoted as the top priority, giving less attention to criteria or measurements regarding teaching input into courses, the learning outcomes, and the interactivity with the system.
Another trend is to consider the improvement in quality, where quality is improved by moving beyond the set conceptions applied, and generally moving in the direction of a flexible process of negotiation, which needs a very high level of quality capability from those involved.
Furthermore, quality can be viewed and considered from different aspects. Here, the SunTrust Equitable report illustrates what they perceive to be the value chain in eLearning in the form of a pyramid.
Content is the most critical factor of eLearning. Indeed, to be able to use the internet as a tool to improve learning, the content should not distract learners, but increase their interest in learning. Learning tools and enablers are also important in the learning procedure. In reality, providers of learning platforms and knowledge management systems are key to the successful delivery of content. These companies provide the necessary infrastructure to deliver learning content. Moreover, learning service providers (LSPs) are the distribution channels for content providers. One of the challenges facing these knowledge hubs and LSPs is to ensure that the learners are receiving fresh content. Companies focused on educational e-tailing then complete the value pyramid of eLearning.
Looking at the pyramid, it can be clearly observed that content is the most critical component of learning through the internet. In a similar manner, Henry stated that eLearning is composed of three main aspects: content, technology and services; he also emphasised that content is the most significant factor. Although this thesis will focus on the quality of content delivered by eLearning as the most important criterion and the most influential in the overall level of learning quality, the specified context and the perspectives of users also need to be taken into account when defining quality in eLearning. It is also essential to classify suitable criteria to address this quality.
ELearning Technology, Users and Content
Although most explanations of eLearning focus on the technology and not on the learning, it is important to keep in mind the people for whom eLearning is designed. Individual learning styles and required learning materials should be addressed first; then a suitable electronic delivery method can be adopted. On their website (agelesslearner.com), Karl and Marcia Conner commented, in this regard, that “Maybe the ‘e’ should actually follow the word ‘learning’”.
Henry describes the content in a way that includes all delivered materials, including the materials which are usually offered in classroom-based learning and that are tailored for eLearning, in addition to any other knowledge the developer might offer.
In fact, eLearning systems are considered to be user-adaptive systems, where systems are designed to react to user performance and choices. Webber, Pesty and Balacheff describe user modelling as a central issue in the development of user-adaptive systems, whose behaviour is usually based on the users’ preferences, goals, interests and knowledge. Moreover, they state that a system can be considered user-adaptive when changes in its functionality, structure or interface can be monitored, in order to consider the different needs of users and, ultimately, their changing needs. In the area of eLearning, Heift and Nicholson believe that eLearning systems, as adaptive systems, are designed to meet the diverse requirements of students who have different levels of knowledge and backgrounds.
There is a significant base of literature and research in the area of adaptive systems, which usually base their behaviour on user models. In more detail, Kobsa explained that the user model often depends on one user or a group of users sharing the same profile, and that it characterises users’ preferences, goals, interests and knowledge. Webber, Pesty and Balacheff note that there are two main problems relating to user modelling: identifying the relevant information to be modelled, and deciding which method is most suitable for determining the relevant information about the user. In fact, personalisation plays an important role in all areas of the e-era, especially in eLearning, as stated by Esposito, Licchelli and Semeraro, where the main issue is student modelling. This is the analysis of student behaviour and the prediction of future activities and learning performance. Furthermore, Ong and Ramachandran observe that the literature on adaptive systems shows that by modelling the learner, the human tutor and the knowledge domain of instructional content, powerful pedagogical outcomes can be obtained.
Although eLearning systems are considered types of adaptive systems, the difference between the concept of the user and the concept of the student creates a fundamental problem in the eLearning area. In this context, Esposito, Licchelli and Semeraro believe that in a general web system the user is free to surf and the system attempts to predict future user steps using the user model in order to improve the interaction between the user and the system, while in an eLearning system the modelling has to improve the educational route, adapting it to the model of the student. As a result, it is essential to control and to assess student browsing. The systems should not give the students absolute freedom to decide their way through the content and learning materials; rather, the system should provide a specific educational path and offer continuous evaluation of student performance towards a defined pedagogical goal.
Although delivering web-based educational materials can be very useful, as the same content is distributed to a number of students and can be accessed regardless of time and place, this delivery would not be beneficial from a pedagogical point of view if the students, their level of knowledge and their learning style were not known. In fact, Sanatally and Senteni observe that the widely held principle of using the web simply as a distribution medium for learning materials does not add significant value to the learning process. This argument underlines the importance of developing adaptive eLearning systems. Even if adaptive systems are focused on the interaction with users and on changing the course and the content dynamically according to their needs, and not on controlling the set sequence of a course, eLearning can exploit adaptive technologies to build learning environments that form user-specific sequencing. Tang and McCalla cite the Paper Recommender System as a good example of this exploitation: the system was designed to give recommendations to students about which conference or journal papers to read, based on their level of understanding and knowledge.
We can see more clearly, as suggested by Conati and VanLehn, that the aim of adaptive systems is to build precise, interactively changing models of individual student learning, in order to use them as representations of how learners are progressing within the content of the course. Moreover, Papanikolaou et al. describe adaptivity as being system-controlled and, in most cases, as assisting in planning the content, planning the delivery and presentation of the learning materials, and supporting student navigation throughout the field of knowledge and problem solving. From this, it can be deduced that learner models generally characterise learner knowledge levels on the concepts of domain knowledge, pedagogical goals and learning preferences towards diverse styles of learning materials. In this context, they suggest that the domain model should be used in parallel with the learner model to provide a structure for the representation of learner knowledge of the defined domain. Using this procedure, tailored learning materials can be distributed to specific learners in line with their requirements. This corresponds with the vision of Mittal et al., who realised that by creating several broad groups into which target learners can be segmented, it can be ensured that the content of learning materials for an absolute beginner is not the same as that for a student getting ready for an exam.
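As a minimal sketch of how a learner model can be used in parallel with a domain model to select tailored materials, consider the following Python fragment. The class names, the two learner groups ("beginner" and "exam_prep") and the sample materials are all hypothetical illustrations; they are not taken from any of the systems cited above.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass(frozen=True)
class Material:
    """One item in a (hypothetical) domain model."""
    title: str
    concept: str  # domain concept the material covers
    group: str    # learner group the material is written for


@dataclass
class LearnerModel:
    """A deliberately simple learner model (assumed structure)."""
    name: str
    group: str                                   # e.g. "beginner" or "exam_prep"
    known_concepts: set[str] = field(default_factory=set)


# Hypothetical domain model: every concept has one material per learner group.
DOMAIN_MODEL = [
    Material("What is a database?", "databases", "beginner"),
    Material("Database revision questions", "databases", "exam_prep"),
    Material("Introduction to SQL", "sql", "beginner"),
    Material("SQL past paper", "sql", "exam_prep"),
]


def select_materials(learner: LearnerModel,
                     domain: list[Material] = DOMAIN_MODEL) -> list[Material]:
    """Return materials written for the learner's group that cover
    concepts the learner has not yet mastered."""
    return [
        m for m in domain
        if m.group == learner.group and m.concept not in learner.known_concepts
    ]


beginner = LearnerModel("A", "beginner", {"databases"})
print([m.title for m in select_materials(beginner)])
```

In this sketch, a beginner and a student preparing for an exam receive different materials for the same concept, which is the segmentation idea attributed to Mittal et al. above; a real adaptive system would maintain far richer models.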
Nowadays, most student modelling systems follow the same method, in which the system’s starting point is to create a reference template for a student; the expertise or intelligence encoded into the system can then adapt the course organisation and content to the individual student. The use of this method to decide the style and level of content that a student should be offered, according to how students interact with the system, will lead to a more personalised learning experience. In the case of this research, the student and domain model did not entail the complexity of those built in adaptive systems; however, several of the underlying principles of available student and domain modelling techniques proved to be useful. The key issue in most adaptive systems that feature student and domain modelling is a sequence of complex data repositories that give highly precise values about student performance and completion against learning materials.
The focus in this research will be on measuring the quality of the content of learning materials distributed via eLearning systems, and establishing how the student will interact with the materials, how they will be able to extract the relevant information from the content and how the context of the online materials will help students to recognise the underlying structure of the content and easily access the parts in which they are interested. This research will gather empirical evidence using online questionnaires, which can be used to directly ask students about their preferences and perspectives.
This part of the literature review provided a general overview of eLearning, including definitions of eLearning, a note of eLearning types and consideration of the concept of quality in eLearning. It also identified the definition adopted for eLearning in this study and considered the type upon which this research will focus. Moreover, in this section we laid the foundation for the general concept of quality in eLearning upon which the research will be based. Finally, it presented a brief discussion about the relationships between technology, users and content in an eLearning context.
The next part of this chapter will discuss the concept of IQ within ISs; this will be used later on to set standards for IQ in the context of eLearning systems.
Information Quality in Information Systems
In this part of the literature review we will start with a brief discussion of the terms “data quality” and “information quality”, and will shed some light on the concept of IQ within ISs and how it could be defined. We will also provide a comprehensive review of the major historical developments of IQ frameworks.
Data Quality (DQ) vs. Information Quality
During recent years, much work has been done to build quality frameworks for IQ dimensions. In the past, research focused on DQ, but due to the recent development of internet technologies, ISs today are providing users with information, not only data. Therefore, research attention has shifted to focus on IQ frameworks.
While some researchers explicitly distinguish between the terms “data” and “information” and explain information as data which has been processed in some way, it may sometimes be difficult to discriminate between them in practice.
Still, in some studies the term “information” is interchangeable with “data”. Likewise, the term “data quality” is often used synonymously with “information quality”. Consequently, in this study, the concept of information will be used in a broad sense, which covers the concept of data.
Before reviewing the research conducted to formulate (data/information) quality frameworks within ISs, we will first discuss the meaning of IQ and how it could be defined.
How Information Quality Could be Defined
Although it is important to set standards for IQ, it is a difficult and complex issue, particularly in the area of ISs, because there is no formal definition of IQ, as quality is dependent on the criteria applied to it. Furthermore, it is dependent on the targets, the environment and from which viewpoint we look at the IQ, that is, from the provider or the consumer perspective. Moreover, IQ is both a task-dependent and a subjective concept. Juran summarises these aspects of quality in his quality definition as “fitness for use”. Similarly, Wang described DQ (which could apply to IQ) as data that is fit-for-use.
This description has been adopted by researchers because it brings to light the fact that IQ cannot be defined and evaluated without knowing its context. Defining IQ in a contextual approach seems to be logical because quality criteria, which could be used to assess IQ, can differ according to the context. In fact, IQ is expressed in the literature as a multi-dimensional concept with varying attributed characteristics depending on the context of the information. However, taking into account the complexity of the IQ concept and that its measurement is expected to be multi-dimensional in nature, the prime issue in defining the quality of any IS is identifying the criteria by which the quality is determined. The criteria result from the multi-dimensional and interdependent nature of quality in ISs, and are dependent on the objectives and the context of the system. Thus, it is common to define IQ on the internet by identifying the main dimensions of quality. For that purpose, IQ frameworks are widely used to identify the important quality dimensions in a specific context; these dimensions can then be used as benchmarks to improve the effectiveness of information systems, as described by Porter.
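To illustrate why a multi-dimensional, context-dependent notion of IQ yields different assessments of the same information, the following Python sketch computes a weighted mean of per-dimension ratings. The weighted mean, the dimension names, the ratings and the two sets of context weights are assumptions chosen for illustration only; none of the frameworks discussed in this chapter is claimed to prescribe this calculation.

```python
def iq_score(ratings: dict[str, float], weights: dict[str, float]) -> float:
    """Weighted mean of per-dimension ratings.

    `weights` encodes the context: the more a dimension matters for the
    task at hand, the larger its weight (hypothetical scheme).
    """
    missing = set(weights) - set(ratings)
    if missing:
        raise KeyError(f"no rating for dimensions: {sorted(missing)}")
    total_weight = sum(weights.values())
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive value")
    return sum(ratings[d] * w for d, w in weights.items()) / total_weight


# The same information item, rated once on a 1-5 scale (made-up values).
ratings = {"accuracy": 4.0, "timeliness": 2.0, "completeness": 3.0}

# Two hypothetical contexts that value the dimensions differently.
news_context = {"accuracy": 1, "timeliness": 3, "completeness": 1}
archive_context = {"accuracy": 3, "timeliness": 1, "completeness": 1}

print(iq_score(ratings, news_context))     # lower: timeliness dominates
print(iq_score(ratings, archive_context))  # higher: accuracy dominates
```

The same ratings produce a lower score where timeliness dominates and a higher one where accuracy dominates, which is the sense in which quality is "fit for use" rather than absolute.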
Information Quality Frameworks
Today, for any IS to be judged successful it first has to satisfy additional predefined quality criteria. An eLearning system is a special type of IS, so it is important to examine the literature relating to the traditional IS success models and the proposed quality frameworks, in order to test the possibility of extending these success models to identify content quality criteria in an eLearning context.
Much of the work done in IS success has its origins in the well-known DeLone and McLean (D&M) IS Success Model. This model provided a comprehensive taxonomy of IS success based on the analysis of more than 180 studies on IS success, and it identified over 100 IS success measures during the analysis. It established that system quality, IQ, use, user satisfaction, individual and organisational impact were the most distinct elements of the IS success equation. In a later work, the authors confirmed the original taxonomy and their conclusion, namely that IS success was “a multidimensional and interdependent construct”. Their model makes two important contributions to the understanding of IS success. First, it provides a scheme for categorising the multitude of IS success measures that have been used in the literature. Second, it suggests a model of temporal and causal interdependencies between the categories. The updated model, which was proposed in 2003, consists of six dimensions:
- Information quality, which concerns the system content issue. Web content should be personalised, complete, relevant, easy to understand and secure.
- System quality, which measures the desired characteristics of a web-based system such as usability, availability, reliability and adaptability.
- Service quality
- Usage, which measures visits to a website, navigation within the site and information retrieval.
- User satisfaction, which measures users’ opinions of the system and should cover the entire user experience cycle.
- Net benefits, which capture the balance of positive and negative impacts of the system on the users. Although this success measure is very important, it cannot be analysed and understood without system quality and IQ measurements.
In their model, DeLone and McLean defined three main quality dimensions: IQ, system quality and service quality. Each one has to be measured separately because, singly or jointly, they will affect subsequent system usage and user satisfaction.
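The point that the three quality dimensions are measured separately, rather than merged into one figure, can be sketched as a small Python record. The field names mirror the three dimensions named above; the 1-5 scale, the sample scores and the `weakest` helper are hypothetical and are not part of the D&M model itself.

```python
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class QualityAssessment:
    """Separate scores for the three quality dimensions (assumed 1-5 scale)."""
    information_quality: float
    system_quality: float
    service_quality: float

    def weakest(self) -> str:
        """Name of the lowest-scoring dimension (illustrative helper)."""
        scores = asdict(self)
        return min(scores, key=scores.get)


# Made-up scores for a single system.
assessment = QualityAssessment(
    information_quality=4.2,
    system_quality=3.1,
    service_quality=3.8,
)
print(assessment.weakest())
```

Keeping the scores apart makes it possible to see which dimension is holding a system back, information that a single combined score would hide.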
In 1996, Wang and Strong proposed their DQ framework, which will be discussed in more detail in the following section. In their framework they categorised characteristics/attributes into four main types/factors: intrinsic, accessibility, contextual and representational. This method of categorising IQ factors and attributes proved to be a valuable methodology for defining IQ. Lately, several quality management projects in business and government have successfully used this framework.
After Wang & Strong’s DQ framework, diverse research efforts were made to identify IQ dimensions in different contexts. Although these frameworks varied in their approach and application, they shared some of the same characteristics concerning their classifications of the dimensions of quality.
In 1996, Gertz focused on finding possible solutions to the problems of modelling and managing the data quality and integrity of integrated data. He proposed a taxonomy of data quality characteristics that includes important attributes such as the timeliness and completeness of local information sources. Redman’s work, meanwhile, aimed to set up practical guidelines to analyse and improve information quality within business processes; he proposed a number of quality attributes grouped into six categories: Privacy, Content, Quality of Values, Presentation, Improvement and Commitment. In the same year, Zeist & Hendricks identified 32 IQ sub-characteristics grouped into 6 main IQ characteristics, which covered functionality, reliability, efficiency, usability, maintainability and portability.
Unlike the general-purpose IQ frameworks, in 1997 Jarke proposed a special-purpose framework in which he used the same hierarchical design established by Wang & Strong. He defined IQ criteria depending on the context and requirements of a specific application: Data Warehouse Quality (DWQ). In his framework, Jarke linked each operational quality goal for data warehouses to the criteria which describe this goal. The main defined criteria are accessibility, interpretability, usefulness, believability, and validation.
In 1998, Chen gave a list of IQ criteria with no special taxonomy. He did, however, propose a goal-oriented framework focusing mainly on time-oriented criteria such as response time and network delay. One year later, Alexander & Tate proposed their framework for IQ in the Web environment. This framework consisted of 6 main criteria: authority, accuracy, objectivity, currency, orientation and navigation. In the same year, Katerattanakul & Siau adapted Wang & Strong’s DQ framework to propose their four-category IQ framework for individual websites. Furthermore, Shanks & Corbitt recommended a semiotic-based quality framework for information on the Web. This framework includes four semiotic levels. The syntactic level ensures that information is consistent, while the semantic level focuses on the completeness and accuracy of information. The pragmatic level is the third level, which covers the usability and usefulness of the information. The fourth level is the social level, which ensures that information is understandable. Within their framework there are 11 quality dimensions distributed across the identified levels.
In 2000, Dedeke developed a conceptual IS quality framework that includes 5 categories: ergonomic, accessible, transactional, contextual and representational quality. Each category consists of a number of quality dimensions, such as availability, relevancy and conciseness. In the same year, Zhu & Gauch described 6 quality metrics for information retrieval on the web: availability, authority, currency, information-to-noise ratio, popularity and cohesiveness.
Leung adapted Zeist & Hendricks’s quality framework in 2001 and applied it to Intranet applications. He defined 6 main IQ characteristics: functionality, reliability, usability, efficiency, maintainability and portability. Each quality characteristic in the proposed framework includes a number of sub-characteristics.
Several studies of IS quality were undertaken in 2002. Eppler & Muenzenmayer suggested two main manifestations for their proposed framework: content quality and media quality. Content quality is focused on the quality of the presented information and consists of two categories: relevant information and sound information. Media quality is focused on the quality of the medium used to deliver the information and includes an optimized process category and a reliable infrastructure category. Each category in the framework contains a number of quality dimensions. Khan categorised IQ depending on the context of the system. The framework divided IQ into two main quality types: product and service quality. Moreover, it divided these two types into 4 quality classifications, and each classification into a number of quality dimensions. The quality classifications are sound information, useful information, dependable information and usable information.
In addition, Klein conducted research in the same year to identify five IQ dimensions, chosen from Wang & Strong’s DQ framework, to measure IQ in the Web context: accuracy, completeness, relevance, timeliness and amount of data. Mecella also proposed an initial framework for quality management in Cooperative Information Systems (CIS). This framework includes a model for quality data exported by cooperating organizations and the design of an infrastructure service for improving quality.
More recently, in 2005, Liu & Huang mentioned 6 key dimensions for IQ: source (focused on information availability), content (focused on information completeness), format and presentation (focused on information consistency), currency (focused on information currency and timeliness), accuracy (focused on information accuracy and reliability) and speed (focused on how easily information is downloadable).
In 2007, Besiki et al. introduced a general framework for IQ assessment. This framework consists of a comprehensive taxonomy of IQ dimensions, and provides a straightforward and powerful predictive method to study IQ problems and reason through them in a systematic and meaningful way.
Lately, in 2009, Kimberly et al. presented a model for how to think about IQ depending on the application context; they identified a number of common IQ metrics. Kargar & Azimzadeh also presented an original experimental framework for ranking IQ in Web logs. The results of their research revealed 7 IQ dimensions for Web logs. For each quality dimension, the coefficients associated with its quality variables were calculated and used so that the proposed framework can automatically assess the IQ of Web logs. In the same year, Thi & Helfert conducted research that aimed to propose a quality framework based on IS architecture. In their research they identified quality factors for different construct levels of IS architecture. Moreover, they presented the impacts among different quality factors, which help to analyze the causes of IS defects.
In this part we gave a brief review of the research conducted to formulate (data/information) quality frameworks within ISs. In the next section we will focus on Wang and Strong’s DQ framework, as we will use it as a base for this research to measure IQ in eLearning systems along the dimensions of the framework.
Wang and Strong’s Data Quality Framework
Wang & Strong’s DQ framework, one of the most comprehensive, popular and widely cited DQ frameworks, was established by Richard Wang and Diane Strong in 1996. Their framework was designed empirically by asking users for their views on the relevance of IQ dimensions, in order to capture the aspects of DQ that are most important to the data consumer.
In their framework, Wang and Strong classified quality dimensions into four groups:
- Intrinsic DQ: refers to the quality dimensions originating from the data on its own. This aspect of quality is independent of the user’s perspective and context.
- Contextual DQ: focuses on the aspect of IQ within the context of the task at hand. In this group, the quality dimensions are subjective preferences of the user. Contrary to the first group, DQ dimensions cannot be assessed without considering the user’s viewpoint about their use of provided information.
- Representational DQ: is related to the representation of information within the systems.