
Theories of Student Engagement in Education

Chapter 2

Origin of Engagement Research

The underlying foundation of engagement is the relationship between what students learn and the time, effort, and resources they devote to their education. Such a relationship seems self-evident, and awareness of it is not new: John Dewey, the renowned philosopher and educator, described similar ideas in “My Pedagogic Creed” (1897). Though the idea is hardly novel, the systematic study of these relationships as educational engagement became commonplace only within the past 40 years (Ewell, 2008; Finn & Zimmer, 2012; King-Alexander, 2000; Kuh, 2009).

The genesis of modern engagement research is unclear and a point of disagreement among researchers (Merwin, 1969; Kuh, 2009; Pike, Kuh, & McCormick, 2010; Trowler, 2010). Three researchers are commonly credited. The earliest of these is Ralph Tyler (1930), who studied the relationship between learning and time spent on a task (Merwin, 1969; Kuh, 2009).

More commonly, modern engagement research is thought to have originated with Alexander Astin’s “Student Involvement Theory” (Astin, 1985; Ewell, 2008; Junco, Heiberger, & Loken, 2011; Pike, Kuh, & McCormick, 2010; Trowler, 2010). Indeed, Astin’s theory seems to capture the foundational logic of engagement. Astin stated, “Quite simply, student involvement refers to the amount of physical and psychological energy that the student devotes to the academic experience” (Astin, 1984, p. 518). To clarify what he meant by involvement, Astin (1984) listed similar verbs, including “to partake,” “join in,” and “engage in.”

In his theory, Astin (1984) proposed a positive relationship between college student involvement and both personal and academic growth. He described student development as a function of the “…quantity and quality of the physical and psychological energy that students invest in the college experience… such as absorption in academic work, participation in extra-curricular activities, and interaction with faculty or other institutional personnel” (Astin, 1984, p. 518). Astin’s theory held that “student developmental outcomes” depend not only on content or teaching method but also on individual student behavior (Astin, 1984). The engagement construct rests on this principle. For this reason, engagement experts have claimed that current conceptualizations of engagement closely resemble Astin’s theory (Astin, 1984; Pascarella & Seifert, 2010; Trowler, 2010). Undoubtedly, Astin’s research was instrumental in drawing greater attention to the relationship between student involvement and learning (Kuh, 2009).

Robert Pace’s research on quality of effort also played a critical role in developing the engagement construct and drawing attention to it. In 1978, Pace received a grant to study how quality of effort may help predict and explain student learning and development (Pace, 1984). Pace described educational experiences in two parts: products and processes. Products, the things gained from the educational experience, include knowledge, new perspectives, and greater skills (Pace, 1984). Processes are the means by which students attain products. In reference to processes, Pace stated,

“In thinking about how we evaluate educational programs it seemed to me that the quality of the educational experience or process should somehow be taken into account. We need ways to measure the quality of the process as well as the quality of the product.” (Pace, 1984, p. 5)

This statement highlights concepts that closely resemble those currently discussed in engagement research. Educational quality is often judged by the skills, abilities, or mindsets students glean from their experiences. Additionally, and consistent with Pace’s thinking, the outcomes of these experiences depend on the degree to which students engage in them.

Conceptualizing Engagement

Conceptualizations of engagement are plentiful, and settling on a primary definition has been a challenge for researchers (Appleton et al., 2008; Trowler, 2010). Perhaps the greatest difficulty has been the prevalence of the “jingle-jangle” fallacy in engagement research (Appleton et al., 2008; Reschly & Christenson, 2012). The jingle fallacy is the use of one term to describe two or more distinct constructs; the jangle fallacy is the use of two or more terms to describe the same construct. For example, terms such as student engagement, academic engagement, school engagement, and engagement with school all describe the same construct (Finn & Zimmer, 2012), an instance of the jangle fallacy. Such inconsistency can make the engagement literature difficult for researchers to navigate and understand. Despite this confusion, most researchers rely on definitions provided by influential experts in engagement, such as George Kuh, Ernest Pascarella, and Peter Ewell (Axelson & Flick, 2011; Junco, Heiberger, & Loken, 2011).

In contrast to the inconsistent jargon of engagement research, researchers have embraced a set of core characteristics common to the general conceptualization of engagement (Appleton, Christenson, & Furlong, 2008). First, engagement exists on a continuum ranging from full engagement to disengagement (Appleton et al., 2008; Reschly & Christenson, 2012; Trowler, 2010). Second, engagement is malleable, capable of changing through intervention or over time (Fredricks, Blumenfeld, & Paris, 2004). Finally, engagement is best represented as a multidimensional construct or meta-construct (Appleton et al., 2008; Fredricks et al., 2004; Trowler, 2010). With engagement accepted as multidimensional, many models of its components have emerged.

In a review of the engagement literature, Fredricks et al. (2004) identified three components of engagement common in the research: behavioral, emotional (affective), and cognitive. Of these, behavioral engagement is the most commonly studied (Fredricks et al., 2004). Fredricks et al. (2004) described behavioral engagement as educationally meaningful actions of students (e.g., studying, attending class, studying abroad). Emotional engagement refers to students’ feelings about their educational experience, including class content, teachers, and the institution. Of the three components, cognitive engagement is less common in the research. Cognitive engagement refers to students’ degree of investment in their educational experiences, including the effort they are willing to put into comprehending and mastering the material (Fredricks et al., 2004). Fredricks et al.’s three-component model is one of the most commonly cited structures for engagement as a meta-construct.

Other models of engagement often resemble that of Fredricks et al. (2004). For example, Appleton et al. (2006) contended that engagement is better described using four components: academic, behavioral, cognitive, and psychological. Many such models exist, each representing a different way to categorize the same behaviors. That is, most models differ not in what content they accept as engagement but in how they organize it. One exception is the argument for including institutional contributions to engagement (Harper & Quaye, 2009). Proponents argued that student engagement is a function both of factors that depend on students (i.e., their willingness to engage) and of the resources and opportunities the institution provides for engagement (Axelson & Flick, 2011; Harper & Quaye, 2009; Kuh, 2009). In other words, engagement depends on both students’ intrinsic will to engage and the opportunities for engagement that the institution provides (Pace, 1984; Trowler, 2010). Thus, critics faulted the three-component model for neglecting institutional influences and overemphasizing student responsibility (Harper & Quaye, 2009; Kuh, 2009). Indeed, it seems reasonable that both student and institutional data would be needed to explain student engagement.

Both components appear in the National Survey of Student Engagement (NSSE) definition of engagement. As will be discussed later, the NSSE was designed using decades of research and is the leading measure of engagement in the United States. As such, the NSSE offers one of the most widely accepted definitions of engagement. The NSSE describes engagement as a multidimensional construct and as an interplay between institutional and individual characteristics (Kuh, 2001, 2003):

“Student engagement represents two critical features of collegiate quality. The first is the amount of time and effort students put into their studies and other educationally purposeful activities. The second is how the institution deploys its resources and organizes the curriculum and other learning opportunities to get students to participate in activities that decades of research studies show are linked to student learning.” (“About NSSE,” 2017)

Because of this definition’s apparent alignment with the research and my use of the NSSE in this study, I have chosen to adopt it. Although other definitions are common in engagement research, I will not describe them in depth. Examples appear in Table X, and a more in-depth review is available in Trowler (2010) and Axelson and Flick (2011).

Table X. Alternative Definitions of Engagement

Author | Year & Page | Definition
Kuh | 2010, p. 3 | “Student engagement is concerned with the interaction between the time, effort and other relevant resources invested by both students and their institutions intended to optimize the student experience and enhance the learning outcomes and development of students and the performance, and reputation of the institution.”
Kuh et al. | 2009, p. 683 | “Student engagement represents the time and effort students devote to activities that are empirically linked to desired outcomes of college and what institutions do to induce students to participate in these activities (Kuh, 2001, 2003, 2009).”
Krause & Coates | 2008, p. 493 | “the extent to which students are engaging in activities that higher education research has shown to be linked with high-quality learning outcomes”

Measuring Engagement

There have been many measures of engagement. Of these, the NSSE is by far the most prominent in higher education. Because it is also the main measure of engagement in this study, I devote the most attention to it. However, I will first briefly describe the College Student Experiences Questionnaire, a predecessor of the NSSE.

College Student Experiences Questionnaire (CSEQ). The CSEQ was developed by the late C. Robert Pace, a researcher well known for his work on quality of effort (Pace, 1984). According to Pace, students’ quality of effort was a key determinant of educational quality (Pace, 1980). Pace distinguished two aspects of education: the product, or outcome, and the process, or the way in which the product is attained (Pace, 1984). He argued that just as the value of outcomes differs, so does the value of processes (Pace, 1984). In his view, the time put into a task is not sufficient for understanding educational products (Pace, 1980, 1984); the quality of the practicing, studying, or other activity would provide richer information about the learning process. Based on these ideas, Pace developed and released the CSEQ in 1979 (Pace, 1980). The most recent version of the CSEQ, released in 2007, focused on measuring three aspects of student experience: college activities, the college environment, and student gains toward outcomes (CSEQ, 2007).

Each of these components provided key information for estimating the quality of students’ educational experiences. The student gains scales measured key outcomes such as science, intellectual skills, and personal development. The college environment scales measured characteristics of the educational environment as well as the relationships between students and educators. The college activities scales covered a wide range of activities common to most institutions, including library experiences; clubs and organizations; art, music, and theater; and experiences with faculty.

The CSEQ was widely used, with over 140 institutions participating when it was discontinued in 2014. It was a pioneer in the measurement of engagement, and two-thirds of the CSEQ was incorporated into the NSSE when the NSSE succeeded it (Kuh, 2001; McCormick & McClenney, 2012).

National Survey of Student Engagement. A team of renowned researchers in educational quality, including Alexander Astin, C. Robert Pace, George Kuh, and Peter Ewell, developed the NSSE (“A Brief History,” 2010; Gose, 1999). The Indiana University Center for Postsecondary Research (IUCPR) was selected to host the project and has become the home of the NSSE (Ewell, 2008; Kuh, 2009). Of the many measures of engagement, the NSSE has become the most widely used and most influential. Since its first report in fall 2000, the number of participating institutions has grown, with a total of 1,642 institutions having taken part since its creation (Kuh, 2009; /participants.cfm, 2017).

While many NSSE items are derived from the CSEQ, the two measures serve different purposes (Kuh, 2009). Use of the CSEQ was predominantly limited to educational researchers, with few institutions using it for improvement purposes (Kuh, 2001, 2009). The NSSE, by contrast, was primarily intended as a tool for institutional improvement and accountability (Kuh, 2001). According to Kuh (2001, 2009), the purpose of the NSSE is threefold: to serve as an accessible tool for measuring collegiate quality and identifying areas for improvement, to determine effective educational practices, and to encourage the use of empirical measures of educational quality.

The NSSE’s ability to achieve these goals lies in its ability to predict desirable outcomes of higher education (Kuh, 2001, 2009; Pascarella & Seifert, 2010). According to Kuh (2001), many desirable learning outcomes have been linked to specific student practices. These practices, such as collaboration, discussions with diverse others, and formative feedback, often require substantial time and effort. By measuring these practices, the NSSE purports to measure student engagement and thus, indirectly, student outcomes.

What does the NSSE measure? As might be expected, the centerpiece of the NSSE is student engagement, and to educators this information is invaluable. Kuh, Cruce, Shoup, Kinzie, and Gonyea (2007) state, “What students do during college counts more in terms of what they learn and whether they persist in college than who they are or even where they go to college” (p. 7). Accordingly, the NSSE asks students to estimate how frequently they engage in specific, educationally beneficial activities. In addition to student engagement, the NSSE asks students about institutional traits and about their learning outcomes (McCormick & McClenney, 2012). Data from these two areas complement and enhance the engagement data. Measures of institutional traits assess students’ perceptions of the availability of opportunities to be engaged, identifying areas to improve. Students’ self-reported learning gains roughly describe their gains on several desirable outcomes (e.g., quantitative reasoning, writing ability, personal and cognitive growth).

To describe engagement at an institution, the NSSE divides student responses to the engagement and institutional-trait questions into subscales (Kuh, 2001, 2009; Pike, 2013; McCormick & McClenney, 2012). Originally, NSSE items were sorted into five “benchmarks of effective educational practice”: “level of academic challenge,” “active and collaborative learning,” “student-faculty interaction,” “enriching educational experiences,” and “supportive campus environment” (Kuh, 2001, 2003, 2009; Pike, 2013; McCormick & McClenney, 2012). Through the NSSE benchmarks, educators could determine how their institution might improve and compare engagement at their institution with that at others (Pike, 2013; McCormick & McClenney, 2012). Although useful, the information from the benchmarks was broad. Pike (2006b) restructured the items into smaller subsets of questions, giving institutions more detailed and actionable engagement information.

In 2014, the NSSE 2.0 was released (Pike, 2013). Structural changes to the NSSE followed Pike’s (2006a, 2006b) example, restructuring the benchmarks into four general themes and ten specific engagement indicators (Table X; Pike, 2013). Through this new structure, NSSE results present a more detailed overview of how institutions may be improved. In addition to an updated structure, the NSSE 2.0 features clarified wording and other item updates from the NSSE 1.0.

Since the update, the number of participating institutions has not decreased; the NSSE continues to be the prevalent measure of engagement. In contrast, there seems to be very little independent research using the new NSSE. Extensive use of an instrument is not evidence for the validity of the inferences drawn from it, yet the dearth of research suggests little concern about how the NSSE modifications affect the legitimacy of inferences drawn from its data. The following section addresses this issue, exploring what we do know.

Table X

Themes and Engagement Indicators of the NSSE 2.0

Theme | Engagement Indicators
Academic Challenge | Higher-Order Learning; Reflective & Integrative Learning; Learning Strategies; Quantitative Reasoning
Learning with Peers | Collaborative Learning; Discussions with Diverse Others
Experiences with Faculty | Student-Faculty Interaction; Effective Teaching Practices
Campus Environment | Quality of Interactions; Supportive Environment

(NSSE, 2014)

NSSE Validity

According to Messick (1990),

“Validity is an integrated evaluative judgment of the degree to which empirical evidence and theoretical rationales support the adequacy and appropriateness of interpretations and actions based on test scores or other modes of assessment.” (p. 5)

In other words, validity evidence provides theoretical and data-based support for using an instrument in a certain way. To evaluate validity, therefore, one must consider how the instrument’s data are used.

NSSE data serve two purposes. The first is the use the developers intended: an institution-level indicator of engagement and quality, and a basis for comparison (Pike, 2013). The second use, common in research, is to predict learning outcomes at the instructional or student level (Pike, 2013). To validate each use, the assumptions underlying its interpretations must be identified and rationally assessed (Kane, 2001; Messick, 1995). While valuable, a comprehensive evaluation of the assumptions underlying uses of NSSE data is beyond the scope of this study. Instead, I will describe a few key assumptions and the relevant evidence.

The relationship between engagement and outcomes is a foundational assumption underlying the use of the NSSE. As described earlier, for many educators the value of engagement lies in its relationship to desirable outcomes (Kuh, 2001). Whether or not it is explicitly stated, engagement is seen as an indicator, if not a proxy, for outcomes when other measures are unavailable: when engagement increases, so should learning outcomes. By extension, if the NSSE truly measures engagement, NSSE scores should mirror engagement-outcome relationships (Kuh, 2001, 2009; Pascarella & Seifert, 2010).

Stated briefly, these assumptions are that (1) engagement positively correlates with outcomes, (2) the NSSE measures engagement, (3) NSSE scores mirror the relationship between engagement and outcomes, and (4) increased engagement suggests improved outcomes.

The following sections explore these assumptions through Benson’s three-stage program of construct validation: substantive, structural, and external evidence (Benson, 1998). Unfortunately, the limited research on the NSSE 2.0 prevents its extensive use in the following sections. While the two versions are not identical, test content was not drastically altered in the update. Thus, where appropriate, evidence from the NSSE 1.0 supplements that from the NSSE 2.0.

Substantive Validity. Substantive validity refers to how well the instrument relates theoretically to the construct it is intended to measure (Benson, 1998). Evidence of this sort is well documented for the NSSE, manifest in its development by content experts, its reliance on decades of engagement research, its connection to conceptually similar instruments, and its consideration of measurement concerns.

As previously described, the NSSE development team comprised well-known educational researchers. Their intent was to develop a measure that would inform institutions about students’ participation in practices tied to desirable outcomes (Kuh, 2001; “Our Origin and Potential,” 2001). As such, the development of the NSSE was anchored in both research and professional experience.

Because engagement practices are closely linked to outcomes, the validity of the NSSE depends on its ability to predict learning outcomes (Astin, 1991; Pascarella & Terenzini, 1991; Krause & Coates, 2008, p. 493). To ensure a relationship between engagement as measured by the NSSE and outcomes, the research team drew on research demonstrating ties between specific practices and outcomes (Kuh et al., 2001; Kuh, 2009; “Our Origin and Potential,” 2001). Chickering and Gamson’s (1987) “Seven principles for good practice in undergraduate education” was one such source (Kuh et al., 2001b; Kuh, 2009). The seven principles are as follows:

1. Encouraging contact between student and faculty

2. Encouraging reciprocity and cooperation among students

3. Encouraging active learning techniques

4. Giving students prompt feedback

5. Emphasizing time on task

6. Communicating high expectations

7. Respecting diverse talents and ways of learning

(Chickering & Gamson, 1987, p.1)

As Chickering and Gamson’s (1987) principles make evident, the behaviors associated with engagement and predictive of outcomes span a broad range of practices. Likewise, the NSSE was designed to measure engagement in a wide range of practices, grouping similar practices into benchmarks (NSSE, 2001; Kuh, 2001, 2009). Many of these practices, known to be tied to outcomes, were also measured by the CSEQ and other instruments predating the NSSE (NSSE, 2001; Pace, 1984, 1995; Kuh, 2009; Kuh, Pace, & Vesper, 1990). The NSSE developers recognized the strength of these instruments: after careful review, they isolated the items relevant to engagement and incorporated them into the NSSE, many of them taken from the CSEQ (NSSE, 2001; Kuh, 2009).

Thus, at its creation the NSSE was a composite of items known, or expected, to be predictive of outcomes. Many of these items remain in the NSSE 2.0, though some were reworded for clarity (NSSE, 2017). It seems reasonable that such clarifying changes would only improve the validity of the NSSE. Another change that may have increased the validity of the NSSE is the reorganization of its contents to better represent specific types of engagement present in the research; the impact of this revision is discussed in the following section (Structural Validity). Such changes would likely improve NSSE data quality without altering the NSSE’s theoretical foundations.

In addition to constructing a solid theoretical foundation, the developers of the NSSE were attentive to measurement issues, particularly the use of self-reported measures (Kuh, 2001, 2002, 2004). As discussed in earlier sections, skepticism about the validity of self-reported behavioral measures is common. With this in mind, the NSSE items were developed and selected according to five guidelines for valid self-reports established by prior research (Kuh, 2001, 2002, 2004). These guidelines hold that self-reports are valid if the information requested (1) is known by the student, (2) is not embarrassing or threatening, (3) is worded clearly, (4) deserves serious thought, and (5) refers to recent activity (Kuh, 2002; Pace, 1984).

These guidelines make it reasonable to expect that responses to engagement questions accurately reflect students’ perceptions of their own actions. However, even those championing a self-report approach concede that responses may deviate somewhat from the true values (Kuh, 2002; Pace, 1984; Pike, 1999). Thus, while engagement as measured by the NSSE may not exactly represent students’ actual engagement, evidence suggests that the two may be relatively similar (Pike, 1999). Overall, these data suggest that the NSSE has a strong theoretical foundation representative of the engagement construct.

Structural Validity. Structural validity refers to the psychometric properties of an instrument, such as reliability, factor structure, and inter-item relationships (Benson, 1998). Evidence of this sort on the NSSE is plentiful but mixed in its findings. As established previously, the NSSE 1.0 benchmarks are groups of items formed on the basis of statistical and theoretical evidence (Kuh, 2009; McCormick & McClenney, 2012; Pike, Kuh, McCormick, Ethington, & Smart, 2011). Using principal components analysis (PCA), the NSSE creators clustered the questions according to their commonality (Kuh, 2009). Most NSSE validity research has been conducted on this original structure. However, the NSSE 2.0 introduced a new way to structure the items and clarified item wording (Pike, 2013; NSSE, 2014). Such changes can alter the structural validity of an instrument, so I will briefly review the structural validity of both the NSSE 1.0 and the NSSE 2.0.

Structure and reliability. Psychometric analyses of the NSSE benchmarks have produced inconsistent evaluations of their structure and reliability. While the NSSE has presented evidence for the structure and reliability of the benchmarks, others have been unable to replicate the NSSE’s results (Pike, 2012; Porter, 2011). This has concerned some researchers, especially given the widespread use of the NSSE (Campbell & Cabrera, 2011).

One of the greatest concerns about the validity of the NSSE originated with the first version of the survey. Beyond the NSSE’s original use of PCA and a report by Kuh et al. (2007), little validity evidence has been presented for the structure of the NSSE in the way of psychometric structural analyses. Instead, NSSE researchers have relied on measures of question clarity, response consistency, the “dependability of institutional benchmark scores,” and the benchmarks’ relations to external variables. A series of studies and reports has provided convincing evidence of the NSSE’s reasonable reliability, demonstrating consistency over time, across institutions, and across student characteristics (Pascarella et al., 2009; Kuh et al., 2007; Pike, 2006; NSSE, 2002, 2005, 2010, 2011, 2013). These reports repeatedly describe reasonable internal consistency of the benchmarks (α > .70) and strong correlations between NSSE benchmark scores over time (NSSE, 2002, 2005, 2010, 2011).
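For readers less familiar with the internal-consistency criterion cited above, the following is a minimal Python sketch of Cronbach's alpha, α = k/(k − 1) × (1 − Σσ²ᵢ / σ²ₓ), where k is the number of items, σ²ᵢ is the variance of item i, and σ²ₓ is the variance of respondents' total scores. It is an illustration only, not the procedure NSSE used; the function name, data layout, and responses are hypothetical.

```python
from statistics import variance


def cronbach_alpha(items: list[list[float]]) -> float:
    """Return Cronbach's alpha for a scale.

    items[i][j] is respondent j's answer to item i (hypothetical layout).
    Sample variances are used throughout; the ratio is unchanged with
    population variances as long as one convention is used consistently.
    """
    k = len(items)
    if k < 2:
        raise ValueError("alpha requires at least two items")
    # Sum of the individual item variances.
    item_var_sum = sum(variance(item) for item in items)
    # Each respondent's total score across all items.
    totals = [sum(responses) for responses in zip(*items)]
    return k / (k - 1) * (1 - item_var_sum / variance(totals))


# Hypothetical answers from four students to a three-item scale.
scale = [
    [2, 3, 3, 4],
    [2, 3, 4, 4],
    [1, 3, 3, 4],
]
print(round(cronbach_alpha(scale), 3))  # 0.952
```

Against the conventional threshold, a value like this would count as good internal consistency, although alpha also rises with the number of items, so it is usually read alongside inter-item correlations.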

Although such evidence vouches for the stability of the measure, it does not validate the NSSE 1.0 structure. It may seem strange that, aside from its initial creation, the NSSE has provided no validation of its structure through factor analysis. This has been justified by reference to the original intent of the benchmarks: according to Kuh (2001) and others, the NSSE benchmarks were never intended to represent latent constructs, but rather broad categories of educational practice (Kuh et al., 2001; Pike, 2012; McCormick & McClenney, 2012).

Despite this, many studies have examined the NSSE’s structure (Porter, 2009, 2011; Swerdzewski et al., 2007; Campbell & Cabrera, 2011). On the whole, these studies have failed to reproduce the structure described by the five benchmarks (Porter, 2009, 2011; Campbell & Cabrera, 2011; LaNasa et al., 2009; Lutz & Culver, 2010; Gordon, Ludlum, & Hoey, 2008; Webber, Krylow, & Zhang, 2013). Some also found less than desirable reliability (Gordon et al., 2008; Porter, 2011).

While there is little evidence on the structure and reliability of the NSSE 2.0, what does exist seems promising (Zilvinskis, Masseria, & Pike, 2015). Many of the studies described above indicated that models with more factors fit better (Porter, 2011; Tendhar, Culver, & Burge, 2013). Similarly, Pike (2006a, 2006b) developed a set of 12 subscales as an alternative to the NSSE benchmarks. These “scalelets” provided information about specific behaviors, and by using them Pike was able to predict learning outcomes better than with the benchmarks (Pike, 2006a, 2006b). Given this evidence, it seems reasonable to expect that adding the engagement indicators to the NSSE would boost its validity. Indeed, the evidence suggests that this may hold true.

A study by Miller, Sarraf, Dumford, and Rocconi (2016) investigated the factor structure of the ten engagement indicators (EIs). Using exploratory factor analysis (EFA), they found ten components that matched the ten EIs and together explained 60% of the variance. They then used confirmatory factor analysis (CFA) to examine the structure of the indicator themes. Overall, their results suggested “adequate” to “very good” model fit for the categorization of the indicators (Miller et al., 2016).
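The 60% figure reported for Miller et al. (2016) is a cumulative proportion of variance. In a principal-components style extraction from a correlation matrix, the eigenvalues sum to the number of items, so the share explained by the retained components is the sum of their eigenvalues divided by that total. The Python sketch below shows only this arithmetic; the function name and eigenvalues are hypothetical and are not the study's results.

```python
def cumulative_variance_explained(eigenvalues: list[float], k: int) -> float:
    """Share of total variance captured by the k largest eigenvalues.

    For a correlation matrix the eigenvalues sum to the number of items,
    so this equals (sum of the top-k eigenvalues) / (number of items).
    """
    if not 1 <= k <= len(eigenvalues):
        raise ValueError("k must be between 1 and the number of eigenvalues")
    ordered = sorted(eigenvalues, reverse=True)
    return sum(ordered[:k]) / sum(ordered)


# Hypothetical eigenvalues from a five-item correlation matrix (they sum to 5).
eigs = [2.0, 1.0, 1.0, 0.5, 0.5]
print(cumulative_variance_explained(eigs, 2))  # 0.6
```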

Reliability reports provided by the NSSE for the past three years also suggest acceptable reliability of the engagement indicators (“Reliability,” 2017). These results show good internal consistency (α > .70) and moderate inter-item correlations (.3 < |r| < .7).
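The inter-item criterion can be illustrated the same way. The Python sketch below computes Pearson's r for every pair of items in a scale and flags pairs whose absolute correlation falls outside the .3 to .7 band mentioned above; very low values suggest an item may not belong with the others, and very high values suggest redundancy. The function names, the default thresholds, and the data are assumptions for illustration, not NSSE's reporting code.

```python
from itertools import combinations
from math import sqrt


def pearson(x: list[float], y: list[float]) -> float:
    """Pearson product-moment correlation of two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def inter_item_flags(items, low=0.3, high=0.7):
    """List item pairs (i, j, r) whose |r| falls outside the open band (low, high).

    items[i] holds every respondent's answer to item i (hypothetical layout).
    """
    flagged = []
    for (i, a), (j, b) in combinations(enumerate(items), 2):
        r = pearson(a, b)
        if not (low < abs(r) < high):
            flagged.append((i, j, round(r, 3)))
    return flagged


# Hypothetical answers from four students to a three-item scale.
items = [
    [1, 2, 3, 4],
    [2, 1, 4, 3],
    [1, 2, 3, 4],  # duplicates item 0, so r = 1.0 (redundant)
]
print(inter_item_flags(items))  # [(0, 2, 1.0)]
```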

In the past, research has provided mixed findings about the structural validity of the NSSE. Because the NSSE 2.0 themes were adapted from the benchmarks, some of the structural concerns accompanying them may have carried over as well. However, the addition of specific engagement indicators and the clarification of item wording seem likely to improve the NSSE’s structural validity. Indeed, the limited research on the structure and reliability of the NSSE 2.0 supports its validity.

External Validity

The third stage of Benson’s (1998) strong program of construct validity is the “external stage.” Although sound theory and structure are necessary for validity, an instrument’s relationships to external constructs provide partial evidence that it measures the intended construct (Bandalos, Validity, p. 45; Cronbach & Meehl, 1955; Campbell & Fiske, 1959; Messick, 1995).

External validity is often tested through convergent and discriminant evidence (Bandalos, Validity, p. 45; Campbell & Fiske, 1959; Messick, 1995). Bandalos (Validity, p. 45) clearly described both:

“…validity arguments involving convergent and discriminant evidence state that test scores should be related to scores from other tests of the same or similar constructs (convergent evidence) and should be less strongly related to scores from tests measuring dissimilar constructs (discriminant evidence).” (Bandalos, Validity, p. 45)

If a measure is related to the variables it should be related to if it measures the intended construct, and unrelated to the variables it should not be related to, then there is evidence that the instrument measures its intended construct (Cronbach & Meehl, 1955; Campbell & Fiske, 1959; Messick, 1995).
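The convergent and discriminant logic reduces to a comparison of correlations. In the hypothetical Python sketch below, a scale's scores are correlated with a measure expected to relate to it and with a measure not expected to relate to it; the expected pattern holds when the first correlation is stronger than the second. This is a simplified illustration of the reasoning, not a full multitrait-multimethod analysis (Campbell & Fiske, 1959), and the function names and data are invented.

```python
from math import sqrt


def pearson(x: list[float], y: list[float]) -> float:
    """Pearson product-moment correlation of two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def convergent_discriminant(scale, similar, dissimilar) -> dict:
    """Compare a scale's correlation with a similar vs. a dissimilar measure."""
    r_conv = pearson(scale, similar)
    r_disc = pearson(scale, dissimilar)
    return {
        "convergent_r": r_conv,
        "discriminant_r": r_disc,
        # Expected pattern: a stronger relation with the similar construct.
        "pattern_holds": abs(r_conv) > abs(r_disc),
    }


# Hypothetical scores for four students.
engagement = [1, 2, 3, 4]
learning_gains = [2, 1, 4, 3]  # construct expected to relate to engagement
unrelated = [3, 1, 4, 2]       # construct not expected to relate
print(convergent_discriminant(engagement, learning_gains, unrelated))
```

In practice the comparison would use many students and would also consider whether the difference between the two correlations is larger than sampling error.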

This type of evidence is critical in establishing the validity of the NSSE. The NSSE claims to measure student engagement. Because engagement is related to learning outcomes, if the NSSE truly measures engagement, its scores should be similarly related to outcomes.

Most studies of the NSSE’s convergent validity have been conducted using the NSSE 1.0. However, depending on the extent of the changes made in the NSSE 2.0, evidence for the NSSE 1.0 may still be applicable. Zilvinskis, Masseria, and Pike (2017) conducted a study comparing the convergent and discriminant validity of the old five-benchmark model of the NSSE and the new version. Their findings suggested that both NSSE versions were strongly related to self-reported learning gains. They also found that the NSSE 2.0 scales predicted learning outcomes more precisely than the NSSE 1.0 benchmarks. They concluded, “From a practical standpoint, the engagement indicators provided in the new NSSE survey appear to be more useful than previous engagement measures in identifying institutional actions that can enhance certain types of learning outcomes.” I have presented here a brief introduction to the engagement-outcomes relationship and its importance in establishing the validity of the NSSE. Later, I will discuss outcomes and the engagement-outcomes relationship in greater depth.

Measurement Issues. Before proceeding to a discussion of learning outcomes, a few measurement concerns outside Benson’s (1998) framework deserve attention. These concerns center on the NSSE’s voluntary sampling and its use of self-reported measures.

Because participation in the NSSE is voluntary, some have voiced concern that NSSE samples may not be representative of the student population. Specifically, the concern is that students at one level of engagement might be more likely to respond than students at another. While some differences between respondents and non-respondents have been found, these differences have been small and inconsistent (Kuh, 2003; NSSE, 2005, 2009). Generally, the research suggests that students who respond to the NSSE do not meaningfully differ from those who do not (NSSE, 2009, 2010; Sarraf, 2005).
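One common way to quantify a respondent/non-respondent difference is a standardized mean difference (Cohen's d). It is computed on a variable that is available for both groups, such as an institutional record or a follow-up survey item. The sketch below assumes that approach. It is not necessarily the method used in the NSSE nonresponse studies cited above, and the variable and values are hypothetical. By Cohen's conventions, |d| around 0.2 is read as a small difference.

```python
import numpy as np


def cohens_d(group_a, group_b):
    """Standardized mean difference (mean_a - mean_b) / pooled SD."""
    a = np.asarray(group_a, dtype=float)
    b = np.asarray(group_b, dtype=float)
    n_a, n_b = len(a), len(b)
    # Pooled variance weights each group's sample variance by its degrees of freedom.
    pooled_var = ((n_a - 1) * a.var(ddof=1) + (n_b - 1) * b.var(ddof=1)) / (n_a + n_b - 2)
    return float((a.mean() - b.mean()) / np.sqrt(pooled_var))


# Hypothetical values on a variable recorded for everyone (e.g., GPA from
# institutional records), split by whether the student answered the survey.
respondents = [3.1, 3.4, 2.9, 3.6, 3.2, 3.0]
nonrespondents = [3.0, 3.3, 2.8, 3.5, 3.1, 2.9]
print(f"d = {cohens_d(respondents, nonrespondents):.2f}")
```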

The validity of the NSSE also hinges upon the accuracy of student self-report data. As mentioned earlier, those constructing the NSSE followed guidelines intended to improve the accuracy of self-reported data (Kuh, 2002, 2009). However, many have suggested that responses to these questions may still be inaccurate (Porter, 2001; Porter et al., 2011; Pascarella, 2010; Gonyea, 2005). In a review of the literature, Porter (2011) found that students may have difficulty accurately describing their behavior over time and may describe themselves more positively than is accurate. Pike (1999) found that students were prone to overestimating their participation in activities or their gains. However, research on self-reports suggests that, despite this inaccuracy, self-reported data can correlate with actual data (Gawronski, LeBel, & Peters, 2007; Gonyea, 2005). Although self-reported engagement data lack precision, they provide an estimate that is easier to collect than direct measures of student engagement (Fredricks & McColskey, 2012; McCormick & McClenney, 2012; Pike, 1995, 1996). The NSSE also collects self-reported data about learning gains. This use of self-report is more controversial than self-reported engagement and will be discussed later (Porter, 2011, 2012; Pike, 2012; Fredricks & McColskey, 2012).


As mentioned in Chapter 1, learning outcomes are a foundational part of assessment efforts for accountability and improvement (Ewell, 2005; Gonyea, 2005). Depending on their purpose in the program, student learning outcomes (SLOs) may refer to one of two things. First, they may refer to a statement of the intended consequences of an educational experience (Hussey & Smith, 2003; Proitz, 2010). Alternatively, SLOs may describe the actual products of educational experiences; that is, what students know, think, or do because of their educational experience (Dugan & Hernon, 2002; Harden, 2002; Hussey & Smith, 2003; Melton, 1996; Spady, 1988). For the purposes of this study, learning outcomes will refer to the latter definition.

It is easy to see how evidence of student learning would be valuable in assessment. On one hand, gains in outcomes demonstrate the value of educational experiences (Ewell, 2005). On the other hand, knowing what students gain from an experience allows educators to make changes that increase student learning (Ewell, 2005). Such outcomes are measured at many levels of programming, most commonly the institutional, program, or course level (Ewell, 2005). Educators are interested in student gains in a variety of outcomes, from basic content knowledge to critical thinking, skills, or feelings (Nusche, 2008). According to Nusche (2008), these outcomes may be organized into cognitive and non-cognitive types. Measures of cognitive outcomes are the most prevalent and typically address content knowledge and the development of physical or intellectual skills (Nusche, 2008; Gonyea, 2005). Non-cognitive outcomes are less commonly measured and typically describe students’ values, beliefs, or attitudes (Ewell, 2005; Nusche, 2008).

Measuring outcomes. Because learning outcomes data have high educational value, many instruments contain measures of outcomes. ETS’s HEIghten is an instrument devoted solely to measuring three learning outcomes: critical thinking, quantitative literacy, and written communication (ETS, 2017). While the NSSE is designed to measure student engagement, it also uses a short set of questions to measure ten learning outcomes (NSSE, 2017). The reader may find it strange that HEIghten, a tool devoted to measuring outcomes, measures fewer outcomes than the NSSE. The difference between these instruments is one of methodology. The methods used to measure outcomes vary widely based on the type of outcome and the purpose of the data. One key way in which outcome measures vary is whether they use self-report or direct measurement.

Direct/Indirect measures. Whether direct or indirect measures are used often depends on what the outcome measures and on convenience (Nusche, 2008; Pike, 1996). For example, non-cognitive outcomes often concern content that does not lend itself to direct measurement (e.g., attitudes; Nusche, 2008). Because of this, most non-cognitive measures use self-report methods (Gonyea, 2005). Because cognitive outcomes tend to be observable, they lend themselves more easily to direct measurement (Nusche, 2008). Generally, direct measures seem to be the gold standard of measurement (Gonyea, 2005). However, self-reported measures are generally easier to collect than direct measures (Pike, 1996). Because of this, self-reported learning gains (SRLG) are often used instead of direct measures and are treated as proxies for them (Pike, 1994, 1995; Price & Randall, 2008; Sitzmann, Ely, Brown, & Bauer, 2010).

The validity of SRLGs as a proxy for direct measures of learning gains (DMLG) is frequently questioned (Bowman, 2010; Gonyea, 2005; Pike, 1995, 1996; Pace, 1984; Kuh, 2001). Estimating learning gains requires at least two time points (Fulcher, Good, Coleman, & Smith, 2014). Researchers have questioned whether students can accurately recall their past knowledge and compare it to their current knowledge (Bowman, 2010; Carrell & Willmington, 1996; Gonyea, 2005). Others have suggested that SRLG and DMLG measure two separate constructs: self-reports represent perceptions of learning, while direct measures better represent actual gains (Carrell & Willmington, 1996; Pike, 1996).
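The two-time-point requirement and the proxy question can be made concrete. The hypothetical sketch below makes two assumptions. First, a direct gain is a simple posttest-minus-pretest difference score; this is one of several possible gain metrics, chosen here for simplicity. Second, the self-reported gain is a single retrospective rating given at the end. The sketch correlates the two, which is the kind of comparison used to test whether SRLGs can stand in for DMLGs. Function names and data are invented.

```python
import numpy as np


def direct_gain(pretest, posttest):
    """Directly measured learning gain: posttest minus pretest, per student."""
    return np.asarray(posttest, dtype=float) - np.asarray(pretest, dtype=float)


def proxy_correlation(self_reported_gain, pretest, posttest):
    """Pearson r between self-reported gains (SRLG) and direct gains (DMLG)."""
    dmlg = direct_gain(pretest, posttest)
    srlg = np.asarray(self_reported_gain, dtype=float)
    return float(np.corrcoef(srlg, dmlg)[0, 1])


# Hypothetical data for five students: a test given at two time points and a
# single end-of-year rating of "how much did you gain?" on a 1-4 scale.
pre = [55, 62, 70, 48, 66]
post = [68, 65, 81, 60, 70]
srlg = [3, 4, 3, 2, 4]
print("direct gains:", direct_gain(pre, post))
print("r(SRLG, DMLG) =", round(proxy_correlation(srlg, pre, post), 2))
```

The self-rating asks the student to perform the pretest/posttest comparison mentally from memory, which is the step whose accuracy researchers have questioned.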

Pike (1995, 1996) noted that if direct and self-reported questions measure the same construct, they should be highly correlated and thus be valid proxies for each other. To test the validity of SRLGs as a proxy for DMLGs, researchers have conducted many studies comparing students’ self-reported outcomes to a direct measure of the same construct (Bowman, 2010; Carrell & Willmington, 1996; Gonyea, 2005). With few exceptions, most studies report weak to moderate relationships between self-reported and direct measures of an outcome (Bowman, 2010; Carrell & Willmington, 1996; Gonyea, 2003, 2005; Murphey, 2008; Pike, 1995, 1996; Pohlmann & Beggs, 1974; Price & Randall, 2008; Sitzmann et al., 2010). Most interpret these findings as evidence against the equivalence of SRLG and DMLG (Gonyea, 2005; Murphey, 2008; Price & Randall, 2008; Sitzmann et al., 2010). As with engagement behaviors, research has found that students typically overestimate their knowledge or skills (Luce & Kirnan, 2016; Pike, 1999; Porter, 2011; Volckman, 2011). Moreover, low-ability students overestimate their ability more than high-ability students do, a phenomenon known as the Dunning-Kruger effect (Kruger & Dunning, 1999; Cole & Gonyea, 2010; Hacker, Bol, Hogan, & Rakow, 2000; Kuncel, Crede, & Thomas, 2005; Luce & Kirnan, 2016). Generally, these findings suggest that self-reported measures do not accurately represent students’ actual ability.
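A Dunning-Kruger pattern is usually shown by grouping students by directly measured performance and comparing self-assessments within each group. The sketch below assumes that actual scores and self-estimates are on the same scale (for example, percentiles) and splits students into quartiles of actual score. The function name and data are hypothetical. The sketch describes the pattern only and does not by itself establish its cause.

```python
import numpy as np


def overestimation_by_quartile(actual, self_estimate):
    """Mean overestimation (self-estimate minus actual) within each quartile
    of actual performance, ordered from lowest (Q1) to highest (Q4).

    Assumes both inputs are on the same scale, e.g. percentiles.
    """
    actual = np.asarray(actual, dtype=float)
    self_estimate = np.asarray(self_estimate, dtype=float)
    cut_points = np.quantile(actual, [0.25, 0.50, 0.75])
    # 0 = bottom quartile ... 3 = top quartile; a score equal to a cut point goes up.
    quartile = np.searchsorted(cut_points, actual, side="right")
    bias = self_estimate - actual
    means = []
    for q in range(4):
        in_q = quartile == q
        # A quartile can be empty when many scores are tied; report NaN then.
        means.append(float(bias[in_q].mean()) if in_q.any() else float("nan"))
    return means


# Hypothetical percentile scores on a direct test and self-rated percentiles.
actual_pct = [10, 20, 30, 40, 50, 60, 70, 80]
self_pct = [50, 55, 55, 60, 60, 65, 75, 80]
print(overestimation_by_quartile(actual_pct, self_pct))  # → [37.5, 22.5, 7.5, 2.5]
```

In this invented example, every group overestimates, and the overestimation shrinks as actual performance rises.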

One alternative explanation for the weak relationship between self-reported and directly measured outcomes concerns differences in specificity between the two methods. Pike (1994, 1995, 1996) suggested that weak relationships might be due to misalignment of the content of measures; when self-report and direct measures ask about similar content, they are more strongly correlated. Additionally, Pace (2005) noted that relationships might be attenuated by the difference in scope of measurement between methods (Pike, 1995; Astin, 1993). Whereas self-reports are generally broad, direct measures tend to focus on specific knowledge or abilities (Pike, 1995). Support for this explanation has been found in several studies (Dumont & Troelstrup, 1980; Pike, Year).

In summary, while self-reported measures may be appropriate for some types of content, their use as a proxy for direct measures of learning gains is unsupported (Gonyea, 2005). Researchers should therefore avoid using SRLGs as proxies for DMLGs and exercise caution when interpreting SRLGs (Gonyea, 2005; Bowman, 2010; Sitzmann et al., 2010).


Expected Relationships

Expected correlations between engagement measures (rows) and outcome measures (columns)

Engagement Measures          NSSE-Self Report-General   Self Report-Specific/Aligned   Direct Measures
NSSE General                 Moderate (.40)             Weak (.15)                     Moderate/Strong (.75)
Aligned/Specific Questions   Weak (.20)                 Moderate (.60)                 Strong (.80+)

Engagement Questions: Overall (expected intercorrelations)

            Specific         General
Specific    1
General     Moderate (.70)   1

Outcome Types (expected intercorrelations)

                                NSSE-Self Report-General   Self Report-Specific/Aligned   Direct Measures
NSSE-Self Report-General        1
Self Report-Specific/Aligned    Moderate (.50)             1
Direct Measures                 Weak (<.35)                Moderate (.50)                 1

Outcome Types: NSSE General (expected intercorrelations)

                                NSSE-Self Report-General   Self Report-Specific/Aligned   Direct Measures
NSSE-Self Report-General        1
Self Report-Specific/Aligned    Moderate (.50)             1
Direct Measures                 Weak (<.35)                Moderate (.50)                 1

Outcome Types: Specific Engagement (expected intercorrelations)

                                NSSE-Self Report-General   Self Report-Specific/Aligned   Direct Measures
NSSE-Self Report-General        1
Self Report-Specific/Aligned    Moderate (.50)             1
Direct Measures                 Weak (<.35)                Moderate (.50)                 1

Engagement Questions: Self Report-Outcomes (expected intercorrelations)

            Specific         General
Specific    1
General     Moderate (.70)   1

Engagement Questions: Self Report-Aligned Outcomes (expected intercorrelations)

            Specific         General
Specific    1
General     Moderate (.70)   1

Engagement Questions: Direct Measured Outcomes (expected intercorrelations)

            Specific         General
Specific    1
