There is something paradoxical about grades, or course marks. On the one hand, postsecondary institutions and policy-makers lack confidence that grades are valid predictors of college performance or of academic readiness for college or career.
On the other hand, perhaps more than any other form of summative academic assessment of individual students, grades are used in ways that matter. Course grades have immediate and long-term academic consequences for students: they influence students’ placement into classes (e.g., honors, advanced placement, and “repeater” classes) and grade promotion. Grades also affect selection for awards and honors that in turn affect postsecondary opportunities for students (see, for instance, Tyson & Roksa, 2017). According to the most recent survey on college admissions, grades are the most important factor that colleges and universities use in selecting students for admission (National Association for College Admission Counseling, 2016). Grades are also significant at the administrative and school leadership level. Because they influence school and course promotion, grades affect graduation rates and school performance indices. For instance, in New York State, metrics of school accountability include the 4-year graduation rate, which is determined by the rate at which students pass courses to achieve the required credits (New York State Education Department, 2017). After seeing poor grades, parents may chide their children to perform better, hire a tutor to assist in making improvements, or contest a grade they believe is inaccurate or unfair. Grades also carry an emotional valence that can affect students’ self-image (Thomas & Oldfather, 1997).
What differentiates grades from test-based summative reports of achievement? The teacher. The teacher has the unique role of seeing and interacting with the student in the academic environment daily and, based on the sheer quantity of information available on which to base an assessment of learning, is the best-informed expert for summative assessment of individual students. In the US, it is hard to imagine a school in which the teacher does not have a key role in end-of-course summative assessment.
As a form of measurement, what are grades? The letter grades or marks students are awarded in school courses are symbolic representations of a quality related to academic performance. Grades clearly meet at least the ordinal level of measurement. They can be used to rank students, similarly to scores based on dissimilar methods such as the SAT, and can be used to predict future performance in contexts similar to those in which they were obtained, such as college GPA. Further, grades are based on multiple measures of student performance, and thus meet the technical-quality criterion of multiple measures and, often, multiple methods. Given this evidence of convergent validity, predictive validity, and the use of multiple measures to improve reliability, it is perhaps curious that in the United States the use of grades for decision-making continues to be a source of some controversy.
Further, there is considerable evidence that grades are not “pure” measures of the construct of academic achievement. Although teachers repeatedly report that they base most of a student’s grade on academic achievement, and achievement has been identified as the dominant type of performance that correlates with teachers’ grades (Brookhart, 2015), quantitative analyses also demonstrate that assigned grades relate significantly to factors other than achievement (Cizek et al., 1995; Cross & Frary, 1999; McMillan, 2001; McMillan et al., 2002). For simplicity, we will call these “achievement factors” and “non-achievement factors.” Because the grades teachers report are influenced, consciously or unconsciously, by both types of factors, the construct represented by grades is sometimes considered multidimensional (Brookhart et al., 2016); it has also been referred to as a “hodgepodge” (Brookhart, 1991).
Grades are one of the most important and most frequently communicated measures of school performance, yet significant questions surround their validity. These questions relate to how teachers construct grades: the processes they use and the factors they consider. Because these factors and processes may relate to teachers’ belief systems about what grades measure and how grades relate to instruction, we have sought over the last decade to understand teachers’ and teacher candidates’ beliefs about unorthodox grading practices. We have focused our research on measuring and understanding these beliefs because there is widespread consensus that achievement is the dominant factor in a school grade; what is less easily explained is the source of the remaining variance. We seek to understand whether other, systematic factors related to instruction and assessment may influence grading decisions. Ultimately, knowledge about how teachers understand grades may inform efforts to improve grading practice and to make grades more reliable and valid.
This paper reports on the development and validation of a research instrument for the study of teacher and teacher candidate beliefs about grading.
Background: Grading Practices and Beliefs
Academic achievement has repeatedly been identified as the dominant type of performance that correlates with teachers’ grades (Brookhart, 2015). Thus, grades may be said to measure primarily student achievement. However, many studies, both quantitative and qualitative, have shown that teachers consider factors other than achievement in composing grades, either explicitly through the components of course performance they plan to include in calculated grades, or implicitly when they report the factors they are likely to consider when awarding a grade. Research demonstrates that teachers sometimes assign grades with consideration to work habits (Farkas, Grobe, Sheehan, & Shuan, 1990); self-control in school work or “getting it done” (Duckworth, Quinn, & Tsukayama, 2012; McMillan, Myran, & Workman, 2002); attitude, engagement, and interest (Russell & Austin, 2010; Willingham, Pollack, & Lewis, 2002); and effort and classroom conduct (Cizek et al., 1995; Cross & Frary, 1999; McMillan et al., 2002; Randall & Engelhard, 2010). Grading practices have been found to vary between teachers and even within a single teacher’s practice (Cizek et al., 1995). For at least the last two decades, research has overwhelmingly concluded that teachers assign grades in an unpredictable manner, based on both achievement and non-achievement factors (Cizek et al., 1995; Cross & Frary, 1999; McMillan, 2001; McMillan et al., 2002). Because teachers consciously or unconsciously report grades that are influenced by both achievement and non-achievement factors, the construct represented by grades is sometimes considered multidimensional (Brookhart et al., 2016). It has also been referred to as a “hodgepodge” (Brookhart, 1991).
To ameliorate internal and between-teacher conflicts, various methods have been implemented to encourage teachers to adhere to achievement or standards in grading. Grades can be referenced to standards, or separate space on grade reports can be allotted to achievement and to work habits, effort, or growth. There is thus far little evidence that reference to standards actually affects teachers’ perceptions about the basis for awarding grades. Swan, Guskey, and Jung (2014) reported that teachers who did and did not use standards-based report cards alike perceived standards-based grade reports to provide more and higher-quality information than traditional grade reports; however, the teachers’ perceptions about their own grading decisions were not measured. In Canada, despite the implementation of a standardized system for reporting grades, teachers were still found to espouse grading based on improvement (Tierney et al., 2011). In Australia, physical education teachers continued to base their grades on internal expectations and intuitions about students, despite reference to standards in their discipline during the grading process (Hay & Macdonald, 2008). Further, no studies were found to indicate that separate reporting of achievement and effort is linked to differences in teacher perceptions about how to award grades. Indeed, in the U.S., where most primary school grade reports include separate marks for effort, primary school teachers still endorsed academic-enabling approaches to grading (McMillan, Myran, & Workman, 2002).
Relatively little research has shed light on why teachers include factors other than achievement in grading. McMillan (2003) discussed educational philosophy as one of five emergent themes that appeared to influence grading. Implicit educational philosophies may be formed out of teachers’ “foundational beliefs and values about education in general” (McMillan, 2003, p. 37). Such philosophies may include beliefs about how students learn as well as value systems.
Systematic investigations of teachers’ grading practices and perceptions about grading began to be published in the 1980s and were summarized in Brookhart’s (1994) review of 19 empirical studies of teachers’ grading practices, opinions, and beliefs, which supported five themes. Among them, teachers believe it is important to grade fairly. Views of fairness included using multiple sources of information, incorporating effort, and making it clear to students what is assessed and how they will be graded. This finding suggests teachers consider school achievement to include the work students do in school, not just the final outcome. Compared to the number of studies about teachers’ grading practices, relatively few studies focus directly on perceptual constructs such as importance, meaning, value, attitudes, and beliefs. Several studies followed Brookhart’s (1994) suggestion that Messick’s (1989) construct validity framework is a reasonable approach for investigating perceptions. This framework focuses on both the interpretation of the construct (what grading means) and the implications and consequences of grading (the effect it has on students). Sun and Cheng (2013) used this framework to analyze teachers’ comments about their grading and the extent to which values and consequences were considered. The results showed that teachers interpreted good grades as a reward for accomplished work, based on effort and quality, on student attitude toward achievement as reflected by homework completion, and on progress in learning. Teachers indicated the need for fairness and accuracy, not just accomplishment, saying that grades are fairer if they are lowered for lack of effort or participation and that grading needs to be strict for high achievers. Teachers also considered the consequences of grading decisions for students’ future success and feelings of competence.
Fairness in an individual sense is a theme in several studies of teacher perceptions of grades (Bonner & Chen, 2009; Grimes, 2010; Hay & Macdonald, 2008; Kunnath, 2016; Sun & Cheng, 2013; Svennberg et al., 2014; Tierney, Simon, & Charland, 2011). Teachers perceive grades to have value according to what they can do for individual students. Many teachers use their understanding of individual student circumstances, their instructional experience, and their perceptions of equity, consistency, accuracy, and fairness to make professional judgments, instead of relying solely on a grading algorithm. These claims suggest that grading practices may vary within a single classroom, just as they do among teachers, and that this variation is viewed, at least by some teachers, as a needed element of accurate, fair grading rather than as a problem. In a case study of one high school mathematics teacher in Canada, M. Simon et al. (2010) reported that standardized grading policy often conflicted with professional judgment and had a significant impact on determining students’ final grades.
Qualitative research has sought to explain in more depth why teachers endorse such diverse grading policies. McMillan (2003) proposed that teacher beliefs about the purposes of education in general (i.e., their educational philosophy) and their beliefs about the purposes of assessment (e.g., to motivate, to promote understanding) relate to the assessment practices they are likely to endorse. McMillan identified a desire among teachers to “pull for” their students, resulting in a bias towards grading practices that are generally lenient or are heavily adapted to individual student needs (see also Bonner & Chen, 2009; Cizek et al., 1995).
Studies that have looked particularly at the perceptions of teachers as they generate marks or scores on summative assessments provide insight into how teachers manage the complexities at play in their perceptions about grading. Davison (2004) asked senior secondary English language teachers in Australia and Hong Kong to verbalize their thinking as they graded responses to writing assessments. Davison classified teachers in groups along a continuum of assessment orientations, ranging from the “assessor-technician,” who was highly rule-bound, through the “principled yet pragmatic professional,” to the teacher at the extreme other end of the continuum who took the stance Davison termed “assessor as God,” expressing an intuitive and nearly omniscient understanding of students as individuals and little need for recourse to rules, standards, or objective assessment information during the grading process. In Australia, teachers were reported to arrive at grading decisions through a complex negotiation of cognitive and social factors that included, but was far from determined by, individual beliefs (Wyatt-Smith, Klenowski, & Gunn, 2010). In reviewing the literature on classroom summative assessment, Moss (2013) commented that even when teachers are aware of recommendations for grading practice, “they often see the realities of their classroom environments and other external factors imposed on them as prohibitive” (p. 252).
Grading and Other Teacher Beliefs
Evidence clearly supports that grades are often assigned in ways that reflect an orientation towards constructivism. In assigning grades, many teachers consider student behaviors like effort and engagement that enable academic achievement, in addition to academic achievement per se. The idea that student effort and engagement should be part of the assessment of student performance is consistent with the constructivist view that learning is a process, not only an outcome. In our own research, we have found a direct relationship between constructivist beliefs and beliefs about academically enabling behaviors. Evidence that teachers in non-core content areas may include more academic enabling behaviors is consistent with the constructivist view of the developmental nature of the learning process.
Little evidence has accumulated thus far that teachers’ beliefs about awarding grades stem systematically from other individually held beliefs outside the grading or scoring domain. Cicmanec, Johanson, and Howley (2001) found that teachers’ orientation to pupil control was not a strong predictor of grading practice compared to contextual factors in the classroom such as class size and the proportion of students at risk for failure. Bonner and Chen (2009) found only a small, though statistically significant, relationship between academic enabling beliefs and a constructivist orientation. However, research continues to suggest that knowledge and training relate to perceptions about grading and summative assessment. For instance, Tierney and colleagues (2011) reported that among teachers who experienced conflicts about grading, a plurality reported only “some degree” of awareness of recommended principles for grading.
In our own research, we have attempted to test the theory that academic-enabling approaches to grading may be partly explained by a constructivist orientation (Bonner & Chen, 2009). We created a survey instrument with vignettes about grading according to the academic-enabling approach (Survey of Grading Beliefs [SGB], 2009, 2017). We studied the relationship between preservice teachers’ grading beliefs and their beliefs about a constructivist orientation to learning and teaching. We found that several vignettes describing grading based on effort or improvement had common variance, indicating that they could be grouped as a distinct factor in grading beliefs. We also found that a majority of preservice teachers endorsed the academic-enabling approach to grading, and that academic enabling had a significant positive correlation with favorable views of a constructivist approach to teaching. More elementary preservice teachers endorsed the academic-enabling approach to grading than did secondary preservice teachers.
Continuing our work on constructivist views of teaching (Chen & Bonner, 2017), we studied a group of novice teachers using an updated version of the SGB and interviews. The interview component of the study allowed us to delve into teacher decision-making processes and the beliefs underlying academic-enabling grading practices. Teachers’ responses indicated that they were readily able to infer rationales for academic-enabling grading practices shown in the SGB vignettes, even when they might not support the practices themselves. Teachers provided two basic rationales for such practices, which the researchers related to themes of real-world pragmatism and a “success orientation.” Teachers’ written responses indicated that effort, collaboration, and participation were very relevant student behaviors for real-world success. One teacher specifically pointed to collaboration as a skill emphasized in contemporary Common Core standards. Teachers’ responses demonstrated a “balancing act of grading”: they were concerned about students’ readiness to face the demands of the real world (e.g., college and workplace), as well as interested in assigning grades based on achievement-related evidence. In terms of a success orientation, we found a theme that converged with Sun and Cheng’s (2014) survey study of Chinese teachers: the teachers we interviewed perceived that through grades they were able to encourage students to persist and progress. Their responses revealed additional nuances. Teachers tended to consider that grading practices that would be described as academically enabling had social-emotional benefits of student empowerment, stress reduction, and motivation, even though they were aware that their use of such practices resulted in a lack of evidence of mastery on all topics for each individual. They were often willing to trade off precision for motivational benefits.
Finally, we found that the teachers we interviewed associated academic enabling practices with areas of instruction that focused on skill development rather than topic mastery. Other evidence has indicated that academically-enabling behavior or substantive engagement may be more of a factor in teachers’ grades in certain content areas such as physical education (PE), music, and art (Bowers, 2011; Russell & Austin, 2010), sometimes referred to as “non-core” subjects. For instance, Russell and Austin (2010) examined district-wide secondary music teachers’ assessment practices. On average, music teachers weighed non-achievement factors very heavily in course grades, particularly attendance, attitude, and amount of practice time. Ninety-three percent of music teachers weighed student attitude in grades, with an average weight of 27%. Although there is no direct evidence that PE, music and art teachers share a particularly constructivist frame for teaching, teachers in these content areas clearly show through their grading practices a developmental, individualized approach that is consistent with constructivism.
In terms of behaviorism, it is not clear from these studies what kind of reasoning underlies the tendency to reward or punish classroom behavior through grading practices. As with the academic-enabling approaches described above, in our own research we have tried to relate management approaches to grading to learning theories. We have found that a behavioral management approach to grading beliefs correlates with traditional views of learning similar to those represented under theories of knowledge acquisition, but does not correlate with constructivist views (Bonner & Chen, 2009). In another study, we examined the management approach to grading in more detail, again using a combination of the SGB and qualitative methods (Chen & Bonner, 2016). We found that some teachers’ reasoning about the management approach referenced the need to establish a classroom culture of respect or to establish teacher authority. This is similar to Sun and Cheng’s (2014) finding that some Chinese teachers emphasized the importance of strictness or student discipline in classrooms. However, in our study, most teachers did not endorse using grades to reward or punish student behavior, and those who did often elaborated on the need to obtain additional sources of information on the child, such as grade level, subject matter, student motivation, and individual circumstances. Teachers valued taking the “whole child” into account when they assigned grades. They did not treat the use of grades to reward or punish as a superficial matter, and they were aware of the high consequences of grades and of the meanings that grades carry for learners, parents, teachers, schools, and society at large (Chen & Bonner, 2016).
Together, the research on grading practices and perceptions suggests four clear and enduring findings. First, teachers idiosyncratically use a multitude of achievement and nonachievement factors in their grading practices, both to improve learning and motivation and to document academic performance. Second, student effort is a key element in grading. Third, teachers advocate for students by helping them achieve high grades. Finally, teacher judgment is an essential part of fair and accurate grading.
These findings raise an overarching question: is grading simply a hodgepodge, uncorrelated with anything else? We ask whether there are systematic factors that relate to variability in teachers’ and teacher candidates’ belief systems and to their inclination to support grading based on factors other than achievement. If so, how are those factors associated with teacher beliefs about instruction and assessment practices other than grading?
We report on an iterative process of development, validation, and revision of the Survey of Grading Beliefs.
Measurement of Grading Beliefs
Some researchers (Liu, 2008a; Liu, O’Connell, & McCoach, 2006; Wiley, 2011) have developed scales to assess teachers’ beliefs and attitudes about grading, including items that load on importance, usefulness, effort, ability, grading habits, and perceived self-efficacy for grading. These studies have corroborated the survey and interview findings about teachers’ beliefs in using both cognitive and noncognitive factors in grading. Guskey (2009a) found differences between elementary and secondary teachers in their perspectives about the purposes of grading. Elementary teachers were more likely to view grading as a process of communication with students and parents and to differentiate grades for individual students. Secondary teachers believed that grading served a classroom control and management function, emphasizing student behavior and completion of work. Some studies have successfully explored the basis for grading practices, showing that teachers view grading as a means to have fair, individualized, positive impacts on students’ learning and motivation and, to a lesser extent, on classroom control.
Other researchers have previously developed measures to capture teacher practices and/or beliefs about assessment. One measure related to teacher grading practices was a 44-item questionnaire, directed exclusively at secondary teachers of academic subjects, developed by Frary et al. (1993). This instrument combined questions about actual teacher practice with opinion questions; many items focused on teachers’ use of methods for ranking students and calculating grades, and only one of the six broad factors addressed in the survey concerned teacher endorsement of the use of factors other than achievement to determine course grades. McMillan (2001) developed a measure of the factors, types, and cognitive levels of assessments that teachers use to assign grades, which has been administered to teachers at both secondary and elementary levels. Questions on this survey all relate to actual teacher practice and are introduced by a stem that asks teachers to reflect on the basis of final first-semester grades in a single class.
Manke and Loyd (1990) developed scenarios about grading practices, which were also administered, with modifications, by Brookhart (1993). The scenarios focused on consideration of effort, treatment of missing work, and grading policies when scores show improvement. All the scenarios related to high school or middle school situations in specific subject areas.
Because existing measures were either focused on teacher practices or on secondary teachers’ beliefs and opinions exclusively, we found it necessary to develop our own measure targeted at perceptions rather than practice and capable of being administered to naïve teacher candidates or in-service teachers at either the elementary or secondary level.
Further, we have focused on preservice teachers in part because of the persistence of beliefs. According to theories of planned behavior and social cognition, beliefs flow from and express attitudes and values. Beliefs provide the basis for dispositions to act, which are strong predictors of behavior (Ajzen, 2005; Bandura, 1986). From the perspective of motivation theory, individuals’ attitudes and beliefs about how well they will do on an activity, and the value they ascribe to the activity, partly explain their persistence during performance of the activity (Wigfield & Eccles, 2000). Thus, teachers’ beliefs, values, and self-efficacy or perception of competence in assessment (Bandura, 1986) and their perceived behavioral control over assessment (Ajzen, 2005) are expected to shape their assessment actions. Teacher beliefs are a “messy construct” to define and measure, but they are important because they strongly affect teacher behavior (Bandura, 1986; Pajares, 1992). They are difficult to change, even with new information, as teachers, like all individuals, tend to persevere in erroneous prior beliefs rather than act on new information (Nisbett & Ross, 1980). The impact of stable teacher beliefs on subsequent teacher practice may be considerable. Instruction and assessment often happen in relative isolation within classrooms, with a degree of teacher autonomy (Swanson & Stevenson, 2002). Although teacher autonomy in the US has become more limited over the last 15 years (Milner et al., 2012), US teachers are still largely responsible for creating their own daily lessons, assignments, and assessments.
Because assessment is a professional responsibility as well as an individual activity, teachers are also affected in their perceptions and behaviors by the situational constraints of their sociocultural and policy contexts. New information from outside that is internally incompatible with prior beliefs generates stress or cognitive dissonance (Festinger, 1957). Although the “imposed physical and socio-structural environment is thrust upon people whether they like it or not” (Bandura, 1997, p. 156), individuals have leeway in how they construe and react to new information and conflicts. The amount of stress individuals experience relates to personal self-efficacy for tasks and to characteristics of the environment (Bandura, 1997). In regard to teachers and assessment, teachers who believe in fulfilling their professional responsibilities and also believe that the purpose of assessment is to promote student learning may experience dissonance if they associate new duties with different assessment purposes (Festinger, 1957). They may, however, be able to learn from and synthesize competing beliefs if they perceive themselves to have high self-efficacy for assessment and to be in a supportive environment.
We have additionally found that novice teachers differ little from teacher candidates in their grading beliefs.
We have grouped the phases of this instrument-development work into three stages: initial development (Study 1), intermediate development (Study 2), and revision with validation (Study 3). Studies 1 and 2 have been published previously and their results are briefly summarized here; the validation study is presented in detail.
First stage of development and Study 1
We began development of the instrument, which we then called the Survey of Assessment Beliefs (SAB), in a pilot phase in fall 2005. Content for SAB items drew on the previous studies already cited. We also drew on results from Stiggins et al. (1989), who used case study methods to examine the assessment practices of 15 high school teachers. They found discrepancies between traditional measurement principles and teacher beliefs related to what student characteristics should be reported in grades; the quality, quantity, and consistency of grading data gathered; and policies for deciding borderline cases. Another source of information about the opinions of pre-service teachers was a study by Green et al. (2007), who surveyed 169 in-service and pre-service teachers about ethics in classroom assessment and found that many considered ethical a number of grading practices that are considered unethical under professional standards, including consideration of student growth and effort in grading and heavy weighing of class participation in grades. These studies helped us pinpoint common misconceptions among teachers about grading practices.
Prior to writing items for our survey, we made a list of grading practices and policies that teachers have been found to endorse even though they diverge from professional recommendations. For each nonstandard or unorthodox practice on the list, each author wrote a brief vignette describing a teacher using the practice; thus each practice was illustrated by two vignettes. We used vignettes because we were interested not in the specific grading practices teacher candidates would actually adopt, but in the practices they believed they should endorse (i.e., their sense of the norms of the teaching profession). Vignettes have been found to be a useful tool in the social sciences for eliciting information about respondents’ sense of cultural norms (Finch, 1987), especially when the scenarios proposed in the vignettes are realistic and relevant to the respondents’ lives, as was true of our grading scenarios. Teacher candidates were asked to rate on a scale of 1 to 6 (1 = strongly disagree, 6 = strongly agree) the extent to which they agreed with the practice illustrated in each vignette.
The initial version of the SAB included 23 items. Our preliminary organizational plan was to have items relating to the following five categories: a) mixing academic achievement and other factors in course grades; b) individualization in grading; c) general opposition to use of paper-and-pencil tests to derive scores for grading; d) use of grades to manage student behavior; e) general leniency in grading. The first four of these categories were directly aligned with the four dimensions of our theoretical framework in measurement theory. The leniency in grading category was drawn from findings of other empirical research on teacher beliefs and practices in assessment, especially the finding of “success bias” among teachers (Cizek et al., 1995/1996). It relates to our theoretical framework of measurement in that if grades are intended to represent academic achievement, they must accurately distinguish levels of achievement without a bias either towards success or failure. Based on preliminary piloting, changes were made in several items on the SAB, and some items were dropped or added. The revised instrument consisted of 26 items.
With data from a sample of preservice teachers (n = 222), we used exploratory principal axis factoring to explore the structure of this first version of the instrument. Initial examination of the unrotated solution for the 26-item survey showed a very low communality for Item 24, which also failed to load meaningfully on any factor; it was dropped from subsequent analyses. Using the remaining 25 items, and based on examination of the scree plot, eigenvalues, and the percent of variance accounted for by each factor prior to rotation (> 5%), four factors were extracted. Based on evidence from these validation tasks, we summarize the four grading beliefs factors below.
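The percent-of-variance extraction criterion described above can be illustrated with a small numerical sketch. This uses a toy correlation matrix, not our survey data, and a plain eigenvalue decomposition rather than principal axis factoring; it is meant only to show how the > 5% rule operates:

```python
import numpy as np

def n_factors_to_extract(R, min_prop=0.05):
    """Count unrotated factors that each account for more than min_prop
    of total variance (used alongside the scree plot and eigenvalues)."""
    eigvals = np.linalg.eigvalsh(R)[::-1]   # eigenvalues, descending
    prop = eigvals / eigvals.sum()          # proportion of total variance
    return int(np.sum(prop > min_prop)), prop

# Toy 6-item correlation matrix: two clusters of three items each
# (within-cluster r = .8, between-cluster r = .1) -- illustrative only.
R = np.array([
    [1.0, 0.8, 0.8, 0.1, 0.1, 0.1],
    [0.8, 1.0, 0.8, 0.1, 0.1, 0.1],
    [0.8, 0.8, 1.0, 0.1, 0.1, 0.1],
    [0.1, 0.1, 0.1, 1.0, 0.8, 0.8],
    [0.1, 0.1, 0.1, 0.8, 1.0, 0.8],
    [0.1, 0.1, 0.1, 0.8, 0.8, 1.0],
])
k, prop = n_factors_to_extract(R)   # k == 2: two factors pass the 5% rule
```

Here the first two eigenvalues account for roughly 48% and 38% of total variance, while each remaining eigenvalue accounts for about 3%, so two factors would be extracted.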
We used several methods to examine the validity of the content of the items on the first version of the instrument and to identify the nature of the factors suggested by the exploratory analysis of the instrument’s structure. Our identification of the four factors was based primarily on a judgmental task in which experts wrote factor descriptions from grouped items. First, two expert judges with extensive training and experience in educational psychology and educational measurement were independently provided with the SAB measure, organized into four clusters of items according to the salient loadings from the factor analysis, and were asked to write a description of the common aspects of each group of items. For the first factor, one judge found that the items related to concern (or lack of concern) for equity and consistency in grading policies, while the other judge found the grading practices in the vignettes vulnerable to issues of personal bias. For the second factor, one judge described the common feature as a concern to reward students for effort; similarly, the other judge described the vignettes as showing positive reinforcement through grading. For the third factor, one judge found a preference for performance-based assessment; the other identified the items as relating to establishing grades through alternative assessment methods. For the fourth factor, one judge found that the items depicted use of grades to gain compliance on class tasks, and the other identified the items as depicting use of grades for behavior management. These results were incorporated into our final interpretations of the factors. We also conducted a judgmental task matching items to factor descriptions, and we conducted think-alouds with a sample of four teacher candidates.
Based on these sources of evidence, we identified the four factors as follows. F1 consisted of 7 items, with an internal consistency reliability estimate of .67 based on unit-weighted factor scores; high scores indicate a “generous” approach to grading, based on making high grades “easy” to obtain for individuals or an entire class. F2 consisted of 7 items, with an internal consistency reliability estimate of .67; high scores indicate an “academic enabling” approach to grading, to borrow a phrase from McMillan (2001), based on consideration for effort and engagement. F3 consisted of 6 items, with an internal consistency reliability estimate of .69 based on unit-weighted factor scores; high scores indicate a tendency to base grades exclusively on information derived from alternative forms of assessment, formal or informal, in preference to the paper-and-pencil test. F4 consisted of 5 items, with an internal consistency reliability estimate of .53; high scores indicate the use of grades to reward or punish, a behavior management approach to grading.
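The internal consistency estimates reported above (Cronbach’s alpha computed on unit-weighted scores) follow directly from the item-level data. A minimal sketch, using hypothetical 1–6 ratings rather than our actual responses:

```python
import numpy as np

def cronbach_alpha(items):
    """Cronbach's alpha for an (n_respondents x n_items) score matrix:
    alpha = k/(k-1) * (1 - sum of item variances / variance of total score)."""
    items = np.asarray(items, dtype=float)
    k = items.shape[1]
    item_vars = items.var(axis=0, ddof=1)          # per-item sample variances
    total_var = items.sum(axis=1).var(ddof=1)      # variance of summed scores
    return (k / (k - 1)) * (1.0 - item_vars.sum() / total_var)

# Hypothetical ratings from five respondents on three items of one factor
ratings = np.array([
    [2, 3, 2],
    [4, 4, 5],
    [3, 3, 3],
    [6, 5, 6],
    [1, 2, 1],
])
alpha = cronbach_alpha(ratings)
```

Items that covary strongly yield alpha near 1; items that are essentially unrelated yield alpha near 0.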
Second stage of development and Study 2
Because the reliability coefficients for the initial version of the SAB were low, we began the next stage of development by adding and dropping items. We adapted only the items from the original survey that loaded on SAB_F1, SAB_F2, and SAB_F4 (Bonner & Chen, 2009); these three factors are intended to measure teachers’ grading beliefs that deviate from the practices recommended by measurement experts. We dropped all six items loading on F3, which related to assessment format preferences rather than grading; we deemed this content not strongly pertinent to the study of grading beliefs. We modified those items into a simple 5-item Likert-type scale, without vignettes, and continued to use that scale in survey research (e.g., in press). In addition, we added items to the F4 “management approach” factor to improve its reliability, and we dropped 3 items that were either reverse-coded in the previous version of the instrument or had particularly weak factor loadings. As before, teachers were asked to rate, from 1 (strongly disagree) to 6 (strongly agree), the extent to which they agreed with the scenario illustrated in each vignette. Using this 19-item version of the SGB, we also added an inquiry into respondent thinking processes to our survey administration procedures, giving participants who volunteered to complete a second part of the survey an opportunity to provide open-ended responses explaining their rationale for endorsement of each item.
The SGB developed and used in the second study contained 19 vignettes. Analyses were performed with data from 203 inservice teachers. Because the items were adapted from our prior study with preservice teachers, we assumed a three-factor structure (having dropped F3), and although we explored the data for Study 2 with EFA, we did not attempt at this point to confirm the structure. We determined the factor labels by evaluating descriptions of behaviors elicited from the grouped items and by comparing the items within each factor to the original SAB factor descriptions (Bonner & Chen, 2009).
Based on these processes, we made a final determination of factor labels and descriptions for this version of the SGB, targeting inservice teachers’ grading beliefs. In the three-factor solution, one factor consisted of six items with an internal consistency of .77 and corresponded to the “easy” approach to grading, with four items in common with the Study 1 structure. A second factor consisted of six items with an internal consistency of .71 and corresponded in content to the previously established F4, the management approach to grading, with three items in common with the previous version. The third factor consisted of six items with an internal consistency of .60 and corresponded to the “enabling” approach to grading, with five items in common with the Study 1 structure. Three items were new and loaded as expected on the management factor, and four items changed factors.
Third stage of development: Current study
As we continued to refine our definitions of the clearest systematic differences in the structure of teachers’ beliefs about grading that influence their responses to the SGB, we added exploratory items. A third, 24-item version of the SGB was analyzed in 2013. Using a random split of the sample of 435 cases, we performed an EFA on one half, which suggested a four-factor model. This model was then tested with a confirmatory approach on the second half of the sample (218 cases). The hypothesized model was not confirmed, so CFA was then applied to that data set in an exploratory fashion, using modification indices (MI) to explore better-fitting model structures; this technique is useful for exploring sources of poor fit, areas of strain, and model revision (Harrington, 2009). MI were analyzed, data from the EFA were examined, factor loadings and standardized residual covariances were analyzed, and model fit indices were noted. Several items were dropped, and an alternative 17-item model was introduced. The overall fit indices suggested that the revised 17-item, four-factor model moderately fit the data, χ²(112) = 205.50, p < .001, SRMR = .07, RMSEA = .06 (90% CI = .05–.08), CFI = .88, GFI = .90, TLI = .86. Although this was far from a strong fit, it was impossible to adequately assess the validity of the model because all the data had been gathered in the context of the 24-item survey. The modified model could not be confirmed in that study and needed to be tested and confirmed on separate samples in future EFA and CFA research.
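The reported RMSEA can be reproduced from the model chi-square, degrees of freedom, and sample size using the standard point-estimate formula; a quick check with the values above:

```python
import math

def rmsea(chi2, df, n):
    """RMSEA point estimate from the model chi-square:
    sqrt(max(chi2 - df, 0) / (df * (n - 1)))."""
    return math.sqrt(max(chi2 - df, 0.0) / (df * (n - 1)))

# Reported fit of the revised 17-item model: chi2(112) = 205.50, n = 218
value = rmsea(205.50, 112, 218)   # approximately .06, as reported
```

A chi-square at or below the model degrees of freedom yields an RMSEA of zero, the formula’s lower bound.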
Beginning in 2013, we administered a 17-item survey based on the above model to a new sample of participants. In addition to the revised SGB, we administered three scales. The first was a modified version of the Teacher Beliefs Survey (TBS) developed by Woolley, Benjamin, and Woolley (2004) to assess teacher beliefs related to constructivist and traditional approaches to teaching and learning. The TBS has 21 Likert-type items, with a scale ranging from 1 (strongly disagree) to 6 (strongly agree). For our study, we retained all 10 TBS items measuring constructivist teaching (CT), all 7 TBS items measuring traditional teaching (TT) approaches, and all 4 TBS items measuring traditional management (TM). We added 4 new TM items because of that subscale’s low reliability as reported by Woolley et al. (α = .52).
We also administered two scales that we had developed and used in other research on teacher perceptions about assessment. The extent to which teachers held positive beliefs about standardized state tests was assessed using four items (EXTTEST). The items included general statements such as “How important are state tests for making classroom instructional decisions?” with a possible range of 1 to 4 points, where 1 = not at all, 2 = not very, 3 = somewhat, and 4 = very. Teacher preferences about classroom assessment methods were measured on a five-item scale (ALTPREF). Responses to items on this scale indicated preference for using assessment alternatives to traditional paper-and-pencil tests, such as performance assessments, portfolios, group projects, and participation. A positive response to an item represents a disposition against formal testing and in favor of a given alternative: for instance, “I prefer to base grades on students’ work and projects in learning groups, rather than on paper-and-pencil tests.” Each item had a possible range of 1 to 6 points, where 1 = strongly disagree, 2 = disagree, 3 = slightly disagree, 4 = slightly agree, 5 = agree, and 6 = strongly agree.
The data files, comprising XXX cases from three waves of data collection, were downloaded from the online survey system. After duplicate cases and respondents who were either full-time teachers or not enrolled in teacher preparation programs were deleted, the data set contained XXX cases, all preservice teachers. XXXX cases had missing data on one or more survey variables and were deleted, on the grounds that there was no evidence the data were missing at random and valid imputation would rely on assumptions about the validity of the responses that had not yet been established. The final pool of data for analysis therefore consisted of XXXX respondents, all preservice teachers who had completed the entire SGB with no missing data. The participants were XX% female, and XX% were studying at the graduate level. XX% had completed a bachelor’s degree, and an additional XX% had completed a master’s degree or higher. XX% were pursuing certification in childhood education and XX% in adolescent education (see Table 1).
A CFA was performed with LISREL on the 17-item SGB using data from a new sample of teacher candidates. The tested model is shown in Figure XXX, with circles representing latent variables and rectangles representing survey vignettes. The purpose of the CFA was to confirm the four-factor model indicated by the results of the previous wave of SGB modifications and to assess the consistency of the latent variables. Five items were hypothesized to indicate the Management factor, three items the Pull factor, six items the Generosity factor, and three items the Effort factor. The four factors were allowed to covary, and no error covariances were assumed.
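The hypothesized measurement model can be written compactly in lavaan-style syntax (the notation used by R’s lavaan and Python’s semopy, shown here only as an illustration of the structure; our analysis used LISREL). The item labels below are placeholders, not the actual SGB vignette numbers:

```python
# Hypothetical lavaan-style specification of the four-factor, 17-item model.
# "=~" reads "is measured by"; item labels (m1, p1, ...) are placeholders.
SGB_MODEL = """
Management =~ m1 + m2 + m3 + m4 + m5
Pull       =~ p1 + p2 + p3
Generosity =~ g1 + g2 + g3 + g4 + g5 + g6
Effort     =~ e1 + e2 + e3
"""
# Factor covariances are freely estimated by default in this notation;
# no error covariances are specified, matching the tested model.
```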
The data were examined for univariate and multivariate normality in SPSS. Several items demonstrated negative skewness, possibly due to an “agreeable” tendency in the sample, and 15 multivariate outlier cases were detected using Mahalanobis distance (Tabachnick & Fidell). Subsequent analyses were run on the correlation matrix with and without reflected inverse transformations of item-level data, and with and without removal of outliers. Because the fit statistics of all analyses were extremely similar, only the results of the analysis with all untransformed cases are reported.
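The Mahalanobis screening step can be sketched as follows. This is an illustrative example with simulated data and a planted outlier, not our survey responses; the cutoff 16.266 is the chi-square critical value at p = .001 with 3 degrees of freedom (one per variable), the criterion commonly recommended by Tabachnick and Fidell:

```python
import numpy as np

def mahalanobis_d2(X):
    """Squared Mahalanobis distance of each row from the sample centroid,
    using the sample covariance matrix (ddof = 1)."""
    X = np.asarray(X, dtype=float)
    diff = X - X.mean(axis=0)
    cov_inv = np.linalg.inv(np.cov(X, rowvar=False))
    return np.sum(diff @ cov_inv * diff, axis=1)

# Simulated 3-variable data with one planted outlier at case 0
rng = np.random.default_rng(0)
X = rng.normal(size=(40, 3))
X[0] = [8.0, -8.0, 8.0]

d2 = mahalanobis_d2(X)
outliers = np.where(d2 > 16.266)[0]   # chi-square(.001, df = 3) cutoff
```

With real survey data, the same distances can of course be obtained directly in SPSS via the regression procedure’s Mahalanobis option.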
What we found: Internal structure
What we found: External relationships
The internal consistency reliability for each dimension of our revised version of the TBS (CT α = .74, TM α = .74, TT α = .77) was generally consistent with Woolley et al. (CT α = .73, TM α = .52, TT α = .78), except for TM, whose consistency was notably improved. The extent to which teachers held positive beliefs about standardized state tests was assessed using four items (EXTTEST, α = .XX). Teacher preferences about classroom assessment methods were measured on a five-item scale (ALTPREF, α = .XX).
- What this tells us
- We dispute the prevalent “hodgepodge” point of view on grading. We have found predictable systems of beliefs that influence teacher candidates’ and teachers’ support for grading practices that diverge from professional recommendations. We believe these belief structures can help direct teacher education, teacher professional development, and school and district instructional and grading policies.
- Why we should care
- The instrument can be used for ……..