About the Author(s)


Meshack Moloi
Department of Primary Education, Tshwane University of Technology, South Africa

Anil Kanjee
Department of Primary Education, Tshwane University of Technology, South Africa

Citation


Moloi, M., & Kanjee, A. (2018). Beyond test scores: A framework for reporting mathematics assessment results to enhance teaching and learning. Pythagoras, 39(1), a393. https://doi.org/10.4102/pythagoras.v39i1.393

Original Research

Beyond test scores: A framework for reporting mathematics assessment results to enhance teaching and learning

Meshack Moloi, Anil Kanjee

Received: 25 Aug. 2017; Accepted: 29 Apr. 2018; Published: 25 July 2018

Copyright: © 2018. The Author(s). Licensee: AOSIS.
This is an Open Access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Abstract

In this article we propose a framework for reporting mathematics results from national assessment surveys (NAS) such that effective use of the resulting reports can enhance teaching and learning. We explored literature on factors that may contribute to non-utilisation of assessment data as a basis for decision-making. In the context of South Africa, we identified the form and formats in which results of NAS are reported as a possible limiting factor to the effective use of summative assessment results for formative purposes. As an alternative, we propose a standards-based reporting framework that will ensure accurate measurement of, and meaningful feedback on, what learners know and can do. We illustrate how, within a properly designed reporting framework, the results of a NAS in mathematics can be used for formative purposes to enhance teaching and learning and, possibly, improve learner performance.

Background

National assessment surveys (NAS) have been implemented in South Africa since the abolishment of the apartheid education system in 1996, and have evolved over time, changing in name, purpose, design, scope and frequency (Department of Education [DOE], 2005; Kanjee, 2007). National assessments are defined as ‘regular and systematic measurement exercises designed to determine what students have learned as a result of their educational experiences’ (UNESCO, 2000, p. 14). They are different to public examinations in that their goal is to inform policy for the education system as a whole, rather than to certify individual learners. These assessments may be administered to an entire cohort (census testing) or to a statistically chosen group (sample testing) and may also include background questionnaires administered to learners, teachers or education officials to obtain additional information for use in interpreting learner scores. Braun and Kanjee (2007) note that the utility of the data generated from these assessments depends on the quality and relevance of the assessment, the thoroughness of the associated fieldwork, as well as the expertise of those charged with the analysis, interpretation, reporting and dissemination of results.

Between 1996 and 2015, the form, format and frequency of NAS in South Africa have changed significantly – from national sample-based surveys administered in selected grades to assess mathematics and language performance every 3 to 4 years to annual national census-based assessments (Department of Basic Education [DBE], 2013). In addition to NAS, individual provinces such as the Western Cape and North West also administer provincial assessments and common tests, respectively, which focus on different subject areas and grades (Hoadley & Muller, 2016). While there have been marked improvements in the administrative and logistical processes of the assessments, a challenge that remains unresolved pertains to the meaningful reporting and effective use of the results from these assessments for enhancing teaching and learning.

The non-utilisation or under-utilisation of national assessment data in decision-making in South Africa has been noted as a matter of concern (Kanjee & Moloi, 2014; Kanjee & Sayed, 2013). Yet a growing body of research indicates that, when the results of NAS are reported, disseminated and utilised properly, observable improvements in learner performance follow (Klinger, DeLuca & Miller, 2008; Ravela, 2005; Schiefelbein & Schiefelbein, 2003). It would appear, therefore, that one challenge facing teachers in South Africa is the lack of meaningful reporting and effective utilisation of evidence from assessment. Meaningful reporting includes finding effective ways of converting raw data into information that can inform decision-making. At classroom level, ‘meaningful information’ refers to information that the teacher can use to determine what learners at a particular grade level know or do not know, and can or cannot do, and to develop relevant interventions that address learners’ specific learning needs.

In this article, we propose a framework for reporting results from NAS for use at the school level, and demonstrate how this framework can be applied to identify specific learning gaps of learners and provide guidelines to address identified learning gaps. Although the reporting framework is exemplified in mathematics, its applicability extends to any school subject. First, we contextualise the proposed framework by providing a brief overview of reporting of assessment results as regulated in the South African Curriculum and Assessment Policy Statement (CAPS). Next, we provide a conceptual framework for reporting and using assessment data, highlighting the challenges impeding effective use of data. This is followed by a description of the proposed reporting framework, its underlying philosophy and an exemplar school report, in which we highlight its practical application and implications for enhancing teaching and learning. We conclude the article by listing areas for further research to optimally use summative assessment results for formative purposes.

Reporting mathematics results

The view taken in this article is that mathematics, as a subject embodied in most school curricula, is often characterised as a hierarchical, cumulative body of knowledge. As such, the foundations of the mathematics content at a particular grade level are developed in preceding grades, and the acquisition of complex capabilities builds on relatively basic concepts. For instance, young children progressively develop a ‘number concept’, typically demonstrated by being able to organise concrete objects before they can manipulate abstract concepts. Given this nature of the subject, assessment and the use of assessment results in mathematics present specific challenges to mathematics teachers (Webb, 1997).

In order to enhance learning of mathematics knowledge and skills, as well as to identify and address specific learning gaps revealed by assessment results, teachers must have full mastery of the mathematics content area as well as a thorough understanding of the hierarchical nature of the subject. Similarly, for assessment data to be useful for enhancing learning in mathematics, it is critical that the data be organised and reported in a manner that reflects the nature of mathematical knowledge and how learning in mathematics takes place. In practice, this implies that learner performance results reported with the intention of enhancing teaching and learning must, at a minimum, provide information on what learners know and can do at a particular point and, at the same time, what they are potentially ready to learn (Vygotsky, 1962).

One limitation in reporting the results of NAS is the tendency to adopt a norm-referenced approach in which schools, and even learners, are ranked and compared with one another according to their performance in the tests (Green, 2002). The ‘league tables’ that often emanate from norm-referenced reporting are notorious for provoking resistance to assessment and evoking negative feelings among teachers. This undesirable phenomenon was reported in the United Kingdom (Goldstein, 2001) and was also observed in South Africa when teacher unions boycotted the administration of the Annual National Assessment (ANA) because they perceived the assessment as ‘an onslaught on teachers with no intention to improve the [education] system’ (Nkosi, 2015).

In this article we argue that the vital element that links NAS results to enhancing teaching and learning is a reporting framework that provides accurate measurement and meaningful feedback on what learners know and can do (Griffin, 2009). Importantly, the reporting framework must reflect the structure of mathematical knowledge as well as the process of learning in mathematics. It must embrace what Griffin (2009) defines as ‘criterion-referenced interpretation’ and involve measurement coupled with ‘skills audits’ in which responses to clusters of items in a test are interrogated to identify an underlying construct. For example, a Grade 6 learner who only responds correctly to test items that involve counting forward with whole numbers is demonstrating mathematical understanding that is at a lower level than a learner who, in addition, also responds correctly to items that involve doing calculations using fractions.
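To make the notion of a ‘skills audit’ concrete, the following minimal Python sketch classifies a learner’s item responses by skill cluster. The cluster names, item identifiers and the 0.75 mastery threshold are our own illustrative assumptions, not part of any prescribed framework:

    # A minimal sketch of a 'skills audit': test items are grouped into
    # skill clusters, and a learner is credited with a skill only when
    # they answer enough items in that cluster correctly. Cluster names,
    # item identifiers and the 0.75 threshold are illustrative assumptions.
    ITEM_CLUSTERS = {
        "counting forward with whole numbers": ["q1", "q2", "q3"],
        "calculations using fractions": ["q4", "q5", "q6"],
    }

    def skills_audit(responses, threshold=0.75):
        """Return, per skill cluster, whether the learner shows mastery."""
        audit = {}
        for skill, items in ITEM_CLUSTERS.items():
            correct = sum(bool(responses.get(item)) for item in items)
            audit[skill] = correct / len(items) >= threshold
        return audit

    # The Grade 6 learner from the example above: correct on the
    # counting items only.
    learner = {"q1": True, "q2": True, "q3": True,
               "q4": False, "q5": False, "q6": False}
    print(skills_audit(learner))
    # {'counting forward with whole numbers': True,
    #  'calculations using fractions': False}

Read together, the cluster outcomes locate the learner on the hierarchy described above, rather than reducing performance to a single percentage.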

We are aware of the critical distinctions that some make between NAS and school-based assessments in terms of how differently the two are affected by the socio-economic contexts within which learning takes place (Nichols & Berliner, 2007). Disparities among school-based testing procedures (Webb, 1997), possible variations in curriculum coverage across schools and other differences may call the fairness of NAS into question. Within the limits set by these caveats, we take the view advanced by Dunne, Long, Craig and Venter (2012) that a good balance between NAS and school-based assessment is possible with proper test design and effective reporting of results. Proper test design encompasses the extent to which the test adequately elicits meaningful information on what learners know and can or cannot do in the subject area of interest. Effective reporting involves ‘packaging’ and presenting the results in ways that enable the target users to initiate appropriate interventions for improvement. In particular, the South African ANA model, in which all learners in a grade, and not just representative samples, participate in the NAS, enhances both the feasibility and the practicability of the balance that Dunne et al. recommend. A standards-based reporting framework (SRF) that allows criterion-referenced interpretation of test results under these conditions stands to benefit policymakers, teachers, parents and even learners.

It is important to recognise that the value of assessment results is optimal when they are used within the confines of the purpose for which the assessment was designed. On the one hand, school-based assessments include formative assessments, whose purpose is to inform teaching and learning while a lesson is in progress and which are therefore developmental in design. On the other hand, schools also conduct summative assessments, which measure the extent to which learning has taken place after several lessons have been delivered. The testing that characterises NAS falls into the latter category. Our argument is that, within a properly designed reporting framework, the results of summative assessments can be used for formative purposes to enhance the quality of teaching and learning.

Curriculum and Assessment Policy Statement reporting framework

Assessment results in basic education in South Africa, both school-based results and the results of common examinations and NAS, are recorded and reported according to a framework prescribed in the CAPS document. The framework has three key elements, namely rating codes, descriptions of competence and percentage bands, arranged across seven levels (Table 1). We examined the CAPS framework against our proposed conceptual model and noted some disparities that we consider to be of material importance.

TABLE 1: Curriculum and Assessment Policy Statement framework for reporting assessment results.

The CAPS framework prescribes that assessment data be organised into fixed percentage bands, with the lowest band ranging from 0% to 29% and the highest from 80% to 100%. Within this framework, a learner obtaining a minimum score of 50% is deemed to be functioning at the ‘adequate achievement’ level (DBE, 2011). We argue that by organising and summarising results using percentages, the CAPS framework provides no information on the specific knowledge and skills that learners have or have not mastered. For example, a score of 56% provides no information on what should be done to enhance teaching and learning. We extracted a table that summarises NAS results in a typical ANA report compiled according to the CAPS framework to point out some of the conceptual challenges that compromise CAPS-based reports (Table 2).
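The fixed-band logic is simple enough to express in a few lines of code, which underscores how little instructional information it carries. In the sketch below, only the bands explicitly named in the text and in DBE (2011) are labelled; the intermediate descriptors are placeholders for the Table 1 entries:

    # Minimal sketch of the fixed percentage-band mapping prescribed by
    # CAPS. Only the bands explicitly named in the text are labelled;
    # intermediate descriptors are placeholders for the Table 1 entries,
    # assumed here to rise in 10-percentage-point steps.
    CAPS_BANDS = [
        (80, 7, "Meritorious"),            # 80%-100%, highest band
        (70, 6, "level 6 descriptor"),     # assumed 10-point band
        (60, 5, "level 5 descriptor"),     # assumed 10-point band
        (50, 4, "Adequate achievement"),   # 50% threshold (DBE, 2011)
        (40, 3, "level 3 descriptor"),     # assumed 10-point band
        (30, 2, "level 2 descriptor"),     # assumed 10-point band
        (0,  1, "Not achieved"),           # 0%-29%, lowest band
    ]

    def caps_level(score_pct):
        """Return the (rating code, descriptor) for a percentage score."""
        for lower_bound, code, descriptor in CAPS_BANDS:
            if score_pct >= lower_bound:
                return code, descriptor
        raise ValueError("score must lie between 0 and 100")

    print(caps_level(56))  # (4, 'Adequate achievement')

Whatever the score, the output is only a code and a label; the specific knowledge and skills behind the score remain invisible.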

TABLE 2: Percentage of Grade 6 learners by achievement level in mathematics.

In Table 2, which contains publicly released information, the raw score bands have been summarised using the seven codes and the corresponding descriptions of competence. No substantive qualitative analysis is presented that provides detailed information on what learners in each score band know and can do. In a survey assessing the extent to which South African teachers used the ANA results, Kanjee and Moloi (2014) reported that up to 26% of the teachers in their study were of the view that the ANA reports did not provide any information that they did not already have. A logical inference from these perceptions is that these teachers were unlikely to utilise the results. Our view is that perceptions of inadequacy in the content of NAS reports could contribute to non-utilisation of the results for enhancing teaching and learning, which in turn could perpetuate underperformance in the system.

The fixed percentage bands exemplified in Table 2 do not accommodate variations in the difficulty of tests. For instance, learners who score in the range 0% – 29% are categorised as functioning at the ‘Not achieved’ (L1) level and those who score in the range 80% – 100% as functioning at the ‘Meritorious’ (L7) level, regardless of the difficulty of the specific test. On an easy test, percentage-correct raw scores tend to be higher than on a difficult test; within the same test, however, learners of higher ability are expected to score higher than their counterparts of lower ability (Bond & Fox, 2007). A meritorious achievement on an easy test is therefore not necessarily equivalent to one on a difficult test. Nor is it possible to set two different tests with exactly the same level of difficulty, even when the same test specifications are followed. It is for this reason that test equating methods have been developed to adjust for differences in test difficulty (Kolen & Brennan, 1995), as the sketch below illustrates.
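As a concrete illustration, the following sketch implements mean-sigma linear equating, one of the classical methods described by Kolen and Brennan (1995). The two score distributions are purely illustrative values invented for this example, not real assessment data:

    import statistics

    def linear_equate(score_x, form_x_scores, form_y_scores):
        """Map a raw score on test form X onto the scale of form Y using
        the mean-sigma linear method (cf. Kolen & Brennan, 1995)."""
        mu_x = statistics.mean(form_x_scores)
        mu_y = statistics.mean(form_y_scores)
        sd_x = statistics.stdev(form_x_scores)
        sd_y = statistics.stdev(form_y_scores)
        return sd_y / sd_x * (score_x - mu_x) + mu_y

    # Illustrative (hypothetical) percentage scores from the same cohort
    # on an easy form (X) and a harder form (Y) of a test.
    easy_form = [45, 55, 60, 66, 72, 80, 88]
    hard_form = [30, 38, 44, 50, 57, 63, 70]

    # An 80% on the easy form corresponds to roughly 63% on the hard
    # form, i.e. below the fixed 'Meritorious' band of Table 2.
    print(round(linear_equate(80, easy_form, hard_form), 1))

The point of the illustration is that the same learner, with the same underlying ability, can land in different fixed bands depending only on which form of the test was administered.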

The net effect of these inconsistencies is that the users of CAPS-based reports may have either superficial or distorted knowledge about the performance of learners. Moreover, the use of this reporting framework implies that a higher conceptual workload is placed on teachers and school leaders by expecting them to be able to record, report, categorise and address learner needs across seven levels of performance. It could prove unrealistic to expect a teacher to keep track of and provide differentiated support across seven categories of learners in a class. To mitigate the observed shortcomings of the CAPS reporting framework and ensure that assessment results are reported in ways that provide quality information to support users and enhance the teaching and learning process, in this study we propose an alternative model that is underpinned by a theory of data use proposed by Breiter and Light (2006).

Conceptual model for using assessment data

Breiter and Light (2006) developed a conceptual model for using data to inform decision-making in the management of education districts. Central to their model is a definition of decision-making as a ‘highly complex, individual cognitive process that can be influenced by various environmental factors’ (Breiter & Light, 2006, p. 208). They discourage notions of decision-making that require innumerable disparate pieces of data and suggest rather that decision-making involves intelligibly reducing (collecting and organising) large amounts of data, converting the data (summarising and analysing) into information and transforming the information into context-related knowledge to inform action (prioritising and synthesising). Their model comprises four key elements, namely data, information, knowledge and decision-making. While not necessarily focusing specifically on assessment data, the model also accounts for the multiplicity of data and data sources that decision-makers in education must deal with. We adapted the model by Breiter and Light and developed a conceptual model to report assessment results so that the information can be used to enhance the quality of teaching and learning in schools (see Figure 1).

FIGURE 1: Conceptual model for using assessment data.

The basic element of the model by Breiter and Light (2006) is data. This includes, but is not confined to, raw statistical data such as test scores. Once teachers, school leaders or other decision-makers become aware of a situation of educational importance that needs to be addressed, such as persistent underperformance in mathematics and related issues (for example, language ability or home background), appropriate data, often presented in numerical formats, need to be collected and analysed to gain detailed insight into the nature of the phenomenon. We agree with Breiter and Light that once data are collected they must be organised in ways that make them meaningful to users. But data do not speak for themselves; hence the continued reporting of mathematics assessment results as raw scores in our schools appears to have influenced neither teaching nor learning. Data must be reported in ways that allow key users, such as district officials, school leaders and teachers, to decode them (Coburn, Honig & Stein, 2009).

Information, in the model, refers to data that has been appropriately analysed and summarised so that it sheds light on the nature and extent of the identified problem. Thus, any report must communicate relevant information that will either add to what is known or will illuminate a new area of interest or further investigation. For example, a mathematics school report should provide information on what individual or groups of learners know or do not know and can or cannot do in mathematics, which domains of mathematics pose particular challenges to learners and whether different groups of learners (e.g. boys vs girls or rural vs urban) display comparable levels of proficiency. Later in this article we show how reporting assessment data using meaningful performance standards provides information that empowers key users to make relevant decisions about the challenges of teaching and learning in schools.

Knowledge builds on available information by synthesising what is new with what is already known and by weighing priorities for changing the undesirable situation. For example, a teacher who interprets assessment results, identifies relevant teaching strategies to address revealed learning gaps and explores possible interventions to rectify the situation has knowledge. We contend that there is a relationship between the depth and quality of knowledge about the education system and the quality of available information. Assessment information that is incomplete or inaccurate will lead to partial or distorted knowledge about the education system and is likely to result in ineffectual interventions for improvement.

Decision-making is the deployment of acquired knowledge to change the situation as desired and, in the case of knowledge that comes from assessment, to improve learning outcomes. Breiter and Light (2006) argue that decision-making does not begin with data but with knowledge of needs, for instance the needs of learners, teachers or even district officials. It is knowledge that directs the decision-maker to the types of data to collect, when to collect them and how to transform them into actionable decisions. It is important to note that, because of the dynamic nature of the education enterprise, there is a dialectical relationship between knowledge and the context in which teaching and learning take place. On the one hand, there is knowledge of what the assessment results reveal and what needs to be done to turn things around. On the other hand, there is knowledge of new phenomena that may require the collection of new data to understand their nature, and thus begin a new cycle of data collection, generation of information and development of the knowledge needed to make relevant interventions. Decision-making involves leveraging existing knowledge and prioritising what needs to be done to achieve the desired goals. For instance, when learner performance in mathematics in a school or district is known to be persistently unacceptable and the contributing factors are also known, policymakers and practitioners are confronted with deciding on the best action to remedy the situation, counting on existing evidence to justify their interventions.

Development and implementation of relevant interventions is the next element of the model. For any decision to have an impact on practice, relevant interventions that address the key challenges identified must be developed and implemented. In practice, the nature, extent and duration of an intervention may vary depending on what decisions are taken in different contexts. For example, interventions to improve mathematics performance could focus on a specific phase or grade (e.g. Foundation Phase or Grade 3), address specific content areas (e.g. geometry) or specific groups of learners (e.g. second-language speakers), or be conducted as additional lessons before new concepts are introduced or as additional exercises during lessons. The key point is that the intervention developed must address challenges identified from the information collected, provided the information is clear, meaningful, easy to read and relevant. In addition, it is critical that some form of evaluation be conducted to monitor progress in implementing interventions.

Improved learning is the ultimate goal within classrooms and schools. Within a learner-centred paradigm, improvement in learning and the realisation of observable gains in learner performance hinge largely on the quality of feedback that is given to learners (Sadler, 2010). While Breiter and Light (2006) were specifically referring to feedback in formative classroom assessment, we argue that the principle applies to test results as well. When feedback, in the form of information-rich assessment reports, is clear and specific about where learners are and what the expectations are, such that learners are enabled to take control of their learning, it can serve to move them to the next step. Feedback that provides evidence of which knowledge and skills learners have mastered and which they have not guides teachers to support learners meaningfully and relevantly according to their identified needs (Sloane & Kelly, 2003). It creates a conducive environment in which teachers and learners work together to realise shared instructional goals. In such an environment learner performance is highly likely to improve.

The implications of adopting the proposed conceptual model for the use of assessment data to enhance learning in mathematics are twofold. Firstly, an assessment framework based on this conceptual model must make it possible to transform assessment data into information and to add value to that information so that it becomes knowledge. Secondly, because our focus is on mathematics, the framework must be sensitive to the nature of mathematics as a body of knowledge and a school subject, and to how learning in mathematics takes place. We argue that these requirements can be met by a standards-based reporting framework.

Challenges to the use of assessment results

Some of the reasons identified for the non- or under-utilisation of information from NAS include poor or non-dissemination of the findings, lack of confidence in the validity of such information among those who have to act upon it, and lack of capacity and absence of appropriate tools to help teachers use the data (Kellaghan, Greaney & Murray, 2009). Other researchers (Hambleton & Pitoniak, 2006; Hambleton & Slater, 1997; Underwood, Zapata-Rivera & Van Winkle, 2010) also blame reports from national assessments for being complex, often couched in statistical jargon that users cannot decipher, difficult to read, and even more difficult to interpret. In South Africa, Kanjee and Moloi (2016) reported that, although the results of NAS had been considered in some policy-related decisions, there had been limited focus on using the results to support improvements in teaching and learning.

More pertinent to the objectives of this article was the finding by Kanjee and Moloi (2016) that a significant percentage of teachers endorsed ‘Agreed’ or ‘Strongly agreed’ when presented with the statement: ‘Teachers do not know how to use ANA results to assist learners’. More concerning was that approximately 60% of the teachers in this category taught in affluent schools reputed for high performance; the corresponding percentage among teachers in poorer and often under-performing schools rose to 85%. An inference that can be made from these perceptions is that these teachers were unlikely to utilise the results of these national assessments. The situation could be exacerbated by the finding, in the same study, that up to 65% of the teachers strongly disagreed with a statement that district officials provided guidance and training on the use of ANA results. Effectively, it would appear that teachers are left to their own devices when it comes to interpreting and using the ANA results.

In his study of how provinces, districts and circuits utilise data from ANA, Govender (2016) reported wide variations across the two provinces and the districts that he sampled. Although the education officials were aware of the utility value of the data, Govender (2016) notes that the majority reported lacking the technical and practical capacity to analyse and interpret the data in meaningful ways. Again, as in the case of Kanjee and Moloi (2014), this finding implies that district officials are unable to provide relevant guidance and support to schools and teachers to enhance their use of assessment results for improving teaching and learning.

In another study, Kanjee and Mthembu (2015) explored the extent to which Foundation Phase teachers in one district in South Africa demonstrated understanding of concepts and practices related to both formative and summative assessment. Their sample included teachers from schools serving communities ranging from low to high socio-economic status. Kanjee and Mthembu (2015) reported that the teachers demonstrated very low levels of assessment literacy, more so in formative than in summative forms of assessment. Although the sample was small and not representative, these findings accord with the observations made by Govender (2016). Both district officials and teachers in South Africa appear to lack adequate capacity to utilise assessment data in ways that will enhance the quality of teaching and learning.

Overall, research suggests that the interpretation and utilisation of assessment results to enhance teaching and learning in schools are often limited by the competencies of key users, including teachers, school leaders and education department officials (Griffin, 2009; Timperley, 2009). The implication is that reports presenting assessment results must not depend on assumed competencies of key users. These reports should be easy to read, easy to understand, and should provide some indication of possible ‘next steps’ that users can follow to identify and address specific learning gaps or to support learners in improving on their current levels of performance. However, limited information currently exists on how such reports should be developed, what type of analysis is required, or how information is best presented to increase the utility value of these reports for teachers.

Exploring the use of standards-based reports

To address the limitations of reporting discussed in the previous sections, we propose a standards-based reporting framework. Green (2002) notes that a standards-based report presents assessment results according to the demonstrable mastery of knowledge and skills displayed by learners as evidence of achieving expected learning outcomes. A standards-based report does not ‘average’ learner scores in a test but identifies what learners know and can do in relation to what the expected standard specifies. Implicit in a standards-based report is an a priori statement of what is expected of a learner at a particular level or grade. Drawing on the analysis of observed learner scores, the report provides easy-to-read, easy-to-understand and clear guidelines or clues on next steps for teachers (Ravela, 2005). The ‘knowledge and skills’ expected of learners are generally referred to as ‘standards’ (Goodman & Hambleton, 2004, p. 148).

In educational circles, a distinction is made between ‘content standards’ and ‘performance standards’ (Cizek, 1996; Rodriguez et al., 2011). Rodriguez et al. (2011, p. 18) define ‘content standards’ as ‘what students need to learn’. In the context of South Africa, ‘content standards’ are spelt out in the CAPS by grade and by subject (DBE, 2011). ‘Content standards’ specify the nature and scope of content knowledge, including skills, that a learner must acquire in a given grade. Hambleton (2000) defines performance standards as:

well-defined domains of content and skills and performance categories for test score interpretation [that] are fundamental concepts in educational assessment systems aimed at describing what examinees know and can do. The primary purpose [of the affected assessments] is not to determine the rank ordering of examinees, as is the case with norm-referenced tests, but rather to determine the placement of examinees into a set of ordered performance categories. (p. 2)

Thus, while content standards answer the ‘what’ question, performance standards answer the question of ‘how much’. An apt description of the purpose of performance standards, proffered by Hambleton (2000), is that they are qualitative, descriptive statements of how much learning has taken place and how much of it is ‘good enough’. Our interpretation of Hambleton is that performance standards provide a framework of evidence for placing learners at particular points on a continuum of knowledge and skills, according to what they are able to demonstrate when given the opportunity to do so, as in a test.
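A minimal sketch of such placement into ordered performance categories follows. The scale scores, cut scores and level labels are hypothetical; in practice, cut scores are derived through a formal standard-setting exercise:

    # Minimal sketch of placing learners into ordered performance
    # categories, in the spirit of Hambleton's (2000) description.
    # Cut scores and level labels here are hypothetical examples.
    CUT_SCORES = [
        (550, "Advanced"),
        (450, "Proficient"),
        (350, "Partially proficient"),
        (0,   "Not yet proficient"),
    ]

    def performance_level(scale_score):
        """Return the performance level label for a scale score."""
        for cut, label in CUT_SCORES:
            if scale_score >= cut:
                return label
        raise ValueError("scale score below the reporting range")

    print(performance_level(480))  # Proficient

Unlike a fixed percentage band, each category would be anchored to descriptors of what learners placed in it know and can do.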

Important features of performance standards are performance levels (PLs) and performance level descriptors (PLDs). Zieky and Perie (2006) describe PLs as:

general policy statements that indicate the official position on the desirable number and labels of categories to be used in classifying learners according to their knowledge and skills.