One can glean from Table 1 that the majority of teachers were teaching in the Further Education and Training phase (Grades 10–12) and had at least 10 years of teaching experience.

Ethical considerations

At the time of data collection respondents were assured of confidentiality and anonymity, and that their participation was voluntary and they could withdraw at any time. They were further informed that there would be no penalties related to their results for assessment of courses they were following.

Results

Variance and unidimensionality

A procedure for determining the unidimensionality of an instrument measuring a latent variable is principal component analysis of the standardised residuals. This procedure is not ‘usual factor analysis’ but ‘shows contrasts between opposing factors, not loadings on one factor’ (Linacre, 2008, p. 250). It points to items which may distort the unidimensional character of an instrument. The decision criteria for the results of this procedure were as follows:

- Variance explained by measures > 60% is good.
- Unexplained variance explained by 1st contrast (size) < 3.0 is good.
- Unexplained variance explained by 1st contrast < 5% is good. (Linacre, 2008, p. 335)

The variance explained by measures computed from the empirical data was 27.2%, against an expected value (for data fitting the Rasch model) of 28.0%. This difference was not deemed significant. The unexplained variance in the data was 72.8%, against the Rasch model’s expectation of 72.0%; this difference can likewise be deemed not significant. As is clear from the decision criteria, the cut-off point of 60% was not met. However, these variance percentages were to be expected, since the respondents were fairly homogeneous with regard to their teaching context and the issue under discussion.
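To make these variance figures concrete, the following Python sketch approximates the two quantities involved: the percentage of raw variance explained by the Rasch measures and the size (eigenvalue) of the first contrast from a principal component analysis of the standardised residuals. This is a simplified illustration under a dichotomous Rasch model, not the Winsteps procedure the study used; the simulated data and function names are hypothetical.

```python
import numpy as np

def rasch_prob(theta, b):
    """Dichotomous Rasch probability of endorsing an item."""
    return 1.0 / (1.0 + np.exp(-(theta - b)))

def variance_explained(X, theta, b):
    """Percentage of raw variance in responses explained by the measures.
    X: persons x items 0/1 matrix; theta, b: person and item measures."""
    E = rasch_prob(theta[:, None], b[None, :])   # model-expected responses
    total = np.sum((X - X.mean()) ** 2)          # total sum of squares
    residual = np.sum((X - E) ** 2)              # unexplained (residual) part
    return 100.0 * (total - residual) / total

def first_contrast_size(X, theta, b):
    """Eigenvalue of the first contrast in a PCA of the standardised
    residuals (< 3.0 is good, per the decision criteria above)."""
    E = rasch_prob(theta[:, None], b[None, :])
    Z = (X - E) / np.sqrt(E * (1 - E))           # standardised residuals
    R = np.corrcoef(Z, rowvar=False)             # item residual correlations
    return np.linalg.eigvalsh(R)[-1]             # largest eigenvalue

# Illustrative run on simulated unidimensional data (hypothetical values,
# loosely echoing the sample size and measure ranges reported in the text)
rng = np.random.default_rng(0)
theta = rng.normal(0.66, 0.72, 49)
b = np.linspace(-0.94, 1.46, 16)
X = (rng.random((49, 16)) < rasch_prob(theta[:, None], b[None, :])).astype(float)
```

For data simulated from a strictly unidimensional model, the residual correlations carry no shared structure, so the first contrast stays small; a genuine secondary dimension, like the youth-behaviour pair discussed below, inflates it.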
The reported standardised residual variance for the first contrast of 3.1 is above the recommended cut-off point. Analysis of graphs of the spread of items indicated that two items, one dealing with ‘Youth Dances’ and the other with ‘Youth Fashion’, stood out as operating as a group independent of the rest of the items. These two items can be considered conceptually linked around the notion of the behaviour of young people. The respondents were mature adults, and it can reasonably be assumed that they viewed the two activities as related. Further analysis was done to ascertain which of the two items, when removed, produced a better unidimensional instrument. This analysis showed that removal of the item dealing with ‘Youth Dances’, which reduced the standardised residual variance for the first contrast to 2.8, enhanced the unidimensionality of the instrument. Further analysis proceeded using the instrument thus reduced to 16 items. A variety of other indicators can be calculated by the Rasch procedures, and these were interpreted to give an indication of the functioning of an attitudinal instrument such as the one under consideration here. The results of these procedures are discussed in the next section.

Differential item functioning

Another important criterion for a measuring scale is that the items should not function differentially for different categories of participants in the sample. Given that the participants for the instrument under scrutiny here were teachers of different genders, the items should not function differentially for females and males. Analysis of differential item functioning along gender lines was conducted for the cohort of teachers.
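As a simplified illustration of such a gender DIF screen, one can compare the endorsement rates of a single item between two groups with a two-proportion z-test and apply a p < .05 criterion. This is not the Winsteps procedure used in the study (a full Rasch DIF analysis contrasts group-wise item measures on the logit scale); the data in the usage line are invented.

```python
import math

def dif_screen(group_a, group_b):
    """Two-proportion z-test on endorsement of one item by two groups
    (e.g. female and male teachers). Returns (z, two-sided p-value).
    A rough proxy for Rasch DIF, which compares group-wise item measures."""
    n1, n2 = len(group_a), len(group_b)
    p1, p2 = sum(group_a) / n1, sum(group_b) / n2
    pooled = (sum(group_a) + sum(group_b)) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    # Two-sided p-value from the standard normal distribution
    p_value = 2 * (1 - 0.5 * (1 + math.erf(abs(z) / math.sqrt(2))))
    return z, p_value

# A hypothetical item endorsed by 30/40 women but only 10/40 men:
z, p = dif_screen([1] * 30 + [0] * 10, [1] * 10 + [0] * 30)
# here p < .05, so this invented item would show significant DIF
```

With small, homogeneous samples such as the one in this study, group differences of the size reported below typically fail this significance threshold, which is consistent with the non-significant DIF the authors report.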
This analysis showed that two items (‘Pension and Retirement’ and ‘Health’) might be easier for female than for male teachers to endorse, and that a further two items (‘Agriculture’ and ‘Emergency Services’) might be easier for male than for female teachers to endorse. Although differential item functioning (DIF) is noticeable for these items, ‘For statistically significance DIF on an item, Prob. < .05’ (Linacre, 2008, p. 266). None of the reported probabilities for these items were less than 0.05, and hence DIF between female and male teachers was not statistically significant for any of the items of the scale. DIF analysis was not performed for the other demographic dimensions, since the sample was fairly homogeneous in respect of teaching environments.

Rank ordering of the items

As pointed out earlier, in a useful scale the items operationalising the abstract construct under discussion should form a hierarchy, so that it is possible to conclude which of the items respondents would find easy, and which difficult, to endorse. With Rasch modelling three values can be determined to ascertain the hierarchical property of a scale: the measure of an item, the infit mean square and the outfit mean square. The measure of an item is its location on the scale. For a rating scale it indicates the level of difficulty of endorsing the item. The difficulty of endorsement ‘of an item is defined to be the point on the latent variable at which its high and low categories are equally probable’ (Linacre, 2008, p. 221). Reeve and Fayers (2005) give the criterion for the spread of items to be deemed acceptable: the measures should be in the range -2 to +2 logits. With the range for the instrument in this study being -0.94 to 1.46, as given in Table 2, this criterion was fulfilled.

Table 2: Measure and fit statistics. Available at http://pythagoras.org.za/index.php/pythagoras/article/downloadSuppFile/13/30
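Linacre's definition of item difficulty can be made concrete with the Andrich rating-scale model, the polytomous Rasch model appropriate to Likert-type items such as these. In the sketch below (the four-category item and its threshold values are invented for illustration), the category probabilities are computed, and at a person measure equal to the item's measure the highest and lowest categories come out equally probable, which is exactly the definition quoted above.

```python
import numpy as np

def rating_scale_probs(theta, b, taus):
    """Category probabilities under the Andrich rating-scale model.

    theta: person measure; b: item difficulty; taus: Rasch-Andrich
    thresholds (assumed centred, i.e. summing to zero). Returns
    probabilities for categories 0..len(taus)."""
    steps = theta - b - np.asarray(taus)              # step logits
    logits = np.concatenate(([0.0], np.cumsum(steps)))
    probs = np.exp(logits - logits.max())             # stabilised softmax
    return probs / probs.sum()

# Hypothetical 4-category Likert item with centred thresholds.
# With theta == b, the bottom and top categories are equally probable:
p = rating_scale_probs(theta=0.4, b=0.4, taus=[-1.2, 0.1, 1.1])
```

Because the thresholds sum to zero, the cumulative logit of the top category returns to that of the bottom category precisely when theta equals b; this is what anchors each item's measure on the logit scale reported in Table 2.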

In Rasch analysis infit and outfit mean squares (see Table 2) are calculated to indicate ‘items which do not contribute to the definition of a coherent and useful variable’ (Wright & Masters, 1982, p. vi). For items to have a good fit to the Rasch model, the decision criteria are:

values greater than 2.0 degrade measurement; values between 1.5 and 2.0 neither construct nor degrade measurement; values from 0.5 to 1.5 are productive of measurement; and those less than 0.5 mislead us into thinking we are measuring better than we really are. (Linacre, 2008, pp. 221−222)

It is observable from Table 2 that both the infit and outfit mean square values for all the items fell within the ‘productive of measurement’ range. It is thus concluded that the reconstructed scale to measure the construct ‘teachers’ preference for real-life situations to be used in Mathematical Literacy’ forms a continuum.

The Rasch model can be used simultaneously to estimate a person’s ability (propensity to endorse an item) and an item’s difficulty (endorsability of the item). The Winsteps software (Linacre, 2008) presents these two estimates in a ‘person-item map’ which provides an indication of the informativeness of the measures. Figure 1 gives the person-item map for teachers’ preferences for real-life situations to be used in Mathematical Literacy. The 49 teachers appear on the left-hand side, with teachers with a high level of endorsement of the scale at the top and those with a low level of endorsement at the bottom. The items appear on the right-hand side, with those ‘hard to endorse’ at the top and those ‘easy to endorse’ at the bottom. Noticeable from this figure is that the mean for the persons (M = 0.66, SD = 0.72) is higher than the mean for the items (M = 0.00, SD = 0.59), which suggests that, on the whole, the respondents found the items relatively easy to endorse.
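The infit and outfit statistics can be sketched in Python for the dichotomous case. This is an illustrative reimplementation, not the Winsteps computation applied to the study's rating-scale data; the polytomous formulas are analogous, with the model variance derived from the category probabilities.

```python
import numpy as np

def fit_mean_squares(X, theta, b):
    """Per-item infit and outfit mean squares (dichotomous sketch).

    Outfit is the plain mean of squared standardised residuals, so it
    is dominated by off-target outliers; infit weights the residuals by
    the model variance (information), so it reflects on-target responses.
    X: persons x items 0/1 matrix; theta, b: person and item measures."""
    E = 1.0 / (1.0 + np.exp(-(theta[:, None] - b[None, :])))  # expected scores
    W = E * (1 - E)                                           # model variance
    outfit = ((X - E) ** 2 / W).mean(axis=0)
    infit = ((X - E) ** 2).sum(axis=0) / W.sum(axis=0)
    return infit, outfit

# For data that fit the model, both statistics hover around 1.0,
# inside the 0.5-1.5 'productive of measurement' band.
```

The weighting is the substantive difference: a single lucky guess by an off-target person inflates outfit sharply but barely moves infit, which is why both are reported side by side in Table 2.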
A further observation is that four sets of items (T11 and T5; T14 and T9; T17 and T3; T12, T13, T4 and T6) share the same location. Essentially this may imply redundancy of items, and that the reliability of the instrument would not be affected if only one item from each set were used. However, for an instrument dealing with affective issues, care should be taken with removal, and conceptual considerations in addition to computational ones should drive decisions about replacement of items. For example, T5 (Youth Fashion) and T11 (Lottery and Gambling) share the same location and are somewhat remotely conceptually linked, but are different in terms of the mathematics related to them. They are thus not candidates for replacement. On the other hand, T9 (Pension and Retirement) and T14 (Inflation) can be considered conceptually linked, because of the relationship they share in the construction of mathematical models of pension and retirement schemes. At school level, however, they point to different mathematical topics, and thus removing either one would not be sensible.

In Figure 1 gaps are apparent at five places (between T18 and T11; between T16 and T1; between T14 and T17; between T17 and T12; and between T2 and T7). These gaps indicate that the items in these regions are not evenly spread. This might be a result of the homogeneity of the respondents, the small sample, and strongly expressed preferences, both negative and positive. For example, for T10 (‘Health’), the item found to be the easiest to endorse, 96% of the respondents selected the categories ‘agree’ and ‘strongly agree’. T18 (‘Youth Fashion’) was the hardest to agree with, with only 8% selecting ‘strongly agree’.

The Rasch model reports a ‘person reliability’ measure which ‘is equivalent to the traditional “test” reliability’ (Linacre, 2008, p. 393). The person reliability for the teachers’ preferences for real-life situations to be used in Mathematical Literacy was 0.65.
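The structure of such a person-item map, with shared locations and gaps of the kind just described, can be rendered as plain text from the two sets of measures. The helper below is hypothetical, and the measures in the usage line are made up; it is a sketch of the map's layout, not a reproduction of Figure 1.

```python
def wright_map(person_measures, item_measures, item_names,
               lo=-2.0, hi=2.0, step=0.5):
    """Return a rough text person-item ('Wright') map: one row per
    logit band, highest band first, persons as 'X's on the left and
    item labels on the right of the shared logit scale."""
    rows = []
    n_bands = int(round((hi - lo) / step))
    for i in range(n_bands, -1, -1):
        low = lo + i * step
        persons = sum(low <= m < low + step for m in person_measures)
        items = [name for name, m in zip(item_names, item_measures)
                 if low <= m < low + step]
        rows.append(f"{low:5.1f} | {'X' * persons:<10}| {' '.join(items)}")
    return "\n".join(rows)

# Invented example: three persons and two items on the logit scale
print(wright_map([0.1, 0.2, 1.1], [0.0, 1.0], ["T1", "T2"]))
```

Items falling in the same band print on one row, which is how shared locations (such as T11 and T5) and empty bands (the gaps in Figure 1) become visible at a glance.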
‘Low values indicate a narrow range of person measures’ and a person reliability of 0.5 ‘divides a sample into 1 or 2 groups’ (Linacre, 2008, p. 393). The homogeneity of the sample accounts for the low person reliability, and points to the need for a more diverse sample in further development of the instrument. ‘”Item reliability” has no traditional equivalent and low values indicate a narrow range of item measures, or a small sample’ (Linacre, 2008, p. 393). The item reliability obtained was 0.84, which provides strong support that the hierarchical ordering of the items in Table 2 would be reproduced with a different sample of teachers working in a context similar to that of the respondents in this study.

The results obtained on the functioning of the instrument, and the adjustments effected, indicate that the instrument functioned reasonably. The revised instrument resulted from the removal of misfitting persons and of an item contributing to violation of the unidimensional character of the initial instrument. This helped identify a unidimensional trait representing Mathematics and Science teachers’ preferred contextual situations to be used in Mathematical Literacy.

Discussion and conclusion

The overall objective of the ROSME project is to ascertain, and trace over time, the real-life situations which learners, teachers and parents would prefer to be used in Mathematical Literacy. The teacher instrument is specifically aimed at ascertaining the contexts that teachers prefer. Expression by teachers of preferred contextual situations is a subjective issue. However, for a variety of reasons – of which economic factors and expediency are the most important – it is desirable to have a robust, easily implementable measurement instrument, since such an instrument enables the assessment of the real-life contexts that teachers prefer to be used in Mathematical Literacy.
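The person reliability of 0.65 and item reliability of 0.84 discussed above are separation reliabilities: the proportion of observed variance in the measures that is not attributable to measurement error. A minimal Python sketch follows; the measures and standard errors in the example are invented, not the study's values.

```python
def separation_reliability(measures, standard_errors):
    """Rasch separation reliability, applied to person or item measures:
    (observed variance - mean error variance) / observed variance."""
    n = len(measures)
    mean = sum(measures) / n
    observed_var = sum((m - mean) ** 2 for m in measures) / (n - 1)
    error_var = sum(se ** 2 for se in standard_errors) / n  # mean square error
    return max(observed_var - error_var, 0.0) / observed_var

# Hypothetical measures (logits) with uniform standard errors of 0.5:
r = separation_reliability([-1.0, 0.0, 1.0, 2.0], [0.5] * 4)
# r comes out near 0.85: most of the spread is real rather than error
```

The formula makes the sample-dependence plain: a narrow spread of person measures (a homogeneous sample) shrinks the numerator and drags person reliability down, exactly as observed here, while a wide spread of item measures supports the higher item reliability.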
Also, the instrument would allow for the tracking of teachers’ interest in contexts over time, in the same way that the TIMSS and Programme for International Student Assessment instruments track the performance of learners in school Mathematics. Tracking is important for informing decision-makers and developers of learning resources about relevant real-life situations to include in such materials. As Boone and Scantlebury (2006, p. 253) assert, ‘statistical confidence in [such] measures’ validity and reliability is essential’. The results of the infit and outfit mean squares and the standardised residual variance are indicative of the ROSME instrument’s ability to ascertain the contextual situations that Mathematics and Science teachers prefer, bolstering this ‘statistical confidence’. The fit statistics show that the instrument used to determine the contexts that teachers prefer for Mathematical Literacy reasonably identifies this variable. Given that attitudinal instrument development is an iterative process, this finding points in the direction of further development with a group of teachers more heterogeneous in terms of the socio-economic context within which they teach, to ascertain the universality of the instrument. In pursuing this path we will heed the advice of Wright and Masters (1982, p. 102) that ‘When items do not fit, that signifies … not the occasion for a looser model, but the need for better items’.

Low endorsement of items points to areas in need of continuous professional development. For example, low endorsement was accorded to the item ‘Mathematics involved in a lottery and gambling’. A plausible reason for this low endorsement is the negative consequences attached to this activity. One teacher motivated the low endorsement as follows: ‘If you want to instil positive value these [lottery and gambling] might be the opposite effect’.
In this instance teachers might not, as yet, have a sense of the mathematics involved in lottery and gambling, and of how this can be used productively to inculcate the positive values they desire. Niss (2007, p. 1306), in his assessment of the state of research related to problematiques in Mathematics Education, concludes that there is a ‘need for investing a considerable amount of effort’ into researching issues related to the affective domain in Mathematics Education. In research on affective issues pertaining to school mathematics, instruments are normally used without reporting their viability in measuring the trait under consideration. Our analysis of one such instrument shows that much more care needs to be exercised in the construction of these instruments. They should, at a minimum, reasonably identify the latent traits they purport to measure in order to provide useful information on attitudinal issues related to school mathematics.

Acknowledgements

This research is supported by the National Research Foundation under Grant number FA2006042600032. Any opinions, findings and conclusions or recommendations expressed in this material are those of the authors and do not necessarily reflect the views of the National Research Foundation.

Competing interests

We declare that we have no financial or personal relationships which may have inappropriately influenced us in writing this article.

Authors’ contributions

C.J. was the project leader and was responsible for experimental and project design. C.J. performed the analysis. L.H. and M.M. made conceptual contributions. C.J., L.H. and M.M. wrote the manuscript.
References

1. Andrich, D. (1988). Rasch models for measurement.
2. Bond, T.G., & Fox, C.M. (2001). Applying the Rasch model: Fundamental measurement in the human sciences.
3. http://dx.doi.org/10.1002/sce.20106
4. http://dx.doi.org/10.1007/BF03217432
5. De Roos, Y., & Allen-Meares, P. (1998). Application of Rasch analysis: Exploring differences in depression between African-American and White children.
6. http://dx.doi.org/10.5688/aj690597
7. Fowler, F.J. (1995). Improving survey questions: Design and evaluation.
8. Julie, C. (2002). Making relevance relevant in mathematics teacher education. In I. Vakalis, D. Hughes-Hallett, C. Kourouniotis, D. Quinney, & C. Tzanakis (Eds.).
9. Julie, C., & Mbekwa, M. (2005). What would Grade 8 to 10 learners prefer as context for mathematical literacy?
10. Lessing, A., & De Witt, M. (2007). The value of continuous professional development: Teachers’ perceptions. South African Journal of Education, 27(1), 53−67. Available from http://journals.sabinet.co.za/WebZ/images/ejour/educat/educat_v27_n1_a4.pdf?sessionid=01-38432-106046365&format=F
11. Linacre, J.M. (2008). Winsteps® Rasch measurement computer program user’s guide. Beaverton, OR: Winsteps.com. Available from http://www.winsteps.com/a/winsteps-manual.pdf
12. Mpofu, E., Caldwell, L., Smith, E., Flisher, A., Mathews, C., Wegner, L., et al. (2006). Rasch modeling of the structure of health risk behavior in South African adolescents. Journal of Applied Measurement, 7(3), 323−334.
13. Niss, M. (2007). Reflections on the state and trends in research on Mathematics teaching and learning: From here to utopia. In F.K. Lester (Ed.), Second handbook of research on mathematics teaching and learning (pp. 1293−1312). Charlotte, NC: National Council of Teachers of Mathematics.
14. Pring, R. (2005). Philosophy of educational research. London: Continuum.
15. Reeve, B., & Fayers, P. (2005). Applying item response theory modeling for evaluating questionnaire item and scale properties. In P. Fayers & R. Hays (Eds.), Assessing quality of life in clinical trials: Methods and practice (pp. 55−76). New York, NY: Oxford University Press.
16. Swanepoel, C., & Booyse, J. (2006). The involvement of teachers in school change: A comparison between the views of school principals in South Africa and nine other countries. South African Journal of Education, 26(2), 189−198.
17. Vermeulen, N. (2007). Mathematical literacy: Terminator or perpetuator of mathematical anxiety? In M. Setati, N. Chitera, & A. Essien (Eds.), Proceedings of the 13th Annual National Congress of the Association for Mathematics Education of South Africa, Vol. 1 (pp. 368−380). Johannesburg: AMESA.
18. Wright, B.D., & Masters, G.N. (1982). Rating scale analysis. Chicago, IL: MESA Press.
19. Zevenbergen, R., Sullivan, P., & Mousley, J. (2002). Contexts in mathematics education: Help? Hindrance? For whom? In P. Valero & O. Skovsmose (Eds.), Proceedings of the 3rd International Mathematics Education and Society Conference (pp. 1−9). Copenhagen: Centre for Research in Learning Mathematics. Available from http://www.mes3.learning.aau.dk/Papers/Zevenbergen_et_al.pdf

1. Only first items in the row are mentioned.