Latent variable modeling of disability in people aged 65 or more
by Paolo Eusebi
Montanari GE, Ranalli MG and Eusebi P
Statistical Methods & Applications 20 (1), 49-63, DOI:10.1007/s10260-010-0148-6
Does IRT Provide More Sensitive Measures of Latent Traits in Statistical Tests? An Empirical Examination
by Jeff Stewart
Stewart, J. (In Press) Shiken Research Bulletin, 16 (1)
It has been frequently stated that Item Response Theory produces interval-scale measures where raw scores can only provide ordinal measures, and that therefore, researchers should choose IRT measures when selecting variables for common statistical tests, because raw scores may not meet their assumptions (Wright, 1992; Harwell & Gattie, 2001). In this study, this claim is empirically examined by conducting Pearson Correlations and ANOVAs on two data sets using raw scores, Rasch Person Measures and 2-Parameter IRT ability estimates, in order to determine if results differed as a consequence. Raw Scores and Rasch Person Measures were very highly correlated, and led to extremely similar results in all cases. For a well-constructed, reliable test the same was true of 2PL ability estimates. However, in cases where the test has middling to poor reliability, 2PL ability estimates appear to produce a somewhat more sensitive measure of a latent trait than raw scores, which can result in meaningful differences in statistical tests.
Examining the Scalability of a Writing Scale via IRT Partial Credit Model
Paper presented at TELLSI 7 annual conference, held at Yazd University, Iran, October 2009
Abstract
Interval scaling has always been a serious concern in developing rating scales; however, in most of the rating scales to measure performance on skills such as writing and speaking, an equal distance is assumed between the points on the scale. It is needless to say that the provision of enough evidence to assume so is an inevitable requirement if one is to claim that a scale works as an accurate and valid measuring device. By the application of Polytomous IRT Models, ground is paved for analyzing the measuring characteristics of rating scales. That is to say that the scalability and the distance between the intervals on the scales are easily determined if some prerequisites for the application of IRT models such as unidimensionality, local independence, and enough sample size are observed. This paper summarizes the steps in determining the scalability of a paragraph writing scale as well as measuring the distance between the points on the scale. The results of the Partial Credit Model, as one of the Polytomous IRT Models, reveal that this scale is sufficiently scalable with more or less differing distances between the points or scores. This paper closes with a recapitulation of the implications, applications, and limitations of this study.
Equating classroom pre and post tests under item response theory
by Jeff Stewart
Stewart, J. & Gibson, G. 14(2), October 2010 (10-17)
The authors illustrate how classroom pre-tests can be used to gather information for an item bank from which to construct summative tests with appropriate measurement properties, and detail methods for equating pre and post-test forms under item response theory in such a manner that resulting ability estimates between conditions are comparable.
Making Meaningful Measurement in Survey Research: A Demonstration of the Utility of the Rasch Model
Citation: Royal, K. D. (2010). Making meaningful measurement in survey research: A demonstration of the utility of the Rasch model. IR Applications, 28, pp. 1-16.
Quality measurement is essential in every form of research, including institutional research and assessment. This paper addresses the erroneous assumptions institutional researchers often make with regard to survey research and provides an alternative method for producing more valid and reliable measures. Rasch measurement models are discussed and a demonstration is provided, thus highlighting the utility of the Rasch models in higher education research and practice.
The Big Five personality traits and foreign language speaking confidence among Japanese EFL students
Apple, M. (2011a). Unpublished dissertation (Open Access).
This research examined the relationships between the Big Five human personality traits, favorable social conditions, and foreign language classroom speaking confidence. Four research questions were investigated concerning the validity of the Big Five for a Japanese university sample, the composition of Foreign Language Classroom Speaking Confidence, the degree to which the Big Five influenced Foreign Language Classroom Speaking Confidence, and the degree to which perceptions of classroom climate affect Foreign Language Classroom Speaking Confidence.
The first stage of the research involved three pilot studies that led to the revision of the Big Five Factor Marker questionnaire and the creation of a new instrument for measuring foreign language classroom speaking confidence that included both cognitive and social factors as theorized in mainstream social anxiety research. The second stage of the research involved the collection and analysis of data from 1,081 participants studying English in 12 universities throughout Japan. Data were analyzed using a triangulation of Rasch analysis, exploratory factor analysis (EFA), and confirmatory factor analysis (CFA) in order to verify the construct validity of the eleven hypothesized constructs. Following validation of the measurement model, the latent variables were placed into a structural regression model, which was tested by using half of the data set as a calibration sample and confirmed by using the second half of the data set as a validation sample.
The results of the study indicated the following: (a) four of the five hypothesized Big Five personality traits were valid for the Japanese sample; (b) Foreign Language Classroom Speaking Confidence comprised three measurement variables, Foreign Language Classroom Speaking Anxiety, Perceived Foreign Language Speaking Self-Competence, and Desire to Speak English; (c) Emotional Stability and Imagination directly influenced Foreign Language Classroom Speaking Confidence; and (d) Current English Classroom Perception and Perceived Social Value of Speaking English directly influenced Foreign Language Classroom Speaking Confidence. The findings thus demonstrated a link between personality, positive classroom atmosphere, and foreign language classroom speaking confidence. The implications of the findings included the possibility that foreign language anxiety is not situation-specific as theorized, and that improved social relations within the foreign language classroom might help reduce speaking anxiety.
Embedding measurement within existing computerized data systems: Scaling clinical laboratory and medical records heart failure data to predict ICU admission
Fisher, W. P., Jr., & Burton, E. (2010). Embedding measurement within existing computerized data systems: Scaling clinical laboratory and medical records heart failure data to predict ICU admission. Journal of Applied Measurement, 11(2), 271-287.
This study employs existing data sources to develop a new measure of intensive care unit (ICU) admission risk for heart failure patients. Outcome measures were constructed from laboratory, accounting, and medical record data for 973 adult inpatients with primary or secondary heart failure. Several scoring interpretations of the laboratory indicators were evaluated relative to their measurement and predictive properties. Cases were restricted to tests within the first lab draw that included at least 15 indicators. After optimizing the original clinical observations, a satisfactory heart failure severity scale was calibrated on a 0-1000 continuum. Patients with unadjusted CHF severity measures of 550 or less were 2.7 times more likely to be admitted to the ICU than those with higher measures. A nomogram facilitates routine clinical application. Existing computerized data systems could be programmed to automatically structure clinical laboratory reports using the results of studies like this one to reduce data volume with no loss of information, make laboratory results more meaningful to clinical end users, improve the quality of care, reduce errors and unneeded tests, prevent unnecessary ICU admissions, lower costs, and improve patient satisfaction. Existing data typically examined piecemeal form a coherent scale measuring heart failure severity sensitive to increased likelihood of ICU admission. Marked improvements in ROC curves were found for the aggregate measures relative to individual clinical indicators.
Physical disability construct convergence across instruments: Towards a universal metric
Fisher, W. P., Jr. (1997). Physical disability construct convergence across instruments: Towards a universal metric. Journal of Outcome Measurement, 1(2), 87-113.
Meaning and method in the social sciences.
Fisher, W. P., Jr. (2004, October). Meaning and method in the social sciences. Human Studies: A Journal for Philosophy and the Social Sciences, 27(4), 429-54.
Abstract. Academia’s mathematical metaphysics are briefly explored en route to an elaboration of the qualitatively rigorous requirements underpinning the calibration and unambiguous interpretation of quantitative instrumentation in any science. Of particular interest are Gadamer’s emphases on number as the paradigm of the noetic, on the role of play in interpretation, and on Hegel’s sense of method as the activity of the thing itself that thought experiences. These point toward and overlap with (1) Latour’s study of the metrological social networks through which technological phenomena are brought into language as modes of being that can be understood, and (2) the way that Rasch’s models for measurement comprise a potential beginning for metaphysically astute, qualitatively and quantitatively integrated, mathematical methods in the social sciences. The paper closes with observations on the general problem that is philosophy, the need to remain open to multiplicities of meaning even as clear understandings are sought and obtained.
Evidence of a Structural Effect for Alcohol Outlet Density: a Multilevel Analysis
Scribner, R. A., Cohen, D. A., & Fisher, W. P. (2000, Feb). Evidence of a structural effect for alcohol outlet density: A multilevel analysis. Alcoholism: Clinical & Experimental Research, 24(2), 188-95.
BACKGROUND: Ecological studies reveal that alcohol-related outcomes tend to occur in high alcohol outlet density neighborhoods. The ecological design of these studies limits the interpretation of the findings in terms of the level of the effect. The effect of alcohol outlet density could be related to greater individual access to alcohol, an individual level effect, or to the grouping of drinkers by neighborhood, a structural effect at the neighborhood level. METHODS: To differentiate between individual and neighborhood level possibilities, we conducted a multilevel study. Individual distance to the closest alcohol outlet was the individual level measure of the effect of alcohol outlet density whereas the mean distance to the closest alcohol outlet for all individuals within a census tract was the neighborhood level measure for the effect of alcohol outlet density. We analyzed telephone surveys of 2604 telephone households within 24 census tracts stratified by poverty status and alcohol outlet density. Individual distance to alcohol outlets, age, sex, race/ethnicity, and level of education were entered as individual level covariates, and their corresponding aggregated means were entered as census tract level covariates (i.e., mean distance to outlets, mean age, percentage male, percentage Black, mean education). RESULTS: Analysis of variance revealed that 16.2% of the variance in drinking norms and 11.5% of the variance in alcohol consumption were accounted for at the census tract level.
In multivariate hierarchical analysis, individual distance to the closest alcohol outlet was unrelated with drinking norms and alcohol consumption, whereas mean distance to the closest alcohol outlet demonstrated a negative relation with drinking norms (beta = -5.50 ± 2.37) and with alcohol consumption (beta = 0.477 ± 0.195); that is, the higher the mean distance to the closest alcohol outlet, the lower the mean drinking norms score and mean level of alcohol consumption. CONCLUSIONS: The findings suggest that the effect of alcohol outlet density on alcohol-related outcomes functions through an effect at the neighborhood level rather than at the individual level. Problem drinkers tend to be grouped in neighborhoods, an effect predicted by alcohol outlet density.
The cash value of reliability
Fisher, W. P., Jr. (2008, Summer). The cash value of reliability. Rasch Measurement Transactions, 22(1), 1160-3.
Mindfulness in measurement: Reconsidering the measurable in mindfulness
Solloway, S., & Fisher, W. P., Jr. (2007). Mindfulness in measurement: Reconsidering the measurable in mindfulness. International Journal of Transpersonal Studies, 26, 58-81.
Can an organic partnership of qualitative and quantitative data confirm the value of mindfulness practice as an assignment in undergraduate education? Working from qualitative evidence suggesting the existence of potentially measurable mindfulness effects expressed in ruler measures, a previous study calibrated a mathematically invariant scale of mindfulness practice effects with substantively and statistically significant differences in the measures before and after the assignment. Current efforts replicated these results. The quantitative model is described in measurement terms defined at an introductory level. Detailed figures and appendices are provided, and a program of future research is proposed.
Survey design recommendations
Fisher, W. P., Jr. (2006). Survey design recommendations. Rasch Measurement Transactions, 20(3), 1072-4.
Interpretation, validity, measurement, and mathematics
Fisher, W. P., Jr. (2007). Interpretation, validity, measurement, and mathematics. Measurement: Interdisciplinary Research and Perspectives, 5(2-3), 165-70.
The American Educational Research Association (AERA) 2007 Call for Proposals points out the need for researchers in education to look beyond their usual sources to other fields that have extensive experience in relevant theories and methods. In transitioning from a response to AERA's Call for Proposals to the text of these articles on Assessing Measures of Mathematical Knowledge for Teaching, one cannot help but be struck by (1) the heavy and repeated emphasis on interpretive arguments, their structure, and the relation of interpretation to validity; and (2) the complete lack of any use of, or even reference to, the decades and volumes of research and theorizing that have been invested in these issues in philosophy, history, anthropology, literary studies, sociology, psychology, and, most pointedly, the philosophy, history, and social studies of science. The papers presented in this journal issue provoke precisely the reaction that the authors of AERA's Call for Proposals must also have often experienced when reading research in education. The authors are trying to reinvent concepts, methods, and tools that already have long-standing histories of success and failure in other fields. In this article, the author offers his critique of the papers presented in this journal issue.
Reliability Statistics
Fisher, W. P., Jr. (1992). Reliability statistics. Rasch Measurement Transactions, 6(3), 238.
Reliabilities are often reported as though they were invariable characteristics of tests. Of course, they are not. They depend not only on the construction of the test, but also on the distribution of the examinee sample tested. Conventionally, only person separation reliability is reported, but item separation statistics are also useful indicators. They tell how well this sample of examinees has spread out the items along the measure of the test, and so defined a meaningful variable.
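The dependence of reliability on the sample's distribution can be illustrated with a minimal sketch. It assumes the commonly used definition of separation reliability as (observed variance minus mean error variance) divided by observed variance; the function name, the two samples, and the standard errors are hypothetical values chosen for illustration and do not come from the cited note.

```python
from statistics import pvariance


def separation_reliability(measures, standard_errors):
    """Estimate separation reliability from measures and their standard errors.

    Assumed definition: (observed variance - mean error variance) / observed variance.
    """
    observed_var = pvariance(measures)
    error_var = sum(se ** 2 for se in standard_errors) / len(standard_errors)
    # Clamp at zero so that noisy data cannot yield a negative "true" variance.
    true_var = max(observed_var - error_var, 0.0)
    return true_var / observed_var


# Hypothetical person measures (in logits) for two samples on the same test,
# each measured with the same precision (SE = 0.5 logits per person).
wide_sample = [-2.0, -1.0, 0.0, 1.0, 2.0]
narrow_sample = [-1.0, -0.5, 0.0, 0.5, 1.0]
standard_errors = [0.5] * 5

wide_reliability = separation_reliability(wide_sample, standard_errors)
narrow_reliability = separation_reliability(narrow_sample, standard_errors)
print(round(wide_reliability, 3), round(narrow_reliability, 3))
```

With identical measurement precision, the more widely spread sample yields the higher reliability, which is the sense in which the statistic describes the sample as well as the test. Passing item calibrations and their standard errors to the same function would give an item separation reliability under the same assumed definition.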

