Measuring agreement in medical informatics reliability studies

J Biomed Inform. 2002 Apr;35(2):99-110. doi: 10.1016/s1532-0464(02)00500-2.

Abstract

Agreement measures are used frequently in reliability studies that involve categorical data. Simple measures like observed agreement and specific agreement can reveal a good deal about the sample. Chance-corrected agreement in the form of the kappa statistic is used frequently because of its correspondence to an intraclass correlation coefficient and its ease of calculation, but its magnitude depends on the tasks and categories in the experiment. When the goal is to improve the reliability of an instrument or of the raters, it is helpful to separate the components of disagreement. Approaches based on modeling the decision-making process can be helpful here, including tetrachoric correlation, polychoric correlation, latent trait models, and latent class models. Decision-making models can also be used to better understand the behavior of different agreement metrics. For example, if the observed prevalence of responses in one of two available categories is low, then there is insufficient information in the sample to judge raters' ability to discriminate cases; in that situation kappa may underestimate the true agreement while observed agreement may overestimate it.
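As a brief illustration of the measures discussed in the abstract, the sketch below computes observed agreement, specific (positive and negative) agreement, and Cohen's kappa from a hypothetical 2 x 2 table for two raters. The table counts and function name are illustrative assumptions, not data from the paper; the low-prevalence example shows the pattern the abstract describes, with high observed agreement but a modest kappa.

    # Minimal sketch (not from the paper): agreement measures for two raters
    # classifying cases as "present"/"absent" in a hypothetical 2 x 2 table.

    def agreement_measures(a, b, c, d):
        """a = both present, b = rater1 present / rater2 absent,
        c = rater1 absent / rater2 present, d = both absent."""
        n = a + b + c + d
        po = (a + d) / n                   # observed agreement
        p_pos = 2 * a / (2 * a + b + c)    # specific (positive) agreement
        p_neg = 2 * d / (2 * d + b + c)    # specific (negative) agreement
        # expected chance agreement from each rater's marginal rates
        p1 = (a + b) / n
        p2 = (a + c) / n
        pe = p1 * p2 + (1 - p1) * (1 - p2)
        kappa = (po - pe) / (1 - pe)       # chance-corrected agreement
        return po, p_pos, p_neg, kappa

    # Low prevalence of "present": observed agreement is 0.94,
    # yet kappa is only about 0.37 -- the effect noted in the abstract.
    print(agreement_measures(a=2, b=3, c=3, d=92))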

Publication types

  • Research Support, Non-U.S. Gov't
  • Research Support, U.S. Gov't, P.H.S.
  • Review

MeSH terms

  • Decision Making
  • Humans
  • Medical Informatics / standards*
  • Medical Informatics / statistics & numerical data*
  • Models, Theoretical
  • Reproducibility of Results