Agreement Between Raters


Agreement studies between raters are often conducted to assess the reliability of diagnostic and screening tests. Many screening and diagnostic results are recorded on an ordered categorical scale. For example, radiologists use the Breast Imaging Reporting and Data System (BI-RADS) scale to classify breast density on screening mammograms. BI-RADS is an ordinal classification scale with four categories ranging from A (almost entirely fatty) to D (extremely dense), reflecting increasing breast density [1]. Agreement and association measures both provide useful summaries for ordinal classifications. Agreement measures focus on exact agreement (i.e., where raters assign exactly the same category to a subject's test result), while association measures take into account the degree of correspondence between the raters' classifications. For example, the level of disagreement between two raters who independently classify the same mammogram into categories A and D is greater than the level of disagreement between two raters who each independently classify the same mammogram into categories A and B. Association measures are therefore sometimes viewed as weighted agreement measures, in which pairs of more similar rater classifications are assigned a higher weight ("credit").
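To make the idea of weighted "credit" concrete, the short Python sketch below builds one common (but by no means the only) choice of weights, a linear agreement-weight matrix for the four BI-RADS density categories. The linear scheme and the example values are illustrative assumptions, not part of the original example.

    import numpy as np

    # Linear agreement weights for the four BI-RADS density categories (A-D):
    # exact matches receive full credit (1), and the credit decreases
    # with the number of categories separating the two raters' choices.
    categories = ["A", "B", "C", "D"]
    k = len(categories)
    i, j = np.indices((k, k))
    weights = 1 - np.abs(i - j) / (k - 1)
    print(weights)
    # Row A reads: A-vs-A = 1.00, A-vs-B = 0.67, A-vs-C = 0.33, A-vs-D = 0.00,
    # so an A-vs-B disagreement earns more credit than an A-vs-D disagreement.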

In ordinal classifications, association measures are often preferred over, or used in combination with, (exact) agreement measures. Association measures take into account the extent of disagreement between raters, giving more credit to pairs of classifications that agree more closely. Figure 1 illustrates the difference between agreement and association. An agreement measure gives credit only to exact matches between the raters' classifications; any discordant classification receives no "credit" [Figure 1a]. For an association measure, exact matches between raters receive the highest "credit," classifications that differ by one category receive the second-highest credit, classifications that differ by two categories receive the third-highest credit, and so on [Figure 1b]. As a result, association measures are generally higher than agreement measures. In the example of Figure 1, Cohen's kappa for the agreement between the two raters is 0.40, while Cohen's weighted kappa for the association is 0.65. The intraclass correlation coefficient (ICC), a related reliability measure, ranges from 0 to 1; Koo and Li interpret ICC values below 0.5 as indicating poor reliability, 0.5 to 0.75 moderate, 0.75 to 0.9 good, and above 0.90 excellent reliability [27].
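The values quoted above (kappa = 0.40, weighted kappa = 0.65) come from the Figure 1 data, which are not reproduced here. The sketch below shows how Cohen's kappa and a linearly weighted kappa can be computed for two raters; the rating data and the choice of linear weights are illustrative assumptions. For real analyses, scikit-learn's cohen_kappa_score (with weights='linear' or 'quadratic') computes the same statistics.

    import numpy as np

    def cohens_kappa(rater1, rater2, categories, weighting=None):
        """Cohen's kappa; weighting='linear' gives the linearly weighted version."""
        k = len(categories)
        index = {c: n for n, c in enumerate(categories)}
        # Observed classification table as proportions
        obs = np.zeros((k, k))
        for a, b in zip(rater1, rater2):
            obs[index[a], index[b]] += 1
        obs /= obs.sum()
        # Expected table under chance (outer product of the marginals)
        exp = np.outer(obs.sum(axis=1), obs.sum(axis=0))
        # Disagreement weights: 0 on the diagonal, growing with category distance
        i, j = np.indices((k, k))
        if weighting == "linear":
            w = np.abs(i - j) / (k - 1)
        else:  # unweighted: every off-diagonal cell counts as full disagreement
            w = (i != j).astype(float)
        return 1 - (w * obs).sum() / (w * exp).sum()

    # Hypothetical BI-RADS density calls from two raters (not the Figure 1 data)
    r1 = list("AABBCCDDAB")
    r2 = list("ABBBCDDDAA")
    cats = ["A", "B", "C", "D"]
    print("Cohen's kappa          :", round(cohens_kappa(r1, r2, cats), 2))
    print("Linearly weighted kappa:", round(cohens_kappa(r1, r2, cats, "linear"), 2))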
