Kappa Statistic vs. Percent Agreement

Some researchers have expressed concern about κ's tendency to take the observed categories' frequencies as givens, which can make it unreliable for measuring agreement in situations such as the diagnosis of rare diseases. In these situations, κ tends to underestimate the agreement on the rare category.[17] For this reason, κ is considered an overly conservative measure of agreement.[18] Others[19][citation needed] contest the assertion that kappa "takes into account" chance agreement. To do this effectively would require an explicit model of how chance affects rater decisions. The so-called chance adjustment of kappa statistics supposes that, when not entirely certain, raters simply guess, which is a very unrealistic scenario. Both percent agreement and kappa have strengths and limitations. Percent agreement statistics are easy to calculate and can be interpreted directly. Their key limitation is that they do not take into account the possibility that raters guessed on some scores.
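For reference, κ corrects the observed proportion of agreement by the agreement expected from each rater's marginal category frequencies. The standard definition is

$$
\kappa = \frac{p_o - p_e}{1 - p_e}, \qquad p_e = \sum_{k} p_{1k}\, p_{2k},
$$

where p_o is the observed proportion of agreement and p_{1k}, p_{2k} are the proportions of items each rater assigned to category k. As a purely hypothetical illustration of the rare-category effect described above (all numbers are invented): with 100 cases, p_o = 0.92, and marginal proportions of 0.93/0.07 for one rater and 0.95/0.05 for the other on the common and rare categories, p_e = 0.93 × 0.95 + 0.07 × 0.05 = 0.887, so κ = (0.92 − 0.887) / (1 − 0.887) ≈ 0.29 despite 92% raw agreement.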

Percent agreement may therefore overestimate the true agreement between raters. Kappa was designed to account for the possibility of guessing, but the assumptions it makes about rater independence and other factors are not well supported, and it may therefore lower the estimate of agreement excessively. In addition, kappa cannot be interpreted directly, and so it has become common for researchers to accept low kappa values in their inter-rater reliability studies. Low inter-rater reliability is not acceptable in health care or clinical research, especially when study results may change clinical practice in ways that lead to poorer patient outcomes. Perhaps the best advice for researchers is to calculate both percent agreement and kappa, as sketched in the example below. If guessing among raters is likely, it may make sense to use the kappa statistic; but if the raters are well trained and little guessing is likely, the researcher may safely rely on percent agreement to determine inter-rater reliability. As Marusteri and Bacarea noted (9), there is never 100% certainty about research results, even when statistical significance is achieved.
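As a minimal sketch of calculating both measures, the following Python example (the rating lists and function names are hypothetical, reusing the invented numbers from the illustration above) computes observed percent agreement and Cohen's kappa from two raters' labels:

```python
from collections import Counter

def percent_agreement(ratings_a, ratings_b):
    """Proportion of items on which the two raters give the same label."""
    matches = sum(a == b for a, b in zip(ratings_a, ratings_b))
    return matches / len(ratings_a)

def cohens_kappa(ratings_a, ratings_b):
    """Cohen's kappa: agreement corrected for chance, estimated from each rater's marginals."""
    n = len(ratings_a)
    p_o = percent_agreement(ratings_a, ratings_b)
    freq_a = Counter(ratings_a)
    freq_b = Counter(ratings_b)
    # Expected chance agreement: sum over categories of the product of the raters' marginal proportions
    categories = set(ratings_a) | set(ratings_b)
    p_e = sum((freq_a[c] / n) * (freq_b[c] / n) for c in categories)
    return (p_o - p_e) / (1 - p_e)

# Hypothetical ratings with a rare category ("pos" is uncommon)
rater_1 = ["neg"] * 93 + ["pos"] * 7
rater_2 = ["neg"] * 90 + ["pos"] * 5 + ["neg"] * 5

print(percent_agreement(rater_1, rater_2))  # 0.92: high raw agreement
print(cohens_kappa(rater_1, rater_2))       # ~0.29: much lower after the chance correction
```

Reporting both numbers lets readers see how much of the raw agreement the chance correction removes, which is especially informative when one category is rare.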