How to Calculate an Inter-Rater Agreement

Inter-rater agreement is a measure of how consistent measurements are between two or more observers (or "raters"). For example, medical researchers often use a group of physicians to observe and categorize the effects of an experimental treatment. Measuring the consistency between observers is important because we want to be sure that any effects we see are due to the experiment and not to differences between the observers. The primary statistic used to measure inter-rater agreement is called "kappa."

Instructions

    • 1

      Calculate percent agreement by dividing the number of observations on which the raters agree by the total number of observations. Consider the example of an experiment involving 100 patients given a treatment for asthma. Two physicians are asked to categorize each patient into one of two categories: controlled (yes) or not controlled (no) asthma symptoms. The results might look like this:

      Both Raters Yes 70

      Both Raters No 15

      Rater #1 Yes/Rater #2 No 10

      Rater #1 No/Rater #2 Yes 5

      Total observations: 100

      In this example, the percent agreement (Pa) would be the number of observations in which the raters agree (70+15) divided by the total observations (100), or 85 percent. This will be used in the numerator in the final equation.
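
      This arithmetic is easy to script. Below is a minimal Python sketch of the calculation, using hypothetical variable names for the four cells of the agreement table above.

      # Hypothetical cell counts from the 2x2 agreement table above
      both_yes = 70   # both raters: controlled
      both_no = 15    # both raters: not controlled
      yes_no = 10     # rater 1 yes, rater 2 no
      no_yes = 5      # rater 1 no, rater 2 yes

      total = both_yes + both_no + yes_no + no_yes   # 100 observations
      pa = (both_yes + both_no) / total              # percent agreement
      print(pa)                                      # 0.85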

    • 2

      Calculate the expected agreement due to random chance. Some of the observed agreement will be due to pure luck. We can calculate how much agreement would be seen if the observations were completely random.

      In our example:

      Both Yes 70

      Both No 15

      #1Yes/#2 No 10

      #1 No/#2 Yes 5

      Total observations: 100

      Dr. #1 rated Yes 80/100 or 80 percent of the time (.80).

      Dr. #1 rated No 20/100 or 20 percent of the time (.20).

      Dr. #2 rated Yes 75/100 or 75 percent of the time (.75).

      Dr. #2 rated No 25/100 or 25 percent of the time (.25).

      If the observations were random, the probability of both rating Yes would be (.80)x(.75), or 60 percent, and the probability of both rating No would be (.20)x(.25), or 5 percent.

      So the total probability of agreement due to chance (Pe) would be the sum of these: (.60)+(.05), or 65 percent. This will be used in both the numerator and the denominator of the final equation.
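
      Continuing the Python sketch from Step 1 (the counts are repeated here so the snippet runs on its own), the chance agreement follows from each rater's marginal rates:

      both_yes, both_no, yes_no, no_yes, total = 70, 15, 10, 5, 100

      # Marginal "yes"/"no" rates for each rater
      r1_yes = (both_yes + yes_no) / total   # 0.80
      r1_no = (both_no + no_yes) / total     # 0.20
      r2_yes = (both_yes + no_yes) / total   # 0.75
      r2_no = (both_no + yes_no) / total     # 0.25

      # Agreement expected if the raters answered independently at these rates
      pe = r1_yes * r2_yes + r1_no * r2_no   # 0.60 + 0.05 = 0.65
      print(pe)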

    • 3

      Calculate kappa by using the following equation:

      k = (Pa - Pe) / (1 - Pe)

      In our example,

      k = (.85 - .65) / (1 - .65) = .57, or 57 percent
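
      Plugging the two values from Steps 1 and 2 into the same Python sketch:

      pa, pe = 0.85, 0.65            # from Steps 1 and 2
      kappa = (pa - pe) / (1 - pe)   # Cohen's kappa
      print(round(kappa, 2))         # 0.57

      If the raw per-patient labels are available as two lists, the same statistic can also be computed directly with a library routine such as scikit-learn's cohen_kappa_score.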

    • 4

      Evaluate kappa to determine if inter-rater agreement is strong. A good rule of thumb is that kappas above 80 percent are excellent and above 60 percent are good. Anything below 60 percent is considered less than optimal agreement.

      In our example, 57 percent is on the borderline of good inter-rater agreement, so study results should be interpreted with this in mind.

    • 5

      Evaluate kappa for more than two raters by using Fleiss' kappa calculations. This is a much more complicated calculation that should be done by computer.
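
      A minimal sketch of that computer calculation, assuming the Python statsmodels package is installed (its inter_rater module includes a Fleiss' kappa routine); the ratings matrix below is made up for illustration, with one row per patient and one column per rater:

      import numpy as np
      from statsmodels.stats.inter_rater import aggregate_raters, fleiss_kappa

      # Hypothetical ratings: 1 = controlled, 0 = not controlled
      ratings = np.array([
          [1, 1, 1],
          [1, 0, 1],
          [0, 0, 0],
          [1, 1, 0],
          [0, 1, 0],
      ])

      table, _ = aggregate_raters(ratings)   # per-patient counts in each category
      print(fleiss_kappa(table))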
