Thursday, August 8, 2013

How much of a Bayesian posterior distribution falls inside a region of practical equivalence (ROPE)

The posterior distribution of a parameter shows explicitly the relative credibility of the parameter values, given the data. But sometimes people want to make a yes/no decision about whether a particular parameter value is credible, such as the "null" values of 0.0 difference or 0.50 chance probability. For making decisions about null values, I advocate using a region of practical equivalence (ROPE) along with a posterior highest density interval (HDI). The null value is declared to be rejected if the (95%, say) HDI falls completely outside the ROPE, and the null value is declared to be accepted (for practical purposes) if the 95% HDI falls completely inside the ROPE. This decision rule accepts the null value only when the posterior estimate is precise enough to fall within a ROPE. The decision rule rejects the null only when the posterior exceeds the buffer provided by the ROPE, which protects against hyper-inflated false alarms in sequential testing. And the decision rule is intuitive: You can graph the posterior, its HDI, its ROPE, and literally see what it all means in terms of the meaningful parameter being estimated. (For more details about the decision rule, its predecessors in the literature, and alternative approaches, see the article on this linked web page.)

But how do we set the limits of the ROPE? How big is "practically equivalent to the null value"? There is typically no uniquely correct answer to this question, because it depends on the particular domain of application and the current practice in that domain. But the rule does not need a uniquely correct ROPE, it merely needs a reasonable ROPE that is justifiable to the audience of the analysis. As a field matures, the limits of practical equivalence may change, and therefore what may be most useful to an audience is a description of how much of the posterior falls inside or outside the ROPE as a function of the width of the ROPE. The purpose of this blog post is to provide some examples and an R program for doing that.

Consider a situation in which we want to estimate an effect size parameter, which I'll call δ. Suppose we collect a lot of data so that we have a very precise estimate, and the posterior distribution shows that the effect size is very nearly zero, as plotted in the left panel below:
We see that zero is among the 95% most credible values of the effect size, but we can ask whether we should decide to "accept" the value zero (for practical purposes). If we establish a ROPE around zero, from -0.1 to +0.1 as shown above, then we see that the 95% HDI falls entirely within the  ROPE and we accept zero. What if we think a different ROPE is appropriate, either now or in the future? Simply display how much of the posterior distribution falls inside the ROPE, as a function of the ROPE size, as shown in the right panel above. The curve plots how much  of the  posterior distribution falls inside a ROPE centered at the null value, as a function of the radius (i.e., half width) of the ROPE. As a landmark, the plot shows dashed lines at the 95% HDI limit farthest from the null value. From the plot, readers can decide for themselves.

Here is another example. Suppose we are spinning a coin (or doing something more interesting that has dichotomous outcomes) and we want to estimate its underlying probability of heads, denoted θ. Suppose we collect a lot of data so that we have a precise estimate, as shown in the left panel below.
We see that the chance value of 0.50 is among the 95% most credible values of the underlying probability of heads, but we can ask whether we should decide to "accept" the value 0.50. If we establish a ROPE around zero, from 0.49 to 0.51 as shown above, we can see that the 95% HDI falls entirely within the ROPE and we would decide to accept 0.50. But we can provide more information by displaying the amount of the posterior distribution inside a ROPE centered on the null value as a function of the width of the ROPE. The right panel, above, allows the readers to decide for themselves.

The plots can be used for cases of rejecting the null, too. Suppose we spin a coin and the estimate shows a value apparently not at chance, as shown in the left panel below.
Should we reject the null value? The 95% HDI falls entirely outside the ROPE from 0.49 to 0.51, which means that the 95% most credible values are all not practically equivalent to 0.5. But what if we used a different ROPE? The posterior distribution, along with the explicit marking of the 95% HDI limits, is all we really need to see what the biggest ROPE could be and still let us reject the null. If we want more detailed information, the panel on the right, above, reveals the answer --- although we need to mentally subtract from 1.0 to get the posterior area not in the ROPE (on either side).

Here is the R script I used for creating the graphs above. Notice that it defines a function for plotting the area in the ROPE.

#------------------------------------------------------------------------------

# From http://www.indiana.edu/~kruschke/DoingBayesianDataAnalysis/Programs/ get
# openGraphSaveGraph.R, plotPost.R, and HDIofMCMC.R. Put them in the working
# directory of this script, or specify the path in the next lines' source()
# commands:
source("./openGraphSaveGraph.R")
source("./plotPost.R")
source("./HDIofMCMC.R")

# Define function for computing area in ROPE and plotting it:
plotAreaInROPE = function( mcmcChain , maxROPEradius , compVal=0.0 ,
                           HDImass=0.95 , ... ) {
  ropeRadVec = seq( 0 , maxROPEradius , length=201 ) # arbitrary comb
  areaInRope = rep( NA , length(ropeRadVec) )
  for ( rIdx in 1:length(ropeRadVec) ) {
    areaInRope[rIdx] = ( sum( mcmcChain > (compVal-ropeRadVec[rIdx])
                            & mcmcChain < (compVal+ropeRadVec[rIdx]) )
                         / length(mcmcChain) )
  }
  plot( ropeRadVec , areaInRope ,
        xlab=bquote("Radius of ROPE around "*.(compVal)) ,
        ylab="Posterior in ROPE" ,
        type="l" , lwd=4 , col="darkred" , cex.lab=1.5 , ... )
  # From http://www.indiana.edu/~kruschke/DoingBayesianDataAnalysis/Programs/
  # get HDIofMCMC.R. Put it in the working directory of this script, or specify
  # the path in the next line's source() command:
  # source("./HDIofMCMC.R")
  HDIlim = HDIofMCMC( mcmcChain , credMass=HDImass )
  farHDIlim = HDIlim[which.max(abs(HDIlim-compVal))]
  ropeRadHDI = abs(compVal-farHDIlim)
  areaInFarHDIlim = ( sum( mcmcChain > (compVal-ropeRadHDI)
                         & mcmcChain < (compVal+ropeRadHDI) )
                      / length(mcmcChain) )
  lines( c(ropeRadHDI,ropeRadHDI) , c(-0.5,areaInFarHDIlim) ,
         lty="dashed" , col="darkred" )
  text( ropeRadHDI , 0 ,
        bquote( atop( .(100*HDImass)*"% HDI limit" ,
                     "farthest from "*.(compVal) ) ) , adj=c(0.5,0) )
  lines( c(-0.5,ropeRadHDI) ,c(areaInFarHDIlim,areaInFarHDIlim) ,
         lty="dashed" , col="darkred" )
  text( 0 , areaInFarHDIlim , bquote(.(signif(areaInFarHDIlim,3))) ,
        adj=c(0,1.1) )
  return( list( ROPEradius=ropeRadVec , areaInROPE=areaInRope ) )
}

#------------------------------------------------------------------------------

# Generate a fictitious MCMC posterior:
m = 0.03
s = (0.1-m)/3
paramMCMCsample = rnorm(50000,m,s)

# Specify the desired comparison value, ROPE radius, and HDI mass:
compVal = 0.0
ropeRad = 0.1
HDImass = .95

# Open graphics window and specify layout:
openGraph(width=7.5,height=3.5)
par(mar=c(3.5,3.5,2,1),mgp=c(2,0.7,0))
layout( matrix( c(1,2) , nrow=1 , byrow=FALSE ) )

# Plot the posterior with HDI and ROPE:
postInfo = plotPost( paramMCMCsample , compVal=compVal ,
                     ROPE=compVal+c(-ropeRad,ropeRad) , showMode=TRUE ,
                     credMass=HDImass , xlab=expression(delta*" (parameter)") )

# Plot the area in ROPE:
ropeInfo = plotAreaInROPE( mcmcChain=paramMCMCsample , maxROPEradius=0.15 ,
                           compVal=compVal , HDImass=HDImass )


#------------------------------------------------------------------------------

9 comments:

  1. Interesting to read more about ROPE! It does feel a little bit like a Bayesian twist on hypothesis testing. A large ROPE results in a higher chance of accepting the null while a tight ROPE (tightrope?) gives a higher chance of rejecting the null. It would be interesting to see a paper where the ROPE decision rule is compared to other possible decision rules (Neyman–Pearson, Bayes factor, etc.).

    By the way, since you call it a ROPE, would it be ok to introduce some extra rope related nomenclature? For example, when the ROPE null can't be rejected nor accepted this could be a "tie", a reasonably tight ROPE could be a "tightrope", an unreasonable tight ROPE could be a "knot", etc.

    ReplyDelete
  2. I like your string of thought, and it might lead to a long thread of loosely tied responses. Wherever these PUNishable comments take us, let's agree in advance not to noose our sense of humor!

    ReplyDelete
  3. Very nice post and useful function. A version of BEST package with it is now on GitHub and I'll submit to CRAN after I've run some more checks.

    BTW is it clear that the landmark "HDI limit farthest" is the radius at which the HDI lies entirely within the ROPE? Ie. your preferred decision rule for accepting the hypothesis.

    ReplyDelete
  4. Mike:

    That's a good point you make about the meaning of the farthest HDI limit.

    And thank you again for creating the BEST package!

    --John

    ReplyDelete
  5. Is there a way to frame this criterion as the solution to the minimiation of the posterior expected loss for some loss function? If so, what would be the corresponding loss function?

    I started a question here: http://stats.stackexchange.com/questions/103067/rope-vs-hdi-loss-function and it would be great to get your input or someone elses on it.

    Thanks for the great post!

    ReplyDelete
  6. 2 questions on ROPE:

    1. Which do you think more appropriate as a decision criterion a) require the HDI interval to be within ROPE, or b) set a lower limit on the posterior probability contained within ROPE.

    2. What about a confidence level? For instance consider the philosophy behind a Bayesian fixed-in-advance beta-content tolerance interval (e.g., see Wolfinger) in which both a content (proportion) and a confidence level are specified. Specifying a confidence level provides a level of assurance that the result is repeatable. I think this question may apply mostly to a posterior predictive situation.

    Thank you for your thoughts!!

    Dave LeBlond (david.leblond@sbcglobal.net)

    ReplyDelete
  7. > 1. Which do you think more appropriate as a decision criterion a) require the HDI interval to be within ROPE, or b) set a lower limit on the posterior probability contained within ROPE.

    I take the posterior density seriously, which is why it makes sense to me to consider the relation of the HDI to the ROPE. The decision is to "accept" the null value is based on the 95% most probable values all being practically equivalent to the null value. The decision to "reject" he null value is based on the 95% most probable values all not being practically equivalent to the null value.

    Suppose instead that we accept a ROPEd value if (say) 95% of the posterior mass falls within the ROPE, and we reject a ROPEd value if (say) 95% of the posterior mass falls outside the ROPE. Then we would get weird conclusions when distributions are strongly skewed. Consider Figure 12.2, p.342, of DBDA2E, which shows a skewed distribution with its 95% HDI and its 95% equal-tailed interval (ETI). Suppose the ROPE boundaries are at the ETI boundaries. Then we would accept the ROPEd value despite the fact that there are parameter values with high posterior probability that are not practically equivalent to the ROPEd value. Or, suppose that the right side of the ROPE is at the left edge of the ETI in Figure 12.2. Then we would reject the ROPEd value despite the fact that there are parameter values with high posterior probability that are practically equivalent to the ROPEd value.

    > 2. What about a confidence level? For instance consider the philosophy behind a Bayesian fixed-in-advance beta-content tolerance interval (e.g., see Wolfinger) in which both a content (proportion) and a confidence level are specified. Specifying a confidence level provides a level of assurance that the result is repeatable. I think this question may apply mostly to a posterior predictive situation.

    I don't have much to say about this. I think, though, that you're leaning toward a frequentist criterion applied to a Bayesian posterior distribution. There's nothing inherently wrong with that, but it's a different animal than what's being pursued with the HDI and ROPE. With a frequentist criterion you need to make assumptions about your sampling intention --which you can do-- but then the criterion is linked to your specific sampling assumptions.

    ReplyDelete
  8. Somewhat related to the loss function question, why use the HDI and not just compute P(\theta outside ROPE)? If it is "high" it seems like reason to reject the ROPE.

    ReplyDelete
  9. Dear Anonymous March 18:

    I think that question is already answered (perhaps implicitly) in my last comment. I like a decision rule that treats the densities seriously. To illustrate with an extreme contrived example, suppose that 98% of the distribution is outside the ROPE, but it is so spread out that its density is actually lower than the 2% of the distribution inside the ROPE. Then the most credible (highest density) values are inside the ROPE, even if most of the distribution is outside the ROPE.

    That's why I like a decision rule that actually uses the highest-density values. This isn't a uniquely correct decision rule, it's just one that makes a lot of sense if you treat densities seriously.

    ReplyDelete