Some studies consider the question “Do two measurements of a characteristic of a subject by two methods, at two sites, or by two observers sufficiently agree with one another?” The objective is to find whether one can be replaced with the other without much loss of information. When the measurements are quantitative, such as hemoglobin level and creatinine level, the method of choice for assessing this agreement is the one developed by Bland and Altman.
The Bland-Altman (B-A) method requires the calculation of the limits (𝑑̅ – 2sd, 𝑑̅ + 2sd), where 𝑑̅ is the mean and sd is the standard deviation (SD) of the individual differences d = x – y. These limits are popularly known as Bland-Altman limits of agreement, although they are better understood as the limits of disagreement since they are based on the differences.
Under the Gaussian assumption, which is likely to hold because x and y are measuring the same quantity and the difference is likely to be just the measurement error, nearly 95 percent of the differences are expected to lie within the B-A limits. An adequate agreement is inferred when these limits are narrow in the sense that a difference within these limits “would not affect decisions on patient management”. Let us call such limits of indifference the clinical tolerance limits.
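The calculation of the B-A limits described above can be sketched in a few lines of Python. This is a minimal illustration only: the function name `bland_altman_limits` and the paired values are hypothetical and not taken from any study, and the sample SD (divisor n – 1) is assumed.

```python
from statistics import mean, stdev

def bland_altman_limits(x, y):
    """Return (d_bar - 2*sd, d_bar + 2*sd) for the paired differences d = x - y."""
    d = [xi - yi for xi, yi in zip(x, y)]
    d_bar = mean(d)   # mean of the individual differences
    sd = stdev(d)     # sample SD of the differences (divisor n - 1, an assumption)
    return d_bar - 2 * sd, d_bar + 2 * sd

# Hypothetical paired measurements, for illustration only
x = [10, 12, 11, 13]
y = [9, 12, 12, 12]
lower, upper = bland_altman_limits(x, y)
print(f"B-A limits: ({lower:.2f}, {upper:.2f})")
```

Whether these limits indicate agreement still has to be judged against the clinical tolerance limits, which the code does not use.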
The crucial aspect of the B-A limits is that their interpretation regarding agreement or no agreement entirely depends on the pre-specified limits of clinical tolerance. We argue in this communication that such limits of clinical tolerance can be directly used for assessing the extent of the quantitative agreement without calculating the B-A limits.
Direct Use of the Clinical Tolerance Limits: A Simple, Nonparametric, Robust, and More Appealing Alternative for Assessing Agreement
We propose the direct use of prespecified clinical tolerance limits to find the percentage of differences within these limits, and we call this the percentage agreement. Consider a pair of medical measurements (x, y) on a random sample of n subjects. The natural parameter of interest is the extent of agreement between the two measurements. Because of random fluctuations and possibly systematic differences, some difference between the observed values of x and y will almost invariably occur. Suppose the clinicians decide that this difference should not be less than 𝐶𝐿 or more than 𝐶𝑈 for it to be acceptable as being of no clinical consequence. For example, in the case of aspartate aminotransferase (AST), if these limits are set at ±2 U/L, a difference within these limits will be considered to have no clinical significance. (𝐶𝐿, 𝐶𝑈) are the clinical tolerance limits; they lie around zero but may or may not be symmetric.
Define the extent of agreement 𝜋 = 𝑃(𝐶𝐿 < 𝑑 < 𝐶𝑈). The estimate of 𝜋 is the proportion of the observed differences falling within (𝐶𝐿, 𝐶𝑈), which is a binomial proportion. For greater confidence, the 95% lower confidence bound for 𝜋 can be obtained by one of several methods; we recommend the Wilson score method, which is easy to implement and is generally considered to perform better. This bound gives the value below which the true proportion agreement is extremely unlikely to lie.
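A minimal sketch of this estimate and its lower bound follows. The function names are hypothetical, and the bound is assumed to be one-sided, so the standard normal quantile for 0.95 (about 1.645) is used rather than the two-sided 1.96; the strict inequalities mirror the definition of 𝜋 above.

```python
from math import sqrt
from statistics import NormalDist

def percentage_agreement(d, c_l, c_u):
    """Proportion of differences strictly inside the tolerance limits (c_l, c_u)."""
    inside = sum(1 for di in d if c_l < di < c_u)
    return inside / len(d)

def wilson_lower_bound(successes, n, confidence=0.95):
    """One-sided Wilson score lower confidence bound for a binomial proportion."""
    z = NormalDist().inv_cdf(confidence)  # one-sided quantile (assumption)
    p = successes / n
    centre = p + z * z / (2 * n)
    margin = z * sqrt(p * (1 - p) / n + z * z / (4 * n * n))
    return (centre - margin) / (1 + z * z / n)

# Hypothetical case: 90 of 100 differences fall within the tolerance limits
print(f"lower bound: {wilson_lower_bound(90, 100):.3f}")
```

The lower bound always lies below the observed proportion, and the gap narrows as n grows.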
This method measures the extent of agreement instead of giving a binary yes or no. Although dichotomization has its risks, we suggest a cut-off a little later for those who prefer a binary result stating whether or not agreement exists. However, many researchers these days would like to measure the exact extent of agreement and interpret it in their own context. This direct method is simpler, nonparametric, and immediately gives the percentage agreement. The percentage of differences within and beyond tolerance is more useful in deciding whether the agreement is adequate, and it assesses clinical agreement in the true sense since it is based on the clinical tolerance limits. This method uses all the individual differences and not just their mean and SD. Many clinicians would perhaps prefer to use the percentage agreement as the estimate of the extent of agreement but, if needed, the minimal agreement can be estimated by the lower confidence bound.
For those who prefer binary results, we recommend that at least 90% of the differences should be within the clinical tolerance limits to conclude an adequate agreement. In place of 90%, any other desired percentage can be chosen by the investigator depending on the clinical context. Some clinicians would want no more than 1 or 2 percent of the values to go beyond the clinical tolerance for agreement, and some may be willing to tolerate 10 percent or even more. Such flexibility is available under the direct method but not under the B-A method.
In an agreement setup, the tolerance limits should ideally be based on expected measurement error but can also be based on the clinical implication for managing a patient. If a researcher wants to add a condition, such as no difference should be more than two times the upper or lower tolerance limit, that can also be done in this method. Any big difference, howsoever isolated, raises the alarm regarding the agreement, and this method can be used to raise such an alarm.
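The binary rule and the optional extra condition can be combined as in the sketch below. All names (`assess_agreement`, `min_fraction`, `alarm_factor`) are hypothetical, the data are made up for illustration, and the tolerance limits are assumed to satisfy c_l ≤ 0 ≤ c_u so that multiplying them by a factor widens the interval.

```python
def assess_agreement(d, c_l, c_u, min_fraction=0.90, alarm_factor=2.0):
    """Apply the cut-off rule plus an alarm for isolated large differences.

    Assumes c_l <= 0 <= c_u. min_fraction and alarm_factor are
    investigator-chosen values, not fixed by the method.
    """
    fraction = sum(1 for di in d if c_l < di < c_u) / len(d)
    # Differences beyond alarm_factor times either tolerance limit
    alarms = [di for di in d if di < alarm_factor * c_l or di > alarm_factor * c_u]
    adequate = fraction >= min_fraction and not alarms
    return {"fraction": fraction, "alarms": alarms, "adequate": adequate}

# Hypothetical differences with tolerance limits of -2 and +2
result = assess_agreement([0.5] * 9 + [5.0], -2, 2)
print(result["fraction"], result["alarms"], result["adequate"])
```

Here 90% of the differences are within tolerance, yet the single difference of 5.0 exceeds twice the upper limit, so the sketch reports the agreement as inadequate.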
Unlike the B-A limits, the clinical tolerance limits to be used in our method do not have to be symmetric with respect to any value – they can be (–a to +b) where a ≠ b and a or b can be zero depending upon the clinical context.
For further details, see https://www.preprints.org/manuscript/202108.0343/v1
doi: 10.20944/preprints202108.0343.v1