Agreement Assessment without Bland-Altman Limits
Abhaya Indrayan
Suppose a company claims to have devised an improved glucometer that measures blood glucose level from a finger prick almost as accurately as a laboratory using conventional venous sampling, thus obviating the need for venous sampling. The company says that its glucometer reading adjusts for the higher values obtained in capillary blood. It approached a clinic to conduct a study on a sample of subjects and record each subject's reading by the glucometer and by the laboratory. The question was whether these readings were in sufficient agreement. If yes, the glucometer, being much more convenient, can be recommended in place of venous sampling.
The method of choice for assessing such quantitative agreement is the one developed by Bland and Altman (B-A). This method requires calculation of the Gaussian distribution-based B-A limits, given by (mean of the differences - 2*SD of the differences, mean of the differences + 2*SD of the differences). These limits turned out to be (–7.62, +9.83) mg/dL for the persons included in the study. The B-A method requires that the B-A limits be compared with prefixed clinical tolerance limits to decide whether agreement exists.
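The calculation of the B-A limits described above can be sketched in a few lines of Python. This is only an illustration: the function name `bland_altman_limits`, the parameter `k`, and the readings are assumptions made for the example and are not the study data.

```python
from statistics import mean, stdev

def bland_altman_limits(test, reference, k=2.0):
    """Return (lower, upper) B-A limits for paired readings.

    Differences are taken as test - reference. k=2 follows the formula
    in the text; 1.96 is another common choice.
    """
    if len(test) != len(reference):
        raise ValueError("paired readings must have equal length")
    diffs = [t - r for t, r in zip(test, reference)]
    centre = mean(diffs)
    spread = stdev(diffs)  # sample SD with n - 1 in the denominator
    return centre - k * spread, centre + k * spread

# Hypothetical readings in mg/dL, invented for illustration only.
glucometer = [92, 101, 118, 135, 150, 176]
laboratory = [90, 100, 115, 134, 146, 168]
lower, upper = bland_altman_limits(glucometer, laboratory)
print(f"B-A limits: ({lower:.2f}, {upper:.2f}) mg/dL")
```

Note that a single large difference (8 mg/dL in the last pair) widens both limits, since it enters the mean as well as the SD.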
For clinical tolerance limits, the company claims that, despite the adjustment, a reading by its glucometer can still be higher than the laboratory reading but by no more than 5 mg/dL, and can be lower by no more than 2 mg/dL due to random error. Thus, the clinical tolerance limits are (–2, +5) mg/dL. The B-A limits were wider than these, and the conclusion was that the glucometer readings do not sufficiently agree with the laboratory readings of venous blood.
A detailed examination of the data revealed that most of the differences were small but a few were large, particularly in subjects with higher blood glucose levels. These large values distorted the mean and SD of the differences, and thus also the B-A limits. Almost 93% of the differences were within the clinical tolerance limits (–2, +5) mg/dL. Thus, direct use of clinical tolerance limits indicates that the two readings were sufficiently close in a large majority of subjects and that the agreement was good.
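The direct assessment amounts to counting how many differences fall inside the tolerance limits. A minimal sketch follows, with a hypothetical helper `proportion_within` and invented differences; the limits are treated as inclusive, which is an assumption based on the phrase "no more than".

```python
def proportion_within(diffs, lower_tol, upper_tol):
    """Share of differences lying inside [lower_tol, upper_tol]."""
    if not diffs:
        raise ValueError("no differences supplied")
    inside = sum(1 for d in diffs if lower_tol <= d <= upper_tol)
    return inside / len(diffs)

# Hypothetical differences (glucometer - laboratory) in mg/dL.
diffs = [2, 1, 3, 1, 4, 8, -1, 0, 5, 2]
share = proportion_within(diffs, -2, 5)
print(f"{share:.0%} of the differences are within (-2, +5) mg/dL")
```

Here the one large difference (8 mg/dL) simply counts as one disagreement and does not affect how the other nine are judged.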
This example shows that agreement can be assessed easily by direct use of clinical tolerance limits, without calculating B-A limits. For details, see https://www.preprints.org/manuscript/202108.0343/v1. The direct method is nonparametric and more robust, as there is no need to worry about the distribution of the differences or about outliers that can distort their mean and SD. It uses all the differences and not just their mean and SD, and is thus possibly more appealing too. The B-A limits use 95% coverage, which can be termed as arbitrary as the 5% level of significance. There is no need to use these arbitrary limits under the direct method.
There are several other advantages of direct use of clinical tolerance limits for assessing agreement. A further condition can be added if desired, for instance that not more than, say, 1% of the differences should be beyond twice the lower or the upper clinical tolerance limit. In our example, this condition for the lower limit would mean that not more than 1% of the differences should be less than –4 mg/dL. If the differences are likely to be proportional, as in this example (larger differences for higher blood glucose levels), the clinical tolerance limits can be set to reflect this without log-transformation. For example, the clinical tolerance limits can be 2% of the values obtained by the laboratory. Such flexibility is not available with B-A limits.
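Both refinements are easy to express in code. The sketch below uses hypothetical helpers (`proportion_within_relative`, `extreme_share`) and invented readings. It reads "2% of the laboratory value" as a symmetric band around the laboratory reading, which is an assumption, as the text does not say whether the band is symmetric.

```python
def proportion_within_relative(test, reference, pct=2.0):
    """Share of pairs whose difference is within pct% of the reference value."""
    # Compare |difference| * 100 with pct * reference to avoid dividing.
    inside = sum(1 for t, r in zip(test, reference)
                 if abs(t - r) * 100 <= pct * r)
    return inside / len(reference)

def extreme_share(diffs, lower_tol, upper_tol, factor=2.0):
    """Share of differences beyond factor times either tolerance limit."""
    extreme = sum(1 for d in diffs
                  if d < factor * lower_tol or d > factor * upper_tol)
    return extreme / len(diffs)

# Hypothetical readings in mg/dL, invented for illustration only.
glucometer = [101, 152, 203, 99, 311]
laboratory = [100, 150, 200, 100, 300]
diffs = [g - l for g, l in zip(glucometer, laboratory)]

relative_share = proportion_within_relative(glucometer, laboratory, pct=2.0)
far_out = extreme_share(diffs, -2, 5)  # beyond -4 or +10 mg/dL
condition_met = far_out <= 0.01        # the "not more than 1%" condition
```

With these made-up values, four of the five pairs are within 2% of the laboratory reading, and one difference (11 mg/dL) lies beyond twice the upper limit, so the additional 1% condition is not met.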
The B-A method of assessing agreement is iconic: several papers discuss its extensions, merits, and demerits, and thousands have used it for assessing agreement. The method is firmly entrenched, so a better alternative would require intensive scrutiny. It has indeed been extremely successful in distinguishing agreement between individual values from equivalence of group means and from high correlation. Even a regression with gradient = 1 and constant = 0 can occur when the paired values are very different. However, the method's dependence on pre-specified clinical tolerance limits makes the B-A limits redundant, and the assumption of a Gaussian distribution of the differences, and sometimes of the lower and the upper limits, makes it vulnerable.
For some, an advantage of the B-A method is that it gives a binary result: agreement exists or not. For others, percentage agreement by direct use of clinical tolerance limits may be more informative, as it can be interpreted in the clinical context of the specific problem. However, if a binary result is required, a cut-off of 90% can be suggested: if at least 90% of the differences are within the clinical tolerance limits, the agreement can be considered adequate, otherwise not. For better assurance, a lower confidence bound at a desired level can be obtained for the percentage agreement using the binomial distribution. In our example, where the agreement was 93%, the lower bound was 90.5%, which suggests that the agreement was adequate with a 90% cut-off.
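One way to obtain such a lower bound is the exact (Clopper-Pearson) one-sided bound, found here by bisection on the binomial tail probability. This is a sketch under assumptions: the text does not state which binomial method, confidence level, or sample size produced the 90.5% figure, so the function names, the 95% level, and the counts below are hypothetical.

```python
from math import comb

def upper_tail(n, x, p):
    """P(X >= x) for X ~ Binomial(n, p); suitable for moderate n."""
    return sum(comb(n, k) * p**k * (1 - p)**(n - k) for k in range(x, n + 1))

def lower_confidence_bound(x, n, confidence=0.95):
    """One-sided exact lower bound for a proportion with x successes in n."""
    if x == 0:
        return 0.0
    alpha = 1 - confidence
    lo, hi = 0.0, x / n
    # The tail probability increases with p, so bisect for tail == alpha.
    for _ in range(60):
        mid = (lo + hi) / 2
        if upper_tail(n, x, mid) < alpha:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

# Hypothetical counts: 186 of 200 differences within tolerance (93%).
bound = lower_confidence_bound(186, 200)
adequate = bound >= 0.90  # binary decision with the suggested 90% cut-off
print(f"lower bound = {bound:.3f}, adequate = {adequate}")
```

The bound depends strongly on the number of subjects: the same 93% agreement gives a lower bound closer to 93% in a larger sample, so the binary conclusion can change with sample size.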
As mentioned earlier, the B-A method is firmly entrenched and extensively used. Any alternative, such as the direct use of clinical tolerance limits suggested here, needs to be thoroughly examined before it is accepted.