Matter 3 - Validity of continued use of utilities
Background to matter arising
The sponsor raised the following in its letter requesting review: ‘PBAC concern with regard to continuing to use the same utility values in spite of the sponsor’s efforts to address these concerns in its responses – paragraph 4 of the PBAC minutes’.
The nature of the PBAC’s concern is stated in the minutes of the March 2006 PBAC meeting (item 7.8.25): ‘The PBAC remained concerned about the continuing use of the same utilities and disutilities in the model where the sensitivity analyses indicate the model is sensitive to the assumptions used to derive the incremental utility estimates from the trial-based outcome measures.’
The sponsor’s concerns, as outlined in its November 2005 submission, relate to the PBAC’s response to earlier submissions, which stated that “the utilities in the base case model might be skewed downward since the experts may not have accounted for some fractures being asymptomatic. The Model should separate out the utilities for a painful and non-painful vertebral fracture rather than relegating this to a sensitivity analysis.”
The sponsor argues that the comparison of its AQoL-derived utility weight (0.217), which is substantially lower than the utility elicited for a ‘good outcome’ vertebral fracture using time trade-off (0.257), is not valid. The sponsor goes on to state that the comparison is not relevant for several reasons:
- When the submission figure is adjusted to account for only 30% of fractures being symptomatic, the utility is 0.266, which is close to the Salkeld 0.258 utility.
- It is inappropriate to compare utilities elicited using different methods, such as HUI and EQ-5D (PBAC overview for the July 2005 meeting, PBAC OVR 7.6.1).
- Patients in the model have severe previous fractures. Therefore, their utility with a further fracture is expected to be below the Salkeld et al 0.259 utility for an initial or subsequent fracture.
On the question of separating out the utilities for a painful and non-painful vertebral fracture rather than relegating this to a sensitivity analysis, the sponsor states that this was excluded from the utility survey “as it was felt that the experts could better evaluate the proportion of fractures that would cause painful symptoms rather than providing a figure (i.e. 70%). However, to account for this unlikely oversight by the experts, a sensitivity analysis assumes that 70% of fractures would be asymptomatic”.
The sponsor further states: “The evaluators acknowledge the possibility that the 70% figure for the asymptomatic portion is, in fact, less in this sub set [7.6 EVAL.22]. In this case the acute and chronic fracture utilities in the model would actually be skewed upwards leading to a conservative analysis.”
Reviewer’s understanding of key issues
Based on the matters raised by the sponsor regarding utilities, the specific matters reviewed are:
- whether the continued use of the same utilities and disutilities in the cost-utility model in the submission is sufficiently justified;
- whether the AQoL-derived utility weights incorporated the experts’ judgement on the proportion of fractures that would cause painful symptoms and, if the experts’ valuations do not include such a judgement, whether the sensitivity analysis adjustments for the disutility associated with asymptomatic fractures are plausible; and
- whether the AQoL-derived utility weights for vertebral fractures are comparable with utilities derived for other health states using TTO or other multi-attribute instruments.
Materials considered
All materials considered by the PBAC relating to the derivation of utility weights were reviewed. In particular, the key reference documents (page 50) used in making judgements about concepts and methods for deriving the utility weights in the submissions included the PBAC Guidelines for cost effectiveness analysis submissions and the textbook by Drummond et al, Methods for the Economic Evaluation of Health Care Programmes.
Reviewer’s opinion
1. Is the continued use of the utility weights used in the submission sufficiently justified?
(a) Choice of QoL measurement instrument
In the modeled economic evaluation of the July 2005 and March 2006 submissions, quality of life was accepted as an appropriate final outcome of therapy, hence the need for a utility-based measure of quality of life to generate a QALY measure. For the economic evaluation, the sponsor uses a QALY framework, which requires the derivation of QALY weights (referred to as ‘utility’ weights by the sponsor).
According to the PBAC Guidelines, “Where a quality of life instrument is used, details should be provided on the instrument. Because currently there is controversy over which quality of life instruments are most acceptable, special attention should be paid to the following parameters:
- the validity of the instrument;
- the reliability of the instrument;
- the responsiveness of the instrument to differences in health states between individuals and to changes in health states over time experienced by any one individual; and
- the clinical importance of any differences detected by the instrument.”
The generally accepted requirements for generating a QALY measure are summarized by Drummond et al, who state that to “satisfy the QALY concept ... the quality weights must be (a) based on preferences, (b) anchored on perfect health and death, and (c) measured on an interval scale.”
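The conditions quoted above can be illustrated with a short calculation. The sketch below is a minimal example, assuming undiscounted QALYs computed as the sum of each health state's weight multiplied by the time spent in it, on a scale anchored at 1.0 for perfect health and 0.0 for death. The function name and the example weights and durations are hypothetical and are not taken from the submissions.

```python
from typing import Sequence, Tuple

# A health state is represented as (weight, duration_in_years).
# Assumption: weights are anchored so that 1.0 = perfect health and 0.0 = death;
# negative weights (states valued as worse than death) are allowed.
HealthState = Tuple[float, float]


def qalys(states: Sequence[HealthState]) -> float:
    """Return undiscounted QALYs: the sum of weight x duration over all states."""
    total = 0.0
    for weight, years in states:
        if weight > 1.0:
            raise ValueError("a weight cannot exceed the perfect-health anchor of 1.0")
        if years < 0.0:
            raise ValueError("a duration cannot be negative")
        total += weight * years
    return total


# Hypothetical profile: 1 year at weight 0.5, then 2 years at weight 0.25.
print(qalys([(0.5, 1.0), (0.25, 2.0)]))
```

Because the calculation multiplies weight by time, one year at a weight of 0.5 counts the same as six months in perfect health. That trade-off is only meaningful if the weights are measured on an interval scale anchored at death and perfect health, which is the point of the second and third conditions.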
The sponsor uses the Assessment of Quality of Life (AQoL) Instrument to derive utility weights for five health states related to the comparator and outcomes of drug therapy.
Conclusion
The AQoL instrument would be regarded as meeting guideline requirements for a valid and reliable instrument that is sensitive to clinically important changes in health status. The choice of the AQoL instrument is therefore justified in the context of this submission.
(b) Protocol for the derivation of utilities
Background
The current PBAC guidelines do not require the sponsor to outline a protocol for generating utilities. However, a protocol is the best means by which a sponsor can justify the final utility weights used in its QALY model. An example of a framework for developing a protocol for utility measurement and valuation, by Furlong et al, is contained in Appendix 1 (page 51). Because the submission contains no protocol for the derivation of utilities, elements of the utility survey are not adequately justified in the text of the submission.
Instead, the sponsor describes the process by which a group of clinical experts were asked to complete a postal survey using the AQoL instrument. The utility weights used in the QALY calculation were based on the responses of 8 experts, although there are 9 declaration forms from experts contained in Appendix J of the July 2005 submission.
To assess whether the sponsor has adequately justified the utilities used in the QALY model, each element of utility measurement and valuation is addressed in the review. The elements of utility measurement and valuation include:
- Choosing health states and descriptors
- Selecting a measurement instrument
- Whose preferences (respondents and sample size)
i. Health states and descriptors
The March 2005 and November 2005 sponsor submissions state that utility weights were derived for five health states:
- the comparator (base case) – “An individual with experience of 3 vertebral fractures one of which was a severe grade vertebral fracture (SQ3 grade)”, and
- four additional health states, describing the QoL for an individual at 2 weeks, 6 months, 12 months and up to 10 years after an additional moderate vertebral fracture.
The health state descriptors and durations are appropriate, with one exception: the use of the word ‘moderate’ in the health state descriptions, which is crucial to the valuation of the potential outcomes. The economic model makes no distinction as to the type of vertebral fracture avoided, beyond accounting for clinically evident and asymptomatic vertebral fractures in the sensitivity analysis. The descriptor ‘moderate’ may represent a simplification of the range of possible vertebral fracture outcomes (from asymptomatic through to severe), but it may also bias the QoL measurement because it makes no distinction between symptomatic and asymptomatic vertebral fractures.
Conclusion
The health state descriptors do not distinguish between asymptomatic and clinically evident vertebral fractures. This is discussed further in the next section.
ii. Measurement instrument
The AQoL is a multiattribute utility instrument whose utility-based scoring algorithm was developed using both the time trade-off (TTO) and person trade-off (PTO) methods. The sponsor omitted three QoL questions from the survey (those concerned with vision, hearing and speech) on the grounds that these would be unaffected by vertebral fracture; this was stated on the survey form received by the respondents. It is inappropriate to omit questions from a standardized survey instrument, even if their exclusion is highly unlikely to affect the final valuation.
Conclusion
The AQoL is an appropriate utility-based QoL measurement instrument.
iii. Whose preferences?
The AQoL scoring algorithm is based on responses from a community sample, so the values used in the sponsor’s QALY estimates are appropriate. The key question is whether the measurement process, in which 9 clinicians were asked to map the health states into the AQoL, was adequately justified.
The PBAC Guidelines provide no information on whose preferences should be used to map health states in a multiattribute utility instrument, nor any guidance on the sample size required for this exercise. The sponsor does not provide an adequate justification for the selection of the 9 experts or for the sample size.
Reviewer’s summary
The use of the AQoL multiattribute utility instrument is appropriate for the purpose of deriving utility weights for the vertebral fracture health states. The instrument is valid and reliable and uses community values to ascertain a utility score. The role of the nine clinical experts in mapping the 5 health states onto the AQoL index was appropriate, but the selection of the experts and the sample size were inadequately justified. The health state descriptors do not distinguish between asymptomatic and clinically evident vertebral fractures.
Conclusion
Both the PBAC guidelines for deriving health state utility weights and the sponsor’s justification of its approach to deriving the utility weights are inadequate.
2. Did the AQoL-derived utility weights incorporate the experts’ judgement on the proportion of fractures that would cause painful symptoms, and are the sensitivity analysis adjustments for the disutility associated with asymptomatic fractures plausible?
Background
In its comments on the re-submission for consideration at the March 2006 meeting of the PBAC, the sponsor says: ‘The re-submission included a modelled economic evaluation using alendronate as the comparator. This model was constructed to be deliberately conservative in the use of the clinical and epidemiological evidence over the ten-year period of the model. Including less conservative assumptions in the model would reduce the ICER substantially. However, all sensitivity analyses conducted were done so using this base case model. The evaluator has presented the table of sensitivity analyses presented in the re-submission (Table 83 of the submission) but has not commented on the two pages of discussion of the plausibility of these analyses. We have re-presented this Table, with this response, but have re-ordered the analyses based on the ICER (see Table [reproduced without alteration in this review as Table 7 on page 46]).
Several scenarios were included in the sensitivity analysis of this conservative model to determine the effect on the modelled evaluation. Essentially these analyses suggest that the cost-effectiveness ratios are sensitive to changes in assumptions; however, the incremental cost-effectiveness ratio for teriparatide remains below $45,000-$75,000[1] per QALY gained for all sensitivity analyses, with the exceptions being the most extreme and least plausible analyses. Even under these scenarios the ratio remains in the range $45,000-$105,000[2]. The cost-effectiveness of teriparatide is insensitive to the assumptions around residential care, the mortality risk increase following vertebral fracture, and the discount rate.
A two-way sensitivity analysis was carried out, following that performed by the evaluators in the PES commentary on the March 2005 submission. In that commentary, an analysis was presented where the disutilities from new fracture were low and used in conjunction with the upper 95% confidence limit for fracture relative risk (0.51) for the entire GHAC population.
When these scenarios are used together with the mean relative risk of fracture for alendronate (0.533), the ICER increases to >$200,000[3]. However, to accept this sensitivity analysis as plausible would assume that a new fracture in an SQ3 patient would not be associated with any disutility. As we have presented in the submission, these patients are already suffering from multiple fractures, experience clinical levels of pain and have a low HR QoL. New fractures in this patient will be associated with further pain and disability and negative impacts on the activities of daily living; thus it is reasonable to expect that new fractures will further compound the patient’s condition and will be associated with further disutility.’
1. ICER replaced with ICER range, consistent with PBAC procedures for PSD
2. As above
3. As above
(a) Accounting for asymptomatic vertebral fractures in the QALY model
Key issues
The main issue here is how the QALY model accounts for an estimated 70% of vertebral fractures being asymptomatic and 30% being symptomatic. A further issue is the exact proportion of vertebral fractures that are asymptomatic in the subset of patients modeled in the economic evaluation.
Background to matter arising
- In its November 2005 submission, the sponsor states that “the major concern that the PES and ESC had with the utilities was that the descriptions of the health states in the utility study might not capture the fact that around 70% of vertebral fractures are asymptomatic”. The sponsor goes on to say that the PES itself pointed out that “one would expect experts to be aware that most morphometric fractures are not clinically evident” [July 2005 Commentary on the Resubmission 7.6.22].
In the PBAC July 2005 minutes, the PES statement quoted above was accompanied by a clearly stated concern about the impact of not explicitly stating this fact in the utility survey. The July 2005 PBAC minutes state that “although one would expect experts to be aware that most morphometric vertebral fractures are not clinically evident, it may be that the survey by its structure and/or implementation skewed the utilities downward by not mentioning this fact prominently in the instructions.... The fact that the scenario to be assessed involves a patient with a severe vertebral fracture and that it is one of three vertebral fractures in total is prominently emphasized, and reinforced by noting that the next vertebral fracture encountered is ‘moderate.’ On the other hand, it is possible that the 70% figure for the asymptomatic portion is, in fact, less in this sub-set. No good data is available regarding any aspect of this topic, which makes the derivation of utilities highly uncertain and a particularly important element to capture in the assessment of uncertainty. In all cases it seems evident that the primary model should use utilities that account for a significant portion of vertebral cases being asymptomatic, rather than relegating this scenario to a sensitivity analysis.”
Reviewer’s opinion: survey design and QALY model
The problem here lies in the QoL survey design and the QALY model. The probability of entering a health state should be considered separately from the task of valuing that health state. Hence the AQoL health utility survey should have included separate health states for a symptomatic and an asymptomatic fracture. The proportion of asymptomatic vertebral fractures (be it 70% or otherwise) would then be a transition probability in the economic model, as would the probability of having a symptomatic fracture. Each probability would be multiplied by the utility weight for the corresponding health state (symptomatic or asymptomatic vertebral fracture).
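A minimal sketch of the structure described above, assuming a simple expected-value calculation for a single new fracture. The class, the function and the utility values used here are hypothetical and do not come from the sponsor's model; the point is only that the two health states are valued separately and the asymptomatic proportion enters as a probability.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FractureStateUtilities:
    """Hypothetical utility weights for two separately valued health states."""
    symptomatic: float
    asymptomatic: float


def expected_fracture_utility(u: FractureStateUtilities, p_asymptomatic: float) -> float:
    """Probability-weighted utility after a new vertebral fracture.

    The valuation of each state (u) is kept separate from the probability of
    entering it (p_asymptomatic), which is a parameter of the economic model.
    """
    if not 0.0 <= p_asymptomatic <= 1.0:
        raise ValueError("p_asymptomatic must lie between 0 and 1")
    p_symptomatic = 1.0 - p_asymptomatic
    return p_asymptomatic * u.asymptomatic + p_symptomatic * u.symptomatic


# Hypothetical weights; the proportion can be varied without re-surveying the experts.
weights = FractureStateUtilities(symptomatic=0.25, asymptomatic=0.5)
for p in (0.7, 0.5):
    print(p, round(expected_fracture_utility(weights, p), 4))
```

With this structure, a lower asymptomatic proportion in the SQ3 subset, as the PBAC suggested might apply, is handled by changing `p_asymptomatic` rather than by asking the experts to build the proportion into their valuations.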
Conclusion
Based on the data presented in the submission, there is no way of assessing whether the expert respondents did or did not consider the proportion of asymptomatic fractures in mapping the health states into the AQoL.
There is a flaw in the design of both the utility survey and the cost-utility model. The probability of entering a health state should have been separated from the task of valuing the health state.
(b) Reviewer’s opinion: sensitivity analysis
Where there is uncertainty in one or more parameters in the economic model, sensitivity analysis should be used to assess how varying the parameter(s) affects the study results. Sensitivity analysis is also used to quantify the level of uncertainty relating to the methodological assumptions of the study. In the two submissions under review here, there is uncertainty relating to the mean utility weight and to the method used to derive it. The sponsor has allowed for uncertainty in the method for deriving the utility weights, and in the utility score itself, by modeling two scenarios: (a) one in which the 70% of fractures assumed to be asymptomatic have one third of the disutility associated with clinical fractures, and (b) another in which those 70% are assumed to have no disutility relative to the baseline health state.
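The arithmetic behind the two scenarios can be sketched as follows, assuming (as the sponsor's adjustment implies) that the base-case fracture disutility is treated as that of a symptomatic fracture and that symptomatic fractures keep all of it. The function name is hypothetical. Under these assumptions, scenario (a) retains about 53% of the base-case disutility per fracture (0.3 + 0.7 × 1/3) and scenario (b) retains 30%.

```python
def retained_disutility_fraction(p_asymptomatic: float, asymptomatic_share: float) -> float:
    """Fraction of the base-case fracture disutility retained after adjustment.

    Symptomatic fractures keep the full base-case disutility; asymptomatic
    fractures keep `asymptomatic_share` of it.
    """
    p_symptomatic = 1.0 - p_asymptomatic
    return p_symptomatic * 1.0 + p_asymptomatic * asymptomatic_share


# The two sensitivity scenarios, with 70% of fractures assumed asymptomatic.
scenarios = {
    "(a) asymptomatic fractures carry 1/3 of the disutility": 1.0 / 3.0,
    "(b) asymptomatic fractures carry no disutility": 0.0,
}
for label, share in scenarios.items():
    fraction = retained_disutility_fraction(0.7, share)
    print(f"{label}: {fraction:.3f} of the base-case disutility")
```

Scenario (b) removes more of the disutility that treatment avoids, which is consistent with it yielding the smaller incremental QALY gain in Table 7 (0.097 against 0.154).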
Conclusion
This is an appropriate way to deal with the uncertainty surrounding the derivation of the final utility weights. It is plausible to assume that there is disutility associated with a symptomatic fracture in an SQ3 patient.
(c) Reviewer’s opinion: inclusion of asymptomatic vertebral cases
On the question of whether the base case in the QALY model should treat a significant proportion of vertebral fractures as asymptomatic, the sponsor and the PBAC differ on whether the experts’ valuations, made using the AQoL-based utility survey, already allow for a proportion of vertebral fractures being asymptomatic. There is no way to make an objective assessment of this without going back and asking the experts whether they did or did not consider the proportion of asymptomatic versus symptomatic fractures. This points to a flaw in the survey design: it was not clear whether the vertebral fracture health states represented a clinically evident or an asymptomatic fracture. It is inappropriate to expect the valuer (expert) to impute proportions when mapping the health state into the AQoL matrix. It is the health state that is being valued, not the probability of entering that state. The probability is a measure of the chance of entering the health state and rightly belongs as a separate parameter in the economic model.
Conclusion
The base case ICER (cost per QALY) should be based on the utility valuation of separate health states (symptomatic versus asymptomatic) and an appropriate transition probability of a person in the model entering each state. It is impossible to assess whether the sponsor’s base case cost per QALY allows for the proportion of symptomatic versus asymptomatic vertebral fractures because of a flaw in the design of the utility survey.
The sensitivity analysis includes two scenarios: one in which the 70% of fractures assumed to be asymptomatic have one third of the disutility associated with clinical fractures, and another in which those 70% have no disutility relative to the baseline health state. Both scenarios (a and b, described above) are clinically plausible and better reflect the likely impact of uncertainty surrounding both the method used to derive the utility weights and the final utility weights themselves.
Reviewer’s summary
The correct approach to modeling the utility/disutility associated with symptomatic and asymptomatic vertebral fractures would have been to value the health states separately in the AQoL expert survey. This was not done in the AQoL survey contained in the sponsor’s submission, and there is no way of assessing whether the expert respondents allowed for the proportion of asymptomatic fractures in their responses. Given this limitation, it is reasonable and plausible to include the two sensitivity analysis scenarios in which (a) the 70% of fractures assumed to be asymptomatic have one third of the disutility associated with clinical fractures and (b) those 70% have no disutility relative to the baseline health state.
3. The comparability of the AQoL-derived utility weights with other health states, such as the utility associated with a hip fracture
Background
The comparability of the acute vertebral fracture utility elicited using the AQoL (0.217) with other health states, such as hip fracture, may provide evidence on the validity of the utility weights used in the two submissions.
The sponsor states that a comparison of the acute vertebral fracture utility elicited using the AQoL (0.217) to the ‘good outcome’ hip fracture utility weight using TTO (Salkeld et al) is not relevant. Three reasons are given by the sponsor:
- the adjusted utility (allowing for only 30% of fractures being symptomatic) is 0.266, not 0.217, and 0.266 is close to the hip fracture utility of 0.31;
- the PES has stated that it is inappropriate to compare utilities elicited using different methods; and
- patients in the model have severe SQ3 fractures and their utility with a further fracture would be expected to be lower than the hip fracture utility weight.
Reviewer’s opinion
It is inappropriate to compare mean utility weights derived using different utility measurement instruments without some attempt to transform them into a common index to ensure comparability. Although most multiattribute utility instruments (such as the AQoL, HUI3 and EQ-5D) and direct health state valuation methods (such as the standard gamble (SG), TTO and PTO) all satisfy the conditions for the derivation of a QALY weight, the underlying theory and methodological assumptions behind each, in terms of deriving ‘utility’ weights, differ slightly. The usefulness of utility theory lies in ensuring that valuations across people, time and place are comparable. Because the underlying theory differs between the instruments listed above, a direct comparison is not possible. Therefore, the comparison with the Salkeld et al hip fracture utility weight of 0.31 is not relevant to the submission(s).
The impact of different approaches to utility valuation can be illustrated using the Harvard University-based Cost Effectiveness Registry, which holds hundreds of published preference weights (the registry does not use the word ‘utility’), classified by disease group and health state. A summary of preference weights for various vertebral fracture health states is presented in Table 6 (page 44).
Table 6: Other published preference weights for vertebral fracture for the period 1998 - 2001
Health state | Preference weight | Method | Valuer | Author |
1st year following vertebral fracture in post menopausal women | 0.64 | Standard Gamble | 42 women with osteoporosis | Coyle D et al 7 2001 |
Year with vertebral fracture | 0.9 | Clinician judgment | Clinician | Willis M et al 8 2001 |
Vertebral fracture 1st year | 0.704 | Unknown | Clinician | Armstrong K et al 9 2001 |
Vertebral fracture subsequent years | 0.858 | Unknown | Clinician | Armstrong K et al 9 2001 |
Source: http://www.tufts-nemc.org/cearegistry/data/default.asp
In the absence of a standardized approach to QALY weight measurement, it is known that different utility measurement techniques will produce different weights for identical health states. Where comparability of QoL weights is important to resource allocation decisions, as it can be for PBAC, there is considerable merit in recommending a standardized approach to utility-based weights for QALY ratios.
Reviewer’s summary for Matter 3
- The continued use of the utility weights used in the submission is not sufficiently justified. The base case ICER (cost per QALY) should be based on the utility valuation of separate health states (symptomatic versus asymptomatic) and an appropriate transition probability of entering each state.
- It is impossible to assess whether the sponsor’s base case cost per QALY allows for the proportion of symptomatic versus asymptomatic vertebral fractures because of a flaw in the design of the utility survey. The one-way sensitivity analysis adjustments for the disutility associated with asymptomatic fractures ($45,000-$75,000[4] for one-third disutility and $75,000-$105,000[5] for zero disutility) are plausible.
- The AQoL-derived utility weights for vertebral fracture health states are not directly comparable with the utility weights associated with a hip fracture, which are derived using a different utility measurement technique.
4. ICER replaced with ICER range, consistent with PBAC procedures for PSD
5. As above
Recommendations
That the PBAC make explicit in the guidelines the steps required to identify, measure and value QoL outcomes for inclusion in a QALY-based cost-utility analysis. To a large extent this has been achieved in the PBAC Guidelines Draft for Consultation, July 2006. The 2006 draft Guidelines require a more detailed justification of the choice of utility measurement approach (a multiattribute utility instrument (MAUI) or a scenario-based approach) and of the criteria used to select a particular instrument.
Table 7: Sensitivity analyses in the modelled evaluation
Change from baseline | Incremental cost | Incremental QALYs | Incremental cost per QALY gained |
1. Vertebral fracture relative risk d. Teriparatide upper confidence interval, alendronate lower confidence interval | <$15,000 | -0.075 | Dominated |
11. Comparator is placebo | | 0.533 | <$15,000 |
1. Vertebral fracture relative risk c. Teriparatide lower confidence interval, alendronate upper confidence interval | | 0.485 | <$15,000 |
6. Treatment benefit duration b. Remains constant to 10 years | | 0.314 | $15,000-$45,000 |
7. Vertebral fracture utilities b. Decreased by 10% | | 0.306 | $15,000-$45,000 |
8. Costs b. Vertebral fracture acute cost increased by 50% | | 0.264 | $15,000-$45,000 |
12. Non-vertebral fractures included | | 0.276 | $15,000-$45,000 |
1. Vertebral fracture relative risk a. Teriparatide lower confidence interval, alendronate lower confidence interval | | 0.295 | $15,000-$45,000 |
1. Vertebral fracture relative risk f. Antiresorptive relative risk from MORE SQ3 subgroup analysis (0.74) | | 0.277 | $15,000-$45,000 |
9. Discounting a. Costs and outcomes undiscounted | | 0.287 | $15,000-$45,000 |
8. Costs d. Residential care cost increased by 50% | | 0.264 | $15,000-$45,000 |
Baseline | | 0.264 | $15,000-$45,000 |
7. Vertebral fracture utilities e. Residential care disutility excluded | | 0.263 | $15,000-$45,000 |
8. Costs e. Residential care cost decreased by 50% | | 0.264 | $15,000-$45,000 |
3. Relative risk of mortality a. Relative risk of mortality after vertebral fracture = 1.66 | | 0.260 | $15,000-$45,000 |
10. Two vertebral fractures at baseline | | 0.259 | $15,000-$45,000 |
4. No increased risk of residential care after vertebral fracture | | 0.263 | $15,000-$45,000 |
3. Relative risk of mortality b. No increased mortality after vertebral fracture | | 0.247 | $15,000-$45,000 |
9. Discounting b. Costs and outcomes discounted at 10% per annum | | 0.244 | $15,000-$45,000 |
5. Starting age in model b. Starting age is 75 years | | 0.240 | $15,000-$45,000 |
8. Costs c. Vertebral fracture acute cost decreased by 50% | | 0.264 | $15,000-$45,000 |
7. Vertebral fracture utilities a. Increased by 10% | | 0.222 | $15,000-$45,000 |
5. Starting age in model a. Starting age is 65 years | | 0.215 | $15,000-$45,000 |
8. Costs a. All non-drug costs excluded | | 0.264 | $15,000-$45,000 |
6. Treatment benefit duration a. Expires at 5 years | | 0.201 | $15,000-$45,000 |
7. Vertebral fracture utilities c. Adjusted so 70% have 1/3 disutility | | 0.154 | $45,000-$75,000 |
2. Lower increase in vertebral fracture risk after fracture (2.2) | | 0.160 | $45,000-$75,000 |
1. Vertebral fracture relative risk e. Relative risk for teriparatide uses entire OP patient sample (0.35) | | 0.140 | $45,000-$75,000 |
1. Vertebral fracture relative risk b. Teriparatide upper confidence interval, alendronate upper confidence interval | | 0.116 | $45,000-$75,000 |
7. Vertebral fracture utilities d. Adjusted so 70% have no disutility (zero disutility if asymptomatic) | | 0.097 | $75,000-$105,000 |
Two-way sensitivity analysis: teriparatide upper confidence interval for vertebral fracture relative risk, and vertebral fracture utilities adjusted so 70% of fractures have no disutility | | 0.006 | >$200,000 |
Note: ICERs replaced with ranges, consistent with PBAC procedures for PSD.
Source: Response to the Pre-PBAC consultation – Re-submission for consideration at the March 2006 Meeting of PBAC.
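The incremental cost per QALY gained in Table 7 is the incremental cost divided by the incremental QALYs, with a label instead of a ratio when one option dominates. The sketch below is illustrative only: the function is hypothetical, and because the table reports ranges rather than incremental costs, a single hypothetical incremental cost is held fixed to show how the ratio grows as the incremental QALYs shrink.

```python
from typing import Union


def icer(delta_cost: float, delta_qaly: float) -> Union[float, str]:
    """Incremental cost per QALY gained for a therapy versus its comparator.

    Returns a label rather than a ratio when one option dominates the other.
    A ratio returned with both deltas negative (cheaper but less effective)
    is savings per QALY forgone, not cost per QALY gained.
    """
    if delta_qaly <= 0 and delta_cost >= 0:
        return "dominated"  # no QALY gain and no saving
    if delta_qaly >= 0 and delta_cost <= 0:
        return "dominant"  # no extra cost and no QALY loss
    return delta_cost / delta_qaly  # delta_qaly cannot be zero here


# Incremental QALYs from Table 7; the incremental cost is a hypothetical value.
HYPOTHETICAL_DELTA_COST = 5_000.0
for label, d_qaly in [("Baseline", 0.264), ("7c", 0.154), ("7d", 0.097), ("Two-way", 0.006)]:
    print(label, icer(HYPOTHETICAL_DELTA_COST, d_qaly))
print("1d", icer(HYPOTHETICAL_DELTA_COST, -0.075))
```

Holding the incremental cost fixed is a simplification; in the sponsor's analyses both the incremental cost and the incremental QALYs change between scenarios. The sketch does, however, show why the two-way analysis, with an incremental gain of only 0.006 QALYs, produces a ratio far above the others.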