Abstract
The analytical evaluation threshold (AET) establishes which chromatographic peaks, produced during organic extractables/leachables (E&L) screening, require toxicological safety risk assessment because the peaks are associated with compounds of potentially unacceptable toxicity. Thus, the AET protects patient safety as its proper application ensures that all potentially unsafe E&L are necessarily assessed. Generally, application of the AET involves the presumption that all organic E&L have the same detector response factor, an assumption that is not valid for any of the detection methods commonly used in E&L screening. Thus, the AET’s ability to be protective is compromised for poorly responding compounds, as they will appear to be below the AET when in fact they are not. This unacceptable outcome is addressed by adjusting the AET with an uncertainty factor (UF) whose value is dictated by the magnitude of response factor variation, with a larger variation resulting in a larger UF and a lower adjusted AET. Although the concept of the UF is straightforward, setting a generally accepted, scientifically valid, and practical value for the UF has been challenging. In this article, a database of relative response factors obtained for nearly 1200 E&L via the most commonly applied chromatographic screening methods (gas chromatography/mass spectrometry [GC/MS], liquid chromatography/mass spectrometry with atmospheric pressure chemical ionization [LC/MS-APCI], and LC/MS with electrospray ionization [LC/MS-ESI]) is used to justify UFs for these methods, individually and as a combined practice, based on the practical principle of “the point of diminishing returns”. Using this concept results in nearly 92% of the compounds in the database being properly flagged as above an AET adjusted with a UF = 3. 
Ninety-five percent (95%) coverage of the compounds can be achieved when a UF of 4 is applied to the combination of GC/MS and LC/MS methods or with other combinations of UF values applied to the various methods individually. Coverage is increased to 97% when a UF of 4 is individually applied to the GC/MS method and a UF of 10 is individually applied to the LC/MS methods. Furthermore, the available data suggest that application of both APCI and ESI ionization in LC/MS screening (as opposed to either method separately) provides the greatest coverage of E&L.
- Organic extractables and leachables
- Analytical evaluation threshold
- Uncertainty factor adjustment
- AET
- Screening
Introduction
When pharmaceutical drug products are packaged in a container closure system (CCS), substances (leachables) present initially in the CCS may leach into the packaged drug products. Similarly, when a medical device is used in its intended clinical manner, substances present in the medical device can leach into the patient via the medium of contact between the medical device and the patient’s body. Corresponding to leachables are extractables, which are substances that are present in laboratory extracts of the CCS or medical device and which therefore are potential leachables.
Leachables are clinically significant to the extent that they could produce an adverse safety effect in a patient who is exposed to them during the course of a medical therapy or procedure. Thus, drug products are chromatographically screened for organic leachables, packaging system extracts are screened for organic extractables, and medical devices are either screened for organic leachables or extracts of the devices are screened for organic extractables, where the purpose of the chromatographic screening is to discover, identify, and quantify these organic substances.
In certain experimental situations (such as aggressive or exaggerated extraction of a chemically complex CCS or medical device), chromatographic profiles of extractables or leachables can be complex, consisting of a multitude of chromatographic peaks. In such a situation, it may become analytically challenging to accomplish the necessary tasks of identification and quantitation for all the chromatographic peaks. Furthermore, addressing all chromatographic peaks may not be required if it can be established that leachables below a certain level are unlikely to adversely affect patient health and safety, regardless of their identity and underlying toxicity.
To this end, the concept of the analytical evaluation threshold (AET) was developed by an Extractables and Leachables Working Group of the Product Quality Research Institute (PQRI) (1) to facilitate the toxicological safety assessment of extractables and leachables. The AET establishes the level at or above which organic leachables in drug products or medical devices must be reported for toxicological safety risk assessment. In essence, the AET is a “protective” threshold in the sense that compounds at or above the AET are required to be toxicologically assessed, thereby protecting patient safety.
Routine application of the AET is predicated on an important underlying presumption, that the response factor (RF, see eq 1) for every organic leachable (extractable) and calibrating standard candidate (internal or surrogate) is the same; that is, equal concentrations of leachables (extractables) and calibration standards produce responses of equal magnitude. Unfortunately, this assumption is not valid with most commonly employed chromatographic detectors, and RFs can vary, sometimes quite substantially, from compound to compound.
$$\mathrm{RF} = \frac{R}{C} \tag{1}$$
This RF variation creates a significant problem for poorly responding leachables (i.e., leachables with low RFs), as the concentration-based relationship between such low responding leachables and the AET will be distorted. Thus, a poorly responding leachable, present in a sample at the AET level, will produce a chromatographic peak that appears to be below the AET. In this circumstance, a leachable that should be safety assessed (because it is at or above the AET level) will not be submitted for safety assessment (because its response appears to be below the AET), thereby compromising the AET’s ability to be protective.
The response equivalent to the AET can be established by using an internal standard (IS), also known as a surrogate standard, which is spiked into samples at a known concentration. The peak produced by the internal standard is used to establish that response which is attributable to an analyte present in the sample at the AET level. Thus, it is useful to think in terms of relative response to the IS and a relative response factor (RRF), where the RRF for an analyte relative to an internal standard is shown in eq 2.
$$\mathrm{RRF} = \frac{R_A / C_A}{R_I / C_I} \tag{2}$$
where:
R = response
C = concentration
A = analyte
I = Internal standard
The RRF value for the IS will be 1; compounds that respond more poorly than the IS will have an RRF <1, whereas compounds that respond more strongly than the IS will have an RRF >1.
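As a minimal sketch of the RRF calculation, taking the RRF as the analyte's response per unit concentration divided by that of the internal standard, the following may be helpful. The function name and the numerical values are illustrative assumptions and are not data from this study.

```python
def relative_response_factor(resp_analyte, conc_analyte, resp_is, conc_is):
    """RRF of an analyte relative to the internal standard (IS)."""
    if conc_analyte <= 0 or conc_is <= 0 or resp_is <= 0:
        raise ValueError("concentrations and the IS response must be positive")
    rf_analyte = resp_analyte / conc_analyte  # response factor of the analyte
    rf_is = resp_is / conc_is                 # response factor of the IS
    return rf_analyte / rf_is


# Hypothetical peak areas: analyte at 20 mg/L, internal standard at 10 mg/L.
rrf = relative_response_factor(resp_analyte=5.0e5, conc_analyte=20.0,
                               resp_is=1.0e6, conc_is=10.0)
print(round(rrf, 3))  # 0.25, i.e., a poorly responding compound
```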
The response variation complication in the use of the AET can be mitigated somewhat by adjusting the AET for RF differences between compounds. That is, an AET that is adjusted lower via an uncertainty factor (UF) will account for certain of the poorly responding compounds (eq 3).
$$\mathrm{AET}_f = \frac{\mathrm{AET}_e}{\mathrm{UF}} \tag{3}$$
where:
AET_f = final (adjusted) AET
AET_e = initial (estimated, unadjusted) AET
UF = the uncertainty factor
Clearly, the larger the value of the UF (and the lower the value of the final AET), the greater the number of poorly responding compounds that will be properly accounted for as having a response greater than or equal to the AET.
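The consequence of dividing the estimated AET by the UF can be sketched as follows. The sketch assumes a linear detector response and quantitation against the IS, so that a compound present at the estimated AET level appears at an apparent concentration of RRF × AET; under that assumption, the compound is flagged whenever its RRF is at least 1/UF. The function names are hypothetical.

```python
def adjusted_aet(aet_estimated, uf):
    """Final AET: the estimated AET lowered by the uncertainty factor."""
    if uf < 1:
        raise ValueError("UF is expected to be >= 1")
    return aet_estimated / uf


def is_flagged(rrf, uf, aet_estimated=1.0):
    """True if a compound present at the estimated AET level is reported.

    Assumes a linear response and quantitation against the internal
    standard, so the apparent concentration is rrf * aet_estimated.
    """
    apparent = rrf * aet_estimated
    return apparent >= adjusted_aet(aet_estimated, uf)


# A compound with RRF = 0.4 is missed without adjustment but caught with UF = 3.
print(is_flagged(0.4, uf=1))  # False
print(is_flagged(0.4, uf=3))  # True
```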
The original PQRI recommendation for the AET was that the final (adjusted) AET be an adjustment of an initial (or estimated) AET, where that adjustment accounts for analytical uncertainty (RF variation between analytes). The specific AET recommendation was “The Working Group proposes and recommends that analytical uncertainty in the Estimated AET be defined as one (1) %Relative Standard Deviation in an appropriately constituted and acquired Response Factor database OR a factor of 50% of the Estimated AET, whichever is greater” (1).
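To make the arithmetic of this recommendation concrete, the sketch below adopts one reading of it, assumed here rather than quoted from reference 1: the uncertainty (the greater of one %RSD or 50%) is subtracted from the estimated AET. Under that reading, the 50% floor is equivalent to a UF of 2, and the calculation breaks down when the %RSD reaches 100%. The function names are hypothetical.

```python
def pqri_style_final_aet(aet_estimated, rsd_fraction):
    """Final AET under the assumed 'subtract the uncertainty' reading.

    rsd_fraction is the response factor %RSD as a fraction (0.30 = 30%).
    """
    uncertainty = max(rsd_fraction, 0.50)  # "whichever is greater"
    if uncertainty >= 1.0:
        raise ValueError("an RSD of 100% or more leaves no positive final AET")
    return aet_estimated * (1.0 - uncertainty)


def equivalent_uf(rsd_fraction):
    """UF implied by the same reading: estimated AET / final AET."""
    return 1.0 / pqri_style_final_aet(1.0, rsd_fraction)


print(equivalent_uf(0.30))  # 2.0 (the 50% floor applies)
print(equivalent_uf(0.75))  # 4.0
```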
The validity of this recommended approach to AET adjustment, the proper means of calculating the UF, appropriate values for the UF, and the means for managing large UFs all have been the subject of numerous publications (2–6) and considerable debate. The UF topic is debated because of the potentially conflicting impacts of AET adjustment. On one hand, as noted previously, the larger the AET adjustment (and thus the larger the UF), the lower the adjusted AET and the greater the number of poorly responding compounds that will be flagged by the adjusted AET. The benefit of a larger UF is that the adjusted AET is more protective than its unadjusted counterpart. On the other hand, the drawbacks of this outcome are the practical difficulties associated with analytically achieving the adjusted AET (screening methods may not have sufficient sensitivity to achieve the adjusted AET, and/or the analytical steps necessary to achieve the requisite sensitivity may compromise the sample) and/or securing the robust data (concentration and identity) necessary to toxicologically assess low-level, less commonly encountered, and poorly studied leachables. Furthermore, as the AET is lowered due to the larger UF, more peaks (compounds) are likely to be flagged for reporting (as being above the AET), thus increasing the amount of required toxicological review. Lastly, there will be false positives: compounds that are truly below the AET but that appear to be above the AET because their peaks have been inappropriately enlarged. Applying the logic of risk management to this juxtaposition of positive and negative outcomes suggests that a recommended UF, or a recommended process for establishing the UF, must balance the competing realities of protection versus practical application.
The purpose of this article is to establish UF values for the chromatographic screening process most typically applied to extractables and leachables testing (a combination of gas and liquid chromatography with mass spectrometric detection [GC/MS and LC/MS]), using a robust database of extractables and leachables RFs and the concept of “optimization to the point of diminishing returns”.
The Concept of Coverage
To begin this discussion, it is proper to define the terms coverage and % coverage. In the context of the AET, coverage is defined as the number of compounds that are flagged to be at or above the AET when they are present in a sample at the AET level. Clearly, compounds with a response equal to or greater than the IS used to establish the AET will be flagged as being above the AET without the AET being adjusted by the UF. However, compounds with responses less than the IS that sets the AET will not be flagged as being above the AET. These lower-responding compounds that are not flagged will not be reported for safety risk assessment, even though it is proper that they would be reported and safety assessed. This under-reporting of compounds is mitigated by UF adjustment of the AET to lower values as noted previously in eq 3; the larger the UF, the smaller the adjusted AET and the greater the number of compounds that are properly flagged as having a concentration equal to or greater than that of the IS (i.e., greater coverage).
The term % coverage is simply the ratio of the number of compounds in a population that are covered to the total number of compounds in the population, expressed as a percentage.
Clearly, the objective of AET adjustment is to establish that value of the UF that produces an acceptable level of coverage without complicating the analytical process by producing a final AET that is so low as to be essentially unachievable.
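The definition of % coverage can be sketched in a few lines. The sketch assumes that RRFs are expressed relative to the IS that sets the AET, so that a compound is covered when its RRF is at least 1/UF, and that non-responding compounds are entered as 0. The RRF values are hypothetical and are not taken from any table in this article.

```python
def percent_coverage(rrfs, uf=1.0):
    """Percent of compounds flagged at the AET level for a given UF.

    A compound is counted as covered when its RRF is at least 1/UF;
    non-responding compounds are entered as 0 and are never covered.
    """
    if not rrfs:
        raise ValueError("empty RRF list")
    threshold = 1.0 / uf
    covered = sum(1 for r in rrfs if r >= threshold)
    return 100.0 * covered / len(rrfs)


# Hypothetical RRF values, one per compound.
rrfs = [1.20, 0.95, 0.60, 0.45, 0.30, 0.22, 0.10, 0.0]
for uf in (1, 2, 3, 4, 5):
    print(uf, percent_coverage(rrfs, uf))
```

With these placeholder values, coverage rises from 12.5% at UF = 1 to 75% at UF = 5, and the non-responding compound is never covered regardless of the UF.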
Illustrating the Concept of “Optimizing to the Point of Diminishing Returns”
The concept of “optimizing to the point of diminishing returns” will be illustrated by an example relevant to extractables testing, which is establishing the optimum extraction duration. For example, consider an extraction that by design is intended to be taken to completion; that is, the point at which equilibrium between the extracted item and the extracting solution is achieved (so-called asymptotic extraction), illustrated in Figure 1. One notes that the concentration of the extracted substances increases rapidly during the early duration of extraction and more slowly as equilibrium is approached. If one examines Figure 1 to answer the question “what is the proper extraction duration?”, it is noted that there are two possible answers. The most rigorous answer is that the proper extraction duration is a duration at or after equilibrium has been definitively established and extraction is conclusively complete, which, in Figure 1, is 120 h.
Extraction profile, extracted concentration versus extraction duration. It is clear that the extraction has reached equilibrium (asymptotically reached its maximum value) after 120 h of extraction. However, the extraction is very nearly complete after 72 h, and the incremental increase in the extractable’s concentration between 72 and 120 h is small. Although rigorously speaking the extraction should proceed for 120 h to achieve full extraction, practically speaking the “point of diminishing returns” has been achieved at 72 h, and ending the extraction at 72 h would only marginally underestimate the extractable’s maximum extracted amount while reducing the required extraction time by 40%.
However, a more practical answer would be that the proper extraction duration is the duration at which continued further extraction produces a minimal further increase in the extractable’s concentration. Such a “point of diminishing returns” duration, approximately 72 h as noted in Figure 1, essentially reduces the recommended extraction duration to 60% of its full value without materially under-reporting the extractable’s highest (worst-case) concentration. To wit, the extractable’s concentration at the point of diminishing returns is approximately 93% of the maximum concentration at equilibrium.
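The article does not state a numerical rule for locating the point of diminishing returns. One possible rule, assumed here purely for illustration, is the earliest time at which the measured concentration reaches a chosen fraction of the final plateau value. The extraction profile below is hypothetical and is chosen only to mimic the shape described for Figure 1.

```python
def point_of_diminishing_returns(times, values, fraction=0.90):
    """Earliest time at which the value reaches `fraction` of the plateau.

    Assumes the last point of the profile represents the plateau
    (i.e., the extraction was taken to completion).
    """
    plateau = values[-1]
    for t, v in zip(times, values):
        if v >= fraction * plateau:
            return t
    return times[-1]


# Hypothetical extraction profile: hours and concentration (arbitrary units).
hours = [0, 24, 48, 72, 96, 120]
conc = [0.0, 55.0, 80.0, 93.0, 98.0, 100.0]
print(point_of_diminishing_returns(hours, conc))  # 72
```

The chosen fraction (90% here) is a judgment call; a stricter fraction moves the answer toward the full extraction duration.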
This concept can be leveraged to establish a proper value of the UF as follows. Consider the situation in which RFs for an acceptably large number of potential organic extractables or leachables have been generated and collected in a database. Furthermore, suppose an RF for an IS has been obtained. If the response equivalent to AET is established via the IS, then the coverage of the analytical method can be established. All compounds with an RF equal to or greater than that of the IS will be covered; any compound with an RF less than that of the IS will not be covered. Thus, the % coverage with an unadjusted AET can be established.
The effect that adjusting the AET will have on the % coverage can likewise be established from the RF database. Clearly, the % coverage produced by applying a UF will increase as the value of the UF increases. A plot of % coverage versus UF can be used to determine that value of the UF that produces the desired or required coverage.
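Reading the required UF from a coverage curve is equivalent to finding the RRF of the weakest responder that must still be covered. The following is a sketch under the same assumption used throughout (a compound is covered when its RRF is at least 1/UF); the function name and the RRF values are hypothetical.

```python
import math


def uf_for_target_coverage(rrfs, target_percent):
    """Smallest UF whose adjusted AET covers at least target_percent.

    Returns None when the target cannot be met by any finite UF
    (too many non-responding compounds, entered as RRF = 0).
    """
    n_needed = math.ceil(len(rrfs) * target_percent / 100.0)
    if n_needed == 0:
        return 1.0
    ranked = sorted(rrfs, reverse=True)
    weakest = ranked[n_needed - 1]  # weakest responder that must be covered
    if weakest <= 0:
        return None
    return max(1.0, 1.0 / weakest)


rrfs = [1.20, 0.95, 0.60, 0.45, 0.30, 0.22, 0.10, 0.0]  # hypothetical
print(uf_for_target_coverage(rrfs, 75))   # about 4.5
print(uf_for_target_coverage(rrfs, 100))  # None: one compound never responds
```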
For example, consider a small database of RRF values for 30 extractables and one IS published by Mullis et al. (2) in 2008 to explain the RF uncertainty issue with respect to the AET (Table I). As the mean and median of this dataset are both <1.0 (mean = 0.643, median = 0.574), it is noted that, in general, the 30 extractables are more poorly responding than the chosen IS.
Relative Response Factor (RRF) for 30 Organic Extractables. Evaluation of Coverage of Compounds via the Adjustment of the AET (Corresponding to RRF = 1) Downwards with Different Uncertainty Factor (UF) Values. The Italic Entries Mean That the Compound is Covered by the Adjusted AET, the Non-italic Entries Mean That the Compound is Not Covered by the Adjusted AET. As the UF Increases, More Compounds Are Covered by the UF Adjustment. UF = 1 Data from Reference 2
If these individual extractables were present in an extract at a concentration equal to the AET, only three of the extractables would be properly flagged as being above the AET based on their response (column UF = 1 in Table I). Thus, the AET coverage at a UF = 1 is (3/30 × 100%) = 10%. As UFs are applied, the AET coverage increases until finally at a UF = 5, all the extractables in the database are covered. Thus, to get 100% coverage, a UF of 5 would have to be applied.
Figure 2 plots the percent of the extractables population that is covered as a function of UF; that is, what percentage of the compounds in the population would have responses equal to or greater than the AET when they are present in the sample at the AET concentration. As observed previously, as the UF increases, the % coverage increases. In fact, Figure 2 takes a shape that is reminiscent of Figure 1, the extraction profile. Thus, the concept of “point of diminishing returns” can be applied to the UF (Figure 2) in the same way that it was applied to the duration of extraction (Figure 1). In the case of the UF, the “point of diminishing returns” corresponds to a UF = 3, at which 93% of the compounds in the database are covered. Thus, practically speaking, the Mullis database suggests that a UF value of 3 provides both appropriate and achievable coverage.
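One way to make the “point of diminishing returns” on a coverage curve operational, offered here as an assumption and not as the procedure used to produce Figure 2, is to pick the smallest UF beyond which the next step in UF adds fewer than a chosen number of percentage points of coverage. The coverage values below are placeholders and are not the values in Table I.

```python
def uf_at_diminishing_returns(coverage_by_uf, min_gain=5.0):
    """Smallest UF beyond which the next step adds less than min_gain points.

    coverage_by_uf maps each UF value to its % coverage.
    """
    ufs = sorted(coverage_by_uf)
    for current, nxt in zip(ufs, ufs[1:]):
        gain = coverage_by_uf[nxt] - coverage_by_uf[current]
        if gain < min_gain:
            return current
    return ufs[-1]


# Placeholder coverage curve (UF -> % coverage).
curve = {1: 12.0, 2: 58.0, 3: 91.0, 4: 95.0, 5: 97.0, 6: 98.0}
print(uf_at_diminishing_returns(curve))  # 3
```

The result depends on the chosen `min_gain` and on the spacing of the UF values examined, so the threshold should be stated whenever a UF is justified this way.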
Plot of percent (%) coverage versus UF value for a database of GC/MS relative response factors (RRFs) for 30 organic extractables. The organic extractables present in the database are characterized by poorer responses relative to the internal standard, as evidenced by the low % coverage for UF = 1. As the value of the UF increases, more compounds are covered, until at UF = 5 complete coverage is obtained. However, a point of diminishing returns is achieved with a UF = 3, and it is proposed that this is a proper and practically achievable value for the UF (2). GC/MS, gas chromatography/mass spectrometry; RRF, relative response factor; UF, uncertainty factor.
Both Table I and Figure 2 illustrate the effect of IS choice on coverage. For this dataset, the analytes generally respond more poorly than the IS, and coverage is lower because of this. Had an IS been chosen whose response was equal to the population mean (or median), smaller values of UF would have produced a greater % coverage, as is shown in Figure 3. This outcome reinforces observations made by several authors with respect to careful and proper choice of the IS (4, 7).
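The effect of IS choice can be sketched by re-expressing each RRF relative to a hypothetical IS whose own response equals the population mean, which amounts to dividing every RRF by the mean RRF. The RRF values and function names are illustrative assumptions.

```python
from statistics import mean


def rescale_to_reference(rrfs, reference_rrf):
    """Re-express RRFs relative to a hypothetical IS with RRF = reference_rrf."""
    if reference_rrf <= 0:
        raise ValueError("reference RRF must be positive")
    return [r / reference_rrf for r in rrfs]


def coverage(rrfs, uf):
    """Percent of compounds with RRF >= 1/UF."""
    return 100.0 * sum(1 for r in rrfs if r >= 1.0 / uf) / len(rrfs)


rrfs = [1.20, 0.95, 0.60, 0.45, 0.30, 0.22, 0.10, 0.0]  # hypothetical
centered = rescale_to_reference(rrfs, mean(rrfs))

for uf in (1, 2):
    print(uf, coverage(rrfs, uf), coverage(centered, uf))
```

With these placeholder values, coverage at UF = 1 rises from 12.5% to 37.5% when the IS responds like the average compound, and at UF = 2 from 37.5% to 62.5%.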
Plot of percent coverage versus UF value for a database of GC/MS relative response factors (RRF) for 30 organic extractables, based on an internal standard having the same response as the population mean. Use of an internal standard whose response is closer to the response of the compounds in the analytical dataset significantly reduces the value of UF required to get acceptable coverage. Compared with Figure 2, the UFs required for diminishing returns and complete coverage are significantly reduced due to proper selection of the internal standard. GC/MS, gas chromatography/mass spectrometry; RRF, relative response factor; UF, uncertainty factor.
Although this analysis is compelling, its shortcomings include the relatively small size of the database, the circumstance that there are few very poorly or strongly responding compounds in the database, and the circumstance that the database is for only one of the methods (GC/MS) typically applied to extractables and leachables screening. Thus, this concept is revisited with a more comprehensive database as follows.
The Nelson Response Factor Database
Since 2006, Nelson Labs has been developing an internal database that contains 6000 entries of key analytical information for extractable compounds that were detected as—and are—extractables or leachables. This analytical information, acquired across Nelson’s orthogonal screening methodologies, was obtained through the analysis of authentic standards that were either purchased from well-known chemical suppliers, externally synthesized, internally synthesized or internally isolated, purified, and internally qualified. For GC/MS specifically, single-compound working standards were prepared for the majority of the compounds to contain 20–50 mg/L of the analyte and 10 mg/L of 2-fluorobiphenyl as IS, using dichloromethane as the standard’s vehicle. Single injections of the working standards were made on multiple comparable analytical systems with a Retention Time Locked method using the surrogate compound to lock the retention time. For LC/MS with atmospheric pressure chemical ionization (LC/MS-APCI) specifically, single-compound working standards were prepared for the majority of the compounds to contain 10 mg/L of the analyte and 1.0 mg/L of Tinuvin 327 as IS, using a 50/50 v/v mixture of methanol and dichloromethane as the standard’s vehicle. Single injections of the working standards were made on multiple comparable LC-Orbitrap systems. For LC/MS with electrospray ionization (LC/MS-ESI) specifically, single-compound working standards were prepared for the majority of the compounds to contain 10 mg/L of the analyte and 1.0 mg/L of caffeine-(trimethyl-13C3) as IS in positive mode and bis(2-ethylhexyl) phthalate-3,4,5,6-D4 as IS in negative mode, using a 50/50 v/v mixture of methanol and water as the standard’s vehicle. Single injections of the working standards were made on multiple comparable LC-Orbitrap systems.
Although the entire Nelson database would be an appropriate foundation to consider the setting of the UF for individual analytical screening methods, it is not an appropriate basis for considering UF for a combined analytical strategy consisting of multiple methods for the simple reason that it was not populated in a way that produced an RRF value for each member of the database for each analytical method. For example, the database could contain a compound whose RRF was experimentally established by one method (e.g., GC/MS) but not for other methods (e.g., LC/MS). Thus, a smaller subset of the complete database, consisting of 1193 compounds, was excised from the larger Nelson database. This dataset comprises compounds whose response in three screening methods, GC/MS, LC/MS-APCI, and LC/MS-ESI has been established, which is not the case for the entire database. All 1193 compounds produced a response in at least one of the targeted methods, GC/MS, LC/MS-APCI, and LC/MS-ESI; however, the individual methods did not necessarily produce a response for each compound in the dataset. Although a majority of the compounds (91.5%) produced a response in GC/MS, many fewer compounds produced responses in LC/MS-APCI (47.3%) and LC/MS-ESI (30.5%).
Before proceeding further, it is noted that the following discussion describes the Nelson screening strategy as consisting of three methods: GC/MS, LC/MS-APCI, and LC/MS-ESI. This strategy could be more rigorously described as consisting of five methods, as both APCI and ESI are performed in both the + ion and − ion modes. However, Nelson’s use of Orbitrap technology allows the polarity to be switched within a chromatographic run; thus, for example, APCI + ion and APCI − ion are captured as merely APCI in this article.
Although the Nelson dataset itself is not the focus of this article, it is noteworthy nevertheless to examine the RRF values and their frequency distributions for the three analytical methods. Statistical figures of merit are summarized in Table II; Figures 4–6 illustrate the distribution of the RRF values. For the GC/MS method, the distribution is approximately normal, centered at a value consistent with the choice of the IS and with a bias toward low or no response. The percent relative standard deviation (%RSD) of 66.8% for the entire dataset is largely a reflection of the bias toward lower RRF values, strongly influenced in part by the number of compounds that produce no response. When the smaller population of compounds that produce a response is considered, the effect of excluding the nonresponding compounds is clear: the mean increases, the %RSD decreases, and the mean and the median are more closely aligned. However, there is still a noticeable bias at the low end of the RRF distribution, resulting from the low-responding members of the dataset. This circumstance is related in part to the means employed to obtain the RRF values, which was to increase the injected concentration of the analyte so that a response could be obtained. Such an approach allows RRF values to be reported that are lower than what can reasonably be obtained using compounds at a single concentration; for example, the lowest RRF reported for an analyte by GC/MS was 0.001, that is, an analyte whose response is one one-thousandth of the response of the IS. Had Nelson taken a different, less sensitive, approach (e.g., allowing a lowest RRF of only 0.01), more analytes would have been reported as not detected, and fewer compounds would have been reported as poorly detected. Such an approach would produce a total population distribution that is even more noticeably skewed toward no response and a responding compound population that is less affected by low responders.
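The figures of merit discussed here can be computed as sketched below. The sketch assumes that %RSD is the sample standard deviation divided by the mean (the article does not state whether the sample or population form was used), and the RRF values are hypothetical rather than drawn from the Nelson dataset.

```python
from statistics import mean, median, stdev


def rrf_summary(rrfs):
    """Count, mean, median, and %RSD (sample SD / mean x 100) of RRF values."""
    m = mean(rrfs)
    return {
        "n": len(rrfs),
        "mean": m,
        "median": median(rrfs),
        "pct_rsd": 100.0 * stdev(rrfs) / m,
    }


# Hypothetical RRFs; 0.0 denotes a compound that produced no response.
all_compounds = [0.0, 0.0, 0.05, 0.4, 0.8, 1.0, 1.1, 1.3]
responders = [r for r in all_compounds if r > 0]

print(rrf_summary(all_compounds))
print(rrf_summary(responders))
```

For these placeholder values, excluding the non-responders raises the mean and lowers the %RSD, which is the direction of the effect described in the text.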
Statistical Data for the Nelson RRF Dataset; All Compounds in the Dataset Whose Analytical Response Was Addressed in All Three Methods
Distribution of GC/MS RRF values for compounds in the Nelson Dataset. (A) Distribution for all compounds in the data set; (B) distribution for all compounds that produced a detectable GC/MS response. The distribution of RRF values is approximately normal, with a bias toward lower RRF values. GC/MS, gas chromatography/mass spectrometry; RRF, relative response factor.
Distribution of LC/MS (APCI) RRF values for compounds in the Nelson Dataset. (A) Distribution for all compounds in the data set; (B) distribution for all compounds that produced a detectable LC/MS (APCI) response. A vast majority of the compounds in the dataset produce either no or low responses. When the compounds that did not produce a response are removed from the dataset, the distribution of RRF values is still skewed toward lower responses. Nevertheless, there is a discernible group of compounds that produce larger responses. APCI, atmospheric pressure chemical ionization; LC/MS, liquid chromatography/mass spectrometry; RRF, relative response factor.
Distribution of LC/MS (ESI) RRF values for compounds in the Nelson Dataset. (A) Distribution for all compounds in the data set; (B) distribution for all compounds that produced a detectable LC/MS (ESI) response. A vast majority of the compounds in the dataset produce either no or low responses. When the compounds that did not produce a response are removed from the dataset, the distribution of RRF values is still skewed toward lower responses. Nevertheless, there is a discernible group of compounds that produce larger responses. ESI, electrospray ionization; LC/MS, liquid chromatography/mass spectrometry; RRF, relative response factor.
The presence of low-responding compounds in the Nelson database is to be expected given the dual purpose of the database. Although the use of the database in a quantitative sense (RRF) is considered in this article, the database also contains information that is useful in establishing a compound’s identity; for example, retention time and mass spectrum. For poorer responding compounds, use of a higher concentration reference standard not only allows the compound to be captured in the database but also facilitates the generation of high-quality MS spectra, thereby enabling identification via mass spectral matching.
The two LC/MS detection methods, APCI and ESI, produce RRF distributions that are anything but normal (Figures 5 and 6). Firstly, and clearly, the %RSD values for the LC/MS datasets are large and well over 100%, complicating the calculation of the UF considerably. Additionally, the distributions are even more biased toward low responders; for the entire dataset, the most common outcome for both APCI and ESI was “not detected” (median RRF = 0), followed by the lowest possible response (<0.01). Moreover, the distribution is not normal at the higher RRF values; in fact, the distribution most likely suggests that RRF is evenly distributed in the range from RRF = 0.5 to RRF = 2.0. Lastly, in both APCI and ESI, there is a small cluster of compounds with RRF values >2 (very strongly responding compounds).
Chromatographically screening samples for organic extractables or leachables is a strategy that leverages the combined capabilities of the three individual methods. Use of such a strategy and not just a single screening method is advocated in relevant standards for packaging (8), manufacturing components (9), and medical devices (10). It is intuitive that a combination of methods would be more effective in covering a dataset than a single individual method. In order for a single method to provide coverage, that single method must produce an acceptable response for all compounds. However, in the case of multiple methods, only one of the methods must produce an acceptable response for each analyte. Considering a set of three methods, a compound is covered if the response in any one method is adequate, even if the compound does not even produce a response in the other two methods.
To illustrate this concept, consider the small dataset shown in Table III, which consists of 25 compounds selected more or less arbitrarily from the Nelson dataset. Of these 25 compounds, 20 produce a GC/MS response; thus, the GC/MS method covers 80% of our 25-member universe of extractables. Because the LC/MS-APCI and LC/MS-ESI methods produce responses for the five compounds that were not detectable by GC/MS, the number of compounds covered by the combination of the three methods increases to the entire population and coverage becomes 100%. Thus, adoption of a strategy of multiple complementary and orthogonal methods increases the quantity of compounds covered and reduces the UF value required to achieve a certain level of coverage.
Twenty-Five Extractables Selected from the Nelson Dataset to Illustrate How Multiple Analytical Methods Improve Coverage by Increasing the Quantity of Compounds Covered and the Quality of the Coverage. As the First Five Compounds Listed Have No GC/MS Response but Produce a Response by One of the LC/MS Methods, the Combined Strategy of GC/MS and LC/MS Increases the % Coverage from 80% (GC/MS Alone) to 100%. Because the Next Three Listed Compounds Have LC/MS Responses That Are Better than the GC/MS Responses, the Combined Strategy of GC/MS and LC/MS Increases the Quality of Coverage (Lower %RSD and Thus Lower UF) versus GC/MS Alone
Furthermore, there are three compounds, two by LC/MS-APCI and one by LC/MS-ESI, for which the RRF by LC/MS is much higher (and closer to 1) than the RRF by GC/MS. Thus, the distribution of RRF values is improved (e.g., reduced %RSD) for a combination method dataset in which the poorer GC/MS responses are “replaced” by the better LC/MS responses. Thus, adoption of a strategy of multiple complementary and orthogonal methods also improves the quality of coverage and reduces the UF value required to achieve a certain level of coverage.
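The rule that a compound is covered by a multi-method strategy if any one method gives an adequate response can be sketched as follows. The sketch allows a different UF per method, as is done in the combinations discussed in the abstract; the method labels, function name, and RRF values are hypothetical and are not taken from Table III.

```python
def combined_coverage(compounds, uf_by_method):
    """Percent of compounds covered by at least one method.

    compounds: list of dicts mapping method name -> RRF; a missing
    entry or an RRF of 0 means no response by that method.
    uf_by_method: dict mapping method name -> UF applied to that method.
    """
    covered = 0
    for rrfs in compounds:
        if any(rrfs.get(method, 0.0) >= 1.0 / uf
               for method, uf in uf_by_method.items()):
            covered += 1
    return 100.0 * covered / len(compounds)


# Hypothetical compounds and RRFs.
compounds = [
    {"GC/MS": 0.90},
    {"GC/MS": 0.10, "APCI": 0.80},
    {"ESI": 0.30},
    {"GC/MS": 0.05, "APCI": 0.02},
]
print(combined_coverage(compounds, {"GC/MS": 4}))                         # 25.0
print(combined_coverage(compounds, {"GC/MS": 4, "APCI": 4, "ESI": 4}))    # 75.0
```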
Whereas the performance capabilities of the individual methods are established by considering the RRF values specific to the individual methods, the performance capabilities of an analytical strategy consisting of multiple methods are established by considering only one of the individual methods’ RRFs for each compound. Thus, the dataset of RRF values for the analytical strategy based on three methods has one RRF value for each compound, taken as one of the three RRF values obtained by the three individual methods. For example, consider the case of an analyte that produces a response by all three methods, GC/MS, LC/MS-APCI, and LC/MS-ESI. The RRF dataset entry for this analyte would be only one RRF value, where it is logical that the dataset entry would be the “best” RRF obtained by any of the three methods.
The challenge becomes establishing which RRF is “the best” for each analyte. Three possibilities were considered in this article:
The “best” RRF is the highest RRF obtained by any of the three methods,
The “best” RRF is the RRF that is closest to a value of 1, based on relative value,
The “best” RRF is the RRF that is closest to 1, based on absolute value.
For example, consider the situation where an analyte produced a GC/MS RRF of 0.24, an LC/MS-APCI RRF of 0.66, and an LC/MS-ESI RRF of 1.49. In this case, the best RRF using the maximum value is the LC/MS-ESI result of 1.49 and the best value based on absolute closeness to 1 is the LC/MS-APCI result of 0.66 (as 0.66 is 0.34 from 1 whereas 1.49 is 0.49 from 1). In terms of relative closeness to 1, the best RRF is the LC/MS-ESI value of 1.49 (as 0.66 is 1.52 times less than 1 whereas 1.49 is 1.49 times higher than 1).
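The three selection rules can be expressed compactly in code. The following Python sketch is illustrative only: the function name `best_rrf`, the rule labels, and the use of the fold difference from 1 (the larger of RRF and 1/RRF) as the measure of relative closeness are assumptions made here, chosen so that the sketch reproduces the worked example above.

```python
def best_rrf(rrfs, rule):
    """Pick one RRF for a compound from its per-method RRF values.

    rrfs: dict mapping method name -> RRF (None means no response).
    rule: "max", "absolute", or "relative" (hypothetical labels).
    Returns (method, rrf), or None if no method produced a response.
    """
    detected = {m: r for m, r in rrfs.items() if r is not None and r > 0}
    if not detected:
        return None

    if rule == "max":
        # Highest RRF wins, so rank by the negated value.
        score = lambda r: -r
    elif rule == "absolute":
        # Smallest absolute distance from 1 wins.
        score = lambda r: abs(r - 1.0)
    elif rule == "relative":
        # Smallest fold difference from 1 wins (assumed definition).
        score = lambda r: max(r, 1.0 / r)
    else:
        raise ValueError(f"unknown rule: {rule}")

    method = min(detected, key=lambda m: score(detected[m]))
    return method, detected[method]


# The analyte from the worked example in the text.
example = {"GC/MS": 0.24, "LC/MS-APCI": 0.66, "LC/MS-ESI": 1.49}

for rule in ("max", "absolute", "relative"):
    print(rule, best_rrf(example, rule))
```

For this analyte the sketch selects LC/MS-ESI (1.49) under the maximum and relative rules and LC/MS-APCI (0.66) under the absolute rule, in agreement with the text.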
The distributions of RRF values obtained by the three approaches for establishing the “best” RRF are shown in Figure 7, and the statistical figures of merit are summarized in Table IV. As one might expect, the frequency distributions look very similar to the distribution for GC/MS (Figure 4), as GC/MS was the method for which most compounds produced a response. However, the influence of the LC/MS methods on the frequency distribution for the “best” RRFs is clear, as the LC/MS RRFs “smooth out” or “fill-in” the non-normal characteristics of the GC/MS distribution. One obvious effect is that the “best” distributions contain no “not detected” compounds. More importantly, however, good-responding compounds by LC/MS are, to a certain extent, poor-responding compounds in GC/MS, and thus the frequency of low responses is reduced and the frequency of high responders is increased. Ultimately the frequency distributions of the “best” approaches are more normal than are the distributions for any single approach.
Distributions of the best fit RRF values for compounds in the Nelson Dataset. (A) The distribution that represents the case in which the best fit is defined as the method that produced the maximum response. (B) The distribution that represents the case in which the best fit is defined as the method that produced the RRF closest to 1, measured as an absolute deviation. (C) The distribution that represents the case in which the best fit is defined as the method that produced the RRF closest to 1, measured as a percent. By nature of the best fit, all compounds produced a detectable response. In all three cases, the distribution is roughly normal (reflecting the influence of the GC/MS data). However, strongly responding compounds by either APCI or ESI populate the higher RRF windows, producing a more even distribution than for GC/MS alone (compare with Figure 3). APCI, atmospheric pressure chemical ionization; ESI, electrospray ionization; GC/MS, gas chromatography/mass spectrometry; RRF, relative response factor.
Statistical Data for the Nelson Dataset; Approaches Used to Establish the “Best” RRF for a Given Compound
This concept of “LC/MS makes up for the deficiencies of GC/MS” and vice versa is the cornerstone of the generally accepted practice that proper screening for organic extractables uses both of these orthogonal but complementary methods. Although the frequency distributions illustrated in Figure 7 provide some evidence of the wisdom of this practice, a stronger proof is offered in Figure 8, which illustrates which method (GC/MS, LC/MS-APCI, or LC/MS-ESI) produced the best RRF in each RRF window. As one might expect, in many of the RRF windows, the method that produced the “best” response was predominantly GC/MS. However, at very low responses and especially at the higher responses, LC/MS becomes the more dominant source of the “best” RRF. Although GC/MS is responsible for >60% of the “best” RRF values in the RRF range from 0.1 to 1.0, LC/MS becomes a major contributor of “best” RRF values at RRF < 0.1 and the major contributor at RRF > 1.0. The ability of the complementary method of LC/MS to “fill in the gaps” in the GC/MS method is clearly illustrated in these data.
Distribution illustrating which screening method produces the optimum RRF value within a given RRF window. For example, in the RRF window of 0.2–0.3, approximately 80% of the optimum RRFs in that window were produced by GC/MS. In general, GC/MS produces a majority of the optimum RRF values for RRF values between 0.1 and 1.1, whereas LC/MS (representing the combination of LC/MS-APCI and LC/MS-ESI) produces the majority of the RRF values for RRF windows >1.1, thereby illustrating the complementary nature of the GC/MS and LC/MS methodologies. APCI, atmospheric pressure chemical ionization; ESI, electrospray ionization; GC/MS, gas chromatography/mass spectrometry; LC/MS, liquid chromatography/mass spectrometry; RRF, relative response factor.
It is intuitively obvious that there is no sound basis for presuming that the approach of “picking the highest response” would provide the best outcome in terms of reducing response variation, and the data in Table IV establish that although this approach “fills in the gaps” (provides a response for each and every analyte), it produces the largest response variation (highest %RSD). Thus, this approach is not recommended by these authors. However, it is less clear which of the “picking the RRF closest to 1” approaches, absolute or relative, would produce the better outcome. Based on the statistical data in Table IV, the absolute approach is recommended because of its slightly lower variation, reflected in a slightly lower %RSD.
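For reference, the %RSD used to compare the approaches is the standard deviation expressed as a percentage of the mean. A minimal Python sketch follows. The use of the sample standard deviation (n − 1 denominator) is an assumption, as the text does not state which form was used, and the input values are hypothetical rather than taken from the Nelson dataset.

```python
from statistics import mean, stdev


def percent_rsd(values):
    """Relative standard deviation in percent: 100 * s / mean.

    Uses the sample standard deviation (assumption; see lead-in).
    """
    return 100.0 * stdev(values) / mean(values)


# Hypothetical "best" RRF lists for two selection rules (not real data).
rrf_rule_a = [0.5, 1.0, 1.5]
rrf_rule_b = [0.75, 1.0, 1.25]

print(round(percent_rsd(rrf_rule_a), 1))
print(round(percent_rsd(rrf_rule_b), 1))
```

The rule that yields the lower %RSD has the narrower RRF distribution and, all else being equal, would call for a smaller UF.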
Application of the RRF Data in Setting the UF Values
Figures 9–11 present the % coverage obtained for the compounds present in the Nelson dataset, individually and as a strategy consisting of all three methods. Figures 9 and 10 address the methods’ ability to cover all the compounds in the dataset, whereas Figure 11 addresses the methods’ ability to address only those compounds that produced a response by an individual method. Note that the coverage data have been obtained under the ideal circumstance in which the IS is chosen such that its response is equal to the mean RRF of the dataset.
Plot of percent (%) coverage versus UF value for the Nelson Dataset, all compounds. None of the methods individually are able to provide 100% coverage of all compounds in the dataset as there are a number of compounds that do not produce a response using a particular method. However, as all analytes produce a response in one or more of the methods, the combined methodology of GC/MS + LC/MS-APCI + LC/MS-ESI approaches 100% coverage as the UF increases. As noted by the arrows, a UF = 10 is the point of diminishing returns for both LC/MS methods, as further increases in UF produce minor increases in % coverage. APCI, atmospheric pressure chemical ionization; ESI, electrospray ionization; GC/MS, gas chromatography/mass spectrometry; LC/MS, liquid chromatography/mass spectrometry; UF, uncertainty factor.
Plot of percent (%) coverage versus UF value for the Nelson dataset, all compounds, expansion of plot along the UF axis. This graph is the same as Figure 9, with the UF (x-) axis expanded to show greater detail. As noted by the arrows, a UF = 4 is the point of diminishing returns for the GC/MS method, and UF = 3 is the point for the combination of the GC/MS and LC/MS methods, as further increases in UF produce minor increases in % coverage. APCI, atmospheric pressure chemical ionization; ESI, electrospray ionization; GC/MS, gas chromatography/mass spectrometry; LC/MS, liquid chromatography/mass spectrometry; RRF, relative response factor; UF, uncertainty factor.
Plot of percent (%) coverage versus UF value for the Nelson dataset, responding compounds only. Although the absolute value of the % coverage is increased in this figure (versus Figures 9 and 10) as it accounts solely for responding compounds, the UF conclusions drawn from this figure mirror those drawn from Figures 9 and 10. APCI, atmospheric pressure chemical ionization; ESI, electrospray ionization; GC/MS, gas chromatography/mass spectrometry; LC/MS, liquid chromatography/mass spectrometry; UF, uncertainty factor. The arrows signify the UF values representing the “points of diminishing return” for both GC/MS and LC/MS.
The % coverage versus UF plots for the individual and combined methods for all compounds are shown in Figures 9 and 10, where Figure 10 is an expanded view of Figure 9, more effectively illustrating the lower UF region of the curves. Reflecting the concept of diminishing returns, one notes that the greatest gains in % coverage come at the lower UF values, and that eventually large increases in UF produce small gains in % coverage.
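The calculation behind such a coverage curve can be sketched as follows. This Python example is a hedged illustration, not the authors' procedure: it assumes that a compound counts as covered when its RRF multiplied by the UF is at least the RRF of the internal standard, with the internal standard taken as the mean RRF of the responding compounds. The function name, the `responding_only` flag, and the RRF values are hypothetical.

```python
from statistics import mean


def percent_coverage(rrfs, uf, responding_only=False):
    """Percent of compounds flagged at a given UF (assumed criterion).

    rrfs: list of RRF values; None marks a non-responding compound.
    A compound is treated as covered when rrf * uf >= reference, where
    the reference is the mean RRF of the responding compounds.
    """
    responding = [r for r in rrfs if r is not None]
    if not responding:
        return 0.0
    reference = mean(responding)
    covered = sum(1 for r in responding if r * uf >= reference)
    # Non-responders can never be covered; they only enlarge the denominator.
    total = len(responding) if responding_only else len(rrfs)
    return 100.0 * covered / total


# Hypothetical RRFs (mean of responders is 1.0); one compound gives no response.
rrfs = [0.125, 0.5, 1.0, 1.375, 2.0, None]

for uf in (1, 2, 4, 8):
    print(uf, round(percent_coverage(rrfs, uf), 1),
          round(percent_coverage(rrfs, uf, responding_only=True), 1))
```

With all compounds in the denominator the curve cannot reach 100% for a single method, which corresponds to the treatment in Figures 9 and 10; the `responding_only` option corresponds to the treatment in Figure 11.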
Based on the concept of diminishing returns, an acceptable UF value is established at that point in Figures 9–11 where further increasing the UF produces an incrementally small, and largely unrecognizable, increase in coverage. These points of diminishing return correspond to UF values of 4 and 10 for the GC/MS and LC/MS methods, respectively, and a value of 3 for the analytical approach that uses all three methods collaboratively.
In a certain sense, Figures 9 and 10 treat the individual methods overly critically, as these figures essentially ask the individual methods to cover compounds that do not even produce a response by that method. Perhaps a more relevant treatment of the individual methods is to ask them to cover those compounds that produce a response by that method. In the context of the AET, this is more relevant because the AET is applied to a peak that has been detected (i.e., the AET asks specifically if the detected peak’s response is greater than the AET response) and not to a peak that is undetected. As noted earlier, no UF is sufficiently large to achieve coverage for peaks that do not produce a response. Thus, Figure 11 provides a more relevant picture of AET coverage as it deals with only the detected compounds (i.e., compounds that produce a response).
Admittedly, the use of the concept of diminishing returns is a qualitative and somewhat subjective approach to establishing proper UF values. An alternate approach is to establish a minimum acceptable level of coverage and define an acceptable UF as that UF value that achieves this minimal acceptable level of coverage. Although such an approach appears to be more quantitative than the point of diminishing returns, this is the case only if the minimum acceptable level of coverage can be justified. Selection of a largely arbitrary minimum acceptable level is no more quantitative than the application of diminishing returns.
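The two ways of selecting a UF can be contrasted in a short Python sketch operating on a coverage curve (UF mapped to % coverage). Both functions are hypothetical formalizations: the article applies the point of diminishing returns by visual inspection, so the numerical threshold `min_gain` is an assumption introduced here, as are the function names and the made-up curve values.

```python
def point_of_diminishing_returns(curve, min_gain):
    """First UF at which the next step gains less than min_gain
    percentage points of coverage per unit increase in UF."""
    ufs = sorted(curve)
    for lower, upper in zip(ufs, ufs[1:]):
        gain_per_unit = (curve[upper] - curve[lower]) / (upper - lower)
        if gain_per_unit < min_gain:
            return lower
    return ufs[-1]  # gains never dropped below the threshold


def smallest_uf_for_coverage(curve, target):
    """Smallest tabulated UF whose coverage meets the target, else None."""
    for uf in sorted(curve):
        if curve[uf] >= target:
            return uf
    return None


# Made-up coverage curve for illustration only (UF -> % coverage).
curve = {1: 40.0, 2: 70.0, 4: 88.0, 6: 92.0, 8: 93.0, 10: 93.5}

print(point_of_diminishing_returns(curve, min_gain=5.0))
print(smallest_uf_for_coverage(curve, target=90.0))
print(smallest_uf_for_coverage(curve, target=99.0))
```

Both approaches require a judgment call, `min_gain` in one case and `target` in the other, which mirrors the point made above that neither is inherently more quantitative unless the chosen value can be justified.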
To address the concept of a minimal acceptable level, Table V lists various combinations of UF values applied to the various analytical methods and the resulting % coverage values. The UF strategy of applying a UF of 3 to all three analytical methods results in a coverage of 91.6%. A coverage of approximately 95% can be obtained by multiple UF combinations; two are highlighted in Table V (UF = 4 applied to all three methods or UF = 3 applied to GC/MS and UF = 5 applied to LC/MS-APCI and LC/MS-ESI). Use of a strategy that involves a UF = 4 for GC/MS and UF = 10 for both LC/MS methods produces a 97.0% coverage.
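One way to evaluate such combinations of method-specific UF values is sketched below in Python. The sketch rests on assumptions that are not confirmed by the text: a compound is counted as covered by the strategy if at least one method flags it, and a method flags a compound when its RRF multiplied by that method's UF is at least the mean RRF of the compounds responding by that method. The function name and all RRF values are hypothetical.

```python
from statistics import mean


def strategy_coverage(compounds, ufs):
    """Percent of compounds flagged by at least one method (assumed rule).

    compounds: list of dicts mapping method name -> RRF (None = no response).
    ufs: dict mapping method name -> UF applied to that method.
    """
    # Reference response per method: mean RRF of that method's responders.
    refs = {}
    for method in ufs:
        values = [c[method] for c in compounds if c.get(method) is not None]
        if values:
            refs[method] = mean(values)

    covered = 0
    for c in compounds:
        if any(
            c.get(m) is not None and c[m] * ufs[m] >= refs[m]
            for m in refs
        ):
            covered += 1
    return 100.0 * covered / len(compounds)


# Hypothetical compounds (not from the Nelson dataset).
compounds = [
    {"GC/MS": 1.0, "LC/MS-APCI": None, "LC/MS-ESI": None},
    {"GC/MS": 0.25, "LC/MS-APCI": 1.0, "LC/MS-ESI": None},
    {"GC/MS": None, "LC/MS-APCI": 0.125, "LC/MS-ESI": 0.25},
    {"GC/MS": 1.75, "LC/MS-APCI": 1.875, "LC/MS-ESI": 1.75},
]

print(strategy_coverage(compounds, {"GC/MS": 3, "LC/MS-APCI": 3, "LC/MS-ESI": 3}))
print(strategy_coverage(compounds, {"GC/MS": 3, "LC/MS-APCI": 5, "LC/MS-ESI": 5}))
```

In this toy dataset the third compound is a weak LC/MS responder with no GC/MS response, so raising only the LC/MS UF values is what brings it into coverage.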
Coverage Obtained at Selected Values of the Uncertainty Factor (UF)
Lastly, no UF, no matter how large, can cause an individual method to properly assess a nonresponding compound versus the AET. This lack of coverage of nonresponding compounds by an individual method is mitigated somewhat by use of multiple orthogonal but complementary methods, as the odds of a compound producing a detector response increase the more methods are employed. Although it is certainly possible that there are extractables or leachables out there that do not respond to any method in the described extractables and leachables screening strategy, and thus that no UF value will allow such compounds to be flagged, it surely is reasonable to suggest that such compounds (the unknown unknowns) are the rare exception rather than the rule.
The article to this point has established that use of a strategic approach to extractables and leachables screening involving multiple orthogonal and complementary chromatographic methods reduces the UF necessary to achieve a certain level of coverage, an outcome consistent with the concept of the “multidetector” approach to extractables and leachables screening (11). Although adequate reduction of the UF is accomplished using the combination of methods discussed herein (GC/MS, LC/MS-APCI, and LC/MS-ESI), it is noted that most testing laboratories include GC/MS with headspace sampling (HS-GC/MS) as a part of a complete screening strategy. It is quite certain that augmenting the analysis performed in this article with HS-GC/MS RRF data would further reduce the UF necessary to achieve adequate coverage (and increase the % coverage achieved at any given value of UF), as the HS-GC/MS methodology is particularly applicable to the volatile analytes that are not well-covered by either GC/MS or LC/MS. For example, HS-GC/MS increases % coverage by approximately 1.6% when a UF = 3 is applied. Similarly, addition of derivatization as an “add-on” to GC/MS for the strategic and comprehensive approach, as recently advocated by Zdravkovic (12), would have the same effect, reducing the UF required to achieve a certain level of coverage and increasing the % coverage at a certain UF value by filling in the GC/MS gaps of poorly responding compounds (acids, alcohols, and so forth).
Lastly, the available data provide insight into the proper ionization strategy for screening extracts and drug products by LC/MS. Two ionization modes can be employed, APCI and ESI, and there is some current debate as to the proper use of these methods in extractables and leachables screening. Without considering the subtleties of a specific ionization method being more or less appropriate for specific extraction solvents or drug products, there are >580 compounds in the Nelson dataset that have either an APCI RRF, an ESI RRF, or both. These data can be used to address the question of the applicability of these ionization techniques from a general perspective.
Of the compounds in this smaller data set, approximately 4% produce an ESI response but no APCI response. Thus, these compounds would be missed in a screening strategy that consisted of only LC/MS-APCI. A larger proportion, nearly 38%, produce an APCI response but no ESI response and thus would be missed in a screening strategy that consisted of LC/MS-ESI only. It is noted, however, that some of these “missed” compounds, by APCI or ESI, would be captured by GC/MS.
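The bookkeeping behind these percentages is a simple tally of which ionization mode produced a response for each compound. The Python sketch below illustrates it with hypothetical (APCI RRF, ESI RRF) pairs; the function name and the data are assumptions and do not reproduce the Nelson dataset.

```python
def ionization_overlap(pairs):
    """Percent of compounds responding by APCI only, ESI only, or both.

    pairs: list of (apci_rrf, esi_rrf) tuples; None marks no response.
    Every compound is assumed to respond by at least one of the two modes.
    """
    n = len(pairs)
    apci_only = sum(1 for a, e in pairs if a is not None and e is None)
    esi_only = sum(1 for a, e in pairs if a is None and e is not None)
    both = sum(1 for a, e in pairs if a is not None and e is not None)
    return {
        "APCI only": 100.0 * apci_only / n,  # missed by an ESI-only strategy
        "ESI only": 100.0 * esi_only / n,    # missed by an APCI-only strategy
        "both": 100.0 * both / n,
    }


# Hypothetical compounds for illustration.
pairs = [(0.8, None), (1.2, 0.5), (None, 2.0), (0.3, None), (0.9, 1.1)]

print(ionization_overlap(pairs))
```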
Earlier in this article, the aspect of “which method produces the best response?” was addressed in two ways, with the “best method” being defined as either the method that produces the largest RRF and/or the method that produces the RRF closest to 1. In this regard, neither ionization method has a clear advantage, as there is essentially a 50/50 split between APCI and ESI by both measures. That is, for the compounds that produce both an ESI and APCI response, ESI produces the “best response” 50% of the time and the other 50% of the time APCI produces the “best response”, regardless of how “best response” is defined.
It is not expected that this largely qualitative discussion lays to rest the debate about “APCI, ESI, or both?”. However, in these authors’ opinion, this discussion provides compelling evidence that strongly supports the contention that the LC/MS screening strategy that provides the greatest coverage for organic extractables and leachables is a strategy that employs both APCI and ESI ionization (both using + and − ion modes). Nevertheless, it is acknowledged that so doing increases the analytical effort, as APCI and ESI cannot be performed simultaneously (meaning that ESI and APCI analyses are performed separately).
Conclusion
Extracts are screened for organic extractables, and drug products and medical devices are generally screened for organic leachables, by an analytical strategy that includes three individual methods: GC/MS, LC/MS-APCI, and/or LC/MS-ESI. The suitability of this combination of methods for extractables and leachables screening is predicated on their ability to flag compounds that are present in the tested samples at levels at or above the AET, which represents a toxicological reporting threshold. In order to account for the circumstance that the chromatographic responses to extractables and leachables vary from compound to compound, the AET is adjusted downward via application of a UF. Using a database of RRF values for a large population of known extractables and leachables, it has been established that a UF = 3 is that value of the UF above which large increases in UF produce incrementally small increases in coverage (hence the term point of diminishing returns). Application of UF = 3 to each analytical method will cover nearly 92% of the compounds in the database, assuring that they will be flagged as being at or above the AET, if they are present in the test samples at levels equal to or greater than the AET. Coverages of approximately 95% can be obtained via several UF approaches, specifically application of a UF = 4 to all the analytical methods or application of a UF = 3 to GC/MS and a UF = 5 to the LC/MS methods (APCI and ESI).
It is the authors’ belief that these generalizations will be applicable to all testing laboratories whose analytical screening strategy includes GC/MS, LC/MS-APCI, and LC/MS-ESI, regardless of their specific instrumentation and operating conditions. However, clearly laboratories whose screening strategies differ could obtain general UF values from their RRF database that are different from those listed here. For example, if a laboratory implements GC/MS with and without derivatization as part of their screening strategy, then their UF values could differ from, and likely would be lower than, those listed herein. This is the case as presumably derivatization would improve the response of poorly responding compounds (underivatized), replacing low underivatized RRF values in the database with higher derivatized RRF values. This assertion merely echoes the generalization made previously, that the greater the number of complementary analytical methods used in the screening strategy, the lower the value of UF that is necessary to produce a desired coverage percentage.
Lastly, it is further concluded that application of both APCI and ESI ionization in LC/MS screening (as opposed to either method separately) provides the greatest coverage of extractables/leachables.
Conflict of Interest Declaration
The authors declare that they have no competing or conflicting interests, noting their relationship with Nelson Labs, a provider of extractables and leachables testing and consulting services.
- © PDA, Inc. 2022