Abstract
In the present work we compared different calculation approaches for their ability to accurately define microbiological control levels based on historical data. To that end, real microbiological data were used for simulation experiments. The results of our study confirmed that assuming a normal distribution is not appropriate for that purpose. In addition, assumption of a Poisson distribution generally underestimated the control level, and the predictive power for future values was highly insufficient. The non-parametric Excel percentile strongly predicted future values in our simulation experiments (although not as good as some of the parametric models). With the limited amount of data used in the simulations, the calculated control levels for the upper percentiles were on average higher and more variable compared to the parametric models. This was due to the fact that the largest observed value was generally defined as the control level. Accordingly, the Excel percentile is less robust towards outliers and requires more data to accurately define control levels as compared to parametric models. The negative binomial as well as the zero-inflated negative binomial distribution, both parametric models, had good predictive power for future values. Nonetheless, on basis of our simulation experiments, we saw no evidence to generally prefer the zero-inflated model over the non-inflated one. Finally, with our data, the gamma distribution on average had at least as good predictive power as the negative binomial distribution and zero-inflated negative binomial distribution for percentiles ≥98%, indicating that it may represent a viable option for calculating microbiological control levels at high percentiles. Presumably, this was based on the fact that the gamma distribution fitted the upper end of the distribution better than other models. Since in general microbiological control levels would be based on the upper percentiles, microbiologists may exclusively rely on the gamma distribution for calculation of their control levels. As the gamma distribution can conveniently be calculated in standard office calculation software, it may represent a superior alternative to the widely used percentile functions or other distribution models.
LAY ABSTRACT: During the manufacturing of pharmaceutical drug products, the counts of microorganisms are monitored in the cleanroom environment, water, the product's raw materials, and the final product. This enables manufacturers to ensure that high numbers of microorganisms that may impair the product's microbiological quality are detected before the product is released to the patient. Microbiological control levels must be set to determine at which number a count is considered too high. Exceeding such levels may require an investigation to determine the root cause explaining why such high numbers of microorganisms occurred, and a set of actions should be performed with the aim of eliminating this root cause. In order to really differentiate higher-than-usual counts, microbiological control levels should be based on historical data. In the present work we analyzed different calculation approaches towards that purpose. We used real microbiological data and performed simulation experiments to determine which statistical method could calculate the most realistic control levels that would provide the best prediction for future routine testing. Better predictions would ensure that only significant contaminations lead to an excursion of the microbiological control level, which would avoid wasting resources by investigating non-issues or normal/controlled conditions.
Introduction
Microbiological environmental monitoring (EM) of cleanroom areas and utilities evaluates if manufacturing processes, environments and products remain under adequate and controlled hygienic conditions. Microbiological testing of final drug product or raw materials verifies that the microbiological quality attributes of those products are met. For each of these microbiological tests, the counts observed in the test are compared with a microbiological control level that may be called action level, alert level, expectation level, specification level, or similar. Generally for monitoring activities, such as EM, different levels may be defined that would lead to different actions when exceeded. For instance, it is common practice in EM to set an alert and an action level. Exceeded alert levels do not necessarily require action but have to be followed up and the sample sites have to be monitored more closely than usual. Consecutively exceeding the alert level two times may be considered as exceeding the action level. If the action level is exceeded, an investigation should be performed to determine the origin of contamination and assess the criticality of the excursion. Based on this assessment, appropriate corrective and preventive actions should then be implemented to mitigate the risk of contamination and reduce the level of microorganisms in the environment. If the investigation indicates a clear risk for the product or if the microbiological quality attributes are not met, then the manufactured product should not be released from a microbiological point of view.
Microbiological control levels may be defined in order to comply with the minimum microbiological requirements set in the health authority guidelines. For instance, the European Union (EU) good manufacturing practice (GMP) guideline Annex I (1) and FDA guidance for industry (2) recommend action levels for EM in aseptic manufacturing areas that are defined according to cleanroom grades. For testing of non-sterile pharmaceutical products, microbiological acceptance criteria based on the European Pharmacopoeia (Ph. Eur.) 5.1.4 (3) and United States Pharmacopeia (USP) <1111> (4) are commonly applied. These state that a low level of microorganisms is tolerated if these are not to be considered objectionable. The numbers of microorganisms accepted in the total aerobic microbial counts and total combined yeasts and moulds counts are based on the product's route of administration. For product raw materials such as drug substances and excipients, the microbiological acceptance criteria are either based on individual pharmacopoeia monographs, pharmacopoeia general chapters (3, 4), or are set as internal limits (5).
The recommended microbiological control levels in the guidelines or pharmacopoeias are fixed and do not take into account the uniqueness of each manufacturing line or product type. Therefore, an additional level based on historical data should be added to monitor if the microbial counts remain within an expected range. Counts that are unusually high compared to historical data represent a warning signal even if within guideline or pharmacopeial action levels or acceptance criteria. This warning should be further investigated because it may signal that either a drift from normal conditions is occurring or that part of the manufactured drug product or raw material may not meet the microbiological acceptance criteria. An unusually high microbial count (single event) should be determined by assessing microbiological control levels based on historical data. Counts that exceed a historically based threshold many times in a row over a certain time period (multiple events) may also indicate an adverse trend. Defining microbiological control levels based on historical data is also in line with ICH Q10 (6). ICH Q10 states that the use of adequate tools for measurement and analysis of quality attributes is essential to verify that the operations remain within a state of control.
The setting of control levels only by graphical analysis of historical data would be too dependent on the microbiologist's personality (e.g., conservative or not) and experience. Microbiological control levels calculated on basis of mathematical models, on the other hand, are less prone to subjective estimation. In the pharmaceutical industry a large variety of statistical tools exist to monitor process performance and define control limits that differentiate single deviating events from baseline variability. It is, however, difficult to use these tools with microbiological data because
Microbiological data are generally not normal distributed.
Microbiological data often contain many zeroes.
Data are highly dispersed and often skewed to the right.
In some cases only small amounts of data are available (e.g., when skip-lot testing frequency is applied).
Several concepts to calculate microbiological control levels based on historical data have been reported and proposed to use either non-parametric percentiles or apply normal, Poisson, or negative binomial distribution models (7⇓⇓–10). Recently, Yang et al. (10) have suggested a very detailed procedure for determining data distribution and selection of appropriate models for estimating microbiological control levels. For microbiological data that contain many zeroes and that are highly dispersed, the authors recommended the use of a zero-inflated negative binomial (ZINB) distribution. Simulation studies were performed with fictitious data to support their proposal.
The high variety of approaches and statistical models suggested for determination of microbiological control levels may pose a source of confusion. Therefore, we decided to put some of the propositions from the scientific literature to a practical test. To that end, we used real data originating from microbiological EM of cleanrooms used for the manufacture of sterile and non-sterile drug products in different pharmaceutical manufacturing sites in Europe. These data were either directly fitted with different statistical models or used in simulation experiments aiming to evaluate how well future values could be predicted.
The goal of the present study was to determine which mathematical models or alternative calculation procedures may be most suitable for determination of historically based microbiological control levels.
Methods
To calculate microbiological control levels based on historical data, we applied the different statistical concepts described below. For all these approaches, upper percentiles (e.g., 99th percentile) were calculated, with the exception of formula VI, for which the definition for the action level was treated as a 99th percentile and the alert level as a 95th percentile. The percentile is a measure used in statistics that indicates the value below which a given percentage of observations fall. For instance, the 99th percentile is the value below which 99% of the data points fall. It may either be estimated assuming the parameters of a certain distribution model (parametric approach), or solely based on the obtained results without assumption of a particular distribution model (non-parametric approach).
I. Normal Distribution
The normal, or Gaussian, distribution may be the most well-known continuous probability distribution (Figure 1). In the pharmaceutical industry, it is extensively used for statistical process evaluation and improvement (e.g., Six Sigma). Therefore, some also try to apply it to calculate microbiological alert or action levels. Generally, levels are calculated based on the arithmetic mean (x̄) and the standard deviation (SD) of the samples. With the normal distribution, the mean ± 2 times the SD includes 95.46% of the data, and the mean ± 3 times the SD includes 99.73% of the data. Therefore, the first is often proposed as alert level, whereas the latter is proposed as action level (7, 12). For our comparative calculations we worked at upper-level percentiles. For instance, assuming a normal distribution, a 99th percentile equals
where x̄ is the arithmetic mean and SD the standard deviation.
Illustrations of the different statistical models used in this study. For preparation of these exemplary graphs, a hypothetical data set with an arithmetic mean of 2, a SD of 2, and a zero proportion of 33% was used.
In theory, the normal distribution ranges from negative infinity to positive infinity. In the present application, however, one would only use it for zero or positive counts because microbiological count results cannot be negative by definition. The fraction of such “impossible” results expected on the basis of a normal distribution becomes larger as the SD/mean ratio increases. Accordingly, the continuous gamma distribution, which automatically is restricted to positive values, may be considered a preferable alternative and is discussed below in V. Gamma Distribution.
II. Poisson Distribution
The Poisson distribution, which was first described nearly two centuries ago (13), is a discrete probability distribution, meaning it gives probability for x = 0 counts, x = 1 count, x = 2 counts, up to infinity, where all discrete probabilities sum to one (Figure 1). It expresses the probability of a given number of events occurring in a fixed interval of time and/or space if these events occur with a known average rate (λ) and independently of the time since the last event. In our context, this “average rate” λ equals the arithmetic mean of the historical colony-forming unit (CFU) counts. In a Poisson distribution, λ also represents the variance. Therefore, the SD of the Poisson distribution is defined as the square root of the arithmetic mean. The Poisson distribution is a mathematical model often used in microbiology. For instance, it quite accurately models the occurrence of microorganisms in a fixed volume of a well-mixed microbial suspension. To test whether there is excess variability in count data that would not be consistent with a Poisson distribution, Fisher's dispersion test (14) may be used.
III. Negative Binomial Distribution (Also known as Gamma–Poisson Distribution)
The classic definition of the negative binomial distribution is the probability of successes in a sequence of Bernoulli trials before a specified number of failures occur (Figure 1). Like the Poisson distribution, the negative binomial distribution is discrete. However, in contrast to the Poisson distribution, it is also defined by a second parameter besides the mean. This additional parameter may be used for adjusting the variance for a given mean—which is not possible for the Poisson distribution. Therefore, the negative binomial distribution may adequately fit data for which the variance clearly exceeds the mean (which, in statistical terminology, would be called an overdispersion of the data with respect to a Poisson distribution). In principle, the negative binomial distribution is a mixture of the Poisson distribution and the gamma distribution. It can be thought of as a Poisson-discrete distribution, where its mean λ is not constant but follows a gamma distribution, hence causing more variability than a Poisson distribution with a constant mean rate of λ (15). If the estimated dispersion parameter is close to zero (indicating that the data is not overdispersed), the distribution is equivalent to the Poisson distribution.
IV. Zero-Inflated Negative Binomial (ZINB) Distribution
Yang et al. (10) currently propose that a zero-inflated negative binomial distribution may be a more suitable model to fit microbiological EM data than the “classical” negative binomial distribution (Figure 1). Zero-inflation of a model means the assumption that two distinct components account for observed zero values. It is applied to explain extra zeroes in the data (i.e., more zeroes than would be explainable by a negative binomial distribution) (16). It can be parameterized with the same parameters as the negative binomial distribution (mean and extra-dispersion). In addition, an “excess zero parameter” is introduced that gives the additional proportion of zeroes (more than would be predicted from a negative binomial distribution). If it is equal to zero, then there are no excess zeroes and this distribution is equivalent to the “classical” negative binomial distribution. Should both the excess zero parameter as well as the dispersion parameter be close to zero, the distribution is equivalent to the Poisson distribution.
V. Gamma Distribution
The gamma distribution is a continuous distribution defined by two parameters, and unlike a normal distribution it can be a heavily skewed distribution (Figure 1). It is defined by the two parameters alpha and beta. To estimate the alpha and beta parameters for microbiological count data, it is recommended to use the method of moments:
where alpha and beta are both equated to the arithmetic mean x̄ and the sample standard deviation SD. A more widely used method to estimate alpha and beta would be the maximum likelihood method. However, this is usually not feasible for microbiological data, as it requires all values larger than zero.
In a simplistic view, the gamma distribution may be seen as a purely continuous version of the negative binomial distribution for much larger counts where the probability of equaling a specific count is quite small (theoretically zero). However, similar to the continuous normal distribution, one may compute probabilities of counts between ranges. Unlike the normal distribution, the gamma distribution ranges from greater than 0 to infinity. Therefore, a zero count is outside the possibility of a gamma distribution. As a consequence, it may not very accurately model the lower end of count data (i.e., the results close to zero). However, as it may potentially accurately model the more relevant higher end of the count data, it may still be valuable for microbiological control level calculation (as control levels in microbiology usually are larger than the 95th percentile, accurate modeling of the higher end of the count data is most important).
VI. Formula Proposed by Hussong and Madsen
Another formula, which was originally proposed by Hussong and Madsen (10), is the following:
where x̄ is the arithmetic mean of the historical data.
In contrast to the normal distribution, the proposal by Hussong and Madsen also integrates elements from the Poisson distribution. As explained previously, for the Poisson distribution the SD is defined as the square root of the mean. Thus, the term corresponds to the SD definition of the Poisson distribution. The approach to add 3 times the SD was presumably borrowed from Six Sigma concepts, which assume a normal distribution. For our comparative calculations we considered the formula proposed for the alert level as a 95th percentile and the formula for the action level as a 99th percentile.
VII. Non-Parametric Percentile Calculation
Another possibility is to use a non-parametric approach, which does not assume the data following a particular distribution. In that case, the observed historical values are used for calculation of the percentile by a ranking procedure. Several approaches for calculating a percentile exist (e.g., nearest rank, linear interpolation between closest rank). The present study was based on the two approaches incorporated in Excel 2010 (Microsoft Corp., Redmond, WA), referred to as PERCENTILE.INC and PERCENTILE.EXC function. If not otherwise indicated, we used the PERCENTILE.EXC function, as it requires a minimal amount of values for calculating a percentile (e.g., 99 values for calculating the 99th percentile or 19 samples for calculating the 95th percentile). We considered this more consistent with the classical statistical view that describes a percentile as “the value of a variable below which a certain percent of observations fall” (18) (accordingly, if a 99th percentile should be defined, at least 99 values must be available that can fall below the calculated number). The formula used for PERCENTILE.EXC weights at p or 0.99 at n + 1, so if n = 100 then the 99th percentile is the 99.99th highest value which interpolates between the 99th highest value and largest value or 0.01 times the 99th value plus 0.99 times the largest value. PERCENTILE.INC weights at p*(n – 1) + 1, so for n = 100 the 99th percentile would be 0.99*99 + 1 = 99.01, which is 0.01 times the 99th value + 0.99 times the 100th number. In general, the PERCENTILE.INC will always give a lower 99th upper percentile than PERCENTILE.EXC of slightly less than 1 rank value lower, for example, for n = 1000, PERCENTILE.EXC would give 990.99, while PERCENTILE.INC would give 990.01.
Calculation of the Percentiles of Interest
The percentiles of interest for the normal distribution were calculated using Minitab 16 (Minitab Inc., State College, PA). For the Poisson distribution, the negative binomial distribution, and the ZINB distribution, the percentiles of interest were calculated using the GENMOD Procedure of SAS 9.3 (SAS Institute Inc., Cary, NC), where the percentiles were calculated from the probability mass formulas for the individual distribution using the estimated parameters. The percentiles of interest for the gamma distribution were calculated in Excel 2010 (Microsoft Corp.) using the GAMMADIST function. As outlined above, the GAMMADIST function was defined by a method of moments approach using the following parameters:
Accordingly, the gamma parameters were equated on the basis of the sample mean (x̄) and sample SD. The formula proposed by Hussong and Madsen and the non-parametric percentile calculation were also calculated in Excel 2010. For the formula proposed by Hussong and Madsen, we used the formula proposed for calculation of the action level if comparisons were made at a 99th percentile and the formula proposed for calculation of the alert level if comparisons were made at a 95th percentile. For the formula proposed by Hussong and Madsen, we generally used the PERCENTILE.EXC function. The PERCENTILE.INC function was only used where specifically indicated. Calculated values were subjected to standard rounding rules in order to obtain integral numbers (i.e., ≥0.5 was rounded up and <0.5 was rounded down).
Chi-Squared Goodness-of-Fit Tests
Chi-squared goodness-of-fit tests were run and values calculated using XLSTAT Pro (Addinsoft, New York, NY) or Prism 6 (GraphPad, La Jolla, CA).
Data Used for Calculation
Historical data from microbiological EM (air and inanimate surfaces) were used for comparative calculations applying the approaches outlined above. Values that exceeded the action level as defined by the respective cleanliness zone were systematically excluded from calculation. These were seen as special causes and therefore unsuitable for definition of a historically expected baseline.
In total, we statistically evaluated 83 data sets. The examples we report here were randomly picked with the aim of covering different mean microbial numbers, and are representative for different data structures we encountered during our evaluation.
Simulation Experiments
For simulation experiments, the data sets indicated in Figures 2A–C and 3A–C were introduced as separate columns in Excel. Besides each value, a random number was generated in a second column by use of the function creating a random decimal number (RAND). Values were then sorted on basis of these random numbers, and the first 100 values were used to calculate the control level with each of the statistical concepts described above (the 100 values were chosen as the sample size for the control level calculation in order to also use the Excel percentile function, which requires at least 99 values in order to calculate the 99th percentile). Subsequently, we evaluated how well the control level calculated on the basis of this random subset explained the rest of the data set. For example, if the data set consisted of 737 values, a random selection of 100 values was used to calculate control levels, with the different statistical approaches and their explanatory power used to evaluate the remaining 637 values. To that end, the deviation from the intended coverage of 99% was assessed (e.g., if a control level only explained 97% of the remaining values, the target of 99% had been missed by 2%). Ten such simulation runs per data set were conducted. The same simulation experiment was also performed with an additional eight data sets that had low arithmetic means (x̄ < 1).
Exemplary results from microbiological environmental monitoring of pharmaceutical cleanrooms. On the left-hand side, control levels at the 99th percentile calculated with the different statistical concepts (I–V as defined right hand side of the figure, VI = Action level calculated with the formulae proposed by Hussong and Madsen, VII = Non-parametric percentile calculation) are indicated. In the graphs on the right-hand side, fitting results with the different parametric models are visualized. x̄ is the arithmetic mean, σ the SD, and N the sample size.
Exemplary results from microbiological environmental monitoring of pharmaceutical cleanrooms. On the left-hand side, control levels at the 99th percentile calculated with the different statistical concepts (I–V as defined in the key on the right-hand side of the figure, VI = Action level calculated with the formulae proposed by Hussong and Madsen, VII = Non-parametric percentile calculation) are indicated. In the graphs on the right-hand side, fitting results with the different parametric models are visualized. x̄ is the arithmetic mean, σ the SD, and N the sample size.
Results
Fitting of Data from Environmental Monitoring and Calculation of Microbiological Control Level at the 99th Percentile
In Figures 2 and 3, EM results of six different cleanrooms from different cleanliness zones are shown. The dashed lines in the left-hand side figures show the control levels calculated with the corresponding statistical approach. The right-hand side figures indicate fits of the different distributions to the data. Because the formula proposed by Hussong and Madsen and the non-parametric percentile calculation do not rely on a particular distribution, they could not be fitted to the data. The total number of values (N) for these different cleanrooms ranges from 214 to 737. The means of these data range from 0.2 to 38.7 CFU. For these figures, the units of y-axis and x-axis, respectively, are CFU/m3 for air monitoring (Figures 2B, 2C, 3A, 3B and 3C) and CFU/25 cm2 for surface testing (Figure 2A).
The normal distribution as well as the Poisson distribution showed poor fit of the data in most cases. The negative binomial and the ZINB distributions were in most cases very close to each other. Finally, the gamma distribution generally fitted the data similarly well as the negative binomial and the ZINB distribution, particularly for data with higher arithmetic means.
With the calculated control levels, marked differences were observed depending on the calculation approach used. For instance, in Figure 2B the calculated control level using the Poisson distribution would be 6 CFU, whereas the gamma distribution would indicate 21 CFU. This highlights the strong differences between the models and how important it is to use the correct one. In that case, using a control level of 6 CFU would result in 53 excursions (7.2% of total number of samples) compared to 7 excursions (0.8%) when using a control level of 21 CFU.
Predictive Power of the Different Calculation Approaches in Simulation Experiments
As outlined above, we evaluated how well the different calculation approaches explained the observed count results. However, such an evaluation is only retrospective and does not necessarily indicate how well future values are predicted. Obviously, the prediction of future values becomes more accurate the larger the data basis is for the definition of the control level. However, in microbiology the amount of available historical data is often rather small. Therefore, we next asked what the predictive power of the different calculation approaches would be for a data population, if a control level was calculated on basis of a subset of this population. To that end, we again used microbiology data from routine EM. We took the EM data obtained for a particular cleanroom and randomly generated subsets of 100 values. Subsequently, we used these subsets to calculate control levels with the different calculation approaches. Finally, we assessed how well these control levels explained the rest of the data population (i.e., the rest of the data obtained for that particular cleanroom). Towards that purpose, the deviation from the intended coverage of 95%, 98%, 99%, and 99.5% was assessed. Data sets used were those from Figures 2A, B, and C and 3A, B, and C. Detailed results for the 99th percentile are shown in Table I. Assuming normally distributed data, the average deviation from the 99th percentile was 1.74% (Table I) and the calculated control levels were consistently too low (Table II). The Poisson distribution and the formula proposed by Hussong and Madsen seriously underestimated the control levels in data sets with higher arithmetic means (Tables I and II). The negative binomial distribution seemed to be slightly superior in predicting future values than the ZINB (average deviation from the 99th percentile was 0.85% for the negative binomial distribution and 1.01% for the ZINB, respectively). Of note, this difference arose because the ZINB typically calculated a lower control level compared to the negative binomial distribution, which accounted for future values (Table II) less well. The non-parametric Excel percentile function had reasonable data coverage (Table I); however, the calculated control levels were often considerably higher compared to parametric statistical models (Table II). Consistent with that, coverage of the predicted values was often clearly higher than 99% (data not shown). Finally, with an average error of 0.68%, the gamma distribution was best at predicting future values with the data sets used in our study (Table I).
Results of Simulation Experiments with Data from Microbiological Environmental Monitoring. One hundred values were randomly drawn from the data population, and a control level (i.e., the 99th percentile) was calculated using the different approaches. Then how much of the remaining data population could be covered by these calculated control levels was assessed and compared to the percentile of interest (e.g., if a calculation approach covered 98% of the data population for a 99th percentile of interest, then the error would be 1%). Furthermore, a Chi-squared goodness-of-fit test was performed in order to identify the best-fitting parametric model for each data set, and the coverage of the control levels calculated always using the best-fitting model was assessed (indicated as “Best fit” in the table).
Mean and Range (in Brackets) of the Control Levels Calculated in the Simulation Experiments Summarized in Table I
Because the data used had rather high means, we repeated the same simulation for eight data sets with a low mean (x̄ < 1 CFU), which we considered particularly relevant for the gamma distribution. As the gamma distribution is continuous and microbiological data is discrete, we assumed that prediction may become less good for data sets with very low arithmetic mean (because then the discrete nature of the data should take most effect). Interestingly, this was not the case if the same simulation experiments were performed with eight additional data sets that had a mean <1 CFU (Table III), although a Chi-square goodness-of-fit test pointed towards use of the discrete statistical models (data not shown). The normal distribution, which also is continuous, underestimated the control limit, as did the Poisson distribution and the formula proposed by Hussong and Madsen. Again, there was no clear benefit in using the ZINB distribution as compared to the non-zero-inflated negative binomial model.
Results of Simulation Experiments with Data Sets with Low Means. One hundred values were randomly drawn from data populations with low means, and a control level (i.e., the 99th percentile) was calculated using the different approaches. Then how much of the remaining data population could be covered by these calculated control levels was assessed and compared to the percentile of interest (e.g., if a calculation approach covered 98% of the data population for a 99th percentile of interest, then the error would be 1%). Furthermore, a Chi-squared goodness-of-fit test was performed in order to identify the best-fitting parametric model for each data set, and the coverage of the control levels calculated always using the best-fitting model was assessed (indicated as “Best fit” in the table).
In Figure 4, results for the other percentiles are summarized. Also at other percentiles, the Poisson distribution had by far the lowest coverage of all distributions tested. With our data, the normal distribution could well predict the 95th percentile; however, it failed to adequately cover higher percentiles. In general, the non-inflated negative binomial model made better predictions compared to the ZINB distribution (Figure 4). In Figure 4 we differentiate between the Excel PERCENTILE.EXC and PERCENTILE.INC functions. While differences were minor at the lower percentiles, the PERCENTILE.EXC function made better predictions at the 99th percentile (for the 99.5th percentile, the PERCENTILE.EXC function could not be used because it required more than 100 values for calculation). For percentiles ≥98% the gamma distribution achieved the best predictions with the data sets used in this study, even when compared to a best-fit approach on the basis of the Chi-squared goodness-of-fit test.
Results of simulation experiments with data from microbiological environmental monitoring. One hundred values were randomly drawn from the data population, and control levels at different percentiles were calculated using the different approaches. Then how much of the remaining data population could be covered by these calculated control levels was assessed and compared to the percentile of interest (e.g., if a calculation approach covered 98% of the data population for a 99th percentile of interest, then the error would be 1%). The calculation approach with the best prediction is located on the top, whereas the calculation approach with the worst prediction is located on the bottom. With exception of the formula proposed by Hussong and Madsen and the Poisson distribution, size of the colored dots is proportional to the extent of error. The formula proposed by Hussong and Madsen for calculation of an alert level was treated like a 95th percentile, whereas the formula for calculation of an action level was treated like a 99th percentile. The Excel PERCENTILE.EXC function could not be calculated for the 99.5th percentile, as it required ≥198 values. Furthermore, a Chi-squared goodness-of-fit test was performed in order to identify the best-fitting parametric model for each data set, and the coverage of the control levels calculated always using the best-fitting model was assessed (indicated as “Best fit” in the figure).
Discussion
In the present work we analyzed different calculation approaches for their ability to accurately define microbiological control levels. To that end, simulation experiments with real microbiological data were performed. A random selection of 100 values from a data set was used to calculate control levels, and the explanatory power of these control levels for the remaining values was assessed. Although it is clear that statistically calculated control levels would be updated more frequently during routine application, we believe that these simulations provide valuable information regarding the predictive power of the different calculation approaches examined in this study. It must also be clearly understood that quantitative conclusions of our work are restricted to the data used in this study and may not be generalized for all microbiological count data from the environment (for this, much larger amounts of data need to be evaluated). However, given that the data used were from routine EM, it is reasonable to assume that similar trends may also exist with other data and that the basic conclusions of this article may have broad applicability.
The results of our study confirmed that assuming a normal distribution is typically not appropriate for microbiological data. Importantly, not a single data set we examined in the course of our work could be well fitted with a normal distribution. On that basis, we consider this model as inappropriate for determination of microbiological control levels and recommend against using it for that purpose. The main reason for this apparent lack of fit is that microbiological data generally have a higher variability than could be explained by a normal distribution. For lower percentiles, the inability to explain the full variability of the data is less problematic, as the highest values are not considered part of the usual count data anyway. This explains why prediction at a 95th percentile was among the best, but prediction got clearly worse for higher percentiles (Figure 4).
Assumption of a Poisson distribution worked to a certain extent for some data with low arithmetic mean (Figure 2A). However, the simulation experiments indicated that even for data with low arithmetic mean the microbiological control level was systematically underestimated in the majority of cases (Tables I, II, III and Figure 4). For data sets with higher arithmetic mean (Figures 2B, C and 3A, B, and C), the Poisson distribution distinctly underestimated the control level and the predictive power for future values was highly insufficient (Tables I, II, III and Figure 4). The Poisson distribution assumes that the variance/mean ratio is close to 1; if this ratio is significantly larger, the control level is set too low. During our evaluation, a variance/mean ratio as expected for Poisson-distributed data was observed in rare instances for data from high-grade cleanrooms and non-sterile product testing (in these cases the observed counts were basically confined to 0 or 1 CFU) (data not shown). In these special cases, the Poisson distribution could theoretically be applied. However, for such data all models typically indicated the same control level (i.e., 1 CFU). In order to evaluate whether the variance/mean ratio may be consistent with a Poisson distribution, the Fisher dispersion test (13) could be applied.
For the formula proposed by Hussong and Madsen, which represents a mixture of normal and Poisson distribution characteristics, a similar picture emerged as under the assumption of a Poisson distribution. It should be considered, however, that Hussong and Madsen designed their formula for microbiology data from higher-grade cleanrooms (ISO 4.8/5) and that the formula worked relatively well for data structures that would be expected from such areas. For instance, the predictive power for data with low means was superior to the Poisson distribution and normal distribution (Table III).
The non-parametric Excel percentile predicted future values well in our simulation experiments (Tables I and III). However, particularly with the PERCENTILE.EXC function, the calculated control levels were on average higher compared to the parametric models, and also the range of levels calculated for the different simulation runs was broader (Table II). With the 100 values used for the simulation, the highest observed value (or the highest observed value minus 1 CFU) was considered the control level by this Excel function when working at a 99th percentile. This strong dependency on the largest observed value explains the broad range of calculated control levels in our simulation experiments. If the percentile of interest would have been 95% instead of 99%, then the largest observed value would have had a smaller impact at a sample size of 100 values (but the same impact at a sample size of 20 values). In principle, the closer the number of values available for calculation of the control level is to the minimal sample size requirement of the PERCENTILE.EXC function, the stronger the impact of the highest observed value is. Accordingly, if the available sample size is close to the minimal requirement of the percentile function, whether the highest observed value could eventually represent an outlier causing an unrealistically high control level should be very carefully assessed. However, on which basis such a decision should be made is difficult to generalize and may even be harder to justify, representing a potential drawback of the Excel percentile function. When the available sample size is clearly larger than the minimal requirement of the PERCENTILE.EXC function, then the calculated control levels are less affected by the largest observed value. Therefore, the use of the PERCENTILE.EXC function may reasonably be expected to represent an appropriate solution if sufficient data is available to mitigate the difficulties outlined above. Of note, the PERCENTILE.INC function tended to underestimate microbiological control levels (data not shown) and accordingly made predictions at higher percentiles worse as compared to the PERCENTILE.EXC function (Figure 4).
Both the negative binomial distribution as well as the ZINB distribution fitted the microbiological data well and had good predictive power of future values (Tables I, II, III and Figure 4). However, on basis of our simulation experiments, we saw no evidence to generally prefer the ZINB over the corresponding non-inflated model, as it consistently made worse predictions. In addition, the algorithm employed for calculating the ZINB distribution did not find evidence for excess zeroes with 43 % of the data sets used, and accordingly applied the non-inflated negative binomial model. Thus, we saw no evidence to generally prefer the zero-inflated model as proposed by Yang et al. (10).
Finally, the gamma distribution on average made the best predictions in our simulation experiments, although it is a continuous distribution. Surprisingly, this was even the case for data sets with low arithmetic mean (i.e., with raw data dominated by low counts and many zeroes) (Table III), where Chi-squared goodness-of-fit values for the gamma distribution were worse compared to discrete models like the negative binomial or ZINB (data not shown). As discussed in the Methods section, the gamma distribution cannot reflect the count result “zero” and therefore has shortcomings in modeling the lower end of the data. However, because this lower end of the data is actually not very relevant when calculating a percentile >95%, it is indeed possible to achieve very good results with a model that only accurately predicts and fits the right tail of the data (i.e., the high count results) (Figure 5). This explains why the predictive power of the gamma distribution was not so good at the 95th percentile, but became increasingly better for higher percentiles (Figure 4). Of note, we used a method of moments approach to calculate the gamma distribution parameters α and β in order to deal with the zeroes in the data. The use of a maximum likelihood estimation, which is often applied to that end by statistical standard software, would not have been possible with such data because it applies logarithms.
Exemplary data set fitted with the negative binomial distribution (discrete) and the gamma distribution (continuous). The right-hand figure represents a magnification of the fits for the highest percentiles.
From a statistical point of view, a common approach would be to run goodness-of-fit tests for each of the models and then use the one that best explains the data (as was done in the “Best fit” approach in Figure 4). On the basis of our simulation results, it may be reasonable to confine such calculations on the negative binomial distribution, ZINB, and gamma distribution (although the necessity to include the ZINB may indeed be debatable, as we have not seen any clear benefit compared to the non-zero-inflated negative binomial model). However, goodness-of-fit-tests typically integrate all observed count results and provide a rating of how well a particular model explains all the data. But with our particular type of question (calculation of a control level at a percentile >95%), the goodness-of-fit for the low count numbers is less relevant. In other words, the conclusions from unmodified goodness-of-fit-tests may be misleading in the light of our question of interest, as these are influenced by the fitting accuracy for the (less relevant) low count results. As shown in Figure 5, the gamma distribution cannot fit the low count results well (up to 8 CFU) compared to the negative binomial distribution. Nonetheless, the high count results (larger than 8 CFU) are well fitted by the gamma distribution. Therefore, predictions for such large values (corresponding to high percentiles) may be good, although the overall fit is not the best. As shown in our simulations with real microbiological monitoring data, assuming a gamma distribution by default led on average to at least as good prediction of percentiles ≥98% when compared to a best-fit procedure (Figure 4). Accordingly, from a pragmatic point of view, microbiologists may also exclusively rely on the gamma distribution for calculation of their control levels. This approach is attractive because it can be performed entirely in standard office calculation software with the formulae provided in the Methods section and does not require use of specialized statistical software. Particularly for smaller companies, which may not be able to invest resources in establishing more sophisticated statistical frameworks and best-fit algorithms, application of the gamma distribution may be a superior alternative to the widely used Excel percentile functions if the amount of available data for calculation of microbiological control levels is limited.
In our study we relied on EM data. Importantly, the proposed approaches are applicable for any microbiological data that follow similar patterns that we observed in our study. During our internal data review, we have so far not encountered different patterns in data from other microbiological tests (e.g., microbial enumeration tests of non-sterile products, water, bioburden in intermediate products or bulk solutions), suggesting that the above outlined approaches may be broadly applicable for control level calculation. It should be noted that a minimal number of historical values must be defined for which statistical modeling is considered reasonable. We propose that not less than 50 values should be available for fitting of parametric models. However, in cases where the data can be expected to be very stable, a smaller number may be sufficient. For non-parametric Excel percentile functions, in general more data should be used as explained above. If not enough data is available for fitting of a statistical model, the control level may, for instance, be arbitrarily defined as a percentage of the specification limit or recommended action level as an interim solution. Also, if there are sufficient values but all of these are below the detection limit of the method used, every result above the detection limit may be considered an excursion from the control level.
Despite mathematical definition of control levels, there are still areas where decision making by the microbiologist must occur. First, data used for calculation of control levels from historical data should not include outlier values that are due to special causes. Excursions from a compendial limit, for instance, do not represent the normal state of control of a product or process. On that basis, such values should not be included in the body of data used for calculation of control levels. Similarly, data associated with major deviations should be critically reviewed. Further, the microbiologist must decide which percentile should be applied for the statistical calculation. The percentile directly affects the threshold of acceptance below which a microbiological count result is considered part of the expected variation. Therefore, the choice of the percentile has consequences on the number of excursions that occur, even if the process or product of interest is actually under control. For instance, if 1000 analyses per year are performed in EM, a 95th percentile or 99th percentile would mean that 50 or 10 values, respectively, are theoretically expected to exceed the control level even if the environment is completely stable and the statistical models fit the data perfectly. Therefore, working with too low of a percentile would waste resources by investigating non-issues or normal/controlled conditions. Even more importantly, in such a situation the exceeding of a control level would probably lose its function as a warning signal and no longer be taken seriously (imagine a fire alarm that goes off every other night—after two weeks, no one will leave the building anymore). On the other hand, setting the percentile extremely high is also no solution, because then unusually high counts that may indeed signal a potential drift from normal operation would not be detected. In any case, use of microbiological control levels should also be combined with an appropriate trending system to determine if clusters of counts indicate an adverse trend (19⇓⇓–22). If such a trending system is well designed, it also supports the detection of drifts from normal operation.
There are different ways to integrate control levels calculated from historical data in the microbiological monitoring program. For instance, an action level can be set based on regulatory or industry guidelines, and an additional alert can be set level based on historical data (7). Alternatively, a specification level from regulatory documents could also be supplemented with action and alert levels that are both based on historical data. With such an approach, a higher percentile should be chosen for the action level compared to the alert level. However, whether use of three levels confers advantages over the use of two levels may be debatable. The authors, for instance, do not see significant additional value for most applications if the historically based control level has been appropriately defined. Finally, if no specification or action level exists in regulatory and guidance documents, it would also be conceivable that all levels are calculated based on historical data. The latter approach may also be indicated when defining control levels for non-growth-based alternative microbiological methods. Because such methods allow for direct measurement of microorganisms, uncritical adoption of compendial levels expressed in CFU (which implies growth of microorganisms under the conditions of the test) would most probably lead to difficulties, as only few microorganisms from our environment can actually be cultivated (23). In all of the above cases, a clear distinction in the investigation procedure should be made between the alert and action level to avoid confusion.
Calculated microbiological control levels should be re-assessed after a defined time period or if sufficient additional results can be integrated in the calculation. In the view of the authors, if the re-calculated control level is equivalent or lower than the previous one, it may be implemented as such. However, if the re-calculated control level is significantly higher than the previous one, it may only be implemented if a risk assessment concludes that product quality is not worsening or that the increase will not affect product quality.
Conclusion
In conclusion, calculations assuming the normal distribution tended to underestimate variability and accordingly set the microbiological control level too low. The same was true for the Poisson distribution and the formula proposed by Hussong and Madsen, particularly if applied on data with higher arithmetic means. Accordingly, if used with data not suited for these calculation approaches, count results within the typical background noise of the data would be considered out of expectation and trigger unnecessary investigations. In contrast, the negative binomial distribution and gamma distribution appeared generally applicable for basically all data structures we examined. With our data, the gamma distribution on average had at least as good predictive power as the negative binomial distribution and ZINB distribution for percentiles ≥98%, indicating that it may represent a viable option for calculating microbiological control levels at high percentiles. In addition, the gamma distribution does not require expensive software for calculation, and because it does not employ iterative calculation processes no problems with algorithm convergence can occur. On that basis, we propose that use of the gamma distribution may represent a pragmatic, efficient, and easy-to-implement solution for calculating microbiological control levels.
Acknowledgements
We thank Jochen Giese, Abe Germansderfer, Urs Simmen, and Andreas Schötzau for their statistical support. Nicolas Weber, Doris Kattner, Michael Schiffer, and Emilia Rispoli are acknowledged for providing some of the data used in this study.
- © PDA, Inc. 2015