Group testing: Revisiting the ideas

The task of identifying randomly scattered "bad" items in a fixed set of objects arises frequently, and there are many ways to deal with it. "Group testing" (GT) refers to testing strategies that aim to replace the inspection of single objects by the inspection of groups spanning more than one object. First announced by Dorfman in 1943, the methodology has undergone vigorous development, and though much related research still takes place, the core ideas remain the same. In the present paper, we revisit two classical GT algorithms: Dorfman's algorithm and the halving algorithm. Our fresh treatment of the latter and expository comparison of the two are devoted to the dissemination of GT ideas, which are so important in the current pandemic situation induced by COVID-19.


Introduction
The task of identifying bad items in a given set of objects arises quite often. For example, consider identification of: (i) the infected patients in a fixed cohort or (ii) the defective items in a production batch. Usually, this identification task is a composite problem and spans many subtasks. One such subtask can be described as "an efficient utilization of resources devoted to the testing of the investigated objects". It turns out that, among the plenitude of context-dependent methods designed for the solution of this subtask, an appropriately chosen testing plan plays an exceptional role, since it alone can reduce the testing costs substantially. This is the contextual target of the present paper. To be more precise, we focus on testing strategies widely known under the name of Group (or Pooled Sample) Testing (in what follows, we use the abbreviation GT). The core idea underlying the GT strategy is the observation that, in many cases, the testing of single items can be replaced by the testing of a group spanning more than one item. Though it is difficult to trace back the exact date and inventor of this cornerstone idea (for a good historical account, see [13, Chap. 1]), without doubt, much of the credit goes to the pioneering work of Dorfman [12]. In that paper, the blood testing problem was described, and the following scheme was suggested. Given N individual blood samples, pool them and test for the presence of an infection in the pooled sample; if the test is negative, finish; if the test is positive, retest each single patient. The rationale behind this is clear: if the prevalence of the infection is low, one usually ends up with a single test applied to the pool instead of N tests applied individually.
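Dorfman's rationale is easy to check numerically. Below is a minimal Python sketch (function names are ours, chosen for illustration) that simulates this pooled scheme on random cohorts; with p = 0.01 and N = 10, the average number of tests per cohort is 1 + N(1 − (1 − p)^N) ≈ 1.96, roughly five times fewer than the 10 individual tests.

```python
import random

def dorfman_tests(statuses):
    """Number of tests spent on one cohort of 0/1 infection statuses:
    one pooled test; retest everyone individually only if the pool is positive."""
    return 1 if sum(statuses) == 0 else 1 + len(statuses)

def average_tests(p, n, trials=100_000, seed=0):
    """Monte Carlo estimate of the expected number of tests per cohort."""
    rng = random.Random(seed)
    total = 0
    for _ in range(trials):
        cohort = [1 if rng.random() < p else 0 for _ in range(n)]
        total += dorfman_tests(cohort)
    return total / trials
```

For p = 0.01 and n = 10, `average_tests(0.01, 10)` lands near the theoretical value 1.956.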
Since the appearance of Dorfman's work [12] in 1943, GT ideas have evolved in many directions and found important applications in molecular biology, quality control, computer science and other fields. Digging into the literature, one can observe that the field is indeed very widespread across different disciplines. For this reason, some developments were overlapping and rediscovered by researchers working in different fields. Our personal familiarity with the field also followed this route: attracted by potential applications in the context of the COVID-19 epidemic, we rediscovered some well-known facts. Nonetheless, the attained experience and understanding of the importance of the tool inspired us to write a promotional paper on the topic. This is the main intent of the paper: we believe that, in the current pandemic situation, the spread of GT ideas and the attraction of other researchers to the field is an important and meaningful task. We do not propose novel GT schemes or methodological improvements. Our presentation is primarily devoted to those unfamiliar with the subject, aiming to provide a quick, lightweight introduction "by example" without delving into details, yet giving a flavor of the topic as a whole. Choosing a mathematical journal, we were, first of all, interested in dissemination within the mathematically oriented community. Secondly, while getting familiar with the topic, we encountered many papers where the subject was treated without sufficient mathematical rigor. We therefore felt that a rigorous treatment of the GT Scheme H (see Section 2), unseen (or at least unobserved) by us, was a missing item in the existing literature. Finally, after submission of the initial version of the paper, we discovered that our Proposition 2 adds some new information to what is known about the classical Dorfman scheme (see the comments in Section 2).
The remaining part of the paper is organized as follows. In Section 2, we provide some preliminaries, then describe and contrast two classical GT schemes. In Section 3, we give an accompanying discussion highlighting some relevant issues and skim through the related literature. Appendices A and B contain some mathematical derivations and tables.
Because of COVID-19 and the expository nature of the paper, we attach the whole presentation to the biomedical context.

Two classical GT schemes
Consider the following setup. Assume that the prevalence of some disease (the fraction of infected individuals) is equal to p ∈ (0, 1) in an infinite (or large enough) population. A cohort spanning N independent individuals has to be tested, and the infected patients have to be identified. To achieve this goal, samples are collected from each individual. The applied test performs equally well for individual and for pooled samples: such a situation occurs, e.g., when the test indicates the presence of the infection in a blood sample, and it makes no difference whether the latter is obtained from a single individual or by pooling the samples of a cohort. In the described situation, physicians can choose different testing strategies. Let us assume that the following are three possible choices².

Scheme A: Test each patient's sample.

Scheme D: Test the pooled sample of the whole cohort. Test each member of the cohort separately only if an infection is detected in the pooled sample.

Scheme H:
Step 1. Test the pooled sample of the whole cohort. Proceed to Step 2.
Step 2. If the test is positive, proceed to Step 3; otherwise, finish testing the cohort.
Step 3. Divide the cohort into two parts consisting of the first and second halves, respectively. Apply the whole algorithm to the two obtained parts recursively.
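For concreteness, the three steps of Scheme H can be sketched as a short recursive procedure. The Python illustration below uses names of our own choosing; the input encodes infection statuses, and each `any(...)` call stands for one physical pooled test.

```python
def halving(statuses):
    """Scheme H: return (indices of infected individuals, number of pooled tests)."""
    tests = 0

    def rec(idx):
        nonlocal tests
        tests += 1                            # Step 1: test the pooled sample
        if not any(statuses[i] for i in idx):
            return []                         # Step 2: negative pool -- finish
        if len(idx) == 1:
            return list(idx)                  # a single positive individual found
        mid = len(idx) // 2                   # Step 3: halve and recurse
        return rec(idx[:mid]) + rec(idx[mid:])

    return rec(list(range(len(statuses)))), tests
```

For a cohort of eight individuals with a single infected member, the procedure locates them with 7 pooled tests instead of 8 individual ones, and a fully healthy cohort is cleared with a single test.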
Although it is not obvious at first glance, Schemes D and H can be much more efficient than Scheme A, provided the prevalence p is low enough. To give a rigorous justification (along with a precise notion of efficiency), let us formally define the underlying model.
Consider a sample of N individuals. Put X_i = 1 if the test of the ith individual is positive, and X_i = 0 otherwise. Let S = S_N = X_1 + ⋯ + X_N be the total number of infected individuals in the sample, and let T = T_N be the total number of tests applied to the cohort.
We start with Scheme D. The test is applied once if the result is negative, and it is further applied to each of the N individuals otherwise, i.e., T = 1 if S = 0 and T = 1 + N if S > 0. The model assumptions imply that X_1, . . ., X_N are independent identically distributed (i.i.d.) random variables, each having the Bernoulli distribution Be(p). Therefore, S has the binomial distribution Bin(N, p). Consequently, the average number of tests per cohort is

E T = 1 + N(1 − q^N),

where q := 1 − p. The average number of tests per individual, say t = t(N), is

t(N) = E T / N = 1/N + 1 − q^N.    (1)

Consider the function t : (0, ∞) → (0, ∞) given in (1). By equating its derivative to 0, we see that the stationary points solve the equation

N = q^{−N/2} (ln(1/q))^{−1/2},    (2)

which is a fixed-point equation for g(N) = q^{−N/2} (ln(1/q))^{−1/2}, N ∈ (0, ∞), and hence can easily be solved iteratively. It is further not difficult to prove that, for p in a region enclosing (0, 0.2), there exists a unique solution N_p > 0 of (2), which is a minimizer of t(N) (see Proposition 2 below). Then, turning back to the economic/biomedical interpretation, we conclude that, with a cohort of ⌊N_p⌋ individuals (here and in the sequel, ⌊y⌋ stands for the integer part of y ∈ R), Scheme D results in the lowest average number of tests per person possible when applying a scheme of this type to a population with prevalence p. Scheme A, in contrast, always has a constant number of tests, 1 per person. Therefore, the average (absolute) gain attained by applying Scheme D instead of Scheme A is given by the difference

G_p = 1 − t(⌊N_p⌋).

The right panel of Fig. 1 shows the graph of p ↦ 100 G_p, p ∈ (0, 0.2), which is the average gain measured by the number of tests saved per 100 individuals. The corresponding values are provided in Table B1 (see Appendix B). The accompanying graph of p ↦ ⌊N_p⌋ (see the left panel of Fig. 1) demonstrates the dependence of the optimal sample size on p. To obtain quick numerical evidence, assume that N is bounded away from zero and pN → 0.
Then from (2) it follows that the optimal sample size satisfies

N_p ≈ p^{−1/2}.

Hence, assuming that p is small enough for pN ≈ 0 to hold, the above implies that

G_p ≈ 1 − 2√p.

For example, if p = 0.01, then we have G_p ≈ 0.8, i.e., the approximate average gain is 80% or so. Now let us switch to Scheme H. Its main features are summarized in the following proposition (for the proof, see Appendix A).
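The computations for Scheme D are easy to reproduce. A minimal Python sketch (names are ours) iterates the fixed-point equation (2) from the small-p starting guess N ≈ p^{−1/2} and evaluates the resulting gain:

```python
import math

def t_dorfman(n, p):
    """Average number of tests per individual under Scheme D: t(N) = 1/N + 1 - q^N."""
    q = 1 - p
    return 1 / n + 1 - q ** n

def optimal_n(p, iters=50):
    """Solve N = q^(-N/2) * (ln(1/q))^(-1/2) by fixed-point iteration."""
    q = 1 - p
    n = 1 / math.sqrt(p)                          # small-p starting guess N ~ p^(-1/2)
    for _ in range(iters):
        n = q ** (-n / 2) / math.sqrt(math.log(1 / q))
    return n

p = 0.01
n_star = optimal_n(p)                             # continuous minimizer of t
gain = 1 - t_dorfman(math.floor(n_star), p)       # tests saved per person vs Scheme A
```

For p = 0.01 this gives a continuous optimum N_p ≈ 10.5 and a gain of about 0.80, in line with the 80% figure above.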
Proposition 1. Assume Scheme H. Then (i) the average number of tests per person is given by formula (3); (ii) the average number of tests per person in the case of an infinitely large cohort is the limit t(∞) = lim_{N→∞} t(N); (iii) for a fixed p ∈ (0, 1), the function t : N → (0, ∞) admits at most two minimizers N_p: the value N = N_p corresponding to the optimal sample size is either ⌊1/(2 log₂(1/q))⌋ or ⌊1/(2 log₂(1/q))⌋ + 1.
Inspection of the results in the statement of the proposition leads to a quick comparison of Scheme H with Schemes A and D. Consider first the limit in (ii). Obviously, for q ≈ 1 (or, equivalently, p ≈ 0), t(∞) < 1. The latter means that, when the prevalence is low, this scheme always outperforms the common sequential Scheme A. Again, to gain quick quantitative insight, assume that p is small enough for pN ≈ 0 to hold. Then turning to (iii) and taking a "continuous" (undiscretized) version of N_p equal to ln 2/(2 ln(1/q)) yields (see Remark A1, Eq. (A.3)) the relationship

t(⌊N_p⌋) ≈ −2p log₂ p.    (4)

Therefore, an approximation to the average gain G_p = 1 − t(⌊N_p⌋) is 1 + 2p log₂ p. Taking, e.g., p = 0.01 results in G_0.01 ≈ 0.867. Comparing with the analogous example given for Scheme D, we see that the gain increases by close to 7%. In fact, this is not surprising (for a visual comparison of Schemes D and H on the linear and the log-log scales, see Fig. 1, and, for a numerical one, see Tables B1 and B2 in Appendix B) since, for Scheme D, we had G_p ≈ 1 − 2√p, and p log₂ p/√p → 0 as p → 0. Equality (4), however, has a somewhat magical flavor. To see this, note that, for p ≈ 0, the entropy I_p of X ∼ Be(p) is asymptotically equivalent to p log₂(1/p), since I_p = −p log₂ p − q log₂ q and −q log₂ q = o(p log₂(1/p)). Consequently, (4) means that the optimal average number of tests per individual scales like the entropy of the prevalence of the infection. Keeping in view the above relationship, it is not surprising that a significant number of works [1, 8, 19] have approached the testing problem from the information-theoretic perspective. In the next section, we provide additional comments regarding connections with information theory. We close this section with the previously mentioned Proposition 2, which is proved in Appendix A.
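The figures for Scheme H can be cross-checked exactly, without simulation. A subgroup in Scheme H is tested precisely when the pool of its parent group contains an infected individual (a positive parent pool forces all ancestor pools to be positive, so the recursion does reach it). The Python sketch below sums these probabilities over the splitting tree; it is our own derivation from the scheme as stated, not an implementation of the paper's formula (3).

```python
def expected_tests_h(n, p):
    """Exact expected number of tests under Scheme H for a cohort of n
    i.i.d. Be(p) individuals: the root pool is always tested, and each
    subgroup is tested exactly when its parent pool is positive."""
    q = 1 - p
    total = 1.0                          # root pool: tested with probability 1
    stack = [n]
    while stack:
        m = stack.pop()
        if m >= 2:                       # a group of size >= 2 is split in two
            total += 2 * (1 - q ** m)    # each half tested iff this pool is positive
            stack.append(m // 2)
            stack.append(m - m // 2)
    return total

p = 0.01
t = expected_tests_h(32, p) / 32         # average tests per person for N = 32
```

For p = 0.01 and N = 32 this yields t ≈ 0.126 tests per person, i.e., a gain of about 0.87, consistent with the entropy-scale approximation −2p log₂ p ≈ 0.133.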
Figure 1. Scheme H (red) vs. Scheme D (black) on the linear and the log-log scale.
The above proposition can be viewed as a counterpart of Proposition 1. Note that it does not contain an analytical expression for the optimal sample size. The latter was given by Samuels [35] and is either 1 + ⌊p^{−1/2}⌋ or 2 + ⌊p^{−1/2}⌋. Most importantly, Samuels [35] not only provided the analytical expression for the optimal sample size but also showed that, for the case of Scheme D, the optimal sample size equals 1 for p > 1 − (1/3)^{1/3} ≈ 0.31. This, in turn, is in agreement with a fundamental fact of GT theory discovered by Ungar [42]: if p ≥ (3 − √5)/2 ≈ 0.38, then there does not exist an algorithm that is better than individual one-by-one testing.
An interesting detail here is that our proof, given in Appendix A, differs from that of Samuels and leads to an exact analytical expression for the range of p (the set A above) where g_p(N) has a unique minimizer.
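Both of Samuels' claims are easy to verify numerically under the average-tests criterion t(N) = 1/N + 1 − q^N from Section 2. A brute-force Python check (names are ours):

```python
def t_dorfman(n, p):
    """Average tests per individual under Scheme D."""
    return 1 / n + 1 - (1 - p) ** n

def best_n(p, n_max=1000):
    """Brute-force integer minimizer of t over 1 <= N <= n_max."""
    return min(range(1, n_max + 1), key=lambda n: t_dorfman(n, p))
```

For p = 0.01 the brute-force minimizer is 11 = 1 + ⌊p^{−1/2}⌋, and for p above the ≈ 0.31 threshold every pooled size N ≥ 2 gives t(N) > 1, so individual one-by-one testing is preferable, as stated.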

Discussion
Since its appearance, Dorfman's Scheme D has been rigorously investigated by many authors (we refer to [23, 35, 39, 40] to name a few). Regarding Scheme H, the situation is a bit different. To the best of our knowledge, the reference [46] is the only work close to ours in both the nature of investigations and the results. However, in that paper, the authors focus on the treatment of the asymptotic regime of Scheme H when p → 0. The majority of other references encountered by us provide instructions suitable for the practical application of Scheme H with only a brief and nonrigorous theoretical background. For example, in the present context, it was recently discussed afresh by Gollier and Gossner [18], Mentus et al. [29] and Shani-Narkiss et al. [37]. For an older reference discussing the case of a nonhomogeneous population (i.e., one in which the probability of being infected p may vary across individuals) and containing quite a large body of applied literature on the halving algorithm (i.e., Scheme H), we refer to [4].
One should have noticed that halving, constituting the core of Scheme H, yields another link to information and algorithm theory, in addition to the one already mentioned³ in Section 2. Namely, in its essence, Scheme H is nothing more than the quicksort (QS) algorithm designed to sort a set containing keys of two types (bad and good ones). It is well known that QS yields the best (up to a constant multiple) possible average performance among comparison-based algorithms: to sort an array having N nonconstant (i.e., random) items, the smallest average number of comparisons is of the order N ln N [10], and all randomized "divide-and-conquer" type algorithms (QS being one of them) have expected time asymptotically equivalent to that of QS, which randomly splits the sorted set into two equal subsets [11]. Our formula (3) is just a confirmation of this well-known fact. To see this, note that, in the context of the sorting task, (3) presents the average number of comparisons per item. Though the order is correct, we are inclined to think that the multiplier appearing in (3) can be improved by making use of a QS modification (or another comparison-based algorithm) designed to sort items with a small number of possible values (in our case, there are just two values: "sick" and "healthy"). On the other hand, as already mentioned above, the order is optimal: though there are algorithms which can beat QS when sorting integers, e.g., [2, 41], they operate in a different, i.e., noncomparison-based, mode. In our case, however, comparison is predefined by the setting of the problem at hand: we assume that biomedical tests can only be carried out by making use of comparison.
Though the biomedical context is very frequent in applications, there are many others, including engineering, environmental sciences, information theory, etc. (see [6, 15, 21, 22, 24-26, 28, 30, 32]). This "real life" contextual diversity brings many constraints to take into account, despite the fact that the standard binomial setting considered in Section 2 can quite often be regarded as a good starting approximation. To convey a fuller picture, below is a short list of key issues with a brief description of each.
• Heterogeneity of population. The prevalence of the disease may depend on other factors (e.g., age and gender).
• Imperfectness of the test. The test can have sensitivity and/or specificity below 1.
• Dilution effect. Pooling can reduce testing accuracy substantially. If this is the case, it is necessary to impose an upper bound on the number of pooled samples.
• Implementation costs. In Section 2, we silently assumed that the implementation of the considered schemes only involves retesting-related costs. However, it may involve others as well.
• Dependence. It can happen that the tested individuals are somehow related.
All these underpinnings have to be addressed carefully. Take, for example, the last one. From the results presented in Section 2, one can infer that the application of GT procedures is most effective when the prevalence is low (p ≈ 0). In such a case, under the classical assumption pN → λ > 0, the number of infected individuals S_N can be well approximated by the Poisson distribution Pois(pN), and the approximation remains quite accurate irrespective of the nature of the dependence exhibited by the summands (see [3, 7, 43] and references therein for results of this kind with possible extensions beyond the classical setting). It is therefore reasonable to expect that, after switching to the Poisson approximation, at least some of the existing schemes can be carried over to the dependent case. Clearly, additional restrictions call for new theoretical investigations.
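To illustrate the quality of the Poisson approximation in the low-prevalence regime, the following Python sketch computes the total variation distance between Bin(N, p) and Pois(Np) for the independent case; by Le Cam's inequality this distance is bounded by Np², which is small when p ≈ 0.

```python
import math

def binom_pmf(k, n, p):
    return math.comb(n, k) * p ** k * (1 - p) ** (n - k)

def poisson_pmf(k, lam):
    return math.exp(-lam) * lam ** k / math.factorial(k)

# Total variation distance between Bin(N, p) and Pois(Np) for a
# low-prevalence cohort (the Poisson tail beyond k = n is negligible here).
n, p = 100, 0.01
tv = 0.5 * sum(abs(binom_pmf(k, n, p) - poisson_pmf(k, n * p)) for k in range(n + 1))
```

With N = 100 and p = 0.01, the distance is well below the Le Cam bound Np² = 0.01.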
The set of directions for such investigations can be significantly extended by including other methods and GT-related tasks. More concretely, the schemes considered in Section 2 broadly fall into the class of probabilistic GT schemes. Another widely adopted paradigm is called the combinatorial approach. Within its framework, one does not assume any random mechanism and tries to make use of combinatorial methods in order to identify d bad items in a given group of N ≥ d objects (see the monographs [13, 14]). Speaking about tasks, up to now we have focused only on the identification of bad items (or infected patients) under the assumption that the prevalence p is known. In addition to the literature devoted to this task, there is a huge body of literature dealing with the estimation (both point and interval) of p from pooled sample observations, as well as with testing issues (see, e.g., [16, 20, 34] and references therein).
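To give a flavor of the estimation task just mentioned: when m pools of common size n are tested, the maximum likelihood estimator of p has the closed form p̂ = 1 − (fraction of negative pools)^{1/n}, a standard result in this literature. The Python sketch below (names and parameters are illustrative) checks it on synthetic data.

```python
import random

def estimate_prevalence(pool_results, n):
    """MLE of prevalence p from pooled tests of common pool size n:
    p_hat = 1 - (fraction of negative pools)**(1/n)."""
    neg_frac = pool_results.count(0) / len(pool_results)
    if neg_frac == 0:
        return 1.0                     # all pools positive: the MLE degenerates
    return 1 - neg_frac ** (1 / n)

# Synthetic check: 5000 pools of 10 individuals each, true prevalence 2%.
rng = random.Random(1)
p_true, n, m = 0.02, 10, 5000
pools = [1 if any(rng.random() < p_true for _ in range(n)) else 0 for _ in range(m)]
p_hat = estimate_prevalence(pools, n)  # should land near p_true
```

Note that a single pooled test per group suffices for estimating p; no retesting of individuals is needed, which is part of what makes this branch of the literature attractive.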
We hope that our discussion complies well with our initial goal stated in the introduction. To emphasize the relevance of similar promotional discussions in the present context, we point out a huge burst of papers devoted to similar problems (see, e.g., [5, 17, 31, 33, 36, 38, 44, 45]). Besides that, we also note that some countries have already successfully applied the pooling methodology for testing for the SARS-CoV-2 virus.⁴

Table B1. Performance of Scheme D.

Table B2. Performance of Scheme H.