97-F-29. Takemura, Akimichi, "Some Superpopulation Models for Estimating the Number of Population Uniques", September 1997.

The number of unique individuals in the population (population uniques) is of great importance in evaluating the disclosure risk of a microdata set. We approach this problem by considering some basic superpopulation models, including the gamma-Poisson model of Bethlehem et al. (1990). We introduce the Dirichlet-multinomial model, which is closely related to, but more basic than, the gamma-Poisson model, in the sense that the binomial distribution is more basic than the Poisson distribution. We also discuss the Ewens model and show that it can be obtained from the Dirichlet-multinomial model by a limiting argument similar to the law of small numbers. The multivariate Ewens distribution is a basic mathematical model used in genetics. Estimation of the number of population uniques is particularly simple under the Ewens model.
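As a rough numerical illustration of the limiting argument, the Python sketch below computes the expected number of population uniques (cells of size 1) when N individuals fall into K cells under a symmetric Dirichlet-multinomial model. It then compares that value with the Ewens-model value N*theta/(theta + N - 1), which is the limit as K grows with theta held fixed. The parameterization (each cell gets Dirichlet parameter alpha = theta/K) and the function names are assumptions made for illustration. They are not notation or estimators taken from the report.

```python
from math import exp, lgamma


def dm_expected_uniques(N: int, K: int, theta: float) -> float:
    """Expected number of uniques among N individuals in K cells under a
    symmetric Dirichlet-multinomial model with per-cell parameter
    alpha = theta / K (an assumed, illustrative parameterization)."""
    if K < 2 or N < 1 or theta <= 0:
        raise ValueError("need K >= 2, N >= 1, theta > 0")
    alpha = theta / K
    b = theta - alpha  # total Dirichlet mass of the other K - 1 cells
    # Each cell count is beta-binomial(N, alpha, b), so
    # P(count == 1) = N * alpha * Gamma(N-1+b) Gamma(theta) / (Gamma(b) Gamma(N+theta)).
    log_ratio = lgamma(N - 1 + b) - lgamma(b) + lgamma(theta) - lgamma(N + theta)
    p_unique = N * alpha * exp(log_ratio)
    # Sum over the K exchangeable cells.
    return K * p_unique


def ewens_expected_uniques(N: int, theta: float) -> float:
    """Expected number of singletons under the Ewens model with parameter theta,
    i.e. the K -> infinity limit of dm_expected_uniques with theta fixed."""
    return N * theta / (theta + N - 1)


if __name__ == "__main__":
    # Hypothetical example values: population size N = 100, theta = 2.
    for K in (10, 1_000, 100_000):
        print(K, dm_expected_uniques(100, K, 2.0))
    print("Ewens", ewens_expected_uniques(100, 2.0))
```

Working on the log scale with `math.lgamma` keeps the gamma-function ratios stable for large N. As K increases, the Dirichlet-multinomial values approach the Ewens value from below. This mirrors how binomial probabilities approach Poisson probabilities in the law of small numbers.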

Although these models may not fit actual populations well, they can be regarded as basic mathematical models for our problem, just as the binomial and Poisson distributions are regarded as basic models for count data.