birthday {stats}R Documentation

Probability of coincidences

Description

Computes approximate answers to a generalised birthday paradox problem. pbirthday computes the probability of a coincidence and qbirthday computes the smallest number of observations needed to have at least a specified probability of coincidence.

Usage

qbirthday(prob = 0.5, classes = 365, coincident = 2)
pbirthday(n, classes = 365, coincident = 2)

Arguments

classes

How many distinct categories the people could fall into

prob

The desired probability of coincidence

n

The number of people

coincident

The number of people to fall in the same category

Details

The birthday paradox is that a very small number of people, 23, suffices to have a 50–50 chance that two or more of them have the same birthday. This function generalises the calculation to probabilities other than 0.5, numbers of coincident events other than 2, and numbers of classes other than 365.

This formula is approximate. For coincident = 2 the exact computation is straightforward (see the examples) and may be preferable. The approximation based on Diaconis and Mosteller (1989), and was a mis-reading prior to R 2.13.1.

Value

qbirthday

Minimum number of people needed for a probability of at least prob that k or more of them have the same one out of classes equiprobable labels.

pbirthday

Probability of the specified coincidence.

References

Diaconis, P. and Mosteller F. (1989) Methods for studying coincidences. J. American Statistical Association, 84, 853–861.

Examples

require(graphics)

## the standard version
qbirthday()

## same 4-digit PIN number
qbirthday(classes = 10^4)

## 0.9 probability of three or more coincident birthdays
qbirthday(coincident = 3, prob = 0.9)

## Chance of 4 or more coincident birthdays in 150 people
pbirthday(150, coincident = 4)

## 100 or more coincident birthdays in 1000 people: very rare
pbirthday(1000, coincident = 100)

## Accuracy compared to exact calculation
x1 <-  sapply(10:100, pbirthday)
x2 <- 1-sapply(10:100, function(n) prod((365:(365-n+1))/rep(365,n)))
par(mfrow = c(2, 2))
plot(x1, x2, xlab = "approximate", ylab = "exact")
abline(0, 1)
plot(x1, x1-x2, xlab = "approximate", ylab = "error")
abline(h = 0)
plot(x1, x2, log = "xy", xlab = "approximate", ylab = "exact")
abline(0, 1)
plot(1-x1, 1-x2, log = "xy", xlab = "approximate", ylab = "exact")
abline(0, 1)

[Package stats version 2.13.2 Index]