Bernoulli Distribution: Definition, Properties and Applications

The Bernoulli distribution, named after Swiss mathematician Jacob Bernoulli, is a foundational concept in probability theory and statistics. While seemingly simple, it forms the basis for many more complex models and algorithms. Understanding the Bernoulli distribution is crucial for anyone working with data, particularly in fields like machine learning, data science, and analytics.

The Bernoulli distribution is a distribution with exactly two possible outcomes: success, with probability “p”, and failure, with probability “q = 1 – p”. A single experiment with such an outcome is known as a Bernoulli trial.


What is the Bernoulli Distribution? 

Most discrete distributions are related to Bernoulli trials. An experiment is called a Bernoulli trial if it has two possible outcomes, namely success and failure. The probability of success, like the probability of failure, remains the same from trial to trial. If p is the probability of success, then q = 1 – p is the probability of failure. The Bernoulli trial is named in honor of Jacob (also known as James) Bernoulli, who around the year 1700 wrote, in Latin, an important work on probability titled “Ars Conjectandi”, published posthumously in 1713.

A discrete random variable X is said to have a Bernoulli distribution if its probability function is defined as,

    \[ f(x;p) = p^{x} q^{1-x}, \quad \text{for } x = 0, 1 \]

Here, p is the parameter of the distribution, satisfying 0 ≤ p ≤ 1 and p + q = 1.
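As a quick illustration (a minimal sketch, not part of the original article; the function name is our own), the probability function above translates directly into Python:

```python
def bernoulli_pmf(x, p):
    """Probability function f(x; p) = p^x * q^(1 - x) for x in {0, 1}."""
    if x not in (0, 1):
        raise ValueError("x must be 0 or 1")
    q = 1 - p  # probability of failure
    return (p ** x) * (q ** (1 - x))

# f(1; p) = p and f(0; p) = q, as expected
print(bernoulli_pmf(1, 0.25))  # 0.25
print(bernoulli_pmf(0, 0.25))  # 0.75
```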

Mean: E[X] = p, and Variance: Var[X] = pq. If q < 1 (equivalently, p > 0), then the mean of the Bernoulli distribution is greater than the variance; that is,

E[X] > Var[X] when q < 1.
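This inequality is easy to check numerically (a minimal sketch):

```python
# For any 0 < p <= 1 (i.e. q < 1), the mean p exceeds the variance p*q
for p in [0.1, 0.25, 0.5, 0.75, 0.99]:
    q = 1 - p
    mean, variance = p, p * q
    assert mean > variance
    print(f"p={p}: E[X]={mean} > Var[X]={variance:.4f}")
```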

Formally, the distribution represents a random variable that takes on one of two values:

  • 1 (Success): Represents a positive outcome, often denoted as “success” or “true.”
  • 0 (Failure): Represents a negative outcome, often denoted as “failure” or “false.”

The distribution is characterized by a single parameter, p, which represents the probability of success. Consequently, the probability of failure is 1 – p, often denoted as q.


Properties of Bernoulli distribution

  • The experiment has two possible outcomes, regarded as success and failure; the probabilities of success and of failure remain the same from trial to trial.
  • Mean of Bernoulli Distribution is  p.
  • Variance of Bernoulli Distribution is  pq.
  • Moment generating function is M(t) = q + pe^t.
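The moment generating function M(t) = q + pe^t can be checked against its definition E[e^{tX}] and against a simulation (a sketch, assuming numpy is available):

```python
import numpy as np

p, q, t = 0.7, 0.3, 0.5

# Closed form: M(t) = q + p * e^t
mgf_formula = q + p * np.exp(t)

# Definition: M(t) = E[e^{tX}] = e^{t*0} * q + e^{t*1} * p
mgf_definition = np.exp(t * 0) * q + np.exp(t * 1) * p

# Monte Carlo check from simulated Bernoulli(p) draws
rng = np.random.default_rng(0)
sample = rng.binomial(1, p, size=100_000)
mgf_empirical = np.exp(t * sample).mean()

print(mgf_formula, mgf_definition, mgf_empirical)
```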

To truly understand the Bernoulli distribution, let’s delve into details of its key properties:

  1. Probability Mass Function (PMF): The PMF defines the probability of each possible outcome. For a Bernoulli random variable X:
    • P(X = 1) = p (Probability of success)
    • P(X = 0) = 1 – p = q (Probability of failure)
    We can express this more compactly as: P(X = x) = p^x * (1 – p)^(1 – x), where x can be either 0 or 1.
  2. Mean (Expected Value): The expected value of a Bernoulli distribution is simply the probability of success, p. This makes intuitive sense – if you repeat a Bernoulli trial many times, the average value will tend towards the probability of success.
    • E[X] = p
  3. Variance: The variance measures the spread or dispersion of the distribution around its mean. For the Bernoulli distribution, the variance is:
    • Var(X) = p * (1 – p) = p * q
    Notice that the variance is maximized when p = 0.5 (equal probability of success and failure) and minimized when p = 0 or p = 1 (deterministic outcomes).
  4. Standard Deviation: The standard deviation is the square root of the variance and represents the typical deviation of the outcomes from the mean.
    • σ = sqrt(p * (1 – p))
  5. Support: The support of the Bernoulli distribution is the set of possible values the random variable can take. In this case, it’s simply {0, 1}.
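The numbered properties above can be verified from first principles by summing over the support {0, 1} (a short sketch in plain Python; the variable names are our own):

```python
import math

p = 0.3
q = 1 - p
support = [0, 1]
pmf = {0: q, 1: p}  # P(X = 0) = q, P(X = 1) = p

mean = sum(x * pmf[x] for x in support)                    # E[X] = p
variance = sum((x - mean) ** 2 * pmf[x] for x in support)  # p * q
std_dev = math.sqrt(variance)                              # sqrt(p * q)

print(mean, variance, std_dev)
```

As the comments indicate, the sums reduce to exactly the closed-form values p, pq, and sqrt(pq).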

Applications of Bernoulli distribution

The Bernoulli distribution is the special case of the Binomial distribution with N = 1; for N > 1 the model becomes a Binomial distribution. Since almost every real study involves more than one trial, direct real-life applications of a single Bernoulli trial are relatively rare. We consider the following applications:

  • There are many areas where the binomial theorem is indispensable, even in modern fields such as computing. In computing, the binomial theorem has been useful in the distribution of IP addresses: it makes automatic distribution possible not only for ordinary IP addresses but also for virtual ones.
  • Another field that uses the binomial theorem as an important tool is national economic forecasting. Economists use it to compute probabilities that depend on numerous, widely distributed variables in order to predict how the economy will behave over the next few years and to produce realistic forecasts.
  • The binomial theorem has also been of great use in the architecture industry for infrastructure design. It allows engineers to calculate the magnitudes of projects, delivering accurate estimates of both the costs and the time required for construction. For contractors, it is an important tool in ensuring that project costing leaves room for profit.

Some other Practical Applications

Despite its simplicity, the Bernoulli distribution finds applications in a surprisingly wide range of fields:

  • Coin Flips and Card Draws: As mentioned earlier, modeling a single coin flip is a classic application. Similarly, drawing a single card from a deck and determining if it’s a heart (success) or not (failure) can be modeled with a Bernoulli distribution.
  • Medical Studies: In clinical trials, the Bernoulli distribution can model whether a patient responds to a treatment (success) or not (failure). The probability of success, ‘p’, would represent the effectiveness of the treatment.
  • Quality Control: In manufacturing, the Bernoulli distribution can represent whether a product passes inspection (success) or fails (failure). This helps assess the overall quality of the production process.
  • Click-Through Rates (CTR): In online advertising, the Bernoulli distribution can model whether a user clicks on an ad (success) or not (failure). Analyzing ‘p’ (the CTR) allows advertisers to optimize their campaigns.
  • Machine Learning: It forms the basis of many probabilistic models and is used in algorithms like logistic regression, where the output is the probability of belonging to a particular class. It’s also crucial in reinforcement learning for modeling binary rewards or outcomes.
  • Risk Assessment: In finance, the Bernoulli distribution can be used to model the probability of a loan defaulting (failure) or being repaid (success).
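As one concrete example from the list above, a click-through rate can be estimated as the sample mean of simulated Bernoulli click events (a sketch with an assumed, hypothetical true CTR, using numpy):

```python
import numpy as np

rng = np.random.default_rng(42)
true_ctr = 0.05     # assumed (hypothetical) probability that a user clicks
impressions = 10_000

# Each impression is one Bernoulli trial: 1 = click, 0 = no click
clicks = rng.binomial(1, true_ctr, size=impressions)

# The sample mean is the natural estimate of p
estimated_ctr = clicks.mean()
print(f"Estimated CTR: {estimated_ctr:.4f} (true: {true_ctr})")
```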

The Bernoulli Distribution vs. Related Distributions

Understanding the relationships between different probability distributions is crucial for building robust statistical models. Here’s how the Bernoulli distribution compares to some related distributions:

  • Binomial Distribution: The Binomial distribution is essentially the sum of n independent Bernoulli trials. Instead of modeling a single event, it models the number of successes in a fixed number of trials. For example, flipping a coin 10 times and counting the number of heads. If X is a Binomial random variable with parameters n and p, then X represents the number of successes in n Bernoulli trials, each with success probability p. Therefore, the Bernoulli distribution is a special case of the Binomial distribution where n = 1.
  • Categorical Distribution: The Bernoulli distribution is also a special case of the categorical distribution, where the number of categories is 2. The Categorical distribution models the probability of an event occurring among multiple, mutually exclusive categories (more than two). Think of rolling a die – there are six possible outcomes, and the Categorical distribution models the probability of each outcome.
  • Multinomial Distribution: Similar to how the Binomial distribution generalizes the Bernoulli distribution, the Multinomial distribution generalizes the Categorical distribution. It models the number of occurrences of each category in a fixed number of trials.
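The Bernoulli/Binomial relationship described above can be demonstrated numerically: summing n independent Bernoulli(p) draws produces Binomial(n, p) counts (a sketch, assuming scipy and numpy are installed):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n, p = 10, 0.5

# Sum n Bernoulli(p) draws, repeated 100,000 times
trials = stats.bernoulli(p).rvs(size=(100_000, n), random_state=rng)
bernoulli_sums = trials.sum(axis=1)

# Compare the empirical frequency of k successes with the Binomial(n, p) PMF
k = 5
empirical = np.mean(bernoulli_sums == k)
theoretical = stats.binom(n, p).pmf(k)
print(f"P(X = {k}): empirical {empirical:.4f}, theoretical {theoretical:.4f}")
```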

Understanding the Bernoulli Distribution in Python

Python offers excellent libraries for working with probability distributions, including the Bernoulli distribution. Here’s how you can use the scipy.stats module to generate and analyze Bernoulli random variables:

import scipy.stats as stats
import matplotlib.pyplot as plt

# Define the probability of success (p)
p = 0.7

# Create a Bernoulli random variable
bernoulli_rv = stats.bernoulli(p)

# Generate a random sample of 1000 values
sample = bernoulli_rv.rvs(1000)

# Calculate the mean and variance
mean = bernoulli_rv.mean()
variance = bernoulli_rv.var()

print(f"Mean: {mean}")
print(f"Variance: {variance}")

# Plot the probability mass function
x = [0, 1]
pmf = bernoulli_rv.pmf(x)

plt.bar(x, pmf)
plt.xlabel("Outcome (0 or 1)")
plt.ylabel("Probability")
plt.title(f"Bernoulli Distribution (p = {p})")
plt.xticks(x)
plt.show()

This code snippet demonstrates how to:

  1. Import the necessary libraries (scipy.stats) for probability distributions and (matplotlib.pyplot) for plotting.
  2. Define the probability of success, p.
  3. Create a Bernoulli random variable using stats.bernoulli(p).
  4. Generate a random sample of 1000 values using rvs(1000). This simulates 1000 independent Bernoulli trials.
  5. Calculate the mean and variance using mean() and var().
  6. Plot the probability mass function (PMF) using pmf(x). This visualizes the probability of each outcome (0 and 1).
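A natural follow-up to the snippet above is recovering p from the generated sample: for Bernoulli data, the sample mean (the fraction of successes) is the maximum-likelihood estimate of p. A short sketch, reusing the same scipy setup with a fixed seed:

```python
import scipy.stats as stats

p = 0.7
sample = stats.bernoulli(p).rvs(1000, random_state=0)

# The MLE of p for Bernoulli data is just the fraction of successes
p_hat = sample.mean()
print(f"True p: {p}, estimated p: {p_hat:.3f}")
```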

Conclusion

The Bernoulli distribution, while simple in its definition, is a powerful tool for modeling binary events. Its understanding is fundamental for anyone venturing into the world of probability, statistics, and machine learning. By mastering the concepts discussed in this guide, you’ll be well-equipped to tackle more complex statistical models and analyze data more effectively. Remember to practice with Python and explore the various applications to solidify your understanding.
