Econometrics A
(Econ 120A, Winter 2021)
Munpyung O
Department of Economics
University of California, San Diego
[email protected]
(Part 3)
* Please do not replicate or distribute without permission.
Probability Distributions
Munpyung O (UCSD) Econometrics A Part 3 2 / 47
Motivations
Why do we need these particular probability distributions?
Most experiments can be represented by one of a few well-known
distributions.
⇒ If we know the characteristics (distributional parameters) of
these major distributions, we do not need to construct a
probability distribution from scratch for each experiment.
Probability distributions of discrete random variables
Discrete probability distributions:
Binomial distribution
Probability distributions of discrete random variables Binomial distribution
Bernoulli trial (or binomial trial)
a random experiment with exactly two possible outcomes, “success”
and “failure”, in which the probability of success is the same every
time the experiment is conducted.
Examples: success and failure, on and off, 0 and 1, win and lose, head
and tail, go and stop, up and down, employed and unemployed, positive
and negative, increase and decrease, male and female, accepted and
rejected …
Bernoulli distribution
the probability distribution of a discrete random variable which takes
the value 1 with success probability of p and the value 0 with failure
probability of q = 1 − p.
The probability distribution (PMF) of the Bernoulli random variable:
$$P(X = x) = \begin{cases} p & \text{if } x = 1 \\ 1 - p & \text{if } x = 0 \end{cases}$$
p = P(X = 1) is the only parameter for the Bernoulli distribution.
E(X) = p, Var(X) = p(1 − p)
Bernoulli trial and Binomial distribution
Many problems in probability and statistics involve situations in
which an experiment with two possible outcomes is repeated
many times (repetition of the Bernoulli trial).
Properties of a binomial experiment
1. The experiment consists of n identical trials.
2. Each trial is a Bernoulli trial that has only two possible
outcomes.
3. The probability of success in a trial is p and the probability of
failure is 1 − p.
4. The trials are independent.
For X ∼ Bin(n, p), the pmf (probability mass function) is
denoted by
$$P(X = x) = b(x; n, p) = \binom{n}{x}\, p^{x} (1 - p)^{n - x} = \frac{n!}{x!\,(n - x)!}\, p^{x} (1 - p)^{n - x}$$
= [ number of outcomes with X = x ]
× [ prob. of any particular outcome with X = x ]
where n : number of (Bernoulli) trials.
x : number of successes in n trials (value of the binomial random variable).
n − x : number of failures.
p : the probability of a success on each trial.
1 − p : the probability of a failure on each trial.
Example: In a true-false exam, there are 4 questions. You have not
studied at all and decide to randomly guess the answers.
What is the probability that you get one question correct?
4 possible outcomes: C I I I, I C I I, I I C I, I I I C
Probability of any one particular outcome with exactly one correct answer: $0.5^{1} \cdot 0.5^{3}$
⇒ $P(X = 1) = \binom{4}{1}\, 0.5^{1} \cdot 0.5^{3} = 0.25$
Define X as the number of correct answers you got from the
true-false exam. Then the probability distribution for X is
X    P(X)
0    $\binom{4}{0}\, 0.5^{0} (1 - 0.5)^{4-0} = 0.0625$
1    $\binom{4}{1}\, 0.5^{1} (1 - 0.5)^{4-1} = 0.2500$
2    $\binom{4}{2}\, 0.5^{2} (1 - 0.5)^{4-2} = 0.3750$
3    $\binom{4}{3}\, 0.5^{3} (1 - 0.5)^{4-3} = 0.2500$
4    $\binom{4}{4}\, 0.5^{4} (1 - 0.5)^{4-4} = 0.0625$
Binomial distribution table
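The table above can be reproduced in a few lines. This is a minimal sketch in Python using only the standard library; the function name `binom_pmf` is an illustrative choice, not something defined in these slides.

```python
from math import comb


def binom_pmf(x: int, n: int, p: float) -> float:
    """Return P(X = x) for X ~ Bin(n, p): C(n, x) * p^x * (1 - p)^(n - x)."""
    return comb(n, x) * p**x * (1 - p) ** (n - x)


# True-false exam: n = 4 questions, each guessed correctly with p = 0.5.
n, p = 4, 0.5
table = {x: binom_pmf(x, n, p) for x in range(n + 1)}

for x, prob in table.items():
    print(f"{x}  {prob:.4f}")
```

Changing `n` and `p` gives the table for any other binomial experiment, for example `binom_pmf(3, 5, 0.25)` for a 5-question exam with 4 choices per question.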
Example: In a multiple-choice exam, there are 5 questions and 4
choices for each question (a, b, c, d). You have not studied at all
and decide to randomly guess the answers. What is the probability
that you get 3 questions correct?
n = 5, x = 3, p = 0.25
$$\binom{5}{3}\, 0.25^{3} (1 - 0.25)^{5-3} = 0.08789063$$
What is the probability of getting 478 heads from flipping a coin
1000 times?
Now we can answer the question by
$$P(X = 478) = \binom{1000}{478}\, 0.5^{478} (1 - 0.5)^{1000-478} = 0.00958781$$
Binomial distribution table?
How about E(X), and Var(X)?
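For n = 1000 a printed table is not practical, but the coin-flip probability can be evaluated numerically. The sketch below works in log space, which is one common way to avoid very large and very small intermediate numbers; the function name `binom_pmf_log` is an illustrative assumption, and the formula requires 0 < p < 1.

```python
from math import exp, lgamma, log


def binom_pmf_log(x: int, n: int, p: float) -> float:
    """P(X = x) for X ~ Bin(n, p), evaluated in log space (needs 0 < p < 1)."""
    # log of n! / (x! (n - x)!) via the log-gamma function: lgamma(k + 1) = log(k!)
    log_coeff = lgamma(n + 1) - lgamma(x + 1) - lgamma(n - x + 1)
    return exp(log_coeff + x * log(p) + (n - x) * log(1 - p))


# 478 heads in 1000 flips of a fair coin.
p_478 = binom_pmf_log(478, 1000, 0.5)
print(f"P(X = 478) = {p_478:.8f}")
```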
The mean and variance of X
For a binomial experiment with n trials and probability p of success
on a given trial, the measures of center and spread are
$$\text{Mean} = \mu_X = E(X) = np, \qquad \text{Variance} = \sigma_X^{2} = \mathrm{Var}(X) = np(1 - p)$$
Example: In a multiple-choice exam, there are 5 questions and 4
choices for each question.
$$\mu_X = E(X) = np = 5 \cdot 0.25 = 1.25$$
$$\sigma_X^{2} = \mathrm{Var}(X) = np(1 - p) = 5 \cdot 0.25 \cdot (1 - 0.25) = 0.9375$$
$$\sigma_X = \mathrm{Sd}(X) = \sqrt{\mathrm{Var}(X)} = \sqrt{0.9375} = 0.9682458$$
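As a check on the shortcut formulas np and np(1 − p), the mean and variance can also be computed directly from the definition, by summing over the pmf. The sketch below does this for the multiple-choice example (n = 5, p = 0.25); variable names are illustrative.

```python
from math import comb, sqrt

n, p = 5, 0.25  # 5 questions, 4 choices each, pure guessing

# pmf[x] = P(X = x) for x = 0, ..., n
pmf = [comb(n, x) * p**x * (1 - p) ** (n - x) for x in range(n + 1)]

# Definitions: E(X) = sum of x * f(x), Var(X) = sum of (x - mean)^2 * f(x)
mean = sum(x * f for x, f in enumerate(pmf))
var = sum((x - mean) ** 2 * f for x, f in enumerate(pmf))
sd = sqrt(var)

print(f"E(X)   = {mean:.4f}  (np        = {n * p:.4f})")
print(f"Var(X) = {var:.4f}  (np(1 - p) = {n * p * (1 - p):.4f})")
print(f"Sd(X)  = {sd:.7f}")
```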
Example: According to a 2014 Gallup poll, 56% of uninsured
Americans who plan to get health insurance say they will do so
through a government health insurance exchange.
1) What is the probability that in a random sample of 10 people
exactly 6 plan to get health insurance through a government
health insurance exchange?
2) What is the probability that in a random sample of 1000 people
exactly 600 plan to get health insurance through a government
health insurance exchange?
3) What are the expected value and the variance of X?
4) What is the probability that less than 600 people plan to get
health insurance through a government health insurance
exchange?
Probability distributions of discrete random variables PMF and CDF
Probability mass function (pmf):
A PMF is a function that gives the probability that a discrete
random variable is exactly equal to some value.
$$f(x) = P(X = x), \qquad \sum_{\text{all } x} f(x) = 1$$
Example: Tossing a coin twice
Cumulative distribution function (cdf):
The cumulative distribution function (cdf) is the probability that the
variable takes a value less than or equal to x.
F(x) = P(X ≤ x)
Properties of CDFs
1. F(x) is nondecreasing: if y ≥ x, then F(y) ≥ F(x).
2. $\lim_{x \to -\infty} F(x) = 0$ and $\lim_{x \to \infty} F(x) = 1$
3. F(x) is right continuous: the function is continuous when a point is
approached from the right side.
Define X as the number of correct answers you got from the
true-false exam. Then the probability distribution for X is
X    f(x) = P(X = x)                                                F(x) = P(X ≤ x)
0    $f(0) = \binom{4}{0}\, 0.5^{0} (1 - 0.5)^{4-0} = 0.0625$    F(0) = 0.0625
1    $f(1) = \binom{4}{1}\, 0.5^{1} (1 - 0.5)^{4-1} = 0.2500$    F(1) = 0.3125
2    $f(2) = \binom{4}{2}\, 0.5^{2} (1 - 0.5)^{4-2} = 0.3750$    F(2) = 0.6875
3    $f(3) = \binom{4}{3}\, 0.5^{3} (1 - 0.5)^{4-3} = 0.2500$    F(3) = 0.9375
4    $f(4) = \binom{4}{4}\, 0.5^{4} (1 - 0.5)^{4-4} = 0.0625$    F(4) = 1.0000
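For a discrete random variable the cdf is just the running sum of the pmf. A minimal sketch that rebuilds both columns of the table for the true-false exam (n = 4, p = 0.5), using `itertools.accumulate` from the standard library:

```python
from itertools import accumulate
from math import comb

n, p = 4, 0.5

# f(x) = P(X = x) for x = 0, ..., n
pmf = [comb(n, x) * p**x * (1 - p) ** (n - x) for x in range(n + 1)]

# F(x) = P(X <= x) = f(0) + f(1) + ... + f(x)
cdf = list(accumulate(pmf))

for x, (f, F) in enumerate(zip(pmf, cdf)):
    print(f"{x}  f(x) = {f:.4f}  F(x) = {F:.4f}")
```

The last entry of `cdf` equals 1, and the list never decreases, which matches properties 1 and 2 of a cdf.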
[Figure: pmf of X; horizontal axis x = 0, 1, 2, 3, 4; vertical axis pmf]
[Figure: cdf of X; horizontal axis x = 0, 1, 2, 3, 4; vertical axis cdf]
Probability distributions of continuous random variables
Continuous probability distributions:
Normal (Gaussian) distribution
Continuous Random Variables:
A random variable is continuous if it can take any value in an
interval of real numbers (uncountably many values).
A probability distribution describes how the probabilities are
distributed over all possible values.
We cannot list the probabilities in a table as we do for a
discrete probability distribution.
A probability distribution for a continuous random variable X is
specified by a mathematical function denoted by f (x) which is
called the (probability) density function (pdf).
The graph of a density function is a smooth curve.
Definition (Probability Density Function (pdf))
$$f(x) = \frac{d}{dx} F(x), \quad \text{where} \quad \int_{-\infty}^{\infty} f(x)\, dx = 1$$
Definition (Cumulative Distribution Function (cdf))
$$F(x) = P(X \le x) = \int_{-\infty}^{x} f(u)\, du$$
F(x) is the area under the density curve to the left of X = x.
Example: Normal distribution
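$$\text{pdf}: \quad f(x) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{(x - \mu)^{2}}{2\sigma^{2}}}$$
$$\text{cdf}: \quad F(x) = P(X \le x) = \int_{-\infty}^{x} f(u)\, du = \int_{-\infty}^{x} \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{(u - \mu)^{2}}{2\sigma^{2}}}\, du$$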
Properties of continuous probability distribution
1. P(X = x) = 0 for all x
2. Probabilities are meaningful only over intervals:
$$P(a \le X \le b) = \int_{a}^{b} f(x)\, dx$$
3. P(a ≤ X ≤ b) = P(a < X < b) = P(a ≤ X < b) = P(a < X ≤ b)
(from properties 1 and 2).
4. The total area under the density curve is equal to 1: $\int_{-\infty}^{\infty} f(x)\, dx = 1$
Method of Probability Calculation
The probability that a continuous random variable X lies between a
lower limit a and an upper limit b is
$$P(a < X < b) = P(X < b) - P(X < a) = F(b) - F(a) = \int_{a}^{b} f(x)\, dx$$
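The two routes to the same probability, F(b) − F(a) and the integral of the density, can be compared numerically. The sketch below uses `statistics.NormalDist` from the Python standard library for the pdf and cdf and a simple midpoint rule for the integral. The distribution (standard normal), the interval (−1, 2) and the helper name `integrate` are illustrative assumptions, not taken from the slides.

```python
from statistics import NormalDist


def integrate(f, a: float, b: float, steps: int = 10_000) -> float:
    """Approximate the integral of f over [a, b] with the midpoint rule."""
    h = (b - a) / steps
    return h * sum(f(a + (i + 0.5) * h) for i in range(steps))


X = NormalDist(mu=0.0, sigma=1.0)  # illustrative distribution
a, b = -1.0, 2.0                   # illustrative interval

by_cdf = X.cdf(b) - X.cdf(a)          # F(b) - F(a)
by_integral = integrate(X.pdf, a, b)  # area under f between a and b

print(f"F(b) - F(a)    = {by_cdf:.6f}")
print(f"integral of f  = {by_integral:.6f}")
```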
Definition (Moments of a continuous random variable)
Expected value
$$\mu_X = E(X) = \int_{-\infty}^{\infty} x \cdot f(x)\, dx$$
Variance
$$\sigma_X^{2} = \mathrm{Var}(X) = \int_{-\infty}^{\infty} (x - \mu_X)^{2} \cdot f(x)\, dx$$
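These integrals can be approximated numerically to see that they return the parameters of the distribution. The sketch below assumes a normal density with µ = 10 and σ = 4 (illustrative values) and truncates the infinite range at µ ± 10σ, where the tails are negligible; the helper `integrate` is a plain midpoint rule, not a library routine.

```python
from statistics import NormalDist


def integrate(f, a: float, b: float, steps: int = 20_000) -> float:
    """Approximate the integral of f over [a, b] with the midpoint rule."""
    h = (b - a) / steps
    return h * sum(f(a + (i + 0.5) * h) for i in range(steps))


X = NormalDist(mu=10.0, sigma=4.0)  # illustrative parameters

# Truncate (-inf, inf) to mu +/- 10 sigma; the density outside is negligible.
lo, hi = X.mean - 10 * X.stdev, X.mean + 10 * X.stdev

mean = integrate(lambda x: x * X.pdf(x), lo, hi)
var = integrate(lambda x: (x - mean) ** 2 * X.pdf(x), lo, hi)

print(f"E(X)   is approximately {mean:.6f}")
print(f"Var(X) is approximately {var:.6f}")
```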
Probability distributions of continuous random variables Normal (Gaussian) distribution
Normal (Gaussian) distribution
1. Bell shaped, unimodal, symmetric (Mean = Median = Mode)
2. The curve is continuous and never touches the x-axis
(possible values of X range from −∞ to ∞).
3. The location and shape of the normal curve are determined entirely
by two distributional parameters, the mean and the standard deviation.
$$X \sim N(\mu_X, \sigma_X^{2})$$
The most important probability distribution! Why?
Probability calculation for Normal distribution
1. Computing definite integrals
2. Empirical rule: 68 – 95 – 99.7% rule
3. Standardization of normal distribution
⇒ standard normal distribution table
1. Computing definite integrals
Formula for Normal distribution:
$$f(x; \mu, \sigma) = \frac{1}{\sqrt{2\pi\sigma^{2}}}\, e^{-\frac{1}{2}\left[\frac{x - \mu}{\sigma}\right]^{2}}$$
where π ≈ 3.14159 and e ≈ 2.71828.
To find P(a < X < b), we need to find the area under the normal curve.
$$P(a < X < b) = P(X < b) - P(X < a) = F(b) - F(a) = \int_{a}^{b} \frac{1}{\sqrt{2\pi\sigma^{2}}}\, e^{-\frac{1}{2}\left[\frac{x - \mu}{\sigma}\right]^{2}}\, dx$$
2. Empirical rule: 68 – 95 – 99.7% rule
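The three percentages in the rule are the areas under a normal curve within 1, 2 and 3 standard deviations of the mean. A short sketch that computes them from the standard normal cdf (using `statistics.NormalDist`; the dictionary name `coverage` is illustrative):

```python
from statistics import NormalDist

Z = NormalDist()  # standard normal: mean 0, standard deviation 1

# P(-k < Z < k), i.e. the share of any normal distribution
# lying within k standard deviations of its mean.
coverage = {k: Z.cdf(k) - Z.cdf(-k) for k in (1, 2, 3)}

for k, prob in coverage.items():
    print(f"within {k} sd: {prob:.4f}")
```

Rounded to one decimal place in percent, the three values give the familiar 68 – 95 – 99.7 figures.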
Probability distributions of continuous random variables Standard normal distribution
3. The Standard Normal Distribution
Normal distribution with $\mu = 0$ and $\sigma^{2} = 1$. ⇒ $Z \sim N(0, 1)$
Properties of Standard Normal Distribution
Known distributional parameters:
Mean = 0; Standard deviation = 1
Bell-shaped, unimodal, symmetric about Z = 0
(Mean = Median = Mode = 0)
– Values of Z to the left of center are negative
– Values of Z to the right of center are positive
– Areas on both sides of center equal 0.5
The curve is continuous and never touches the z-axis
(possible values of Z range from −∞ to ∞)
Standardization or Z-transformation:
Since each normally distributed variable has its own mean and
standard deviation, the shape and location of these curves will
vary.
To simplify the calculation of the area under the curve, we
standardize each value of X by expressing it as a Z-score, the
number of standard deviations away from the mean µ.
$$\text{Z-score:} \quad z = \frac{x - \mu}{\sigma}$$
Standardization (Z-transformation)
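Normal distribution                  Standard normal distribution
$X \sim N(\mu, \sigma^{2})$          $Z \sim N(0, 1)$
$P(a \le X \le b)$                   $P\left(\frac{a - \mu}{\sigma} < Z = \frac{x - \mu}{\sigma} < \frac{b - \mu}{\sigma}\right)$
$\int_{a}^{b} \frac{1}{\sqrt{2\pi\sigma^{2}}}\, e^{-\frac{1}{2}\left[\frac{x - \mu}{\sigma}\right]^{2}}\, dx$          Use std. normal table
$E(Z) = 0$ and $\mathrm{Var}(Z) = 1$ since $Z = \frac{X - \mu}{\sigma}$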
Equality of nonstandard and standard normal curve areas
Example 1: Finding areas under the std. normal distribution curve
a) P(Z < 1.83)
b) P(0 < Z < 1.83)
c) P(−1.45 < Z < 0)
d) P(Z > 1.25)
e) P(0.46 < Z < 1.75)
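In place of the standard normal table, the same areas can be looked up with the cdf of `statistics.NormalDist` from the Python standard library. The sketch below is one way to set up parts a) to e); the dictionary name `areas` is illustrative.

```python
from statistics import NormalDist

Z = NormalDist()  # standard normal: mean 0, standard deviation 1

areas = {
    "a": Z.cdf(1.83),                # P(Z < 1.83)
    "b": Z.cdf(1.83) - Z.cdf(0),     # P(0 < Z < 1.83)
    "c": Z.cdf(0) - Z.cdf(-1.45),    # P(-1.45 < Z < 0)
    "d": 1 - Z.cdf(1.25),            # P(Z > 1.25)
    "e": Z.cdf(1.75) - Z.cdf(0.46),  # P(0.46 < Z < 1.75)
}

for label, area in areas.items():
    print(f"{label}) {area:.4f}")
```

Each line mirrors how the table is used by hand: a left-tail area is read directly, a right-tail area is 1 minus the left-tail area, and an interval is the difference of two left-tail areas.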
Example 2: Finding areas under the normal distribution curve
a) when $X \sim N(10, 4^{2})$, P(X < 5) = ?
b) when $X \sim N(26, 9^{2})$, P(X > 35) = ?
c) when $X \sim N(-6, 2^{2})$, P(−9 < X < −2) = ?
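A sketch of the standardization route for these three parts: convert each bound to a z-score, then use only the standard normal cdf. The helper name `z_score` is illustrative, and the final line cross-checks part a) against the nonstandard curve directly.

```python
from statistics import NormalDist

Z = NormalDist()  # standard normal


def z_score(x: float, mu: float, sigma: float) -> float:
    """Number of standard deviations that x lies from the mean mu."""
    return (x - mu) / sigma


# a) X ~ N(10, 4^2): P(X < 5)
p_a = Z.cdf(z_score(5, 10, 4))
# b) X ~ N(26, 9^2): P(X > 35)
p_b = 1 - Z.cdf(z_score(35, 26, 9))
# c) X ~ N(-6, 2^2): P(-9 < X < -2)
p_c = Z.cdf(z_score(-2, -6, 2)) - Z.cdf(z_score(-9, -6, 2))

print(f"a) {p_a:.4f}  b) {p_b:.4f}  c) {p_c:.4f}")

# Same area computed on the original (nonstandard) normal curve.
direct_a = NormalDist(mu=10, sigma=4).cdf(5)
print(f"a) without standardizing: {direct_a:.4f}")
```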
Example 3: Finding z-value for a specific area
a) Find a z0 such that P(Z < z0) = 0.975
b) Find a z0 such that P(−z0 < Z < z0) = 0.7458
c) Find the positive value z0 such that the area between 0 and z0
is 0.475.
d) Find the value of Z that has area 0.051 to its right.
e) when X ∼ N(10, 4), find the value of X that has area 0.025
to its left.
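Going from an area back to a z-value is the inverse of the cdf lookup. `NormalDist.inv_cdf` in the Python standard library does this directly; the sketch below first rewrites each part as a left-tail area. For part e) it assumes that the 4 in N(10, 4) is the variance, so σ = 2; if 4 is meant as the standard deviation, replace `sigma = 2` with `sigma = 4`.

```python
from statistics import NormalDist

Z = NormalDist()  # standard normal

# a) P(Z < z0) = 0.975
z_a = Z.inv_cdf(0.975)
# b) P(-z0 < Z < z0) = 0.7458; by symmetry P(Z < z0) = 0.5 + 0.7458 / 2
z_b = Z.inv_cdf(0.5 + 0.7458 / 2)
# c) area 0.475 between 0 and z0, so P(Z < z0) = 0.5 + 0.475
z_c = Z.inv_cdf(0.5 + 0.475)
# d) area 0.051 to the right, so P(Z < z0) = 1 - 0.051
z_d = Z.inv_cdf(1 - 0.051)
# e) ASSUMPTION: N(10, 4) lists the variance, so sigma = 2.
mu, sigma = 10, 2
x_e = mu + sigma * Z.inv_cdf(0.025)  # undo the z-transformation: x = mu + sigma * z

print(f"a) {z_a:.2f}  b) {z_b:.2f}  c) {z_c:.2f}  d) {z_d:.3f}  e) {x_e:.2f}")
```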
Probability distributions of continuous random variables Applications of the Normal Distribution
Applications of the Normal Distribution
Example 4: A survey found that women spend on average $146 on
beauty products during the summer months. Assume the standard
deviation is $28.
Find the percentage of women who spend less than $160.00.
Assume the variable is normally distributed.
Example 5: The weights of packages of ground beef are normally
distributed with mean 1 pound and standard deviation 0.1. What is
the probability that a randomly selected package weighs between 0.8
and 0.85 pounds?
Example 6: SAT scores are approximated well by a normal model,
$N(1500, 300^{2})$. Shannon is a randomly selected SAT taker, and
nothing is known about Shannon’s SAT aptitude. What is the
probability Shannon scores at least 1,630 on her SATs?
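A minimal sketch of this calculation, following the Z-transformation steps: compute the z-score for 1630, then take the right-tail area under the standard normal curve. The second computation uses the N(1500, 300²) curve directly as a cross-check; variable names are illustrative.

```python
from statistics import NormalDist

mu, sigma = 1500, 300  # SAT model N(1500, 300^2)
score = 1630

# Step 1: standardize the score.
z = (score - mu) / sigma
# Step 2: right-tail area under the standard normal curve, P(Z >= z).
p_standard = 1 - NormalDist().cdf(z)
# Cross-check on the original scale, P(X >= 1630).
p_direct = 1 - NormalDist(mu, sigma).cdf(score)

print(f"z = {z:.2f}")
print(f"P(X >= 1630) = {p_standard:.4f}")
```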
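[Figure: Z-transformation. Left panel: Normal curve, x-axis ticks 900, 1500, 2100, with x = 1630 marked. Right panel: Standard Normal curve, z-axis ticks −2, 0, 2, with z = 0.43 marked.]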
Probability distributions of continuous random variables Normal Approximation to the Binomial Distribution
The Normal approximation to the Binomial distribution
The binomial distribution table on pages 813 and 821 shows the probability
of X only up to n = 20 and only for p = 0.01, · · · , 0.2, · · · , 0.9.
How can we get the probability of X when n > 20 and for other p values?
When np > 5 and n(1 − p) > 5, areas under the normal curve with
mean $\mu = np$ and standard deviation $\sigma = \sqrt{np(1 - p)}$ can be used
to approximate binomial probabilities (an application of the CLT).
Bin(n, p) ≈ N(np, np(1 − p) )
For the Normal approximation to the Binomial, we need a correction for
continuity, since we are approximating a discrete probability distribution
by a continuous one.
Binomial Normal
P(X = a) P(a − 0.5 < X < a + 0.5)
P(X ≥ a) P(X > a − 0.5)
P(X > a) P(X > a + 0.5)
P(X ≤ a) P(X < a + 0.5)
P(X < a) P(X < a − 0.5)
P(a < X < b) P(a + 0.5 < X < b − 0.5)
Example 1 : Reading While Driving
A magazine reported that 6% of American drivers read the
newspaper while driving. If 300 drivers are selected at random, find
the probability that exactly 25 say they read the newspaper while
driving.
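One way to work this example is sketched below: check the rule of thumb, apply the continuity correction P(X = a) ≈ P(a − 0.5 < X < a + 0.5), and compare the result with the exact binomial probability. Variable names are illustrative, and the exact value is included only as a comparison, not as part of the approximation method.

```python
from math import comb, sqrt
from statistics import NormalDist

n, p, a = 300, 0.06, 25  # 300 drivers, 6% read while driving, exactly 25

# Rule of thumb for using the normal approximation.
rule_ok = n * p > 5 and n * (1 - p) > 5

# Approximating normal distribution: mean np, sd sqrt(np(1 - p)).
mu = n * p
sigma = sqrt(n * p * (1 - p))
Y = NormalDist(mu, sigma)

# Continuity correction: P(X = a) is approximated by P(a - 0.5 < Y < a + 0.5).
approx = Y.cdf(a + 0.5) - Y.cdf(a - 0.5)

# Exact binomial probability for comparison.
exact = comb(n, a) * p**a * (1 - p) ** (n - a)

print(f"rule of thumb satisfied: {rule_ok}")
print(f"normal approximation:    {approx:.4f}")
print(f"exact binomial:          {exact:.4f}")
```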
Example 2: According to a 2014 Gallup poll, 56% of uninsured
Americans who plan to get health insurance say they will do so
through a government health insurance exchange.
1) What is the probability that in a random sample of 10 people
exactly 6 plan to get health insurance through a government
health insurance exchange?
2) What is the probability that in a random sample of 100 people
exactly 60 plan to get health insurance through a government
health insurance exchange?
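The two parts differ in whether the rule of thumb np > 5 and n(1 − p) > 5 holds. The sketch below checks the rule for each sample size and prints the exact binomial probability next to the continuity-corrected normal approximation. The function names are illustrative assumptions, not notation from the slides.

```python
from math import comb, sqrt
from statistics import NormalDist


def exact_pmf(x: int, n: int, p: float) -> float:
    """Exact binomial probability P(X = x) for X ~ Bin(n, p)."""
    return comb(n, x) * p**x * (1 - p) ** (n - x)


def normal_approx_pmf(x: int, n: int, p: float) -> float:
    """P(X = x) approximated by P(x - 0.5 < Y < x + 0.5), Y ~ N(np, np(1 - p))."""
    y = NormalDist(n * p, sqrt(n * p * (1 - p)))
    return y.cdf(x + 0.5) - y.cdf(x - 0.5)


def rule_of_thumb(n: int, p: float) -> bool:
    """True when np > 5 and n(1 - p) > 5."""
    return n * p > 5 and n * (1 - p) > 5


p = 0.56
for n, x in ((10, 6), (100, 60)):
    print(
        f"n = {n:3d}, x = {x:2d}: rule ok = {rule_of_thumb(n, p)}, "
        f"exact = {exact_pmf(x, n, p):.4f}, "
        f"approx = {normal_approx_pmf(x, n, p):.4f}"
    )
```

For n = 10 the rule fails because n(1 − p) = 4.4, so the exact binomial formula (or table) is the safer choice there; for n = 100 both conditions hold.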