Econometrics A

(Econ 120A, Winter 2021)

Munpyung O

Department of Economics

University of California, San Diego


(Part 3)

* Please do not replicate or distribute without permission.

Probability Distributions

Munpyung O (UCSD) Econometrics A Part 3 2 / 47

Motivations

Why do we need these particular probability distributions?

Most experiments can be represented by a small number of well-known distributions.

⇒ If we know the characteristics (distributional parameters) of these major distributions, we do not need to construct a probability distribution from scratch for each experiment.


Probability distributions of discrete random variables

Discrete probability distributions:

Binomial distribution


Bernoulli trial (or binomial trial)

A random experiment with exactly two possible outcomes, "success" and "failure", in which the probability of success is the same every time the experiment is conducted.

Examples: success and failure, on and off, 0 and 1, win and lose, heads and tails, go and stop, up and down, employed and unemployed, positive and negative, increase and decrease, male and female, accepted and rejected …


Bernoulli distribution

The probability distribution of a discrete random variable that takes the value 1 with success probability $p$ and the value 0 with failure probability $q = 1 - p$.

The probability distribution (PMF) of the Bernoulli random variable:

$$P(X = x) = \begin{cases} p & \text{if } x = 1 \\ 1 - p & \text{if } x = 0 \end{cases}$$

$p = P(X = 1)$ is the only parameter of the Bernoulli distribution.

$$E(X) = p, \qquad Var(X) = p(1 - p)$$
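The mean and variance stated for the Bernoulli distribution can be checked directly from the definition of the moments of a discrete random variable. The following is a minimal sketch; the value p = 0.3 is an arbitrary choice for illustration and does not come from the slides.

```python
# Bernoulli pmf stored as {value: probability}.
# p = 0.3 is an illustrative assumption, not a value from the slides.
p = 0.3
pmf = {1: p, 0: 1 - p}

# E(X) = sum of x * P(X = x); Var(X) = sum of (x - E(X))^2 * P(X = x)
mean = sum(x * prob for x, prob in pmf.items())
variance = sum((x - mean) ** 2 * prob for x, prob in pmf.items())

print(mean, p)                # the two numbers agree (up to rounding)
print(variance, p * (1 - p))  # the two numbers agree (up to rounding)
```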


Bernoulli trial and Binomial distribution

Many problems in probability and statistics involve situations in which an experiment with two possible outcomes is repeated many times (repetition of the Bernoulli trial).

Properties of a binomial experiment

1. The experiment consists of n identical trials.

2. Each trial is a Bernoulli trial that has only two possible outcomes.

3. The probability of success in a trial is $p$ and the probability of failure is $1 - p$.

4. The trials are independent.


For $X \sim \mathrm{Bin}(n, p)$, the pmf (probability mass function) is denoted by

$$P(X = x) = b(x; n, p) = \binom{n}{x} p^{x} (1 - p)^{n - x} = \frac{n!}{x!\,(n - x)!}\, p^{x} (1 - p)^{n - x}$$

= [number of outcomes with X = x] × [probability of any particular outcome with X = x]

where

$n$ : number of (Bernoulli) trials.

$x$ : number of successes in n trials (value of the binomial random variable).

$n - x$ : number of failures.

$p$ : the probability of a success on each trial.

$1 - p$ : the probability of a failure on each trial.
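The binomial pmf translates almost literally into code. The sketch below is one possible implementation; the function name `binom_pmf` and the test values n = 10, p = 0.3 are assumptions made for illustration.

```python
from math import comb

def binom_pmf(x: int, n: int, p: float) -> float:
    """b(x; n, p) = C(n, x) * p**x * (1 - p)**(n - x)."""
    if not 0 <= x <= n:
        return 0.0  # values outside 0..n are impossible
    # [number of outcomes with X = x] * [prob. of one such outcome]
    return comb(n, x) * p**x * (1 - p) ** (n - x)

# Sanity check: the probabilities over x = 0, ..., n add up to 1.
total = sum(binom_pmf(x, 10, 0.3) for x in range(11))
print(total)
```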


Example: In a true-false exam, there are 4 questions. You have not studied at all and decide to randomly guess the answers.

What is the probability that you get exactly one question correct?

4 possible outcomes: C I I I, I C I I, I I C I, I I I C

Probability of any one of these outcomes: $0.5^{1} \cdot 0.5^{3}$

$$\Rightarrow P(X = 1) = \binom{4}{1}\, 0.5^{1} \cdot 0.5^{3} = 0.25$$
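The counting argument in this example can be reproduced by brute force: list every way of answering the 4 questions and count those with exactly one correct answer. This is only a sketch of the idea; the labels "C" and "I" follow the slide, and the variable names are illustrative.

```python
from itertools import product

# Every way of answering 4 true-false questions: C = correct, I = incorrect.
outcomes = list(product("CI", repeat=4))
one_correct = [o for o in outcomes if o.count("C") == 1]

# With random guessing (p = 0.5) all 2**4 outcomes are equally likely.
prob = len(one_correct) / len(outcomes)
print(len(one_correct), prob)
```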


Define X as the number of correct answers on the true-false exam. Then the probability distribution of X is

X    P(X)
0    $\binom{4}{0}\, 0.5^{0} (1 - 0.5)^{4-0} = 0.0625$
1    $\binom{4}{1}\, 0.5^{1} (1 - 0.5)^{4-1} = 0.2500$
2    $\binom{4}{2}\, 0.5^{2} (1 - 0.5)^{4-2} = 0.3750$
3    $\binom{4}{3}\, 0.5^{3} (1 - 0.5)^{4-3} = 0.2500$
4    $\binom{4}{4}\, 0.5^{4} (1 - 0.5)^{4-4} = 0.0625$

Binomial distribution table


Example: In a multiple-choice exam, there are 5 questions and 4 choices for each question (a, b, c, d). You have not studied at all and decide to randomly guess the answers. What is the probability that you get exactly 3 questions correct?

n = 5, x = 3, p = 0.25

$$P(X = 3) = \binom{5}{3}\, 0.25^{3} (1 - 0.25)^{5-3} = 0.08789063$$


What is the probability of getting 478 heads from flipping a coin 1000 times?

Now we can answer the question with

$$P(X = 478) = \binom{1000}{478}\, 0.5^{478} (1 - 0.5)^{1000-478} = 0.00958781$$

Binomial distribution table?

How about E(X) and Var(X)?
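No printed table covers n = 1000, but the formula can be evaluated directly. A minimal sketch, assuming a fair coin (p = 0.5) as in the formula above: every sequence of 1000 flips then has probability 1/2^1000, so the calculation can be done with exact integers and a single final division.

```python
from math import comb

# With p = 0.5, P(X = 478) = C(1000, 478) / 2**1000.
# Python integers are exact, so only the final division rounds.
prob = comb(1000, 478) / 2**1000
print(round(prob, 8))
```

Working with integers until the last step avoids multiplying a very large number by a very small floating-point number.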


The mean and variance of X

For a binomial experiment with n trials and probability p of success on a given trial, the measures of center and spread are

$$\text{Mean} = \mu_X = E(X) = np, \qquad \text{Variance} = \sigma_X^2 = Var(X) = np(1 - p)$$

Example: In a multiple-choice exam, there are 5 questions and 4 choices for each question.

$$\mu_X = E(X) = np = 5 \cdot 0.25 = 1.25$$

$$\sigma_X^2 = Var(X) = np(1 - p) = 5 \cdot 0.25 \cdot (1 - 0.25) = 0.9375$$

$$\sigma_X = Sd(X) = \sqrt{Var(X)} = \sqrt{0.9375} = 0.9682458$$
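The shortcut formulas np and np(1 − p) should agree with the moments computed term by term from the pmf. The sketch below checks this for the multiple-choice example (n = 5, p = 0.25); the variable names are illustrative.

```python
from math import comb, sqrt

n, p = 5, 0.25
pmf = [comb(n, x) * p**x * (1 - p) ** (n - x) for x in range(n + 1)]

# Moments from the definition for a discrete random variable.
mean = sum(x * f for x, f in enumerate(pmf))
variance = sum((x - mean) ** 2 * f for x, f in enumerate(pmf))
sd = sqrt(variance)

print(mean, n * p)                # definition vs. shortcut np
print(variance, n * p * (1 - p))  # definition vs. shortcut np(1 - p)
print(sd)
```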


Example: According to a 2014 Gallup poll, 56% of uninsured

Americans who plan to get health insurance say they will do so

through a government health insurance exchange.

1) What is the probability that in a random sample of 10 people

exactly 6 plan to get health insurance through a government

health insurance exchange?

2) What is the probability that in a random sample of 1000 people

exactly 600 plan to get health insurance through a government

health insurance exchange?


3) What are the expected value and the variance of X?

4) What is the probability that fewer than 600 people plan to get health insurance through a government health insurance exchange?
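One way to set up these four questions is sketched below. It assumes that questions 3) and 4) refer to the sample of n = 1000 from question 2), which the slides do not state explicitly; the helper name `binom_pmf` is also an assumption.

```python
from math import comb

def binom_pmf(x, n, p):
    return comb(n, x) * p**x * (1 - p) ** (n - x)

p = 0.56

p1 = binom_pmf(6, 10, p)        # 1) exactly 6 of 10
p2 = binom_pmf(600, 1000, p)    # 2) exactly 600 of 1000

# 3) assuming n = 1000: E(X) = np, Var(X) = np(1 - p)
mean = 1000 * p
variance = 1000 * p * (1 - p)

# 4) "fewer than 600" means X <= 599, so sum the pmf over 0..599
p4 = sum(binom_pmf(x, 1000, p) for x in range(600))

print(p1, p2, mean, variance, p4)
```

Summing 600 pmf terms is feasible in code but not by hand.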


Probability distributions of discrete random variables PMF and CDF

Probability mass function (pmf):

A PMF is a function that gives the probability that a discrete random variable is exactly equal to some value.

$$f(x) = P(X = x), \qquad \sum_{\text{all } x} f(x) = 1$$

Example: Tossing a coin twice
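The coin example is not worked out here, so the following is only one plausible reading: take X to be the number of heads in two tosses of a fair coin (both the choice of X and the fairness of the coin are assumptions). The pmf then comes from counting equally likely outcomes.

```python
from collections import Counter
from itertools import product

# All outcomes of two tosses: HH, HT, TH, TT (equally likely for a fair coin).
outcomes = list(product("HT", repeat=2))

# X = number of heads; count how many outcomes give each value of X.
counts = Counter(o.count("H") for o in outcomes)
pmf = {x: counts[x] / len(outcomes) for x in sorted(counts)}

print(pmf)                # f(0), f(1), f(2)
print(sum(pmf.values()))  # the probabilities sum to 1
```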


Cumulative distribution function (cdf):

The cumulative distribution function (cdf) is the probability that the variable takes a value less than or equal to x.

$$F(x) = P(X \le x)$$

Properties of CDFs

1. F(x) is non-decreasing: if $y \ge x$, then $F(y) \ge F(x)$.

2. $\lim_{x \to -\infty} F(x) = 0$ and $\lim_{x \to \infty} F(x) = 1$.

3. F(x) is right-continuous: the function is continuous when a point is approached from the right side.


Define X as the number of correct answers on the true-false exam. Then the probability distribution of X is

X    $f(x) = P(X = x)$                                              $F(x) = P(X \le x)$
0    $f(0) = \binom{4}{0}\, 0.5^{0} (1 - 0.5)^{4-0} = 0.0625$    $F(0) = 0.0625$
1    $f(1) = \binom{4}{1}\, 0.5^{1} (1 - 0.5)^{4-1} = 0.2500$    $F(1) = 0.3125$
2    $f(2) = \binom{4}{2}\, 0.5^{2} (1 - 0.5)^{4-2} = 0.3750$    $F(2) = 0.6875$
3    $f(3) = \binom{4}{3}\, 0.5^{3} (1 - 0.5)^{4-3} = 0.2500$    $F(3) = 0.9375$
4    $f(4) = \binom{4}{4}\, 0.5^{4} (1 - 0.5)^{4-4} = 0.0625$    $F(4) = 1.0000$
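For a discrete random variable the cdf column is just the running total of the pmf column. A short sketch for the true-false exam (n = 4, p = 0.5), using `itertools.accumulate` for the running sums:

```python
from itertools import accumulate
from math import comb

n, p = 4, 0.5
pmf = [comb(n, x) * p**x * (1 - p) ** (n - x) for x in range(n + 1)]
cdf = list(accumulate(pmf))  # F(x) = f(0) + f(1) + ... + f(x)

for x, (f, F) in enumerate(zip(pmf, cdf)):
    print(x, f, F)
```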


[Figure: pmf of X plotted against x = 0, 1, 2, 3, 4]


[Figure: cdf of X plotted against x = 0, 1, 2, 3, 4]


Probability distributions of continuous random variables

Continuous probability distributions:

Normal (Gaussian) distribution


Continuous Random Variables:

A random variable is continuous if it can take any value in an interval, so its possible values cannot be listed one by one.

A probability distribution describes how the probabilities are distributed over all possible values.

We cannot construct a probability distribution table as we can for a discrete probability distribution.

A probability distribution for a continuous random variable X is specified by a mathematical function denoted by f(x), which is called the (probability) density function (pdf).

The graph of a density function is a smooth curve.


Definition (Probability Density Function (pdf))

$$f(x) = \frac{d}{dx} F(x), \qquad \text{where } \int_{-\infty}^{\infty} f(x)\,dx = 1$$

Definition (Cumulative Distribution Function (cdf))

$$F(x) = P(X \le x) = \int_{-\infty}^{x} f(u)\,du$$

F(x) is the area under the density curve to the left of X = x.


Example: Normal distribution

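$$\text{pdf}: \quad f(x) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{(x - \mu)^2}{2\sigma^2}}$$

$$\text{cdf}: \quad F(x) = P(X \le x) = \int_{-\infty}^{x} f(u)\,du = \int_{-\infty}^{x} \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{(u - \mu)^2}{2\sigma^2}}\,du$$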
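The link between the normal pdf and cdf can be checked numerically. The sketch below integrates the pdf with a simple midpoint rule and compares the area with a closed form of the cdf in terms of the error function. The function names, the midpoint rule, and the lower cutoff of −10 standing in for −∞ are all choices made for this illustration.

```python
from math import erf, exp, pi, sqrt

def normal_pdf(x, mu=0.0, sigma=1.0):
    return exp(-((x - mu) ** 2) / (2 * sigma**2)) / (sigma * sqrt(2 * pi))

def normal_cdf(x, mu=0.0, sigma=1.0):
    # The integral of the pdf written with the error function erf.
    return 0.5 * (1 + erf((x - mu) / (sigma * sqrt(2))))

def integrate(f, a, b, steps=100_000):
    """Midpoint-rule approximation of the integral of f over [a, b]."""
    h = (b - a) / steps
    return h * sum(f(a + (i + 0.5) * h) for i in range(steps))

# Area under the standard normal pdf to the left of x = 1;
# -10 stands in for minus infinity (the tail beyond it is negligible).
area = integrate(normal_pdf, -10.0, 1.0)
print(area, normal_cdf(1.0))
```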


Properties of continuous probability distribution

1. P(X = x) = 0 for all x

2. Probabilities are meaningful only over intervals:

$$P(a \le X \le b) = \int_{a}^{b} f(x)\,dx$$


3. $P(a \le X \le b) = P(a < X < b) = P(a \le X < b) = P(a < X \le b)$ (from properties 1 and 2).

4. The area under the curve is equal to 1: $\int_{-\infty}^{\infty} f(x)\,dx = 1$


Method of Probability Calculation

The probability that a continuous random variable X lies between a lower limit a and an upper limit b is

$$P(a < X < b) = P(X < b) - P(X < a) = F(b) - F(a) = \int_{a}^{b} f(x)\,dx$$


Definition (Moments of a continuous random variable)

Expected value

$$\mu_X = E(X) = \int_{-\infty}^{\infty} x \cdot f(x)\,dx$$

Variance

$$\sigma_X^2 = Var(X) = \int_{-\infty}^{\infty} (x - \mu_X)^2 \cdot f(x)\,dx$$
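These two integrals can be approximated numerically for a concrete density. The sketch below uses a normal density with µ = 1 and σ = 2 (values chosen only for illustration) and a midpoint rule over a wide finite range in place of (−∞, ∞). The results should come out close to µ and σ².

```python
from math import exp, pi, sqrt

mu, sigma = 1.0, 2.0  # illustrative parameter values (an assumption)

def f(x):
    """Normal density with the parameters above."""
    return exp(-((x - mu) ** 2) / (2 * sigma**2)) / (sigma * sqrt(2 * pi))

def integrate(g, a, b, steps=200_000):
    """Midpoint-rule approximation of the integral of g over [a, b]."""
    h = (b - a) / steps
    return h * sum(g(a + (i + 0.5) * h) for i in range(steps))

# The density is negligible more than 12 standard deviations from the mean.
lo, hi = mu - 12 * sigma, mu + 12 * sigma
mean = integrate(lambda x: x * f(x), lo, hi)
variance = integrate(lambda x: (x - mean) ** 2 * f(x), lo, hi)

print(mean, variance)
```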


Normal (Gaussian) distribution

1. Bell-shaped, unimodal, symmetric (Mean = Median = Mode)

2. The curve is continuous and never touches the X axis (possible values of X range from $-\infty$ to $\infty$).

3. The location and shape of the normal curve are determined entirely by two distributional parameters, the mean and the standard deviation.

$$X \sim N(\mu_X, \sigma_X^2)$$

The most important probability distribution! Why?


Probability calculation for Normal distribution

1. Computing definite integrals

2. Empirical rule: 68 – 95 – 99.7% rule

3. Standardization of normal distribution

⇒ standard normal distribution table


1. Computing definite integrals

Formula for the normal distribution:

$$f(x; \mu, \sigma) = \frac{1}{\sqrt{2\pi\sigma^2}}\, e^{-\frac{1}{2}\left[\frac{x - \mu}{\sigma}\right]^2}$$

where $\pi \approx 3.14$ and $e \approx 2.718$.

To find P(a < X < b), we need to find the area under the normal curve.

$$P(a < X < b) = P(X < b) - P(X < a) = F(b) - F(a) = \int_{a}^{b} \frac{1}{\sqrt{2\pi\sigma^2}}\, e^{-\frac{1}{2}\left[\frac{x - \mu}{\sigma}\right]^2}\,dx$$


2. Empirical rule: 68 – 95 – 99.7% rule
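The three percentages in the rule are the probabilities that a normal variable falls within 1, 2, and 3 standard deviations of its mean. A minimal check using the error function from Python's standard library (the helper name `within` is illustrative):

```python
from math import erf, sqrt

def within(k):
    """P(mu - k*sigma < X < mu + k*sigma) for any normal X."""
    return erf(k / sqrt(2))

for k in (1, 2, 3):
    print(k, round(within(k), 4))
# 1 0.6827
# 2 0.9545
# 3 0.9973
```

The result does not depend on µ or σ, which is why a single rule works for every normal distribution.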


3. The Standard Normal Distribution

Normal distribution with $\mu = 0$ and $\sigma^2 = 1$. $\Rightarrow Z \sim N(0, 1)$

Properties of Standard Normal Distribution

Known distributional parameters:

Mean = 0; Standard deviation = 1

Bell-shaped, unimodal, symmetric about Z = 0

(Mean = Median = Mode = 0)

– Values of Z to the left of center are negative

– Values of Z to the right of center are positive

– Areas on both sides of center equal 0.5

The curve is continuous and never touches the Z axis

(possible values of Z range from $-\infty$ to $\infty$)


Standardization or Z-transformation:

Since each normally distributed variable has its own mean and standard deviation, the shape and location of these curves will vary.

To simplify the calculation of the area under the curve, we standardize each value of X by expressing it as a Z-score, the number of standard deviations it lies away from the mean $\mu$.

$$\text{Z-score:} \quad z = \frac{x - \mu}{\sigma}$$


Standardization (Z-transformation)

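Normal distribution:

$$X \sim N(\mu, \sigma^2), \qquad P(a \le X \le b) = \int_{a}^{b} \frac{1}{\sqrt{2\pi\sigma^2}}\, e^{-\frac{1}{2}\left[\frac{x - \mu}{\sigma}\right]^2}\,dx$$

Standard normal distribution:

$$Z \sim N(0, 1), \qquad P\left(\frac{a - \mu}{\sigma} < Z = \frac{x - \mu}{\sigma} < \frac{b - \mu}{\sigma}\right) \quad \text{(use std. normal table)}$$

$E(Z) = 0$ and $Var(Z) = 1$ since $Z = \frac{X - \mu}{\sigma}$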


Equality of nonstandard and standard normal curve areas


Example 1: Finding areas under the std. normal distribution curve

a) P(Z < 1.83)

b) P(0 < Z < 1.83)

c) P(−1.45 < Z < 0)

d) P(Z > 1.25)

e) P(0.46 < Z < 1.75)
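Each of these areas is a difference of cdf values, which is exactly what a standard normal table lookup does. The sketch below uses `statistics.NormalDist` from the Python standard library (3.8+) in place of the table; the variable names mirror the item labels.

```python
from statistics import NormalDist

Z = NormalDist()  # standard normal: mean 0, standard deviation 1

a = Z.cdf(1.83)                # a) P(Z < 1.83)
b = Z.cdf(1.83) - Z.cdf(0)     # b) P(0 < Z < 1.83)
c = Z.cdf(0) - Z.cdf(-1.45)    # c) P(-1.45 < Z < 0)
d = 1 - Z.cdf(1.25)            # d) P(Z > 1.25), the upper tail
e = Z.cdf(1.75) - Z.cdf(0.46)  # e) P(0.46 < Z < 1.75)

for label, value in zip("abcde", (a, b, c, d, e)):
    print(label, round(value, 4))
```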


Example 2: Finding areas under the normal distribution curve

a) when $X \sim N(10, 4^2)$, $P(X < 5) = ?$

b) when $X \sim N(26, 9^2)$, $P(X > 35) = ?$

c) when $X \sim N(-6, 2^2)$, $P(-9 < X < -2) = ?$
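For a nonstandard normal, the route is to convert each limit to a Z-score and then read the standard normal cdf. A sketch of that two-step procedure, with an illustrative helper `z_score` and `statistics.NormalDist` standing in for the table:

```python
from statistics import NormalDist

Z = NormalDist()  # standard normal

def z_score(x, mu, sigma):
    """Number of standard deviations that x lies from mu."""
    return (x - mu) / sigma

# a) X ~ N(10, 4^2): P(X < 5)
a = Z.cdf(z_score(5, 10, 4))

# b) X ~ N(26, 9^2): P(X > 35)
b = 1 - Z.cdf(z_score(35, 26, 9))

# c) X ~ N(-6, 2^2): P(-9 < X < -2)
c = Z.cdf(z_score(-2, -6, 2)) - Z.cdf(z_score(-9, -6, 2))

print(round(a, 4), round(b, 4), round(c, 4))
```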


Example 3: Finding z-value for a specific area

a) Find a $z_0$ such that $P(Z < z_0) = 0.975$

b) Find a $z_0$ such that $P(-z_0 < Z < z_0) = 0.7458$

c) Find the value of a positive $z_0$ that has area 0.475 between 0 and $z_0$.

d) Find the value of Z that has area 0.051 to its right.

e) When $X \sim N(10, 4)$, find the value of X that has area 0.025 to its left.
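These are inverse problems: the area is given and the cutoff is wanted. Each one reduces to a left-tail area, which `NormalDist.inv_cdf` inverts. The sketch assumes that in part e) the second argument of N(10, 4) is the variance, so σ = 2, following the N(µ, σ²) notation used in these slides.

```python
from statistics import NormalDist

Z = NormalDist()

a = Z.inv_cdf(0.975)             # a) left area 0.975
b = Z.inv_cdf(0.5 + 0.7458 / 2)  # b) symmetric interval: left area 0.5 + 0.7458/2
c = Z.inv_cdf(0.5 + 0.475)       # c) area 0.475 between 0 and z0
d = Z.inv_cdf(1 - 0.051)         # d) right area 0.051 -> left area 0.949

# e) assumption: N(10, 4) means variance 4, i.e. sigma = 2
e = NormalDist(mu=10, sigma=2).inv_cdf(0.025)

for label, value in zip("abcde", (a, b, c, d, e)):
    print(label, round(value, 2))
```

Part e) can equally be done by finding z first and then transforming back with x = µ + zσ.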


Applications of the Normal Distribution

Example 4: A survey found that women spend on average $146 on

beauty products during the summer months. Assume the standard

deviation is $28.

Find the percentage of women who spend less than $160.00.

Assume the variable is normally distributed.

Example 5: The weights of packages of ground beef are normally

distributed with mean 1 pound and standard deviation 0.1. What is

the probability that a randomly selected package weighs between 0.8

and 0.85 pounds?
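Both examples follow the same pattern: build the normal model from its mean and standard deviation, then take a cdf value or a difference of cdf values. A sketch using `statistics.NormalDist`, which standardizes internally (variable names are illustrative):

```python
from statistics import NormalDist

# Example 4: spending ~ N(146, 28^2); P(X < 160), with z = (160 - 146) / 28 = 0.5
spending = NormalDist(mu=146, sigma=28)
p4 = spending.cdf(160)

# Example 5: weight ~ N(1, 0.1^2); P(0.8 < X < 0.85), with z from -2 to -1.5
weight = NormalDist(mu=1.0, sigma=0.1)
p5 = weight.cdf(0.85) - weight.cdf(0.8)

print(round(p4, 4), round(p5, 4))
```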


Example 6: SAT scores are approximated well by a normal model, $N(1500, 300^2)$. Shannon is a randomly selected SAT taker, and nothing is known about Shannon's SAT aptitude. What is the probability Shannon scores at least 1,630 on her SATs?


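[Figure: Z-transformation. Left panel: normal curve for X with axis ticks 900, 1500, 2100 and x = 1630 marked. Right panel: standard normal curve for Z with axis ticks −2, 0, 2 and z = 0.43 marked.]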
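The SAT question can be worked in the two steps described for standardization: convert 1630 to a Z-score, then take the upper-tail area of the standard normal. A short sketch with illustrative variable names:

```python
from statistics import NormalDist

mu, sigma = 1500, 300

# Step 1: Z-transformation of the cutoff score.
z = (1630 - mu) / sigma

# Step 2: "at least 1630" is the area to the right of z.
p = 1 - NormalDist().cdf(z)

print(round(z, 2), round(p, 4))
```

A table lookup with z rounded to 0.43 gives nearly the same answer.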


The Normal approximation to the Binomial distribution

The binomial distribution table on pages 813 and 821 shows the probability of X only up to n = 20 and only for p = 0.01, ..., 0.2, ..., 0.9.

How can we get the probability of X when n > 20 and for other p values?

When $np > 5$ and $n(1 - p) > 5$, areas under the normal curve with mean $\mu = np$ and standard deviation $\sigma = \sqrt{np(1 - p)}$ can be used to approximate binomial probabilities (an application of the CLT).

$$\mathrm{Bin}(n, p) \approx N(np,\; np(1 - p))$$


For the normal approximation to the binomial, we need the correction for continuity, since we approximate a discrete probability distribution by a continuous probability distribution.

Binomial            Normal
$P(X = a)$          $P(a - 0.5 < X < a + 0.5)$
$P(X \ge a)$        $P(X > a - 0.5)$
$P(X > a)$          $P(X > a + 0.5)$
$P(X \le a)$        $P(X < a + 0.5)$
$P(X < a)$          $P(X < a - 0.5)$
$P(a < X < b)$      $P(a + 0.5 < X < b - 0.5)$
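The effect of the correction can be seen by comparing the exact binomial probability with the normal approximation, with and without the ±0.5 adjustment. The values n = 50, p = 0.4, a = 22 are chosen only for illustration (they satisfy np > 5 and n(1 − p) > 5) and do not come from the slides.

```python
from math import comb, sqrt
from statistics import NormalDist

n, p, a = 50, 0.4, 22  # illustrative values: np = 20, n(1 - p) = 30
approx = NormalDist(mu=n * p, sigma=sqrt(n * p * (1 - p)))

def binom_pmf(x):
    return comb(n, x) * p**x * (1 - p) ** (n - x)

# P(X <= a): exact sum vs. normal area with and without the correction
exact_le = sum(binom_pmf(x) for x in range(a + 1))
corrected = approx.cdf(a + 0.5)  # table row: P(X <= a) -> P(X < a + 0.5)
uncorrected = approx.cdf(a)

# P(X = a): exact pmf vs. area between a - 0.5 and a + 0.5
exact_eq = binom_pmf(a)
corrected_eq = approx.cdf(a + 0.5) - approx.cdf(a - 0.5)

print(exact_le, corrected, uncorrected)
print(exact_eq, corrected_eq)
```

Without the correction, P(X = a) would be approximated by zero, since a continuous variable puts no probability on a single point.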


Example 1 : Reading While Driving

A magazine reported that 6% of American drivers read the

newspaper while driving. If 300 drivers are selected at random, find

the probability that exactly 25 say they read the newspaper while

driving.
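A sketch of how this example could be set up: check the rule of thumb, build the approximating normal with µ = np and σ = √(np(1 − p)), and apply the continuity correction for "exactly 25". The exact binomial value is computed alongside for comparison. The variable names are illustrative.

```python
from math import comb, sqrt
from statistics import NormalDist

n, p, a = 300, 0.06, 25

# Rule of thumb for using the normal approximation.
assert n * p > 5 and n * (1 - p) > 5

approx = NormalDist(mu=n * p, sigma=sqrt(n * p * (1 - p)))

# P(X = 25) -> P(24.5 < X < 25.5) under the normal curve
p_normal = approx.cdf(a + 0.5) - approx.cdf(a - 0.5)

# Exact binomial probability, for comparison
p_exact = comb(n, a) * p**a * (1 - p) ** (n - a)

print(round(p_normal, 4), round(p_exact, 4))
```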


Example 2: According to a 2014 Gallup poll, 56% of uninsured

Americans who plan to get health insurance say they will do so

through a government health insurance exchange.

1) What is the probability that in a random sample of 10 people

exactly 6 plan to get health insurance through a government

health insurance exchange?

2) What is the probability that in a random sample of 100 people

exactly 60 plan to get health insurance through a government

health insurance exchange?
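The two parts differ in whether the rule of thumb np > 5 and n(1 − p) > 5 holds. For n = 10 it does not, since n(1 − p) = 4.4, so the sketch below uses the exact binomial pmf there. For n = 100 it uses the normal approximation with the continuity correction and prints the exact value for comparison. The helper name `binom_pmf` is an assumption.

```python
from math import comb, sqrt
from statistics import NormalDist

def binom_pmf(x, n, p):
    return comb(n, x) * p**x * (1 - p) ** (n - x)

p = 0.56

# 1) n = 10: n(1 - p) = 4.4 is not above 5, so use the exact pmf.
p1 = binom_pmf(6, 10, p)

# 2) n = 100: np = 56 and n(1 - p) = 44, so the approximation applies.
n = 100
approx = NormalDist(mu=n * p, sigma=sqrt(n * p * (1 - p)))
p2_normal = approx.cdf(60.5) - approx.cdf(59.5)  # P(X = 60) with correction
p2_exact = binom_pmf(60, n, p)

print(round(p1, 4), round(p2_normal, 4), round(p2_exact, 4))
```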
