Hypothesis Testing with Python


Hypothesis testing is a data analysis method used to test one hypothesis (called the null hypothesis, $H_0$) against another hypothesis (the alternative hypothesis, $H_1$).

We will use the random sample $X_1,\ldots, X_n$ (the data) to decide between the hypotheses $H_0$ and $H_1$ at a fixed level of significance $\alpha$, which is the probability of rejecting $H_0$ when $H_0$ is in fact true:

$$\alpha = \mathbb{P}\left(H_0\mbox{ rejected } \mid H_0 \mbox{ true}\right)$$

Any Hypothesis testing procedure should derive the following


There are two types of Hypothesis testing procedures: parametric and non-parametric testing.

Parametric hypothesis testing

Testing the mean

The theory

We would like to test the null hypothesis $$ H_0: \; \mu=\mu_0$$ versus $$ H_1: \; \mu\not=\mu_0$$ where $\mu_0$ is a given value, using a random sample $X_1,\ldots,X_n$ assumed to be generated from a Normal distribution with unknown mean $\mu$ and unknown variance $\sigma^2$.

The test statistic is $$T=\sqrt{n}\,\displaystyle\frac{\overline{X}-\mu}{S}$$ where $\overline{X}$ is the sample mean and $S$ is the sample standard deviation.

Under $H_0$, $T$ is equal to $$T=\sqrt{n}\,\displaystyle\frac{\overline{X}-\mu_0}{S}$$ and follows a $t$-distribution with $n-1$ degrees of freedom.

Practice with Python

Let's generate a random sample of size 15 from a Normal distribution with mean $\mu=-2$ and standard deviation $\sigma=2$ (hence variance $\sigma^2=4$).
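The original code cell is not shown here, so the following is only a minimal sketch of how such a sample could be drawn with NumPy. The seed and the variable name `x` are assumptions made for reproducibility and readability.

```python
import numpy as np

# Seed chosen arbitrarily so that the sketch is reproducible (an assumption).
rng = np.random.default_rng(seed=0)

# 15 draws from a Normal distribution with mean -2 and standard deviation 2.
x = rng.normal(loc=-2, scale=2, size=15)

print(x.round(3))
print("sample mean:", x.mean())
print("sample std :", x.std(ddof=1))  # ddof=1 gives the sample standard deviation S
```

With only 15 observations, the sample mean and standard deviation will usually differ noticeably from $-2$ and $2$.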

We assume that the random sample x is generated from a Normal probability distribution with unknown mean and variance. We will test now the following hypothesis

$$ H_0: \; \mu=-2$$

versus $$ H_1: \; \mu\not=-2$$
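One way to run this two-sided test is SciPy's `scipy.stats.ttest_1samp`. The sketch below regenerates a sample with an arbitrary seed (an assumption), so its numbers will not match the notebook's original output.

```python
import numpy as np
from scipy import stats

# Regenerate a sample as described above; the seed is an arbitrary assumption.
rng = np.random.default_rng(seed=0)
x = rng.normal(loc=-2, scale=2, size=15)

# Two-sided one-sample t-test of H0: mu = -2 against H1: mu != -2.
result = stats.ttest_1samp(x, popmean=-2)
t_obs, p_value = result.statistic, result.pvalue

print("t_obs  =", t_obs)
print("pvalue =", p_value)
```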

The output above shows that $t_\mbox{obs}$ is

and the pvalue is

Since this p-value is greater than 0.05 (5%), we fail to reject the hypothesis $H_0$ at the 5% level of significance.

How are the test statistic and the p-value computed?

Under $ H_0: \; \mu=-2$, the test statistic is then

The p-value is then computed as follows
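Both steps can be sketched by hand from the formula $T=\sqrt{n}\,(\overline{X}-\mu_0)/S$ and the $t$-distribution with $n-1$ degrees of freedom. The sample and seed below are assumptions, so the printed numbers are illustrative only.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(seed=0)  # arbitrary seed (assumption)
x = rng.normal(loc=-2, scale=2, size=15)

mu_0 = -2
n = x.size

# Observed test statistic under H0: sqrt(n) * (sample mean - mu_0) / S.
t_obs = np.sqrt(n) * (x.mean() - mu_0) / x.std(ddof=1)

# Two-sided p-value: P(|T| >= |t_obs|) for T ~ t(n - 1).
# stats.t.sf is the survival function, 1 - CDF.
p_value = 2 * stats.t.sf(abs(t_obs), df=n - 1)

print("t_obs  =", t_obs)
print("pvalue =", p_value)
```

The values should agree with those returned by `scipy.stats.ttest_1samp` on the same sample.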

And the p-value is

We can also perform the following hypothesis testing. It's called the lower-tail alternative test:

$$ H_0: \; \mu=-2$$

versus $$ H_1: \; \mu<-2$$

Its p-value is computed as follows
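For the lower-tail alternative, the p-value is the probability that a $t(n-1)$ variable falls at or below the observed statistic. A minimal sketch, again on an assumed sample with an arbitrary seed:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(seed=0)  # arbitrary seed (assumption)
x = rng.normal(loc=-2, scale=2, size=15)

mu_0 = -2
n = x.size
t_obs = np.sqrt(n) * (x.mean() - mu_0) / x.std(ddof=1)

# Lower-tail p-value: P(T <= t_obs) for T ~ t(n - 1).
p_lower = stats.t.cdf(t_obs, df=n - 1)

print("t_obs  =", t_obs)
print("pvalue =", p_lower)
```

Recent SciPy versions (1.6 and later) also accept `alternative='less'` in `ttest_1samp`, which should give the same result.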

We can also perform the following hypothesis testing. It's called the upper-tail alternative test:

$$ H_0: \; \mu=-2$$

versus $$ H_1: \; \mu>-2$$

The p-value is computed as follows
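For the upper-tail alternative, the p-value is the probability that a $t(n-1)$ variable falls at or above the observed statistic. The sketch below uses an assumed sample with an arbitrary seed.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(seed=0)  # arbitrary seed (assumption)
x = rng.normal(loc=-2, scale=2, size=15)

mu_0 = -2
n = x.size
t_obs = np.sqrt(n) * (x.mean() - mu_0) / x.std(ddof=1)

# Upper-tail p-value: P(T >= t_obs) for T ~ t(n - 1).
p_upper = stats.t.sf(t_obs, df=n - 1)

print("t_obs  =", t_obs)
print("pvalue =", p_upper)
```

The lower-tail and upper-tail p-values of the same sample always sum to 1.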

Comparing two means

There are two types of hypothesis tests for comparing two means:

Paired samples, where the same units are measured twice. Example: compare the blood pressure level of a group of patients before and after some drug treatment.

Independent samples, where two separate groups are measured. Example: compare the salaries of a sample of men and women employees.

Paired sample t-test

We are going to test the following hypothesis:

Example: we're comparing the grades between the Quiz 1 and the Quiz 2

We remove the missing values from the data
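The notebook's data set is not available here, so the sketch below uses made-up grades (hypothetical arrays `q1` and `q2`, with `np.nan` marking a missed quiz) to illustrate dropping incomplete pairs and running a paired t-test with `scipy.stats.ttest_rel`.

```python
import numpy as np
from scipy import stats

# Hypothetical grades, made up for illustration; np.nan marks a missed quiz.
q1 = np.array([12.0, 15.5, np.nan, 9.0, 14.0, 17.0, 11.5, 13.0])
q2 = np.array([13.0, 14.0, 10.0, np.nan, 15.5, 16.5, 12.0, 13.5])

# Keep only students with both grades, since the test works on pairs.
both = ~np.isnan(q1) & ~np.isnan(q2)
q1_clean, q2_clean = q1[both], q2[both]

# Paired (dependent samples) two-sided t-test.
result = stats.ttest_rel(q1_clean, q2_clean)
t_obs, p_value = result.statistic, result.pvalue

print("pairs kept:", q1_clean.size)
print("t_obs  =", t_obs)
print("pvalue =", p_value)
```

A paired t-test is equivalent to a one-sample t-test of the differences `q1_clean - q2_clean` against a mean of 0.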

We have tested here the following hypothesis:

$$ H_0:\,\mbox{ the averages of the grades in Q1 and Q2 are equal}$$

versus $$ H_1:\,\mbox{ the averages of the grades in Q1 and Q2 are different}$$

We used a paired t-test and we can conclude that $H_0$ can't be rejected since the p-value is greater than 0.05 (5%) (the given level of significance).

We can also test whether the mean $\mu_1$ of the Quiz 1 grades is at least as high as the mean $\mu_2$ of the Quiz 2 grades:

$$ H_0: \, \mu_1\geq \mu_2$$

versus $$ H_1:\, \mu_1<\mu_2 $$
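For this lower-tail alternative, the p-value can be sketched from the paired differences. The grades below are made up (hypothetical, already free of missing values), so the result says nothing about the notebook's actual data.

```python
import numpy as np
from scipy import stats

# Hypothetical paired grades with no missing values (made-up numbers).
q1 = np.array([12.0, 15.5, 14.0, 17.0, 11.5, 13.0])
q2 = np.array([13.0, 14.0, 15.5, 16.5, 12.0, 13.5])

# Work on the differences: under H0 their mean is >= 0, under H1 it is < 0.
d = q1 - q2
n = d.size
t_obs = np.sqrt(n) * d.mean() / d.std(ddof=1)

# Lower-tail p-value: P(T <= t_obs) for T ~ t(n - 1).
p_value = stats.t.cdf(t_obs, df=n - 1)

print("t_obs  =", t_obs)
print("pvalue =", p_value)
```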

Conclusion: $H_0$ can't be rejected

A two-sample t-test

We will now compare the mean Quiz 1 grade between Sections 1 and 2.
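A two-sample comparison of independent groups can be sketched with `scipy.stats.ttest_ind`. The two arrays below are made-up grades (hypothetical names `sec1` and `sec2`), not the notebook's data.

```python
import numpy as np
from scipy import stats

# Hypothetical Quiz 1 grades for two sections (made-up numbers).
sec1 = np.array([12.0, 15.5, 9.0, 14.0, 17.0, 11.5, 13.0])
sec2 = np.array([13.0, 10.5, 16.0, 12.0, 14.5, 15.0])

# Two-sided test of H0: equal means. By default equal_var=True, which
# assumes both sections share the same variance (pooled t-test).
result = stats.ttest_ind(sec1, sec2)
t_obs, p_value = result.statistic, result.pvalue

print("t_obs  =", t_obs)
print("pvalue =", p_value)
```

Passing `equal_var=False` switches to Welch's t-test, which does not assume equal variances.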

We conclude that we cannot reject the hypothesis that both sections have the same mean grade in Quiz 1.

Testing the proportion

The theory

Assume that we would like to test the following Hypothesis: $$H_0\,: \,\,p=p_0,\;\; \mbox{ versus }H_1\,:\,\, p\not=p_0$$

where $p$ is a population proportion and $p_0$ is a given value of $p$. The hypotheses $H_0$ (null hypothesis) and $H_1$ are tested using data from a random sample $X_1,\ldots, X_n$ with a Bernoulli distribution. The random variables $X_1,\ldots, X_n$ are binary variables with values $1$ and $0$, where each random variable $X_i$ takes the value 1 with probability $p$.

In most cases the data reported in this type of test is the number $X$ of successes among the $n$ trials. This latter random variable $X$ is defined as follows: $$ X=X_1+\ldots+X_n=\sum_{k=1}^n X_k$$ and has a Binomial distribution with size $n$ and probability $p$.

The probability $p$ is often estimated using the random variable $\widehat{p}$: $$\widehat{p}=\displaystyle\frac{X}{n}.$$


When $n$ is large ($n\geq 30$), the random variable $\widehat{p}$ follows approximately a Normal distribution:

$$\widehat{p}\sim \mathcal{N}\left(p, \displaystyle\frac{p(1-p)}{n}\right).$$

Hence the random variable $Z=\sqrt{n}\displaystyle\frac{\widehat{p}-p}{\sqrt{\widehat{p}(1-\widehat{p})}}$ follows approximately a standard normal distribution.

The random variable $Z$ will be the test statistic of the proportion hypothesis test.

Example: According to the Washington Post, nearly 45% of all Americans are born with brown eyes, although their eyes don't necessarily stay brown. A random sample of 80 adults found 32 with brown eyes. Is there sufficient evidence at the .01 level to indicate that the proportion of brown-eyed adults differs from the proportion of Americans who are born with brown eyes?

In this example the sample size is $n=80$ and the random sample is $X_1,\ldots,X_n$, where each $X_i$ represents an American adult: $X_i$ is $1$ if this adult has brown eyes and $0$ otherwise. The random variable $X$ is the number of adults with brown eyes among the 80 surveyed. In this example the observed value of $X$ is 32.

We are testing here the following two hypotheses: $$H_0\,: \,\,p=.45,\;\; \mbox{ versus }H_1\,:\,\, p\not=.45$$

Under the hypothesis $H_0$ ($p=.45$), the random variable $Z$ is equal to: $$Z=\sqrt{80}\displaystyle\frac{\widehat{p}-.45}{\sqrt{\widehat{p}(1-\widehat{p})}}$$

The observed value of $\widehat{p}$ from the data is: $$\widehat{p}_{\mbox{obs}}=\displaystyle\frac{32}{80}=.4$$

Then the observed value of $Z$ is $$Z_{\mbox{obs}}=\sqrt{80}\displaystyle\frac{.4-.45}{\sqrt{.4\times .6}}=-0.913$$

Since the alternative hypothesis is $H_1\,:\, p\not=.45$, the test is two-sided and its p-value is computed as follows:

$$\begin{array}{rcl} \mbox{p-value}& = & \mathbb{P}\left(|Z|\geq |z_\mbox{obs}| \mid H_0 \mbox{ true }\right)\\ & = & \mathbb{P}\left(|Z|\geq 0.913 \mid p=.45\right) \\ & = & 2\times (1-F_Z(0.913)) \\ & = & 0.361 \end{array}$$

where $F_Z$ is the cumulative distribution function (CDF) of a standard normal distribution.

Since the level of significance of the test is $1\%=0.01$ and the p-value is greater than $1\%$, we do not reject the null hypothesis $H_0$.

Practice with Python
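The notebook's original call is not shown here, so the following is a minimal sketch that reproduces the brown-eyes example by hand with the Python standard library (`statistics.NormalDist` provides the standard normal CDF). The variable names are assumptions.

```python
from math import sqrt
from statistics import NormalDist

n, x_obs, p_0 = 80, 32, 0.45

# Observed sample proportion.
p_hat = x_obs / n

# Test statistic: sqrt(n) * (p_hat - p_0) / sqrt(p_hat * (1 - p_hat)).
z_obs = sqrt(n) * (p_hat - p_0) / sqrt(p_hat * (1 - p_hat))

# Two-sided p-value: 2 * (1 - F_Z(|z_obs|)).
p_value = 2 * (1 - NormalDist().cdf(abs(z_obs)))

print("z_obs  =", round(z_obs, 3))    # about -0.913, as in the text
print("pvalue =", round(p_value, 3))  # about 0.361, as in the text
```

Note that this version estimates the variance from $\widehat{p}$, as in the formula for $Z$ above; some references use $p_0(1-p_0)$ in the denominator instead, which gives slightly different numbers.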

Other types of alternative hypothesis

The alternative hypothesis $H_1$ can also be one of the following: $H_1\,:\, p<p_0$ (lower-tail alternative) or $H_1\,:\, p>p_0$ (upper-tail alternative).

In case of alternative='less' we proceed as follows:

The p-value is computed as follows
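For the lower-tail alternative $H_1: p<p_0$, the p-value is the standard normal probability at or below the observed statistic. A minimal standard-library sketch, reusing the brown-eyes numbers as an assumed illustration:

```python
from math import sqrt
from statistics import NormalDist

n, x_obs, p_0 = 80, 32, 0.45
p_hat = x_obs / n
z_obs = sqrt(n) * (p_hat - p_0) / sqrt(p_hat * (1 - p_hat))

# Lower-tail p-value: P(Z <= z_obs) = F_Z(z_obs).
p_value = NormalDist().cdf(z_obs)

print("z_obs  =", round(z_obs, 3))
print("pvalue =", round(p_value, 3))
```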

In case of alternative='larger' we proceed as follows:

The p-value is computed as follows
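For the upper-tail alternative $H_1: p>p_0$, the p-value is the standard normal probability at or above the observed statistic. A minimal standard-library sketch, again reusing the brown-eyes numbers as an assumed illustration:

```python
from math import sqrt
from statistics import NormalDist

n, x_obs, p_0 = 80, 32, 0.45
p_hat = x_obs / n
z_obs = sqrt(n) * (p_hat - p_0) / sqrt(p_hat * (1 - p_hat))

# Upper-tail p-value: P(Z >= z_obs) = 1 - F_Z(z_obs).
p_value = 1 - NormalDist().cdf(z_obs)

print("z_obs  =", round(z_obs, 3))
print("pvalue =", round(p_value, 3))
```

Because the observed statistic is negative here, the upper-tail p-value is large: the data give no support to $p>0.45$.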

Example: Pizza-Hut claims that 90% of its orders are delivered within 10 minutes of the time the order is placed. A sample of 100 orders revealed that 82 were delivered within the promised time. At the 10% significance level, can we conclude that less than 90% of the orders are delivered in less than 10 minutes?

In this example we are testing the following hypotheses: $$H_0\,: \,\,p=.9,\;\; \mbox{ versus }H_1\,:\,\, p<.9$$

The sample size is $n=100$ and the observed value of $X$ is $82$.

The pvalue is computed as follows:
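A minimal standard-library sketch for this example, assuming the hypotheses are $H_0: p=0.9$ versus the lower-tail alternative $H_1: p<0.9$, and using the 82 on-time deliveries out of 100 given in the example statement:

```python
from math import sqrt
from statistics import NormalDist

n, x_obs, p_0 = 100, 82, 0.90
alpha = 0.10

p_hat = x_obs / n
z_obs = sqrt(n) * (p_hat - p_0) / sqrt(p_hat * (1 - p_hat))

# Lower-tail p-value (assumed alternative H1: p < 0.9).
p_value = NormalDist().cdf(z_obs)

print("z_obs  =", round(z_obs, 3))
print("pvalue =", round(p_value, 4))
print("reject H0 at the 10% level:", p_value < alpha)
```

Under these assumptions the p-value falls below 10%, so $H_0$ would be rejected at that level.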

Example: Of a sample of 361 owners of retail service and business firms that had gone into bankruptcy, 105 reported having no professional assistance prior to opening the business. It is claimed that at most 25% of all members of this population had no professional assistance before opening the business. Test the aforementioned claim at $\alpha=0.01$

In this example we are testing the following hypotheses: $$H_0\,: \,\,p\leq .25,\;\; \mbox{ versus }H_1\,:\,\, p>.25$$

The sample size is $n=361$ and the observed value of $X$ is $105$.
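A minimal standard-library sketch for this example, assuming the claim "at most 25%" is taken as the null hypothesis and tested against the upper-tail alternative $H_1: p>0.25$:

```python
from math import sqrt
from statistics import NormalDist

n, x_obs, p_0 = 361, 105, 0.25
alpha = 0.01

p_hat = x_obs / n
z_obs = sqrt(n) * (p_hat - p_0) / sqrt(p_hat * (1 - p_hat))

# Upper-tail p-value (assumed alternative H1: p > 0.25).
p_value = 1 - NormalDist().cdf(z_obs)

print("p_hat  =", round(p_hat, 4))
print("z_obs  =", round(z_obs, 3))
print("pvalue =", round(p_value, 4))
print("reject H0 at the 1% level:", p_value < alpha)
```

Under these assumptions the p-value is above 1%, so the claim would not be rejected at $\alpha=0.01$.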

Comparing two proportions

Here we have two samples, each summarized by a proportion, and we want to assess whether the proportion in one of the underlying populations is greater than, less than, or different from the proportion in the other.

Example: we want to compare two different populations to see how their tests relate to each other:

To test $H_0\,:\, p_A=p_B$, the test statistic is $$Z=\displaystyle\frac{\widehat{p}_A-\widehat{p}_B}{S_p}$$ where $S_p$, called the pooled standard error, is computed as follows: $$ S_p=\sqrt{\widehat{p}(1-\widehat{p})\times \left(\displaystyle\frac{1}{n_A}+\displaystyle\frac{1}{n_B}\right)}$$ and $$\widehat{p}=\displaystyle\frac{n_A\widehat{p}_A+n_B\widehat{p}_B}{n_A+n_B}$$ is the pooled proportion.

Let $z_\mbox{obs}$ be the observed value of $Z$. Under $H_0$, the test statistic follows approximately the standard normal distribution, so the p-value associated with the two-sided alternative can be computed as follows: $$\mbox{p-value}=\mathbb{P}\left(|Z|\geq |z_\mbox{obs}|\mid H_0\mbox{ true }\right)$$

In Python we proceed as follows:

We create Python arrays for the number of successes and for the sample sizes.

Let's see now how the test statistic and the pvalue are computed
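The computation with the pooled proportion and pooled standard error can be sketched with the standard library as follows. The counts and sample sizes are made up for illustration (hypothetical groups A and B), and plain lists stand in for the arrays mentioned above.

```python
from math import sqrt
from statistics import NormalDist

# Hypothetical data: number of successes and sample size for groups A and B.
successes = [45, 30]
sizes = [100, 100]

(x_a, x_b), (n_a, n_b) = successes, sizes
p_a, p_b = x_a / n_a, x_b / n_b

# Pooled proportion and pooled standard error.
p_pool = (x_a + x_b) / (n_a + n_b)
s_p = sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))

# Test statistic and two-sided p-value.
z_obs = (p_a - p_b) / s_p
p_value = 2 * (1 - NormalDist().cdf(abs(z_obs)))

print("z_obs  =", round(z_obs, 3))
print("pvalue =", round(p_value, 4))
```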

Example: Let's consider the Titanic data

Importing the data

We then encode the variable Survived

We select Sex and Survived variables.

Probability of surviving by Gender

Comparing both probabilities

Exercise: Compare the survival probability between other groups.