# Hypothesis Testing with Python

Hypothesis testing is a data analysis method used to test one hypothesis (called the null hypothesis, $H_0$) against another hypothesis (the alternative hypothesis, $H_1$).

We use the random sample $X_1,\ldots, X_n$ (the data) to decide between the hypotheses $H_0$ and $H_1$ at a fixed level of significance $\alpha$, which is the probability of rejecting $H_0$ when $H_0$ is true:

$$\alpha = \mathbb{P}\left(H_0\mbox{ rejected } \mid H_0 \mbox{ true}\right)$$

Any hypothesis testing procedure should specify the following:

• The test statistic $T$: a random variable computed from the random sample, whose probability distribution is known when $H_0$ is true.

• The observed statistic $t_\mbox{obs}$: the value of $T$ computed from the observed random sample $x_1,\ldots,x_n$:

$$t_\mbox{obs}=T(x_1,\ldots,x_n)$$

• The p-value: the probability, assuming that $H_0$ is true, of observing a value of $T$ at least as extreme as $t_\mbox{obs}$. A smaller p-value means stronger evidence in favor of the alternative hypothesis.

There are two types of Hypothesis testing procedures: parametric and non-parametric testing.

• Parametric hypothesis testing is a testing procedure used when the hypothesis is based on comparing a population parameter to given values.

• Non-parametric hypothesis testing is used when the hypothesis is not stated in terms of a population parameter. It tests an assumption about the population (for example, that the data come from a given distribution) against its opposite.

## Parametric hypothesis testing

• When their distributional assumptions hold, parametric tests are generally more powerful than non-parametric tests.

• The hypothesis is developed on the parameters of the population distribution.

• We will see in this chapter how to perform Hypothesis testing in the following cases:

• The mean: comparing to a given value, comparing between two means
• The proportion: comparing to a given value, comparing between two proportions

### Testing the mean

#### The theory

We would like to test the null hypothesis $$H_0: \; \mu=\mu_0$$ versus $$H_1: \; \mu\not=\mu_0$$ where $\mu_0$ is a given value, using a random sample $X_1,\ldots,X_n$ assumed to be generated from a Normal distribution with unknown mean $\mu$ and unknown variance $\sigma^2$.

The test statistic is $$T=\sqrt{n}\,\displaystyle\frac{\overline{X}-\mu}{S}$$ where $\overline{X}$ is the sample mean and $S$ is the sample standard deviation.

Under $H_0$, $T$ is equal to $$T=\sqrt{n}\,\displaystyle\frac{\overline{X}-\mu_0}{S}$$ and follows a $t$-distribution with $n-1$ degrees of freedom.

#### Practice with Python

Let's generate a random sample of size 15 from a Normal distribution with mean $\mu=-2$ and standard deviation $\sigma=2$ (hence variance $\sigma^2=4$).
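The original code cell is not shown here, so the following is only a minimal sketch of how such a sample could be generated with NumPy. The seed value and the variable names `rng` and `x` are assumptions made for reproducibility.

```python
import numpy as np

# Seed chosen arbitrarily (an assumption) so the sketch is reproducible.
rng = np.random.default_rng(seed=2023)

# Sample of size 15 from a Normal distribution with mean -2 and standard deviation 2.
x = rng.normal(loc=-2, scale=2, size=15)

print(x.round(3))
print("sample mean:", x.mean())
print("sample standard deviation:", x.std(ddof=1))  # ddof=1 gives S, the sample version
```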

We assume that the random sample x is generated from a Normal probability distribution with unknown mean and variance. We now test the following hypotheses

$$H_0: \; \mu=-2$$

versus $$H_1: \; \mu\not=-2$$
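One way to run this test, assuming SciPy is available, is `scipy.stats.ttest_1samp`. The sample is regenerated with an arbitrary seed (an assumption), so the printed numbers depend on that seed and need not match the values discussed in the text.

```python
import numpy as np
from scipy import stats

# Regenerate a sample of size 15 (the seed is an arbitrary assumption).
rng = np.random.default_rng(seed=2023)
x = rng.normal(loc=-2, scale=2, size=15)

# Two-sided one-sample t-test of H0: mu = -2 against H1: mu != -2.
t_obs, p_value = stats.ttest_1samp(x, popmean=-2)

print("t_obs   =", t_obs)
print("p-value =", p_value)
```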

The test returns the observed statistic $t_\mbox{obs}$ and the p-value.

Since this p-value is greater than 0.05 (5%), we fail to reject the hypothesis $H_0$.

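How are the test statistic and the p-value computed?

Under $H_0: \; \mu=-2$, the observed test statistic is $$t_\mbox{obs}=\sqrt{n}\,\displaystyle\frac{\overline{x}-(-2)}{s}$$ where $\overline{x}$ and $s$ are the observed sample mean and sample standard deviation.

The p-value is then computed as follows

$$\mbox{p-value}=\mathbb{P}\left(|T|\geq |t_\mbox{obs}| \mid H_0 \mbox{ true}\right)=2\left(1-F_T(|t_\mbox{obs}|)\right)$$

where $F_T$ is the cumulative distribution function (CDF) of the $t$-distribution with $n-1=14$ degrees of freedom.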
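As a rough check, the statistic and the two-sided p-value can be computed by hand from the $t$-distribution and compared with `scipy.stats.ttest_1samp`. The seed and variable names below are assumptions.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(seed=2023)  # arbitrary seed (assumption)
x = rng.normal(loc=-2, scale=2, size=15)

n = x.size
mu_0 = -2

# Observed statistic: sqrt(n) * (sample mean - mu_0) / sample standard deviation.
t_obs = np.sqrt(n) * (x.mean() - mu_0) / x.std(ddof=1)

# Two-sided p-value: P(|T| >= |t_obs|) where T follows t(n - 1).
# sf is the survival function, 1 - CDF.
p_value = 2 * stats.t.sf(abs(t_obs), df=n - 1)

print("by hand:", t_obs, p_value)
print("scipy  :", stats.ttest_1samp(x, popmean=mu_0))
```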

We can also test against a one-sided alternative. The following is called the lower-tail alternative test:

$$H_0: \; \mu=-2$$

versus $$H_1: \; \mu<-2$$

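Its p-value is the lower-tail probability $\mathbb{P}\left(T\leq t_\mbox{obs} \mid H_0 \mbox{ true}\right)$, where $T$ follows a $t$-distribution with $n-1$ degrees of freedom.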
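A possible sketch of the lower-tail test, assuming SciPy 1.6 or later (where `ttest_1samp` accepts an `alternative` argument), together with the same p-value computed directly from the CDF:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(seed=2023)  # arbitrary seed (assumption)
x = rng.normal(loc=-2, scale=2, size=15)

# Lower-tail test: H0: mu = -2 against H1: mu < -2.
t_obs, p_lower = stats.ttest_1samp(x, popmean=-2, alternative="less")

# Same p-value by hand: P(T <= t_obs) with T ~ t(n - 1).
p_lower_by_hand = stats.t.cdf(t_obs, df=x.size - 1)

print(t_obs, p_lower, p_lower_by_hand)
```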

The other one-sided test is called the upper-tail alternative test:

$$H_0: \; \mu=-2$$

versus $$H_1: \; \mu>-2$$

Its p-value is the upper-tail probability $\mathbb{P}\left(T\geq t_\mbox{obs} \mid H_0 \mbox{ true}\right)$.
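The upper-tail version can be sketched the same way, again assuming SciPy 1.6 or later for the `alternative` argument. For a continuous statistic, the lower-tail and upper-tail p-values add up to 1.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(seed=2023)  # arbitrary seed (assumption)
x = rng.normal(loc=-2, scale=2, size=15)

# Upper-tail test: H0: mu = -2 against H1: mu > -2.
t_obs, p_upper = stats.ttest_1samp(x, popmean=-2, alternative="greater")

# Same p-value by hand: P(T >= t_obs) with T ~ t(n - 1).
p_upper_by_hand = stats.t.sf(t_obs, df=x.size - 1)

print(t_obs, p_upper, p_upper_by_hand)
```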

### Comparing two means

We have two types of hypothesis testing comparing two means:

• A paired sample t-test is a dependent sample t-test, which is used to decide whether the mean difference between two observations of the same group is zero.

Example: Compare the difference in blood pressure level for a group of patients before and after some drug treatment.

• A two-sample t-test is used to test whether the means of two independent groups differ significantly. This test is also known as an independent samples t-test.

Example: Comparing the salaries of a sample of men and a sample of women employees.

#### Paired sample t-test

We are going to test the following hypotheses:

• $H_0:$ Mean difference between the two dependent samples is 0.
• $H_1$: Mean difference between the two dependent samples is not 0.

Example: we're comparing the grades of Quiz 1 and Quiz 2.

We remove the missing values from the data
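The grades data set is not included here, so the sketch below uses a small made-up table. The column names `Q1` and `Q2` and all values are hypothetical, and the resulting numbers say nothing about the real data. It drops incomplete rows and then runs a paired t-test with `scipy.stats.ttest_rel`.

```python
import numpy as np
import pandas as pd
from scipy import stats

# Hypothetical grades, invented for illustration; NaN marks a missed quiz.
grades = pd.DataFrame({
    "Q1": [12.0, 15.5, np.nan, 9.0, 14.0, 17.5, 11.0, 13.0, 16.0, 10.5],
    "Q2": [13.0, 14.0, 12.5, 10.0, np.nan, 18.0, 10.0, 14.5, 15.0, 11.0],
})

# The test pairs observations, so keep only students with both grades.
paired = grades.dropna(subset=["Q1", "Q2"])

# Paired t-test of H0: mean difference between Q1 and Q2 is 0.
t_obs, p_value = stats.ttest_rel(paired["Q1"], paired["Q2"])

print(len(paired), t_obs, p_value)
```

A paired t-test is equivalent to a one-sample t-test of the differences `Q1 - Q2` against 0, which is why rows with a missing grade cannot be used.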

We have tested here the following hypotheses:

$$H_0:\,\mbox{ the averages of the grades in Q1 and Q2 are equal}$$

versus $$H_1:\,\mbox{ the averages of the grades in Q1 and Q2 are different}$$

We used a paired t-test and we conclude that $H_0$ can't be rejected since the p-value is greater than 0.05 (5%), the given level of significance.

We can also run a one-sided test, where the null hypothesis is that the mean $\mu_1$ of the Quiz 1 grades is at least the mean $\mu_2$ of the Quiz 2 grades:

$$H_0: \, \mu_1\geq \mu_2$$

versus $$H_1:\, \mu_1<\mu_2$$
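A sketch of this one-sided paired test on made-up grades (all values are hypothetical), assuming SciPy 1.6 or later for the `alternative` argument of `ttest_rel`:

```python
import numpy as np
from scipy import stats

# Hypothetical paired grades without missing values, invented for illustration.
q1 = np.array([12.0, 15.5, 9.0, 17.5, 11.0, 13.0, 16.0, 10.5])
q2 = np.array([13.0, 14.0, 10.0, 18.0, 10.0, 14.5, 15.0, 11.0])

# H1: mu_1 < mu_2, i.e. the mean of the differences q1 - q2 is negative.
t_obs, p_value = stats.ttest_rel(q1, q2, alternative="less")

print(t_obs, p_value)
```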

Conclusion: $H_0$ can't be rejected

#### A two-sample t-test

We now compare the mean grade of Quiz 1 between Section 1 and Section 2.
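Since the data set is not shown, here is a sketch on invented data. The column names `Section` and `Q1` and all values are hypothetical. It uses `scipy.stats.ttest_ind`, which by default assumes equal variances in the two groups.

```python
import pandas as pd
from scipy import stats

# Hypothetical Quiz 1 grades for two sections, invented for illustration.
grades = pd.DataFrame({
    "Section": [1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2],
    "Q1": [12.0, 15.5, 9.0, 14.0, 17.5, 11.0,
           13.0, 16.0, 10.5, 12.5, 14.5, 15.0, 9.5],
})

q1_section1 = grades.loc[grades["Section"] == 1, "Q1"]
q1_section2 = grades.loc[grades["Section"] == 2, "Q1"]

# Independent two-sample t-test of H0: both sections have the same mean.
t_obs, p_value = stats.ttest_ind(q1_section1, q1_section2)

print(t_obs, p_value)
```

If the equal-variance assumption is doubtful, passing `equal_var=False` gives Welch's t-test instead.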

We conclude that we cannot reject the hypothesis that both sections have the same mean grade in Quiz 1.

### Testing the proportion

#### The theory

Assume that we would like to test the following Hypothesis: $$H_0\,: \,\,p=p_0,\;\; \mbox{ versus }H_1\,:\,\, p\not=p_0$$

where $p$ is a population proportion and $p_0$ is a given value of $p$. The null hypothesis $H_0$ and the alternative $H_1$ will be tested using data from a random sample $X_1,\ldots, X_n$ with a Bernoulli distribution. The random variables $X_1,\ldots, X_n$ are binary variables with values $1$ and $0$, where each random variable $X_i$ takes the value 1 with probability $p$.

In most cases the data reported in this type of test is the number $X$ of successes among the $n$ trials. This latter random variable $X$ is defined as follows: $$X=X_1+\ldots+X_n=\sum_{k=1}^n X_k$$ and has a Binomial distribution with size $n$ and probability $p$.

The probability $p$ is often estimated using the random variable $\widehat{p}$:

$$\widehat{p}=\displaystyle\frac{X}{n}.$$

When $n$ is large ($\geq 30$), the random variable $\widehat{p}$ follows approximately a Normal distribution:

$$\widehat{p}\sim \mathcal{N}\left(p, \displaystyle\frac{p(1-p)}{n}\right).$$

Hence the random variable $Z=\sqrt{n}\displaystyle\frac{\widehat{p}-p}{\sqrt{\widehat{p}(1-\widehat{p})}}$ follows approximately a standard normal distribution.

The random variable $Z$ will be the test statistic of the proportion hypothesis test.

Example: According to the Washington Post, nearly 45% of all Americans are born with brown eyes, although their eyes don't necessarily stay brown. A random sample of 80 adults found 32 with brown eyes. Is there sufficient evidence at the .01 level to indicate that the proportion of brown-eyed adults differs from the proportion of Americans who are born with brown eyes?

In this example the sample size is $n=80$ and the random sample is $X_1,\ldots,X_n$, where each $X_i$ represents an American adult: $X_i$ is $1$ if this adult has brown eyes and $0$ otherwise. The random variable $X$ is the number of Americans with brown eyes among the 80 surveyed adults. In this example the observed value of $X$ is 32.

We are testing here the following two hypotheses:

• Null Hypothesis: $H_0\,:\, p=.45$
• Alternative Hypothesis $H_1\,:\, p\not=.45$

Under the hypothesis $H_0$ ($p=.45$), the random variable $Z$ is equal to: $$Z=\sqrt{80}\displaystyle\frac{\widehat{p}-.45}{\sqrt{\widehat{p}(1-\widehat{p})}}$$

The observed value of $\widehat{p}$ from the data is: $$\widehat{p}_{\mbox{obs}}=\displaystyle\frac{32}{80}=.4$$

Then the observed value of $Z$ is $$z_{\mbox{obs}}=\sqrt{80}\displaystyle\frac{.4-.45}{\sqrt{.4\times .6}}=-0.913$$

Since the alternative hypothesis is $H_1\,:\, p\not=.45$, the test is called a two-sided test and the p-value of the test is computed as follows:

$$\begin{array}{rcl} \mbox{p-value}& = & \mathbb{P}\left(|Z|\geq |z_\mbox{obs}| \mid H_0 \mbox{ true }\right)\\ & = & \mathbb{P}\left(|Z|\geq 0.913 \mid p=.45\right) \\ & = & 2\times (1-F_Z(0.913)) \\ & = & 0.361 \end{array}$$

where $F_Z$ is the cumulative distribution function (CDF) of the standard normal distribution.

Since $1\%=0.01$ is the level of significance of the test, and the p-value is greater than $1\%$, we can decide to not reject the null hypothesis $H_0$.
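The computation above can be reproduced with a few lines of NumPy and SciPy. This is a sketch that follows the formulas of this section, with $\widehat{p}$ in the standard error; the variable names are assumptions.

```python
import numpy as np
from scipy import stats

n, x_obs, p_0 = 80, 32, 0.45

p_hat = x_obs / n  # observed proportion, 0.4

# z_obs = sqrt(n) * (p_hat - p_0) / sqrt(p_hat * (1 - p_hat))
z_obs = np.sqrt(n) * (p_hat - p_0) / np.sqrt(p_hat * (1 - p_hat))

# Two-sided p-value: 2 * (1 - F_Z(|z_obs|)); sf is 1 - CDF.
p_value = 2 * stats.norm.sf(abs(z_obs))

print(round(z_obs, 3), round(p_value, 3))
```

The rounded values agree with $-0.913$ and $0.361$ obtained above.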

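#### Other types of alternative hypothesis

The alternative hypothesis $H_1$ can also be one of the following:

• $H_1\,:\, p< p_0$: alternative='smaller'
• $H_1\,:\, p> p_0$: alternative='larger'

In the case of alternative='smaller', small values of $Z$ support $H_1$, and the p-value is computed as follows: $$\mbox{p-value}=\mathbb{P}\left(Z\leq z_\mbox{obs} \mid H_0 \mbox{ true}\right)=F_Z(z_\mbox{obs})$$

In the case of alternative='larger', large values of $Z$ support $H_1$, and the p-value is computed as follows: $$\mbox{p-value}=\mathbb{P}\left(Z\geq z_\mbox{obs} \mid H_0 \mbox{ true}\right)=1-F_Z(z_\mbox{obs})$$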
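The values 'smaller' and 'larger' match the `alternative` argument of `proportions_ztest` in `statsmodels.stats.proportion`, which is presumably what the original cells used (an assumption, since the code is not shown). The sketch below avoids that dependency and defines a small hypothetical helper, `one_sample_prop_ztest`, that mirrors the same three options using SciPy only.

```python
import numpy as np
from scipy import stats

def one_sample_prop_ztest(count, nobs, value, alternative="two-sided"):
    """z-test for one proportion, with p_hat in the standard error.

    Hypothetical helper written for this sketch, not a library function.
    """
    p_hat = count / nobs
    z_obs = (p_hat - value) / np.sqrt(p_hat * (1 - p_hat) / nobs)
    if alternative == "smaller":      # H1: p < p0, lower tail
        p_value = stats.norm.cdf(z_obs)
    elif alternative == "larger":     # H1: p > p0, upper tail
        p_value = stats.norm.sf(z_obs)
    elif alternative == "two-sided":  # H1: p != p0, both tails
        p_value = 2 * stats.norm.sf(abs(z_obs))
    else:
        raise ValueError(f"unknown alternative: {alternative!r}")
    return z_obs, p_value

# Brown-eyes data from the previous example: 32 successes out of 80, p0 = 0.45.
for alt in ("smaller", "larger", "two-sided"):
    print(alt, one_sample_prop_ztest(32, 80, 0.45, alternative=alt))
```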

Example: Pizza-Hut claims that 90% of its orders are delivered within 10 minutes of the time the order is placed. A sample of 100 orders revealed that 82 were delivered within the promised time. At the 10% significance level, can we conclude that less than 90% of the orders are delivered within 10 minutes?

We are testing in this example the following hypotheses:

• $H_0\,:\, p\geq .90$
• $H_1\,:\, p <.90$ where $p$ is the proportion of the orders that are delivered within 10 minutes.

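The sample size is $n=100$ and the observed value of $X$ is $82$.

Since the alternative is lower-tail, the p-value is computed as follows: $$\mbox{p-value}=\mathbb{P}\left(Z\leq z_\mbox{obs}\mid H_0\mbox{ true}\right)$$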
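A minimal sketch of this lower-tail test, using the 82 on-time deliveries out of 100 stated in the example and the statistic $Z$ from the theory section (with $\widehat{p}$ in the standard error). The variable names are assumptions.

```python
import numpy as np
from scipy import stats

n, x_obs, p_0, alpha = 100, 82, 0.90, 0.10

p_hat = x_obs / n
z_obs = np.sqrt(n) * (p_hat - p_0) / np.sqrt(p_hat * (1 - p_hat))

# Lower-tail p-value for H1: p < 0.90.
p_value = stats.norm.cdf(z_obs)

print("z_obs   =", z_obs)
print("p-value =", p_value)
print("reject H0 at the 10% level:", p_value < alpha)
```

With these numbers the p-value falls below 0.10, so $H_0$ is rejected at the 10% level.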

Example: Of a sample of 361 owners of retail service and business firms that had gone into bankruptcy, 105 reported having no professional assistance prior to opening the business. It is claimed that at most 25% of all members of this population had no professional assistance before opening the business. Test this claim at $\alpha=0.01$.

We're testing in this example the following hypotheses:

• $H_0\,:\, p\leq .25$
• $H_1\,:\, p >.25$ where $p$ is the proportion of bankrupt retail service and business owners who had no professional assistance.

The sample size is $n=361$ and the observed value of $X$ is $105$.
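A sketch for this example, with variable names chosen for illustration. Taking the claim "at most 25%" as the null hypothesis, the evidence against it lies in the upper tail, so the upper-tail p-value is the relevant one; the lower-tail p-value is printed only for comparison.

```python
import numpy as np
from scipy import stats

n, x_obs, p_0, alpha = 361, 105, 0.25, 0.01

p_hat = x_obs / n
z_obs = np.sqrt(n) * (p_hat - p_0) / np.sqrt(p_hat * (1 - p_hat))

p_upper = stats.norm.sf(z_obs)   # alternative p > 0.25
p_lower = stats.norm.cdf(z_obs)  # alternative p < 0.25, shown for comparison

print("p_hat =", p_hat, "z_obs =", z_obs)
print("upper-tail p-value =", p_upper)
print("lower-tail p-value =", p_lower)
print("reject H0 at the 1% level:", p_upper < alpha)
```

Here the upper-tail p-value is larger than 0.01, so the claim is not rejected at the 1% level.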

### Comparing two proportions

Here we have two samples, each summarized by a proportion, and we want to decide whether the proportion in one of the underlying populations is greater than, less than, or different from the proportion in the other.

Example: we want to compare the pass rates of tests in two different populations:

• We have two samples A and B. Our null hypothesis is that the proportions from the two populations are the same $$H_0\,:\, p_A=p_B$$
• Our alternative hypothesis is that the proportions from the two populations are different $$H_1\,:\, p_A\not=p_B$$
• From the population $A$ we sampled $n_A=500$ tests and found $X_A=410$ passed
• From the other population $B$, we sampled $n_B=400$ tests and found $X_B=379$ passed
• We use a 2-sample z-test to check whether the samples lead us to reject the null hypothesis or not

We will use the following test statistic for this hypothesis test:

$$Z=\displaystyle\frac{\widehat{p}_A-\widehat{p}_B}{S_p}$$

where $S_p$, called the pooled standard error, is computed as follows: $$S_p=\sqrt{\widehat{p}(1-\widehat{p})\times \left(\displaystyle\frac{1}{n_A}+\displaystyle\frac{1}{n_B}\right)}$$ and $$\widehat{p}=\displaystyle\frac{n_A\widehat{p}_A+n_B\widehat{p}_B}{n_A+n_B}$$ is the pooled proportion.

Under $H_0$, the test statistic follows approximately the standard normal distribution. Let $z_\mbox{obs}$ be the observed value of $Z$. The p-value associated with the two-sided alternative can be computed as follows: $$\mbox{p-value}=\mathbb{P}\left(|Z|\geq |z_\mbox{obs}|\mid H_0\mbox{ true }\right)$$

In Python we proceed as follows:

We create Python arrays for the number of successes and for the sample sizes.

Let's now see how the test statistic and the p-value are computed
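The computation can be sketched directly from the pooled formulas of this section with NumPy and SciPy. The array names `successes` and `sizes` are assumptions. The `proportions_ztest` function of `statsmodels`, when given two counts and two sample sizes, is expected to return the same pooled statistic, but it is not used here.

```python
import numpy as np
from scipy import stats

successes = np.array([410, 379])  # X_A, X_B
sizes = np.array([500, 400])      # n_A, n_B

p_hat_a, p_hat_b = successes / sizes

# Pooled proportion and pooled standard error.
p_pool = successes.sum() / sizes.sum()
s_p = np.sqrt(p_pool * (1 - p_pool) * (1 / sizes).sum())

z_obs = (p_hat_a - p_hat_b) / s_p

# Two-sided p-value: P(|Z| >= |z_obs|) under H0.
p_value = 2 * stats.norm.sf(abs(z_obs))

print("z_obs   =", z_obs)
print("p-value =", p_value)
```

The p-value is far below any usual significance level, so the hypothesis of equal pass rates is rejected for these samples.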

Example: Let's consider the Titanic data

Importing the data

We then encode the variable Survived

We select Sex and Survived variables.

Probability of surviving by Gender

Comparing both probabilities
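The Titanic file used originally is not specified, so the sketch below runs the same steps on a synthetic stand-in. The counts, the 'yes'/'no' coding of `Survived`, and the variable names are all invented for illustration and do not reproduce the real Titanic figures.

```python
import numpy as np
import pandas as pd
from scipy import stats

# Synthetic stand-in for the Titanic table: these counts are invented.
titanic = pd.DataFrame({
    "Sex": ["female"] * 60 + ["male"] * 90,
    "Survived": ["yes"] * 45 + ["no"] * 15 + ["yes"] * 20 + ["no"] * 70,
})

# Encode Survived as 1/0 and keep the two variables of interest.
titanic["Survived"] = titanic["Survived"].map({"yes": 1, "no": 0})
data = titanic[["Sex", "Survived"]]

# Survival proportion, number of survivors and group size by gender.
summary = data.groupby("Sex")["Survived"].agg(["mean", "sum", "count"])
print(summary)

# Two-sided z-test of H0: p_female = p_male with the pooled standard error.
x_f, n_f = summary.loc["female", ["sum", "count"]]
x_m, n_m = summary.loc["male", ["sum", "count"]]
p_pool = (x_f + x_m) / (n_f + n_m)
s_p = np.sqrt(p_pool * (1 - p_pool) * (1 / n_f + 1 / n_m))
z_obs = (x_f / n_f - x_m / n_m) / s_p
p_value = 2 * stats.norm.sf(abs(z_obs))

print("z_obs =", z_obs, "p-value =", p_value)
```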

Exercise: Compare the survival probability between other groups.