A chi-square test is used to analyze nominal (also known as categorical) data. "Chi" is pronounced "kai", and the test is frequently written as the χ² test. It is used to compare the observed frequencies across the response categories of each sample. The null hypothesis of a chi-square test is that the nominal variables are independent, that is, that there is no relationship between them. That means:

- H0: There is no relationship between the nominal variables, i.e. the variables are independent.
- H1: H0 is not true.

## Creating or Importing data

In this step, we either import our data into R or generate an example data set.
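If the data already lives in a file, importing it might look like the sketch below. The file and its contents are hypothetical: a tiny CSV is written to a temporary file first so that the example runs on its own, and the column names simply mirror the ones used in the rest of this post.

```r
# Write a tiny, made-up CSV to a temporary file so the example is self-contained
csv_path <- tempfile(fileext = ".csv")
writeLines(c("sampleA,sampleB",
             "Positive,Negative",
             "Negative,Negative",
             "Positive,Positive"), csv_path)

# Read it back; both columns are kept as character vectors
imported <- read.csv(csv_path, stringsAsFactors = FALSE)
str(imported)
```

With real data, `csv_path` would be replaced by the path to your own file.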

Let’s create some nominal data:

```
set.seed(150)
data <- data.frame(
  sampleA = sample(c("Positive", "Positive", "Negative"), 300, replace = TRUE),
  sampleB = sample(c("Positive", "Positive", "Negative"), 300, replace = TRUE)
)

# Perform the chi-square test using the chisq.test function:
test <- chisq.test(x = data$sampleA, y = data$sampleB)

# Analyse the result:
test
```

```
	Pearson's Chi-squared test with Yates' continuity correction

data:  data$sampleA and data$sampleB
X-squared = 1.7444, df = 1, p-value = 0.1866
```
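Before interpreting the result, it can help to look at the counts the test is based on. The sketch below rebuilds the same example data, cross-tabulates it with `table()`, and prints the counts expected under independence. The names `observed` and `test_from_table` are introduced here for illustration only, and the exact counts may differ between R versions because the random sampler changed in R 3.6.0.

```r
set.seed(150)
data <- data.frame(
  sampleA = sample(c("Positive", "Positive", "Negative"), 300, replace = TRUE),
  sampleB = sample(c("Positive", "Positive", "Negative"), 300, replace = TRUE)
)

# Observed counts for each combination of categories
observed <- table(sampleA = data$sampleA, sampleB = data$sampleB)
print(observed)

# Passing the table runs the same test as passing the two vectors
test_from_table <- chisq.test(observed)

# Counts expected in each cell if the two variables were independent
print(test_from_table$expected)
```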

## Interpretation of Chi-square test

To interpret the chi-square test we use the p-value. If the p-value is less than or equal to 0.05, we reject the null hypothesis and conclude that the categorical variables are not independent, i.e. there is a relationship between them. Here the p-value is 0.1866, which is above the 5% significance level, so the null hypothesis cannot be rejected.
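The decision rule can also be written out in code. This is a minimal sketch that rebuilds the example data and reads the p-value from the `p.value` element of the object returned by `chisq.test`; the names `alpha` and `decision` are illustrative.

```r
set.seed(150)
data <- data.frame(
  sampleA = sample(c("Positive", "Positive", "Negative"), 300, replace = TRUE),
  sampleB = sample(c("Positive", "Positive", "Negative"), 300, replace = TRUE)
)
test <- chisq.test(x = data$sampleA, y = data$sampleB)

alpha <- 0.05            # significance level used in this post
p_value <- test$p.value  # p-value stored in the test result

if (p_value <= alpha) {
  decision <- "Reject H0: the variables appear to be related"
} else {
  decision <- "Fail to reject H0: no evidence against independence"
}
cat("p-value:", round(p_value, 4), "\n")
cat(decision, "\n")
```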

### Chi-square (χ²) statistic

A large χ² statistic means that the null hypothesis can be rejected. To determine how large it needs to be, the critical value can be found using the degrees of freedom and the significance level.
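To see where the statistic comes from, the sketch below computes it by hand for a 2 x 2 table. The counts are made up purely for illustration and are not the data from this post. Expected counts are row total times column total divided by the grand total, and the second version applies Yates' continuity correction, which `chisq.test` uses by default for 2 x 2 tables.

```r
# Hypothetical 2 x 2 table of observed counts (made up for illustration)
observed <- matrix(c(30, 20, 15, 35), nrow = 2, byrow = TRUE,
                   dimnames = list(A = c("Negative", "Positive"),
                                   B = c("Negative", "Positive")))

# Expected counts under independence: row total * column total / grand total
expected <- outer(rowSums(observed), colSums(observed)) / sum(observed)

# Pearson statistic without continuity correction
chi_sq <- sum((observed - expected)^2 / expected)

# Same statistic with Yates' continuity correction
yates <- min(0.5, abs(observed - expected))
chi_sq_yates <- sum((abs(observed - expected) - yates)^2 / expected)

print(chi_sq)
print(chi_sq_yates)
```

Both values can be checked against `chisq.test(observed, correct = FALSE)` and `chisq.test(observed)` respectively.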

In our example, we have 1 degree of freedom. Using a table of probabilities for the χ² distribution, we can see that the critical χ² value at the 5% significance level is 3.841. The null hypothesis can therefore be rejected when χ² ≥ 3.841. In this case the statistic is 1.7444, which is below 3.841, so the null hypothesis cannot be rejected.
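Instead of a printed table, the critical value can be looked up in R with `qchisq`. The sketch below hard-codes the statistic reported in the test output (1.7444); the variable names are illustrative.

```r
alpha <- 0.05  # significance level
df <- 1        # degrees of freedom for a 2 x 2 table

# Critical value: the 95th percentile of the chi-square distribution with 1 df
critical_value <- qchisq(1 - alpha, df = df)
print(round(critical_value, 3))  # 3.841

# Statistic reported in the test output
statistic <- 1.7444

# TRUE would mean H0 is rejected; here the statistic is below the critical value
reject_h0 <- statistic >= critical_value
print(reject_h0)

# Upper-tail probability of the statistic, which matches the reported p-value
p_value <- pchisq(statistic, df = df, lower.tail = FALSE)
print(round(p_value, 4))
```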
