Chi-Square test using R

A chi-square test is used to analyze nominal (also called categorical) data. "Chi" is pronounced "kai", and the test is frequently written as the χ² test. It compares the observed frequencies in each response category with the frequencies that would be expected if the variables were unrelated. The null hypothesis of a chi-square test is that the nominal variables have no relationship, that is, they are independent. That means:

  • H0: There is no relationship between the nominal variables; the variables are independent.
  • H1: H0 is not true; the variables are related.

Creating or Importing data

In this step, we either import our data into R or generate an example data set.
Let’s create some nominal data:

data <- data.frame(
  sampleA = sample(c("Positive", "Positive", "Negative"), 300, replace = TRUE),
  sampleB = sample(c("Positive", "Positive", "Negative"), 300, replace = TRUE)
)

Because the data are drawn at random, your numbers will differ from the output shown here unless you fix a seed with set.seed() first.

Perform the chi-square test using the chisq.test function:

test <- chisq.test(x = data$sampleA, y = data$sampleB)

Analyse the result:

> test

Pearson's Chi-squared test with Yates' continuity correction

data: data$sampleA and data$sampleB
X-squared = 1.7444, df = 1, p-value = 0.1866
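
It can help to look at the contingency table the test is built on. The following is a minimal sketch rather than part of the original example: the seed value 42 is an arbitrary assumption added so the run is repeatable, so its numbers will not match the output shown above.

```r
# Arbitrary seed (an assumption, not from the original example) for repeatability
set.seed(42)

data <- data.frame(
  sampleA = sample(c("Positive", "Positive", "Negative"), 300, replace = TRUE),
  sampleB = sample(c("Positive", "Positive", "Negative"), 300, replace = TRUE)
)

# Cross-tabulate the two nominal variables into a 2 x 2 table of observed counts
observed <- table(data$sampleA, data$sampleB)
print(observed)

# chisq.test() also accepts the contingency table directly
test <- chisq.test(observed)

# Expected counts under independence: (row total * column total) / grand total
print(test$expected)
```

Comparing the observed table with test$expected shows which cells, if any, drive the statistic.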

Interpretation of Chi-square test

To interpret the chi-square test, we use the p-value. If the p-value is less than or equal to 0.05, we reject the null hypothesis and conclude that the categorical variables are not independent. Here the p-value is 0.1866, which is above the 5% significance level, so the null hypothesis cannot be rejected: the data give no evidence of a relationship between sampleA and sampleB.
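
The decision rule can be sketched in a few lines of R. The helper name decide and its return strings are hypothetical, chosen only for illustration. The p-value is the one reported in the output above, and in a live session it would come from test$p.value.

```r
# Hypothetical helper (name and wording are illustrative, not from the original)
decide <- function(p_value, alpha = 0.05) {
  if (p_value <= alpha) {
    "reject H0: the variables appear to be related"
  } else {
    "fail to reject H0: no evidence of a relationship"
  }
}

# p-value reported in the output above; in a live session use test$p.value
p_value <- 0.1866
decision <- decide(p_value)
print(decision)
```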

Chi-Square (χ²) statistic

A large χ² statistic is evidence against the null hypothesis. To determine how large it needs to be before the null hypothesis can be rejected, find the critical value from the degrees of freedom and the significance level.
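
For reference, the Pearson statistic is usually written as below. The symbols are notation chosen here rather than taken from the original example: O_i is the observed count in cell i of the contingency table and E_i is the count expected under independence (row total times column total, divided by the grand total).

```latex
\chi^2 = \sum_{i} \frac{(O_i - E_i)^2}{E_i}
```

For 2 x 2 tables, chisq.test applies Yates' continuity correction by default, which shrinks each |O_i - E_i| by up to 0.5 before squaring, so the reported value is slightly smaller than the uncorrected sum.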

In our example, we have 1 degree of freedom, since a 2 x 2 table has (2 - 1) x (2 - 1) = 1. Using a table of probabilities for the χ² distribution, we can see that the critical χ² value at the 5% significance level is 3.841. The null hypothesis can therefore be rejected when χ² >= 3.841. In this case the statistic is 1.7444, which is below 3.841, so the null hypothesis cannot be rejected.
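
Instead of a printed table, the critical value can be computed in R with qchisq, and the reported statistic can be converted back into a p-value with pchisq. This is a small sketch using the numbers reported above; the variable names are illustrative.

```r
alpha <- 0.05   # significance level
dof   <- 1      # degrees of freedom for a 2 x 2 table

# Critical value: the 95th percentile of the chi-square distribution with 1 df
critical_value <- qchisq(1 - alpha, df = dof)
print(round(critical_value, 3))  # 3.841

# Statistic reported in the output above
statistic <- 1.7444

# Upper-tail probability of the statistic, close to the reported p-value of 0.1866
p_value <- pchisq(statistic, df = dof, lower.tail = FALSE)
print(p_value)

# TRUE would mean H0 is rejected at the 5% level
print(statistic >= critical_value)  # FALSE
```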
