One-Way ANOVA Test in R - Easy Guides

What is the one-way ANOVA test?
Assumptions of the ANOVA test
How does the one-way ANOVA test work?
Visualize your data and compute one-way ANOVA in R
- Import your data into R
- Check your data
- Visualize your data
- Compute one-way ANOVA test
- Interpret the result of one-way ANOVA tests
- Multiple pairwise-comparison between the means of groups
  - Tukey multiple pairwise-comparisons
  - Multiple comparisons using multcomp package
  - Pairwise t-test
- Check ANOVA assumptions: test validity?
  - Check the homogeneity of variance assumption
  - Relaxing the homogeneity of variance assumption
  - Check the normality assumption
- Non-parametric alternative to one-way ANOVA test
Summary
See also
Read more
Infos

What is the one-way ANOVA test?

The one-way analysis of variance (ANOVA), also known as one-factor ANOVA, is an extension of the independent two-sample t-test for comparing means in situations where there are more than two groups. In one-way ANOVA, the data are organized into several groups based on one single grouping variable (also called a factor variable). This tutorial describes the basic principle of the one-way ANOVA test and provides practical ANOVA test examples in R software.

ANOVA test hypotheses:

Null hypothesis: the means of the different groups are the same
Alternative hypothesis: At least one sample mean is not equal to the others.

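Note that if you have only two groups, you can use a t-test. In this case, the F-test and the classical (equal-variance) Student's t-test are equivalent: the F-statistic equals the square of the t-statistic, and the two p-values are identical.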
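To check the equivalence numerically, here is a minimal base-R sketch. It keeps only two PlantGrowth groups (ctrl and trt1, an arbitrary choice for illustration) and compares the one-way ANOVA F-statistic with the square of the Student's t statistic. Note the `var.equal = TRUE` argument: the equivalence holds for the pooled-variance t-test, not for the Welch t-test that `t.test()` runs by default. Variable names such as `two_groups` are illustrative.

```r
# Keep only two groups; droplevels() removes the now-empty "trt2" level
two_groups <- droplevels(subset(PlantGrowth, group %in% c("ctrl", "trt1")))

# One-way ANOVA F-statistic for the two groups
f_value <- summary(aov(weight ~ group, data = two_groups))[[1]][["F value"]][1]

# Classical Student's t-test with pooled variance (t.test() defaults to Welch)
t_value <- unname(t.test(weight ~ group, data = two_groups,
                         var.equal = TRUE)$statistic)

# With two groups, F equals t squared up to floating-point error
print(c(F = f_value, t_squared = t_value^2))
```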

Assumptions of the ANOVA test

Here we describe the requirements for the ANOVA test. The ANOVA test can be applied only when:

The observations are obtained independently and randomly from the population defined by the factor levels
The data of each factor level are normally distributed.
These normal populations have a common variance. (Levene’s test can be used to check this.)

How does the one-way ANOVA test work?

Assume that we have 3 groups (A, B, C) to compare:

Compute the common variance, which is called variance within samples (\(S^2_{within}\)) or residual variance.
Compute the variance between sample means as follows:
- Compute the mean of each group
- Compute the variance between sample means (\(S^2_{between}\))
Produce F-statistic as the ratio of \(S^2_{between}/S^2_{within}\).
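The three steps above can be sketched in base R on the PlantGrowth data used later in this tutorial. This is a minimal illustration, assuming the usual definitions: the within-samples variance is the pooled variance of the groups, and the between-samples variance is the size-weighted squared deviation of each group mean from the grand mean, divided by k - 1 (k groups). Both should match the Mean Sq column that aov() reports for this data (1.8832 and 0.3886 in the output shown later).

```r
my_data <- PlantGrowth

k <- nlevels(my_data$group)                             # number of groups
N <- nrow(my_data)                                      # total number of observations
n_i <- tapply(my_data$weight, my_data$group, length)    # group sizes
mean_i <- tapply(my_data$weight, my_data$group, mean)   # group means
var_i <- tapply(my_data$weight, my_data$group, var)     # group variances
grand_mean <- mean(my_data$weight)

# Within-samples (residual) variance: pooled variance of the groups
s2_within <- sum((n_i - 1) * var_i) / (N - k)

# Between-samples variance: spread of the group means around the grand mean,
# each group weighted by its size
s2_between <- sum(n_i * (mean_i - grand_mean)^2) / (k - 1)

f_stat <- s2_between / s2_within
print(c(s2_between = s2_between, s2_within = s2_within, F = f_stat))

# Cross-check against aov()
f_aov <- summary(aov(weight ~ group, data = my_data))[[1]][["F value"]][1]
```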

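Note that a low ratio (close to or below 1) indicates that there is no significant difference between the means of the samples being compared. A higher ratio implies that the variation among the group means is larger than expected from the variation within the groups; whether it is large enough to be significant is judged from the F distribution with k - 1 and N - k degrees of freedom (k groups, N observations in total).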
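As a rough illustration of how significance is judged, the sketch below converts an F ratio into a p-value with `pf()` and computes the 5% critical value with `qf()`. It assumes the PlantGrowth setting used later in this tutorial (k = 3 groups, N = 30 plants) and plugs in the rounded F value 4.846 from the `aov()` summary, so the p-value agrees with that output only up to rounding.

```r
k <- 3           # number of groups (ctrl, trt1, trt2)
N <- 30          # total number of plants
f_stat <- 4.846  # F-statistic reported by aov() for PlantGrowth (rounded)

# p-value: probability of an F at least this large if all group means are equal
p_value <- pf(f_stat, df1 = k - 1, df2 = N - k, lower.tail = FALSE)

# Critical value: the smallest F that is significant at the 5% level
f_crit <- qf(0.95, df1 = k - 1, df2 = N - k)

print(c(p_value = p_value, f_crit = f_crit))
```

An F above `f_crit` and a p-value below 0.05 are two views of the same decision.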

Visualize your data and compute one-way ANOVA in R

Import your data into R

Prepare your data as specified here: Best practices for preparing your data set for R
Save your data in an external .txt tab-delimited or .csv file
Import your data into R as follows:

# If .txt tab file, use this
my_data <- read.delim(file.choose())
# Or, if .csv file, use this
my_data <- read.csv(file.choose())

Here, we’ll use the built-in R data set named PlantGrowth. It contains the weight of plants obtained under a control and two different treatment conditions.

my_data <- PlantGrowth

Check your data

To get an idea of what the data look like, we use the function sample_n() [in the dplyr package]. The sample_n() function randomly picks a few of the observations in the data frame to print out:

# Show a random sample
set.seed(1234)
dplyr::sample_n(my_data, 10)

   weight group
19   4.32  trt1
18   4.89  trt1
29   5.80  trt2
24   5.50  trt2
17   6.03  trt1
1    4.17  ctrl
6    4.61  ctrl
16   3.83  trt1
12   4.17  trt1
15   5.87  trt1

In R terminology, the column “group” is called a factor and the different categories (“ctrl”, “trt1”, “trt2”) are named factor levels. By default, the levels are ordered alphabetically.

# Show the levels
levels(my_data$group)

[1] "ctrl" "trt1" "trt2"

If the levels are not automatically in the correct order, re-order them as follows:

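# factor() reorders levels without creating an ordered factor (ordered factors get polynomial contrasts)
my_data$group <- factor(my_data$group, levels = c("ctrl", "trt1", "trt2"))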

It’s possible to compute summary statistics (mean and sd) by groups using the dplyr package.

Compute summary statistics by groups - count, mean, sd:

library(dplyr)
group_by(my_data, group) %>%
  summarise(
    count = n(),
    mean = mean(weight, na.rm = TRUE),
    sd = sd(weight, na.rm = TRUE)
  )

Source: local data frame [3 x 4]

   group count  mean        sd
  (fctr) (int) (dbl)     (dbl)
1   ctrl    10 5.032 0.5830914
2   trt1    10 4.661 0.7936757
3   trt2    10 5.526 0.4425733
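If dplyr is not available, the same summary can be sketched in base R with `aggregate()`. This is a minimal alternative, not the tutorial's own approach; the function passed to `FUN` returns count, mean and sd for each group, and `aggregate()` stores them as a matrix column named after the response (`weight`).

```r
my_data <- PlantGrowth

# For each level of group, return a named vector of summary statistics
stats_by_group <- aggregate(
  weight ~ group, data = my_data,
  FUN = function(x) c(count = length(x), mean = mean(x), sd = sd(x))
)
print(stats_by_group)
```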

Visualize your data

To use R base graphs, read this: R base graphs. Here, we’ll use the ggpubr R package for easy ggplot2-based data visualization.
Install the latest version of ggpubr from GitHub as follows (recommended):

# Install
if(!require(devtools)) install.packages("devtools")
devtools::install_github("kassambara/ggpubr")

Or, install from CRAN as follows:

install.packages("ggpubr")

Visualize your data with ggpubr:

# Box plots
# ++++++++++++++++++++
# Plot weight by group and color by group
library("ggpubr")
ggboxplot(my_data, x = "group", y = "weight",
          color = "group", palette = c("#00AFBB", "#E7B800", "#FC4E07"),
          order = c("ctrl", "trt1", "trt2"),
          ylab = "Weight", xlab = "Treatment")


# Mean plots
# ++++++++++++++++++++
# Plot weight by group
# Add error bars: mean_se
# (other values include: mean_sd, mean_ci, median_iqr, ....)
library("ggpubr")
ggline(my_data, x = "group", y = "weight",
       add = c("mean_se", "jitter"),
       order = c("ctrl", "trt1", "trt2"),
       ylab = "Weight", xlab = "Treatment")


If you still want to use R base graphs, type the following scripts:

# Box plot
boxplot(weight ~ group, data = my_data,
        xlab = "Treatment", ylab = "Weight",
        frame = FALSE, col = c("#00AFBB", "#E7B800", "#FC4E07"))
# plotmeans
library("gplots")
plotmeans(weight ~ group, data = my_data, frame = FALSE,
          xlab = "Treatment", ylab = "Weight",
          main = "Mean Plot with 95% CI")

Compute one-way ANOVA test

We want to know if there is any significant difference between the average weights of plants in the 3 experimental conditions.

The R function aov() can be used to answer this question. The function summary.aov() is used to summarize the analysis of variance model.

# Compute the analysis of variance
res.aov <- aov(weight ~ group, data = my_data)
# Summary of the analysis
summary(res.aov)

            Df Sum Sq Mean Sq F value Pr(>F)
group        2  3.766  1.8832   4.846 0.0159 *
Residuals   27 10.492  0.3886
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

The output includes the column F value (the F-statistic) and the column Pr(>F), which is the p-value of the test.
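When the F value and p-value are needed in later code rather than read off the printed table, they can be pulled out of the summary object. A minimal sketch, assuming the object structure of current R versions: `summary()` on an `aov` fit returns a list whose first element is the ANOVA table, with the group term in the first row and the residuals in the second.

```r
my_data <- PlantGrowth
res.aov <- aov(weight ~ group, data = my_data)

# First element of the summary list is the ANOVA table (a data frame)
anova_table <- summary(res.aov)[[1]]

# Row 1 holds the "group" term; row 2 holds the residuals
f_value <- anova_table[["F value"]][1]
p_value <- anova_table[["Pr(>F)"]][1]

cat(sprintf("F = %.3f, p = %.4f\n", f_value, p_value))
```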

Interpret the result of one-way ANOVA tests

As the p-value (0.0159) is less than the significance level of 0.05, we can conclude that there are significant differences between the group means, as flagged with “*” in the model summary.

Multiple pairwise-comparison between the means of groups

In one-way ANOVA test, a significant p-value indicates that some of the group means are different, but we don’t know which pairs of groups are different.

It’s possible to perform multiple pairwise comparisons to determine whether the mean differences between specific pairs of groups are statistically significant.

Tukey multiple pairwise-comparisons

As the ANOVA test is significant, we can compute Tukey HSD (Tukey Honest Significant Differences, R function: TukeyHSD()) for performing multiple pairwise-comparison between the means of groups.

The function TukeyHSD() takes the fitted ANOVA as an argument.

TukeyHSD(res.aov)

  Tukey multiple comparisons of means
    95% family-wise confidence level

Fit: aov(formula = weight ~ group, data = my_data)

$group
            diff        lwr       upr     p adj
trt1-ctrl -0.371 -1.0622161 0.3202161 0.3908711
trt2-ctrl  0.494 -0.1972161 1.1852161 0.1979960
trt2-trt1  0.865  0.1737839 1.5562161 0.0120064

diff: difference between means of the two groups
lwr, upr: the lower and the upper end point of the confidence interval at 95% (default)
p adj: p-value after adjustment for the multiple comparisons.

It can be seen from the output that only the difference between trt2 and trt1 is significant, with an adjusted p-value of 0.012.
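The same conclusion can be reached programmatically. The sketch below (a minimal illustration; `comparisons` and `significant` are illustrative names) keeps the rows of the `TukeyHSD()` result whose adjusted p-value is below 0.05, and shows the equivalent reading of the confidence intervals: a difference is significant at the 5% level exactly when its family-wise 95% interval does not contain zero.

```r
my_data <- PlantGrowth
res.aov <- aov(weight ~ group, data = my_data)

# TukeyHSD() returns one matrix per factor, with columns diff, lwr, upr and p adj
comparisons <- TukeyHSD(res.aov)$group

# Pairs whose adjusted p-value is below 0.05
significant <- comparisons[comparisons[, "p adj"] < 0.05, , drop = FALSE]
print(significant)

# Equivalent check: the 95% family-wise confidence interval excludes zero
ci_excludes_zero <- comparisons[, "lwr"] > 0 | comparisons[, "upr"] < 0
print(rownames(comparisons)[ci_excludes_zero])
```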

Multiple comparisons using multcomp package

It’s possible to use the function glht() [in the multcomp package] to perform multiple comparison procedures for an ANOVA. glht stands for general linear hypothesis tests. The simplified format is as follows:

glht(model, linfct)

model: a fitted model, for example an object returned by aov().
linfct: a specification of the linear hypotheses to be tested. Multiple comparisons in ANOVA models are specified by objects returned from the function mcp().

Use glht() to perform multiple pairwise-comparisons for a one-way ANOVA:

library(multcomp)
summary(glht(res.aov, linfct = mcp(group = "Tukey")))

  Simultaneous Tests for General Linear Hypotheses

Multiple Comparisons of Means: Tukey Contrasts

Fit: aov(formula = weight ~ group, data = my_data)

Linear Hypotheses:
                 Estimate Std. Error t value Pr(>|t|)
trt1 - ctrl == 0  -0.3710     0.2788  -1.331    0.391
trt2 - ctrl == 0   0.4940     0.2788   1.772    0.198
trt2 - trt1 == 0   0.8650     0.2788   3.103    0.012 *
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
(Adjusted p values reported -- single-step method)
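To see where the Std. Error and t value columns come from, the sketch below recomputes the trt2 - trt1 contrast by hand: the standard error is sqrt(MSE * (1/n1 + 1/n2)), and the t value is the estimate divided by it. For the adjusted p-value it uses `ptukey()` on the studentized range statistic |t| * sqrt(2), which is how `TukeyHSD()` adjusts; multcomp's single-step method computes its p-values from the multivariate t distribution instead, so the last digits may differ slightly from the glht() output. This is an illustrative sketch, not multcomp's implementation.

```r
my_data <- PlantGrowth
res.aov <- aov(weight ~ group, data = my_data)

mse <- sum(residuals(res.aov)^2) / df.residual(res.aov)  # residual mean square
means <- tapply(my_data$weight, my_data$group, mean)
n_trt1 <- sum(my_data$group == "trt1")
n_trt2 <- sum(my_data$group == "trt2")

# Contrast trt2 - trt1: estimate, standard error and t value
estimate <- unname(means["trt2"] - means["trt1"])
std_error <- sqrt(mse * (1 / n_trt2 + 1 / n_trt1))
t_value <- estimate / std_error

# Tukey adjustment: the studentized range statistic q equals |t| * sqrt(2)
p_adj <- ptukey(abs(t_value) * sqrt(2), nmeans = nlevels(my_data$group),
                df = df.residual(res.aov), lower.tail = FALSE)

print(round(c(estimate = estimate, std_error = std_error,
              t_value = t_value, p_adj = p_adj), 4))
```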

Pairwise t-test

The function pairwise.t.test() can also be used to calculate pairwise comparisons between group levels with corrections for multiple testing.

pairwise.t.test(my_data$weight, my_data$group, p.adjust.method = "BH")

  Pairwise comparisons using t tests with pooled SD

data:  my_data$weight and my_data$group

     ctrl  trt1
trt1 0.194 -
trt2 0.132 0.013

P value adjustment method: BH

The result is a table of p-values for the pairwise comparisons. Here, the p-values have been adjusted by the Benjamini-Hochberg method.
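The Benjamini-Hochberg adjustment itself is short enough to write out. In the sketch below (an illustration; `bh_adjust()` is a hypothetical helper, not part of R), the unadjusted pairwise p-values are obtained with `p.adjust.method = "none"`, each p-value is multiplied by m / rank, and a running minimum starting from the largest p-value keeps the adjusted values monotone. It should reproduce both `p.adjust(p, "BH")` and the table above.

```r
my_data <- PlantGrowth

# Unadjusted pairwise p-values (pooled SD); the matrix has NA above the diagonal
raw <- pairwise.t.test(my_data$weight, my_data$group, p.adjust.method = "none")
p_raw <- raw$p.value[!is.na(raw$p.value)]

# Benjamini-Hochberg step-up adjustment written out explicitly
bh_adjust <- function(p) {
  m <- length(p)
  o <- order(p, decreasing = TRUE)                    # largest p-value first
  rank_desc <- m:1                                    # rank of each sorted p-value
  adjusted <- pmin(1, cummin(p[o] * m / rank_desc))   # running minimum keeps order
  adjusted[order(o)]                                  # back to the original order
}

p_bh <- bh_adjust(p_raw)
print(round(p_bh, 3))
```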

Check ANOVA assumptions: test validity?

The ANOVA test assumes that the data are normally distributed and that the variance is homogeneous across groups. We can check this with some diagnostic plots.

Check the homogeneity of variance assumption

The residuals versus fits plot can be used to check the homogeneity of variances.

In the plot below, there is no evident relationship between residuals and fitted values (the mean of each group), which is good. So, we can assume the homogeneity of variances.

# 1. Homogeneity of variances
plot(res.aov, 1)


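Points 17, 15 and 4 are flagged in the plot as potential outliers (by default, plot() labels the three observations with the largest residuals). Outliers can severely affect normality and homogeneity of variance, so it can be useful to remove them to meet the test assumptions.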
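To identify the labeled points without reading them off the plot, the sketch below lists the three observations with the largest absolute residuals, which is the rule `plot()` uses to label points in the residuals versus fitted plot (its `id.n` argument defaults to 3). The names `res` and `top3` are illustrative.

```r
my_data <- PlantGrowth
res.aov <- aov(weight ~ group, data = my_data)

res <- residuals(res.aov)
# Row numbers of the three largest absolute residuals
top3 <- order(abs(res), decreasing = TRUE)[1:3]
print(data.frame(row = top3, my_data[top3, ], residual = round(res[top3], 3)))
```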

It’s also possible to use Bartlett’s test or Levene’s test to check the hom*ogeneity of variances.

We recommend Levene’s test, which is less sensitive to departures from the normal distribution. The function leveneTest() [in the car package] will be used:

library(car)
leveneTest(weight ~ group, data = my_data)

Levene's Test for Homogeneity of Variance (center = median)
      Df F value Pr(>F)
group  2  1.1192 0.3412
      27

From the output above, we can see that the p-value (0.3412) is not less than the significance level of 0.05. This means that there is no evidence to suggest that the variance across groups is statistically significantly different. Therefore, we can assume the homogeneity of variances in the different treatment groups.
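What `leveneTest()` computes can be sketched without the car package. With its default `center = median` (the variant also known as the Brown-Forsythe test), Levene's test is a one-way ANOVA on the absolute deviations of each observation from its group median. The base-R sketch below rebuilds that statistic; under this assumption it should reproduce the F value (1.1192) and p-value (0.3412) shown above.

```r
my_data <- PlantGrowth

# Median of each group, repeated for every observation of that group
group_medians <- ave(my_data$weight, my_data$group, FUN = median)

# Absolute deviation of each observation from its group median
abs_dev <- abs(my_data$weight - group_medians)

# Levene's test (median-centered) = one-way ANOVA on these absolute deviations
levene_table <- anova(lm(abs_dev ~ my_data$group))
levene_f <- levene_table[["F value"]][1]
levene_p <- levene_table[["Pr(>F)"]][1]
print(c(F = levene_f, p = levene_p))
```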

Relaxing the homogeneity of variance assumption

The classical one-way ANOVA test requires an assumption of equal variances for all groups. In our example, the homogeneity of variance assumption turned out to be fine: the Levene test is not significant.

How do we save our ANOVA test in a situation where the homogeneity of variance assumption is violated?

An alternative procedure that does not require this assumption, the Welch one-way test, is implemented in the function oneway.test().

ANOVA test with no assumption of equal variances

oneway.test(weight ~ group, data = my_data)
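For readers curious about what `oneway.test()` does when `var.equal = FALSE` (its default), the sketch below recomputes Welch's F-statistic by hand: each group mean is weighted by n_i / s_i^2, and the denominator degrees of freedom are estimated from the data, so they are usually not an integer. This is a minimal illustration of Welch's formula, checked against `oneway.test()` rather than a replacement for it.

```r
my_data <- PlantGrowth
y <- my_data$weight
g <- my_data$group

k <- nlevels(g)
n_i <- tapply(y, g, length)
m_i <- tapply(y, g, mean)
v_i <- tapply(y, g, var)

w_i <- n_i / v_i                      # precision weights: low-variance groups count more
m_w <- sum(w_i * m_i) / sum(w_i)      # weighted grand mean
tmp <- sum((1 - w_i / sum(w_i))^2 / (n_i - 1)) / (k^2 - 1)

f_welch <- sum(w_i * (m_i - m_w)^2) / ((k - 1) * (1 + 2 * (k - 2) * tmp))
df1 <- k - 1
df2 <- 1 / (3 * tmp)                  # estimated denominator degrees of freedom
p_welch <- pf(f_welch, df1, df2, lower.tail = FALSE)
print(c(F = f_welch, df1 = df1, df2 = df2, p = p_welch))
```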

Pairwise t-tests with no assumption of equal variances

pairwise.t.test(my_data$weight, my_data$group, p.adjust.method = "BH", pool.sd = FALSE)

Check the normality assumption

Normality plot of residuals. In the plot below, the quantiles of the residuals are plotted against the quantiles of the normal distribution. A 45-degree reference line is also plotted.

The normal probability plot of residuals is used to check the assumption that the residuals are normally distributed. It should approximately follow a straight line.

# 2. Normality
plot(res.aov, 2)


As all the points fall approximately along this reference line, we can assume normality.

The conclusion above is supported by the Shapiro-Wilk test on the ANOVA residuals (W = 0.97, p = 0.44), which finds no indication that normality is violated.

# Extract the residuals
aov_residuals <- residuals(object = res.aov)
# Run Shapiro-Wilk test
shapiro.test(x = aov_residuals)

  Shapiro-Wilk normality test

data:  aov_residuals
W = 0.96607, p-value = 0.4379

Non-parametric alternative to one-way ANOVA test

Note that a non-parametric alternative to one-way ANOVA is the Kruskal-Wallis rank sum test, which can be used when ANOVA assumptions are not met.

kruskal.test(weight ~ group, data = my_data)

  Kruskal-Wallis rank sum test

data:  weight by group
Kruskal-Wallis chi-squared = 7.9882, df = 2, p-value = 0.01842
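The Kruskal-Wallis statistic can be sketched from its definition: rank all observations together, compute H = 12 / (N(N + 1)) * sum(R_i^2 / n_i) - 3(N + 1), where R_i is the rank sum of group i, and divide by a correction for tied values (PlantGrowth contains ties, e.g. 4.17 appears twice). H is then compared with a chi-squared distribution with k - 1 degrees of freedom. This minimal base-R version should agree with `kruskal.test()`.

```r
my_data <- PlantGrowth
y <- my_data$weight
g <- my_data$group
N <- length(y)

r <- rank(y)                     # mid-ranks: tied values share their average rank
r_sum <- tapply(r, g, sum)       # rank sum per group
n_i <- tapply(r, g, length)      # group sizes

h <- 12 / (N * (N + 1)) * sum(r_sum^2 / n_i) - 3 * (N + 1)

# Tie correction: divide by 1 - sum(t^3 - t) / (N^3 - N), t = size of each tie group
ties <- table(y)
h_corrected <- h / (1 - sum(ties^3 - ties) / (N^3 - N))

p_value <- pchisq(h_corrected, df = nlevels(g) - 1, lower.tail = FALSE)
print(c(H = h_corrected, p = p_value))
```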

Summary

Import your data from a .txt tab file: my_data <- read.delim(file.choose()). Here, we used my_data <- PlantGrowth.
Visualize your data: ggpubr::ggboxplot(my_data, x = "group", y = "weight", color = "group")
Compute one-way ANOVA test: summary(aov(weight ~ group, data = my_data))
Tukey multiple pairwise-comparisons: TukeyHSD(res.aov)

See also

Analysis of variance (ANOVA, parametric):
- One-Way ANOVA Test in R
- Two-Way ANOVA Test in R
- MANOVA Test in R: Multivariate Analysis of Variance
Kruskal-Wallis Test in R (non-parametric alternative to one-way ANOVA)

Read more

[Quick-R: ANOVA/MANOVA](http://www.statmethods.net/stats/anova.html)
[Quick-R: (M)ANOVA Assumptions](http://www.statmethods.net/stats/anovaAssumptions.html)
[R and Analysis of Variance](http://personality-project.org/r/r.guide/r.anova.html)

Infos

This analysis has been performed using R software (ver. 3.2.4).


FAQs

How to do a one-way ANOVA step by step?

One-way ANOVA procedure (in SPSS)

Click on Analyze\Compare Means\One-way ANOVA.
Move your dependent continuous variable into the Dependent List box.
Move your independent categorical variable into the box Factor.
Click the Options button and click on Descriptive, Homogeneity of variance test, Brown-Forsythe, Welch and Means Plot.

What is the primary purpose of the ANOVA test in R?

ANOVA tests whether any of the group means differ from the others by comparing the variability between the group means with the variability within the groups.

What is a one-way ANOVA in simple terms?

One-Way ANOVA ("analysis of variance") compares the means of two or more independent groups in order to determine whether there is statistical evidence that the associated population means are significantly different. One-Way ANOVA is a parametric test. This test is also known as: One-Factor ANOVA.

How to use ANOVA to compare two models in R?

To compare the fits of two nested models, you can use the anova() function with the two model objects (for example, fitted with lm()) as separate arguments. anova() returns an F-test of whether the more complex model captures the data significantly better than the simpler model.
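A minimal sketch on the PlantGrowth data: comparing an intercept-only model (one common mean for all plants) with a model that has one mean per group gives the same F-test as the one-way ANOVA, because the two models are nested. The model names are illustrative.

```r
my_data <- PlantGrowth

# Simpler model: a single common mean for all plants
fit_null <- lm(weight ~ 1, data = my_data)
# More complex model: one mean per treatment group
fit_group <- lm(weight ~ group, data = my_data)

# F-test comparing the two nested models
comparison <- anova(fit_null, fit_group)
print(comparison)
```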


What is the formula for the one-way ANOVA?

One-way ANOVA formula: the F-value is calculated from the Mean Sum of Squares Between Groups (MSB) and the Mean Sum of Squares Within Groups (MSW) as F = MSB/MSW.

What is the 5-step procedure in ANOVA?

The ANOVA can be run using a five-step approach:

Set up hypotheses and determine the level of significance: H₀: μ₁ = μ₂ = μ₃; H₁: means are not all equal; α = 0.05.
Select the appropriate test statistic. The test statistic is the F statistic for ANOVA, F = MSB/MSE.
Set up decision rule. ...
Compute the test statistic. ...
Conclusion.

How do you know when to use a one-way ANOVA test?

One-way ANOVA is typically used when you have a single independent variable, or factor, and your goal is to investigate whether variations, or different levels, of that factor have a measurable effect on a dependent variable.

What conditions are necessary in order to use a one-way ANOVA test?

Assumptions for One-Way ANOVA Test

The responses for each factor level have a normal population distribution.
These distributions have the same variance.
The data are independent.

When should you use ANOVA instead of a t-test?

The Student's t-test is used to compare the means between two groups, whereas ANOVA is used to compare the means among three or more groups. ANOVA first gives a single overall P value. A significant P value indicates that at least one group mean differs from the others; pairwise comparisons are then needed to find which pairs differ.

What is an example of a one-way and a two-way ANOVA?

One-way ANOVA: Testing the relationship between shoe brand (Nike, Adidas, Saucony, Hoka) and race finish times in a marathon.
Two-way ANOVA: Testing the relationship between shoe brand (Nike, Adidas, Saucony, Hoka), runner age group (junior, senior, master's), and race finishing times in a marathon.

How do you explain ANOVA in simple terms?

ANOVA, or Analysis of Variance, is a test used to determine differences between research results from three or more unrelated samples or groups.

How many observations are needed for a one-way ANOVA?

A one-way ANOVA compares three or more groups, defined by a categorical variable, to establish whether there is a difference between them. Within each group there should be three or more observations, and the means of the samples are compared.

How to tell if ANOVA is significant?

Essentially, if the “between” variance is much larger than the “within” variance, the factor is considered statistically significant. Recall, ANOVA seeks to determine a difference in means at each level of a factor. If the factor level impacts the mean, then that factor is statistically significant.

Which package has ANOVA in R?

anova() is a function in base R (package stats), and Anova() is a function in the car package. The former calculates type I tests, that is, each variable is added in sequential order; the latter calculates type II (the default) or type III tests.

Can you run ANOVA with unequal sample sizes in R?

Assumption Robustness with Unequal Samples

The main practical issue in one-way ANOVA is that unequal sample sizes affect the robustness of the equal variance assumption. ANOVA is considered robust to moderate departures from this assumption, but that's not true when the sample sizes are very different.

How do you calculate R² from ANOVA?

R² = 1 - SSE/SST, in the usual ANOVA notation. ...
R²_adj = 1 - MSE/MST, since this emphasizes its natural relationship to the coefficient of determination. ...
R-squared = SS(Between Groups)/SS(Total). The Greek symbol "Eta-squared" is sometimes used to denote this quantity. ...
R-squared = 1 - SS(Error)/SS(Total) ...
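For a one-way ANOVA, the SS(Between Groups)/SS(Total) version is easy to compute from the `aov()` table. The sketch below (using the PlantGrowth example from this tutorial; names are illustrative) computes it and checks that it equals the R-squared of the equivalent `lm()` fit.

```r
my_data <- PlantGrowth
res.aov <- aov(weight ~ group, data = my_data)

# Sum Sq column: c(SS between groups, SS residuals)
ss <- summary(res.aov)[[1]][["Sum Sq"]]

eta_squared <- ss[1] / sum(ss)   # SS(Between Groups) / SS(Total)
r_squared <- summary(lm(weight ~ group, data = my_data))$r.squared
print(c(eta_squared = eta_squared, r_squared = r_squared))
```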

