One-Way ANOVA Test in R - Easy Guides - Wiki (2024)

    • What is one-way ANOVA test?
    • Assumptions of ANOVA test
    • How one-way ANOVA test works?
    • Visualize your data and compute one-way ANOVA in R
      • Import your data into R
      • Check your data
      • Visualize your data
      • Compute one-way ANOVA test
      • Interpret the result of one-way ANOVA tests
      • Multiple pairwise-comparison between the means of groups
        • Tukey multiple pairwise-comparisons
        • Multiple comparisons using multcomp package
        • Pairwise t-test
      • Check ANOVA assumptions: test validity?
        • Check the homogeneity of variance assumption
        • Relaxing the homogeneity of variance assumption
        • Check the normality assumption
      • Non-parametric alternative to one-way ANOVA test
    • Summary
    • See also
    • Read more
    • Infos

    The one-way analysis of variance (ANOVA), also known as one-factor ANOVA, is an extension of the independent two-sample t-test for comparing means when there are more than two groups. In one-way ANOVA, the data are organized into several groups based on a single grouping variable (also called a factor variable). This tutorial describes the basic principle of the one-way ANOVA test and provides practical ANOVA test examples in R.

    ANOVA test hypotheses:

    • Null hypothesis: the means of the different groups are the same
    • Alternative hypothesis: At least one sample mean is not equal to the others.

    Note that, if you have only two groups, you can use t-test. In this case the F-test and the t-test are equivalent.
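
    The equivalence noted above can be checked numerically. The following is a minimal sketch, assuming we keep only the ctrl and trt1 groups of the built-in PlantGrowth data set and use the pooled-variance t-test (var.equal = TRUE); the object names are illustrative.

```r
# Keep two of the three groups (illustrative choice) and drop the unused level
two_groups <- droplevels(subset(PlantGrowth, group %in% c("ctrl", "trt1")))

# Student's t-test with pooled variance
# (the default Welch t-test is not equivalent to the classical F-test)
t_res <- t.test(weight ~ group, data = two_groups, var.equal = TRUE)

# One-way ANOVA on the same two groups
aov_table <- summary(aov(weight ~ group, data = two_groups))[[1]]

t_squared <- unname(t_res$statistic)^2
f_value   <- aov_table[["F value"]][1]
p_anova   <- aov_table[["Pr(>F)"]][1]

# The squared t statistic should match F, and the two p-values should agree
all.equal(t_squared, f_value)
all.equal(t_res$p.value, p_anova)
```

    Both comparisons are expected to return TRUE, because with two groups the F statistic is the square of the pooled-variance t statistic.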


    This section describes the requirements for the ANOVA test. The test can be applied only when:


    • The observations are obtained independently and randomly from the population defined by the factor levels
    • The data of each factor level are normally distributed.
    • These normal populations have a common variance. (Levene’s test can be used to check this.)

    Assume that we have 3 groups (A, B, C) to compare:

    1. Compute the common variance, which is called variance within samples (\(S^2_{within}\)) or residual variance.
    2. Compute the variance between sample means as follows:
      • Compute the mean of each group
      • Compute the variance between sample means (\(S^2_{between}\))
    3. Produce F-statistic as the ratio of \(S^2_{between}/S^2_{within}\).

    Note that a low ratio (ratio < 1) indicates that there is no significant difference between the means of the samples being compared. A higher ratio suggests that the variation among the group means is larger than expected by chance; whether it is significant is decided by the p-value obtained from the F distribution.
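
    The steps above can be reproduced by hand in R. This is a minimal sketch, assuming the built-in PlantGrowth data set (three groups: ctrl, trt1, trt2) stands in for groups A, B and C; the variable names are illustrative, and the between-group variance is computed as the mean square between groups, which is the quantity used in the F ratio.

```r
plant_data <- PlantGrowth

k <- nlevels(plant_data$group)   # number of groups
n <- nrow(plant_data)            # total number of observations

# Mean and size of each group, and the overall (grand) mean
group_means <- tapply(plant_data$weight, plant_data$group, mean)
group_sizes <- tapply(plant_data$weight, plant_data$group, length)
grand_mean  <- mean(plant_data$weight)

# Within-group (residual) variance: squared deviations from each group's own mean
ss_within <- sum((plant_data$weight - ave(plant_data$weight, plant_data$group))^2)
ms_within <- ss_within / (n - k)

# Between-group variance: squared deviations of group means from the grand mean
ss_between <- sum(group_sizes * (group_means - grand_mean)^2)
ms_between <- ss_between / (k - 1)

# F statistic and its p-value from the F distribution
f_stat  <- ms_between / ms_within
p_value <- pf(f_stat, df1 = k - 1, df2 = n - k, lower.tail = FALSE)

round(c(F = f_stat, p = p_value), 4)
```

    The resulting F statistic and p-value should agree with the values reported by aov() for the same data.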

    Import your data into R

    1. Prepare your data as specified here: Best practices for preparing your data set for R

    2. Save your data in an external tab-delimited .txt file or a .csv file

    3. Import your data into R as follows:

    # If .txt tab file, use this
    my_data <- read.delim(file.choose())
    # Or, if .csv file, use this
    my_data <- read.csv(file.choose())

    Here, we’ll use the built-in R data set named PlantGrowth. It contains the weight of plants obtained under a control and two different treatment conditions.

    my_data <- PlantGrowth

    Check your data

    To get an idea of what the data look like, we use the function sample_n() [in dplyr package]. The sample_n() function randomly picks a few of the observations in the data frame to print out:

    # Show a random sample
    set.seed(1234)
    dplyr::sample_n(my_data, 10)

       weight group
    19   4.32  trt1
    18   4.89  trt1
    29   5.80  trt2
    24   5.50  trt2
    17   6.03  trt1
    1    4.17  ctrl
    6    4.61  ctrl
    16   3.83  trt1
    12   4.17  trt1
    15   5.87  trt1

    In R terminology, the column “group” is called a factor and the different categories (“ctrl”, “trt1”, “trt2”) are named factor levels. The levels are ordered alphabetically.

    # Show the levels
    levels(my_data$group)
    [1] "ctrl" "trt1" "trt2"

    If the levels are not automatically in the correct order, re-order them as follows:

    my_data$group <- ordered(my_data$group, levels = c("ctrl", "trt1", "trt2"))

    It’s possible to compute summary statistics (mean and sd) by groups using the dplyr package.

    • Compute summary statistics by groups - count, mean, sd:
    library(dplyr)
    group_by(my_data, group) %>%
      summarise(
        count = n(),
        mean = mean(weight, na.rm = TRUE),
        sd = sd(weight, na.rm = TRUE)
      )

    Source: local data frame [3 x 4]

       group count  mean        sd
      (fctr) (int) (dbl)     (dbl)
    1   ctrl    10 5.032 0.5830914
    2   trt1    10 4.661 0.7936757
    3   trt2    10 5.526 0.4425733

    Visualize your data

    • To use R base graphs read this: R base graphs. Here, we’ll use the ggpubr R package for an easy ggplot2-based data visualization.

    • Install the latest version of ggpubr from GitHub as follows (recommended):

    # Install
    if(!require(devtools)) install.packages("devtools")
    devtools::install_github("kassambara/ggpubr")
    • Or, install from CRAN as follows:
    install.packages("ggpubr")
    • Visualize your data with ggpubr:
    # Box plots
    # ++++++++++++++++++++
    # Plot weight by group and color by group
    library("ggpubr")
    ggboxplot(my_data, x = "group", y = "weight",
              color = "group", palette = c("#00AFBB", "#E7B800", "#FC4E07"),
              order = c("ctrl", "trt1", "trt2"),
              ylab = "Weight", xlab = "Treatment")

    # Mean plots
    # ++++++++++++++++++++
    # Plot weight by group
    # Add error bars: mean_se
    # (other values include: mean_sd, mean_ci, median_iqr, ....)
    library("ggpubr")
    ggline(my_data, x = "group", y = "weight",
           add = c("mean_se", "jitter"),
           order = c("ctrl", "trt1", "trt2"),
           ylab = "Weight", xlab = "Treatment")

    If you still want to use R base graphs, type the following scripts:

    # Box plot
    boxplot(weight ~ group, data = my_data,
            xlab = "Treatment", ylab = "Weight",
            frame = FALSE, col = c("#00AFBB", "#E7B800", "#FC4E07"))
    # plotmeans
    library("gplots")
    plotmeans(weight ~ group, data = my_data, frame = FALSE,
              xlab = "Treatment", ylab = "Weight",
              main = "Mean Plot with 95% CI")

    Compute one-way ANOVA test

    We want to know if there is any significant difference between the average weights of plants in the 3 experimental conditions.

    The R function aov() can be used to answer this question. The function summary.aov() is used to summarize the analysis of variance model.

    # Compute the analysis of variance
    res.aov <- aov(weight ~ group, data = my_data)
    # Summary of the analysis
    summary(res.aov)

                Df Sum Sq Mean Sq F value Pr(>F)
    group        2  3.766  1.8832   4.846 0.0159 *
    Residuals   27 10.492  0.3886
    ---
    Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

    The output includes the columns F value and Pr(>F) corresponding to the p-value of the test.

    Interpret the result of one-way ANOVA tests

    As the p-value (0.0159) is less than the significance level 0.05, we can conclude that there are significant differences between the group means. The “*” next to the p-value in the model summary marks significance at the 0.05 level.
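
    The same decision can be made programmatically rather than by reading the printed table. This is a small sketch, assuming the model is refitted on PlantGrowth directly and that alpha = 0.05 is the chosen significance level; aov_table, f_value, p_value and alpha are illustrative names.

```r
res.aov <- aov(weight ~ group, data = PlantGrowth)

# summary() on an aov fit returns a list; its first element is the ANOVA table
aov_table <- summary(res.aov)[[1]]

f_value <- aov_table[["F value"]][1]
p_value <- aov_table[["Pr(>F)"]][1]
alpha   <- 0.05

if (p_value < alpha) {
  message(sprintf("F = %.3f, p = %.4f: reject the null hypothesis of equal means",
                  f_value, p_value))
} else {
  message(sprintf("F = %.3f, p = %.4f: no evidence against equal means",
                  f_value, p_value))
}
```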

    Multiple pairwise-comparison between the means of groups

    In a one-way ANOVA test, a significant p-value indicates that some of the group means are different, but we don’t know which pairs of groups are different.

    It’s possible to perform multiple pairwise comparisons to determine whether the mean difference between specific pairs of groups is statistically significant.

    Tukey multiple pairwise-comparisons

    As the ANOVA test is significant, we can compute Tukey HSD (Tukey Honest Significant Differences, R function: TukeyHSD()) for performing multiple pairwise comparisons between the means of groups.

    The function TukeyHSD() takes the fitted ANOVA as an argument.

    TukeyHSD(res.aov)

      Tukey multiple comparisons of means
        95% family-wise confidence level

    Fit: aov(formula = weight ~ group, data = my_data)

    $group
                diff        lwr       upr     p adj
    trt1-ctrl -0.371 -1.0622161 0.3202161 0.3908711
    trt2-ctrl  0.494 -0.1972161 1.1852161 0.1979960
    trt2-trt1  0.865  0.1737839 1.5562161 0.0120064
    • diff: difference between means of the two groups
    • lwr, upr: the lower and the upper end point of the confidence interval at 95% (default)
    • p adj: p-value after adjustment for the multiple comparisons.

    The output shows that only the difference between trt2 and trt1 is significant, with an adjusted p-value of 0.012.
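
    When there are many groups, it can help to filter the Tukey table instead of reading it by eye. The following is a minimal sketch, assuming the model is refitted on PlantGrowth and a 0.05 threshold; tukey_table and significant_pairs are illustrative names.

```r
res.aov <- aov(weight ~ group, data = PlantGrowth)
tukey   <- TukeyHSD(res.aov)

# The result is a list with one matrix per factor; here the factor is "group"
tukey_table <- tukey$group

# Keep only the comparisons whose adjusted p-value is below 0.05
significant <- tukey_table[tukey_table[, "p adj"] < 0.05, , drop = FALSE]
significant_pairs <- rownames(significant)

significant_pairs
```

    Calling plot(tukey) draws the confidence intervals of all pairwise differences; intervals that do not cross zero correspond to the significant pairs.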

    Multiple comparisons using multcomp package

    It’s possible to use the function glht() [in multcomp package] to perform multiple comparison procedures for an ANOVA. glht stands for general linear hypothesis tests. The simplified format is as follows:

    glht(model, linfct)
    • model: a fitted model, for example an object returned by aov().
    • linfct: a specification of the linear hypotheses to be tested. Multiple comparisons in ANOVA models are specified by objects returned from the function mcp().

    Use glht() to perform multiple pairwise-comparisons for a one-way ANOVA:

    library(multcomp)
    summary(glht(res.aov, linfct = mcp(group = "Tukey")))

      Simultaneous Tests for General Linear Hypotheses

    Multiple Comparisons of Means: Tukey Contrasts

    Fit: aov(formula = weight ~ group, data = my_data)

    Linear Hypotheses:
                     Estimate Std. Error t value Pr(>|t|)
    trt1 - ctrl == 0  -0.3710     0.2788  -1.331    0.391
    trt2 - ctrl == 0   0.4940     0.2788   1.772    0.198
    trt2 - trt1 == 0   0.8650     0.2788   3.103    0.012 *
    ---
    Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
    (Adjusted p values reported -- single-step method)

    Pairwise t-test

    The function pairwise.t.test() can also be used to calculate pairwise comparisons between group levels with corrections for multiple testing.

    pairwise.t.test(my_data$weight, my_data$group, p.adjust.method = "BH")

      Pairwise comparisons using t tests with pooled SD

    data:  my_data$weight and my_data$group

         ctrl  trt1
    trt1 0.194 -
    trt2 0.132 0.013

    P value adjustment method: BH

    The result is a table of p-values for the pairwise comparisons. Here, the p-values have been adjusted by the Benjamini-Hochberg method.
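
    To see what the adjustment does, the unadjusted p-values can be corrected by hand with p.adjust() and compared with the table above. This is a sketch under the assumption that PlantGrowth is used directly; raw_p, manual_bh and builtin_bh are illustrative names.

```r
# Unadjusted pairwise p-values (pooled SD), returned as a lower-triangular matrix
raw <- pairwise.t.test(PlantGrowth$weight, PlantGrowth$group,
                       p.adjust.method = "none")$p.value
raw_p <- raw[!is.na(raw)]

# Apply the Benjamini-Hochberg correction manually
manual_bh <- p.adjust(raw_p, method = "BH")

# Same correction as done internally by pairwise.t.test()
builtin <- pairwise.t.test(PlantGrowth$weight, PlantGrowth$group,
                           p.adjust.method = "BH")$p.value
builtin_bh <- builtin[!is.na(builtin)]

all.equal(manual_bh, builtin_bh)
```

    Other correction methods accepted by p.adjust(), such as "bonferroni" or "holm", can be passed through p.adjust.method in the same way.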

    Check ANOVA assumptions: test validity?

    The ANOVA test assumes that the data are normally distributed and that the variance across groups is homogeneous. We can check that with some diagnostic plots.

    Check the homogeneity of variance assumption

    The residuals versus fits plot can be used to check the homogeneity of variances.

    In the plot below, there is no evident relationship between the residuals and the fitted values (the mean of each group), which is good. So, we can assume the homogeneity of variances.

    # 1. Homogeneity of variances
    plot(res.aov, 1)

    Points 17, 15 and 4 are flagged as potential outliers, which can severely affect normality and homogeneity of variance. It can be useful to remove outliers to meet the test assumptions.

    It’s also possible to use Bartlett’s test or Levene’s test to check the homogeneity of variances.

    We recommend Levene’s test, which is less sensitive to departures from the normal distribution. The function leveneTest() [in car package] will be used:

    library(car)
    leveneTest(weight ~ group, data = my_data)

    Levene's Test for Homogeneity of Variance (center = median)
          Df F value Pr(>F)
    group  2  1.1192 0.3412
          27

    From the output above we can see that the p-value is not less than the significance level of 0.05. This means that there is no evidence to suggest that the variance differs significantly across groups. Therefore, we can assume the homogeneity of variances in the different treatment groups.
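
    Bartlett’s test, mentioned above as an alternative, is available in base R through bartlett.test(). A minimal sketch, assuming PlantGrowth is used directly; keep in mind that this test is more sensitive to non-normal data than Levene’s test.

```r
# Bartlett's test of equal variances across the three groups
bartlett_res <- bartlett.test(weight ~ group, data = PlantGrowth)

# Degrees of freedom (number of groups minus one) and p-value
bartlett_df <- unname(bartlett_res$parameter)
bartlett_p  <- bartlett_res$p.value

bartlett_res
```

    A p-value above the chosen significance level is read the same way as for Levene’s test: no evidence against equal variances.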

    Relaxing the homogeneity of variance assumption

    The classical one-way ANOVA test requires an assumption of equal variances for all groups. In our example, the homogeneity of variance assumption turned out to be fine: the Levene test is not significant.

    How do we save our ANOVA test in a situation where the homogeneity of variance assumption is violated?

    An alternative procedure (the Welch one-way test), which does not require that assumption, has been implemented in the function oneway.test().

    • ANOVA test with no assumption of equal variances
    oneway.test(weight ~ group, data = my_data)
    • Pairwise t-tests with no assumption of equal variances
    pairwise.t.test(my_data$weight, my_data$group, p.adjust.method = "BH", pool.sd = FALSE)
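
    The relationship between the Welch test and the classical test can be illustrated with the var.equal argument of oneway.test(). This is a sketch, assuming PlantGrowth is used directly; welch, classic and the other names are illustrative.

```r
# Welch one-way test: no assumption of equal variances (the default)
welch <- oneway.test(weight ~ group, data = PlantGrowth)

# Same function with var.equal = TRUE gives the classical one-way ANOVA F-test
classic <- oneway.test(weight ~ group, data = PlantGrowth, var.equal = TRUE)

classic_f <- unname(classic$statistic)
aov_f     <- summary(aov(weight ~ group, data = PlantGrowth))[[1]][["F value"]][1]

# The classical variant should reproduce the F value from aov()
all.equal(classic_f, aov_f)

# Welch F statistic, its (fractional) degrees of freedom, and p-value
welch$statistic
welch$parameter
welch$p.value
```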

    Check the normality assumption

    Normality plot of residuals. In the plot below, the quantiles of the residuals are plotted against the quantiles of the normal distribution. A 45-degree reference line is also plotted.

    The normal probability plot of residuals is used to check the assumption that the residuals are normally distributed. It should approximately follow a straight line.

    # 2. Normality
    plot(res.aov, 2)

    As all the points fall approximately along this reference line, we can assume normality.

    The conclusion above is supported by the Shapiro-Wilk test on the ANOVA residuals (W = 0.97, p = 0.44), which finds no indication that normality is violated.

    # Extract the residuals
    aov_residuals <- residuals(object = res.aov)
    # Run Shapiro-Wilk test
    shapiro.test(x = aov_residuals)

      Shapiro-Wilk normality test

    data:  aov_residuals
    W = 0.96607, p-value = 0.4379

    Non-parametric alternative to one-way ANOVA test

    Note that a non-parametric alternative to one-way ANOVA is the Kruskal-Wallis rank sum test, which can be used when the ANOVA assumptions are not met.

    kruskal.test(weight ~ group, data = my_data)

      Kruskal-Wallis rank sum test

    data:  weight by group
    Kruskal-Wallis chi-squared = 7.9882, df = 2, p-value = 0.01842
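
    As with ANOVA, a significant Kruskal-Wallis test does not say which groups differ. One possible follow-up, shown here as a sketch and not as the original tutorial's method, is pairwise Wilcoxon rank sum tests with a multiple-testing correction. The sketch assumes PlantGrowth is used directly and passes exact = FALSE because the data contain tied values.

```r
kw <- kruskal.test(weight ~ group, data = PlantGrowth)
kw_stat <- unname(kw$statistic)

# Pairwise Wilcoxon rank sum tests with Benjamini-Hochberg correction;
# exact = FALSE requests the normal approximation, since ties rule out exact p-values
pw <- pairwise.wilcox.test(PlantGrowth$weight, PlantGrowth$group,
                           p.adjust.method = "BH", exact = FALSE)

pw$p.value
```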

    1. Import your data from a .txt tab file: my_data <- read.delim(file.choose()). Here, we used my_data <- PlantGrowth.
    2. Visualize your data: ggpubr::ggboxplot(my_data, x = "group", y = "weight", color = "group")
    3. Compute one-way ANOVA test: res.aov <- aov(weight ~ group, data = my_data), then summary(res.aov)
    4. Tukey multiple pairwise-comparisons: TukeyHSD(res.aov)
    • Analysis of variance (ANOVA, parametric):
      • One-Way ANOVA Test in R
      • Two-Way ANOVA Test in R
      • MANOVA Test in R: Multivariate Analysis of Variance
    • Kruskal-Wallis Test in R (non parametric alternative to one-way ANOVA)

    This analysis has been performed using R software (ver. 3.2.4).


    FAQs

    How to do a one-way ANOVA step by step?

    One-way ANOVA procedure (menu-driven, e.g. in SPSS)
    1. Click on Analyze\Compare Means\One-way ANOVA.
    2. Move your dependent continuous variable into the Dependent List box.
    3. Move your independent categorical variable into the box Factor.
    4. Click the Options button and click on Descriptive, Homogeneity of variance test, Brown-Forsythe, Welch and Means Plot.

    What is the primary purpose of the ANOVA test in R?

    ANOVA tests whether any of the group means are different from the overall mean of the data by checking the variance of each individual group against the overall variance of the data.

    What is a one-way ANOVA in simple terms?

    One-Way ANOVA ("analysis of variance") compares the means of two or more independent groups in order to determine whether there is statistical evidence that the associated population means are significantly different. One-Way ANOVA is a parametric test. This test is also known as: One-Factor ANOVA.

    How to use ANOVA to compare two models in R?

    To compare the fits of two models, you can use the anova() function with the regression objects as two separate arguments. The anova() function will take the model objects as arguments, and return an ANOVA testing whether the more complex model is significantly better at capturing the data than the simpler model.
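
    As an illustration of the answer above, the one-way ANOVA can itself be written as a comparison of two nested models. This is a minimal sketch, assuming the built-in PlantGrowth data; null_model and group_model are illustrative names.

```r
# Simpler model: a single overall mean for all plants
null_model <- lm(weight ~ 1, data = PlantGrowth)

# More complex model: a separate mean for each group
group_model <- lm(weight ~ group, data = PlantGrowth)

# F-test of whether the group model fits significantly better
comparison <- anova(null_model, group_model)

f_value <- comparison$F[2]
p_value <- comparison[["Pr(>F)"]][2]

comparison
```

    For these two models, the F statistic should match the one reported by summary(aov(weight ~ group, data = PlantGrowth)).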

    What is the formula for the one-way ANOVA?

    One Way ANOVA Formula: The formula includes the calculation of the F-value through Mean Sum of Squares Between Groups (MSB) and Mean Sum of Squares Within Groups (MSW). F-value is calculated as F = MSB/MSW.

    What are the 5 steps procedure in ANOVA?

    We will run the ANOVA using the five-step approach.
    • Set up hypotheses and determine level of significance. H0: μ1 = μ2 = μ3 H1: Means are not all equal α=0.05.
    • Select the appropriate test statistic. The test statistic is the F statistic for ANOVA, F=MSB/MSE.
    • Set up decision rule. ...
    • Compute the test statistic. ...
    • Conclusion.

    How do you know when to use a one-way ANOVA test?

    One-way ANOVA is typically used when you have a single independent variable, or factor, and your goal is to investigate if variations, or different levels of that factor have a measurable effect on a dependent variable.

    What conditions are necessary in order to use a one-way ANOVA test?

    Assumptions for One-Way ANOVA Test
    • The responses for each factor level have a normal population distribution.
    • These distributions have the same variance.
    • The data are independent.

    When should you use ANOVA instead of a T test?

    The Student's t test is used to compare the means between two groups, whereas ANOVA is used to compare the means among three or more groups. ANOVA first yields a single overall P value. A significant P value indicates that the mean difference is statistically significant for at least one pair of groups.

    What is an example of a one-way and two way Anova?

    One-way ANOVA: Testing the relationship between shoe brand (Nike, Adidas, Saucony, Hoka) and race finish times in a marathon. Two-way ANOVA: Testing the relationship between shoe brand (Nike, Adidas, Saucony, Hoka), runner age group (junior, senior, master's), and race finishing times in a marathon.

    How do you explain ANOVA in simple terms?

    ANOVA, or Analysis of Variance, is a test used to determine differences between research results from three or more unrelated samples or groups.

    How many observations are needed for a one-way ANOVA?

    A one-way ANOVA compares three or more categorical groups to establish whether there is a difference between them. Within each group there should be three or more observations, and the means of the samples are compared.

    How to tell if ANOVA is significant?

    Essentially, if the “between” variance is much larger than the “within” variance, the factor is considered statistically significant. Recall, ANOVA seeks to determine a difference in means at each level of a factor. If the factor level impacts the mean, then that factor is statistically significant.

    Which package has ANOVA in R?

    anova is a function in base R. Anova is a function in the car package. The former calculates type I tests, that is, each variable is added in sequential order.

    Can you run ANOVA with unequal sample sizes in R?

    Assumption Robustness with Unequal Samples

    The main practical issue in one-way ANOVA is that unequal sample sizes affect the robustness of the equal variance assumption. ANOVA is considered robust to moderate departures from this assumption. But that's not true when the sample sizes are very different.

    How do you calculate R from ANOVA?

    1. R2 = 1 - SSE / SST. in the usual ANOVA notation. ...
    2. R2adj = 1 - MSE / MST. since this emphasizes its natural relationship to the coefficient of determination. ...
    3. R-squared = SS(Between Groups)/SS(Total) The Greek symbol "Eta-squared" is sometimes used to denote this quantity. ...
    4. R-squared = 1 - SS(Error)/SS(Total) ...
    5. Eta-squared =
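
    The R-squared formulas listed above can be checked on the one-way ANOVA table. This is a sketch, assuming the built-in PlantGrowth data; the variable names are illustrative.

```r
res.aov   <- aov(weight ~ group, data = PlantGrowth)
aov_table <- summary(res.aov)[[1]]

# Sums of squares: first row is between groups, second row is residual (error)
ss_between <- aov_table[["Sum Sq"]][1]
ss_error   <- aov_table[["Sum Sq"]][2]
ss_total   <- ss_between + ss_error

# R-squared (eta-squared) written two equivalent ways
r2_between <- ss_between / ss_total
r2_error   <- 1 - ss_error / ss_total

# Cross-check against the R-squared of the equivalent linear model
r2_lm <- summary(lm(weight ~ group, data = PlantGrowth))$r.squared

c(r2_between = r2_between, r2_error = r2_error, r2_lm = r2_lm)
```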
    Article information

    Author: Kelle Weber
