Analysis of Variance

Introduction

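Analysis of variance (ANOVA) tests whether the mean of an outcome differs among groups. It compares the variation between the group means with the variation within the groups; a large ratio suggests that at least one group mean differs from the others.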
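For One-Way ANOVA, this comparison is usually summarized by the textbook F statistic below, written for k groups, n_i observations y_ij in group i, N observations in total, group means ȳ_i and grand mean ȳ. This is the standard definition; how the tool computes it internally is not documented here.

```latex
SS_B = \sum_{i=1}^{k} n_i \left(\bar{y}_i - \bar{y}\right)^2, \qquad
SS_W = \sum_{i=1}^{k} \sum_{j=1}^{n_i} \left(y_{ij} - \bar{y}_i\right)^2

F = \frac{SS_B / (k - 1)}{SS_W / (N - k)}
```

Under the null hypothesis that all group means are equal (with independent, normally distributed errors of equal variance), F follows an F distribution with k − 1 and N − k degrees of freedom, and the p-value is the probability of observing a value at least as large.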

Parameters

  • N-Way ANOVA: Whether to perform One-Way ANOVA (one treatment) or Two-Way ANOVA (two treatments).
  • Outcome: The numeric dependent variable whose group means are compared.
  • Treatment(s): The categorical variable(s) defining the groups of the outcome.
  • Filters: Conditions on columns of the original dataset. If any are set, only the matching subset of rows is used in the analysis.
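As a rough illustration of how these parameters fit together for One-Way ANOVA, the sketch below applies optional filters and then splits the outcome column by treatment level with pandas. The helper `prepare_groups` and the representation of Filters as a dict of column predicates are hypothetical, not the tool's API; the data are the rows from the case study below.

```python
import pandas as pd

# Example rows taken from the case study in this page.
df = pd.DataFrame({
    "ID": [842302, 842517, 84300903, 8510426, 8510653, 8510824],
    "Diagnosis": ["M", "M", "M", "B", "B", "B"],
    "radius": [17.99, 20.57, 19.69, 13.54, 13.08, 9.504],
})


def prepare_groups(data, outcome, treatment, filters=None):
    """Hypothetical helper: filter rows, then split the outcome by treatment level.

    `filters` is an assumed stand-in for the Filters parameter: a dict mapping
    a column name to a predicate that returns a boolean mask for that column.
    """
    if filters:
        mask = pd.Series(True, index=data.index)
        for column, predicate in filters.items():
            mask &= predicate(data[column])  # keep rows meeting every condition
        data = data[mask]
    # One NumPy array of outcome values per treatment level.
    return {level: group[outcome].to_numpy() for level, group in data.groupby(treatment)}


# Outcome = radius, Treatment = Diagnosis, Filter = radius > 10.
groups = prepare_groups(df, outcome="radius", treatment="Diagnosis",
                        filters={"radius": lambda s: s > 10})
```

Here the filter drops the benign row with radius 9.504, so the "B" group keeps two values and the "M" group keeps three.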

Case Study

Imagine we are studying tumors and would like to know whether a tumor is benign or malignant. An example of the dataset could be:

ID        Diagnosis  radius  texture  perimeter
842302    M          17.99   10.38    122.8
842517    M          20.57   17.77    132.9
84300903  M          19.69   21.25    130.0
8510426   B          13.54   14.36    87.46
8510653   B          13.08   15.71    85.63
8510824   B          9.504   12.44    60.34

We want to test whether the mean radius differs between the diagnosis groups (M = malignant, B = benign).

We set up the parameters as follows:

_images/setup.png
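Outside the tool, the same One-Way ANOVA of radius by diagnosis can be reproduced on the six example rows with SciPy's `scipy.stats.f_oneway`. This is only a cross-check sketch; the tool's own computation and output format may differ.

```python
from scipy import stats

# Radius values from the six example rows, split by Diagnosis.
radius_malignant = [17.99, 20.57, 19.69]  # Diagnosis == "M"
radius_benign = [13.54, 13.08, 9.504]     # Diagnosis == "B"

# One-way ANOVA. H0: both diagnosis groups share the same mean radius.
result = stats.f_oneway(radius_malignant, radius_benign)
print(f"F = {result.statistic:.2f}, p = {result.pvalue:.4f}")
```

With only six rows this is purely illustrative; a real analysis would use the full dataset.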

Review Result

The result view contains a Box Plot tab, an ANOVA Table tab, a Tukey’s Test tab, a Diagnostics tab and a Table tab.

The Box Plot tab shows the distribution of the radius for each diagnosis group.

_images/boxplot.png
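A box plot summarizes each group by its quartiles. The sketch below computes those quantities with NumPy for the example data, assuming the common convention of whiskers reaching the most extreme values within 1.5 × IQR of the quartiles; the tool may draw whiskers differently.

```python
import numpy as np

groups = {
    "M": np.array([17.99, 20.57, 19.69]),
    "B": np.array([13.54, 13.08, 9.504]),
}


def box_summary(values):
    """Quartiles, whisker ends and outliers under the assumed 1.5 * IQR rule."""
    q1, median, q3 = np.percentile(values, [25, 50, 75])
    iqr = q3 - q1
    low_fence, high_fence = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    inside = values[(values >= low_fence) & (values <= high_fence)]
    return {
        "q1": q1,
        "median": median,
        "q3": q3,
        "whisker_low": inside.min(),
        "whisker_high": inside.max(),
        "outliers": values[(values < low_fence) | (values > high_fence)],
    }


summaries = {label: box_summary(v) for label, v in groups.items()}
for label, s in summaries.items():
    print(label, {k: np.round(v, 3) for k, v in s.items()})
```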

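The ANOVA Table tab shows the ANOVA results. A small p-value for the treatment (commonly below 0.05) indicates that at least one group mean differs from the others.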

_images/anova-table.png
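To make the entries of a One-Way ANOVA table concrete, the sketch below computes the usual columns (degrees of freedom, sums of squares, mean squares, F and p) by hand with NumPy and SciPy. The column names are assumptions and may not match the tool's layout.

```python
import numpy as np
from scipy import stats

# Radius by diagnosis from the example rows: [M, B].
groups = [np.array([17.99, 20.57, 19.69]), np.array([13.54, 13.08, 9.504])]


def one_way_anova_table(groups):
    """Compute a standard one-way ANOVA table from a list of 1-D arrays."""
    all_values = np.concatenate(groups)
    grand_mean = all_values.mean()
    k, n = len(groups), all_values.size
    # Between-group and within-group sums of squares.
    ss_between = sum(g.size * (g.mean() - grand_mean) ** 2 for g in groups)
    ss_within = sum(((g - g.mean()) ** 2).sum() for g in groups)
    df_between, df_within = k - 1, n - k
    ms_between, ms_within = ss_between / df_between, ss_within / df_within
    f_stat = ms_between / ms_within
    p_value = stats.f.sf(f_stat, df_between, df_within)  # upper-tail probability
    return {
        "treatment": {"df": df_between, "sum_sq": ss_between, "mean_sq": ms_between,
                      "F": f_stat, "p": p_value},
        "residual": {"df": df_within, "sum_sq": ss_within, "mean_sq": ms_within},
    }


table = one_way_anova_table(groups)
for row, values in table.items():
    print(row, values)
```

The F and p values agree with `scipy.stats.f_oneway` on the same groups, which is a quick way to check a hand-built table.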

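The Tukey’s Test tab shows the results of Tukey’s honestly significant difference (HSD) test, which compares every pair of groups while controlling the family-wise error rate. ANOVA only indicates that some difference exists; Tukey’s test indicates which groups differ.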

_images/tukey-test.png
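A comparable pairwise comparison can be sketched with `scipy.stats.tukey_hsd` (available in SciPy 1.8 and later). This is an outside cross-check, not the tool's implementation.

```python
import numpy as np
from scipy import stats  # stats.tukey_hsd requires SciPy >= 1.8

malignant = np.array([17.99, 20.57, 19.69])
benign = np.array([13.54, 13.08, 9.504])
labels = ["M", "B"]

res = stats.tukey_hsd(malignant, benign)
ci = res.confidence_interval(confidence_level=0.95)

# Report each unordered pair once: mean difference, p-value and 95% CI.
for i in range(len(labels)):
    for j in range(i + 1, len(labels)):
        print(f"{labels[i]} - {labels[j]}: diff = {res.statistic[i, j]:.3f}, "
              f"p = {res.pvalue[i, j]:.4f}, "
              f"95% CI = [{ci.low[i, j]:.3f}, {ci.high[i, j]:.3f}]")
```

With only two groups there is a single pair, so Tukey’s test reduces to a pooled two-sample comparison and gives the same p-value as the ANOVA; it becomes informative with three or more groups.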

The Diagnostics tab contains several sub-tabs for checking the assumptions of ANOVA:

  • The Q-Q Plot tab shows a Q-Q (quantile-quantile) plot of the ANOVA residuals. The X-axis shows the theoretical quantiles of a normal distribution and the Y-axis shows the quantiles of the residuals. Points lying close to a straight line suggest that the residuals are approximately normally distributed.
_images/qq-plot.png
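  • The Residual Plot tab shows the residuals of the ANOVA, that is, the differences between the observed values and the fitted values (for One-Way ANOVA, the group means). If the residuals are randomly scattered around zero with no visible pattern, the linear model underlying ANOVA is appropriate for the data; a pattern, such as a spread that grows with the fitted values, suggests that an assumption is violated.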
_images/residual-plot.png
  • The Shapiro Wilk tab shows the results of the Shapiro-Wilk test, which tests the null hypothesis that the residuals are normally distributed. A small p-value (for example, below 0.05) is evidence that they are not.
_images/shapiro-wilk.png
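  • The Bartlett and Levene tab shows the results of Bartlett’s and Levene’s tests, which test whether the variances of the groups are equal (homogeneity of variance, another ANOVA assumption). Bartlett’s test assumes normality, whereas Levene’s test is less sensitive to departures from it.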
_images/barlett.png
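The diagnostics above can be approximated outside the tool with SciPy. The sketch below, assuming One-Way ANOVA on the example rows, computes the residuals, the Q-Q plot coordinates, and the Shapiro-Wilk, Bartlett and Levene tests; the tool may use different options (for example, a different centering for Levene’s test).

```python
import numpy as np
from scipy import stats

groups = {
    "M": np.array([17.99, 20.57, 19.69]),
    "B": np.array([13.54, 13.08, 9.504]),
}

# In One-Way ANOVA each fitted value is its group mean,
# so a residual is the observation minus that mean.
observed = np.concatenate(list(groups.values()))
fitted = np.concatenate([np.full(v.size, v.mean()) for v in groups.values()])
residuals = observed - fitted  # a residual plot scatters these against `fitted`

# Q-Q plot coordinates: theoretical normal quantiles (x) vs ordered residuals (y).
(theoretical_q, ordered_resid), (slope, intercept, r) = stats.probplot(residuals, dist="norm")

# Shapiro-Wilk. H0: the residuals are normally distributed.
shapiro = stats.shapiro(residuals)

# Equal-variance tests. H0: all groups have the same variance.
bartlett = stats.bartlett(*groups.values())
levene = stats.levene(*groups.values(), center="median")

print(f"Shapiro-Wilk: statistic = {shapiro.statistic:.3f}, p = {shapiro.pvalue:.3f}")
print(f"Bartlett:     statistic = {bartlett.statistic:.3f}, p = {bartlett.pvalue:.3f}")
print(f"Levene:       statistic = {levene.statistic:.3f}, p = {levene.pvalue:.3f}")
```

With three observations per group these tests have very little power, so the example only shows the mechanics.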