Correlation Analysis

Introduction

Correlation analysis is a statistical method used to study the relationship between two variables. It indicates whether variables in the same dataset may be connected. For example, in the advertising industry, advertising spend would be expected to correlate with the ad impression rate. However, correlation does not imply causation: a correlation can be misleading and reflect a spurious relationship, for instance when both variables are driven by a third factor. To discover and understand the true causal relationships, use Causal Inference.

Parameters

  • Correlation target: The column chosen to be studied against the selected compared factors.
  • Compared factors: Features to compute the correlation with the target.
  • De-correlation (optional): If set, the data are sampled to reduce the correlation between the target and the selected column. This is useful when you want to decouple specific columns from the chosen target.
  • Number of displayed factors: Number of best correlated factors to be displayed.
  • Filters (optional): Conditions on columns used to filter the original dataset. If set, only the matching subset of the original data is used in the analysis.
  • Bar Values: Whether values are shown on the bars. Changing this control takes effect instantly.

Case Study

Imagine we own a bike rental shop. We have two years of bike demand data and would like to understand how changes in the weather affect rental demand. All weather values in the table are normalised.

An example of the dataset could be:

temp      hum       windspeed  casual  registered  cnt   dteday
0.344167  0.805833  0.160446   331     654         985   2011-01-01
0.363478  0.696087  0.248539   131     670         801   2011-01-02
0.196364  0.437273  0.248309   120     1229        1349  2011-01-03
0.2       0.590435  0.160296   108     1454        1562  2011-01-04
0.226957  0.436957  0.1869     82      1518        1600  2011-01-05
0.204348  0.518261  0.0895652  88      1518        1606  2011-01-06
0.196522  0.498696  0.168726   148     1362        1510  2011-01-07
0.165     0.535833  0.266804   68      891         959   2011-01-08
0.138333  0.434167  0.36195    54      768         822   2011-01-09

We set our parameters as follows, where cnt stands for the rental demand count.

_images/setup3.png

Review Result

The result view contains a Chart tab, a Data tab and a Table tab.

The Chart tab provides an overview chart showing the correlation between the Compared factors and the Correlation target. As we can tell, temp has a positive correlation with cnt, while windspeed and hum (humidity) have negative correlations with cnt.

_images/overview_chart.png
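
To make the signs in the chart concrete, the following is a minimal sketch that computes a Spearman rank correlation (the coefficient this page says Actable AI uses) between each weather column and cnt, using only the nine sample rows shown earlier. It relies on the Python standard library alone; the helper names average_ranks, pearson and spearman are illustrative and are not part of Actable AI.

```python
from statistics import mean


def average_ranks(values):
    """Return 1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values equal to values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation; undefined (division by zero) if a column is constant."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def spearman(x, y):
    """Spearman's rho is the Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# The nine sample rows from the table above.
factors = {
    "temp": [0.344167, 0.363478, 0.196364, 0.2, 0.226957,
             0.204348, 0.196522, 0.165, 0.138333],
    "hum": [0.805833, 0.696087, 0.437273, 0.590435, 0.436957,
            0.518261, 0.498696, 0.535833, 0.434167],
    "windspeed": [0.160446, 0.248539, 0.248309, 0.160296, 0.1869,
                  0.0895652, 0.168726, 0.266804, 0.36195],
}
cnt = [985, 801, 1349, 1562, 1600, 1606, 1510, 959, 822]

correlations = {name: spearman(column, cnt) for name, column in factors.items()}
for name, rho in correlations.items():
    print(f"{name:10s} rho = {rho:+.3f}")
```

Nine rows are far too few to be representative, so the coefficients printed here will not match those reported for the full two-year dataset.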

Actable AI also gives a breakdown view for each compared factor. For example, the following graph describes the correlation between temp and cnt. Most of the data points fall within the range covered by the fitted regression, and the correlation coefficient is 0.622.

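Actable AI uses the Spearman correlation coefficient to indicate how strong the correlation is. In a nutshell, the Spearman coefficient ranges from -1 to 1: the sign gives the direction of the relationship, and the absolute value gives its strength, which is commonly described as follows: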

  • 0.00-0.19 “very weak”
  • 0.20-0.39 “weak”
  • 0.40-0.59 “moderate”
  • 0.60-0.79 “strong”
  • 0.80-1.00 “very strong”
_images/cnt_vs_temp.png
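
The strength scale above is easy to apply programmatically. The sketch below maps a coefficient to its verbal label using the absolute value, so that negative coefficients are graded by magnitude as well. The function name correlation_strength is hypothetical and is not an Actable AI API.

```python
def correlation_strength(rho):
    """Map a Spearman coefficient in [-1, 1] to the verbal scale listed above."""
    if not -1.0 <= rho <= 1.0:
        raise ValueError("a correlation coefficient must lie between -1 and 1")
    magnitude = abs(rho)  # strength ignores the direction of the relationship
    if magnitude < 0.20:
        return "very weak"
    if magnitude < 0.40:
        return "weak"
    if magnitude < 0.60:
        return "moderate"
    if magnitude < 0.80:
        return "strong"
    return "very strong"


# The coefficient reported for temp vs cnt in the case study.
print(correlation_strength(0.622))  # strong
```

By this scale, the 0.622 reported for temp and cnt counts as a strong positive correlation.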

The Data tab provides Spearman's rank correlation coefficient and the p-value for the correlation analysis. The p-value ranges from 0 (0%) to 1 (100%) and measures how likely it would be to observe a correlation at least as strong as the one found if there were in fact no correlation. Here, our null hypothesis is that there is no correlation between the compared factor and the target.

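  • A p-value close to 1 means the data are consistent with no correlation.
  • A p-value close to 0 means the observed correlation is unlikely to be due to chance alone. Note that a small p-value indicates statistical significance, not the strength of the correlation.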
_images/data_tab.png
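
One way to see what the p-value means is an exact permutation test: under the null hypothesis of no correlation, every pairing of the factor's values with the target's values is equally likely, so the p-value is the share of pairings whose correlation is at least as strong as the observed one. The sketch below illustrates that idea with the standard library only. It is not a description of how Actable AI computes its p-value, which this page does not specify, and the function names are hypothetical. Enumerating every pairing is only practical for very small samples such as the nine rows shown earlier.

```python
from itertools import permutations


def centered_ranks(values):
    """Average ranks (ties share the mean position), shifted to have mean zero."""
    ordered = sorted(values)
    # A value first seen at 0-based index i and repeated c times occupies
    # 1-based positions i+1 .. i+c, whose mean is i + (c + 1) / 2.
    ranks = [ordered.index(v) + (ordered.count(v) + 1) / 2 for v in values]
    center = (len(values) + 1) / 2
    return [r - center for r in ranks]


def spearman_permutation_test(x, y):
    """Return (rho, two-sided exact p-value) by enumerating all pairings of y with x."""
    rx, ry = centered_ranks(x), centered_ranks(y)
    # The normalising factor is the same for every pairing, so comparing the
    # raw sums of products is equivalent to comparing the coefficients.
    scale = (sum(a * a for a in rx) * sum(b * b for b in ry)) ** 0.5
    observed = sum(a * b for a, b in zip(rx, ry))
    extreme = total = 0
    for shuffled in permutations(ry):
        total += 1
        stat = sum(a * b for a, b in zip(rx, shuffled))
        if abs(stat) >= abs(observed) - 1e-12:  # tolerance for float rounding
            extreme += 1
    return observed / scale, extreme / total


# windspeed and cnt from the nine sample rows (9! = 362,880 pairings).
windspeed = [0.160446, 0.248539, 0.248309, 0.160296, 0.1869,
             0.0895652, 0.168726, 0.266804, 0.36195]
cnt = [985, 801, 1349, 1562, 1600, 1606, 1510, 959, 822]

rho, p_value = spearman_permutation_test(windspeed, cnt)
print(f"rho = {rho:+.3f}, p-value = {p_value:.4f}")
```

For larger datasets the same idea is usually applied with a random sample of pairings or an analytical approximation rather than full enumeration.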

The Table tab displays the original dataset.