Regression

Introduction

A regression analysis predicts a continuous value based on other variables. For example, a regression analysis can be used to predict house prices based on area, number of rooms and location.

Actable AI uses the entire table as the source data and automatically splits it into three parts:

  • Train data: Rows in the table where both the predictors and the target are filled.
  • Prediction data: Rows where the target column is missing.
  • Validation data: A sample of the data that Actable AI holds out to verify the reliability of the trained model. This part of the data is also used in the performance tuning stage if performance optimisation is selected.
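
The three-way split described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not Actable AI's implementation: the function name `split_table`, the `validation_fraction` argument and the fixed random seed are hypothetical, and for brevity the sketch checks only the target column for missing values.

```python
import random

def split_table(rows, target, validation_fraction=0.2, seed=0):
    """Split rows into train, validation and prediction parts (illustrative)."""
    # Rows with a missing target are the ones we want predictions for.
    prediction = [r for r in rows if r.get(target) is None]
    labelled = [r for r in rows if r.get(target) is not None]

    # Hold out a random share of the labelled rows for validation.
    rng = random.Random(seed)
    shuffled = labelled[:]
    rng.shuffle(shuffled)
    n_val = int(round(len(shuffled) * validation_fraction))
    validation = shuffled[:n_val]
    train = shuffled[n_val:]
    return train, validation, prediction

rows = [
    {"sqft": 4848, "rental_price": 2271},
    {"sqft": 674, "rental_price": 2167},
    {"sqft": 554, "rental_price": 1883},
    {"sqft": 529, "rental_price": 2431},
    {"sqft": 1190, "rental_price": 4123.812},
    {"sqft": 509, "rental_price": None},   # target missing: prediction data
    {"sqft": 481, "rental_price": None},   # target missing: prediction data
]

train, validation, prediction = split_table(rows, "rental_price", 0.2)
print(len(train), len(validation), len(prediction))
```

With five labelled rows and a validation fraction of 0.2, one row is held out for validation, four are used for training, and the two rows without a rental price form the prediction data.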

Parameters

  • Predicted target: Choose one column whose missing values would be predicted.
  • Predictors: Columns that are used to predict the predicted target.
  • Optimize for performance: If selected, Actable AI will optimize its training for the best performance. However, the process will take more time to finish.
  • Explain predictions: If selected, the SHAP values (values explaining the prediction choices) are displayed along with each prediction.
  • Cross-Validation: If selected, Actable AI uses different portions of the data to train and test a model over several iterations. This helps Actable AI evaluate the trained model better, but the process takes more time to finish.
  • Validation percentage: The model is evaluated at the end on the validation dataset. By sliding this value, one can control the percentage of rows with a non-empty predicted target that is used for validation.
  • Prediction interval: Setting the low and high quantiles lets the model return a range of values for each prediction. Two new columns are appended to hold the lower bound and the upper bound.
  • Quantile low: If prediction interval is selected, this value is used to calculate the lower bound of the prediction interval.
  • Quantile high: If prediction interval is selected, this value is used to calculate the upper bound of the prediction interval.
  • Biased Groups: Groups that are biased and introduce bias into other features of the dataset.
  • Debiased Features: Features that need to be debiased from the biased groups.
  • Extra Columns (optional): Columns that are not among the predictors but are displayed along with the returned results.
  • Filters (optional): Set conditions on columns to filter the original dataset. If selected, only a subset of the original data is used in the analysis.
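
To make the Cross-Validation option more concrete, the sketch below shows one common scheme, k-fold cross-validation, in which every row is used for testing exactly once. The source does not say which scheme Actable AI uses, so k-fold and the helper name `k_fold_indices` are assumptions made for illustration.

```python
def k_fold_indices(n_rows, k):
    """Yield (train_idx, test_idx) pairs for k contiguous folds (illustrative)."""
    # Spread any remainder over the first folds so sizes differ by at most 1.
    fold_sizes = [n_rows // k + (1 if i < n_rows % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test_idx = list(range(start, start + size))
        train_idx = [i for i in range(n_rows) if i < start or i >= start + size]
        yield train_idx, test_idx
        start += size

folds = list(k_fold_indices(n_rows=5, k=3))
for train_idx, test_idx in folds:
    print("train:", train_idx, "test:", test_idx)
```

A model would be trained on `train_idx` and scored on `test_idx` in each iteration, and the scores averaged. This is why the option gives a more reliable evaluation but takes longer: the model is trained k times instead of once.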

Actable AI provides a debiasing feature to handle columns with a biased data distribution. Please refer to Debias to understand our debiasing algorithm.

Case Study

Imagine we are a real estate company and would like to forecast rental prices for properties that remain on the market. An example of the dataset could be:

days_on_market  initial_price  location  neighborhood  number_of_bathrooms  number_of_rooms  sqft  rental_price
10              2271           great     south_side    1                    0                4848  2271
1               2167           good      downtown      1                    1                674   2167
19              1883           poor      westbrae      1                    1                554   1883
3               2431           great     south_side    1                    0                529   2431
58              4463           poor      westbrae      2                    3                1190  4123.812

We have now added some new properties and would like to estimate the rental price for each of them:

days_on_market  initial_price  location  neighborhood  number_of_bathrooms  number_of_rooms  sqft  rental_price
18              1725           poor      westbrae      1                    0                509
49              1388           poor      westbrae      1                    0                481
1               4677           good      downtown      2                    3                808
30              1713           poor      westbrae      1                    1                522
10              1903           good      downtown      1                    1                533

We set our parameters as follows:

_images/setup4.png

Review Result

The result view contains a Prediction tab, a Performance tab, a Leaderboard tab and a Table tab.

Prediction

The Prediction tab shows the predictions for the rows where the target value is missing. If Prediction interval is set, the table has two new columns, <target>_low and <target>_high. If Explain predictions is set, each predictor cell is accompanied by its SHAP value. We cover this in more detail when we explain the Details table.
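
To give an intuition for what the <target>_low and <target>_high columns hold, the sketch below computes a low and a high quantile from a set of plausible values for one property. It is an illustration only: the sample values are hypothetical, the helper `quantile` is not part of Actable AI, and the source does not describe how the bounds are actually estimated.

```python
def quantile(values, q):
    """Empirical quantile with linear interpolation, for 0 <= q <= 1."""
    ordered = sorted(values)
    pos = q * (len(ordered) - 1)
    lower = int(pos)
    upper = min(lower + 1, len(ordered) - 1)
    frac = pos - lower
    return ordered[lower] + frac * (ordered[upper] - ordered[lower])

# Hypothetical plausible rental prices for a single property.
samples = [1500, 1600, 1650, 1700, 1725, 1750, 1800, 1850, 1900, 2000, 2100]

quantile_low, quantile_high = 0.05, 0.95
interval = {
    "rental_price_low": quantile(samples, quantile_low),
    "rental_price_high": quantile(samples, quantile_high),
}
print(interval)
```

Widening the gap between Quantile low and Quantile high produces a wider interval that is more likely to contain the actual value.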

Performance

The Performance tab shows the performance of our model with the Root Mean Square Error (RMSE) metric (14.292) and the R-squared (R2) metric (1.0).

  • RMSE: Root Mean Square Error is the square root of the average of the squared differences between predicted values and observed values.
  • R2: R-squared is the coefficient of determination. It indicates how much of the variation in the target is predictable from the predictors. 0 means the predictors have no predictive power for the target, while 1 means the target is fully predictable from the predictors.
_images/performance_overall1.png
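
The two metrics can be computed directly from their definitions. The sketch below uses the standard formulas; the observed values are taken from the case study table, while the predicted values are hypothetical numbers chosen for illustration and are not output from Actable AI.

```python
import math

def rmse(observed, predicted):
    """Root mean square error: sqrt of the mean squared difference."""
    squared_errors = [(o - p) ** 2 for o, p in zip(observed, predicted)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))

def r_squared(observed, predicted):
    """Coefficient of determination: 1 - SS_residual / SS_total."""
    mean_observed = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_observed) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot

observed = [2271, 2167, 1883, 2431]     # rental prices from the case study
predicted = [2260, 2180, 1890, 2420]    # hypothetical model predictions

print("RMSE:", round(rmse(observed, predicted), 3))
print("R2:", round(r_squared(observed, predicted), 5))
```

RMSE is expressed in the same unit as the target (here, rental price), so it can be read as a typical prediction error. A model that always predicts the mean of the observed values gets an R2 of 0 under this formula.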

Actable AI not only acts as a model training tool but also tries to provide the rationale behind its predictions. Two more tables are provided: the Important Features table and the Details table.

The Important Features table shows that, according to the training, some features (columns in the data) are more important than others. A larger value indicates that the predictor is more important. For example, in our case study, initial_price carries the most weight in making the prediction (1240.843).

_images/performance_important_features1.png

The Details table shows the rationale behind each prediction made on the rows. Several new columns and values accompany the original table because we enabled Explain predictions and Prediction interval (see Parameters):

  • Prediction result: this column shows the result of the prediction.
  • Prediction interval columns: there are two columns named <target>_low and <target>_high. With the quantile low value set to X and the quantile high value set to Y (both expressed as percentages), the values can be interpreted as follows: there is an X% chance that the actual value is lower than the <target>_low boundary and a (100 - Y)% chance that it is larger than the <target>_high boundary.
  • Shapley value: the values shown beside each predictor's cell are Shapley values, a concept from game theory for estimating the contribution of each player to a final result. The Shapley value of a player is the average difference in the result with and without that player across different coalitions with the other players. To estimate the contribution of each feature to a machine learning model's decision, for every combination of the other features one can build two models: one with the feature under consideration and one without it. The Shapley value of that feature is the average difference between the predictions of the two models.

In the example below of predicting rental prices, the property on the 3rd row is predicted to rent at $5341.075. The Shapley value of location="great" is 90.23. This means that the presence of that feature (location being great) increases the predicted rental price by $90.23 compared to the predictions of models trained without that feature.

_images/explaination1.png
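
The definition of a Shapley value can be illustrated with a small exact computation. The sketch below averages each feature's marginal contribution over all orderings of the features, which is one standard way to compute exact Shapley values. The value function `toy_price` and its numbers are hypothetical stand-ins for "the prediction of a model that only knows these features". This is not how Actable AI computes its SHAP values, and exact enumeration is only practical for a handful of features.

```python
from itertools import permutations

def shapley_values(features, value):
    """Exact Shapley values: mean marginal contribution over all orderings."""
    totals = {f: 0.0 for f in features}
    orderings = list(permutations(features))
    for order in orderings:
        present = set()
        for f in order:
            before = value(frozenset(present))
            present.add(f)
            after = value(frozenset(present))
            totals[f] += after - before
    return {f: total / len(orderings) for f, total in totals.items()}

def toy_price(present):
    """Hypothetical predicted price when only `present` features are known."""
    price = 2000.0                      # baseline with no features known
    if "location" in present:
        price += 100.0
    if "sqft" in present:
        price += 300.0
    if "location" in present and "sqft" in present:
        price += 50.0                   # interaction between the two features
    return price

values = shapley_values(["location", "sqft"], toy_price)
print(values)
```

The Shapley values always add up to the difference between the prediction with all features and the baseline with none (here 2450 - 2000 = 450), and the interaction term is shared equally between the two features involved.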

One must be careful not to interpret Shapley values as causal effects. You can read more about that here. For estimating causal effects, please check out our Causal Inference analysis instead.

Leaderboard

The Leaderboard tab shows the underlying models that were trained. Actable AI trains state-of-the-art machine learning algorithms and uses the best-performing one to make the prediction.

In the table below, we can see the following information:
  • The name of the model trained.
  • The validation score of the model.
  • The training time of the model.
  • The prediction time of the model.
_images/leaderboard1.png

Table

The Table tab displays the original dataset.