Segmentation

Introduction

Segmentation is a clustering analytic: it groups similar data points into clusters and projects them onto a 2D plane for easy visualisation and interaction. We use deep learning to learn representations of the input variables and their interactions. Examples of this analytic include segmenting potential customers so that each segment can receive a different targeted marketing strategy, and clustering transactions to help with fraud detection.
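To illustrate what "grouping similar data points into clusters" means, here is a minimal sketch of classic k-means (Lloyd's algorithm) in plain Python. This is a swapped-in, textbook technique chosen for illustration only: the product learns representations with deep learning, and its actual clustering method is not described here. The function names `kmeans` and `farthest_first_centroids` are hypothetical.

```python
import math


def farthest_first_centroids(points, k):
    """Deterministic start: take the first point, then repeatedly add the
    point farthest from all centroids chosen so far."""
    centroids = [list(points[0])]
    while len(centroids) < k:
        farthest = max(points, key=lambda p: min(math.dist(p, c) for c in centroids))
        centroids.append(list(farthest))
    return centroids


def kmeans(points, k, max_iterations=100):
    """Lloyd's k-means: returns one cluster id per point plus the final centroids."""
    centroids = farthest_first_centroids(points, k)
    labels = None
    for _ in range(max_iterations):
        # Assignment step: each point joins its nearest centroid.
        new_labels = [min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points]
        if new_labels == labels:
            break  # assignments stopped changing, so the algorithm has converged
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


points = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1), (10.0, 10.0), (10.5, 9.5), (9.8, 10.2)]
labels, centroids = kmeans(points, k=2)  # labels -> [0, 0, 0, 1, 1, 1]
```

The farthest-first start is used only to keep this sketch deterministic; random or k-means++ initialisation is more common in practice.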

Parameters

  • Number of clusters: The number of groups the data points will be divided into, typically between 3 and 20. Alternatively, choose auto to let our algorithm pick the number of clusters for you.
  • Colour: The variable used to colour the data points. The default value is cluster_id, the identifier of each created group. You can instead pick a categorical variable from the list of input variables.
  • Selected Features: Input variables used for segmentation.
  • Explain predictions: If selected, an additional column is appended that states the reason behind the prediction for each row.
  • Displayed columns: Used for visualisation. The values of the selected columns are shown when you hover over a point in the scatter plot. Changing this value updates the scatter plot immediately.
  • Size: Used for visualisation. Defines the radius of each displayed data point. Changing this value updates the scatter plot immediately.
  • Extra Columns (optional): Columns that are not used as selected features but are displayed alongside the returned results.
  • Filters (optional): Conditions on columns used to filter the original dataset. If set, only the matching subset of the original data is used in the analytics.
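The Filters parameter can be pictured as a list of column conditions applied to the rows before clustering. The sketch below assumes a hypothetical `(column, comparison, value)` format and assumes that multiple conditions are combined with AND; the product's own filter syntax and combination rules may differ.

```python
import operator

# Hypothetical comparison operators a filter condition may use.
OPS = {"<": operator.lt, "<=": operator.le, ">": operator.gt,
       ">=": operator.ge, "==": operator.eq, "!=": operator.ne}


def apply_filters(rows, conditions):
    """Keep only the rows that satisfy every (column, comparison, value) condition.

    Assumption: conditions are AND-ed together.
    """
    return [
        row for row in rows
        if all(OPS[op](row[col], value) for col, op, value in conditions)
    ]


rows = [
    {"INCOME": 233000, "SPEND": 150000},
    {"INCOME": 250000, "SPEND": 187000},
    {"INCOME": 204000, "SPEND": 172000},
    {"INCOME": 354000, "SPEND": 163000},
]
subset = apply_filters(rows, [("INCOME", ">=", 230000), ("SPEND", "<", 180000)])
```

Here only the first and last rows pass both conditions, so only those two would be passed on to the segmentation.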

Case Study

Imagine we are analysing commercial data and have each individual's quarterly income and spending.

An example of the dataset could be:

INCOME SPEND
233000 150000
250000 187000
204000 172000
236000 178000
354000 163000
192000 148000
294000 153000

We set our parameters as follows:

_images/setup5.png

Review Result

The result view contains a Chart tab, a Cluster tab, a Data tab and a Table tab.

Chart

The Chart tab displays a scatter plot in which each point represents an individual row in the original dataset. Each cluster is displayed in its own colour. The following screenshot shows that, for the data we fed in, the algorithm automatically chose to split the data into 11 clusters.

_images/chart.png

Cluster

The Cluster tab displays the decision rules behind each cluster. For example, rows with INCOME <= 211000.00 and 133500.00 < SPEND <= 153000.00 are predicted to belong to cluster 1, because 75% of the data points in cluster 1 satisfy this condition.

_images/clusters.png
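The "75% of the data points in cluster 1 satisfy this condition" figure is a coverage ratio: the share of a cluster's rows for which the rule holds. A minimal sketch of that calculation follows; the function names are hypothetical and the rows and cluster assignments are made up purely for illustration, not taken from the product's output.

```python
def rule_coverage(rows, cluster_id, rule):
    """Fraction of the rows in `cluster_id` that satisfy `rule`."""
    members = [row for row in rows if row["cluster_id"] == cluster_id]
    if not members:
        return 0.0
    return sum(1 for row in members if rule(row)) / len(members)


def cluster_1_rule(row):
    # The example condition for cluster 1 described in the Cluster tab.
    return row["INCOME"] <= 211000.00 and 133500.00 < row["SPEND"] <= 153000.00


# Hypothetical rows and cluster assignments, made up for illustration only.
rows = [
    {"INCOME": 192000, "SPEND": 148000, "cluster_id": 1},
    {"INCOME": 205000, "SPEND": 140000, "cluster_id": 1},
    {"INCOME": 198000, "SPEND": 151000, "cluster_id": 1},
    {"INCOME": 204000, "SPEND": 172000, "cluster_id": 1},
    {"INCOME": 354000, "SPEND": 163000, "cluster_id": 2},
]
coverage = rule_coverage(rows, 1, cluster_1_rule)  # 3 of 4 cluster-1 rows match -> 0.75
```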

Data

The Data tab adds several columns on top of the original table:

  • Cluster_ID: The cluster this row is assigned to.
  • Projected_X, Projected_Y: The coordinates at which this row is plotted in the Chart tab.
  • Explanation: The reason behind this row's prediction. This column is only displayed when the Explain predictions option is ticked.

_images/data.png
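Projected_X and Projected_Y place each row on the 2D chart. How the product computes them is not described here (the introduction says representations are learned with deep learning), so the sketch below swaps in principal component analysis (PCA) as a simple, well-known projection. It uses power iteration in plain Python; the function name `pca_2d` is hypothetical, and the input is the example dataset from the case study.

```python
import math
import random


def pca_2d(rows, iterations=200, seed=0):
    """Project rows onto their two leading principal components via power iteration."""
    n, d = len(rows), len(rows[0])
    means = [sum(col) / n for col in zip(*rows)]
    centred = [[x - m for x, m in zip(row, means)] for row in rows]
    cov = [[sum(r[i] * r[j] for r in centred) / n for j in range(d)] for i in range(d)]

    rng = random.Random(seed)
    axes = []
    for _ in range(2):
        v = [rng.gauss(0.0, 1.0) for _ in range(d)]
        for _ in range(iterations):
            v = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
            # Keep the new axis orthogonal to the axes already found.
            for a in axes:
                dot = sum(x * y for x, y in zip(v, a))
                v = [x - dot * y for x, y in zip(v, a)]
            norm = math.sqrt(sum(x * x for x in v))
            if norm < 1e-12:  # no variance left in any new direction
                break
            v = [x / norm for x in v]
        axes.append(v)
    # Each row's (Projected_X, Projected_Y) is its position along the two axes.
    return [tuple(sum(x * a for x, a in zip(r, axis)) for axis in axes) for r in centred]


rows = [[233000, 150000], [250000, 187000], [204000, 172000], [236000, 178000],
        [354000, 163000], [192000, 148000], [294000, 153000]]
projected = pca_2d(rows)
```

With only two input features, PCA is just a rotation of the centred data, so distances between points are preserved; with more features, the first axis captures the most variance and the second the most remaining variance.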

Table

The Table tab displays the original dataset.