Nov 5, 2016

Sankey Chart in Tableau

This article explains how to create a Sankey diagram in Tableau.

Steps to Reproduce:

A Sankey diagram is a visualization that depicts a flow from one set of values to another. The things being connected are called nodes, and the connections are called links. Sankeys work best when you want to show a many-to-many mapping between two categorical dimensions. The chart is a good alternative to bar charts or pie charts when you want to show flow quantities. In a Sankey diagram, the width of each arrow is proportional to the flow quantity it represents, so attention is drawn immediately to the most important flows in the system.

Sankey is an exciting, beautiful, gorgeous, efficient, informative (add any adjective that you like here) visual for flows. Sankey diagrams are very good at showing particular kinds of complex information:

  1. How users landed on and navigated through a website
  2. Where money came from and went to (budgets, contributions)
  3. Flows of energy from source to destination
  4. Flows of goods from place to place
  5. Facility management, process engineering, etc.
  6. Value streams, cost allocation, visualization of cost flows and added value

This Diwali I purchased the book “Data Visualisation: A Handbook for Data Driven Design”, in which Andy Kirk beautifully describes the use of Sankey diagrams. I had always wanted to create a Sankey diagram in Tableau without reshaping the data, but this wasn’t possible earlier. Thanks to Zen Master Rody Zakovich, who helped me create a Sankey without reshaping the data. We will use the Sample Superstore data for this example.

Step 1:

Create a calculated field called T:

    (INDEX ()-25)/4

So when INDEX() is 1, T is -6; when INDEX() is 2, T is -5.75; and so on.
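To see the values this formula produces, here is a small Python sketch. Python is used only for illustration; `t_value` is a hypothetical helper, and INDEX() is assumed to run from 1 to 49, the number of points used in Step 2.

```python
def t_value(index: int) -> float:
    """Mirror the Tableau calculation (INDEX() - 25) / 4."""
    return (index - 25) / 4

# Assumption: INDEX() runs from 1 to 49 along the padded points.
t_values = [t_value(i) for i in range(1, 50)]

print(t_values[0], t_values[1], t_values[24], t_values[-1])  # -6.0 -5.75 0.0 6.0
```

The middle point (index 25) lands exactly on T = 0, so the values are symmetric around zero.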

Step 2:

According to Zen Master Jeffrey Shaffer, 49 points per dimension are enough to plot a smooth curve. T should range from -6 to 6, and 0.25 increments give us 49 values.

So create a calculated field called “To Pad”:

IF [Order Date] = {FIXED [Region], [Category]: MIN([Order Date])} THEN 1 ELSE 49 END

So when the Order Date matches the minimum Order Date for its Region and Category, the field is assigned 1; for all other dates, it is 49.
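The same row-level logic can be sketched in Python. The rows below are hypothetical sample data, not actual Superstore records, and the dictionary stands in for the FIXED level-of-detail expression.

```python
from datetime import date

# Hypothetical sample rows: (region, category, order_date).
rows = [
    ("East", "Furniture", date(2016, 1, 5)),
    ("East", "Furniture", date(2016, 3, 9)),
    ("West", "Technology", date(2016, 2, 1)),
    ("West", "Technology", date(2016, 2, 1)),
    ("West", "Technology", date(2016, 7, 20)),
]

# Stand-in for {FIXED [Region], [Category]: MIN([Order Date])}.
min_date = {}
for region, category, order_date in rows:
    key = (region, category)
    if key not in min_date or order_date < min_date[key]:
        min_date[key] = order_date

# Stand-in for IF ... THEN 1 ELSE 49 END, evaluated per row.
to_pad = [1 if d == min_date[(r, c)] else 49 for r, c, d in rows]
print(to_pad)  # [1, 49, 1, 1, 49]
```

Every Region and Category pair ends up with at least one row valued 1, and the field only ever takes the values 1 and 49.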

Step 3:

Now we need an “S”-shaped curve (a sigmoid curve), so create another calculated field called Sigmoid Function:

1/ (1+EXP (1) ^-[T])
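As a quick check of the shape, the sketch below evaluates the same expression in Python at the two ends and the middle of the T range. The function name `sigmoid` is only an illustrative choice.

```python
import math

def sigmoid(t: float) -> float:
    """Mirror the Tableau expression 1 / (1 + EXP(1)^-[T])."""
    return 1 / (1 + math.e ** -t)

for t in (-6, 0, 6):
    print(t, round(sigmoid(t), 4))
```

The result is close to 0 at T = -6, exactly 0.5 at T = 0 and close to 1 at T = 6, which is why the range from -6 to 6 is wide enough to draw a full “S”.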

Step 4:

Create a new bin of size 1 on the To Pad field and call it Padded.

Step 5:

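Build the ranking calculations. We create a running total of the number of records and divide it by the overall total, which gives us a cumulative percentage. Rank 1 and Rank 2 use the same formula; as Step 10 notes, they are later set to address different dimensions.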

[Rank 1] = RUNNING_SUM (sum ([Number of Records]))/ TOTAL (sum ([Number of Records]))

[Rank 2] = RUNNING_SUM (sum ([Number of Records]))/TOTAL (sum ([Number of Records]))

For example:

    Region     RUNNING_SUM(SUM([Number of Records]))    TOTAL(SUM([Number of Records]))
    Central    2,323                                    2,323
    East       2,848                                    2,848
    South      1,620                                    1,620
    West       3,203                                    3,203
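To illustrate how a running sum divided by a total becomes a cumulative percentage, here is a minimal Python sketch that takes the four per-region counts from the example as inputs. It assumes the table calculation runs along Region in the order listed; the variable names are illustrative.

```python
from itertools import accumulate

# Per-region record counts, taken from the example above.
records = {"Central": 2323, "East": 2848, "South": 1620, "West": 3203}

total = sum(records.values())                 # like TOTAL(SUM([Number of Records]))
running = list(accumulate(records.values()))  # like RUNNING_SUM(SUM([Number of Records]))

# Cumulative share per region, assuming the calculation addresses Region.
rank = {region: run / total for region, run in zip(records, running)}
for region, value in rank.items():
    print(f"{region:8s} {value:.4f}")
```

The last region always reaches 1.0, so the ranks give each link a position between 0 and 1 on the axis.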

Step 6:

Create a Curve Function:

[Curve] = [Rank 1] + (([Rank 2] - [Rank 1]) * [Sigmoid Function])
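The sketch below shows what this interpolation does for a single link. The start and end positions (0.20 and 0.70) are hypothetical, and the 49 T values are generated the same way as in Step 1.

```python
import math

def sigmoid(t: float) -> float:
    return 1 / (1 + math.exp(-t))

def curve(rank1: float, rank2: float, t: float) -> float:
    """Mirror [Rank 1] + (([Rank 2] - [Rank 1]) * sigmoid)."""
    return rank1 + (rank2 - rank1) * sigmoid(t)

# Hypothetical start and end positions for one link.
rank1, rank2 = 0.20, 0.70
points = [curve(rank1, rank2, (i - 25) / 4) for i in range(1, 50)]

print(round(points[0], 3), round(points[24], 3), round(points[-1], 3))
```

The curve starts just above Rank 1, passes through the midpoint of the two ranks at T = 0, and ends just below Rank 2, which produces the smooth “S” between the two sides.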

Step 7:

Drag the Padded bin to the Rows shelf. Right-click it and select “Show Missing Values”. Change the Marks card type from “Automatic” to “Line”. Then take the Padded field from the Rows shelf and place it on Path on the Marks card.
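The padding effect of “Show Missing Values” can be sketched in Python as follows. This assumes that Tableau fills in every bin between the smallest and largest bin present in the data; the variable names are illustrative.

```python
# Values actually present in the data after Step 2: only 1 and 49.
present = {1, 49}
bin_size = 1

# Assumption: showing missing values fills every bin between the
# smallest and the largest bin that exists in the data.
padded = list(range(min(present), max(present) + 1, bin_size))
print(len(padded))  # 49
```

Those 49 bins are the points that the line mark connects along the path.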

Step 8:

Drag the T calculated field to the Rows shelf and set “Compute Using” to Padded (which is now on Path on the Marks card).

Step 9:

Select Region and Category together and drag them to Color on the Marks card.

Step 10:

Drag the Curve calculated field to the Columns shelf. Our [Curve] calculation has three parts: [Rank 1], [Rank 2] and [T]. Each of these needs to address different dimensions.

Step 11:

We now need to vary the line thickness based on the number of records. Adding SUM([Number of Records]) to Size won’t look good, because the padded marks have no records associated with them. So create a calculated field called Size that is computed using the Padded bins. This is only possible with a table calculation:

RUNNING_AVG ( MIN ({ FIXED [Region], [Category] : SUM([Number of Records]) }) )

Now drag this calculated field to Size on the Marks card and compute it using [Padded].
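Why a running average helps can be sketched in Python. The example assumes that the padded marks carry no value of their own (modelled as `None`) and that the running average skips such marks; the count of 120 records is hypothetical, and `running_avg` is an illustrative helper, not a Tableau function.

```python
from typing import Optional

def running_avg(values: list[Optional[float]]) -> list[Optional[float]]:
    """Running average that skips missing (None) values."""
    out, total, count = [], 0.0, 0
    for v in values:
        if v is not None:
            total += v
            count += 1
        out.append(total / count if count else None)
    return out

# Hypothetical link with 120 records: only bins 1 and 49 are real marks.
marks = [120.0] + [None] * 47 + [120.0]
sizes = running_avg(marks)
print(sizes[0], sizes[24], sizes[-1])  # 120.0 120.0 120.0
```

Under these assumptions every point along the path gets the same size, so the line keeps a constant thickness from one end to the other.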

Step 12:

The final result will look like the image below.

[Image: final Sankey chart]

Step 13:

By definition, a Sankey depicts a flow from one set of values to another, so it is good to make that flow visible in your chart. We will create two sheets, each with a single stacked bar, showing the breakdown of the number of records for each dimension. If you prefer, you can use percentages instead.

Drag “Number of Records” to the Rows shelf and Region to both the “Color” and “Text Label” cards. Duplicate the sheet and replace the Region dimension with the Category dimension. The sheet will look like the image below.

[Image: single stacked bar sheet]

Step 14:

Drag all three sheets onto the dashboard. Place the Region sheet on the left and the Category sheet on the right, with the Sankey between them.

Go to Dashboard -> Actions -> Add Action -> Highlight. Make sure you select “Hover” so that the action runs on mouse hover. This will highlight the entire chart on mouse hover.

Step 15:

The final output will look like the image below.

[Image: final dashboard]

About the Author: Rajeev Pandey

I'm Rajeev, a Tableau lover and data evangelist from Hyderabad, India. I am a multidisciplinary designer working in data visualization, interaction design and innovation, with expertise in developing Tableau- and WebFOCUS-based visualization and reporting applications. I am a quick learner who can absorb new ideas and communicate clearly and effectively. I love creativity and enjoy experimenting with various technologies.