This article explains the basic concept of Combine Sets in Tableau.
If you collect large sets of data that you want to visualize, you might find that limiting the amount of information displayed to an important subset of records helps you work with and answer questions about the data more effectively. There are many ways you can use sets to answer complex questions and compare cohorts of data. Below is one example through which you can use sets to define and compare subsets of data.This post is inspired by Kirill Eremenko lecture Tableau 10 Advanced Training: Master Tableau in Data Science available in Udemy.
What is Combine sets?
Sets are custom fields that define a subset of data based on some conditions. A set can be based on a computed condition. For example, a set may contain customers with sales over a certain threshold. Computed sets update as your data changes. Alternatively a set can be based on a specific data point in your view. You can use sets to compare and ask questions about a subset of data. For example, in a scatter plot showing product sales, you may select the products with the most sales and add them to a set. You may then create another set that contains the products with the highest return rates. With these two sets you can ask questions such as, “ What percent of my total sales is from high returned products?”
There is One Condition to implement the Combined Set, Both Sets should point to same Dimension i.e. sets need to be based on the same exact field. You can combine two sets to compare the members. When you combine sets you create a new set containing either the combination of all members, just the members that exist in both, or members that exist in one set but not the other. Combining sets allows you to answer complex questions to understand cohorts of data.
Take an example, lets us suppose our Indian Government are currently planning to invest on Top 500 Indian Startups. They are currently reviewing 500 potentially interesting startups and deciding on which ones they are going to invest in. Indian Government will invest on those companies who has shown significant growth in terms of
- High Revenue (2016)
- Low Expenses (2016)
- Growth Percentage (2016)
For getting the required data, let’s find out the TOP 500 Indian Startup. While Googling, I came across this wonderful website (http://www.startupranking.com) which contains all the required information I have downloaded the data and created our data set.
Note:*Any resemblance to real companies, people or accounts data is strictly a coincidence. Here, Revenue and Expenses data are completely fictitious.
As per the above requirement, the criteria for selection investment is basically a combination of High Revenue, Low Expenses and Growth %.There are so many ways through which you can solve this questions in Tableau but we will follow the “Combine Sets” approach. The most important point for creating a combine sets, they must be based on the same dimensions.
Let’s us first load the data into our tableau and put 2016 Revenue on column shelf and 2016 Expenses on Rows shelf. If you place one measure on the Rows shelf and another measure on the Columns shelf, you are asking Tableau to compare two numerical values. Typically, Tableau chooses a scatterplot as the default visualization in such cases. The initial view can be disappointing—a single mark, showing the sum for all values for the two measures. There are various ways to add detail to a basic scatterplot:
- You can use dimensions to add detail,
- You can add additional measures and/or dimensions to the Rows and Columns shelves to create multiple one-mark scatter plots in the view,
- You can disaggregate the data. (Or you can use any combination of these options.)
The best way to modify your original one-mark scatter plot to display more marks is by disaggregating the data. Tableau aggregates data in your view by default. The moment you uncheck the aggregate measures, you see a lot of marks—one for each row in your original data source:
- Select Analysis >Aggregate Measures.
Create two sets which should show High revenue and Low expenses by using a common dimension name ‘Company ID’. You can use any aggregation like ‘SUM’ because data is currently in lowest level of granularity. You can use “Load” Button to get the Min and Max range of values.
But currently we don’t know what cut-off Indian government decided for “High Revenue” and “Low Expenses”. So just to avoid this confusion, we will create a separate Revenue and Expense cutoff parameter which will help Indian official to set their Threshold value based on their Convenience.
Now let’s create two parameter – Revenue cutoff and Expenses Cutoff.
Now let’s again open the “low expense” and “High Revenue” set and removed the static threshold value with your new Parameters.
Follow the below steps for creating the Combine Sets
- Select two sets in the Data pane that you want to combine.
- Right-click (control-click on Mac) the sets and select Create Combined Set.
- In the Create Set dialog box, type a name for the new combined set.
- Verify that the two sets you want to combine are selected in the two drop-down menus.
- Select one of the following options for how to combine the sets:
- All Members in Both Sets – the combined set will contain all of the members from both sets.
- Shared Members in Both Sets – the combined set will only contain members that exist in both sets.
- Except Shared Members – the combined set will contain all members from the specified set that don't exist in the second set. These options are equivalent to subtracting one set from another
Refer the below links for more understanding on Joins
Now put the newly created combined Set on the color shelf and add the Parameter on the view .
Let’s reverse the 2016 Expense axis so that shared value will move to the first quadrant. Do some formatting like removing the Grid lines, bold the axis?
Now we are also interested in seeing the companies name whose growth Percentage is highest. Let’s create a set by using Companies name.
Add “Companies to be invested” on color shelf and “Top companies by Growth Rate” on shape marks card but now user has to struggle a lot to get the correct insight.
Now press control + Select “Companies to be invested” and “Top companies by Growth Rate” simultaneously and put on color Shelf.
Now we are interested in companies who is showing high growth rate along with High revenue and Low expenses. So we need to concentrate on (In,In and In,out) combination.
We can also enhance the dashboard by adding the Reference line and Drop lines which help surely us in better understanding of the dashboard.
Leave a comment
You must be logged into post a comment.