Data joining is a basic requirement in any data analysis. Data we often use in Tableau is taken from different sources which is present in the form of tables. If you want to combine two tables we have to use joins using specific fields known as columns. Joining is a method in which we join two or many tables using common fields. The result of combining tables is a virtual table which helps us analyze data easily. Tableau provides this feature to join data by using data pane feature available under edit data source in data menu.
Let’s see an example of joins- Suppose the CEO of your company asked you to make a Geographic sales dashboard in Tableau using geography and sales data tables. It is possible only by joining both these tables, using common column in both data tables.
Types of joins:
There are four types of joins mostly used in Tableau: inner, left, right and outer join. Usage of joins depends on data tables we are using and the required output.
Here only common rows from employee table is taken and remaining unmatched rows are neglected.
Right join is the opposite of the left join, which means that the common rows in the customer table are included and all rows in the employee table are added to form a new table.
Full Outer Join
Let’s get our hands dirty and see how joins work in Tableau! Download dataset from this link.
This data consists of three excel sheets: list of orders, order break down and sales targets of a company. Your boss asked you to prepare a dashboard showing sales & profits of company by region.
First let’s explore the dataset by loading it into Tableau.
After loading the file you can see that there are three data tables in the sheets shelf.
Explore the data present in each table and you can observe that we can see the geographical details are present in the List of Orders sheet and sales details are present in the Order Breakdown sheet. Now we have to combine these two sheets to make our desired dashboard. Tableau provides easy drag and drop option to join two sheets. By default inner join is applied on these sheets if you drag and drop to the data source shelf.
If you want to use other joins, select the join you want by clicking on the venn diagram between the two table names. We can use any join as per our requirement. Here Inner join will help us create the dashboard. Common columns are selected by Tableau and it joins tables according to them. If there are two or more common columns we can specify which column to use by selecting the name of the column in the drop down from the Venn diagram and it will join accordingly.
Now we can use this new table formed by joining both these tables to create the dashboard. This is the final dashboard which shows sales by geographic area and sales vs profits graph controlled by year.
The next important and useful feature in Tableau is Blending. Let’s see how it is different from joins!
- Consider an example where your boss asked you to make a graph which shows Sales by year and category and the Sales targets achieved by company. Here blending is useful in making this visualization.
Data blending is a method of combining data from different sources, normally we use joins for this but here the catch is that the data is in different form and from different sources. For example if we want to combine data from SQL and excel we can’t use joins in such cases data blending is very helpful to combine them.
It is useful in under following conditions:
- You want to combine data from different databases that are not supported by cross-database joins.
- Data is at different level of detail.
- Data needs cleaning.
- Joins cause duplicate data.
- You have a lot of data.
Requisites for using data blending:
We should decide the primary and secondary data sources such that one data source acts as the main table and the other acts as a secondary data source and there should be at least one common column in both these data sources to establish a connection between them.
Let’s see how data blending helps us to create a graph like in the example stated above. In the above example, your CEO asked you to make a graph to determine the category and the year in which sales targets were achieved. To do that we are going to use our excel sheet twice. Now let’s get started and make our primary data source consisting of a join of the three sheets present in our excel file.
If you see we made our primary data source ready and now we need to make our secondary data source ready which in this case is Sales Targets from same excel file.
Load “Sales Targets” table from sheets shelf and open sheet1 in Tableau to make our desired visualization.
Now in order to use both data sources in our visualization we need to make a relationship between them using edit relationships option from data tab in the toolbar.
Tableau is intelligent enough to make relations automatically, but for our convenience we can make custom relationships like in the image. After making relationships we must activate relationships between data sources by clicking on the chain link as shown below.
After linking, we can use all the dimensions and measures in our visualization.
Sometimes if a large data source is loaded, it takes lot of time to load into Tableau. In order to reduce load time we can aggregate data source by using filter option on data source.
Joins and data blending helps us in combining data without much effort. We can also use cross database join to combine our data. There are so many insights that can be unlocked from the data we have. Explore more by experimenting with data. Practice more to be perfect.
“Happy Data Visualization!!”
This article was contributed by Perceptive Analytics. Juturu Pavan, Prudhvi Sai Ram, Saneesh Veetil and Chaitanya Sagar contributed to this article.
Perceptive Analytics provides Tableau Consulting, data analytics, business intelligence and reporting services to e-commerce, retail, healthcare and pharmaceutical industries. Our client roster includes Fortune 500 and NYSE listed companies in the USA and India.
Leave a comment
You must be logged into post a comment.