This article explains the basic concepts of Data Blending and joining in Tableau.
1) The difference between joining and blending data:
Joining your data can only be done when the data comes from the same source, for example from two sheet tabs within a single Excel file. If that same information was stored in separate Excel files you would need to do a data blend in Tableau. A blend is always required if the data is stored in two separate “data sources” within Tableau. So even if your data is very closely related and exists in two separate files or databases, you will have to do a data blend if you are combining the data in Tableau.
When blending data, the first data source used in your view will dictate how your worksheet view in Tableau is built. The secondary (blended) data source will be able to contribute extra information, but will not be able to change the overall structure of the view. The secondary data source’s values can be aggregated and applied to the existing view after you have established a “relationship” by assigning a variable that both the primary and secondary data sources have in common. When you set a data source as Primary, you ensure that all rows from that source will be included in your view along with any rows in the Secondary source(s) that match on a common field; rows that don’t match in the Primary source won’t appear in the results. In a simple words, Data blending is the ability to bring data from multiple data sources into one Tableau view, without the need for any special coding.
2) When to use data blending:
It is generally preferable to avoid data blending when you can combine the two data sources outside of Tableau. If this is not an option, then you must identify at least one common variable shared by the two data sources you want to blend together. When possible, go for a join rather than a blend. If you need to combine two data sources and for whatever reason cannot manage to join the data outside of Tableau, your only option is a data blend.
A simple example is having (a) a data source with three columns including location names and latitude/longitude values, and (b) a data source with location names and detailed information about each location. You could build a map using (a) and then blend in extra supplemental information using (b), where a relationship is built by connecting the data sources based on the location names.
3) When to use joining:
You can only use joining when your data comes from the same underlying source (for example, the same Excel file or Access file).
4) When are you unable to blend data from two or more sources?
If there are no variables shared between each data source then you will not be able to do a data blend, because there is no information that can be related from one source to the other. However, this does not mean that the column headers (variable names) need to be an exact match. You can edit the relationships manually to point Tableau to the variables that have matching underlying values.
For example, if I am blending information together based on countries and source (a) calls it “Country” while source (b) calls it “Locations”, I can edit a relationship manually to blend on these two variables. If the two column headers are an exact match, Tableau may automatically establish the link for you.
5) When are you unable to join data?
If the data comes from different underlying files you will not be able to do a join within Tableau. I recommend preparing your data before importing it into Tableau (there are many great tools available, one being Alteryx, that can help with this). In my opinion blending and joining in Tableau should be a last resort for times when you are unable to shape your data into one coherent file for analysis.
6) Why Blending Performs Left Join
When tableau queries a secondary data source it only returns information for field members that are shared by Primary data source because of that we are seeing Blending contains a Left join
In a Nutshell,
1. In the data window, the blue tick indicates the Primary data source (Sample-Superstore). An orange tick indicates a secondary data source (Sheet1 (Sample Database 2)).
2. Blending runs a query on each data source and joins the aggregated results of those queries.
3. The first field you drop onto a worksheet defines the primary datasource. Any other data source you use will be a secondary data source. The secondary source fields are shown on shelves with the orange tick marks.
4. A default blend is equivalent to a left outer join. However, by switching which data source is primary, or by filtering nulls, it is possible to emulate left, right and inner joins.
5. You cannot emulate a full outer join using blending.
6. Sometimes you see “NULL” values when you blend. This is when there is no corresponding row based on the join key in the secondary data source. In the example below, http://flyingsquidstudios.com/fckeditor/editor/filemanager/browser/default/connectors/test.html [Segment] is in the primary datasource. http://hollyhilltransport.com/jobs/?share=email [Customer Name] is from the secondary data source. Sam and Will have no data in the secondary data source.
7. If you do not override the default choice, Tableau will automatically switch the blend field to the most appropriate for the view.
8. The * is often appears when we are dealing with Blending. What it means is that the primary and secondary data sources do not share data that enables you to create the view you have attempted to draw. The * is an indication that your data sources do not contain enough information to blend and display the view you have created.
Leave a comment
You must be logged into post a comment.