May 23, 2018

Data Preparation Gets a New Boost with Tableau Prep Tool

Tableau Project Maestro

Data preparation is an iterative and essential process of exploring, combining, cleaning and transforming raw data into curated datasets for data integration, data science, data discovery and business intelligence analytics. It plays a key role in any business decision that relies on data. Data preparation tools make this work easier for analysts and businesses alike.

In practice, people spend up to 80% of their time preparing data and only 20% analyzing it. This led the people at Tableau to look for a way to make preparation easier, which they did with Project Maestro. The tool was later renamed Tableau Prep and released recently. Let’s explore Tableau Prep and get to know its capabilities.

Tableau Prep Tool

Tableau Prep is a new tool from Tableau that helps people combine, shape and clean their data quickly for analysis. It takes the struggle out of complex data preparation tasks such as joins, unions, pivots and aggregations by offering drag-and-drop flexibility without any scripting.

Some helpful features of this new tool are:

  • Quick text cleaning.
  • Fast visual filters.
  • Debugging made simpler.
  • Remove columns or steps using one click.

Now let’s dive into Tableau Prep to see how it works. To do that, we need to download and install it from here. Download the data from this link to start our journey!

Open Tableau Prep after installation to see its opening screen.

In the connections section, Tableau Prep lets you connect directly to files on a server, CSV files and databases. Tableau provides a wide range of options for connecting to your data.

Now load our Excel file to get started. This creates the input step, which can be configured to read a single table or to combine multiple tables with the union option, which picks tables by matching patterns. On the right, we can see the list of fields in that table.
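
The pattern-based union of tables can be pictured outside Tableau Prep with a few lines of Python. This is only a minimal sketch, not how Tableau Prep implements it: the sheet names, the `listings_*` pattern, the row data and the `wildcard_union` helper are all made-up assumptions, and the standard `fnmatch` module stands in for the matching-pattern option.

```python
from fnmatch import fnmatch

# Hypothetical tables keyed by sheet name (illustrative data only).
tables = {
    "listings_2016": [{"city": "Austin", "beds": 2}],
    "listings_2017": [{"city": "Boston", "beds": 1}],
    "notes": [{"comment": "ignore me"}],
}

def wildcard_union(tables, pattern):
    """Stack the rows of every table whose name matches the pattern."""
    rows = []
    for name in sorted(tables):
        if fnmatch(name, pattern):
            for row in tables[name]:
                # Record which table each row came from.
                rows.append({**row, "source_table": name})
    return rows

unioned = wildcard_union(tables, "listings_*")
print(len(unioned), "rows from", sorted({r["source_table"] for r in unioned}))
```

The `notes` sheet is left out because its name does not match the pattern, which is the same idea as restricting the input step to matching tables.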

From the flow pane we can proceed by adding a step or another operation such as a join or union. Here we add one more step to move forward and see how it shows the data it contains.

Next we add a simple cleaning step, after which the profile pane and data grid appear below the flow pane. Using this sheet, we are going to learn how the Group & Replace feature works in Tableau Prep.

The profile pane shows every column in the table along with the distribution of its values, while the data grid below it shows the row-level detail of the data. The Group & Replace feature in Tableau Prep helps us group multiple values and replace them with a single value. In the profile pane we can see that the columns contain the same values written under different names, so we have to clean the data to make it consistent. We start from the menu of the first column, Airbnb listings.

Select the Group & Replace option, then Manual Selection, from the menu.

After choosing Manual Selection, a new window opens with all the values of the column. Whichever value we select first becomes the replacement value.

Select all the values that should be grouped, and the editor shows a preview of the new value and the remaining values. A paper clip icon marks a value as grouped. Do the same for the remaining values to make this column consistent.

If we want to add new values, such as beds, to the bed group, we can add them manually using the + icon next to the editor.
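
Conceptually, manual Group & Replace is a lookup from each grouped member to its replacement value. The following is a minimal Python sketch of that idea, not Tableau Prep's implementation: the sample values, the `groups` dictionary and the `group_and_replace` helper are hypothetical names chosen for illustration.

```python
# Hypothetical raw values for one column (illustrative only).
values = ["bed", "Bed", "beds", "sofa", "Sofa bed", "bed"]

# Manual groups: the key is the replacement value, the set holds its members.
groups = {"bed": {"bed", "Bed"}}

# Adding a value to an existing group, like the + icon in the editor.
groups["bed"].add("beds")

def group_and_replace(values, groups):
    """Replace every grouped member with its group's replacement value."""
    lookup = {
        member: target
        for target, members in groups.items()
        for member in members
    }
    # Values that belong to no group pass through unchanged.
    return [lookup.get(v, v) for v in values]

cleaned = group_and_replace(values, groups)
print(cleaned)
```

Values such as `sofa` that were never added to a group are left as they are, just as ungrouped values stay untouched in the editor.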

We are now done with grouping and replacing in the Airbnb listings column. Let’s move on to the next column, Misspellings, which contains values with various spelling mistakes. This step works much like a spell-checker: we use the Pronunciation option in the Group & Replace menu, which uses an algorithm to make the necessary corrections.

It automatically corrects the spelling and sorts the values into groups, with the correct value as the replacement value. We can change the replacement value if we want by selecting it and using the edit option.
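
To give a feel for how grouping by pronunciation can work, here is a minimal Python sketch. This article does not say which phonetic algorithm Tableau Prep uses, so the sketch uses classic Soundex purely as a simple stand-in, and it assumes the most frequent spelling in each group is the correct one. The city names and the helper names `soundex` and `group_by_sound` are made up for illustration.

```python
from collections import Counter, defaultdict

# Soundex digit for each consonant; vowels, h, w and y have no digit.
CODES = {
    c: d
    for d, letters in {
        "1": "bfpv", "2": "cgjkqsxz", "3": "dt",
        "4": "l", "5": "mn", "6": "r",
    }.items()
    for c in letters
}

def soundex(word):
    """Classic four-character Soundex code (a stand-in phonetic key)."""
    letters = [c for c in word.lower() if c.isalpha()]
    if not letters:
        return ""
    digits = []
    prev = CODES.get(letters[0], "")
    for c in letters[1:]:
        if c in "hw":
            continue  # h and w do not separate letters with equal codes
        code = CODES.get(c, "")
        if code and code != prev:
            digits.append(code)
        prev = code  # a vowel resets prev, so a repeated code counts again
    return (letters[0].upper() + "".join(digits) + "000")[:4]

def group_by_sound(values):
    """Replace each value with the most frequent value sharing its code."""
    groups = defaultdict(list)
    for v in values:
        groups[soundex(v)].append(v)
    best = {k: Counter(g).most_common(1)[0][0] for k, g in groups.items()}
    return [best[soundex(v)] for v in values]

# Hypothetical misspelled values (illustrative only).
cities = ["Seattle", "Seatle", "Seattle", "Seattel",
          "Chicago", "Chicaggo", "Chicago"]
print(group_by_sound(cities))
```

A phonetic key like this only catches misspellings that still sound alike, so some typos fall outside every group and need manual attention.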

We can also add values to a group manually and edit a group by selecting its name. Let’s move on to the last column, Name formatting, in which some values have the last name first. We can correct this with the Common Characters option in the Group & Replace menu, which uses an n-gram algorithm to recognize shared characters and group the values.

Here the most frequently occurring value is taken as the replacement value, and the remaining values are grouped under it. No algorithm is 100% perfect, so we recheck the result manually to make sure no values are incorrectly grouped.
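
The idea of grouping by common characters can be sketched in Python with an n-gram fingerprint: normalize each value, collect its distinct n-grams, and treat values with the same sorted set as one group. This is a rough sketch under stated assumptions, not Tableau Prep's actual implementation: the choice of `n=1`, the sample names and the helper names `ngram_fingerprint` and `group_by_characters` are illustrative.

```python
from collections import Counter, defaultdict

def ngram_fingerprint(value, n=1):
    """Sorted, de-duplicated n-grams of the value's letters and digits."""
    chars = "".join(c for c in value.lower() if c.isalnum())
    grams = {chars[i:i + n] for i in range(len(chars) - n + 1)}
    return "".join(sorted(grams))

def group_by_characters(values, n=1):
    """Replace each value with the most frequent value sharing its key."""
    groups = defaultdict(list)
    for v in values:
        groups[ngram_fingerprint(v, n)].append(v)
    # The most frequently occurring value becomes the replacement value.
    best = {k: Counter(g).most_common(1)[0][0] for k, g in groups.items()}
    return [best[ngram_fingerprint(v, n)] for v in values]

# Hypothetical names, some with the last name first (illustrative only).
names = ["John Smith", "Smith, John", "John Smith",
         "Jane Doe", "Doe, Jane", "Jane Doe"]
print(group_by_characters(names))
```

With `n=1`, any two values built from the same set of characters share a key, even if they are unrelated, which is one reason the grouped result still deserves a manual check.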

Sometimes the algorithm misses some values, and we can group them manually by selecting and adding them.

If there are any typos, we have to correct them manually, and Tableau Prep then automatically groups them with the correct value.

We have to correct it by typing the value manually.

This was a simple example of how we can use Tableau Prep to prepare our data for analysis.

Conclusion

In this article I walked through one example of data preparation using this tool. There are many more options to explore, such as joins, aggregates and pivots. Tableau Prep is compatible with all Tableau versions from 10.0 onwards, which makes it easy for people to dive into their data. The tool also lets us publish our output to a server or use it directly with other Tableau products, which makes it even easier to work with. Keep practicing to get more out of your data.

“Happy Data Visualization!!”

Author Bio

This article was contributed by Perceptive Analytics. Juturu Pavan, Jyothirmai Thondamallu, Saneesh Veetil and Chaitanya Sagar contributed to this article.

Perceptive Analytics provides Tableau Consulting, data analytics, business intelligence and reporting services to e-commerce, retail, healthcare and pharmaceutical industries. Our client roster includes Fortune 500 and NYSE listed companies in the USA and India.

About the Author: Rajeev Pandey

I am a multidisciplinary designer working in data visualization, interaction design and innovation, with expertise in developing Tableau and WebFOCUS based visualization and reporting applications. I have a passion for analyzing, dissecting and manipulating data sets, as well as building beautiful dashboards. I am naturally talented at bridging technology and business needs, and experienced in many different domains. I am a quick learner who can absorb new ideas and communicate clearly and effectively. I love creativity and enjoy experimenting with various technologies.
