I just got back from a great week in Boston at the Inspire conference. It was exciting being around 800+ other data enthusiasts, though I have to say it was humbling too, seeing all the great work everyone else is doing.
Here’s a snapshot of me and my coworkers at the conference:
I especially enjoyed a talk by Chris Love and Team from the Information Lab, showcasing projects they’ve done with Alteryx. It was inspiring seeing how Alteryx is being used to pull together such a broad range of data to create visualizations that are much more than the sum of their parts.
In today’s post and video I’m going to share a quick way to rapidly populate the select tool in Alteryx if you need to change the data type or rename a bunch of fields in your data stream. While Alteryx’s visual interface does result in extremely simple and intuitive workflows, one downside for hard-core developers is the loss in productivity experienced by using a GUI vs. generating code directly. Fortunately there are some workarounds…
Imagine that you have a dataset with over 100 generic column names. You already know the final set of column names that you’d like to use, the question is how can you quickly overlay the new names over the old column names in you data. Below is a screenshot showing the normal manual method for renaming each field or change the data type using the select tool.
Of course that’s all well and good, except you don’t want to have to manually type all the new names in. If you want to do a bulk change, then you can use the Save Field Configuration option.
One of the most common tasks in any ETL project is converting data types. When data begins in the form of a text file or (even worse) a formatted excel file, one often has to load the data as text fields initially and convert to proper data types after parsing out the relevant data.
Thankfully Alteryx has a lot of built in features to facilitate this. One is the Multi-Field tool of which I gave an overview in my earlier post. This handy tool lets you apply the same function across a whole set of columns without having to laboriously replicate it separately for each column as you must do in SQL select statements.
In many cases, however, one doesn’t even need to use any explicit data conversion tool in Alteryx – instead, you can just use the handy select tool which let’s you easily specify the data type you want while also allowing you to rename fields. In the screenshot below, I’m using a select tool to convert generic field names (i.e. Field_1, Field_2, etc.) into more descriptive fields while at the same time converting them to the desired data type.
While this tool works great most of the time, one always seems to run into a catch eventually as I did the other day. I was trying to load some data from CSV files that had extremely precise numbers stored as text (such as “53.534859348793489433498”). When I tried to use the simple select tool to convert these text fields to numbers, I received an error message at run time that the field I was trying to convert “had more precision than a double. Some precision was lost.”
While Alteryx has many great functions right out of the box, I realized one thing they’re lacking is an increment function (or at least I haven’t found one). Anyone who’s worked with SQL is familiar with the Identity data type that allows you to specify a seed and an increment value as follows:
IDENTITY [ (seed ,increment) ]
This allows you to create a column that automatically generates new, unique integers by incrementing the starting “seed” value.
Occasionally when loading text-based keys or natural keys into a normalized database it’s helpful to be able to generate incremented IDs. Below is a snapshot of how I was able to accomplish this in Alteryx using the multi-row function.
The first data input is simply an excel file with a column of text values (item descriptions) as seen below.
One common task most analysts encounter is the need to setup dynamic filters in their data workflows. In SQL this is usually accomplished via a sub-query which is joined in to a main query. In Alteryx, where each step is accomplished via dedicated tools or nodes, the process is similar but much more visual. Usually it involves some combination of the summarize, join, and filter tools.
Here at the TABS Group, we process lots of CPG sales data. Sometimes it comes in a weekly format and we often apply a threshold to remove items which have little to no sales over a two year period. Below is an example how this is easily accomplished via Alteryx.