I’ve been busy on a bunch of big projects lately as you can probably tell by my neglected blog. Today I tried a new tool and thought I’d share how it works and how it can be improved on using a batch macro. My goal was to compute a bunch of correlation factors using Alteryx. Below is a sample of my data:
As you can see, each product has five weeks of data. I wanted to correlate each of my five factor columns to the Dollar Share column to see which factor was most predictive of the dollar share. The goal was to return the R Squared value of each correlation just as you could easily do in Excel using the RSQR() function. The R Squared function in Excel is built on the Pearson Product Moment Correlation function. I found the Pearson Correlation tool in the Data Investigation tab in Alteryx. Once that was dropped into the workflow, I selected which variables I wanted to compute a correlation.
Ok, ok, maybe this isn’t technically hacking, but it sure feels like it! While at the Alteryx Inspire conference this year I learned that you can actually access the XML behind any workflow tool effectively bypassing the GUI entirely. Granted, you won’t need to do this very often, since they do have a good GUI, but there are cases where it sure comes in handy. In one of my recent posts, I showed how this can be a big help with the select tool, and today I’ll show how I just used it on the formula tool.
One of the most common tasks in any ETL project is converting data types. When data begins in the form of a text file or (even worse) a formatted excel file, one often has to load the data as text fields initially and convert to proper data types after parsing out the relevant data.
Thankfully Alteryx has a lot of built in features to facilitate this. One is the Multi-Field tool of which I gave an overview in my earlier post. This handy tool lets you apply the same function across a whole set of columns without having to laboriously replicate it separately for each column as you must do in SQL select statements.
In many cases, however, one doesn’t even need to use any explicit data conversion tool in Alteryx – instead, you can just use the handy select tool which let’s you easily specify the data type you want while also allowing you to rename fields. In the screenshot below, I’m using a select tool to convert generic field names (i.e. Field_1, Field_2, etc.) into more descriptive fields while at the same time converting them to the desired data type.
While this tool works great most of the time, one always seems to run into a catch eventually as I did the other day. I was trying to load some data from CSV files that had extremely precise numbers stored as text (such as “53.534859348793489433498”). When I tried to use the simple select tool to convert these text fields to numbers, I received an error message at run time that the field I was trying to convert “had more precision than a double. Some precision was lost.”
While Alteryx has many great functions right out of the box, I realized one thing they’re lacking is an increment function (or at least I haven’t found one). Anyone who’s worked with SQL is familiar with the Identity data type that allows you to specify a seed and an increment value as follows:
IDENTITY [ (seed ,increment) ]
This allows you to create a column that automatically generates new, unique integers by incrementing the starting “seed” value.
Occasionally when loading text-based keys or natural keys into a normalized database it’s helpful to be able to generate incremented IDs. Below is a snapshot of how I was able to accomplish this in Alteryx using the multi-row function.
The first data input is simply an excel file with a column of text values (item descriptions) as seen below.