Fuzzy Joins for Grouping Values Into Buckets

At my company we often work with panel data and need to be able to easily group individual panelists into buckets or profiles.  On a recent project I needed to identify which panelists were Light, Medium, Heavy or Extreme buyers within specific product categories based on how many times they purchased that category each year.

The criteria for what constitutes a certain buyer group were different for each category, so I couldn’t just use a simple mathematical formula to figure out which group each buyer falls into.  Instead I needed to compare my panelist data with a criteria file to identify each user’s group.  Since this kind of task is commonly encountered and cannot be solved via a simple join, I thought I’d write a quick post explaining how I solved this issue with Alteryx.
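To make the idea concrete outside of Alteryx, here is a minimal Python sketch of this kind of range-based ("fuzzy") lookup.  It is not the Alteryx workflow from the post; the category name, the thresholds and the `Criterion` and `assign_group` names are all hypothetical, made up purely for illustration.

```python
from typing import NamedTuple, Optional


class Criterion(NamedTuple):
    """One row of a (hypothetical) criteria file."""
    category: str
    group: str
    min_count: int  # inclusive lower bound on yearly purchases
    max_count: int  # inclusive upper bound on yearly purchases


# Hypothetical thresholds; a real criteria file would differ per category.
CRITERIA = [
    Criterion("Coffee", "Light", 1, 2),
    Criterion("Coffee", "Medium", 3, 5),
    Criterion("Coffee", "Heavy", 6, 11),
    Criterion("Coffee", "Extreme", 12, 10**9),
]


def assign_group(category: str, transactions: int,
                 criteria: list) -> Optional[str]:
    """Return the buyer group whose range contains the transaction count."""
    for c in criteria:
        if c.category == category and c.min_count <= transactions <= c.max_count:
            return c.group
    return None  # no matching range for this category/count


# Hypothetical panelist rows: (user ID, category, transaction count).
panelists = [(101, "Coffee", 2), (102, "Coffee", 7), (103, "Coffee", 40)]
groups = {uid: assign_group(cat, n, CRITERIA) for uid, cat, n in panelists}
```

The match is on the category plus a range test rather than on equality, which is why a plain equality join is not enough here.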

Below is a screenshot of my panelist data showing the transaction count for each user ID/Category combination.

Continue reading “Fuzzy Joins for Grouping Values Into Buckets”

Calculating Correlations and R Squared Values with Alteryx

I’ve been busy on a bunch of big projects lately as you can probably tell by my neglected blog.  Today I tried a new tool and thought I’d share how it works and how it can be improved on using a batch macro.  My goal was to compute a bunch of correlation factors using Alteryx.  Below is a sample of my data:

As you can see, each product has five weeks of data.  I wanted to correlate each of my five factor columns to the Dollar Share column to see which factor was most predictive of the dollar share.  The goal was to return the R Squared value of each correlation, just as you could easily do in Excel using the RSQ() function.  The R Squared function in Excel is built on the Pearson Product Moment Correlation function.  I found the Pearson Correlation tool in the Data Investigation tab in Alteryx.  Once that was dropped into the workflow, I selected the variables I wanted to correlate.
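For readers who want to see the arithmetic, here is a rough Python sketch of the relationship described above: R Squared is simply the Pearson correlation coefficient squared.  This is plain Python rather than the Alteryx tool, and the sample numbers and factor names are hypothetical stand-ins for the data in the screenshot.

```python
import math


def pearson(xs, ys):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length (at least 2)")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant column")
    return cov / math.sqrt(var_x * var_y)


def r_squared(xs, ys):
    """R Squared is the square of the Pearson correlation."""
    return pearson(xs, ys) ** 2


# Hypothetical five weeks of data for one product.
dollar_share = [10.0, 12.0, 11.0, 15.0, 14.0]
factors = {
    "Factor 1": [1.0, 2.0, 2.5, 4.0, 3.5],
    "Factor 2": [9.0, 3.0, 7.0, 4.0, 8.0],
}

# Score every factor column against Dollar Share and pick the strongest.
scores = {name: r_squared(values, dollar_share)
          for name, values in factors.items()}
most_predictive = max(scores, key=scores.get)
```

Repeating this for each product is the kind of per-group loop a batch macro can take care of.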

Continue reading “Calculating Correlations and R Squared Values with Alteryx”

Dynamic Field Renaming in Alteryx

In my work at TABS Analytics, I often find myself building re-usable Alteryx workflows for processing sales data coming from syndicated sources such as Nielsen or IRI.  The goal is to build tools that can be applied to data coming from multiple different clients.  Though the basic content of the data is the same, the field names and format are often different depending on the client.  A basic measure such as sales dollars could come with many different titles (Dollars, Sales Dollars, DOLLARS, Value Sales, etc.).  In this example I’m going to walk through how I use the Dynamic Rename tool to “iron out” these variations up front to avoid errors downstream.

Imagine that we have two different data sources with slightly different field names as shown below.

The goal is to build a workflow that can load both data sources, properly identify the fields, and process without errors.  In this case, the first thing we’ll need to do is rename the fields so that they are consistently identified throughout the rest of the workflow.  The standard Select tool won’t work since it will throw errors if you try to map two different raw field names to the same output name (i.e. mapping Dollars and Sales Dollars to both be renamed as just Dollars).
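The underlying idea is a lookup table that maps every known variation of a field name to one standard name.  Here is a small Python sketch of that idea; it is not how the Dynamic Rename tool is configured, and the `ALIASES` table and `standardize_fields` function are hypothetical names (the "Units" entries are made up, the dollar variations come from the examples above).

```python
# Map each known variation (lower-cased) to one standard field name.
ALIASES = {
    "dollars": "Dollars",
    "sales dollars": "Dollars",
    "value sales": "Dollars",
    "units": "Units",        # hypothetical extra measure
    "unit sales": "Units",   # hypothetical extra measure
}


def standardize_fields(field_names):
    """Rename known variations; leave unrecognized field names untouched."""
    return [ALIASES.get(name.strip().lower(), name) for name in field_names]


# Two hypothetical sources with slightly different headers.
source_a = ["Product", "Week", "Sales Dollars"]
source_b = ["Product", "Week", "DOLLARS"]

renamed_a = standardize_fields(source_a)
renamed_b = standardize_fields(source_b)
```

Because unknown names pass through unchanged, a missing variation does not raise an error; it just leaves that field under its raw name until the lookup table is extended.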

Continue reading “Dynamic Field Renaming in Alteryx”

Alteryx 10.0 Review

I just upgraded to Alteryx 10.0 today and wanted to share my findings.  Overall the upgrade was quite seamless, though it does require uninstalling the prior version, which effectively prevents true side-by-side comparisons on the same box.  Below are some key new features that jumped out at me.

Browse Anywhere

Probably the most popular feature everyone was looking forward to was the new “Browse Anywhere” option which is indeed very handy.  You can now view a sample of data flowing through any tool by simply selecting the tool and viewing the Results panel (formerly called output).  Note how the results screen now has several icons on the left that represent different viewing tabs.

Continue reading “Alteryx 10.0 Review”

Spreading Time Periods with Alteryx

One challenge most data integrators eventually face is the need to spread data down to different time periods.  This could mean spreading weekly data down to days, or multi-week events down to individual weeks.  This can be challenging with traditional databases, since spreading typically involves generating an unspecified number of new rows of data, which SQL is not really designed to do.
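As a quick illustration of what "spreading" means, here is a Python sketch that turns one weekly row into seven daily rows.  It assumes the total is split evenly across the days, which may not match how you want to allocate in practice, and the `spread_evenly` name and sample values are hypothetical.

```python
from datetime import date, timedelta


def spread_evenly(start, days, total):
    """Generate one (date, value) row per day, splitting the total evenly."""
    if days < 1:
        raise ValueError("days must be at least 1")
    share = total / days
    return [(start + timedelta(days=i), share) for i in range(days)]


# One hypothetical weekly record: week starting 2015-01-05 with 70.0 in sales.
daily_rows = spread_evenly(date(2015, 1, 5), 7, 70.0)
```

The key point is that a single input row produces a variable number of output rows, driven by the length of the period.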

Continue reading “Spreading Time Periods with Alteryx”

Hacking Alteryx – Editing XML Directly

Ok, ok, maybe this isn’t technically hacking, but it sure feels like it!  While at the Alteryx Inspire conference this year I learned that you can actually access the XML behind any workflow tool, effectively bypassing the GUI entirely.  Granted, you won’t need to do this very often, since they do have a good GUI, but there are cases where it sure comes in handy.  In one of my recent posts, I showed how this can be a big help with the select tool, and today I’ll show how I just used it on the formula tool.

Continue reading “Hacking Alteryx – Editing XML Directly”

Alteryx Inspire 2015 Highlights

I just got back from a great week in Boston at the Inspire conference.  It was exciting being around 800+ other data enthusiasts, though I have to say it was humbling too, seeing all the great work everyone else is doing.

Here’s a snapshot of me and my coworkers at the conference:

I especially enjoyed a talk by Chris Love and Team from the Information Lab, showcasing projects they’ve done with Alteryx.  It was inspiring seeing how Alteryx is being used to pull together such a broad range of data to create visualizations that are much more than the sum of their parts.

Continue reading “Alteryx Inspire 2015 Highlights”

Automate Populating Select Lists in Alteryx

In today’s post and video I’m going to share a quick way to rapidly populate the select tool in Alteryx if you need to change the data type or rename a bunch of fields in your data stream.  While Alteryx’s visual interface does result in extremely simple and intuitive workflows, one downside for hard-core developers is the loss in productivity experienced by using a GUI vs. generating code directly.  Fortunately there are some workarounds…

Imagine that you have a dataset with over 100 generic column names.  You already know the final set of column names that you’d like to use; the question is how you can quickly overlay the new names over the old column names in your data.  Below is a screenshot showing the normal manual method for renaming each field or changing the data type using the select tool.

Of course that’s all well and good, except you don’t want to have to manually type all the new names in.  If you want to do a bulk change, then you can use the Save Field Configuration option.
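Conceptually, a bulk rename like this is just a positional overlay: the first new name replaces the first old name, the second replaces the second, and so on.  Here is a tiny Python sketch of that mapping.  It does not read or write an Alteryx field configuration file; the `overlay_names` function and the sample field names are hypothetical.

```python
def overlay_names(old_names, new_names):
    """Pair old and new column names by position into a rename mapping."""
    if len(old_names) != len(new_names):
        raise ValueError("old and new name lists must be the same length")
    return dict(zip(old_names, new_names))


# Hypothetical generic names and the descriptive names to overlay on them.
old = ["Field_1", "Field_2", "Field_3"]
new = ["Product", "Week", "Dollars"]
rename_map = overlay_names(old, new)
```

Checking that both lists are the same length up front guards against silently shifting every name by one position.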

Continue reading “Automate Populating Select Lists in Alteryx”

Converting Data Types in Alteryx

One of the most common tasks in any ETL project is converting data types.  When data begins in the form of a text file or (even worse) a formatted Excel file, one often has to load the data as text fields initially and convert to proper data types after parsing out the relevant data.

Thankfully Alteryx has a lot of built in features to facilitate this.  One is the Multi-Field tool of which I gave an overview in my earlier post.  This handy tool lets you apply the same function across a whole set of columns without having to laboriously replicate it separately for each column as you must do in SQL select statements.

In many cases, however, one doesn’t even need to use any explicit data conversion tool in Alteryx – instead, you can just use the handy select tool, which lets you easily specify the data type you want while also allowing you to rename fields.  In the screenshot below, I’m using a select tool to convert generic field names (i.e. Field_1, Field_2, etc.) into more descriptive fields while at the same time converting them to the desired data type.

While this tool works great most of the time, one always seems to run into a catch eventually as I did the other day.  I was trying to load some data from CSV files that had extremely precise numbers stored as text (such as “53.534859348793489433498”).  When I tried to use the simple select tool to convert these text fields to numbers, I received an error message at run time that the field I was trying to convert “had more precision than a double. Some precision was lost.”
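The same limitation is easy to reproduce outside of Alteryx, since a double can only hold roughly 15 to 17 significant digits.  The Python sketch below uses the sample value from above to show the loss; the `Decimal` comparison is only there to illustrate the difference, not to suggest how Alteryx handles it.

```python
from decimal import Decimal

raw = "53.534859348793489433498"

as_double = float(raw)     # stored as a 64-bit double, extra digits are lost
as_decimal = Decimal(raw)  # arbitrary-precision decimal, keeps every digit

print(repr(as_double))  # shorter than the original text
print(as_decimal)       # identical to the original text

# Converting the double back to an exact decimal shows it no longer
# matches the value that was in the file.
precision_lost = Decimal(as_double) != as_decimal
```

Whether that lost precision matters depends on the data; for most sales measures a double is more than enough, but it is worth knowing why the message appears.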

Continue reading “Converting Data Types in Alteryx”