
I’ve been pretty quiet lately for a good reason: I’ve been totally immersed in an exciting project migrating many of our systems over to a new in-memory database called Exasol. They claim to be the world’s fastest in-memory database, and so far they haven’t let me down. Complex data shaping tasks that used to take 1.5 hours in SQL Server now complete in 7 minutes or less! In this post I’ll walk you through some of the reasons we chose Exasol and what my migration experience has been like.
How I Heard About It
My background, like many, has always been in MS SQL. Once we began using Alteryx, I enjoyed the in-database tool integration it has with SQL Server and began migrating many of my complex workflows out of stored procedures and into Alteryx, primarily for the other benefits Alteryx provides (ease of use, maintainability, and adaptability). The beauty of this arrangement is that Alteryx essentially serves as an easy-to-use GUI that translates your business processes into a query that is then executed on the database itself.
Of course, one of the side benefits of porting your complex queries into In-Database Alteryx workflows is that you essentially become database-platform agnostic. Once all your actual logic is handled by Alteryx, the only things left in your database are the underlying data tables and a few simple queries or procedures for loading data. If you want to switch to a different database platform, all you have to do is migrate the data tables, re-point your Alteryx connections, and you’re basically good to go. No complex re-writing of SQL code needed. Once I realized this, I got excited to test out some new databases that would be better suited to our analytic processing needs.
Initially, I had been keeping a keen eye on Snowflake, a new cloud-native database system with a cool architecture that automatically scales up and down based on your usage. While Alteryx connects fine with Snowflake, it does not yet support in-database processing, which is obviously a requirement if you’re dealing with large-scale data workloads using the methods outlined above. Snowflake says In-Database Alteryx support should be available sometime this year. While I was waiting for that to come out, our good friends over at Interworks (a great consulting company we use regularly for Tableau support) began telling me about Exasol, a little-known startup based in Germany. Around the same time, the news came out that Alteryx had begun supporting in-database tools with Exasol, so I jumped at the chance to try it out.
Key Benefits of Exasol
Speed is obviously the biggest reason people choose in-memory databases. Exasol, like many other analytic databases, uses a columnar storage architecture that can be highly compressed to fit in memory. Based on my testing so far, the speed difference vs. a traditional SQL database is really impressive. Most queries are nearly instantaneous, and the speed hardly seems affected by the number of rows in your table. In fact, the manual specifically says there is no need to archive old data, as it will not negatively affect performance and may even help optimize the indexes in some cases.
The self-maintaining features are another big benefit in my opinion. Gone are the days of setting up traces, running query optimizers, adding and removing indexes, and all the other old DBA tasks common to SQL Server and Oracle. All of those “necessary evils” are handled automatically in the background by Exasol, freeing up your time to focus on the actual data and analysis.
I think the third reason many will like Exasol is its native support for modern languages like Lua, Python, R, and many other common languages, which can easily be used to create procedures (called scripts) and functions. This means that if you need to handle more complex logic in your stored procedures, you can do so in a more natural way instead of being limited by the fairly cumbersome structure of T-SQL. By default, scripts in Exasol are written in Lua, which is actually quite nice.
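As a rough illustration (the schema, table, and script names here are made up, and the exact scripting syntax is worth double-checking against the Exasol manual), a simple Lua script might look something like this:

    -- Hypothetical example: a Lua script that purges rows older than a cutoff date.
    -- Note the lower-case Lua keywords (local/if/then/end).
    CREATE OR REPLACE SCRIPT STAGING.PURGE_OLD_ROWS (cutoff_date) AS
        local res = query([[
            SELECT COUNT(*) AS CNT FROM STAGING.SALES_FACT WHERE SALE_DATE < :d
        ]], {d = cutoff_date})
        if res[1].CNT > 0 then
            query([[DELETE FROM STAGING.SALES_FACT WHERE SALE_DATE < :d]], {d = cutoff_date})
        end
    /

    -- Run it from EXAplus (don't forget the trailing semicolon):
    EXECUTE SCRIPT STAGING.PURGE_OLD_ROWS ('2016-01-01');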
Things We Encountered During our Migration
One noticeable difference with Exasol is that it has fewer official data types. Most numbers (including integers) are simply stored as a DECIMAL data type; for integers you just specify a scale of zero. For more on this, visit my thread on their forum.
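For example (the table and column names here are hypothetical), an integer column is just a DECIMAL with a scale of zero:

    -- A hypothetical staging table: ORDER_ID and QTY behave like integers,
    -- while UNIT_PRICE is a true decimal with four decimal places.
    CREATE TABLE STAGING.ORDERS (
        ORDER_ID   DECIMAL(18,0),
        QTY        DECIMAL(9,0),
        UNIT_PRICE DECIMAL(12,4),
        ORDER_DATE DATE
    );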
Exasol’s EXAplus SQL client is quite intuitive and easy to use (though not quite as graphical as SSMS). One thing that took a little getting used to is that you must add a semicolon (;) at the end of every SQL command before it will let you run it.
Another big adjustment coming from SQL Server is that Exasol is case sensitive, both in terms of the data and the scripting and SQL language itself. Because of this, it’s best to convert all your table and column names to UPPER CASE. Unquoted identifiers are folded to upper case before execution, so if a table or field name was created in lower case, your query will error unless you surround the name in double quotes. You’ll also want to watch out for case differences when trying to join two text fields. When writing scripts in Lua, keywords must be lower case; I found this out the hard way after wasting time trying to debug a simple script in which I had used the word “IF” rather than “if”.
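A quick SQL illustration of the identifier and data case-sensitivity points (the table and column names are made up):

    -- Unquoted identifiers are folded to upper case, so these two are equivalent:
    SELECT customer_name FROM staging.customers;
    SELECT CUSTOMER_NAME FROM STAGING.CUSTOMERS;

    -- A quoted identifier keeps its case, so this only works if the column
    -- was actually created as lower-case "customer_name":
    SELECT "customer_name" FROM STAGING.CUSTOMERS;

    -- The data itself is case sensitive too, so normalize text keys when joining:
    SELECT o.ORDER_ID, r.REGION_NAME
    FROM STAGING.ORDERS o
    JOIN STAGING.REGIONS r
      ON UPPER(o.REGION_CODE) = UPPER(r.REGION_CODE);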
The EXAplus SQL client has a simple SQL command (IMPORT) to load data from a local or remote text file, which seems to be the fastest way to get bulk data in and out of Exasol. Alteryx unfortunately does not yet have a “Bulk Output” tool for Exasol as it does for MS SQL Server, which means the only out-of-the-box option is to write data over the normal ODBC connection in Alteryx, which is very slow. My solution was to build a custom macro in Alteryx that writes my data to a local text file and then uses the EXAplus command-line interface to load the data into the appropriate table. This macro is finally working fairly smoothly for me and will probably be the subject of a future blog post, but my hope is that Alteryx will soon develop its own Bulk Output tool, which would be far simpler.
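For reference, the IMPORT statement mentioned above looks roughly like this (the file path and table are hypothetical, and the full option list is in the Exasol manual):

    -- Load a local CSV (e.g. one written by Alteryx) into an existing table via EXAplus.
    IMPORT INTO STAGING.ORDERS
    FROM LOCAL CSV FILE 'C:\exports\orders.csv'
    COLUMN SEPARATOR = ','
    SKIP = 1;                  -- skip the header row

    -- EXPORT works the same way in reverse for getting bulk data back out:
    EXPORT STAGING.ORDERS
    INTO LOCAL CSV FILE 'C:\exports\orders_backup.csv';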
Once your data tables have been moved into Exasol, you can re-use any existing Alteryx In-DB workflows you may have by simply installing the Exasol ODBC driver and connectors and re-pointing your In-DB connections to the new server. You may find that you need to go through parts of your workflow and adapt language-specific formulas, such as date comparison formulas you may have set up in the In-DB Formula tool in Alteryx. If you have a really long, complex In-DB workflow, it can be helpful to add some “Write Data” tools at strategic points to cache the data in temp tables. This greatly simplifies the query execution plan and can speed up performance despite the added overhead of writing to and reading from a temp table.
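As one hypothetical example of the kind of formula that needs adapting, here is a SQL Server style date filter and an Exasol equivalent:

    -- SQL Server flavor of a "last 30 days" filter:
    --   WHERE ORDER_DATE >= DATEADD(day, -30, GETDATE())

    -- The same filter rewritten for Exasol:
    SELECT *
    FROM STAGING.ORDERS
    WHERE ORDER_DATE >= ADD_DAYS(CURRENT_DATE, -30);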
Overall, the migration went quite smoothly and I was able to migrate production processes within weeks, not months. There’s a bunch more I could tell you, but I’d better stop here for now. In future posts I’ll cover how to size your Exasol instance, things from MS SQL that I wish Exasol had, and a walk-through of the EXAplus tool.
As of August 2017, Alteryx supports In-DB for Snowflake. https://community.alteryx.com/t5/Analytics-Blog/Snowflake-In-DB-Functionality-is-Now-Available-Making-11-0-Even/ba-p/77268
Given this information, which would be better to start with, Exasol or Snowflake?
Exasol, definitely. I fought for 5 years to get Exasol into my current company. It took one day to make believers of everyone. Snowflake and other cloud solutions have lower initial costs to get going, but very quickly end up costing much more than a 4+1 cluster from Exasol (on commodity hardware). And *nothing* will match the performance of an Exasol cluster.
I agree Exasol will probably always be faster since it’s all in-memory, but if you have a lot of data, it doesn’t scale as well as Snowflake. We’ve begun migrating to Snowflake because it scales much more easily and so far is costing us a lot less than Exasol, since our usage is cyclical.
I think Exasol will still be faster than Snowflake, but Snowflake may be more cost-effective depending on your workload. I really haven’t been able to test Snowflake yet, so unfortunately it’s tough to do a true comparison. I’d be interested in your thoughts if you end up trying Snowflake.
Can anyone reply with what they think about Exasol vs. Snowflake: company, products, innovations, ease of use, deployment time, key differences, costs, and customer support? Thank you in advance.
Craig, we’ve begun converting to Snowflake. Their product is still a little rough around the edges, but overall we really like it and believe in their long-term roadmap and model. It’s more scalable than Exasol since you only pay for the minutes you use and there’s no need to ever upgrade the underlying hardware cluster as you grow. For a cyclical user (which is common for data warehouses), I think Snowflake will be quite a bit cheaper in the long run. I hope to post many more articles on our experience as we continue the conversion going forward.
Haven’t been here in a while, but I felt I needed to respond to the more recent posts. We did a POC with Snowflake and had a Snowflake tech on speed dial for the duration. Snowflake was slow, and they could do nothing to speed it up. Conversely, you don’t need to do anything to speed up Exasol: just load the data and go. Exasol is NOT an on-prem-only option; you have more licensing options with Exasol. It scales just fine as SaaS, and you only pay for what you use. Lastly, a modern data warehouse is going to be used 24x7 to integrate and serve data to other systems.