SQL - Order of Execution Benefits

Understanding how to optimize load processes in HQL/SQL for a better utilization of your cluster.

From time to time I dump some random knowledge to colleagues and I thought this is maybe also relevant for you : )

SQL is a highly structured language, which means it is important to stick to the rules. On the other hand, everything that is not forbidden — is allowed.

Usually, you will order a query like this:

  1. SELECT
  2. FROM
  3. JOIN or OUTER JOIN with ON
  4. WHERE
  5. GROUP BY and optionally HAVING
  6. ORDER BY

But the order of execution is slightly different:

  1. FROM
  2. JOIN or OUTER JOIN with ON
  3. WHERE
  4. GROUP BY and optionally HAVING
  5. SELECT
  6. ORDER BY

Since the FROM is executed before the SELECT we can use the data for multiple operations — at once!

Imagine you work with a big table, like user events, click or access logs and you want to extract data that can not be filtered out by pushing down predicates. This means for every time you want to extract a specific set of data, a full table scan is performed. That is quite expensive, slow, and can be avoided.

Let us populate two tables from the same source. The source tables contain 1000000000 entries.

So in total, we had to iterate twice over the same data source which is an unnecessary waste of resources.

We can optimize that, by merging both queries into one.

By making use of the order of execution, we half the amount of entries we iterate and therefore can save resources on our cluster.

You will notice that the data is still moved in a sequence way after the first steps are executed. To enable the parallel execution on a cluster, you have to enable parallel execution for Hive.

You can see, with this simple example already how you half the amount of processed data. I hope this will help anyone in the future. : )

Data Engineer with a passion for games and beer ;D

Get the Medium app

A button that says 'Download on the App Store', and if clicked it will lead you to the iOS App store
A button that says 'Get it on, Google Play', and if clicked it will lead you to the Google Play store