Understanding how to optimize load processes in HQL/SQL for better utilization of your cluster.

From time to time I dump some random knowledge on colleagues, and I thought it might be relevant for you too : )

SQL is a highly structured language, which means it is important to stick to the rules. On the other hand, everything that is not forbidden is allowed.

Usually, you will write the clauses of a query in this order:

  1. SELECT
  2. FROM
  3. JOIN or OUTER JOIN with ON
  4. WHERE
  5. GROUP BY and optionally HAVING
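
To make the written order concrete, here is a small sketch that runs one query containing all of these clauses. The tables `customers` and `orders`, their columns, and the sample rows are made up for illustration, and Python's built-in sqlite3 is used only so the query can run anywhere without a cluster. The same clause order applies in HQL, though Hive may differ in details.

```python
import sqlite3

# Hypothetical tables and data, for illustration only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (id INTEGER PRIMARY KEY, country TEXT);
    CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, amount REAL);
    INSERT INTO customers VALUES (1, 'DE'), (2, 'DE'), (3, 'FR');
    INSERT INTO orders VALUES (1, 1, 50.0), (2, 1, 70.0), (3, 2, 20.0), (4, 3, 10.0);
""")

# The clauses appear in the order you would normally write them.
query = """
    SELECT   c.country, SUM(o.amount) AS total   -- what to return
    FROM     customers c                         -- base table
    JOIN     orders o ON o.customer_id = c.id    -- combine with a second table
    WHERE    o.amount > 15                       -- filter single rows
    GROUP BY c.country                           -- aggregate per country
    HAVING   SUM(o.amount) > 100                 -- filter whole groups
"""
rows = conn.execute(query).fetchall()
print(rows)
```

WHERE drops the single order of 10.0 before grouping, and HAVING then keeps only the countries whose remaining total is above 100.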

But the order of execution is slightly different:

  1. FROM
  2. JOIN

