BigQuery: SQL concepts

This article is a part of series where I would sum up some useful SQL concepts and tips that can be used in BigQuery.

Important SQL concepts that you should know for BigQuery:

  • CTE (Common Table Expression)

A Common Table Expression (CTE) is the result set of a query which exists temporarily and for use only within the context of a larger query, usually starting with “WITH”. Much like a derived table, the result of a CTE is not stored and exists only for the duration of the query. CTEs, like database views and derived tables, enable users to more easily write and maintain complex queries via increased readability and simplification. This reduction in complexity is achieved by deconstructing ordinarily complex queries into simple blocks to be used, and reused if necessary, in rewriting the query. 

  • Window functions

A window function computes values over a group of rows and returns a single result for each row. This is different from an aggregate function, which returns a single result for a group of rows. A window function includes an OVER clause, which defines a window of rows around the row being evaluated.

  • UNNEST

UNNEST operator is used to convert an ARRAY into a set of rows, also known as “flattening”: it takes an ARRAY and returns a table with a single row for each element in the ARRAY.

  • STRUCT vs ARRAY

A STRUCT is like a bundle of “columns” inside a single column. Each field within a STRUCT has a mandatory data type and an optional name. A field within a STRUCT can be an ARRAY, but you’re still having one instance of each field.

An ARRAY is a bundle of “rows” or a list inside a single row. All elements in an ARRAY need to be of the same type, such as INT64, STRING, STRUCT and so on. You cannot have an ARRAY directly under another ARRAY, but you can have an ARRAY of STRUCTS which contains an array.

  • Aggregations and pivoting
  • User-defined functions
  • Joins (inner, left)
  • Views
  • TABLESAMPLE
  • Partitioning/Clustering/Sharding

Some useful SQL queries:

  • Working with highest values in a table

To find the most expensive product in a table you can use a few different methods, like subquery, MAX_BY or ANY_VALUE:

  • Extracting nth highest value from a table

A classic SQL interview question “How do you find the second highest salary from the employee table?” can be solved in a few ways. Using a subquery to filter out the maximum salary is a common approach. However, if you’d need to find nth highest salary, stacking multiple subqueries becomes both messy and impractical.

Another way to solve it is by using OFFSET: to extract second highest salary from the table use OFFSET (1) to skip the first result – the highest salary – and select the next salary in the ordered list. Similarly, to find the nth highest salary you can easily use OFFSET(n-1).

Leave a Reply Text

Your email address will not be published. Required fields are marked *

This site uses Akismet to reduce spam. Learn how your comment data is processed.