BigQuery: SQL concepts
This article is a part of series where I would sum up some useful SQL concepts and tips that can be used in BigQuery.
Important SQL concepts that you should know for BigQuery:
- CTE (Common Table Expression)
A Common Table Expression (CTE) is the result set of a query which exists temporarily and for use only within the context of a larger query, usually starting with “WITH”. Much like a derived table, the result of a CTE is not stored and exists only for the duration of the query. CTEs, like database views and derived tables, enable users to more easily write and maintain complex queries via increased readability and simplification. This reduction in complexity is achieved by deconstructing ordinarily complex queries into simple blocks to be used, and reused if necessary, in rewriting the query.
- Window functions
A window function computes values over a group of rows and returns a single result for each row. This is different from an aggregate function, which returns a single result for a group of rows. A window function includes an OVER
clause, which defines a window of rows around the row being evaluated.
UNNEST
UNNEST operator is used to convert an ARRAY
into a set of rows, also known as “flattening”: it takes an ARRAY
and returns a table with a single row for each element in the ARRAY
.
STRUCT
vsARRAY
A STRUCT
is like a bundle of “columns” inside a single column. Each field within a STRUCT
has a mandatory data type and an optional name. A field within a STRUCT
can be an ARRAY
, but you’re still having one instance of each field.
An ARRAY
is a bundle of “rows” or a list inside a single row. All elements in an ARRAY
need to be of the same type, such as INT64
, STRING
, STRUCT
and so on. You cannot have an ARRAY
directly under another ARRAY, but you can have an ARRAY
of STRUCTS
which contains an array.
- Aggregations and pivoting
- User-defined functions
- Joins (inner, left)
- Views
- TABLESAMPLE
- Partitioning/Clustering/Sharding
Some useful SQL queries:
- Working with highest values in a table
To find the most expensive product in a table you can use a few different methods, like subquery, MAX_BY
or ANY_VALUE
:
- Extracting nth highest value from a table
A classic SQL interview question “How do you find the second highest salary from the employee table?” can be solved in a few ways. Using a subquery to filter out the maximum salary is a common approach. However, if you’d need to find nth highest salary, stacking multiple subqueries becomes both messy and impractical.
Another way to solve it is by using OFFSET
: to extract second highest salary from the table use OFFSET (1)
to skip the first result – the highest salary – and select the next salary in the ordered list. Similarly, to find the nth highest salary you can easily use OFFSET(n-1)
.