Posted On: Dec 21, 2017
Amazon Redshift now uses late materialization to reduce the amount of data scanned and improve performance for queries with predicate filters.
Late materialization (row-level filtering) reduces I/O for queries with filters by batching and factoring in the filtering of predicate(s) before fetching data blocks in the next column. For example, consider a query with filters on two columns operating on a CUSTOMER_DETAIL table having a million rows:
SELECT FIRST_NAME, LAST_NAME, EMAIL, CITY, SALES_REP, FROM CUSTOMER_DETAIL WHERE CUSTOMER_SINCE_DATE < ’01-01-2000’ AND CUSTOMER_STATUS_LEVEL IN (’Gold’ , ’Silver’) ;
With late materialization, Redshift fetches a batch of data from the CUSTOMER_SINCE_DATE and CUSTOMER_STATUS_LEVEL columns, then applies the respective predicates. If only 10% of the CUSTOMER_DETAIL table rows satisfy the predicate filters, Redshift can potentially save 90% of the I/O for remaining columns, thus improving query performance. Also, by scanning columns in batches rather than a row at a time, performance is further improved.
With this release, we have added a new ‘is_rlf_scan’ column to the STL_SCAN table. If a query uses late materialization, then this column will have a value of 't' or TRUE; otherwise, it will have a value of 'f' or FALSE.
Amazon Redshift enables late materialization by default. It is automatically triggered for a query when Redshift detects that the query can improve performance. With the latest build 1.0.1583, late materialization is available in all AWS regions.