AWS Database Blog

Optimize your database storage for Oracle workloads on AWS, Part 2: Using hybrid partitioning and ILM data movement policies

This is the second post of a two-part series. In Part 1, we explored how you can use Automatic Data Optimization (ADO) and Oracle Information Lifecycle Management (ILM) policies for data compression. In this post, we demonstrate how to use Heat Map statistics to monitor data usage and integrate this information with hybrid partitioning and ILM data movement policies to move data to more cost-effective storage solutions.

Hybrid partitioning based on data access pattern

In this use case, you use Heat Map data tracking and hybrid partitioning to efficiently move data to external storage based on its age and usage patterns. To accomplish this, you use Heat Map statistics alongside a custom procedure. For Amazon Relational Database Service (Amazon RDS) for Oracle, you need to integrate your instance with Amazon Elastic File System (Amazon EFS). Then you create a custom procedure to migrate an internal partition to an external partition on Amazon EFS based on the condition you define. The following diagram illustrates this architecture.

[Architecture diagram: ILM-arch]

In this example, you migrate partitions that haven’t been accessed in the last 6 months using a condition such as FULL_SCAN < (sysdate - 180), where the FULL_SCAN column in the user_heat_map_segment view holds the timestamp of the most recent full table scan of the corresponding segment (a partition, in this example).
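As a quick illustration of that predicate, the following Python sketch applies the same 180-day age test to sample timestamps. This is illustrative only; the real filtering is done in SQL against user_heat_map_segment.

```python
from datetime import datetime, timedelta
from typing import Optional

def is_cold(full_scan: Optional[datetime], days: int, now: datetime) -> bool:
    """Mirror of the SQL predicate full_scan < (sysdate - days).
    A NULL (None) full_scan means the segment was never full-scanned,
    so the predicate is false, matching SQL three-valued logic."""
    return full_scan is not None and full_scan < now - timedelta(days=days)

now = datetime(2024, 7, 24)
print(is_cold(now - timedelta(days=240), 180, now))  # True: no full scan in 6 months
print(is_cold(now - timedelta(days=30), 180, now))   # False: recently scanned
print(is_cold(None, 180, now))                       # False: no Heat Map timestamp
```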

For Amazon RDS for Oracle, the implementation steps are as follows:

  1. Create an EFS file system and integrate Amazon EFS with the RDS for Oracle instance.
  2. Create an accessory Amazon Elastic Compute Cloud (Amazon EC2) instance that you will use as part of these instructions. Also, create a security group inbound rule to allow communication between the EC2 instance with Amazon EFS.
  3. On the AWS Management Console, from Amazon EC2, mount Amazon EFS by running the following code (replace fs-0cac1d632b3bf7f1c.efs.us-east-1.amazonaws.com with your EFS DNS name):
    sudo mount -t nfs -o nfsvers=4.1,rsize=1048576,wsize=1048576,hard,timeo=600,retrans=2,noresvport fs-0cac1d632b3bf7f1c.efs.us-east-1.amazonaws.com:/ /EFSroot
  4. As a user with execute privilege on rdsadmin.rdsadmin_util, create a directory for the EFS file system in Amazon RDS for Oracle (replace the p_path_on_efs value with your EFS identifier):
    SQL> BEGIN
           rdsadmin.rdsadmin_util.create_directory_efs(
             p_directory_name => 'DATA_PUMP_DIR_EFS',
             p_path_on_efs    => '/rdsefs-fs-0cac1d632b3bf7f1c');
         END;
         /

    If the directory was created as a different user, grant read and write access to the demo user:

    SQL> grant read, write on directory DATA_PUMP_DIR_EFS to demo;

    Use this directory to keep the files related to external partitions. You can keep files for multiple external partitions in the same Oracle directory.

    Internal partitioned tables combined with Oracle external partitioned tables form a more general partitioning method called hybrid partitioned tables. You can either use a hybrid partitioned table or create an external table to move your aged partition to an external partition. In this example, we use a hybrid partitioned table.

  5. If there are existing ILM policies defined on the table, delete them before converting to a hybrid partitioned table:
    SQL> ALTER TABLE DEMO.ORDERS ILM DELETE_ALL;
  6. Convert the table to a hybrid partitioned table by adding external partition attributes to the ORDERS table:
    SQL> ALTER TABLE DEMO.ORDERS
         ADD EXTERNAL PARTITION ATTRIBUTES
         ( TYPE oracle_datapump 
           DEFAULT DIRECTORY DATA_PUMP_DIR_EFS
           REJECT LIMIT UNLIMITED
         );
    Table altered.
    
    -- Check if table is converted to hybrid-partitioned table 
    
    SQL> SELECT TABLE_NAME, HYBRID FROM USER_TABLES WHERE TABLE_NAME ='ORDERS';
    
    TABLE_NAME	HYB
    --------------- ---
    ORDERS		YES
  7. Optionally, define ILM compression policies on the hybrid partitioned table, as explained in Part 1 of this series. For this use case, we assume that these ILM compression policies have already been applied, and the data is compressed based on its lifecycle.
  8. To identify suitable partitions for conversion to external partitions, evaluate your specific requirements and data access patterns. Use Heat Map statistics to pinpoint the last modification and last access dates for each partition. For example, you can find partitions that haven’t been accessed in the last 6 months with the following SQL condition:
    SQL> select object_name, subobject_name, segment_read_time, full_scan, lookup_scan
         from user_heat_map_segment
         where (object_name = 'ORDERS' and full_scan < (sysdate-180))
            or (EXISTS (select 1 from user_heat_map_segment
                        where object_name in (select index_name from user_indexes
                                              where table_name = 'ORDERS')
                          and lookup_scan < (sysdate-180)
                          and object_name = 'ORDERS' and full_scan < (sysdate-180)));
  9. For testing, access data from specific partitions to simulate actual access patterns for hot data; this updates their access timestamps in the Heat Map and prevents those partitions from being flagged for migration to external storage based on their last access date.
       SQL> select * from DEMO.ORDERS partition(ORDERS_HOT) where rownum < 2;
    
       SQL> select * from DEMO.ORDERS partition (ORDERS_WARM) where rownum < 2;
    
       SQL> select * from DEMO.ORDERS partition (ORDERS_COLD) where rownum < 2;
  10. Create a procedure to move the internal partition to an external partition based on the Heat Map tracking data. The following pseudocode outlines the logic of the move_to_extpartition procedure, which automates the process of identifying and moving older partitions to external partitions based on the Heat Map tracking data:
    Procedure move_to_extpartition(owner, table_name, days) 
    For each partition meeting the specified criteria (full_scan or lookup_scan older than certain days) 
    Create an external table, 
    exchange the partition with the external table, 
    and drop the external table 
    Handle exceptions 
        End Procedure 

    The following is example code:

    SQL> CREATE OR REPLACE PROCEDURE move_to_extpartition(
            p_owner IN VARCHAR2, 
            p_table_name IN VARCHAR2, 
            p_days IN NUMBER
            ) AUTHID CURRENT_USER
        IS 
          partition_name VARCHAR2(50); 
          ext_table VARCHAR2(50); 
          location VARCHAR2(100);
        BEGIN 
          DBMS_OUTPUT.PUT_LINE('Table: ' || p_table_name);
    
    FOR partition_rec IN (
          select object_name, subobject_name, segment_read_time, full_scan, lookup_scan
          from user_heat_map_segment
          where (object_name = p_table_name and full_scan < (sysdate-p_days))
             or (EXISTS (select 1 from user_heat_map_segment
                         where object_name in (select index_name from user_indexes
                                               where table_name = p_table_name)
                           and lookup_scan < (sysdate-p_days)
                           and object_name = p_table_name and full_scan < (sysdate-p_days)))
    )
    
        LOOP 
            partition_name := partition_rec.SUBOBJECT_NAME; 
            ext_table := p_owner || '.' || partition_name || '_EXT'; 
            location := partition_name || '_ext.dmp';
    
            -- Debugging outputs
            DBMS_OUTPUT.PUT_LINE('External Table: ' || ext_table);
            DBMS_OUTPUT.PUT_LINE('Partition Name: ' || partition_name);
            DBMS_OUTPUT.PUT_LINE('Location: ' || location);
    
            -- Create the external table 
            EXECUTE IMMEDIATE 'CREATE TABLE ' || ext_table || ' 
                               ORGANIZATION EXTERNAL 
                               ( 
                                 TYPE oracle_datapump 
                                 DEFAULT DIRECTORY DATA_PUMP_DIR_EFS
                                 LOCATION (''' || location || ''') 
                               ) 
                               REJECT LIMIT UNLIMITED 
                               AS SELECT * FROM ' || p_owner || '.' || p_table_name || ' PARTITION (' || partition_name || ')';
    
            -- Debugging output
            DBMS_OUTPUT.PUT_LINE('Created external table for partition: ' || partition_name);
    
            -- Exchange the partition with the external table 
            EXECUTE IMMEDIATE 'ALTER TABLE ' || p_owner || '.' || p_table_name || ' EXCHANGE PARTITION ' || partition_name || ' WITH TABLE ' || ext_table;
    
            -- Debugging output
            DBMS_OUTPUT.PUT_LINE('Exchanged partition: ' || partition_name);
    
            -- Drop the external table 
            EXECUTE IMMEDIATE 'DROP TABLE ' || ext_table || ' PURGE';
    
            -- Debugging output
            DBMS_OUTPUT.PUT_LINE('Dropped external table: ' || ext_table);
        END LOOP;
    EXCEPTION 
        WHEN OTHERS THEN 
            DBMS_OUTPUT.PUT_LINE('Error: ' || SQLERRM); 
    END;
    /
  11. To run the procedure, you need to pass the owner, table name, and NUM_OF_DAYS, which specifies the number of days a partition has gone without access:
    SQL> exec move_to_extpartition('<OWNER>', '<TABLENAME>', <NUM_OF_DAYS>);
  12. Schedule this procedure to run periodically to migrate older partitions. For rapid testing, we run it manually with a condition value of 0.01 days (approximately 15 minutes):
    -- Check that at least one partition in DEMO.ORDERS has NOT been read in the last 0.01 days (180 days in a real deployment).
    
     SQL> col object_name format a10
     SQL> col subobject_name format a15
     SQL> alter session set nls_date_format = 'Mon/DD/YYYY HH24:MI:SS';
     SQL> select object_name, subobject_name, segment_read_time, full_scan, lookup_scan
          from user_heat_map_segment
          where (object_name = 'ORDERS' and full_scan < (sysdate-0.01))
             or (EXISTS (select 1 from user_heat_map_segment
                         where object_name in (select index_name from user_indexes
                                               where table_name = 'ORDERS')
                           and lookup_scan < (sysdate-0.01)
                           and object_name = 'ORDERS' and full_scan < (sysdate-0.01)));
    
    SUBOBJECT_NAME   SEGMENT_READ_TIME     FULL_SCAN             LOOKUP_SCAN
    ---------------  --------------------  --------------------  --------------------
    ORDERS_HIST                            Jul/24/2024 16:26:27
    
    SQL> set serverout on
    SQL> exec move_to_extpartition('DEMO', 'ORDERS', .01);

    After an internal partition is successfully converted to an external partition, it no longer appears in the USER_SEGMENTS view. Additionally, the NUM_ROWS column in the user_tab_partitions view shows a null value for it, as shown in the following code:

    SQL> select SEGMENT_NAME , PARTITION_NAME , bytes/(1024*1024) "Size MB"
        from user_segments where SEGMENT_NAME  like 'ORDERS';
    
    SEGMENT_NAME         PARTITION_NAME          Size MB
    -------------------- -------------------- ----------
    ORDERS              ORDERS_COLD                  64
    ORDERS              ORDERS_HOT                  104
    ORDERS              ORDERS_WARM                  40
    
    SQL> select PARTITION_NAME, NUM_ROWS from user_tab_partitions where TABLE_NAME='ORDERS';
    
    PARTITION_NAME         NUM_ROWS
    -------------------- ----------
    ORDERS_HIST             
    ORDERS_COLD             2256769
    ORDERS_HOT               744235
    ORDERS_WARM             1503235
  13. Verify that the external partition was created in Amazon EFS:
    SQL> SELECT table_name, type_name, default_directory_name DIRECTORY, reject_limit
         FROM user_xternal_part_tables;
    
    TABLE_NAME	TYPE_NAME	      DIRECTORY	          REJECT_LIMIT    
    -----------  ---------------   -------------------- --------------- 
    ORDERS		ORACLE_DATAPUMP   DATA_PUMP_DIR_EFS    UNLIMITED
    
    SQL> select filename,type 
         from table(rdsadmin.rds_file_util.listdir(p_directory => 'DATA_PUMP_DIR_EFS')) 
         where filename like '%dmp';
    
    FILENAME		              TYPE	
    ------------------------------   ---------                        
    ORDERS_HIST_ext.dmp	              file
  14. Verify that you are able to access data from both the internal and external partitions:
    SQL> select count(*) from demo.orders;
      
    COUNT(*)
    ----------
    10000000
    
    SQL> select count(*) from demo.orders partition (ORDERS_HIST);
      COUNT(*)
    ----------
       5495761
  15. Check the size of the table again after converting the partition to an external partition and running the ILM policies:
    SQL> select sum(bytes)/1048576 from user_segments where segment_name = 'ORDERS';
    
    SUM(BYTES)/1048576
    ------------------
    	       208

After implementing hybrid partitioning, the size of the table DEMO.ORDERS was further reduced from 1,363 MB to 208 MB.
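A quick arithmetic check on those numbers (a back-of-the-envelope sketch, not part of the walkthrough):

```python
# Table sizes reported before and after hybrid partitioning, in MB
before_mb, after_mb = 1363, 208
reduction = (before_mb - after_mb) / before_mb
print(f"Reduction: {reduction:.1%}")  # roughly 85% smaller
```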

In RDS Custom for Oracle and self-managed databases running on Amazon EC2, you can create external partitions directly in Amazon Simple Storage Service (Amazon S3) using DBMS_CLOUD, which will further reduce the cost. For more details, see Use the DBMS_CLOUD package in Amazon RDS Custom for Oracle for direct Amazon S3 integration.

External partitions are read-only and aren’t backed up as part of Amazon RDS automated or manual snapshots, so back up your EFS and S3 directories separately. When restoring an RDS for Oracle instance from a snapshot or performing point-in-time recovery (PITR), the external partitions stored in Amazon EFS won’t be automatically restored. However, if you apply the same option group as the source database during the restore process, and the option group is configured to mount the same EFS file system containing the external partition data, the restored RDS instance will have access to the external partitions.

Storage tiering based on data access pattern

ILM data movement policies allow you to automatically move data between different types of storage tiers based on the data’s age, activity level, and importance. This helps optimize storage costs and improve performance by storing frequently accessed, high-value data on high-performance, more expensive storage like io2 or io1, and moving less frequently accessed data or cold data to lower-cost storage like gp3 or gp2. The following diagram illustrates this architecture.

[Architecture diagram: ILM-self-arch]

At the time of writing this post, storage tiering is not supported in Amazon RDS for Oracle and RDS Custom for Oracle. Storage tiering-based data management can only be used in self-managed databases running on Amazon EC2.
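The tiering decision the policy encodes is a simple age test. The following Python sketch models it (a hypothetical helper for illustration only; the actual movement is performed by the ADO background evaluation, not by application code):

```python
from datetime import datetime, timedelta

def target_tablespace(last_access: datetime, now: datetime,
                      cold_after_days: int = 90) -> str:
    """Model of the ILM policy 'TIER TO COLD_DATA ... AFTER 90 DAYS OF
    NO ACCESS': segments untouched for the threshold move to the
    low-cost (gp3/gp2-backed) tablespace; the rest stay on io2/io1."""
    if now - last_access >= timedelta(days=cold_after_days):
        return "COLD_DATA"
    return "HOT_DATA"

now = datetime(2024, 7, 15)
print(target_tablespace(now - timedelta(days=120), now))  # COLD_DATA
print(target_tablespace(now - timedelta(days=10), now))   # HOT_DATA
```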

The steps to implement storage tiering-based ADO are as follows:

  1. Assuming that your database is already using an io2 or io1 EBS volume, add another EBS volume (gp3 or gp2) and attach it to the database EC2 instance.
  2. Create a separate logical volume or ASM data diskgroup for the newly attached EBS volume to store your cold data or less frequently accessed data. You should not add this volume to your existing ASM data diskgroup.
  3. Create two separate tablespaces using different storage tiers. For example:
    SQL> create tablespace HOT_DATA datafile '+DATA' size 10G;
    SQL> create tablespace COLD_DATA datafile '+DATA_COLD' size 10G;
    SQL> select tablespace_name , status 
    		from dba_data_files 
    		where tablespace_name like '%DATA%';
    
    TABLESPACE_NAME 	      STATUS
    ----------------         ---------
    HOT_DATA 	             AVAILABLE
    COLD_DATA 	             AVAILABLE
  4. Assign an appropriate quota on the tablespaces to the user:
    SQL> alter user demo quota unlimited on HOT_DATA;
    SQL> alter user demo quota unlimited on COLD_DATA;
    SQL> grant alter tablespace, select any dictionary to demo;
    SQL> grant all on ts$ to demo;
    SQL> grant all on dba_segments to demo;
  5. Create a table and its partitions in the HOT_DATA tablespace, then confirm their placement:
    SQL> col SEGMENT_NAME format a10
    SQL> col PARTITION_NAME format a15
    SQL> select segment_name, partition_name,tablespace_name from user_segments where   segment_name='ORDERS'; 
    
    SEGMENT_NAME   PARTITION_NAME    TABLESPACE_NAME
    ----------    ---------------   ------------------
    ORDERS	        ORDERS_HIST	      HOT_DATA
    ORDERS	        ORDERS_COLD	      HOT_DATA
    ORDERS	        ORDERS_WARM	      HOT_DATA
    ORDERS	        ORDERS_HOT	      HOT_DATA
  6. Add an ILM policy for storage tiering. The policy is applied to the partitions of the table:
    SQL> ALTER TABLE DEMO.ORDERS
         ILM ADD POLICY TIER TO COLD_DATA READ ONLY
         SEGMENT AFTER 90 DAYS OF NO ACCESS;
    
    SQL> select * from user_ilmpolicies; 
    
    POLIC POLICY_TYPE   TABLESPACE			    ENA  DEL
    ----- ------------- ------------------------------ ---  ---
    P9   DATA MOVEMENT				           YES  NO
    
    SQL> SELECT policy_name, action_type, scope, tier_tablespace, condition_type, condition_days FROM USER_ilmdatamovementpolicies;
    
    POLICY  ACTION_TYPE  SCOPE     TIER_TABLESPACE    CONDITION_TYPE     CONDITION_DAYS
    ------  -----------  -------   ----------------   ----------------   --------------
    P9      STORAGE      SEGMENT   COLD_DATA          LAST ACCESS TIME   90
    
    SQL> select  POLICY_NAME, OBJECT_owner, OBJECT_NAME, SUBOBJECT_NAME , ENABLED, DELETED from user_ilmobjects where policy_name='P9';
    
    POLICY_NAME          OBJECT_OWNER         OBJECT_NAM SUBOBJECT_NAME  ENA DEL
    -------------------- -------------------- ---------- --------------- --- ---
    P9                   DEMO                 ORDERS                    YES NO
    P9                   DEMO                 ORDERS    ORDERS_HIST     YES NO
    P9                   DEMO                 ORDERS    ORDERS_COLD     YES NO
    P9                   DEMO                 ORDERS    ORDERS_WARM     YES NO
    P9                   DEMO                 ORDERS    ORDERS_HOT      YES NO

    According to the defined ILM policy, the ADO decision to move segments to different storage tiers will depend on 90 days of no access.

  7. To test the policy, invoke the ILM policy by setting the ILM evaluation time to seconds as shown in previous examples, flush the Heat Map information, and wait more than 90 seconds:
    SQL> EXECUTE dbms_ilm.flush_all_segments;
    SQL> EXECUTE  dbms_lock.sleep(100);

    For the purposes of this post, we can’t wait for the maintenance window to invoke the ADO policy jobs. Instead, we invoke it manually as the table owner.

  8. Access data from some of the partitions to update the access timestamp of these partitions in the Heat Map. This makes sure the policy is not applied to these partitions.
    SQL> alter session set nls_date_format = 'Mon/DD/YYYY HH24:MI:SS';
    SQL> select * from demo.orders partition(ORDERS_HOT) where rownum < 2;
    SQL> select * from demo.orders partition(ORDERS_WARM) where rownum < 2;
    
    SQL> select object_name, SUBOBJECT_NAME,SEGMENT_READ_TIME, FULL_SCAN, lookup_scan from dba_heat_map_segment  where owner='DEMO' and  object_name like 'ORDERS';
    
    OBJECT_NAME	    SUBOBJECT_NAME	SEGMENT_READ_TIME    FULL_SCAN		        LOOKUP_SCAN
    --------------- --------------- -------------------- -------------------- ------------
    ORDERS		ORDERS_HOT	        Jul/15/2024 00:12:02 Jul/15/2024 00:12:02
    ORDERS		ORDERS_HIST	        Jul/15/2024 00:03:43
    ORDERS		ORDERS_WARM	        Jul/15/2024 00:12:02 Jul/15/2024 00:12:02
    ORDERS		ORDERS_COLD	        Jul/15/2024 00:03:43
    
    
    SQL> SET serveroutput ON SIZE 999999
    SQL> DECLARE
            taskid NUMBER;
          BEGIN
            dbms_ilm.execute_ilm(owner=>'DEMO', object_name=>'ORDERS', task_id=> taskid);
            dbms_output.put_line('Task ID: ' || taskid);
          END;
          /  
    Task ID: 42
  9. Check the status of the submitted job:
    SQL> select POLICY_NAME, SUBOBJECT_NAME, SELECTED_FOR_EXECUTION from USER_ILMEVALUATIONDETAILS where TASK_ID=42;
    
       POLICY_NAME  SUBOBJECT_NAME   SELECTED_FOR_EXECUTION
       -----------  ---------------  --------------------------
       P9           ORDERS_HIST      SELECTED FOR EXECUTION
       P9           ORDERS_COLD      SELECTED FOR EXECUTION
       P9           ORDERS_WARM      PRECONDITION NOT SATISFIED
       P9           ORDERS_HOT       PRECONDITION NOT SATISFIED
    
    SQL> select task_id, JOB_STATE from user_ilmresults where task_id=42;
    
          TASK_ID    JOB_STATE
          -------    ----------------------
          42         COMPLETED SUCCESSFULLY
  10. After you run the preceding policy, check the tablespace for partitions that aren’t accessed for more than 90 seconds (90 days in reality):
    SQL> select segment_name, partition_name,tablespace_name from user_segments where   segment_name='ORDERS';
    
    SEGMENT_NA PARTITION_NAME  TABLESPACE_NAME
    ---------- --------------- ------------------------------
    ORDERS	   ORDERS_COLD	   COLD_DATA
    ORDERS	   ORDERS_HIST	   COLD_DATA
    ORDERS	   ORDERS_HOT	   HOT_DATA
    ORDERS	   ORDERS_WARM	   HOT_DATA

Two of the partitions moved to the COLD_DATA tablespace.

Clean up

You can disable and delete all ADO policies in a table with the following code:

 SQL> ALTER TABLE demo.orders ILM DISABLE_ALL;
 SQL> ALTER TABLE demo.orders ILM DELETE_ALL;

Use the following commands to clean up the EFS directory DATA_PUMP_DIR_EFS:

-- Clean up a single file
EXEC UTL_FILE.FREMOVE('DATA_PUMP_DIR_EFS','filename');

-- Clean up multiple files
begin
  for i in (select filename
            from table(RDSADMIN.RDS_FILE_UTIL.LISTDIR('DATA_PUMP_DIR_EFS'))
            where type = 'file')
  loop
    UTL_FILE.FREMOVE('DATA_PUMP_DIR_EFS', i.filename);
  end loop;
end;
/
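Alternatively, you can clean up from a host that mounts the EFS file system directly (for example, the accessory EC2 instance). The following Python sketch is a hypothetical equivalent of the UTL_FILE.FREMOVE loop; the mount path is an assumption, and the script needs write access to it:

```python
import glob
import os

def remove_dump_files(directory: str) -> list:
    """Delete Data Pump dump files (*.dmp) left behind by dropped
    external partitions and return the file names removed. Run against
    the EFS mount point (e.g. /EFSroot) instead of going through the
    database."""
    removed = []
    for path in glob.glob(os.path.join(directory, "*.dmp")):
        os.remove(path)
        removed.append(os.path.basename(path))
    return sorted(removed)
```

For example, remove_dump_files('/EFSroot') would delete ORDERS_HIST_ext.dmp if it is still present, while leaving non-dump files untouched.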

Delete all resources that you created to test this solution, including the EC2 instance, RDS instance, EFS file system, and EBS volumes. Drop the related ASM diskgroups and tablespaces.

Conclusion

In this post, we showed how to implement hybrid partitioning and ADO to move data to cost-effective storage based on its access patterns, addressing the challenges of growing databases. By applying hybrid partitioning and ILM policies, we reduced the table size by more than 80%. This leads to significant cost savings, particularly for RDS instances running in Multi-AZ configurations or with multiple copies.

Leave your feedback in the comments section.


About the Authors

Archana Sharma is a Sr. Database Specialist Solutions Architect, working with Worldwide Public Sector customers. She has years of experience in relational databases, and is passionate about helping customers in their journey to the AWS Cloud with a focus on database migration and modernization.

Ramesh Pathuri is a Senior Database Consultant with AWS Worldwide Public Sector ProServe. His extensive experience in databases, along with a passion for guiding customers through their innovative transition to the AWS Cloud, enables them to unlock new possibilities through database migration and modernization, and optimize their data and analytics solutions for increased innovation, efficiency, and impact.