AWS for SAP

Automate and Optimize SAP Network Performance in a Multi-AZ deployment

Introduction

There are multiple SAP deployment patterns on Amazon Web Services (AWS) to meet the needs of organizations of all sizes. In AWS, an SAP workload is deployed in an AWS Region and across multiple Availability Zones (AZ) as per AWS patterns for Resilience.

Our standard recommendation is that if there are multiple SAP application servers for your SAP application (e.g. S/4HANA or ECC), these application servers should be deployed across different AZs to improve overall SAP application availability and reliability, as per Figure 1.

Figure 1 : An SAP application with multiple SAP application servers distributed across multi-AZs

Figure 1 shows the simplified architecture diagram, where (1) SAP users connect to the SAP application servers, which then (2) connect to the database server. SAP’s client/server architecture allows the support of the largest SAP workloads by scaling out the application tier, i.e. adding multiple SAP application servers.

The trade-off of this architecture is that for some workload operations (such as performance-critical batch jobs) the latency between the AZs might affect runtime performance (as per SAP Lens for Well Architected – Performance recommendations for latency and SAP Note 3496343 (SAP Support Portal access is required)). This blog discusses a solution to run batch or performance-critical workloads on application servers hosted in the same AZ as the database to mitigate this problem.

SAP’s recommendation on latency

In the blog “End-to-End Observability for SAP on AWS: Part 2 – SAP Network Latency Monitoring”, we discussed the importance of SAP network performance between the SAP application and database layer. For a well-performing SAP system, it is important to ensure the network latency between the SAP application tier (i.e. application servers) and the database server follows SAP’s recommendation :

  • Network latency between the SAP application server and database server to be less than 0.7 milliseconds (ms), as per SAP Note 1100926 (SAP Support Portal access is required)
  • Network latency for HANA system replication with synchronous data replication (required to achieve zero data loss) to be less than 1 ms
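As a minimal sketch (in Python for brevity, not part of the ABAP solution), the two thresholds above can be encoded as a simple check against measured values. The function name and the sample measurements are hypothetical; only the 0.7 ms and 1 ms limits come from the recommendations above.

```python
# Upper bounds from the SAP recommendations listed above (milliseconds).
APP_TO_DB_MAX_MS = 0.7
SYNC_HSR_MAX_MS = 1.0


def check_latency(app_to_db_ms: float, hsr_sync_ms: float) -> dict:
    """Report which of the two recommendations a pair of measurements meets."""
    return {
        "app_to_db_ok": app_to_db_ms < APP_TO_DB_MAX_MS,
        "hsr_sync_ok": hsr_sync_ms < SYNC_HSR_MAX_MS,
    }


# Hypothetical measurements, for illustration only.
print(check_latency(app_to_db_ms=0.85, hsr_sync_ms=0.85))
```

With these sample values the replication link stays within its limit while the application-to-database link does not, which is the situation the rest of this blog addresses.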

Latency impacts for AWS for SAP

Generally, the Inter-AZ network latency adheres to SAP’s network recommendations as outlined above. However, this latency can vary over time and differ between regions and AZ pairs.

Customers can measure the network latency between AZs (known as Inter-AZ network latency) and network latency within the same AZ (Intra-AZ network latency) using AWS Network Manager – Infrastructure Performance or SAP’s NIPING (SAP Support Portal access is required). The Intra-AZ network latency is lower than the Inter-AZ network latency, given the geographical distance between AZs. Therefore, when architecting SAP workloads for high availability across AZs, we recommend deploying in AZ pairs with the lowest network latency.
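The AZ-pair recommendation can be illustrated with a short Python sketch. It assumes you have already collected latency samples per AZ pair (for example with NIPING); the AZ names and sample values below are made up, and using the median as the comparison metric is an assumption rather than an SAP or AWS rule.

```python
from statistics import median


def best_az_pair(samples_ms: dict) -> tuple:
    """Return the AZ pair whose latency samples have the lowest median."""
    return min(samples_ms, key=lambda pair: median(samples_ms[pair]))


# Hypothetical round-trip latency samples in milliseconds per AZ pair.
samples = {
    ("az1", "az2"): [0.62, 0.58, 0.71],
    ("az1", "az3"): [0.95, 1.02, 0.90],
    ("az2", "az3"): [0.48, 0.52, 0.50],
}

print(best_az_pair(samples))
```

Because latency varies over time, it is worth repeating the measurement at different times of day before settling on a pair.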

An example SAP business process that is performance-critical is the backflush process (automatic goods issue), which is used in manufacturing industries such as automotive, consumer goods, food and beverage, and pharmaceutical. In the automotive industry, the backflush process involves automatically deducting the necessary quantities of raw materials and components from inventory based on the Bill of Materials (BOM) and routing when a production order is confirmed. For example, if a manufacturer is producing 100 car engines, and each engine requires 4 pistons, 8 valves, and 1 crankshaft, the backflush process will automatically deduct 400 pistons, 800 valves, and 100 crankshafts from inventory without manual entry. This ensures efficient and accurate inventory management, reduces manual data entry, and provides real-time updates on production progress and material usage. If this backflush process runs slowly, the productivity of the manufacturing line will be impacted.
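The deduction arithmetic in the engine example can be written out as a small Python sketch. This only illustrates the calculation; it is not how SAP implements backflushing, and the function and variable names are hypothetical.

```python
def backflush(bom_per_unit: dict, units_confirmed: int) -> dict:
    """Quantity of each component to deduct for the confirmed production quantity."""
    return {component: qty * units_confirmed for component, qty in bom_per_unit.items()}


# Bill of Materials for one engine, as in the example above.
engine_bom = {"piston": 4, "valve": 8, "crankshaft": 1}

print(backflush(engine_bom, units_confirmed=100))
# {'piston': 400, 'valve': 800, 'crankshaft': 100}
```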

To understand the impact of Inter-AZ network latency on the backflush process (transaction MFBF), we ran the tests illustrated in Figure 2. The results indicate a 4-10x slowdown in RFC execution time when running on application servers 2 and 3, which are located in a different AZ from the database server.

Figure 2 : Comparison of Inter-AZ network latency impact on performance-critical jobs and transactions

Figure 2 shows that Inter-AZ latency has a significant impact on long-running transactions or batch jobs with time-critical performance requirements that make a significant number of database calls (roundtrips). We therefore recommend running these jobs on SAP application servers in the same AZ as the database server, to benefit from the lower Intra-AZ network latency. The solution explained in the next section can help you automate this process, even in the event of a failure and the movement of your primary database from one AZ to another.
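To see why the number of roundtrips matters, here is a deliberately simplified Python model: if a job issues its database calls one after another, the time spent waiting on the network is roughly the number of roundtrips multiplied by the latency of each. All numbers below are hypothetical and are not the measurements behind Figure 2; the model also ignores database processing time and any parallelism.

```python
def network_wait_seconds(round_trips: int, latency_us: int) -> float:
    """Network wait time for a job making sequential database roundtrips.

    latency_us is the roundtrip latency in microseconds.
    """
    return round_trips * latency_us / 1_000_000


# Hypothetical values, for illustration only.
ROUND_TRIPS = 100_000
INTRA_AZ_US = 200   # assumed roundtrip latency within one AZ
INTER_AZ_US = 600   # assumed roundtrip latency between two AZs

same_az = network_wait_seconds(ROUND_TRIPS, INTRA_AZ_US)
cross_az = network_wait_seconds(ROUND_TRIPS, INTER_AZ_US)
print(f"same AZ: {same_az:.0f} s, cross AZ: {cross_az:.0f} s, ratio: {cross_az / same_az:.1f}x")
# same AZ: 20 s, cross AZ: 60 s, ratio: 3.0x
```

A fraction of a millisecond per call is negligible for a single dialog step, but it adds up for jobs that make hundreds of thousands of calls.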

Optimizing SAP Network Performance Automatically

To ensure the SAP workload is well-managed and evenly distributed across the SAP application servers, SAP provides the following workload balancing and distribution mechanisms :

Table 1 : SAP’s load-balancing mechanism

Let’s look at an example automotive customer running SAP S/4HANA with high availability, and multiple SAP application servers connecting to the database server, as per Figure 3. The performance-critical backflush batch jobs are configured to run in application server 1 using SAP’s load-balancing mechanism as per Table 1.

Figure 3 : Adjusting the SAP load-balancing mechanism to point to the application servers located in the same AZ as the primary database for performance-critical jobs/transactions

  1. Logon / batch / RFC server groups for performance-critical transactions/jobs are configured to point to application server 1 located in the same AZ as the Primary DB.
  2. In the event of a database server failover from Primary DB to Standby DB, the performance-critical transactions/jobs running on application server 1 would experience poorer performance due to the higher Inter-AZ network latency.
  3. To resolve this issue, the logon / batch / RFC server groups have to be adjusted to point to application server 2 instead.

The proposed solution automatically updates SAP’s load-balancing mechanism (logon groups, batch server groups, and RFC server groups) to point to the application servers that are in the same AZ as the database. This ensures that even in the event of a database failover/failback, performance-critical transactions and jobs are processed on application server(s) within the same AZ as the database.
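The selection rule behind these steps can be sketched in a few lines of Python (illustrative only; the solution itself is written in ABAP). The server names and AZ labels are hypothetical and mirror the two-server layout of Figure 3.

```python
def servers_in_db_az(server_az: dict, db_az: str) -> list:
    """Application servers located in the same AZ as the active database."""
    return sorted(server for server, az in server_az.items() if az == db_az)


# Hypothetical placement: one application server per AZ.
server_az = {"appserver1": "az1", "appserver2": "az2"}

print(servers_in_db_az(server_az, db_az="az1"))  # before failover
print(servers_in_db_az(server_az, db_az="az2"))  # after failover to the standby
```

The result is the list of servers that the performance-critical logon, batch, and RFC server groups should contain at that moment.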

Figure 4 shows the high-level architecture of the proposed solution. It is similar to Figure 3, with additional application servers. Because the solution is developed in SAP’s ABAP language, it can use the AWS SDK for SAP ABAP to notify the IT teams managing the SAP systems of this change via Amazon Simple Notification Service (SNS), as per step 4 below.

Figure 4 : Dynamically updating SAP Server Groups and using AWS ABAP SDK for notifications

Creating Multi-AZ network optimized solution

The solution can be installed in any SAP on AWS environment running on the SAP NetWeaver ABAP architecture, including RISE with SAP. Because it is written in ABAP, it is compatible with any SAP application that uses the ABAP stack.

Important Considerations

  • This solution was successfully tested on SAP S/4HANA 2023.
  • To modify logon groups (SMLG) and RFC server groups (RZ12), this solution uses the SMLG_MODIFY function module.
  • To modify background processing groups (SM61), this solution uses the CL_BP_SERVER_GROUP class.
  • If you use the notification capability from the AWS SDK for SAP ABAP, please refer to Getting Started with the AWS SDK for SAP ABAP Blog.
  • The sample ABAP code is available in the Multi-AZ Network Optimized Solution GitHub repository.
  • You can use any of the 3 load-balancing mechanisms (for example, you can choose to update the batch server group only, and leave the logon groups and RFC server groups untouched).
  • Performance-critical batch jobs and/or jobs that make significant RFC calls would benefit from having the batch and RFC server groups updated to ensure these jobs run on application servers located in the same AZ as the database.
  • If you would like to implement the solution in an earlier S/4HANA or SAP ECC version, please confirm the availability of the function module and class above, and test in a non-production system first.

Detecting an AZ/DB failure

When an AZ and/or database failure occurs, the High Availability cluster software promotes the standby database instance to the active role. As a result, the hostname of the primary database instance changes, which can be verified via a SQL query in ABAP.

Figure 5 : The solution workflow

The solution uses two tables :

  1. /AWSSAMP/MAZ_DB : Contains the primary database hostname, obtained via a SQL query.
  2. /AWSSAMP/MAZ_CO : Contains configuration information of the application servers and the defined logon/server groups. This table determines the AZ of the application servers relative to the primary database, and is to be populated by the customer.

Figure 6 : Table /AWSSAMP/MAZ_CO showing the application servers assigned to the respective logon/server groups

Here is a code snippet to detect an AZ/database failure. If you save the result of this SQL query in table /AWSSAMP/MAZ_DB, then on the next program execution you can determine that an AZ and/or database failure has occurred when the database hostname differs from the one saved by the previous execution.

* All variables are declared in one chained DATA statement.
DATA: lv_hostname TYPE char20,
      lo_con      TYPE REF TO cl_sql_connection,
      lo_stmt     TYPE REF TO cl_sql_statement,
      lo_result   TYPE REF TO cl_sql_result_set,
      lv_sql      TYPE string,
      lt_data     TYPE REF TO data.

TRY.
lo_con = cl_sql_connection=>get_connection( ).
lo_stmt = lo_con->create_statement( ).

lv_sql = |select host from M_DATABASE|.
lo_result = lo_stmt->execute_query( lv_sql ).

get REFERENCE OF lv_hostname into lt_data.
lo_result->set_param( lt_data ).
lo_result->next( ).

lo_con->close( ).
CATCH cx_sql_exception INTO DATA(err).
MESSAGE err->get_text( ) TYPE 'E'.
ENDTRY.


DATA: lo_get_dbhost TYPE REF TO /AWSSAMP/CL_MAZ_GET_DBHOST.

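* Work areas for the stored hostname (line type of table /AWSSAMP/MAZ_DB).
DATA: lt_dbhost         TYPE TABLE OF /awssamp/maz_db,
      ls_dbhost         TYPE /awssamp/maz_db,
      ls_current_dbhost TYPE /awssamp/maz_db.

* Get the result of the previous execution.
SELECT * INTO TABLE lt_dbhost FROM /awssamp/maz_db.

* Compare the current SQL result with the previous execution.
LOOP AT lt_dbhost INTO ls_dbhost.
* If the hostname differs, a failover has occurred.
  IF lv_hostname NE ls_dbhost-dbhost.
*   Use the logon client instead of a hard-coded client number.
    ls_current_dbhost-mandt  = sy-mandt.
    ls_current_dbhost-dbhost = lv_hostname.
*   Update the current active DB hostname in /AWSSAMP/MAZ_DB.
    UPDATE /awssamp/maz_db FROM ls_current_dbhost.
  ENDIF.
ENDLOOP.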
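The detection logic itself is independent of ABAP and can be summarized in a short Python sketch. The `HostStore` class is a hypothetical in-memory stand-in for table /AWSSAMP/MAZ_DB, and treating the very first run (no stored hostname yet) as "no failover" is an assumption of this sketch.

```python
from typing import Optional


def detect_failover(previous_host: Optional[str], current_host: str) -> bool:
    """True when a stored hostname exists and differs from the current one."""
    return previous_host is not None and previous_host != current_host


class HostStore:
    """In-memory stand-in for the table that keeps the last seen DB hostname."""

    def __init__(self) -> None:
        self.host: Optional[str] = None

    def check_and_update(self, current_host: str) -> bool:
        """Compare with the stored hostname, then store the current one."""
        changed = detect_failover(self.host, current_host)
        self.host = current_host
        return changed


store = HostStore()
for host in ["hanadb1", "hanadb1", "hanadb2"]:   # hypothetical hostnames
    print(host, store.check_and_update(host))
```

Only the third call reports a change, because that is the first run in which the active database hostname differs from the stored one.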

Change the Logon/RFC Server Group

The logon group can be changed via transaction code SMLG, and the RFC server group can be changed via transaction code RZ12. The SAP NetWeaver ABAP stack provides the standard function modules SMLG_GET_DEFINED_SERVERS and SMLG_MODIFY to query and change these two groups. Before changing the server groups, you can check which application servers are currently registered by calling SMLG_GET_DEFINED_SERVERS, and then call SMLG_MODIFY to delete the existing application servers and register the new application servers in the list.

Here is a code snippet to change the logon and RFC server groups. The GROUPTYPE input parameter specifies the group type; for example, ‘S’ means the RFC server group. The SMLG_MODIFY function is used for both deleting and inserting application servers in the group, so you need to enter the appropriate type in the MODIFICATN parameter, as shown in the sample code below. For example, enter ‘D’ to perform a deletion.

DATA: BEGIN OF ls_group,
        group_name TYPE char20,
        group_type TYPE char1,
      END OF ls_group.

DATA: lt_group     LIKE TABLE OF ls_group,
      lv_grouptype TYPE char1.

* NOTE: RZLLIMODGP as the line type of ERFC_MODIFICATIONS is an assumption;
* verify it against the interface of SMLG_MODIFY in transaction SE37.
DATA: lt_modi       TYPE TABLE OF RZLLIMODIF,
      ls_modi       TYPE RZLLIMODIF,
      lt_modi_erfc  TYPE TABLE OF RZLLIMODGP,
      ls_modi_erfc  TYPE RZLLIMODGP,
      lt_del_server TYPE TABLE OF RZLLIAPSRV.

lv_grouptype = ls_group-group_type.

ls_modi-CLASSNAME = ls_group-group_name.
ls_modi-GROUPTYPE = ls_group-group_type.

* Function Modification Type
* I. insertion of an item
* D. deletion of an item
* U. update of an item
ls_modi-MODIFICATN = 'D'.
* In the full program, one entry is built per application server to change.
INSERT ls_modi INTO TABLE lt_modi.

ls_modi_erfc-CLASSNAME = ls_group-group_name.
ls_modi_erfc-GROUPTYPE = ls_group-group_type.
ls_modi_erfc-MODIFICATN = 'U'.
INSERT ls_modi_erfc INTO TABLE lt_modi_erfc.

* Get existing application servers in Logon/RFC server group
* Server Group Type
* '' Logon Server Group
* 'S' RFC Server Group
CALL FUNCTION 'SMLG_GET_DEFINED_SERVERS'
  EXPORTING
    GROUPTYPE = ls_group-group_type
    GROUPNAME = ls_group-group_name
  TABLES
    INSTANCES = lt_del_server
  EXCEPTIONS
    no_group_found = 1
    OTHERS         = 2.

* Change application servers in Logon/RFC server group.
CALL FUNCTION 'SMLG_MODIFY'
  EXPORTING
    GROUPTYPE = lv_grouptype
  TABLES
    MODIFICATIONS = lt_modi
    ERFC_MODIFICATIONS = lt_modi_erfc
  EXCEPTIONS
    no_group_found = 1
    OTHERS         = 2.
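The list of modifications passed to SMLG_MODIFY can be thought of as a difference between the servers currently in the group and the servers that should be in it. The Python sketch below illustrates that idea only. The keys CLASSNAME, GROUPTYPE, and MODIFICATN mirror the fields used above, while APPLSERVER is an assumed field name, and emitting only the difference is a simplification of the delete-then-register approach described in this section.

```python
def plan_modifications(group_name: str, group_type: str,
                       current: list, target: list) -> list:
    """Build 'D' (delete) and 'I' (insert) entries to turn current into target."""
    current_set, target_set = set(current), set(target)

    def entry(server: str, action: str) -> dict:
        return {
            "CLASSNAME": group_name,
            "GROUPTYPE": group_type,
            "APPLSERVER": server,      # assumed field name
            "MODIFICATN": action,
        }

    deletions = [entry(s, "D") for s in sorted(current_set - target_set)]
    insertions = [entry(s, "I") for s in sorted(target_set - current_set)]
    return deletions + insertions


# Hypothetical RFC server group ('S') that must move from server 1 to server 2.
for row in plan_modifications("CRITICAL_RFC", "S", ["appserver1"], ["appserver2"]):
    print(row)
```

If the group already contains exactly the target servers, the plan is empty and no call to SMLG_MODIFY is needed.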

Change the Batch Server Group

The batch server group can be changed via transaction code SM61. The SAP NetWeaver ABAP stack provides a standard class, CL_BP_SERVER_GROUP, to view and change this group. To get information about a group that needs to be changed, you need to call the LOAD_DB method. Because this method is declared in the protected section, you need to create a separate Custom Business Object (CBO) class that inherits from the standard class.

Here is a code snippet to change the background processing group. Call the LOAD_DB and GET_LIST methods of the class to get the existing application server list, then remove the existing application servers by calling the DEL_FROM_LIST method, and register the new application servers by calling the ADD_TO_LIST method. To ensure the changes are saved, call the SAVE_DB method.

* /AWSSAMP/CL_MAZ_BP_GROUP Class Definition
CLASS /AWSSAMP/CL_MAZ_BP_GROUP DEFINITION INHERITING FROM CL_BP_SERVER_GROUP.
  PUBLIC SECTION.
    METHODS : LOAD_SRV_LIST IMPORTING p_groupname TYPE char20,
              GET_SRV_LIST EXPORTING p_list TYPE BPSRVENTRY,
              DEL_FROM_SRV_LIST IMPORTING p_srv TYPE BPSRVLINE,
              ADD_TO_SRV_LIST IMPORTING p_srv TYPE BPSRVLINE,
              SAVE_SRV_LIST_DB.
ENDCLASS.

* /AWSSAMP/CL_MAZ_BP_GROUP Class Implementation
CLASS /AWSSAMP/CL_MAZ_BP_GROUP IMPLEMENTATION.
  " LOAD_SRV_LIST calls the protected LOAD_DB method to load the group from the DB.
  METHOD LOAD_SRV_LIST.
    TRY.
      CALL METHOD LOAD_DB
        EXPORTING i_name = p_groupname.
    CATCH CX_BP_HEALTH_DATA.
      MESSAGE 'Data Inconsistency Found.' TYPE 'E'.
    CATCH CX_UUID_ERROR.
      MESSAGE 'Error Class for UUID Processing Errors.' TYPE 'E'.
    ENDTRY.
  ENDMETHOD.

  " GET_SRV_LIST calls GET_LIST to return the servers of the loaded group.
  METHOD GET_SRV_LIST.
    CALL METHOD GET_LIST
      RECEIVING o_list = p_list.
  ENDMETHOD.

  " DEL_FROM_SRV_LIST calls DEL_FROM_LIST to delete a server from the list.
  METHOD DEL_FROM_SRV_LIST.
    CALL METHOD DEL_FROM_LIST
      EXPORTING I_SRV_ENTRY = p_srv.
  ENDMETHOD.

  " ADD_TO_SRV_LIST calls ADD_TO_LIST to add a server to the list.
  METHOD ADD_TO_SRV_LIST.
    CALL METHOD ADD_TO_LIST
      EXPORTING I_SRV_ENTRY = p_srv.
  ENDMETHOD.

  " SAVE_SRV_LIST_DB calls SAVE_DB to save the list to the DB.
  METHOD SAVE_SRV_LIST_DB.
    TRY.
      CALL METHOD SAVE_DB.
    CATCH CX_BP_DATABASE.
      MESSAGE 'An Error Occurred While Attempting to Write to DB.' TYPE 'E'.
    ENDTRY.
  ENDMETHOD.

ENDCLASS.

Create a background job to execute the ABAP program periodically

By creating a background job through transaction code SM36, you can run the ABAP program (/AWSSAMP/MAZ_SOL) periodically (for example, every 5 minutes). When creating the job, you can set the run time execution interval through the Edit > Start time menu.

Figure 7 : Transaction SM36 Job Configuration

The Multi-AZ network optimized solution described above can be applied and tested in your SAP system in less than an hour. It also includes the ability to publish notifications to the Amazon SNS service using the AWS SDK for SAP ABAP, to notify your SAP teams of alerts via email or SMS.
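Putting the pieces together, one scheduled run of the job can be pictured as the following Python sketch. It is a simplified, hypothetical outline rather than the ABAP program itself: `state` stands in for table /AWSSAMP/MAZ_DB, `host_az` and `server_az` stand in for the customer-maintained configuration, and `notify` is a placeholder for publishing to Amazon SNS.

```python
from typing import Callable, Optional


def run_once(state: dict, current_db_host: str, host_az: dict,
             server_az: dict, notify: Callable[[str], None]) -> Optional[list]:
    """One periodic run: on a DB host change, return the new target servers."""
    previous = state.get("db_host")
    state["db_host"] = current_db_host          # remember for the next run
    if previous is None or previous == current_db_host:
        return None                             # first run or no failover
    db_az = host_az[current_db_host]
    targets = sorted(s for s, az in server_az.items() if az == db_az)
    notify(f"Database moved from {previous} to {current_db_host}; "
           f"server groups now point to {', '.join(targets)}")
    return targets


# Hypothetical landscape and a notifier that just collects messages.
host_az = {"hanadb1": "az1", "hanadb2": "az2"}
server_az = {"appserver1": "az1", "appserver2": "az2"}
messages = []
state = {}

print(run_once(state, "hanadb1", host_az, server_az, messages.append))
print(run_once(state, "hanadb2", host_az, server_az, messages.append))
print(messages)
```

The first run only records the hostname. The second run detects the move to the other AZ, returns the servers the groups should now contain, and emits a single notification.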

Conclusion

It is important to have a highly available and reliable architecture for SAP solutions that are core to your company business processes, which is why AWS recommends that SAP applications be architected across multiple Availability Zones, as per the SAP lens Well-Architected Framework – Select an architecture suitable for your availability and capacity requirements.

To ensure optimal performance for SAP transactions and batch jobs, you can enable automatic switching of SAP logon groups (transaction SMLG), RFC server groups (transaction RZ12) and/or batch server groups (transaction SM61) when a database server failover occurs. This ensures that the performance-critical transactions and batch jobs are always running on application servers in the same AZ as the database server.

In this blog, we have demonstrated how you can benefit from SAP on AWS to achieve high availability and reliability in a multi-AZ architecture, while ensuring optimum performance for your business critical transactions and batch jobs.

To learn more, visit the Multi-AZ network optimized solution github page for sample code.

To learn why thousands of customers choose AWS for SAP, visit the AWS for SAP page.

Join the SAP on AWS Discussion

In addition to your customer account team and AWS Support channels, we launched re:Post – A Reimagined Q&A Experience for the AWS Community. Our AWS for SAP Solution Architecture team regularly monitors the AWS for SAP topic for discussions and questions that could be answered to assist our customers and partners. If your question is not support-related, consider joining the discussion over at re:Post and adding to the community knowledge base.