Enhance your SAP Observability using Amazon Managed Prometheus and Grafana

Enterprises that rely on SAP systems for business-critical processes have stringent availability and performance requirements, which makes an observability strategy an important part of operational efficiency.

Organizations with large SAP estates running solutions such as SAP HANA, Suite on HANA, or SAP S/4HANA have multiple observability options for these environments. Solutions like SAP Solution Manager and Amazon CloudWatch Application Insights are commonly used to monitor system health and performance. However, from an enterprise observability perspective, combining SAP and non-SAP solutions, and even multicloud configurations, in a single-pane-of-glass dashboard for visualization and alerting results in an optimized long-term architecture. Several organizations that run SAP already use tools such as Prometheus and Grafana, and the Amazon managed offerings, Amazon Managed Prometheus (AMP) and Amazon Managed Grafana (AMG), provide cloud native technology to achieve such a setup.

SAP customers are also increasingly choosing AWS for RISE with SAP. For customers who haven't yet moved to RISE, this solution shows how to improve observability of their SAP workloads. In this blog, you will learn how to set up observability dashboards for the SLES (SUSE Linux Enterprise Server) operating system and SAP S/4HANA by configuring AMP and AMG and following best practices from the SAP Lens for the AWS Well-Architected Framework. After completing the setup, you will be able to view the health of your overall SAP landscape in multiple dashboards for different components: operating system, SAP application server, and SAP high availability cluster.

Definitions

We’ll start by defining the terminology used in this blog.

Observability: Observability describes system performance, often by instrumenting a system to collect metrics, logs, and/or traces. Understanding system performance is key to achieving operational excellence and meeting business objectives. Although the term “monitoring” is sometimes defined as distinct from observability, monitoring is an activity that makes a system observable, alongside other activities like tracing and logging.

Amazon Managed Prometheus (AMP): AMP is a serverless, Prometheus-compatible monitoring service for metrics that makes it easier to securely monitor container environments at scale. With AMP, customers can use the same open-source Prometheus data model and query language that they use today to monitor the performance of their workloads. Customers also benefit from improved scalability, availability, and security without having to manage the underlying infrastructure.

Amazon Managed Grafana (AMG): AMG is a fully managed and secure data visualization service that customers can use to instantly query, correlate, and visualize operational metrics, logs, and traces from multiple sources (including SAP, non-SAP, and Hybrid/Multicloud workloads). AMG makes it easy to deploy, operate, and scale Grafana, a widely deployed data visualization tool that is popular for its extensible data support.

Pre-requisites

As a prerequisite, you should have an SAP system set up, with or without high availability, in an AWS account. The user account used to perform the configuration should have permissions to assign roles in AWS IAM. For this blog, we are using an SAP S/4HANA system with ASCS/ERS and SAP HANA DB clusters, along with two application servers, running on SLES for SAP 15 SP4.

Architecture

We start by understanding the underlying architecture of the solution. Figure 1 shows a typical architecture for SAP S/4HANA in High Availability (HA) configuration on AWS and Figure 2 shows the observability architecture. All SAP systems are installed on Amazon EC2 instances certified for SAP and all data movement uses VPC endpoints.

Figure 1: SAP S/4HANA High Availability (HA) Architecture on AWS representation

Figure 2: Architecture for SAP observability using Amazon managed Prometheus (AMP) and Amazon Managed Grafana (AMG)

Setup & configuration

The process to configure the solution is shown in Figure 3 below.

Figure 3: Steps for SAP observability configuration using AMP & visualizing in AMG

1. Create and configure AMP Workspace

Data is ingested into the AMP workspace using the Prometheus “remote write” method, and AMG then uses the workspace as its dashboard data source. Ensure that your user ID has permission to create the workspace, then choose Create workspace in the AMP console, as shown in Figure 4.

Figure 4: AMP workspace creation

Once the workspace is created, AMP provides a remote write URL (as shown in Figure 5). Note this URL, because you will need it in a later configuration step.

Figure 5: AMP workspace endpoint details example
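If you prefer the AWS CLI to the console, the following is a minimal sketch that creates the workspace and builds the remote write URL from the Region and workspace ID, using the same URL pattern that appears in the remote_write section of prometheus.yml later in this post. It assumes AWS CLI v2 with credentials for the target account; the function names, the alias sap-observability, and the AWS_CLI dry-run variable (set it to echo to print commands instead of running them) are hypothetical conventions for this sketch.

```shell
#!/usr/bin/env bash
# Sketch: create an AMP workspace and derive its remote write URL.
# Assumptions: AWS CLI v2 is configured; function names are hypothetical.
# Set AWS_CLI=echo to print the AWS command instead of running it.

# Build the remote write URL from an AWS Region and a workspace ID.
amp_remote_write_url() {
  local region="$1" workspace_id="$2"
  printf 'https://aps-workspaces.%s.amazonaws.com/workspaces/%s/api/v1/remote_write\n' \
    "$region" "$workspace_id"
}

# Create a workspace and print only its workspace ID.
amp_create_workspace() {
  local alias_name="$1"
  "${AWS_CLI:-aws}" amp create-workspace --alias "$alias_name" \
    --query workspaceId --output text
}

# Usage (uncomment to run against your account):
# ws_id=$(amp_create_workspace sap-observability)
# amp_remote_write_url us-east-1 "$ws_id"
```

Keeping URL construction in its own function means the same value can be reused when you render prometheus.yml on each EC2 instance.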

EC2 IAM to stream metrics to AMP

Create an IAM role for Amazon EC2 with the AmazonPrometheusRemoteWriteAccess AWS managed policy attached. You can then attach the role to your existing EC2 instances or launch EC2 instances with this new role. Alternatively, attach the AmazonPrometheusRemoteWriteAccess policy to an existing role that is already attached to your EC2 instances (as shown in Figure 6). Detailed steps are available in the AWS documentation.

Figure 6: AMP policy name
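The role setup above can be sketched with the AWS CLI as follows. This is an illustrative sketch, not the only way to do it: it creates a role that EC2 can assume, attaches the AmazonPrometheusRemoteWriteAccess managed policy, wraps the role in an instance profile, and associates that profile with one instance. The role, profile, and instance identifiers are hypothetical placeholders, and AWS_CLI=echo is a dry-run convention used only in this sketch.

```shell
#!/usr/bin/env bash
# Sketch: IAM role that lets SAP EC2 instances remote-write metrics to AMP.
# Role, profile, and instance IDs are hypothetical placeholders.
# Set AWS_CLI=echo to print the AWS commands instead of running them.

# EC2 trust policy: allows the EC2 service to assume the role.
ec2_trust_policy() {
  cat <<'EOF'
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": { "Service": "ec2.amazonaws.com" },
      "Action": "sts:AssumeRole"
    }
  ]
}
EOF
}

# Create the role, attach the AWS managed policy, wrap the role in an
# instance profile, and associate that profile with one EC2 instance.
setup_amp_writer_role() {
  local role="$1" profile="$2" instance_id="$3"
  local aws="${AWS_CLI:-aws}"
  "$aws" iam create-role --role-name "$role" \
    --assume-role-policy-document "$(ec2_trust_policy)"
  "$aws" iam attach-role-policy --role-name "$role" \
    --policy-arn arn:aws:iam::aws:policy/AmazonPrometheusRemoteWriteAccess
  "$aws" iam create-instance-profile --instance-profile-name "$profile"
  "$aws" iam add-role-to-instance-profile \
    --instance-profile-name "$profile" --role-name "$role"
  "$aws" ec2 associate-iam-instance-profile \
    --instance-id "$instance_id" --iam-instance-profile Name="$profile"
}

# Usage:
# setup_amp_writer_role SAPAMPWriterRole SAPAMPWriterProfile i-0123456789abcdef0
```

If an instance already has an instance profile, the association step fails; in that case, only run the attach-role-policy command against the existing role. A newly created instance profile can also take a short time to become usable because IAM changes propagate asynchronously.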

VPC Endpoint

To transmit metrics from the EC2 instances to AMP privately over the AWS backbone network, you need to configure a VPC endpoint for AMP. VPC endpoints enable resources in a VPC (EC2 instances in this scenario) to access managed services securely.

Next, create VPC interface endpoints for AMP as follows:

From the AWS console, open the VPC service, choose Endpoints, and select the aps-workspaces service as shown in Figure 7.

Figure 7: AWS VPC endpoint service name for AMP

You may also need to modify security groups within your VPC to allow resources to communicate with these interface endpoints over HTTPS. Detailed instructions on creating interface endpoints are available in the AWS documentation.

Additionally, if your VPC does not have direct internet access, you must also create an interface VPC endpoint for AWS Security Token Service (STS) so that SigV4 authentication works through an endpoint, as shown in Figure 8.

Figure 8: VPC endpoint service name for AWS Security Token Service
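Both interface endpoints described above can also be created from the AWS CLI. The sketch below assumes the com.amazonaws.&lt;region&gt;.&lt;service&gt; service-name pattern shown in Figures 7 and 8, and enables private DNS so that the regular AMP and STS host names resolve to private IP addresses inside the VPC. The VPC, subnet, and security group IDs and the function names are hypothetical; the security group must allow inbound HTTPS (TCP 443) from the SAP EC2 instances. AWS_CLI=echo is a dry-run convention used only in this sketch.

```shell
#!/usr/bin/env bash
# Sketch: interface VPC endpoints for AMP (aps-workspaces) and AWS STS.
# IDs and function names are hypothetical placeholders.
# Set AWS_CLI=echo to print the AWS command instead of running it.

# Interface endpoint service names follow com.amazonaws.<region>.<service>.
vpce_service_name() {
  printf 'com.amazonaws.%s.%s\n' "$1" "$2"
}

# Create one interface endpoint with private DNS enabled.
# subnet_ids is a space-separated list, left unquoted on purpose so that
# each subnet becomes a separate argument to --subnet-ids.
create_interface_endpoint() {
  local region="$1" service="$2" vpc_id="$3" subnet_ids="$4" sg_id="$5"
  "${AWS_CLI:-aws}" ec2 create-vpc-endpoint \
    --region "$region" \
    --vpc-id "$vpc_id" \
    --vpc-endpoint-type Interface \
    --service-name "$(vpce_service_name "$region" "$service")" \
    --subnet-ids $subnet_ids \
    --security-group-ids "$sg_id" \
    --private-dns-enabled
}

# Usage: one endpoint for AMP, one for STS.
# for svc in aps-workspaces sts; do
#   create_interface_endpoint us-east-1 "$svc" vpc-0abc "subnet-0aaa subnet-0bbb" sg-0ccc
# done
```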

2. Install and configure (Metrics) Exporters and Prometheus on EC2 instances

In this step, you will install the necessary exporters and the Prometheus agent on the EC2 instances. Prometheus exporters collect metrics from a system and expose them in a format that Prometheus can scrape. The Prometheus agent scrapes these exporters and forwards the metrics to a remote write endpoint, in this case AMP. The exporters required for each SAP architectural component are summarized in Table 1. In addition, the Prometheus agent must be installed on all EC2 instances to stream data to AMP.

SAP System Role	Exporter Name	URL for More Information
SAP ASCS/ERS Cluster	ClusterLabs ha_cluster_exporter	https://github.com/ClusterLabs/ha_cluster_exporter
SAP ASCS/ERS Cluster	Prometheus node_exporter	https://github.com/prometheus/node_exporter
SAP Application Servers	SUSE sap_host_exporter	https://github.com/SUSE/sap_host_exporter
SAP Application Servers	Prometheus node_exporter	https://github.com/prometheus/node_exporter

Table 1: List of exporters to be installed on EC2 for each SAP component

2.1 Node Exporter

The Prometheus Node Exporter exposes a wide variety of hardware- and kernel-related metrics, which can be used to show EC2 health status in a dashboard. The following steps to install and run the node exporter on EC2 instances are also documented on the Prometheus website.

Run the following commands to download, decompress, and run the node exporter on your EC2 instance running the Linux (SLES) operating system. Replace the <VERSION> and <OS>-<ARCH> placeholders with the node exporter version and the OS and architecture (for example, linux-amd64), respectively. The node exporter packages available for download are listed on the Prometheus website.

wget https://github.com/prometheus/node_exporter/releases/download/v<VERSION>/node_exporter-<VERSION>.<OS>-<ARCH>.tar.gz
tar xvfz node_exporter-<VERSION>.<OS>-<ARCH>.tar.gz
cd node_exporter-<VERSION>.<OS>-<ARCH>
./node_exporter

You can install the node exporter in the directory of your choice (for example, /usr/local/bin). Once running, the node exporter publishes metrics on the /metrics endpoint on port 9100 of the local server. You can verify that metrics are being exported by running the following curl command:

curl http://localhost:9100/metrics

The command output is similar to the following:

# HELP node_cpu_seconds_total Seconds the CPUs spent in each mode.
# TYPE node_cpu_seconds_total counter
node_cpu_seconds_total{cpu="0",mode="idle"} 6.8382833e+06
node_cpu_seconds_total{cpu="0",mode="iowait"} 824.38
node_cpu_seconds_total{cpu="0",mode="irq"} 0
# etc.

Complete this step to install node exporter on ALL SAP EC2 instances regardless of the SAP role.
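To spot-check a single series from output like the sample above without opening a dashboard, you can filter the scrape with a small awk helper. This is a minimal sketch that assumes the series name and labels contain no spaces or backslashes (true for the node_cpu_seconds_total lines shown); the function name sample_value is hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: print the value of one exact series from Prometheus text format on stdin.
# Assumes the series (name plus labels) contains no spaces or backslashes.
sample_value() {
  awk -v series="$1" '
    $1 == series { print $2; found = 1 }
    END { exit !found }   # non-zero exit status if the series is missing
  '
}

# Usage on an EC2 instance running the node exporter:
# curl -s http://localhost:9100/metrics | sample_value 'node_cpu_seconds_total{cpu="0",mode="idle"}'
```

Because the function exits non-zero when the series is absent, it can also be used in simple health scripts that check whether an expected metric is being exported.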

2.2 HA Cluster Exporter on ASCS/ERS EC2 Instances

ClusterLabs HA Cluster Exporter is a stateless HTTP endpoint. On each HTTP request, it inspects the cluster status locally by parsing pre-existing distributed data provided by the tools of the various cluster components. Exported data include information such as:

Pacemaker cluster summary, nodes and resources stats
Corosync ring errors and quorum votes
DRBD resources, etc.

In a highly available SAP system setup, knowing the status of services such as corosync, pacemaker, and failover status will help you understand the system better and help identify root cause of failures.

Install and run the exporter package with root user or user with sudo permissions in both ASCS and ERS EC2 instances:

zypper install prometheus-ha_cluster_exporter

/usr/bin/ha_cluster_exporter

Running HA Cluster Exporter will export the metrics under the /metrics path, on port 9664 by default. You may validate the HA cluster process status by locating /usr/bin/ha_cluster_exporter process in the list of running processes on the host.
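Besides checking the process list, you can confirm that an exporter actually answers HTTP requests. The sketch below queries a local /metrics endpoint with curl and reports how many metric families (lines starting with # TYPE) it exposes, returning a non-zero status if the endpoint does not respond. The function name is hypothetical; the ports are the defaults stated in this post (9100 for node_exporter, 9664 for ha_cluster_exporter, 9680 for sap_host_exporter).

```shell
#!/usr/bin/env bash
# Sketch: confirm a local exporter answers on /metrics and count its metric
# families (# TYPE lines). The function name is hypothetical.
check_exporter() {
  local port="$1" body families
  if ! body=$(curl -fsS --max-time 5 "http://localhost:${port}/metrics"); then
    echo "port ${port}: no response" >&2
    return 1
  fi
  families=$(printf '%s\n' "$body" | grep -c '^# TYPE ')
  echo "port ${port}: ${families} metric families"
}

# Usage on an ASCS/ERS instance (HA Cluster Exporter and Node Exporter):
# check_exporter 9664 && check_exporter 9100
```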

2.3 SAP Host Exporter on Application Server Instances

SAP Host Exporter is a stateless HTTP endpoint. On each HTTP request, it pulls runtime data from the SAP system via the SAPControl web interface. Exported data include information such as:

Start Service processes
Enqueue Server stats
SAP Application Server Dispatcher work process queue stats

Use the following commands to install sap_host_exporter:

export DISTRO=SLE_15_SP4 # change as per your OS version
zypper addrepo https://download.opensuse.org/repositories/server:/monitoring/$DISTRO/server:monitoring.repo
zypper install prometheus-sap_host_exporter

After installation, you can run the exporter as follows, and connect to the SAPControl web service via Unix Domain Sockets, as shown in Figure 9.

sap_host_exporter --sap-control-uds /tmp/.sapstream51213

Figure 9: Service sap_host_exporter running as a process in SLES

It will expose the metrics under the /metrics path, on port 9680 by default.

Complete these steps to install SAP Host Exporter on SAP EC2 instances running SAP Application Servers.
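The socket path in the command above encodes the SAP instance number: by SAP convention, sapstartsrv listens on a Unix domain socket named /tmp/.sapstream5&lt;NN&gt;13, so /tmp/.sapstream51213 corresponds to instance number 12. The sketch below derives the path from a two-digit instance number and starts the exporter with it. The function names are hypothetical, and you should confirm that the socket exists on your host (for example, with ls -l /tmp/.sapstream*) before relying on it.

```shell
#!/usr/bin/env bash
# Sketch: derive the SAPControl Unix domain socket path from the instance number.
# Convention: /tmp/.sapstream5<NN>13, e.g. instance 12 -> /tmp/.sapstream51213.
sapcontrol_uds() {
  local nr="$1"
  # SAP instance numbers are exactly two digits (00-99).
  if [[ ! "$nr" =~ ^[0-9]{2}$ ]]; then
    echo "invalid instance number: $nr" >&2
    return 1
  fi
  printf '/tmp/.sapstream5%s13\n' "$nr"
}

# Start sap_host_exporter for one application server instance (runs in the foreground).
run_sap_host_exporter() {
  local uds
  uds=$(sapcontrol_uds "$1") || return 1
  sap_host_exporter --sap-control-uds "$uds"
}

# Usage: run_sap_host_exporter 12
```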

2.4 Prometheus agent

The Prometheus agent scrapes metrics from the exporters on each EC2 instance and sends them to AMP. In this step, we install Prometheus in agent mode, because it uses fewer resources and we don't need the UI and alerting features that come out of the box. We also configure remote write to AMP.

You can install the Prometheus agent in the directory of your choice, for example /usr/bin. In this example, we install Prometheus v2.49.1 on SLES for SAP 15 SP4 with the commands below, which download the release, extract it, and change into the Prometheus installation directory.

wget https://github.com/prometheus/prometheus/releases/download/v2.49.1/prometheus-2.49.1.linux-amd64.tar.gz
tar xvfz prometheus-2.49.1.linux-amd64.tar.gz
cd prometheus-2.49.1.linux-amd64

Locate the configuration file prometheus.yml, written in YAML format, in the Prometheus installation directory. After the changes described below, the file looks similar to the following.

# my global config
global:
  scrape_interval: 15s # Set the scrape interval to every 15 seconds. Default is every 1 minute.
  evaluation_interval: 15s # Evaluate rules every 15 seconds. The default is every 1 minute.
  # scrape_timeout is set to the global default (10s).

# Load rules once and periodically evaluate them according to the global 'evaluation_interval'.
rule_files:
  # - "first_rules.yml"
  # - "second_rules.yml"

# A scrape configuration for the exporters on this host:
# here, the HA Cluster Exporter (9664) and Node Exporter (9100) on an ASCS/ERS instance.
scrape_configs:
  # The job name is added as a label `job=<job_name>` to any timeseries scraped from this config.
  - job_name: "EC1-CS"

    # metrics_path defaults to '/metrics'
    # scheme defaults to 'http'.

    static_configs:
      - targets: ['localhost:9664','localhost:9100']

remote_write:
  - url: https://aps-workspaces.us-east-1.amazonaws.com/workspaces/<WORKSPACE ID>/api/v1/remote_write
    sigv4:
      region: <AWS REGION of AMP Workspace>
    queue_config:
      max_samples_per_send: 1000
      max_shards: 200
      capacity: 2500

We need to make the following changes in the prometheus.yml file to adapt it to our use case:

Modify job_name to a value that identifies this system in the observability dashboard. For example: SAP S41 App Server for SAP Application Server EC2 instance(s) or S41 ASCS/ERS for SAP central services EC2 instance(s).
Modify the targets <host>:<port> entries. The host is the host on which the exporter is running, and the port is the port where the exporter publishes its metrics, for example localhost:9100. You can have more than one target, as shown above. For example, in the prometheus.yml file on an SAP Application Server EC2 instance, you will have two targets, 'localhost:9680','localhost:9100', to scrape metrics from SAP Host Exporter and Node Exporter respectively.
Add the remote_write section at the end of the file. Change the remote_write URL to the one you noted when creating the AMP workspace in step 1, and change region to the AWS Region of your workspace, for example: us-east-1.
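Because the same three edits are repeated on every SAP EC2 instance, you may prefer to generate prometheus.yml instead of editing it by hand. The sketch below renders a trimmed variant of the file shown above (it omits rule_files and the comments) from a job name, Region, workspace ID, and a list of exporter targets, reusing the queue_config values from this post. The function name and the example values are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: render an agent-mode prometheus.yml for one SAP host.
# Arguments: job name, AWS Region, AMP workspace ID, then one or more targets.
# The function name and example values are hypothetical.
render_prometheus_config() {
  local job="$1" region="$2" ws_id="$3"
  shift 3
  local targets="" t
  # Join targets as 'host:port', 'host:port', ...
  for t in "$@"; do
    targets+="${targets:+, }'${t}'"
  done
  cat <<EOF
global:
  scrape_interval: 15s

scrape_configs:
  - job_name: "${job}"
    static_configs:
      - targets: [${targets}]

remote_write:
  - url: https://aps-workspaces.${region}.amazonaws.com/workspaces/${ws_id}/api/v1/remote_write
    sigv4:
      region: ${region}
    queue_config:
      max_samples_per_send: 1000
      max_shards: 200
      capacity: 2500
EOF
}

# Usage on an SAP Application Server instance:
# render_prometheus_config "S41 App Server" us-east-1 ws-1234 localhost:9680 localhost:9100 > prometheus.yml
```

Deriving the URL and the sigv4 region from the same argument avoids the mismatch that can occur when the Region in the URL and the region field are edited separately.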

Once the prometheus.yml file has been updated, run Prometheus in agent mode by executing the command mentioned below.

./prometheus --config.file=./prometheus.yml --enable-feature=agent &

Complete these steps for all EC2 instances identifying the correct port for each SAP component.

At this point, the data is being sent to AMP and we are ready to configure Amazon Managed Grafana.

Prometheus Agent Mode

Agent mode optimizes Prometheus for the remote write use case. It disables querying, alerting, and local storage, and replaces them with a customized time series database write-ahead log. Everything else stays the same: scraping logic, service discovery, and related configuration.

Enabling exporters as systemd service

We recommend configuring these exporters and agents to start automatically at boot, which you can do with systemctl. The following is an example for the HA Cluster Exporter:

systemctl --now enable prometheus-ha_cluster_exporter

You can configure other services similarly; refer to SUSE documentation for more details about systemctl.
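The node exporter and the Prometheus agent in this post are installed from release tarballs rather than packages, so they come without a systemd unit. A minimal sketch of a unit you could write for them follows. The unit names, install paths, and the /var/lib/prometheus-agent storage directory are hypothetical; adjust them to the directories you chose. The --storage.agent.path flag is included because a service started by systemd does not run from the installation directory, so the agent's default relative write-ahead-log path would otherwise resolve elsewhere.

```shell
#!/usr/bin/env bash
# Sketch: systemd unit for binaries installed from release tarballs
# (node_exporter, Prometheus agent), which ship without a unit file.
# Unit names and paths are hypothetical; adjust to your install directory.
render_unit() {
  local description="$1" exec_start="$2"
  cat <<EOF
[Unit]
Description=${description}
Wants=network-online.target
After=network-online.target

[Service]
ExecStart=${exec_start}
Restart=on-failure

[Install]
WantedBy=multi-user.target
EOF
}

# Usage (as root):
# render_unit "Prometheus node exporter" /usr/local/bin/node_exporter \
#   > /etc/systemd/system/node_exporter.service
# mkdir -p /var/lib/prometheus-agent
# render_unit "Prometheus agent" \
#   "/usr/local/bin/prometheus --config.file=/etc/prometheus/prometheus.yml --enable-feature=agent --storage.agent.path=/var/lib/prometheus-agent" \
#   > /etc/systemd/system/prometheus-agent.service
# systemctl daemon-reload
# systemctl enable --now node_exporter prometheus-agent
```

These units run as root by default; consider adding a User= line for a dedicated service account if your security baseline requires it.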

3. Configure AMG and setup observability dashboards

AMG is a fully managed service for Grafana, a popular choice for observability, that connects with Amazon Managed Prometheus to enable you to query, visualize, and alert on your metrics, logs, and traces.

In this section, you will learn how to configure AMG and then setup observability dashboards for SAP S/4HANA. The steps mentioned in this section will provide guidance on necessary configuration in AMG service to setup observability dashboards for SAP metrics collected in AMP service.

3.1 Create workspace in Amazon Grafana

Let’s start by creating a new workspace in AMG with AMP as the data source. A workspace in AMG is a logical Grafana server.

From AWS console, open Amazon Grafana service, and create workspace with the preferred alias, as shown in Figure 10
Choose your preferred authentication access method between AWS IAM Identity Center and SAML
Optional, but recommended, is to choose the VPC connection from the workspace to your SAP VPC (this avoids sending requests across public internet) if you are connecting to any data sources in your SAP VPC
Choose the Permission type between Service Managed and Customer Managed options
Finally, select Amazon Managed Prometheus as the data source name from the list of Data Sources as shown in Figure 11

Figure 10: AMG workspace alias

Figure 11: Data source name for AMG
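The console choices above map to parameters of the AWS CLI command aws grafana create-workspace. The sketch below assumes IAM Identity Center authentication (AWS_SSO) by default, service-managed permissions, access for the current account only, and AMP (PROMETHEUS) as the data source. The workspace name and function name are hypothetical, and AWS_CLI=echo is a dry-run convention used only in this sketch. The optional VPC connection can be supplied through the --vpc-configuration parameter, which is not shown here.

```shell
#!/usr/bin/env bash
# Sketch: create an AMG workspace with the AWS CLI instead of the console.
# Assumes service-managed permissions and AMP (PROMETHEUS) as the data source.
# Set AWS_CLI=echo to print the AWS command instead of running it.
create_amg_workspace() {
  local name="$1" auth="${2:-AWS_SSO}"
  # Only the two authentication methods discussed in this post are accepted.
  case "$auth" in
    AWS_SSO|SAML) ;;
    *) echo "auth must be AWS_SSO or SAML" >&2; return 1 ;;
  esac
  "${AWS_CLI:-aws}" grafana create-workspace \
    --workspace-name "$name" \
    --account-access-type CURRENT_ACCOUNT \
    --authentication-providers "$auth" \
    --permission-type SERVICE_MANAGED \
    --workspace-data-sources PROMETHEUS
}

# Usage: create_amg_workspace sap-observability
```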

Workspace creation can take a few minutes; once it completes, the Grafana workspace is ready.

3.2 Configure Amazon Grafana Workspace

Once workspace creation completes in AMG, the next step is integration with AMP. The following steps cover creating a user with the Admin role for the Grafana workspace console, configuring AMP as a data source in the Grafana workspace console, and importing observability dashboards.

AMG supports two authentication options for accessing the Grafana console of an AMG workspace: AWS IAM Identity Center, and user credentials stored in an identity provider (IdP) that supports SAML. An admin user needs to be set up to access and configure the Grafana workspace console. You can set up user(s) either in AWS IAM Identity Center or in an identity provider. In this blog, we have set up a user in AWS IAM Identity Center. Setting up user(s) in AWS IAM Identity Center requires the AWSGrafanaAccountAdministrator and AWSSSODirectoryAdministrator policies, as described in the AWS documentation. Review the optional roles and assign them as needed. If you chose SAML as the authentication access method for your workspace, follow the SAML configuration steps in the AWS documentation.

After creating the user(s), assign them to your AMG workspace and perform the “Make admin” action on the user that will be used for configuration in the Grafana console. To do this, open AMG in the AWS console, choose All workspaces, and choose the newly created workspace. On the Authentication tab, add user(s) or group(s) from AWS IAM Identity Center, or set up the SAML configuration. Once the user is added, select the user and choose “Make admin” from the Action dropdown (as shown in Figure 12). Once this step is completed, the configured user(s) can access the Grafana console for this workspace as admin.

Figure 12: AWS IAM Identity Center user for AMG

Grafana viewer users

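Use the Admin user only to create dashboards. For viewing dashboards, a “Viewer” user is recommended for both security and cost optimization reasons, since viewers cannot modify dashboards and AMG prices viewer licenses lower than editor and admin licenses.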
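The role assignment described above (Admin for the dashboard author, Viewer for everyone else) can also be scripted with aws grafana update-permissions. This sketch assumes the users come from IAM Identity Center (type SSO_USER) and that you already know the workspace ID and the Identity Center user ID; those identifiers and the function name are hypothetical placeholders, and AWS_CLI=echo is a dry-run convention used only in this sketch.

```shell
#!/usr/bin/env bash
# Sketch: grant an IAM Identity Center user a Grafana role in an AMG workspace.
# Use ADMIN for the dashboard author and VIEWER for users who only read dashboards.
# Set AWS_CLI=echo to print the AWS command instead of running it.
amg_grant_role() {
  local workspace_id="$1" user_id="$2" role="$3"
  case "$role" in
    ADMIN|EDITOR|VIEWER) ;;
    *) echo "role must be ADMIN, EDITOR, or VIEWER" >&2; return 1 ;;
  esac
  "${AWS_CLI:-aws}" grafana update-permissions \
    --workspace-id "$workspace_id" \
    --update-instruction-batch \
    "[{\"action\":\"ADD\",\"role\":\"${role}\",\"users\":[{\"id\":\"${user_id}\",\"type\":\"SSO_USER\"}]}]"
}

# Usage:
# amg_grant_role g-abc123 <identity-center-user-id> VIEWER
```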

Retrieve the URL for the Grafana workspace console. To do this, open the Amazon Grafana service in the AWS console, choose All workspaces, and find the workspace URL associated with the newly created workspace, as shown in Figure 13.

Figure 13: Grafana workspace URL example
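The workspace URL can also be read with aws grafana describe-workspace. In our understanding, the endpoint field is returned as a bare host name, so this sketch prepends https:// unless the value already starts with it. The workspace ID and function name are hypothetical placeholders, and AWS_CLI is an override used only in this sketch.

```shell
#!/usr/bin/env bash
# Sketch: print the Grafana console URL for an AMG workspace.
# Assumes workspace.endpoint is a host name; https:// is added only if missing.
amg_workspace_url() {
  local host
  host=$("${AWS_CLI:-aws}" grafana describe-workspace \
    --workspace-id "$1" --query workspace.endpoint --output text) || return 1
  case "$host" in
    https://*) printf '%s\n' "$host" ;;
    *) printf 'https://%s\n' "$host" ;;
  esac
}

# Usage: amg_workspace_url g-abc123
```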

Log in to the AMG workspace console by opening the workspace URL and authenticating with the user configured as admin in the earlier step.

After logging in to your AMG workspace console, select Amazon Managed Service for Prometheus in the Apps -> AWS Data Sources -> Data Sources section, and select the AMP workspace created in step 1 for the collection of metrics (as shown in Figure 14).

Figure 14: Data source configuration for Grafana workspace

Once you have successfully added your AMP workspace as the data source, it appears as a configured data source in the Administration -> Data sources tab, as shown in Figure 15.

Figure 15: AMP as data source for AMG

3.3 Import SAP reports

Grafana dashboards can be created from reports in JSON format; you can either create your own reports or import ones published on Grafana.com. In this blog, we use the import option with the following reports:

OS level data using Node Exporter (ID 1860)
HA cluster running Pacemaker on ASCS/ERS (ID 12229)
SAP Application Server (ID 12761)

To import the reports, while logged into AMG workspace console, navigate to Dashboards tab, and import the JSON reports by either uploading the report or by Grafana.com report ID as shown in Figure 16.

Figure 16: Import dashboard either using JSON upload or Grafana.com dashboard ID

After you import all the reports, you can see the following dashboards:


Figure 17: OS level metrics dashboard

Figure 18: SAP ASCS/ERS HA Cluster Dashboard

Figure 19: Multi-Cluster overview dashboard when a node is down

Figure 20: SAP Application Server status and process overview dashboard

Figure 21: SAP Application Server dispatcher queue dashboard

AMG data sources and multicloud dashboards

As shown in the configuration steps (and in Figure 22), you can specify other data sources such as Amazon CloudWatch, Amazon Athena, etc. That enables dashboarding not only for non-SAP systems, but also for hybrid and multicloud setups.

Figure 22: Data sources for AMG

Conclusion

Prometheus and Grafana are powerful open-source tools for monitoring and visualizing SAP landscapes, and using AMP with AMG on AWS improves automation and security posture, because both services are fully managed and integrate with AWS IAM and VPC endpoints. By using AMP and AMG to build SAP observability dashboards, you gain a single-pane-of-glass observability dashboard while avoiding the heavy lifting of deploying and managing infrastructure and running periodic software updates for Prometheus and Grafana.

In this post, we discussed how you can set up SAP observability dashboards for an SAP S/4HANA system using Amazon Managed Prometheus and Amazon Managed Grafana. We also discussed how you can use Grafana with other data sources to integrate non-SAP systems as well. To learn more about AMG, dashboard types, and security features, start with the AWS documentation.

AWS for SAP