AWS Cloud Operations Blog

Monitor Istio on EKS using Amazon Managed Prometheus and Amazon Managed Grafana

Service meshes are an integral part of the Kubernetes environment, enabling secure, reliable, and observable communication. Istio is an open-source service mesh that provides advanced network features without requiring any changes to the application code. These capabilities include service-to-service authentication, monitoring, and more.

Istio generates detailed telemetry for all service communications within a mesh. This telemetry provides observability of service behavior, thereby empowering operators to troubleshoot, maintain, and optimize their applications. These features don’t impose additional burdens on service developers. To monitor service behavior, Istio generates metrics for all service traffic in, out, and within an Istio service mesh. These metrics provide information on behaviors, like traffic volume, traffic error rates, and request-response latency.

In addition to monitoring the behavior of services within a mesh, it’s essential to monitor the behavior of the mesh itself. Istio components export metrics that provide insights into the health and function of the mesh control plane.

In this post, I’ll show how you can configure an Amazon Elastic Kubernetes Service (Amazon EKS) cluster with Istio as a service mesh, Amazon Managed Service for Prometheus, and Amazon Managed Grafana for monitoring your Istio control plane and data plane metrics. Furthermore, I’ll show you how Amazon Managed Grafana alerts can trigger PagerDuty.

Solution overview

The following diagram shows the complete setup that I’ll explain in this post.

The overall architecture comprises an Amazon Elastic Kubernetes Service (Amazon EKS) cluster with Istio as a service mesh, Amazon Managed Service for Prometheus, and Amazon Managed Grafana for monitoring your Istio control plane and data plane metrics.

Prerequisites

You will need the following to complete the steps in this post:

  • An AWS account with permissions to create the resources described in this post
  • The AWS CLI, eksctl, kubectl, helm, jq, and awscurl installed and configured
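
As a quick hedged check, you can confirm that the command line tools used throughout this post are available on your PATH:

# Report any of the tools used in this post that are not installed.
for tool in aws eksctl kubectl helm jq awscurl; do
  command -v "$tool" >/dev/null 2>&1 || echo "missing: $tool"
done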

Create an Amazon EKS Cluster

Let’s start by setting a few environment variables:

export IAA_EKS_CLUSTER=IAA-EKS-CLUSTER
export IAA_ACCOUNT_ID=$(aws sts get-caller-identity --query 'Account' --output text)
export IAA_AWS_REGION=us-west-2 #<-- Change this to match your region
export AWS_REGION=us-west-2 #<-- Change this to match your region
export IAA_AMP_WORKSPACE_NAME=istio-amp-workshop

Prepare an eksctl cluster configuration file, eks-cluster-config.yaml, and create an Amazon EKS cluster using eksctl:

cat << EOF > eks-cluster-config.yaml
---
apiVersion: eksctl.io/v1alpha5
kind: ClusterConfig
metadata:
  name: $IAA_EKS_CLUSTER
  region: $IAA_AWS_REGION
  version: '1.21'
managedNodeGroups:
- name: default-ng
  minSize: 1
  maxSize: 3
  desiredCapacity: 2
  iam:
    withAddonPolicies:
      certManager: true
      cloudWatch: true
EOF
eksctl create cluster -f eks-cluster-config.yaml
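
eksctl writes the new cluster’s kubeconfig for you. As a quick sanity check (a hedged sketch using standard AWS CLI and kubectl commands), confirm that kubectl can reach the cluster and that both managed nodes are Ready:

# Refresh the kubeconfig entry if needed, then list the worker nodes.
aws eks update-kubeconfig --name $IAA_EKS_CLUSTER --region $IAA_AWS_REGION
kubectl get nodes -o wide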

Installing Istio

In the Kubernetes context, Istio deploys an Envoy proxy as a sidecar container inside every pod that provides a service. A significant benefit of tunneling your service traffic through the Istio Envoy proxies is that fine-grained metrics and high-level application information are collected automatically and reported for every service proxy.

Use the following commands to install Istio:

echo 'export ISTIO_VERSION="1.10.0"' >> ${HOME}/.bash_profile
source ${HOME}/.bash_profile

curl -L https://istio.io/downloadIstio | ISTIO_VERSION=${ISTIO_VERSION} sh -

cd ${PWD}/istio-${ISTIO_VERSION}
sudo cp -v bin/istioctl /usr/local/bin/
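
Optionally, confirm that the istioctl client is on your PATH and matches the version you exported:

# Print only the local client version; the control plane isn't installed yet.
istioctl version --remote=false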

You’ll install all the Istio components using the built-in demo configuration profile. This installation lets you quickly get started evaluating Istio. In this demo, Istio is installed into the istio-system namespace.

$ yes | istioctl install --set profile=demo

✔ Istio core installed
✔ Istiod installed
✔ Egress gateways installed
✔ Ingress gateways installed
✔ installation complete

You can verify that the service installation was successful.

$ kubectl -n istio-system get svc

NAME                   TYPE           CLUSTER-IP       EXTERNAL-IP                                                               PORT(S)                                                                      AGE
istio-egressgateway    ClusterIP      10.100.167.140   <none>                                                                    80/TCP,443/TCP,15443/TCP                                                     107s
istio-ingressgateway   LoadBalancer   10.100.15.31     abc1fcfe168bb4d9e8264e8952758806-1033162489.us-east-2.elb.amazonaws.com                          15021:30819/TCP,80:30708/TCP,443:32447/TCP,31400:31433/TCP,15443:30201/TCP   107s
istiod                 ClusterIP      10.100.133.178   <none>                                                                    15010/TCP,15012/TCP,443/TCP,15014/TCP                                        117s

Now you can list all of the pods in the istio-system namespace:

$ kubectl -n istio-system get pods

NAME                                    READY   STATUS    RESTARTS   AGE
istio-egressgateway-cd6b59579-vlv6c     1/1     Running   0          2m35s
istio-ingressgateway-78f7794d66-9jbw5   1/1     Running   0          2m35s
istiod-574485bfdc-wtjcg                 1/1     Running   0          2m45s

Deploy sample application

You’ll install the sample Bookinfo application in a separate namespace so that Istio automatically injects the sidecar proxy.

To utilize all of Istio’s features, pods must be running an Istio sidecar proxy. You’ll use the istio-injection=enabled label to inject the Istio sidecar proxy automatically. Note that you must add this label to every namespace that you want Istio to manage.

Use the following commands to create the bookinfo namespace, where you’ll deploy the sample application, and enable Istio injection on it:

kubectl create namespace bookinfo
kubectl label namespace bookinfo istio-injection=enabled
kubectl get ns bookinfo --show-labels

Now you’ll deploy the Bookinfo application to review the key capabilities of Istio. This step enables intelligent routing and the collection of telemetry data using Amazon Managed Service for Prometheus and Amazon Managed Grafana.

kubectl -n bookinfo apply -f ./samples/bookinfo/platform/kube/bookinfo.yaml

Next, you can verify the deployment using the following kubectl command.

$ kubectl -n bookinfo get pod,svc
NAME                                  READY   STATUS    RESTARTS   AGE
pod/details-v1-79f774bdb9-6jl84       2/2     Running   0          31s
pod/productpage-v1-6b746f74dc-mp6tf   2/2     Running   0          24s
pod/ratings-v1-b6994bb9-kc6mv         2/2     Running   0          29s
pod/reviews-v1-545db77b95-nkztf       2/2     Running   0          27s
pod/reviews-v2-7bf8c9648f-vdzt6       2/2     Running   0          27s
pod/reviews-v3-84779c7bbc-r95gg       2/2     Running   0          26s

NAME                  TYPE        CLUSTER-IP       EXTERNAL-IP   PORT(S)    AGE
service/details       ClusterIP   10.100.180.90    <none>        9080/TCP   33s
service/productpage   ClusterIP   10.100.55.170    <none>        9080/TCP   27s
service/ratings       ClusterIP   10.100.161.35    <none>        9080/TCP   31s
service/reviews       ClusterIP   10.100.156.207   <none>        9080/TCP   30s
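
Each pod reports 2/2 containers because the Envoy sidecar runs next to the application container. If you want to see the raw metrics a sidecar exposes, you can read its Prometheus endpoint directly. The following is a hedged sketch that assumes the default (non-distroless) proxy image, which includes curl; standard Istio metrics such as istio_requests_total only appear after traffic has flowed through the proxy.

# Read the Envoy sidecar's Prometheus endpoint (port 15090) on the productpage pod.
kubectl -n bookinfo exec deploy/productpage-v1 -c istio-proxy -- \
  curl -s http://localhost:15090/stats/prometheus | head -n 10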

Now, with the Bookinfo services up and running, you need to make the application accessible from outside of your Amazon EKS cluster, for example from a browser. An Istio Gateway is used for this purpose. Let’s define the ingress gateway and virtual service.

kubectl -n bookinfo  apply -f ./samples/bookinfo/networking/bookinfo-gateway.yaml

This command will take a few minutes to create the ingress and associate it with the services that it exposes.

To verify that the application is reachable, run the following commands, then open the printed link in your browser:

export GATEWAY_URL=$(kubectl -n istio-system get service istio-ingressgateway -o jsonpath='{.status.loadBalancer.ingress[0].hostname}')
echo "http://${GATEWAY_URL}/productpage"
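
As an additional hedged check from the terminal, the product page should return HTTP 200 once the load balancer created for the ingress gateway has finished provisioning (this can take a few minutes):

# Expect 200 when the ingress gateway's load balancer is reachable.
curl -s -o /dev/null -w "%{http_code}\n" "http://${GATEWAY_URL}/productpage"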

Deploying a microservice-based application in an Istio service mesh enables service monitoring and tracing, request (version) routing, resiliency testing, security and policy enforcement, and more consistency across the services and the application.

Before you can use Istio to control the Bookinfo version routing, you need to define the available versions, called subsets, in destination rules. Subsets can be used for scenarios such as A/B testing, canary rollouts, or routing to a specific version of a service. Run the following command to create the default destination rules for the sample Bookinfo services:

kubectl -n bookinfo apply -f ./samples/bookinfo/networking/destination-rule-all.yaml
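
To illustrate how these subsets are used (a minimal sketch that is not required for the rest of this post), the following hypothetical VirtualService routes all traffic for the reviews service to the v1 subset defined by the destination rules above:

cat << EOF | kubectl -n bookinfo apply -f -
apiVersion: networking.istio.io/v1alpha3
kind: VirtualService
metadata:
  name: reviews
spec:
  hosts:
  - reviews
  http:
  - route:
    - destination:
        host: reviews
        subset: v1   # send 100% of reviews traffic to version v1
EOF

Deleting this VirtualService restores the default behavior of load balancing requests across all reviews versions.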

Set up an Amazon Managed Service for Prometheus workspace

A workspace in Amazon Managed Service for Prometheus is a logical space dedicated to storing and querying Prometheus metrics. A workspace supports fine-grained access control for authorizing its management (updating, listing, describing, and deleting) and for the ingestion and querying of metrics.

Use the following command to create an Amazon Managed Service for Prometheus workspace:

aws amp create-workspace \
--alias $IAA_AMP_WORKSPACE_NAME \
--region $IAA_AWS_REGION

Creating the Amazon Managed Service for Prometheus workspace takes just a few seconds.
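
Optionally, you can confirm that the workspace is ready before ingesting metrics; a hedged check using the AWS CLI (the workspace ID lookup is repeated later in this post):

# Look up the workspace ID by alias and confirm that its status is ACTIVE.
IAA_AMP_WORKSPACE_ID=$(aws amp list-workspaces --alias $IAA_AMP_WORKSPACE_NAME \
  --region $IAA_AWS_REGION --query 'workspaces[0].workspaceId' --output text)
aws amp describe-workspace --workspace-id $IAA_AMP_WORKSPACE_ID \
  --region $IAA_AWS_REGION --query 'workspace.status.statusCode' --output text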

You can improve security and performance by creating a VPC endpoint for Amazon Managed Service for Prometheus. For more information, see Using Amazon Managed Service for Prometheus with interface VPC endpoints.

Ingest metrics and configure permissions

Run the following script, which performs these actions:

  • Creates an AWS Identity and Access Management (IAM) role with an IAM policy that has permissions to remote-write into an Amazon Managed Service for Prometheus workspace
  • Associates the IAM role with a Kubernetes service account
  • Creates a trust relationship between the IAM role and the OIDC provider hosted in your Amazon EKS cluster

#!/bin/bash
CLUSTER_NAME=$IAA_EKS_CLUSTER
OIDC_PROVIDER=$(aws eks describe-cluster --name $CLUSTER_NAME --query "cluster.identity.oidc.issuer" --output text | sed -e "s/^https:\/\///")
PROM_SERVICE_ACCOUNT_NAMESPACE=istio-system
GRAFANA_SERVICE_ACCOUNT_NAMESPACE=istio-system
SERVICE_ACCOUNT_NAME=iamproxy-service-account
SERVICE_ACCOUNT_IAM_ROLE=EKS-AMP-ServiceAccount-Role
SERVICE_ACCOUNT_IAM_ROLE_DESCRIPTION="IAM role for the K8s service account with write access to AMP"
SERVICE_ACCOUNT_IAM_POLICY=AWSManagedPrometheusWriteAccessPolicy
SERVICE_ACCOUNT_IAM_POLICY_ARN=arn:aws:iam::$IAA_ACCOUNT_ID:policy/$SERVICE_ACCOUNT_IAM_POLICY
#
# Set up a trust policy designed for a specific combination of K8s service account and namespace to sign in from a Kubernetes cluster that hosts the OIDC IdP.
# If the IAM role already exists, then add this new trust policy to the existing trust policy
#
echo "Creating a new trust policy"
read -r -d '' NEW_TRUST_RELATIONSHIP <<EOF
 [
    {
      "Effect": "Allow",
      "Principal": {
        "Federated": "arn:aws:iam::${IAA_ACCOUNT_ID}:oidc-provider/${OIDC_PROVIDER}"
      },
      "Action": "sts:AssumeRoleWithWebIdentity",
      "Condition": {
        "StringEquals": {
          "${OIDC_PROVIDER}:sub": "system:serviceaccount:${GRAFANA_SERVICE_ACCOUNT_NAMESPACE}:${SERVICE_ACCOUNT_NAME}"
        }
      }
    },
    {
      "Effect": "Allow",
      "Principal": {
        "Federated": "arn:aws:iam::${IAA_ACCOUNT_ID}:oidc-provider/${OIDC_PROVIDER}"
      },
      "Action": "sts:AssumeRoleWithWebIdentity",
      "Condition": {
        "StringEquals": {
          "${OIDC_PROVIDER}:sub": "system:serviceaccount:${PROM_SERVICE_ACCOUNT_NAMESPACE}:${SERVICE_ACCOUNT_NAME}"
        }
      }
    }
  ]
EOF
#
# Get the old trust policy, if one exists, and append it to the new trust policy
#
OLD_TRUST_RELATIONSHIP=$(aws iam get-role --role-name $SERVICE_ACCOUNT_IAM_ROLE --query 'Role.AssumeRolePolicyDocument.Statement[]' --output json)
COMBINED_TRUST_RELATIONSHIP=$(echo $OLD_TRUST_RELATIONSHIP $NEW_TRUST_RELATIONSHIP | jq -s add)
echo "Appending to the existing trust policy."
read -r -d '' TRUST_POLICY <<EOF
{
  "Version": "2012-10-17",
  "Statement": ${COMBINED_TRUST_RELATIONSHIP}
}
EOF
echo "${TRUST_POLICY}" > TrustPolicy.json
#
# Set up the permission policy that grants write permissions for all AMP workspaces
#
read -r -d '' PERMISSION_POLICY <<EOF
{
   "Version":"2012-10-17",
   "Statement":[
      {
         "Effect":"Allow",
         "Action":[
            "aps:RemoteWrite",
            "aps:QueryMetrics",
            "aps:GetSeries",
            "aps:GetLabels",
            "aps:GetMetricMetadata"
         ],
         "Resource":"*"
      }
   ]
}
EOF
echo "${PERMISSION_POLICY}" > PermissionPolicy.json
#
# Create an IAM permission policy to be associated with the role, if the policy does not already exist
#
SERVICE_ACCOUNT_IAM_POLICY_ID=$(aws iam get-policy --policy-arn $SERVICE_ACCOUNT_IAM_POLICY_ARN --query 'Policy.PolicyId' --output text)
if [ "$SERVICE_ACCOUNT_IAM_POLICY_ID" = "" ]; 
then
  echo "Creating a new permission policy $SERVICE_ACCOUNT_IAM_POLICY"
  aws iam create-policy --policy-name $SERVICE_ACCOUNT_IAM_POLICY --policy-document file://PermissionPolicy.json 
else
  echo "Permission policy $SERVICE_ACCOUNT_IAM_POLICY already exists"
fi
#
# If the IAM role already exists, just update the trust policy.
# Otherwise, create one using the trust policy and permission policy
#
SERVICE_ACCOUNT_IAM_ROLE_ARN=$(aws iam get-role --role-name $SERVICE_ACCOUNT_IAM_ROLE --query 'Role.Arn' --output text)
if [ "$SERVICE_ACCOUNT_IAM_ROLE_ARN" = "" ]; 
then
  echo "$SERVICE_ACCOUNT_IAM_ROLE Role does not exist. Creating a new role with a trust and permission policy."
  #
  # Create an IAM role for the Kubernetes service account 
  #
  SERVICE_ACCOUNT_IAM_ROLE_ARN=$(aws iam create-role \
  --role-name $SERVICE_ACCOUNT_IAM_ROLE \
  --assume-role-policy-document file://TrustPolicy.json \
  --description "$SERVICE_ACCOUNT_IAM_ROLE_DESCRIPTION" \
  --query "Role.Arn" --output text)
  #
  # Attach the trust and permission policies to the Role.
  #
  aws iam attach-role-policy --role-name $SERVICE_ACCOUNT_IAM_ROLE --policy-arn $SERVICE_ACCOUNT_IAM_POLICY_ARN  
else
  echo "$SERVICE_ACCOUNT_IAM_ROLE_ARN Role already exists. Updating the trust policy"
  #
  # Update the IAM role for the Kubernetes service account with the new trust policy
  #
  aws iam update-assume-role-policy --role-name $SERVICE_ACCOUNT_IAM_ROLE --policy-document file://TrustPolicy.json
fi
echo $SERVICE_ACCOUNT_IAM_ROLE_ARN
# The EKS cluster hosts an OIDC provider with a public discovery endpoint.
# Associate this IdP with AWS IAM so that the latter can validate and accept the OIDC tokens issued by Kubernetes to service accounts.
# Doing this with eksctl is the more straightforward approach.
#
eksctl utils associate-iam-oidc-provider --cluster $CLUSTER_NAME --approve
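
As a hedged sanity check, you can confirm that the IAM role exists and that the cluster's OIDC provider is now registered with IAM. The printed role ARN is the value referenced by the service account annotation in the next section.

# Print the role ARN and the registered OIDC provider ARNs.
aws iam get-role --role-name $SERVICE_ACCOUNT_IAM_ROLE --query 'Role.Arn' --output text
aws iam list-open-id-connect-providers --query 'OpenIDConnectProviderList[].Arn' --output text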

Amazon Managed Service for Prometheus doesn’t directly scrape operational metrics from containerized workloads in a Kubernetes cluster. It requires users to deploy and manage a standard Prometheus server or an OpenTelemetry agent – such as the AWS Distro for OpenTelemetry Collector – in their cluster to perform this task.

Run the following commands to deploy the Prometheus server on the Amazon EKS cluster:


helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
helm repo update
IAA_AMP_WORKSPACE_ID=$(aws amp list-workspaces --alias $IAA_AMP_WORKSPACE_NAME --region=${IAA_AWS_REGION} --query 'workspaces[0].[workspaceId]' --output text)

Create a file called amp_ingest_override_values.yaml with the following content in it. If you’re using a version of Prometheus earlier than 2.26.0, follow the Using older versions of Prometheus documentation.

cat > amp_ingest_override_values.yaml << EOF
## The following is a set of default values for prometheus server helm chart which enable remoteWrite to AMP
## For the rest of prometheus helm chart values see: https://github.com/prometheus-community/helm-charts/blob/main/charts/prometheus/values.yaml
##
serviceAccounts:
  server:
    name: iamproxy-service-account
    annotations: 
      eks.amazonaws.com/role-arn: ${SERVICE_ACCOUNT_IAM_ROLE_ARN}
server:
  remoteWrite:
    - url: https://aps-workspaces.${AWS_REGION}.amazonaws.com/workspaces/${IAA_AMP_WORKSPACE_ID}/api/v1/remote_write
      sigv4:
        region: ${IAA_AWS_REGION}
      queue_config:
        max_samples_per_send: 1000
        max_shards: 200
        capacity: 2500
EOF

Run the following command to install the Prometheus server configuration and configure the remoteWrite endpoint:

helm install prometheus-for-amp prometheus-community/prometheus -n istio-system -f ./amp_ingest_override_values.yaml

To validate the setup, execute the following command in the terminal. It returns a list of pods in the cluster.

$ kubectl get pods -n istio-system

NAME                                                     READY   STATUS    RESTARTS   AGE
istio-egressgateway-55d4df6c6b-nsz6c                     1/1     Running   0          128m
istio-ingressgateway-69dc4765b4-bz4kc                    1/1     Running   0          128m
istiod-798c47d594-75vtv                                  1/1     Running   0          128m
prometheus-for-amp-alertmanager-766dc65ddb-mgf7t         2/2     Running   0          63s
prometheus-for-amp-kube-state-metrics-764864d6b5-96574   1/1     Running   0          63s
prometheus-for-amp-node-exporter-jdsn6                   1/1     Running   0          63s
prometheus-for-amp-node-exporter-p9jh5                   1/1     Running   0          63s
prometheus-for-amp-pushgateway-68548d4d87-49swt          1/1     Running   0          63s
prometheus-for-amp-server-0                              2/3     Running   0          62s

Optionally, to test whether Amazon Managed Service for Prometheus received the metrics, use the awscurl utility to send HTTP requests with AWS SigV4 authentication. It requires your AWS credentials to authenticate queries to Amazon Managed Service for Prometheus.

awscurl --service="aps" \
--region="$IAA_AWS_REGION" "https://aps-workspaces.$IAA_AWS_REGION.amazonaws.com/workspaces/$IAA_AMP_WORKSPACE_ID/api/v1/query?query=istio_requests_total"

Your results should look similar to the following:


{
	"status": "success",
	"data": {
		"resultType": "vector",
		"result": [
			{
				"metric": {
				"__name__": "istio_requests_total",
				"app": "istio-ingressgateway",
				"chart": "gateways",
				....................................
				....................................
				"version": "v1"
				},
				"value": [
					1647974689.212,
					"1"
				]
			}
		]
	}
}

To collect more telemetry for your Grafana dashboard, open a new terminal tab and use this command to send traffic to the mesh.


for i in {1..1000}; do curl -s -I -XGET "http://${GATEWAY_URL}/productpage"; done

AWS Single Sign-On (SSO)

To use Amazon Managed Grafana flexibly and conveniently, you can leverage AWS Single Sign-On (AWS SSO) for user management. AWS SSO is available once you’ve enabled AWS Organizations manually, or it’s auto-enabled while setting up AWS Control Tower. For more information, see Using AWS SSO with your Amazon Managed Grafana workspace.

Amazon Managed Grafana integrates with AWS SSO to federate identities for your workforce. It redirects users to your company directory to sign in with their existing credentials. Then, they seamlessly authenticate into the Amazon Managed Grafana workspace. This approach enforces security settings such as password policies and two-factor authentication.

Create an Amazon Managed Grafana workspace and query metrics from the Amazon Managed Service for Prometheus workspace

You can easily spin up on-demand, auto-scaled Grafana workspaces (virtual Grafana servers) that let you create unified dashboards across multiple data sources. You must set up a workspace before you can use Amazon Managed Grafana for the following example. The next steps walk you through the required configuration in the AWS console and note things to consider along the way.

Select the Create workspace button in the upper-right corner of the Amazon Managed Grafana console landing page. Next, specify the workspace name and an optional description.

In this step, you must enable AWS SSO for Amazon Managed Grafana to manage user authentication to Grafana workspaces. Furthermore, choose Service managed as the permission type:

On the next screen, select the data sources, such as AWS IoT, AWS X-Ray, Amazon CloudWatch, Amazon OpenSearch Service, Amazon Managed Service for Prometheus, Amazon Timestream, Amazon Redshift, and Amazon Athena, and select Amazon SNS as the notification channel. Then, choose Next.

Select Create workspace without any further selections on the next screen to create the Amazon Managed Grafana workspace.

By default, the AWS SSO user has Viewer permissions. Since you’ll be adding new data sources and creating a dashboard in Amazon Managed Grafana, update the user type to Admin. Select the Configure users and user groups button under the Authentication tab. Next, select the AWS SSO user that you want to use to log in to Grafana, and select the Make admin button as follows:

Log in to Amazon Managed Grafana workspace

Select the Grafana workspace URL in the Summary section.

This link will take you to the AWS SSO login screen, where you can provide the username and password of your configured AWS SSO user.

Query metrics from the Amazon Managed Service for Prometheus workspace

After authenticating into the Amazon Managed Grafana console, add the Amazon Managed Service for Prometheus data source by selecting Data sources under the Configuration (gear) icon in the left navigation bar.

Select the Add data source button on the right, and select Prometheus, as shown in the following image.

add data source screen

Configure the Prometheus data source:

  • In Name, add AMPDataSource or any name that you prefer.
  • In URL, add the Amazon Managed Service for Prometheus workspace endpoint URL (the remote write URL without the api/v1/remote_write suffix).
  • Enable SigV4 auth.
  • Under the SigV4 Auth Details section, for Default Region, choose the region where you created the Amazon Managed Service for Prometheus workspace.

Select the Save and test button. You will see a message confirming that the data source is working.

Query Istio metrics

Now import Grafana dashboards to visualize metrics from the Istio environment. Go to the plus sign on the left navigation bar, and select Import, as shown in the following image.

Type 7639 (Istio Mesh Dashboard) in the Import via grafana.com textbox on the Import screen and select Load. Select the Prometheus data source in the dropdown at the bottom and choose Import. Once complete, you will be able to see the Grafana dashboard showing metrics from Istio through the Prometheus data source.

If the Global Request Volume and HTTP workloads panels are empty, edit the underlying PromQL queries in the dashboard JSON to change the rate window from [1m] to [5m]. Additional Grafana dashboards for Istio components are available at grafana.com.
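
One way to make that change in bulk (a hedged sketch that assumes you’ve exported the dashboard JSON to a hypothetical local file named istio-mesh-dashboard.json before importing it):

# istio-mesh-dashboard.json is a hypothetical local export of dashboard 7639.
# Widen every 1-minute rate window to 5 minutes, then re-import the file into Grafana.
sed -i 's/\[1m\]/\[5m\]/g' istio-mesh-dashboard.json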

More Istio metrics

Segmented by service and service version, these are a few of the metrics from Istio’s Prometheus telemetry that you usually want to monitor:

  • Number of requests: istio_requests_total
  • Request duration: istio_request_duration_milliseconds_bucket, by source and destination
  • Request size: istio_request_bytes_bucket, by source and destination

You can also build your own custom dashboard using PromQL (Prometheus Query Language). Create a new dashboard, then add a panel that uses Amazon Managed Service for Prometheus as the data source.
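
For example, the following hedged sketch queries the 95th percentile request latency per destination service over the last five minutes; the same PromQL expression can be pasted into a Grafana panel. It reuses the jq and awscurl utilities from earlier in this post and assumes the standard Istio metric and label names (istio_request_duration_milliseconds_bucket, destination_service_name).

# PromQL: p95 request latency per destination service over the last 5 minutes.
QUERY='histogram_quantile(0.95, sum(rate(istio_request_duration_milliseconds_bucket[5m])) by (le, destination_service_name))'

# URL-encode the expression with jq, then query the workspace with a SigV4-signed request.
ENCODED=$(jq -rn --arg q "$QUERY" '$q|@uri')
awscurl --service="aps" --region="$IAA_AWS_REGION" \
  "https://aps-workspaces.$IAA_AWS_REGION.amazonaws.com/workspaces/$IAA_AMP_WORKSPACE_ID/api/v1/query?query=${ENCODED}"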

Setting up alerts with Amazon Managed Grafana and PagerDuty for Istio HA

Having a centralized incident management process is critical to keeping systems running smoothly. You’ll integrate PagerDuty with Amazon Managed Grafana to monitor Istio metrics and configure alerts. For more details on alerting and the various supported providers, see alert notifications for Amazon Managed Grafana.

PagerDuty is an alarm aggregation and dispatching service for system administrators and support teams. It collects alerts from your monitoring tools, provides an overall view of your monitoring alarms, and alerts an on-duty engineer if there’s a problem. To integrate PagerDuty with Amazon Managed Grafana, you can use an existing account or create a new account with a free trial on PagerDuty.

Next, log in to your PagerDuty account. Under the Create a Service section, provide a name and description, as shown in the following image.

Select Next, and continue selecting Next on the next two screens to accept the default values. Then, choose Events API V2 and Submit on the Integrations page, as shown in the following image.

Create a Service Screen

You will see the following screen for the created service with an Integration Key to use for configuring Amazon Managed Grafana for alerting:

Now, let’s create a notification channel in Amazon Managed Grafana.

Go to the bell icon on the left, as shown below, and select the Notification channels tab.

Choose the Add channel button, and on the following screen populate the Name, Type, and Integration Key (from PagerDuty) fields, as follows:

Next, select Test to generate a notification to PagerDuty and select Save.

Switch back to the PagerDuty screen, and navigate to the home page. You will see an alert displayed as follows:

Clean up

You will continue to incur costs until you delete the infrastructure that you created for this post. Use the following commands to clean up the AWS resources created for this demonstration.

# Clean up prometheus.
helm uninstall prometheus-for-amp -n istio-system

aws iam detach-role-policy --role-name $SERVICE_ACCOUNT_IAM_ROLE --policy-arn $SERVICE_ACCOUNT_IAM_POLICY_ARN
aws iam delete-policy --policy-arn $SERVICE_ACCOUNT_IAM_POLICY_ARN
aws iam delete-role --role-name $SERVICE_ACCOUNT_IAM_ROLE
rm -r amp_ingest_override_values.yaml
aws amp delete-workspace --workspace-id $IAA_AMP_WORKSPACE_ID

# Cleaning up bookinfo application and istio.
kubectl -n bookinfo delete -f ./samples/bookinfo/networking/destination-rule-all.yaml
kubectl -n bookinfo delete -f ./samples/bookinfo/networking/bookinfo-gateway.yaml
kubectl -n bookinfo delete -f ./samples/bookinfo/platform/kube/bookinfo.yaml
kubectl delete ns bookinfo istio-system

# Cleaning up Amazon EKS Cluster.
eksctl delete cluster --name=$IAA_EKS_CLUSTER
cd ..
rm -r eks-cluster-config.yaml
rm -rf istio-*

Next, navigate to the Amazon Managed Grafana console to delete the created Amazon Managed Grafana workspace. Finally, log in to PagerDuty to delete the service integration.

Conclusion

This post demonstrated the steps for setting up an Amazon EKS cluster with Istio as a service mesh. It also used Amazon Managed Service for Prometheus and Amazon Managed Grafana to monitor your Istio control plane and data plane metrics. You can also look at the Monitoring your service mesh container environment using Amazon Managed Service for Prometheus post to learn more about monitoring your service mesh container environment with AWS App Mesh using Amazon Managed Service for Prometheus.

Furthermore, I demonstrated how to configure a PagerDuty service and your Amazon Managed Grafana workspace to send alerts to PagerDuty for further incident management. You can also look at the Centralized incident management with AWS Control Tower and PagerDuty blog for more information. Additionally, you can get hands-on experience with the AWS services using the One Observability Workshop.

Authors

Elamaran Shanmugam

Elamaran (Ela) Shanmugam is a Sr. Container Specialist Solutions Architect with Amazon Web Services. Ela is a Container, Observability and Multi-Account Architecture SME and helps AWS customers to design and build scalable, secure and optimized container workloads on AWS. His passion is building and automating Infrastructure to allow customers to focus more on their business. He is based out of Tampa, Florida and you can reach him on twitter @IamElaShan.

Munish Dabra

Munish Dabra is a Sr. Solutions Architect at Amazon Web Services. He is a software technology leader with ~20 years of experience in building scalable and distributed software systems. His current area of interests are containers, observability and AI/ML. He has an educational background in Computer Engineering, and M.B.A from The University of Texas. He is based out of Houston and in his spare time, he loves to play with his two kids and follows Tennis and Cricket.