Multi-cluster cost monitoring for Amazon EKS using Kubecost and Amazon Managed Service for Prometheus

Introduction

Amazon Managed Service for Prometheus is a Prometheus-compatible service that monitors and provides alerts on containerized applications and infrastructure at scale. In the previous post, Integrating Kubecost with Amazon Managed Service for Prometheus, we discussed how you can integrate Kubecost with Amazon Managed Service for Prometheus (AMP) to get granular visibility into your Amazon Elastic Kubernetes Service (Amazon EKS) cluster costs, letting you aggregate costs by most Kubernetes contexts, from the cluster level down to the container level. That integration helps customers monitor a single Amazon EKS cluster without worrying about scaling the Prometheus instance. However, complexity increases when your infrastructure grows to multiple Amazon EKS clusters running across numerous Regions and AWS accounts. To track costs and generate reports for showback or chargeback purposes, you need to gather cost data from multiple endpoints, which is a time-consuming and complicated process.

As part of AWS’ partnership with Kubecost, we are excited to announce this subsequent integration with Amazon Managed Service for Prometheus to help customers effectively monitor their Kubernetes costs without worrying about scaling the Prometheus instance. With the Amazon EKS optimized Kubecost bundle or with the Kubecost Enterprise License, AWS customers can now get a unified view into Kubernetes costs across multiple Amazon EKS clusters. In this post, you’ll learn how to set up cost monitoring across multiple Amazon EKS clusters in a federated view with Kubecost and Amazon Managed Service for Prometheus.

Solution overview

The architecture of this integration is similar to Amazon EKS cost monitoring with Kubecost, which is described in the previous post, with some enhancements as follows:

In this integration, an additional AWS SigV4 container is added to the cost-analyzer pod, which acts as a proxy to help query metrics from Amazon Managed Service for Prometheus using the AWS SigV4 signing process. It enables password-less authentication to reduce the risk of exposing your AWS credentials.

When the Amazon Managed Service for Prometheus integration is enabled, the bundled Prometheus server in the Kubecost Helm Chart is configured in remote_write mode. The bundled Prometheus server sends the collected metrics to Amazon Managed Service for Prometheus using the AWS SigV4 signing process. All metrics and data are stored in Amazon Managed Service for Prometheus, and Kubecost queries the metrics directly from Amazon Managed Service for Prometheus instead of the bundled Prometheus. As a result, customers don't need to maintain and scale the local Prometheus instance.

There are two architectures you can deploy:

  • The Quick-Start architecture supports the setup of up to 100 clusters.
  • The Federated architecture supports the setup of over 100 clusters.

Quick-Start architecture

The Quick-Start architecture can manage up to 100 clusters. The following architecture diagram illustrates the small-scale infrastructure setup:

Architecture diagram showing design for small-medium scale cluster deployment option

Federated architecture

To support large-scale infrastructure that has over 100 clusters, Kubecost uses Amazon Simple Storage Service (Amazon S3) to improve query performance. In addition to the Amazon Managed Service for Prometheus workspace, Kubecost stores its extract, transform, and load (ETL) data in a central Amazon S3 bucket. Kubecost’s ETL data is a computed cache based on Prometheus metrics, from which customers can perform all possible Kubecost queries. By storing the ETL data in an Amazon S3 bucket, this integration offers resiliency for your cost allocation data, improves performance, and enables a high availability architecture for your Kubecost setup.

The following architecture diagram illustrates the large-scale infrastructure setup:

Architecture diagram showing design for large scale cluster deployment option

Walkthrough

Prerequisites

Create Amazon Managed Service for Prometheus workspace

Step 1: run the following command to get the information of your current EKS cluster:

bash
kubectl config current-context

The example output should be in this format:

bash
arn:aws:eks:${AWS_REGION}:${YOUR_AWS_ACCOUNT_ID}:cluster/${YOUR_CLUSTER_NAME}
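
If the context name follows the ARN format above, its parts can be reused for the environment variables set later in this walkthrough. The following is a minimal sketch; `parse_eks_context` is a hypothetical helper name, and it assumes the context was not renamed with an alias (in that case the function reports an error instead of guessing).

```shell
# Split an EKS cluster ARN of the form
#   arn:aws:eks:<region>:<account-id>:cluster/<cluster-name>
# into "<region> <account-id> <cluster-name>".
# parse_eks_context is a hypothetical helper, not part of kubectl or the AWS CLI.
parse_eks_context() {
  arn="$1"
  case "$arn" in
    arn:*:eks:*:*:cluster/*) ;;
    *) echo "not an EKS cluster ARN: $arn" >&2; return 1 ;;
  esac
  # Fields are colon-separated; the cluster name follows "cluster/".
  region=$(printf '%s' "$arn" | cut -d: -f4)
  account=$(printf '%s' "$arn" | cut -d: -f5)
  cluster=${arn##*:cluster/}
  printf '%s %s %s\n' "$region" "$account" "$cluster"
}
```

For example, `parse_eks_context "$(kubectl config current-context)"` prints the Region, account ID, and cluster name on one line.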

Step 2: run the following command to create a new Amazon Managed Service for Prometheus workspace:

bash
export AWS_REGION=<YOUR_AWS_REGION>
aws amp create-workspace --alias kubecost-amp --region $AWS_REGION

The Amazon Managed Service for Prometheus workspace should be created in a few seconds. Run the following command to get the workspace ID:

bash
export AMP_WORKSPACE_ID=$(aws amp list-workspaces --region ${AWS_REGION} --output json --query 'workspaces[?alias==`kubecost-amp`].workspaceId | [0]' | cut -d'"' -f 2)
echo $AMP_WORKSPACE_ID
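
Before continuing, it can help to confirm that the workspace has finished creating. The sketch below polls `aws amp describe-workspace` until the status code is `ACTIVE`; the function name `wait_for_amp_workspace`, the retry count, and the five-second delay are illustrative assumptions.

```shell
# Poll the workspace status until it reports ACTIVE.
# wait_for_amp_workspace is a hypothetical helper; the retry count and the
# delay are arbitrary choices for illustration.
wait_for_amp_workspace() {
  ws_id="$1"
  region="$2"
  attempts="${3:-12}"
  status="UNKNOWN"
  while [ "$attempts" -gt 0 ]; do
    status=$(aws amp describe-workspace --workspace-id "$ws_id" --region "$region" \
      --query 'workspace.status.statusCode' --output text)
    if [ "$status" = "ACTIVE" ]; then
      return 0
    fi
    attempts=$((attempts - 1))
    sleep 5
  done
  echo "workspace $ws_id is not ACTIVE (last status: $status)" >&2
  return 1
}
```

A typical call would be `wait_for_amp_workspace "$AMP_WORKSPACE_ID" "$AWS_REGION"`.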

Set up the environment

Step 1: set environment variables for integrating Kubecost with Amazon Managed Service for Prometheus

Run the following commands in your workspace:

bash
export RELEASE="kubecost"
export YOUR_CLUSTER_NAME=<YOUR_EKS_CLUSTER_NAME>
export AWS_REGION=${AWS_REGION}
export VERSION="1.104.4"
export KC_BUCKET="kubecost-etl-metrics" # Remove this line if you want to set up small-scale infrastructure 
export AWS_ACCOUNT_ID=$(aws sts get-caller-identity --query Account --output text)
export REMOTEWRITEURL="https://aps-workspaces.${AWS_REGION}.amazonaws.com/workspaces/${AMP_WORKSPACE_ID}/api/v1/remote_write"
export QUERYURL="http://localhost:8005/workspaces/${AMP_WORKSPACE_ID}"
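
Several later commands expand these variables inside URLs and resource names, so an empty value can produce a configuration that looks valid but is wrong. As a hedged sketch, the hypothetical helper `require_vars` below fails if any named variable is unset, empty, or still holds a `<PLACEHOLDER>` value.

```shell
# Fail if any of the named variables is unset, empty, or still a placeholder
# such as "<YOUR_AWS_REGION>". require_vars is a hypothetical helper.
require_vars() {
  missing=""
  for name in "$@"; do
    # Indirect lookup of the variable whose name is stored in $name.
    eval "value=\${$name:-}"
    case "$value" in
      "" | "<"*">") missing="$missing $name" ;;
    esac
  done
  if [ -n "$missing" ]; then
    echo "unset or placeholder variables:$missing" >&2
    return 1
  fi
}
```

For the small-scale setup, a check could look like `require_vars RELEASE YOUR_CLUSTER_NAME AWS_REGION VERSION AWS_ACCOUNT_ID AMP_WORKSPACE_ID`; add `KC_BUCKET` for the federated setup.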

Step 2: set up Amazon S3 bucket, AWS IAM policy, and Kubernetes secret for storing Kubecost ETL files

Note: You can skip step 2 for the small-scale infrastructure setup.

a. Create an Amazon S3 bucket as the object store for Kubecost ETL metrics:

Run the following command in your workspace:

bash 
aws s3 mb s3://${KC_BUCKET}

b. Create AWS IAM Policy to grant access to the Amazon S3 bucket.

The following policy is for demo purposes only. You may need to consult your security team and make appropriate changes depending on your organization’s requirements.

Run the following command in your workspace:

bash
# create policy-kubecost-aws-s3.json file
cat <<EOF>policy-kubecost-aws-s3.json
{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "VisualEditor0",
            "Effect": "Allow",
            "Action": [
                "s3:ListBucket",
                "s3:GetBucketLocation"
            ],
            "Resource": "arn:aws:s3:::${KC_BUCKET}"
        },
        {
            "Sid": "VisualEditor1",
            "Effect": "Allow",
            "Action": [
                "s3:PutObject",
                "s3:GetObject",
                "s3:ListBucketMultipartUploads",
                "s3:AbortMultipartUpload",
                "s3:ListBucket",
                "s3:DeleteObject",
                "s3:ListMultipartUploadParts"
            ],
            "Resource": [
                "arn:aws:s3:::${KC_BUCKET}",
                "arn:aws:s3:::${KC_BUCKET}/*"
            ]
        }
    ]
}
EOF
# create the AWS IAM policy
aws iam create-policy \
 --policy-name kubecost-s3-federated-policy-$YOUR_CLUSTER_NAME \
 --policy-document file://policy-kubecost-aws-s3.json

c. Create Kubernetes secret to allow Kubecost to write ETL files to the Amazon S3 bucket.

Run the following command in your workspace:

bash
# create manifest file for the secret
cat <<EOF>federated-store.yaml
type: S3
config:
  bucket: "${KC_BUCKET}"
  endpoint: "s3.amazonaws.com"
  region: "${AWS_REGION}"
  insecure: false
  signature_version2: false
  put_user_metadata:
      "X-Amz-Acl": "bucket-owner-full-control"
  http_config:
    idle_conn_timeout: 90s
    response_header_timeout: 2m
    insecure_skip_verify: false
  trace:
    enable: true
  part_size: 134217728
EOF
# create Kubecost namespace and the secret from the manifest file 
kubectl create namespace ${RELEASE}
kubectl create secret generic \
  kubecost-object-store -n ${RELEASE} \
  --from-file federated-store.yaml

Step 3: set up IRSA to allow Kubecost and Prometheus to read and write metrics from Amazon Managed Service for Prometheus

The following commands automate these tasks:

  • Create an AWS IAM role with the AWS managed IAM policy and trusted policy for the following service accounts: kubecost-cost-analyzer-amp, kubecost-prometheus-server-amp.
  • Modify current Kubernetes service accounts with annotation to attach a new AWS IAM role.

Run the following command in your workspace:

bash
# For the small-scale setup, remove the kubecost-s3-federated-policy line below.
eksctl create iamserviceaccount \
    --name kubecost-cost-analyzer-amp \
    --namespace ${RELEASE} \
    --cluster ${YOUR_CLUSTER_NAME} --region ${AWS_REGION} \
    --attach-policy-arn arn:aws:iam::aws:policy/AmazonPrometheusQueryAccess \
    --attach-policy-arn arn:aws:iam::aws:policy/AmazonPrometheusRemoteWriteAccess \
    --attach-policy-arn arn:aws:iam::${AWS_ACCOUNT_ID}:policy/kubecost-s3-federated-policy-${YOUR_CLUSTER_NAME} \
    --override-existing-serviceaccounts \
    --approve

bash
eksctl create iamserviceaccount \
    --name kubecost-prometheus-server-amp \
    --namespace ${RELEASE} \
    --cluster ${YOUR_CLUSTER_NAME} --region ${AWS_REGION} \
    --attach-policy-arn arn:aws:iam::aws:policy/AmazonPrometheusQueryAccess \
    --attach-policy-arn arn:aws:iam::aws:policy/AmazonPrometheusRemoteWriteAccess \
    --override-existing-serviceaccounts \
    --approve

For more information, see the AWS documentation for AWS IAM roles for service accounts, and learn more about the Amazon Managed Service for Prometheus managed policies at Identity-based policy examples for Amazon Managed Service for Prometheus.
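
To confirm that a service account was annotated with an IAM role, you can read the `eks.amazonaws.com/role-arn` annotation that IRSA relies on. This is a minimal sketch; `irsa_role_arn` is a hypothetical helper, and the default namespace `kubecost` assumes the `RELEASE` value used in this post.

```shell
# Print the IAM role ARN attached to a service account through IRSA.
# irsa_role_arn is a hypothetical helper; $1 is the service account name and
# $2 the namespace (defaults to "kubecost", the RELEASE value in this post).
irsa_role_arn() {
  kubectl get serviceaccount "$1" --namespace "${2:-kubecost}" \
    -o jsonpath='{.metadata.annotations.eks\.amazonaws\.com/role-arn}'
}
```

Running `irsa_role_arn kubecost-cost-analyzer-amp` and `irsa_role_arn kubecost-prometheus-server-amp` should each print a role ARN; empty output suggests the annotation is missing.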

Integrating Kubecost with Amazon Managed Service for Prometheus

Prepare the configuration file

Run the following command to create a file called config-values.yaml, which contains the defaults that Kubecost uses for connecting to your Amazon Managed Service for Prometheus workspace.

bash
cat << EOF > config-values.yaml
global:
  amp:
    enabled: true
    prometheusServerEndpoint: http://localhost:8005/workspaces/${AMP_WORKSPACE_ID}
    remoteWriteService: https://aps-workspaces.${AWS_REGION}.amazonaws.com/workspaces/${AMP_WORKSPACE_ID}/api/v1/remote_write
    sigv4:
      region: ${AWS_REGION}

sigV4Proxy:
  region: ${AWS_REGION}
  host: aps-workspaces.${AWS_REGION}.amazonaws.com
EOF
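
Because the heredoc expands `${AMP_WORKSPACE_ID}` and `${AWS_REGION}` when the file is written, an unset variable silently produces a broken URL. The following sketch checks the rendered file for that case; `check_rendered_config` is a hypothetical helper, and the patterns only cover the two variables used above.

```shell
# Detect URLs that were rendered with an empty workspace ID or Region.
# check_rendered_config is a hypothetical helper for illustration.
check_rendered_config() {
  file="${1:-config-values.yaml}"
  if [ ! -f "$file" ]; then
    echo "$file not found" >&2
    return 1
  fi
  # An empty AMP_WORKSPACE_ID leaves "workspaces/" at the end of a line or
  # followed by another "/"; an empty AWS_REGION leaves "aps-workspaces..".
  if grep -Eq 'workspaces/($|/)|aps-workspaces\.\.' "$file"; then
    echo "$file contains an empty workspace ID or Region" >&2
    return 1
  fi
}
```

Calling `check_rendered_config` with no argument checks config-values.yaml in the current directory.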

Primary cluster

Run this command to install Kubecost and integrate it with the Amazon Managed Service for Prometheus workspace as the primary:

bash
# For the small-scale setup, remove the primary-federator.yaml line and the
# federatedETL.federator.primaryClusterID line below.
helm upgrade -i ${RELEASE} \
oci://public.ecr.aws/kubecost/cost-analyzer --version $VERSION \
--namespace ${RELEASE} --create-namespace \
-f https://tinyurl.com/kubecost-amazon-eks \
-f config-values.yaml \
-f https://raw.githubusercontent.com/kubecost/poc-common-configurations/main/etl-federation/primary-federator.yaml \
--set global.amp.prometheusServerEndpoint=${QUERYURL} \
--set global.amp.remoteWriteService=${REMOTEWRITEURL} \
--set kubecostProductConfigs.clusterName=${YOUR_CLUSTER_NAME} \
--set kubecostProductConfigs.projectID=${AWS_ACCOUNT_ID} \
--set prometheus.server.global.external_labels.cluster_id=${YOUR_CLUSTER_NAME} \
--set federatedETL.federator.primaryClusterID=${YOUR_CLUSTER_NAME} \
--set serviceAccount.create=false \
--set prometheus.serviceAccounts.server.create=false \
--set serviceAccount.name=kubecost-cost-analyzer-amp \
--set prometheus.serviceAccounts.server.name=kubecost-prometheus-server-amp \
--set federatedETL.federator.useMultiClusterDB=true
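
After the Helm release is installed, a quick check is to wait for the cost-analyzer deployment to roll out and list its containers, which should include the SigV4 proxy described in the solution overview. This is a sketch under assumptions: `verify_kubecost_install` is a hypothetical helper, the 300-second timeout is arbitrary, and the exact container names depend on the chart version.

```shell
# Wait for the cost-analyzer deployment, then print its container names.
# verify_kubecost_install is a hypothetical helper; $1 is the namespace
# (defaults to "kubecost", the RELEASE value used in this post).
verify_kubecost_install() {
  ns="${1:-kubecost}"
  kubectl rollout status deployment/kubecost-cost-analyzer \
    --namespace "$ns" --timeout=300s || return 1
  kubectl get deployment kubecost-cost-analyzer --namespace "$ns" \
    -o jsonpath='{.spec.template.spec.containers[*].name}'
}
```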

Release Updates

For upcoming release updates, see the Kubecost Release Updates reference.

Additional clusters

The installation steps are similar to those for the primary cluster, except that you skip the section Create Amazon Managed Service for Prometheus workspace and update the following environment variables to match each additional cluster. Note that AMP_WORKSPACE_ID and KC_BUCKET are the same as for the primary cluster.

bash
export RELEASE="kubecost"
export YOUR_CLUSTER_NAME=<YOUR_EKS_CLUSTER_NAME>
export AWS_REGION="<YOUR_AWS_REGION>"
export VERSION="1.104.4"
export KC_BUCKET="kubecost-etl-metrics"
export AWS_ACCOUNT_ID=$(aws sts get-caller-identity --query Account --output text)
export REMOTEWRITEURL="https://aps-workspaces.${AWS_REGION}.amazonaws.com/workspaces/${AMP_WORKSPACE_ID}/api/v1/remote_write"
export QUERYURL="http://localhost:8005/workspaces/${AMP_WORKSPACE_ID}"

Run this command to install Kubecost and integrate it with the Amazon Managed Service for Prometheus workspace as the additional cluster:

bash
# For the small-scale setup, remove the agent-federated.yaml line below.
helm upgrade -i ${RELEASE} \
oci://public.ecr.aws/kubecost/cost-analyzer --version $VERSION \
--namespace ${RELEASE} --create-namespace \
-f https://tinyurl.com/kubecost-amazon-eks \
-f config-values.yaml \
-f https://raw.githubusercontent.com/kubecost/poc-common-configurations/main/etl-federation/agent-federated.yaml \
--set global.amp.prometheusServerEndpoint=${QUERYURL} \
--set global.amp.remoteWriteService=${REMOTEWRITEURL} \
--set kubecostProductConfigs.clusterName=${YOUR_CLUSTER_NAME} \
--set kubecostProductConfigs.projectID=${AWS_ACCOUNT_ID} \
--set prometheus.server.global.external_labels.cluster_id=${YOUR_CLUSTER_NAME} \
--set serviceAccount.create=false \
--set prometheus.serviceAccounts.server.create=false \
--set serviceAccount.name=kubecost-cost-analyzer-amp \
--set prometheus.serviceAccounts.server.name=kubecost-prometheus-server-amp \
--set federatedETL.federator.useMultiClusterDB=true

Monitoring costs of your multi-cluster infrastructure

Expose Kubecost dashboard

After you install Kubecost on the primary cluster and all additional clusters, you can switch back to your primary cluster and run the following command to expose the Kubecost dashboard:

bash
kubectl port-forward --namespace kubecost deployment/kubecost-cost-analyzer 9090

On your web browser, navigate to http://localhost:9090 to access the dashboard.

You can now start monitoring your Amazon EKS cluster cost and efficiency. Depending on your organization’s requirements and setup, there are several options to expose Kubecost for ongoing internal access. You can also check this AWS workshop to learn how to expose Kubecost using AWS Load Balancer Controller.

Using Kubecost dashboard

When you access the Kubecost dashboard, the default Overview view shows comprehensive information about all Amazon EKS clusters monitored by Kubecost with Amazon Managed Service for Prometheus (active) and a list of unmonitored Amazon EKS clusters (unmonitored), as in the following example screenshot:

Kubecost's Overview page showing high level details of data captured in Kubecost

In the Monitor/Allocation view, Kubecost provides granular visibility into the costs of your multiple Amazon EKS clusters, aggregated by different Kubernetes contexts such as namespaces, controllers, pods, or labels. This helps you understand which parts of your applications or projects contribute to Amazon EKS spend. The following screenshot shows an example of Amazon EKS cluster cost aggregated by Namespace.

Cumulative Cost Explorer screenshot showing Kubecost interface displaying cost graph for the last 7 days.

Additionally, to monitor your AWS services costs in one platform, you can integrate Kubecost with your AWS Cost and Usage reports and enable Cloud Costs to see the costs of each AWS service across your AWS accounts. The following example screenshot shows the cost of each AWS service in the Monitor/Cloud Costs view.

Cloud Cost Explorer screenshot showing Kubecost interface displaying cost graph

Additional usage with Amazon Managed Service for Prometheus

Because all cost metrics emitted by Kubecost are centrally stored and managed in Amazon Managed Service for Prometheus for multiple Amazon EKS clusters, you can integrate with other observability tools supported by Amazon Managed Service for Prometheus to use that data. For example, you can write custom cost-related PromQL queries and visualize the results in Amazon Managed Grafana, or use Alert Manager in multi-cluster mode. You can learn more about these integrations at Using AWS Observability Accelerator. To learn more about the Amazon Managed Service for Prometheus service quotas, refer to the documentation at Amazon Managed Service for Prometheus service quotas.
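
As one hedged illustration of a custom query, the sketch below builds the query endpoint of the workspace, following the same URL pattern as the remote_write URL used earlier. `amp_query_endpoint` is a hypothetical helper. The commented usage assumes the third-party `awscurl` tool is installed to sign the request with SigV4, and that `node_total_hourly_cost` is among the metrics your Kubecost version emits.

```shell
# Build the Prometheus-compatible instant-query endpoint of an AMP workspace.
# amp_query_endpoint is a hypothetical helper: $1 = AWS Region, $2 = workspace ID.
amp_query_endpoint() {
  printf 'https://aps-workspaces.%s.amazonaws.com/workspaces/%s/api/v1/query\n' "$1" "$2"
}

# Example usage (assumes awscurl is installed and your credentials allow
# querying the workspace, for example through AmazonPrometheusQueryAccess):
#   awscurl --service aps --region "$AWS_REGION" -X POST \
#     --header 'Content-Type: application/x-www-form-urlencoded' \
#     -d 'query=node_total_hourly_cost' \
#     "$(amp_query_endpoint "$AWS_REGION" "$AMP_WORKSPACE_ID")"
```

Because each cluster writes metrics with the `cluster_id` external label set during installation, results can be filtered or grouped by that label to compare clusters.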

Cleaning up

bash
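# delete-workspace identifies the workspace by ID, not by alias
aws amp delete-workspace --workspace-id $AMP_WORKSPACE_ID --region $AWS_REGION
helm delete kubecost -n kubecost
kubectl delete ns kubecost
# --force empties the bucket before removing it; all stored ETL data is deleted
aws s3 rb s3://${KC_BUCKET} --force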
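
The commands above leave behind the IAM roles created for the service accounts and the demo S3 policy. The following sketch removes them for one cluster; `cleanup_kubecost_iam` is a hypothetical helper, and it assumes the resource names used earlier in this post. Deleting the policy may fail while it is still attached to a role, in which case rerun that step after the role deletion completes.

```shell
# Remove the IRSA roles and the demo S3 policy created in this walkthrough.
# cleanup_kubecost_iam is a hypothetical helper:
#   $1 = cluster name, $2 = AWS Region, $3 = AWS account ID
cleanup_kubecost_iam() {
  cluster="$1"
  region="$2"
  account="$3"
  for sa in kubecost-cost-analyzer-amp kubecost-prometheus-server-amp; do
    eksctl delete iamserviceaccount --name "$sa" --namespace kubecost \
      --cluster "$cluster" --region "$region"
  done
  # Only needed for the federated (large-scale) setup.
  aws iam delete-policy \
    --policy-arn "arn:aws:iam::${account}:policy/kubecost-s3-federated-policy-${cluster}"
}
```

A typical call would be `cleanup_kubecost_iam "$YOUR_CLUSTER_NAME" "$AWS_REGION" "$AWS_ACCOUNT_ID"`, repeated for each cluster.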

Conclusion

In this post, we showed you how to use Kubecost to monitor multi-cluster Amazon EKS environments with Amazon Managed Service for Prometheus as the metrics store, so you don’t have to manage your own infrastructure to store Kubecost data. In collaboration with Kubecost, we’re excited to release this new feature, which lets you monitor and track the costs of multiple Amazon EKS clusters in a single pane of glass. This setup offers rich features exclusively to Amazon EKS customers with no additional Kubecost license required, and includes Kubecost troubleshooting support. If you have Kubecost’s Enterprise license, additional features are enabled, such as Governance features that allow you to set budget rules for different projects or audit the costly deployments on your Amazon EKS cluster. The enterprise licenses are available from Kubecost or through AWS Marketplace. If you would like to learn more from the Kubecost team, contact them here.

Linh Lam, Solutions Architect, Kubecost

Linh Lam is a Kubecost Solution Architect, ISV, focusing on integration and building solutions for customers. He is also passionate about application modernization, serverless, and container technology. Outside of work he enjoys hiking, camping, and building his home audio systems.