AWS Cloud Operations Blog

Using AWS Distro for OpenTelemetry and IAM Roles Anywhere on-premises to ingest metrics into Amazon Managed Service for Prometheus

Customers running Prometheus in self-hosted environments face challenges in managing a highly available, scalable, and secure Prometheus server environment, the infrastructure for long-term storage, and access control. Amazon Managed Service for Prometheus, a Prometheus-compatible monitoring service for infrastructure and application metrics, solves these problems by providing a fully managed environment that is tightly integrated with AWS Identity and Access Management (IAM) to control authentication and authorization. In addition to monitoring container workloads running on Amazon Elastic Kubernetes Service (Amazon EKS) or Amazon Elastic Container Service (Amazon ECS), customers can also use Amazon Managed Service for Prometheus to monitor workloads running in their on-premises environment or on Amazon Elastic Compute Cloud (Amazon EC2) instances, using the OpenTelemetry Collector.

Configuring the OpenTelemetry Collector for on-premises environments can be challenging because you need to give your applications programmatic access to AWS. Temporary access keys issued by AWS Security Token Service (AWS STS) are recommended over long-term credentials: they improve your security posture and help you follow best practices such as credential rotation and non-repudiation of actions, which is lost when credentials are shared across multiple applications or users. However, obtaining temporary credentials typically requires identity federation (SAML, OIDC, etc.), which adds complexity and maintenance overhead.

In this post, we show how to programmatically access your AWS resources from your on-premises environment using IAM Roles Anywhere. IAM Roles Anywhere allows your workloads, such as servers, containers, and applications, to use X.509 digital certificates to obtain temporary AWS credentials and use the same IAM roles and policies that you have configured for your AWS workloads to access AWS resources. We show you how to send metrics from your on-premises workloads to Amazon Managed Service for Prometheus using this approach.

IAM Roles Anywhere uses a trust relationship between your AWS environment and your public key infrastructure (PKI). The following diagram represents this relationship. A trust anchor, trusted by both your on-premises workloads and your AWS environment, allows the secure retrieval of temporary credentials using an X.509 certificate issued by the trusted (common) authority.

Trust relationship of IAM Roles Anywhere. A Certificate authority is trusted by both sides of the solution (AWS and on-premises) and allows to create a temporary session token

Figure 1. IAM Roles Anywhere trust relationship

For simplicity, we’ll use AWS Private Certificate Authority as the public key infrastructure in this post, but you can find the instructions on how to use your own Certificate Authority in the IAM Roles Anywhere documentation.

Solution overview

The following diagram shows the solution architecture. All steps on the left side can be executed in AWS CloudShell (as long as your user has the right permissions), while the steps on the right must be executed on your remote machine. This blog was written using Ubuntu 22.04.1 (LTS); you might need to adapt the instructions if you use a different system.

Figure 2. Solution Overview


Step | Terminal | Description
1 | CloudShell | Create an AWS Private CA in your AWS Account with a self-signed certificate that will act as the common trusted authority.
2 | CloudShell | Create a Trust Anchor on IAM Roles Anywhere to establish trust with the AWS Private CA created in the previous step.
3 | CloudShell | Create an IAM Role with the permissions to write to any Amazon Managed Service for Prometheus Workspace and with a restricted assume role condition.
4 | CloudShell | Create an IAM Roles Anywhere profile to allow trusted workloads to assume the role created in the previous step.
5 | CloudShell | Create an Amazon Managed Service for Prometheus Workspace to receive the metrics from the workload. In this step, the final command will print all the values (in the form of environment variables) that must be copied to the workload environment to perform the next steps.
6 | Virtual Machine | (Optional) Install Prometheus node exporter in the workload. This optional step will provide more detailed information about the virtual machine but is not mandatory.
7 | Virtual Machine | Download and install the AWS Distro for OpenTelemetry Collector (ADOT Collector) and prepare the home folder for the default user (aoc).
8 | Virtual Machine | Install the AWS Signing helper tool provided by IAM Roles Anywhere.
9 | Virtual Machine | Generate an RSA key pair and a certificate request for the workload. The last command will print the certificate request that must be copied to the AWS Environment for the next step.
10a | CloudShell | Using AWS Private CA, issue a certificate for the workload based on the request generated in the previous step. The last command will print the workload certificate that must be copied to the workload environment to perform the next steps.
10b | Virtual Machine | Copy all the files needed for the AWS Signing helper tool to the aoc home folder and configure the proper permissions.
11 | Virtual Machine | Configure the credential process used by the AWS SDK for Go to use the AWS Signing helper tool, in combination with the key and certificate generated in the previous steps, to generate temporary credentials for the ADOT Collector. Configure the ADOT Collector to use the SDK for Go to remote write the metrics to the Amazon Managed Service for Prometheus workspace created in Step 5 and start the agent.

The flow of data in the solution can be separated into two parts: a one-time setup of the trust and credentials explained in this blog, and a continuous operation in which temporary credentials are regularly generated for the remote workload.

The top part of the data flow diagram shows the interaction between the different services and components described in this blog post to set up IAM Roles Anywhere.

The bottom part of the diagram shows the process in which the ADOT Collector uses the AWS Signing helper tool to create a session and assume the role configured in the IAM Roles Anywhere profile. Temporary credentials issued by AWS STS are returned to the ADOT Collector user, which uses them to sign the remote write requests (using SigV4) for up to 1 hour (the default session duration). When the credentials expire, the process repeats with a fresh set of credentials.

Figure 3. Data Flow


Pre-requisites

In this blog we’ll be using two terminals to paste our commands. For the commands that you need to execute in your AWS Environment, we recommend using CloudShell. To open a CloudShell terminal, follow these steps:

  • Sign in to AWS Management Console.
  • From the AWS Management Console, launch CloudShell by choosing one of the following options available on the navigation bar:
    1. Choose the CloudShell icon.
    2. Start typing “CloudShell” in Search box and then choose the CloudShell option.
Options to launch AWS CloudShell from the AWS Management Console.

Figure 4. CloudShell launch options

You can find more information about CloudShell on the service’s Getting started page.

Prepare your AWS Environment

Note: The following commands must be executed by a user with elevated privileges on your AWS Account. You can run them using CloudShell.

1. Create an AWS Private Certificate Authority

Note: To use IAM Roles Anywhere, your workloads must use X.509 certificates issued by your certificate authority (CA). You register the CA with IAM Roles Anywhere as a trust anchor to establish trust between your public-key infrastructure (PKI) and IAM Roles Anywhere. You can also use AWS Private Certificate Authority (AWS Private CA) to create a CA and then use that to establish trust with IAM Roles Anywhere. AWS Private CA is a managed private CA service for managing your CA infrastructure and your private certificates.

Use the following commands to create a configuration file, use it to create an AWS Private CA, and then create and import a self-signed root certificate for the Certificate Authority.

cat > ca_config.json << EOF
{
   "KeyAlgorithm":"RSA_2048",
   "SigningAlgorithm":"SHA256WITHRSA",
   "Subject":{
      "Country":"US",
      "Organization":"Example Corp",
      "OrganizationalUnit":"Sales",
      "State":"WA",
      "Locality":"Seattle",
      "CommonName":"www.example.com"
   }
}
EOF

PRIVATE_CA_ARN=$(aws acm-pca create-certificate-authority \
     --certificate-authority-configuration file://ca_config.json \
     --certificate-authority-type "ROOT" \
     --tags  Key=Name,Value=IAMRolesAnywhereRootCA \
     --query CertificateAuthorityArn \
     --output text)

echo "Private CA ARN: $PRIVATE_CA_ARN"     
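echo "export PRIVATE_CA_ARN=$PRIVATE_CA_ARN" >> delete.env

# Wait until the CA has generated its CSR before requesting it
aws acm-pca wait certificate-authority-csr-created \
     --certificate-authority-arn $PRIVATE_CA_ARN
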
aws acm-pca get-certificate-authority-csr \
     --certificate-authority-arn $PRIVATE_CA_ARN \
     --output text > ca.csr
     
ROOT_CERTIFICATE_ARN=$(aws acm-pca issue-certificate \
     --certificate-authority-arn $PRIVATE_CA_ARN \
     --csr fileb://ca.csr \
     --signing-algorithm SHA256WITHRSA \
     --template-arn arn:aws:acm-pca:::template/RootCACertificate/V1 \
     --validity Value=365,Type=DAYS \
     --query CertificateArn \
     --output text)
     
echo "Root Certificate ARN: $ROOT_CERTIFICATE_ARN"     
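echo "export ROOT_CERTIFICATE_ARN=$ROOT_CERTIFICATE_ARN" >> delete.env

# Wait until the root certificate has been issued before retrieving it
aws acm-pca wait certificate-issued \
     --certificate-authority-arn $PRIVATE_CA_ARN \
     --certificate-arn $ROOT_CERTIFICATE_ARN
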
aws acm-pca get-certificate \
    --certificate-authority-arn $PRIVATE_CA_ARN \
    --certificate-arn $ROOT_CERTIFICATE_ARN \
    --output text > ca_cert.pem
    
aws acm-pca import-certificate-authority-certificate \
     --certificate-authority-arn $PRIVATE_CA_ARN \
     --certificate fileb://ca_cert.pem

2. Create a Trust Anchor for IAM Roles Anywhere

Use the following commands to create a Trust Anchor for IAM Roles Anywhere. The anchor will establish trust between IAM Roles Anywhere and the AWS Private CA created in the previous step:

TA_ID=$(aws rolesanywhere create-trust-anchor \
     --name ExternalWorkers \
    --source "sourceData={acmPcaArn=$PRIVATE_CA_ARN},sourceType=AWS_ACM_PCA" \
    --enabled \
    --query trustAnchor.trustAnchorId \
    --output text)

TA_ARN=$(aws rolesanywhere get-trust-anchor \
    --trust-anchor-id $TA_ID \
    --query trustAnchor.trustAnchorArn \
    --output text)

echo "IAM Roles Anywhere Trust Anchor: $TA_ID"     
echo "export TA_ID=$TA_ID" >> delete.env

echo "IAM Roles Anywhere Trust Anchor ARN: $TA_ARN"     
echo "export TA_ARN=$TA_ARN" >> delete.env

3. Create an IAM Role for your workloads with the needed permissions

Create an IAM Role that will be assumed by your workload using IAM Roles Anywhere. For the purpose of this blog, the role will only have permissions to write to the Amazon Managed Service for Prometheus endpoint using the managed policy AmazonPrometheusRemoteWriteAccess.

It’s recommended that you add conditions to the Trust Policy based on attributes extracted from the X.509 certificate, as described in the documentation. In our case we added a condition that the Common Name (CN) in the certificate must match the value VM01.

cat > RolesAnywhere-Trust.json << EOF
{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "",
            "Effect": "Allow",
            "Principal": {
                "Service": "rolesanywhere.amazonaws.com"
            },
            "Action": [
                "sts:AssumeRole",
                "sts:SetSourceIdentity",
                "sts:TagSession"
            ],
            "Condition": {
                "StringEquals": {
                    "aws:PrincipalTag/x509Subject/CN": "VM01"
                }
            }
        }
    ]
}
EOF
REMOTE_ROLE=$(aws iam create-role \
    --role-name ExternalPrometheusRemoteWrite \
    --assume-role-policy-document file://RolesAnywhere-Trust.json \
    --query Role.Arn --output text)
  
aws iam attach-role-policy \
    --role-name ExternalPrometheusRemoteWrite \
    --policy-arn arn:aws:iam::aws:policy/AmazonPrometheusRemoteWriteAccess   
    
echo "Remote workload IAM Role: $REMOTE_ROLE"     
echo "export REMOTE_ROLE=$REMOTE_ROLE" >> delete.env

4. Create an IAM Roles Anywhere profile

IAM Roles Anywhere profiles specify which roles IAM Roles Anywhere assumes and what your workloads can do with the temporary credentials. In a profile, you can define a session policy to limit the permissions for a created session. See more details about session policies in the IAM documentation.

Use the commands below to create a profile and allow the trusted workloads to assume the IAM Role we just created:

PROFILE_ID=$(aws rolesanywhere create-profile \
     --name PrometheusExternal \
    --role-arns $REMOTE_ROLE \
    --enabled \
    --query profile.profileId \
    --output text)
    


PROFILE_ID_ARN=$(aws rolesanywhere get-profile \
    --profile-id $PROFILE_ID \
    --query profile.profileArn \
    --output text)
   

echo "IAM Roles Anywhere profile ID: $PROFILE_ID"     
echo "export PROFILE_ID=$PROFILE_ID" >> delete.env

echo "IAM Roles Anywhere profile ID ARN: $PROFILE_ID_ARN"     
echo "export PROFILE_ID_ARN=$PROFILE_ID_ARN" >> delete.env

5. Create an Amazon Managed Service for Prometheus Workspace

The script below creates an Amazon Managed Service for Prometheus workspace in the us-east-1 Region. If desired, change the WORKLOAD_REGION variable to another supported Region listed in the documentation.

WORKLOAD_REGION='us-east-1'

WORKSPACE_ID=$(aws amp create-workspace --alias onpremises-demo-workspace \
  --region $WORKLOAD_REGION \
  --output text \
  --query 'workspaceId')
  
WORKSPACE_URL=$(aws amp describe-workspace --region $WORKLOAD_REGION --workspace-id $WORKSPACE_ID --query workspace.prometheusEndpoint --output text)


echo "This is the URL for remote_write configuration: $WORKSPACE_URL" 
echo "export WORKSPACE_ID=$WORKSPACE_ID" >> delete.env

echo "export WORKLOAD_REGION=$WORKLOAD_REGION" >> delete.env
echo "export WORKSPACE_URL=$WORKSPACE_URL" >> delete.env

Finally, run this command to print the information needed on your workload. These environment variables will be needed to configure the external credential process used for IAM Roles Anywhere. Copy all the lines starting with export and paste them in your remote workload terminal.

echo "===== Copy these values to your remote workload ===="
echo -e "\n\nexport TA_ARN=$TA_ARN\nexport PROFILE_ID_ARN=$PROFILE_ID_ARN\nexport REMOTE_ROLE=$REMOTE_ROLE\nexport WORKSPACE_URL=$WORKSPACE_URL"
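Before moving on, it can help to confirm on the workload terminal that all four variables were actually pasted, because later steps expand them into configuration files. The following is a minimal sketch; the helper name `require_vars` is hypothetical and not part of any AWS tool.

```shell
# Hypothetical helper: report any variable from the list that is unset or empty.
require_vars() {
    missing=0
    for name in "$@"; do
        # Indirect lookup of the variable called $name (POSIX-compatible).
        eval "value=\${$name:-}"
        if [ -z "$value" ]; then
            echo "Missing variable: $name" >&2
            missing=1
        fi
    done
    return "$missing"
}

# Usage on the workload terminal, after pasting the export lines:
if require_vars TA_ARN PROFILE_ID_ARN REMOTE_ROLE WORKSPACE_URL; then
    echo "All required variables are set"
else
    echo "Paste the export lines from CloudShell before continuing"
fi
```

If a variable is reported as missing, paste the export lines again before running steps 10(b) and 11.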

Configure your remote workload

Note: The following commands must be executed in the remote machine where the workload is running.

6. Installing Prometheus Node Exporter (Optional)

The Prometheus Node Exporter exposes a wide variety of hardware- and kernel-related metrics. This is an optional step, but it will expose more metrics from the host to the collector and help to understand the potential of the solution proposed in this blog.

We can install this package using Ubuntu package manager:

sudo apt install prometheus-node-exporter -y

7. Using AWS Distro for OpenTelemetry (ADOT) Collector

AWS Distro for OpenTelemetry Collector (ADOT Collector) is an AWS-supported distribution of the upstream OpenTelemetry Collector, distributed by Amazon. It supports selected components from the OpenTelemetry community and is fully compatible with AWS computing platforms including Amazon EC2, Amazon ECS, and Amazon EKS. It enables users to send metrics to Amazon CloudWatch and traces to AWS X-Ray, as well as to other supported backends such as Prometheus.

In this section, we will show you how to deploy the ADOT Collector to collect metrics and send them to our Amazon Managed Service for Prometheus workspace.

Let’s start by downloading and installing the latest version of the aws-otel-collector. Run the following commands to do so:

mkdir /tmp/adot
cd /tmp/adot
wget https://aws-otel-collector.s3.amazonaws.com/ubuntu/amd64/latest/aws-otel-collector.deb
sudo dpkg -i -E ./aws-otel-collector.deb

The ADOT Collector’s default user is aoc, which is created as part of the package installation. We need to change the AWS SDK for Go configuration so that this user can assume a role using IAM Roles Anywhere. To do so, let’s create a home folder for the user, which will store the X.509 certificate and the appropriate configuration files.

sudo mkdir /home/aoc
sudo chown -R aoc:aoc /home/aoc/

ADOT will be configured to use the sigv4authextension to connect to Amazon Managed Service for Prometheus. The SigV4 authentication extension provides SigV4 authentication for making requests to AWS services. It adds authentication information to AWS API requests sent over HTTP by signing these requests with your AWS credentials.

In turn, the sigv4authextension uses the AWS SDK for Go to obtain AWS Credentials and the credentials are used to sign the API calls using the sigv4 process.

Note: A similar approach can be used for Prometheus Server or Grafana Agent by configuring the corresponding users, but it is out of scope for this blog post.

8. Install AWS Signing helper

To obtain temporary security credentials from AWS Identity and Access Management Roles Anywhere, use the credential helper tool that IAM Roles Anywhere provides. This tool is compatible with the credential_process feature available across the language SDKs. The helper manages the process of creating a signature with the certificate and calling the endpoint to obtain session credentials; it returns the credentials to the calling process in a standard JSON format. This tool is open source and it’s available on GitHub.

Use the following commands to download and install the tool:

wget https://rolesanywhere.amazonaws.com/releases/1.0.3/X86_64/Linux/aws_signing_helper
chmod +x aws_signing_helper
sudo mv aws_signing_helper /usr/bin

9. Generate a key pair and a Certificate Request on the Host

Use the following commands to create an RSA key pair and then use it to create a certificate request for the host. Note that in the configuration file we’re setting the Common Name (CN) to VM01 to match the condition in our trust policy.

cat > config.txt << EOF

FQDN = vm01.example.com
ORGNAME = Example Corp

[ req ]
default_bits = 2048
default_md = sha256
prompt = no
encrypt_key = no
distinguished_name = dn
req_extensions = v3_req

[ dn ]
C = US
O = Example Corp
CN = VM01

[ v3_req ]
basicConstraints = CA:FALSE
keyUsage = nonRepudiation, digitalSignature, keyEncipherment
extendedKeyUsage = clientAuth,serverAuth

EOF

openssl req -out csr.pem -config config.txt -new -newkey rsa:2048 -nodes -keyout private-key.pem
cat csr.pem
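Because the trust policy only allows certificates whose CN is VM01, it is worth checking the subject of the request before sending it to the CA. The sketch below generates a throwaway key and CSR in a scratch directory (so it does not touch the files created above) and extracts the CN with openssl; the scratch directory and variable names are illustrative.

```shell
# Work in a scratch directory so the real csr.pem is left untouched.
workdir=$(mktemp -d)

# Generate a throwaway key and CSR with the same subject used in this post.
openssl req -new -newkey rsa:2048 -nodes \
    -keyout "$workdir/private-key.pem" -out "$workdir/csr.pem" \
    -subj "/C=US/O=Example Corp/CN=VM01" 2>/dev/null

# Print the subject in RFC 2253 form and extract the CN value.
csr_cn=$(openssl req -in "$workdir/csr.pem" -noout -subject -nameopt RFC2253 \
    | sed -n 's/.*CN=\([^,]*\).*/\1/p')

echo "CSR Common Name: $csr_cn"
rm -rf "$workdir"
```

To check the real request, run only the second openssl command against the csr.pem file in /tmp/adot.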

10(a). Generate an x509 Certificate for your workload

Note: The following commands must be executed by a user with elevated privileges on your AWS Account. You can run them using CloudShell.

In this step you use AWS Private CA to generate a certificate for the workload. The last command of step 9 printed the content of the certificate request, similar to this:

-----BEGIN CERTIFICATE REQUEST-----
MIICiTCCAfICCQD6m7oRw0uXOjANBgkqhkiG9w0BAQUFADCBiDELMAkGA1UEBhMC
VVMxCzAJBgNVBAgTAldBMRAwDgYDVQQHEwdTZWF0dGxlMQ8wDQYDVQQKEwZBbWF6
b24xFDASBgNVBAsTC0lBTSBDb25zb2xlMRIwEAYDVQQDEwlUZXN0Q2lsYWMxHzAd
BgkqhkiG9w0BCQEWEG5vb25lQGFtYXpvbi5jb20wHhcNMTEwNDI1MjA0NTIxWhcN
MTIwNDI0MjA0NTIxWjCBiDELMAkGA1UEBhMCVVMxCzAJBgNVBAgTAldBMRAwDgYD
VQQHEwdTZWF0dGxlMQ8wDQYDVQQKEwZBbWF6b24xFDASBgNVBAsTC0lBTSBDb25z
b2xlMRIwEAYDVQQDEwlUZXN0Q2lsYWMxHzAdBgkqhkiG9w0BCQEWEG5vb25lQGFt
YXpvbi5jb20wgZ8wDQYJKoZIhvcNAQEBBQADgY0AMIGJAoGBAMaK0dn+a4GmWIWJ
21uUSfwfEvySWtC2XADZ4nB+BLYgVIk60CpiwsZ3G93vUEIO3IyNoH/f0wYK8m9T
rDHudUZg3qX4waLG5M43q7Wgc/MbQITxOUSQv7c7ugFFDzQGBzZswY6786m86gpE
Ibb3OhjZnzcvQAaRHhdlQWIMm2nrAgMBAAEwDQYJKoZIhvcNAQEFBQADgYEAtCu4
nUhVVxYUntneD9+h8Mg9q6q+auNKyExzyLwaxlAoo7TJHidbtS4J5iNmZgXL0Fkb
FFBjvSfpJIlJ00zbhNYS5f6GuoEDmFJl0ZxBHjJnyp378OD8uTs7fLvjx79LjSTb
NYiytVbZPQUQ5Yaxu2jXnimvw3rrszlaEXAMPLE=
-----END CERTIFICATE REQUEST-----

Copy the certificate request from your workload terminal, and save it to a local file in the CloudShell or terminal session where you configured the AWS Private CA and IAM roles. Name the file csr.pem to keep it consistent with the original file name.

Use the following commands to request AWS Private CA to issue a certificate for your workload using the request file. The get-certificate command retrieves the issued certificate, which must be copied back to your workload machine.

WORKLOAD_CERT=$(aws acm-pca issue-certificate \
      --certificate-authority-arn $PRIVATE_CA_ARN \
     --csr fileb://csr.pem \
     --signing-algorithm "SHA256WITHRSA" \
     --validity Value=200,Type="DAYS" \
     --query CertificateArn \
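     --output text)

# Wait until the certificate has been issued before retrieving it
aws acm-pca wait certificate-issued \
     --certificate-authority-arn $PRIVATE_CA_ARN \
     --certificate-arn $WORKLOAD_CERT
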
aws acm-pca get-certificate \
     --certificate-authority-arn $PRIVATE_CA_ARN \
     --certificate-arn $WORKLOAD_CERT \
     --query Certificate \
     --output text > cert.pem
     
cat cert.pem

Configure your remote workload

Note: The following commands must be executed in the remote machine where the workload is running.

10(b). Setup key and certificate for aoc user

The previous command prints the content of the certificate issued by the AWS Private CA, similar to this:

-----BEGIN CERTIFICATE-----
MIICiTCCAfICCQD6m7oRw0uXOjANBgkqhkiG9w0BAQUFADCBiDELMAkGA1UEBhMC
VVMxCzAJBgNVBAgTAldBMRAwDgYDVQQHEwdTZWF0dGxlMQ8wDQYDVQQKEwZBbWF6
b24xFDASBgNVBAsTC0lBTSBDb25zb2xlMRIwEAYDVQQDEwlUZXN0Q2lsYWMxHzAd
BgkqhkiG9w0BCQEWEG5vb25lQGFtYXpvbi5jb20wHhcNMTEwNDI1MjA0NTIxWhcN
MTIwNDI0MjA0NTIxWjCBiDELMAkGA1UEBhMCVVMxCzAJBgNVBAgTAldBMRAwDgYD
VQQHEwdTZWF0dGxlMQ8wDQYDVQQKEwZBbWF6b24xFDASBgNVBAsTC0lBTSBDb25z
b2xlMRIwEAYDVQQDEwlUZXN0Q2lsYWMxHzAdBgkqhkiG9w0BCQEWEG5vb25lQGFt
YXpvbi5jb20wgZ8wDQYJKoZIhvcNAQEBBQADgY0AMIGJAoGBAMaK0dn+a4GmWIWJ
21uUSfwfEvySWtC2XADZ4nB+BLYgVIk60CpiwsZ3G93vUEIO3IyNoH/f0wYK8m9T
rDHudUZg3qX4waLG5M43q7Wgc/MbQITxOUSQv7c7ugFFDzQGBzZswY6786m86gpE
Ibb3OhjZnzcvQAaRHhdlQWIMm2nrAgMBAAEwDQYJKoZIhvcNAQEFBQADgYEAtCu4
nUhVVxYUntneD9+h8Mg9q6q+auNKyExzyLwaxlAoo7TJHidbtS4J5iNmZgXL0Fkb
FFBjvSfpJIlJ00zbhNYS5f6GuoEDmFJl0ZxBHjJnyp378OD8uTs7fLvjx79LjSTb
NYiytVbZPQUQ5Yaxu2jXnimvw3rrszlaEXAMPLE=
-----END CERTIFICATE-----

Copy the certificate output from the CloudShell or Terminal session into your workload machine, and save it as cert.pem in the current folder (/tmp/adot). Run the following commands to create a folder accessible to the aoc user and copy the required files there. Note that the certificate request file csr.pem is not needed anymore.

sudo mkdir /home/aoc/.x509
sudo mv private-key.pem /home/aoc/.x509/
sudo mv cert.pem /home/aoc/.x509/
sudo chown -R aoc:aoc /home/aoc/.x509/
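A certificate pasted from another terminal can easily end up paired with the wrong key. One way to check the pairing is to compare the public key embedded in the certificate with the one derived from the private key. The sketch below demonstrates the comparison on a throwaway self-signed certificate that stands in for cert.pem and private-key.pem; the file locations and variable names are illustrative.

```shell
workdir=$(mktemp -d)

# Throwaway self-signed certificate and key, standing in for the real files.
openssl req -x509 -newkey rsa:2048 -nodes -days 1 \
    -keyout "$workdir/private-key.pem" -out "$workdir/cert.pem" \
    -subj "/CN=VM01" 2>/dev/null

# Hash the public key in the certificate and the one derived from the private key.
cert_pub=$(openssl x509 -in "$workdir/cert.pem" -noout -pubkey | openssl sha256)
key_pub=$(openssl pkey -in "$workdir/private-key.pem" -pubout | openssl sha256)

if [ "$cert_pub" = "$key_pub" ]; then
    pair_status="match"
else
    pair_status="mismatch"
fi
echo "Certificate and private key: $pair_status"
rm -rf "$workdir"
```

To check the real files, point the two hashing commands at /home/aoc/.x509/cert.pem and /home/aoc/.x509/private-key.pem; reading them will likely require sudo because they are owned by the aoc user.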

11. Configure the credential process for aoc user

The RSA private key and the certificate issued by the AWS Private CA copied above will be used by the aoc user to obtain an AWS identity with the help of the signing helper tool installed in step 8. To do this, we need to add an external process to the authentication chain of the AWS SDK for Go. We can do this by creating a configuration file as explained in the documentation.

Use the following commands to create the configuration file needed. Remember to set up the environment variables in the local environment by copying the lines starting with export from your AWS Environment.

cat > config << EOF
[default]
credential_process = aws_signing_helper credential-process --certificate /home/aoc/.x509/cert.pem --private-key /home/aoc/.x509/private-key.pem --trust-anchor-arn $TA_ARN --profile-arn $PROFILE_ID_ARN --role-arn $REMOTE_ROLE
EOF

sudo chown aoc:aoc config
sudo mv config /home/aoc/.x509/

echo "AWS_CONFIG_FILE=/home/aoc/.x509/config" | sudo tee -a /opt/aws/aws-otel-collector/etc/.env
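The SDK reads the JSON document that a credential_process command prints to standard output. For temporary credentials, that document is expected to carry the fields Version, AccessKeyId, SecretAccessKey, SessionToken, and Expiration. The following sketch checks a document for those fields; the helper name `has_credential_fields` is hypothetical and the sample values are placeholders, not real credentials.

```shell
# Hypothetical helper: succeed only if the JSON on stdin contains the fields
# the SDK reads from a credential_process.
has_credential_fields() {
    json=$(cat)
    for field in Version AccessKeyId SecretAccessKey SessionToken Expiration; do
        case "$json" in
            *"\"$field\""*) ;;
            *) echo "Missing field: $field" >&2; return 1 ;;
        esac
    done
}

# Placeholder document with the expected shape (values are not real credentials).
sample='{"Version": 1, "AccessKeyId": "EXAMPLEKEYID", "SecretAccessKey": "EXAMPLESECRET", "SessionToken": "EXAMPLETOKEN", "Expiration": "2022-12-07T02:59:18Z"}'

if echo "$sample" | has_credential_fields; then
    echo "Output has the expected credential fields"
fi
```

On the workload, the same check could be applied by piping the output of the aws_signing_helper credential-process command from the configuration file into the helper, run as the aoc user.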

Note: The credential process here is configured for the `default` profile of the AWS SDK for Go configuration. You can create multiple profiles in your configuration if needed, as described in the AWS documentation. To select the profile used by the ADOT Collector, add the AWS_PROFILE environment variable with the profile name to the .env file described above, in addition to the AWS_CONFIG_FILE variable.

Now let’s configure the collector to send metrics to our Amazon Managed Service for Prometheus workspace using the role we created. The configuration file must be accessible to the aoc user and stored in the collector’s configuration path. Update the region in the configuration file if you deployed your Amazon Managed Service for Prometheus workspace in a Region other than us-east-1.

cat > config.yaml << EOF
extensions:
  health_check:
  
  sigv4auth:
    service: "aps"
    region: "us-east-1"
      
receivers:
  otlp:
    protocols:
      grpc:
        endpoint: 0.0.0.0:4317
      http:
        endpoint: 0.0.0.0:55681
  prometheus:
    config:
      scrape_configs:
        - job_name: "otel-collector"
          scrape_interval: 5s
          static_configs:
            - targets: ["localhost:8888"]
        - job_name: "node-exporter"
          scrape_interval: 5s
          static_configs:
            - targets: ["localhost:9100"]
processors:
  batch/traces:
    timeout: 1s
    send_batch_size: 50
  batch/metrics:
    timeout: 60s
    
exporters:
  prometheusremotewrite:
    endpoint: ${WORKSPACE_URL}api/v1/remote_write
    auth:
      authenticator: sigv4auth
          
service:
  pipelines:
    metrics:
      receivers: [prometheus]
      processors: [batch/metrics]
      exporters: [prometheusremotewrite]
  extensions: [health_check,sigv4auth]
EOF

sudo chown aoc:aoc config.yaml
sudo mv config.yaml /opt/aws/aws-otel-collector/etc/
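The exporter endpoint in the configuration above is built by appending api/v1/remote_write directly to WORKSPACE_URL, so it relies on the workspace endpoint ending with a slash. A small sketch that builds the URL regardless of the trailing slash is shown below; the helper name `remote_write_url` is hypothetical and the workspace ID in the example is a placeholder.

```shell
# Hypothetical helper: append the remote write path to a workspace endpoint,
# tolerating an endpoint with or without a trailing slash.
remote_write_url() {
    printf '%s/api/v1/remote_write\n' "${1%/}"
}

# Placeholder endpoint (the workspace ID is made up for illustration).
example_endpoint="https://aps-workspaces.us-east-1.amazonaws.com/workspaces/ws-EXAMPLE/"
remote_write_url "$example_endpoint"
```

Comparing the helper’s output for your WORKSPACE_URL with the endpoint value written to config.yaml is a quick way to spot a missing or doubled slash.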

Finally let’s restart the AWS Distro for Open Telemetry collector to use the new configuration and credentials:

sudo systemctl daemon-reload
sudo service aws-otel-collector restart

You can check for authentication errors in the collector logs using journalctl:

sudo journalctl -u aws-otel-collector

AWS IAM Roles Anywhere sessions are recorded in AWS CloudTrail with the event name CreateSession from the event source rolesanywhere.amazonaws.com. You can identify the remote machine that authenticated by looking at the X.509 information in the event response:

{
    "eventVersion": "1.08",
    "userIdentity": {
        "type": "Unknown",
        "principalId": "",
        "arn": "",
        "accountId": "111122223333",
        "accessKeyId": "",
        "userName": ""
    },
    "eventTime": "2022-12-07T01:59:18Z",
    "eventSource": "rolesanywhere.amazonaws.com",
    "eventName": "CreateSession",
    "awsRegion": "us-east-1",
    "sourceIPAddress": "*******",
    "userAgent": "CredHelper/1.0.3 (go1.16.15; linux; amd64)",
    "requestParameters": {
        "cert": "**** REDACTED ****",
        "durationSeconds": 3600,
        "profileArn": "arn:aws:rolesanywhere:us-east-1:111122223333:profile/a1b2c3d4-5678-90ab-cdef-EXAMPLE11111",
        "roleArn": "arn:aws:iam::111122223333:role/ExternalPrometheusRemoteWrite",
        "trustAnchorArn": "arn:aws:rolesanywhere:us-east-1:111122223333:trust-anchor/a1b2c3d4-5678-90ab-cdef-EXAMPLE11111"
    },
    "responseElements": {
        "credentialSet": [
            {
                "assumedRoleUser": {
                    "arn": "arn:aws:sts::111122223333:assumed-role/ExternalPrometheusRemoteWrite/a1b2c3d4-5678-90ab-cdef-EXAMPLE11111",
                    "assumedRoleId": "**** REDACTED ****"
                },
                "credentials": {
                    "accessKeyId": "**** REDACTED ****",
                    "expiration": "2022-12-07T02:59:18Z",
                    "secretAccessKey": "HIDDEN_DUE_TO_SECURITY_REASONS",
                    "sessionToken": "**** REDACTED ****"
                },
                "packedPolicySize": 44,
                "roleArn": "arn:aws:iam::111122223333:role/ExternalPrometheusRemoteWrite",
                "sourceIdentity": "CN=VM01"
            }
        ],
        "subjectArn": "arn:aws:rolesanywhere:us-east-1:111122223333:subject/a1b2c3d4-5678-90ab-cdef-EXAMPLE11111",
        "x509Subject": "C=US,O=Example Corp,CN=VM01"
    },
    "requestID": "f4d2454d-224a-405a-a6a8-28ba9827484f",
    "eventID": "6dc838a9-5ca1-4691-8bf3-b55aaec21a48",
    "readOnly": false,
    "eventType": "AwsApiCall",
    "managementEvent": true,
    "recipientAccountId": "111122223333",
    "eventCategory": "Management",
    "tlsDetails": {
        "tlsVersion": "TLSv1.2",
        "cipherSuite": "ECDHE-RSA-AES128-GCM-SHA256",
        "clientProvidedHostHeader": "rolesanywhere.us-east-1.amazonaws.com"
    }
}
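The eventTime and expiration fields in the event above can be used to confirm the session lifetime. The sketch below converts both timestamps to epoch seconds and subtracts them; it assumes a `date` command that supports the -d option (such as GNU date on Ubuntu), and the helper name `to_epoch` is hypothetical.

```shell
# Convert an ISO 8601 UTC timestamp (as shown in the CloudTrail event) to epoch seconds.
# Assumes a `date` that supports -d, such as GNU date on Ubuntu.
to_epoch() {
    date -u -d "$(echo "$1" | sed 's/T/ /; s/Z$//')" +%s
}

# Values copied from the sample event above.
event_time="2022-12-07T01:59:18Z"
expiration="2022-12-07T02:59:18Z"

lifetime=$(( $(to_epoch "$expiration") - $(to_epoch "$event_time") ))
echo "Session lifetime in seconds: $lifetime"
```

For the sample event the difference equals the durationSeconds request parameter (3600).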

Using AWS CloudTrail Lake, you can verify how often the temporary credentials are rotated (every hour by default). Use the following query to see the relevant parameters used in this blog. Remember to replace $EDS_ID with the ID of your event data store:

SELECT
    eventTime,
    eventName,
    sourceIPAddress,
    userAgent,
    element_at(requestParameters,'profileArn') AS profileArn,
    element_at(requestParameters,'roleArn') AS roleArn,
    element_at(requestParameters,'durationSeconds') AS durationSeconds,
    element_at(requestParameters,'trustAnchorArn') AS trustAnchorArn,
    element_at(responseElements,'x509Subject') AS x509Subject
FROM
    $EDS_ID
WHERE
    eventName = 'CreateSession' AND
    eventSource = 'rolesanywhere.amazonaws.com'

The results show that the credential helper process is invoked approximately every hour (the session duration) to obtain a new set of credentials.

Credential helper session history in the CloudTrail console

Figure 5. Credential Helper Session Trail

AWS IAM Roles Anywhere also publishes Amazon CloudWatch metrics for successful invocations of the CreateSession action. You can view these metrics in the CloudWatch console and create alarms to monitor the rotation of the temporary credentials.

CloudWatch Metrics for Sessions on IAM Roles Anywhere service

Figure 6. Session Count for IAM Roles Anywhere

You can now visualize the metrics exposed by Node Exporter and sent by the ADOT Collector to the Amazon Managed Service for Prometheus workspace using Amazon Managed Grafana or any other visualization tool of your choice.

Grafana Dashboard for node_exporter

Figure 7. Amazon Managed Grafana dashboard

Troubleshooting

  • Confirm the POSIX permissions on the files moved to the folder /home/aoc/.x509/. The files must be readable by the user aoc.
  • Check the content of the configuration file used for the credential process (/home/aoc/.x509/config). In the configuration file you should see three different Amazon Resource Names (ARNs):
    • One for the Trust Anchor
    • One for the Profile
    • One for the IAM role the process will assume.
  • Check that the ADOT Collector environment file in /opt/aws/aws-otel-collector/etc/.env includes the environment variable AWS_CONFIG_FILE and that it points to the right file path, /home/aoc/.x509/config.
  • Check the configuration file for the ADOT Collector in /opt/aws/aws-otel-collector/etc/config.yaml and confirm that the endpoint value for the prometheusremotewrite exporter corresponds to the remote_write URL of your Amazon Managed Service for Prometheus workspace and includes api/v1/remote_write as part of the URL.
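The check for the three ARNs in the credential process configuration can be scripted. If the environment variables were empty when the file was written, the flags are present but the ARNs are missing, which this kind of check catches. The sketch below is a minimal version; the helper name `check_credential_config` is hypothetical, and the sample file uses the placeholder account and resource IDs from the CloudTrail event shown earlier.

```shell
# Hypothetical helper: confirm a credential_process config references all three ARNs.
check_credential_config() {
    config_file=$1
    for pattern in \
        '--trust-anchor-arn arn:aws:rolesanywhere:[^ ]*:trust-anchor/' \
        '--profile-arn arn:aws:rolesanywhere:[^ ]*:profile/' \
        '--role-arn arn:aws:iam::[0-9]*:role/'
    do
        if ! grep -q -e "$pattern" "$config_file"; then
            echo "Not found in $config_file: $pattern" >&2
            return 1
        fi
    done
}

# Sample file with placeholder ARNs, standing in for /home/aoc/.x509/config.
sample_config=$(mktemp)
cat > "$sample_config" << 'EOF'
[default]
credential_process = aws_signing_helper credential-process --certificate /home/aoc/.x509/cert.pem --private-key /home/aoc/.x509/private-key.pem --trust-anchor-arn arn:aws:rolesanywhere:us-east-1:111122223333:trust-anchor/a1b2c3d4-5678-90ab-cdef-EXAMPLE11111 --profile-arn arn:aws:rolesanywhere:us-east-1:111122223333:profile/a1b2c3d4-5678-90ab-cdef-EXAMPLE11111 --role-arn arn:aws:iam::111122223333:role/ExternalPrometheusRemoteWrite
EOF

if check_credential_config "$sample_config"; then
    echo "All three ARNs are present"
fi
```

On the workload, the helper would be pointed at /home/aoc/.x509/config instead of the sample file.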

Conclusion

In this blog we showed how to set up a secure environment to collect Prometheus metrics from an on-premises virtual machine and remote write them to Amazon Managed Service for Prometheus. AWS IAM Roles Anywhere plays a key role by providing temporary credentials to the ADOT Collector. You can also collect Prometheus metrics from a variety of other environments, including Amazon EKS, Amazon ECS, and Amazon EC2 instances.

Cleanup

To clean up your AWS Environment of resources, run the following commands. Some cleanup will also be needed on your VM but is out of scope for these instructions.

source delete.env
aws rolesanywhere delete-profile --profile-id $PROFILE_ID
aws rolesanywhere delete-trust-anchor --trust-anchor-id $TA_ID
aws acm-pca update-certificate-authority --certificate-authority-arn $PRIVATE_CA_ARN --status DISABLED
aws acm-pca delete-certificate-authority --certificate-authority-arn $PRIVATE_CA_ARN --permanent-deletion-time-in-days 7
aws iam detach-role-policy --role-name ExternalPrometheusRemoteWrite --policy-arn arn:aws:iam::aws:policy/AmazonPrometheusRemoteWriteAccess
aws iam delete-role --role-name ExternalPrometheusRemoteWrite
aws amp delete-workspace --workspace-id $WORKSPACE_ID --region $WORKLOAD_REGION

About the author:

Rafael Pereyra

Rafael Pereyra is a Sr. Security Transformation Consultant at AWS Professional Services, where he helps customers securely deploy, monitor, and operate solutions in the cloud. Rafael’s interests include containerized applications, improving observability, monitoring and logging of solutions, IaC, and automation in general. In his spare time, Rafael enjoys cooking with family and friends.