AWS Developer Tools Blog

Recommended AWS CDK project structure for Python applications

September 22, 2022: Migrated the reference application to AWS CDK v2. Renamed deployment.py to backend/component.py to support multi-component use cases and better emphasize the mapping to AWS Well-Architected Framework component terminology. Renamed pipeline.py to toolchain.py to expand the scope to any tools related to the component’s software development life cycle (e.g., continuous deployment pipeline, pull request validation build). The component is now a cdk.Stack. Moved the definition of the production cdk.Stage to the toolchain implementation.

In this blog post, I describe the recommended AWS Cloud Development Kit (AWS CDK) project structure for Python applications. This is based on Best practices for developing and deploying cloud infrastructure with the AWS CDK.

The AWS CDK is an open source software development framework for modeling and provisioning your cloud application resources through AWS CloudFormation by utilizing familiar programming languages, including TypeScript, JavaScript, Python, C#, Java, and Go.

The AWS CDK application maps to a component as defined by the AWS Well-Architected Framework. A component usually includes logical units (e.g., api, database), and optionally can have a toolchain with a continuous deployment pipeline. The logical units should be implemented as constructs, including the infrastructure (e.g., Amazon S3 buckets, Amazon RDS databases, Amazon VPC network), runtime (e.g., AWS Lambda function code), and configuration code.

As an example, I walk through a user management backend component that uses Amazon API Gateway, AWS Lambda, and Amazon DynamoDB to provide basic CRUD operations for managing users. The project also includes a toolchain with a continuous deployment pipeline. Together, the project contains everything required for managing the component as a unit of ownership, including the specific deployment environments.

Concepts

We recommend organizing the project directory structure based on the component’s logical units. Each logical unit should have a directory and include the related infrastructure, runtime, and configuration code. For example:

.
|-- backend
|   |-- api
|   |   |-- runtime
|   |   |   |-- lambda_function.py
|   |   |   `-- requirements.txt
|   |   `-- infrastructure.py
|   |-- database
|   |   `-- infrastructure.py

This way, if I need to make API changes, then I can easily find the code related to that logical unit. If I need to refactor the code, or make it a separate unit of ownership, it can be changed in a single place. In other words, it is a self-contained unit.
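
As a rough illustration, the layout above could be scaffolded with a few lines of standard-library Python. This is only a sketch: the `scaffold` helper and the `LOGICAL_UNIT_FILES` mapping are hypothetical names introduced here, and the files are created empty.

```python
from pathlib import Path

# Files per logical unit, mirroring the example tree above.
LOGICAL_UNIT_FILES = {
    "api": [
        "infrastructure.py",
        "runtime/lambda_function.py",
        "runtime/requirements.txt",
    ],
    "database": ["infrastructure.py"],
}


def scaffold(project_root: Path, component: str = "backend") -> list[Path]:
    """Create one directory per logical unit and return the created files."""
    created = []
    for unit, files in LOGICAL_UNIT_FILES.items():
        for relative_file in files:
            path = project_root / component / unit / relative_file
            # Each logical unit keeps its infrastructure and runtime code together.
            path.parent.mkdir(parents=True, exist_ok=True)
            path.touch(exist_ok=True)
            created.append(path)
    return created
```

Calling `scaffold(Path("."))` from an empty project directory would produce the backend/api and backend/database directories shown in the tree.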

The logical units should be implemented as constructs and not as stacks. Constructs are the basic building blocks of AWS CDK applications, while stacks are deployment units. All of the AWS resources defined within the scope of a stack, either directly or indirectly, are provisioned as a single unit. Implementing logical units as constructs provides the flexibility to support different deployment layouts and enables future reuse as construct libraries. I will further discuss the deployment layout later.

Note: When refactoring constructs, consider logical ID stability to avoid unexpected infrastructure changes.
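
To see why refactoring can change logical IDs, recall that the AWS CDK derives each resource's CloudFormation logical ID from its construct path within the stack. The snippet below is a simplified model of that idea, not the AWS CDK's actual algorithm: `toy_logical_id` is a hypothetical helper that joins the path components and appends a short hash of the full path.

```python
import hashlib


def toy_logical_id(construct_path: list[str]) -> str:
    """Simplified stand-in for a path-derived logical ID (not the real CDK algorithm)."""
    readable = "".join(construct_path)
    digest = hashlib.sha256("/".join(construct_path).encode("utf-8")).hexdigest()
    return readable + digest[:8].upper()


before = toy_logical_id(["Database", "DynamoDBTable"])
# Hypothetical refactoring: the table is moved under a new "Storage" construct.
after = toy_logical_id(["Storage", "Database", "DynamoDBTable"])

# The path changed, so the derived ID changed as well. CloudFormation would
# treat this as removing one resource and creating another.
print(before != after)
```

In practice, running `cdk diff` before deploying a refactoring shows whether any logical IDs would change.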

Before the AWS CDK arrived, runtime and infrastructure code remained two separate concepts. The AWS CDK abstraction lets you combine the infrastructure and runtime code of a logical unit behind a single construct interface.

Project structure

Let’s look at the recommended project structure in detail. Clone the example I use in this blog post from https://github.com/aws-samples/aws-cdk-project-structure-python. Note that I have left out source code snippets and files in the blog post, such as linters and the full pipeline structure. The focus is on the code that illustrates the project structure recommendations, while the source code still provides a fully functional project for reference. Below is a snapshot of the project structure, excluding files not in the scope of this blog post:

# The recommended project structure example
.
|-- backend
|   |-- api
|   |   |-- runtime
|   |   |   |-- lambda_function.py
|   |   |   `-- requirements.txt
|   |   `-- infrastructure.py
|   |-- database
|   |   `-- infrastructure.py
|   |-- monitoring
|   |   `-- infrastructure.py
|   `-- component.py
|-- app.py
|-- constants.py
|-- requirements.txt
`-- toolchain.py

Three logical units compose the user management backend: API, database, and monitoring. Each logical unit contains an infrastructure.py module. If the infrastructure implementation were more complex, then I would replace the infrastructure.py module with an infrastructure package containing multiple modules. Some logical units also have a runtime directory. For example, the API has a runtime directory containing Lambda function code.

Next, I will cover app.py (the AWS CDK application entry point), backend/component.py (the user management backend deployment layout), and toolchain.py (the continuous deployment pipeline) modules in order to show the implementation of the recommended project structure. We’ll start with app.py.

app.py

# The AWS CDK application entry point
...
import os

import aws_cdk as cdk
from aws_cdk import aws_dynamodb as dynamodb

import constants
from backend.component import Backend
from toolchain import Toolchain

app = cdk.App()

# Component sandbox stack
Backend(
    app,
    constants.APP_NAME + "Sandbox",
    env=cdk.Environment(
        account=os.environ["CDK_DEFAULT_ACCOUNT"],
        region=os.environ["CDK_DEFAULT_REGION"],
    ),
    api_lambda_reserved_concurrency=1,
    database_dynamodb_billing_mode=dynamodb.BillingMode.PAY_PER_REQUEST,
)

# Toolchain stack (defines the continuous deployment pipeline)
Toolchain(
    app,
    constants.APP_NAME + "Toolchain",
    env=cdk.Environment(account="111111111111", region="eu-west-1"),
)

app.synth()

The module defines the app object, followed by the component sandbox stack and the toolchain stack with a continuous deployment pipeline.

Note: constants.APP_NAME is utilized as part of the construct identifier (e.g., constants.APP_NAME + "Sandbox" above) in order to set a unique prefix for a CloudFormation stack name.
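
The naming convention from the note can be sketched as follows. The `APP_NAME` value and the `stack_identifier` helper are assumptions for illustration; the code above only shows the `constants.APP_NAME + "Sandbox"` concatenation.

```python
# constants.py (sketch): the value below is an assumed example.
APP_NAME = "UserManagementBackend"


def stack_identifier(suffix: str) -> str:
    """Prefix a stack's construct identifier with the application name."""
    return APP_NAME + suffix


sandbox_id = stack_identifier("Sandbox")
toolchain_id = stack_identifier("Toolchain")
```

Because both identifiers share the `APP_NAME` prefix, the stacks that belong to this application are easy to tell apart from other stacks in the same account and Region.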

In this case, I use CDK Pipelines for continuous deployment, and instantiate the toolchain stack that defines the pipeline here. The pipeline then deploys the user management backend to the production environment.

Note: We recommend deploying the toolchain in a separate production deployment account. See Separating CI/CD management capabilities from workloads for more details.

During development, I will iterate quickly and deploy changes to my sandbox environment. The Backend definition above enables this. It defines the user management backend deployment layout for my sandbox environment. The Backend class is imported from the backend/component.py module. Let’s look into it.

backend/component.py

# The user management backend deployment layout
...
from typing import Any

import aws_cdk as cdk
from aws_cdk import aws_dynamodb as dynamodb
from constructs import Construct

from backend.api.infrastructure import API
from backend.database.infrastructure import Database
from backend.monitoring.infrastructure import Monitoring


class Backend(cdk.Stack):
    def __init__(
        self,
        scope: Construct,
        id_: str,
        *,
        database_dynamodb_billing_mode: dynamodb.BillingMode,
        api_lambda_reserved_concurrency: int,
        **kwargs: Any,
    ):
        super().__init__(scope, id_, **kwargs)

        # Compose the logical units into a single deployment unit
        database = Database(
            self,
            "Database",
            dynamodb_billing_mode=database_dynamodb_billing_mode,
        )
        api = API(
            self,
            "API",
            dynamodb_table_name=database.dynamodb_table.table_name,
            lambda_reserved_concurrency=api_lambda_reserved_concurrency,
        )
        Monitoring(self, "Monitoring", database=database, api=api)

        # Allow the API Lambda function to read and write the table data
        database.dynamodb_table.grant_read_write_data(api.lambda_function)

        # Expose the API endpoint URL as a stack output
        self.api_endpoint = cdk.CfnOutput(
            self,
            "APIEndpoint",
            value=api.api_gateway_http_api.url,
        )

The Backend class inherits from cdk.Stack—a unit of deployment in the AWS CDK. As I mentioned above, all AWS resources defined within the scope of a stack, either directly or indirectly, are provisioned as a single unit. The Backend class composes the API, Database, and Monitoring constructs into a single deployment unit. The class also defines the permissions between the logical units and the stack outputs.

Finally, let’s look at the toolchain definition.

toolchain.py

# The continuous deployment pipeline
...
from typing import Any

import aws_cdk as cdk
from aws_cdk import aws_dynamodb as dynamodb
from aws_cdk import pipelines
from constructs import Construct

import constants
from backend.component import Backend


class Toolchain(cdk.Stack):
    def __init__(self, scope: Construct, id_: str, **kwargs: Any):
        super().__init__(scope, id_, **kwargs)
        ...
        pipeline = pipelines.CodePipeline(...)
        Toolchain._add_production_stage(pipeline)
    ...
    @staticmethod
    def _add_production_stage(pipeline: pipelines.CodePipeline) -> None:
        production = cdk.Stage(
            pipeline,
            PRODUCTION_ENV_NAME,
            env=cdk.Environment(
                account=PRODUCTION_ENV_ACCOUNT, region=PRODUCTION_ENV_REGION
            ),
        )
        backend = Backend(
            production,
            constants.APP_NAME + PRODUCTION_ENV_NAME,
            stack_name=constants.APP_NAME + PRODUCTION_ENV_NAME,
            api_lambda_reserved_concurrency=10,
            database_dynamodb_billing_mode=dynamodb.BillingMode.PROVISIONED,
        )
        ...
        pipeline.add_stage(production, post=[smoke_test])

This time, the Backend stack is instantiated within the production cdk.Stage and deployed to the production environment by the pipeline. This lets me keep my sandbox environment similar to the production environment while still being able to add customizations. For example, I use the database_dynamodb_billing_mode argument to set the DynamoDB capacity mode to on-demand for the sandbox environment and to provisioned for the production environment.
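
One way to keep track of these per-environment differences is to record them side by side. The sketch below uses only the standard library; `BackendSettings` and `differing_settings` are hypothetical names, and the billing mode is represented as a plain string rather than `dynamodb.BillingMode`. The values mirror the arguments passed to `Backend` in app.py and toolchain.py.

```python
from dataclasses import asdict, dataclass
from typing import Any


@dataclass(frozen=True)
class BackendSettings:
    api_lambda_reserved_concurrency: int
    database_dynamodb_billing_mode: str  # stand-in for dynamodb.BillingMode


SANDBOX = BackendSettings(1, "PAY_PER_REQUEST")
PRODUCTION = BackendSettings(10, "PROVISIONED")


def differing_settings(
    a: BackendSettings, b: BackendSettings
) -> dict[str, tuple[Any, Any]]:
    """Return the settings whose values differ between two environments."""
    a_values, b_values = asdict(a), asdict(b)
    return {
        name: (a_values[name], b_values[name])
        for name in a_values
        if a_values[name] != b_values[name]
    }
```

Calling `differing_settings(SANDBOX, PRODUCTION)` lists exactly the arguments that are customized per environment, which makes it easier to notice when the two environments drift further apart than intended.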

Conclusion

The AWS CDK allows for infrastructure code to be located in the same repository with runtime code. This leads to additional considerations, such as how to structure the project. In this blog post, I have described the recommended AWS CDK project structure for Python applications, thereby aiming to ease the maintenance and evolution of your projects.

If you think I’ve missed something, or you have a use case that I didn’t cover, we would love to hear from you on the aws-cdk-project-structure-python GitHub repository. Happy coding!

About the author

Alex Pulver is a Sr. Partner Solutions Architect on the AWS SaaS Factory team. He works with AWS Partners at any stage of their software-as-a-service (SaaS) journey to help build new products, migrate existing applications, or optimize SaaS solutions on AWS. His areas of interest include builder experience (e.g., developer tools, DevOps culture, CI/CD), containers, security, IoT, and AWS multi-account strategy.