AWS Developer Tools Blog
Recommended AWS CDK project structure for Python applications
September 22, 2022: Migrated the reference application to AWS CDK v2. Renamed deployment.py to backend/component.py to support multi-component use cases and better emphasize the mapping of AWS Well-Architected Framework component terminology. Renamed pipeline.py to toolchain.py to expand the scope to any tools related to the component’s software development life cycle (e.g., continuous deployment pipeline, pull request validation build, etc.). The component is now a cdk.Stack. Moved the definition of the production cdk.Stage to the toolchain implementation.
In this blog post, I describe the recommended AWS Cloud Development Kit (AWS CDK) project structure for Python applications. This is based on Best practices for developing and deploying cloud infrastructure with the AWS CDK.
The AWS CDK is an open source software development framework for modeling and provisioning your cloud application resources through AWS CloudFormation by utilizing familiar programming languages, including TypeScript, JavaScript, Python, C#, Java, and Go.
The AWS CDK application maps to a component as defined by the AWS Well-Architected Framework. A component usually includes logical units (e.g., api, database), and optionally can have a toolchain with a continuous deployment pipeline. The logical units should be implemented as constructs, including the infrastructure (e.g., Amazon S3 buckets, Amazon RDS databases, Amazon VPC network), runtime (e.g., AWS Lambda function code), and configuration code.
As an example, I will walk through a user management backend component that utilizes Amazon API Gateway, AWS Lambda, and Amazon DynamoDB to provide basic CRUD operations for managing users. The project also includes a toolchain with a continuous deployment pipeline. Together, this contains everything required for managing the component as a unit of ownership, including the specific deployment environments.
Concepts
We recommend organizing the project directory structure based on the component’s logical units. Each logical unit should have a directory and include the related infrastructure, runtime, and configuration code. For example:
.
|-- backend
|   |-- api
|   |   |-- runtime
|   |   |   |-- lambda_function.py
|   |   |   `-- requirements.txt
|   |   `-- infrastructure.py
|   |-- database
|   |   `-- infrastructure.py
This way, if I need to make API changes, then I can easily find the code related to that logical unit. If I need to refactor the code, or make it a separate unit of ownership, it can be changed in a single place. In other words, it is a self-contained unit.
The logical units should be implemented as constructs and not as stacks. Constructs are the basic building blocks of AWS CDK applications, while stacks are deployment units. All of the AWS resources defined within the scope of a stack, either directly or indirectly, are provisioned as a single unit. Implementing logical units as constructs provides the flexibility to support different deployment layouts and enables future reuse as construct libraries. I will further discuss the deployment layout later.
Note: When refactoring constructs, consider logical ID stability to avoid unexpected infrastructure changes.
Before the AWS CDK arrived, runtime and infrastructure code remained two separate concepts. The AWS CDK abstraction lets you combine the infrastructure and runtime code of a logical unit behind a single construct interface.
Project structure
Let’s look at the recommended project structure in detail. Clone the example I use in this blog post from https://github.com/aws-samples/aws-cdk-project-structure-python. Note that I have left out source code snippets and files in the blog post, such as linters and the full pipeline structure. The focus is on the code that illustrates the project structure recommendations, while the source code still provides a fully functional project for reference. Below is a snapshot of the project structure, excluding files not in the scope of this blog post:
# The recommended project structure example
.
|-- backend
|   |-- api
|   |   |-- runtime
|   |   |   |-- lambda_function.py
|   |   |   `-- requirements.txt
|   |   `-- infrastructure.py
|   |-- database
|   |   `-- infrastructure.py
|   |-- monitoring
|   |   `-- infrastructure.py
|   `-- component.py
|-- app.py
|-- constants.py
|-- requirements.txt
`-- toolchain.py
Three logical units compose the user management backend: API, database, and monitoring. Each logical unit contains an infrastructure.py module. If the infrastructure implementation were more complex, then I would replace the infrastructure.py module with an infrastructure package, which contains multiple modules. Some logical units also have a runtime directory. For example, the API has a runtime directory containing Lambda function code.
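If you want to try the layout in an empty directory before cloning the example, the following sketch creates it with empty placeholder files. The scaffold helper is hypothetical and only mirrors the tree shown above. It is not part of the reference application.

```python
import tempfile
from pathlib import Path

# Files in the recommended layout, relative to the project root.
PROJECT_FILES = [
    "backend/api/runtime/lambda_function.py",
    "backend/api/runtime/requirements.txt",
    "backend/api/infrastructure.py",
    "backend/database/infrastructure.py",
    "backend/monitoring/infrastructure.py",
    "backend/component.py",
    "app.py",
    "constants.py",
    "requirements.txt",
    "toolchain.py",
]


def scaffold(root: Path) -> list[Path]:
    """Create empty placeholder files for the layout under `root`."""
    created = []
    for relative_path in PROJECT_FILES:
        path = root / relative_path
        path.parent.mkdir(parents=True, exist_ok=True)
        path.touch()
        created.append(path)
    return created


# Usage: scaffold into a temporary directory and list what was created.
project_root = Path(tempfile.mkdtemp())
for created_path in scaffold(project_root):
    print(created_path.relative_to(project_root))
```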
Next, I will cover the app.py (the AWS CDK application entry point), backend/component.py (the user management backend deployment layout), and toolchain.py (the continuous deployment pipeline) modules in order to show the implementation of the recommended project structure. We’ll start with app.py.
app.py
The module defines the app object, followed by the component sandbox stack and the toolchain stack with a continuous deployment pipeline.
Note: constants.APP_NAME is utilized as part of the construct identifier (e.g., constants.APP_NAME + "Sandbox" above) in order to set a unique prefix for a CloudFormation stack name.
In this case, I utilize CDK Pipelines for continuous deployment, and instantiate the toolchain stack that defines the pipeline here. Then, the pipeline deploys the user management backend to the production environment.
Note: We recommend deploying the toolchain in a separate production deployment account. See Separating CI/CD management capabilities from workloads for more details.
During development, I will iterate quickly and deploy changes to my sandbox environment. The Backend definition above enables this. It defines the user management backend deployment layout for my sandbox environment. The Backend class is imported from the backend/component.py module. Let’s look into it.
backend/component.py
The Backend class inherits from cdk.Stack, a unit of deployment in the AWS CDK. As I mentioned above, all AWS resources defined within the scope of a stack, either directly or indirectly, are provisioned as a single unit. The Backend class composes the API, Database, and Monitoring constructs into a single deployment unit. The class also defines the permissions between the logical units and the stack outputs.
Finally, let’s look at the toolchain definition.
toolchain.py
This time, the Backend stack is deployed to a production environment via a pipeline. This lets me keep my sandbox environment similar to the production environment, all while remaining able to add customizations. For example, I use the database_dynamodb_billing_mode argument to set DynamoDB capacity mode to on-demand for the sandbox environment and to provisioned mode for the production environment.
Conclusion
The AWS CDK allows for infrastructure code to be located in the same repository with runtime code. This leads to additional considerations, such as how to structure the project. In this blog post, I have described the recommended AWS CDK project structure for Python applications, thereby aiming to ease the maintenance and evolution of your projects.
If you think I’ve missed something, or you have a use case that I didn’t cover, we would love to hear from you on the aws-cdk-project-structure-python GitHub repository. Happy coding!
About the author
Alex Pulver is a Sr. Partner Solutions Architect on the AWS SaaS Factory team. He works with AWS Partners at any stage of their software-as-a-service (SaaS) journey in order to help build new products, migrate existing applications, or optimize SaaS solutions on AWS. His areas of interest include builder experience (e.g., developer tools, DevOps culture, CI/CD), containers, security, IoT, and AWS multi-account strategy.