Document storage with Amazon DocumentDB
In this course, you have been building a restaurant-rating application. You used Amazon DynamoDB to build the core restaurant-ratings service to allow users to rate restaurants. Then you sped up your application by adding caching to key workflows by using Amazon ElastiCache. You then used Amazon Neptune to add fraud detection by analyzing reviews to detect bot traffic.
In this fourth lesson, you add marketing pages for restaurants. Restaurants can customize the look of their page on your site to help promote their offerings. They can also display their menus, promote positive reviews, and post photos. To handle this workload, you’ll use Amazon DocumentDB (with MongoDB compatibility).
This lesson teaches you how to use a fully managed document database in an application. First, you learn why you would want to use a document database such as Amazon DocumentDB. Then you walk through the steps to create an Amazon DocumentDB database, design your data model, and use the database in your application. At the end of this lesson, you should feel confident in your ability to use Amazon DocumentDB in your application. For additional information about Amazon DocumentDB, see the Amazon DocumentDB Developer Guide.
Time to complete: 30–45 minutes
Amazon DocumentDB is a fully managed document database from AWS. A document database is a type of NoSQL database that allows you to store and query rich documents in your application. A document database works well for the following use cases:
- Content management systems: When storing data for displaying rich content pages, you often have heterogeneous data. It can include content, images, and testimonials. Storing this data in a rigid relational database can remove flexibility. By using a document database, you can move quickly and provide a great experience for your users.
- Profile management: When storing user profiles in your application, you might have a wide range of settings, preferences, and additional data to store with your user. With a document database such as Amazon DocumentDB, you can keep this data together while keeping data access fast.
- Web and mobile applications: Modern web and mobile applications demand fast data access at high scale. The rise of global applications means you can have millions of users at any time of day. Amazon DocumentDB provides high performance and easy scaling to handle your high-volume applications.
With Amazon DocumentDB, you get a fully managed document database experience. This means you don't need to focus on instance failover, database backups and recovery, or software upgrades. You can focus on building your application and innovating for your customers.
Finally, Amazon DocumentDB has API compatibility with MongoDB. This means you can use popular open-source libraries to interact with Amazon DocumentDB, or you can migrate existing databases to Amazon DocumentDB with minimal hassle.
In this lesson, you learn how to build a service by using Amazon DocumentDB as your data storage. This lesson has five steps.
1. Create an AWS Cloud9 environment
In this module, you create and prepare an AWS Cloud9 environment. AWS Cloud9 is a cloud-based integrated development environment (IDE). It gives you a fast, consistent development environment from which you can quickly build AWS applications.
To get started, navigate to the AWS Cloud9 console. Choose Create environment to start the AWS Cloud9 environment creation wizard.
On the first page of the wizard, give your environment a name and a description. Then choose Next step.
The next step allows you to configure environment settings, such as the instance type for your environment, the platform, and network settings.
The default settings work for this lesson, so scroll to the bottom and choose Next step.
The last step shows your settings for review. Scroll to the bottom and choose Create environment.
Your AWS Cloud9 environment should take a few minutes to provision. As it is being created, the following screen is displayed.
After a few minutes, you should see your AWS Cloud9 environment. There are three areas of the AWS Cloud9 console to know, as illustrated in the following screenshot:
- File explorer: On the left side of the IDE, the file explorer shows a list of the files in your directory.
- File editor: In the upper right area of the IDE, the file editor is where you view and edit files that you’ve chosen in the file explorer.
- Terminal: In the lower right area of the IDE, the terminal is where you run commands to execute code samples.
In this lesson, you use Python to interact with your Amazon DocumentDB database. Run the following commands in your AWS Cloud9 terminal to download and unpack the module code.
cd ~/environment
curl -sL https://s3.amazonaws.com/aws-data-labs/document-cms.tar | tar -xv
Run the following command in your AWS Cloud9 terminal to view the contents of your directory.
ls
You should see two directories in your AWS Cloud9 terminal:
- scripts/: The scripts directory includes files necessary for configuring and preparing your database. Use these scripts to test your database connection and load sample data into your database.
- application/: The application directory contains files that are similar to what you have in your application. They show how to query your document database to satisfy your data access patterns.
Run the following command in your terminal to install the dependencies for your application.
sudo pip install -r requirements.txt
In this module, you configured an AWS Cloud9 instance to use for development. In the next module, you create an Amazon DocumentDB database.
2. Create an Amazon DocumentDB database
In this module, you create an Amazon DocumentDB database. This database is used to power the restaurant marketing pages in your application.
To get started, navigate to the Amazon DocumentDB console. Choose Create to begin the database-creation wizard.
In the Configuration box, give your Amazon DocumentDB database a name for its Cluster identifier. Also, because this is a walkthrough example, you can set the Number of instances to 1. In a production setting, you likely would want to use additional instances.
In the Authentication section, give your database a master username and master password. Make sure you write these down because you need them to connect to your database.
Then select the toggle next to Show advanced settings to see additional configuration settings.
In the Network settings, you can choose a VPC and security group for your Amazon DocumentDB database. Remove the default security group choice and choose the security group that was created for your AWS Cloud9 environment.
Most of the remaining default settings are fine for this walkthrough.
Scroll to the bottom of the wizard to the Deletion protection section. Clear the check box marked Enable deletion protection to disable this feature. This will make it easier to delete your Amazon DocumentDB database after you complete this lesson. In a production environment, you should enable deletion protection.
You can configure tags or update additional configuration options, but the defaults work for this tutorial.
Choose Create cluster to create your Amazon DocumentDB database.
AWS begins provisioning your Amazon DocumentDB database. As your database is being provisioned, it shows a Status of creating.
When your database is ready, it shows a Status of available.
Next, you need to configure your security group to allow your AWS Cloud9 environment to access your Amazon DocumentDB database.
Navigate to the Security Groups page of the Amazon EC2 console. You should see the security group that was created for your AWS Cloud9 environment. Choose the Security group ID to see its details.
You should see details about your security group including its inbound networking rules. Choose Edit inbound rules to edit the rules.
You should see existing inbound rules in your security group to allow SSH access to your AWS Cloud9 environment.
Choose Add rule to add an additional rule. For Type, choose Custom TCP. Enter 27017 for the Port range. Then choose your AWS Cloud9 security group for the Source.
Choose Save rules to save your security group rules.
You have configured access to your Amazon DocumentDB database from your AWS Cloud9 environment. Now, test connecting to your database to ensure it was configured correctly.
First, go to the Amazon DocumentDB console. Find the Amazon DocumentDB database you created and choose its Cluster identifier.
You should see details about your Amazon DocumentDB database. Choose the Configuration tab and find the Cluster endpoint. Copy this value.
In the AWS Cloud9 terminal, run the following commands to set the Amazon DocumentDB configuration in the AWS Cloud9 environment.
export DOCUMENTDB_ENDPOINT=<yourClusterEndpoint>
export DOCUMENTDB_USER=<yourUsername>
export DOCUMENTDB_PASSWORD=<yourPassword>
Be sure to substitute your values for <yourClusterEndpoint>, <yourUsername>, and <yourPassword>.
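The scripts in this lesson read these three variables from the environment, so a variable that was never exported surfaces as a KeyError when a script starts. As a minimal sketch, a check like the following could report which variables are unset before you try to connect. The helper name missing_settings is hypothetical and is not part of the downloaded module code.

```python
import os

# Environment variable names used by the scripts in this lesson.
REQUIRED_SETTINGS = ("DOCUMENTDB_ENDPOINT", "DOCUMENTDB_USER", "DOCUMENTDB_PASSWORD")


def missing_settings(env=None):
    """Return the required variable names that are unset or empty.

    Hypothetical helper for illustration; `env` defaults to os.environ.
    """
    env = os.environ if env is None else env
    return [name for name in REQUIRED_SETTINGS if not env.get(name)]


if __name__ == "__main__":
    missing = missing_settings()
    if missing:
        print("Missing settings: " + ", ".join(missing))
    else:
        print("All Amazon DocumentDB settings are present.")
```

Passing a plain dictionary instead of os.environ makes the check easy to exercise without changing your shell.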
Next, run the following command in your terminal to download the certificate authority (CA) bundle that is used to verify the encrypted connection to Amazon DocumentDB.
wget https://s3.amazonaws.com/rds-downloads/rds-combined-ca-bundle.pem
In the scripts/ directory, there is a file named test_connection.py. Open that file in your file editor. The contents of the file are as follows.
import os

import pymongo

USER = os.environ["DOCUMENTDB_USER"]
PASSWORD = os.environ["DOCUMENTDB_PASSWORD"]
HOST = os.environ["DOCUMENTDB_ENDPOINT"]

client = pymongo.MongoClient(
    f"mongodb://{USER}:{PASSWORD}@{HOST}:27017/?ssl=true&ssl_ca_certs=rds-combined-ca-bundle.pem&replicaSet=rs0&readPreference=secondaryPreferred"
)

db = client.restaurants

results = db.restaurants.count()

print(
    f"Connected successfully! There are {results} documents in your restaurants collection."
)
This file uses PyMongo, an open-source Python library for using the MongoDB API. The script reads the environment variables you configured, and then it initializes a client with a connection string that includes the username, password, and cluster endpoint. Finally, it runs a count() operation to count the records in your restaurants collection.
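The script interpolates the username and password directly into the connection string. If either value contains characters such as @, :, or /, the URI can be misparsed, and percent-escaping the credentials with urllib.parse.quote_plus is the usual remedy. The following is only a sketch: build_uri is a hypothetical helper that is not part of the module code, and it reuses the options from the script unchanged.

```python
from urllib.parse import quote_plus

# Options copied from the connection string in scripts/test_connection.py.
OPTIONS = (
    "ssl=true&ssl_ca_certs=rds-combined-ca-bundle.pem"
    "&replicaSet=rs0&readPreference=secondaryPreferred"
)


def build_uri(user, password, host, port=27017):
    """Build a MongoDB-style URI with percent-escaped credentials (hypothetical helper)."""
    return f"mongodb://{quote_plus(user)}:{quote_plus(password)}@{host}:{port}/?{OPTIONS}"


if __name__ == "__main__":
    # Placeholder values; substitute your own endpoint and credentials.
    print(build_uri("admin", "p@ss/word", "example-endpoint"))
```

The resulting string could be passed to pymongo.MongoClient in place of the f-string used in the script.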
Test your connection by running the following command in your terminal.
python scripts/test_connection.py
You should see a message in your terminal with the following output.
Connected successfully! There are 0 documents in your restaurants collection.
In this module, you created a document database by using Amazon DocumentDB. Amazon DocumentDB provides a fully managed document database that is compatible with the MongoDB API. After creating your database, you configured your security group to allow inbound traffic from your AWS Cloud9 environment to your Amazon DocumentDB database. Finally, you saw how to connect to Amazon DocumentDB and ran a script to test your connection.
In the next module, you design your data model for your restaurant marketing service and load your database with sample data.
3. Design a document data model and load sample data
In this module, you learn the basics of data modeling with a document database. First, you learn some key terminology and concepts for a document database. Then you load some sample documents into your database.
If most of your database experience has been with a relational database, some document database terminology might be different. However, many of the concepts in a document database are comparable to a relational database.
There are three key terms you should learn about Amazon DocumentDB:
- Collection: A grouping of records in your Amazon DocumentDB database. A collection is similar to a table in a relational database.
- Document: A record in Amazon DocumentDB. This term refers to a grouping of data that is identifiable by a primary key. It is similar to a row in a relational database and is the foundational data unit in a document database.
- Field: Within a document, the data attributes are called fields. Fields are similar to columns in a relational database. However, you don't need to specify fields for your collection upfront as you do with a relational database. Amazon DocumentDB is schemaless, which means the database itself does not enforce your data schema. You should have a schema, but it is enforced in your application code.
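Because the database does not enforce a schema, the checks live in your application. The following is a minimal sketch of such a check, assuming the field names of the sample restaurant document used later in this module. The validate_restaurant helper, the set of required fields, and the 1 to 5 rating range are all assumptions made for illustration.

```python
# Fields every restaurant document is assumed to carry, with expected types.
# This schema is an assumption based on the lesson's sample document.
REQUIRED_FIELDS = {
    "name": str,
    "createdAt": str,
    "updatedAt": str,
    "address": dict,
}


def validate_restaurant(document):
    """Return a list of problems found in a restaurant document (empty if valid)."""
    problems = []
    for field, expected_type in REQUIRED_FIELDS.items():
        if field not in document:
            problems.append(f"missing field: {field}")
        elif not isinstance(document[field], expected_type):
            problems.append(f"wrong type for field: {field}")
    # Optional array field: each promoted review is assumed to need an
    # integer rating from 1 to 5.
    for review in document.get("promotedReviews", []):
        rating = review.get("rating")
        if not isinstance(rating, int) or not 1 <= rating <= 5:
            problems.append(f"invalid rating: {rating!r}")
    return problems
```

An application might call a function like this before every insert or update and reject the write when the returned list is not empty.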
Though a document is comparable to a row in a relational database, there are some significant differences. In a relational database, you often normalize your data by breaking down related records into atomic units and placing different entity types in different tables. Then you reassemble those records at query time by using the SQL join operator.
In a document database, you use a different approach. You keep related data together in a single document. That means your document could include fields that have simple values, such as strings and numbers, or they could include complex values, such as arrays and nested documents.
There is an example document in the scripts/restaurant_1.json file. Open that file in your file editor. You should see the following contents.
{
  "name": "The Vineyard",
  "createdAt": "2020-02-15 01:45:40",
  "updatedAt": "2020-03-13 03:48:35",
  "address": {
    "street": "1122 Broadway",
    "location": "New York, NY"
  },
  "promotedReviews": [
    {
      "reviewer": "delighted_dan",
      "rating": 5,
      "review": "This place is great! Came here with my family and we all loved it. Try the tortellini!"
    },
    {
      "reviewer": "happy_hannah",
      "rating": 4,
      "review": "Pretty good! Very nice service and good food. Would come again!"
    }
  ],
  "foodImages": [
    {
      "url": "https://cdn.reviewmyrestaurant.com/lkj234n1k.png",
      "caption": "Croque Madame"
    },
    {
      "url": "https://cdn.reviewmyrestaurant.com/lksfm1340.png",
      "caption": "French Onion Soup"
    }
  ]
}
Notice how rich the document is. It includes simple information such as the restaurant name and created date. It also includes a nested document with the restaurant's address information and arrays of documents for promoted reviews and food images for the restaurant. Keeping this data together in the same document results in faster reads than a comparable query in a relational database that requires reading multiple tables and combining the results.
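To make the benefit concrete, here is a small sketch of how application code might assemble everything a marketing page needs from one document, with no joins. It works on a trimmed in-memory copy of the sample document. The summarize helper and the averagePromotedRating key are hypothetical names chosen for this example.

```python
# A trimmed copy of the sample document from scripts/restaurant_1.json.
restaurant = {
    "name": "The Vineyard",
    "address": {"street": "1122 Broadway", "location": "New York, NY"},
    "promotedReviews": [
        {"reviewer": "delighted_dan", "rating": 5},
        {"reviewer": "happy_hannah", "rating": 4},
    ],
    "foodImages": [
        {"caption": "Croque Madame"},
        {"caption": "French Onion Soup"},
    ],
}


def summarize(document):
    """Collect page details from one document (hypothetical helper)."""
    ratings = [review["rating"] for review in document["promotedReviews"]]
    return {
        "name": document["name"],
        # Nested document: reach into the address sub-document.
        "location": document["address"]["location"],
        # Array of documents: aggregate across the promoted reviews.
        "averagePromotedRating": sum(ratings) / len(ratings),
        "captions": [image["caption"] for image in document["foodImages"]],
    }


print(summarize(restaurant))
```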
Amazon DocumentDB includes flexible indexing features that allow you to index your documents in multiple ways to allow for efficient queries. You can index on single fields, such as the restaurant name, or you can use a compound index to index multiple fields. For example, you might want to create a compound index on location and the updatedAt timestamp to find the most recently updated restaurant pages in a specific location.
Now, load some sample data and create a simple index to see how this works.
In the scripts/ directory, there is a file called load_sample_data.py. Open the file in your file editor. The contents should look as follows.
import json
import os

import pymongo

USER = os.environ["DOCUMENTDB_USER"]
PASSWORD = os.environ["DOCUMENTDB_PASSWORD"]
HOST = os.environ["DOCUMENTDB_ENDPOINT"]

client = pymongo.MongoClient(
    f"mongodb://{USER}:{PASSWORD}@{HOST}:27017/?ssl=true&ssl_ca_certs=rds-combined-ca-bundle.pem&replicaSet=rs0&readPreference=secondaryPreferred"
)

db = client.restaurants

with open("scripts/restaurant_1.json", "r") as f:
    restaurant = json.load(f)
    db.restaurants.insert_one(restaurant)

with open("scripts/restaurant_2.json", "r") as f:
    restaurant = json.load(f)
    db.restaurants.insert_one(restaurant)

print("Documents loaded successfully.")

db.restaurants.create_index([("name", pymongo.DESCENDING)])

print("Index created successfully.")
This script does two things. First, it reads two example restaurant documents from local files and loads them into the restaurants collection in your Amazon DocumentDB database. Second, it creates a single field index on the name field to allow efficient lookups by restaurant name.
Run the following command in your terminal to execute the script.
python scripts/load_sample_data.py
You should see the following output in your terminal.
$ python scripts/load_sample_data.py
Documents loaded successfully.
Index created successfully.
Now that the data is loaded, execute the scripts/test_connection.py script again to see the number of documents in your database.
Run the following command in your terminal.
python scripts/test_connection.py
You should see the following output in your terminal.
$ python scripts/test_connection.py
Connected successfully! There are 2 documents in your restaurants collection.
Great! You have successfully inserted documents into your Amazon DocumentDB database.
In this module, you learned the basic terminology for working with document databases and saw an example document. Then you loaded some example documents into your Amazon DocumentDB database and created an index.
In the next module, you run advanced queries against your Amazon DocumentDB database.
4. Use Amazon DocumentDB in your application
In this module, you learn how to use Amazon DocumentDB in your application. First, you create a compound index in your database collection and use the index to query your documents. Then you see how to update your documents. Finally, you query your data again to verify that the data changed.
In the last module, you created a single field index on the name field in your collection. In this module, you create a compound index by indexing two fields.
Imagine you want to look at restaurants in a specific location. When browsing those restaurants, you want to see restaurants whose pages have been updated most recently so that you can see any changes since you last visited.
To do this, you can use a compound index. A compound index works by indexing multiple fields. The index first orders documents according to the first field in the index, then the second field, and so on for each field in the index.
For this use case, create a compound index that uses the location for the first field and the updated timestamp for the second field.
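The ordering that such an index imposes can be modeled in plain Python. The sketch below is a rough in-memory illustration, not how the database stores its index. It sorts by location ascending and then by the updated timestamp descending. The Chicago entry is made up for the example, and compound_order is a hypothetical helper.

```python
# In-memory stand-ins for indexed documents. "Example Diner" is invented
# for illustration; the other two restaurants come from the sample data.
restaurants = [
    {"name": "The Vineyard", "address": {"location": "New York, NY"}, "updatedAt": "2020-03-13 03:48:35"},
    {"name": "Example Diner", "address": {"location": "Chicago, IL"}, "updatedAt": "2020-01-05 10:00:00"},
    {"name": "Bill's Burgers", "address": {"location": "New York, NY"}, "updatedAt": "2020-04-10 04:07:23"},
]


def compound_order(documents):
    """Order by location ascending, then updatedAt descending."""
    # Python's sort is stable, so sort by the secondary key first,
    # then by the primary key.
    by_updated = sorted(documents, key=lambda d: d["updatedAt"], reverse=True)
    return sorted(by_updated, key=lambda d: d["address"]["location"])


for doc in compound_order(restaurants):
    print(doc["address"]["location"], doc["updatedAt"], doc["name"])
```

Within each location, the most recently updated restaurant comes first, which is the order the query in this module asks for.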
In the scripts/ directory, there is a file called add_compound_index.py. Open that file in your file editor. The contents should look as follows.
import os

import pymongo

USER = os.environ["DOCUMENTDB_USER"]
PASSWORD = os.environ["DOCUMENTDB_PASSWORD"]
HOST = os.environ["DOCUMENTDB_ENDPOINT"]

client = pymongo.MongoClient(
    f"mongodb://{USER}:{PASSWORD}@{HOST}:27017/?ssl=true&ssl_ca_certs=rds-combined-ca-bundle.pem&replicaSet=rs0&readPreference=secondaryPreferred"
)

db = client.restaurants

db.restaurants.create_index(
    [("address.location", pymongo.ASCENDING), ("updatedAt", pymongo.DESCENDING)]
)

print("Index created successfully.")
This is similar to the index-creation process from the scripts/load_sample_data.py script except that the index has two fields rather than one.
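The first index key, address.location, uses dot notation to reach a field inside the nested address document. As a simplified model of how such a path resolves, consider the sketch below. The get_path helper is hypothetical, and it handles nested documents only, not arrays.

```python
def get_path(document, dotted_path):
    """Resolve a dotted path such as "address.location" in nested dicts.

    Simplified model: handles nested documents only, not array traversal.
    Returns None when any step of the path is missing.
    """
    current = document
    for key in dotted_path.split("."):
        if not isinstance(current, dict) or key not in current:
            return None
        current = current[key]
    return current


restaurant = {
    "name": "The Vineyard",
    "address": {"street": "1122 Broadway", "location": "New York, NY"},
}

print(get_path(restaurant, "address.location"))
```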
Create the compound index by running the following command in your terminal.
python scripts/add_compound_index.py
You should see output in your console confirming that the index was created.
Next, use the index. There is a file in the application/ directory called get_recently_updated_by_location.py. Open the file in your file editor. The contents should look as follows.
import os

import pymongo

USER = os.environ["DOCUMENTDB_USER"]
PASSWORD = os.environ["DOCUMENTDB_PASSWORD"]
HOST = os.environ["DOCUMENTDB_ENDPOINT"]

client = pymongo.MongoClient(
    f"mongodb://{USER}:{PASSWORD}@{HOST}:27017/?ssl=true&ssl_ca_certs=rds-combined-ca-bundle.pem&replicaSet=rs0&readPreference=secondaryPreferred"
)

db = client.restaurants


def get_recently_updated_by_location(location):
    results = db.restaurants.find({"address.location": location}).sort(
        [("updatedAt", pymongo.DESCENDING)]
    )

    return results


results = get_recently_updated_by_location("New York, NY")

for restaurant in results:
    print(f"Restaurant: {restaurant['name']}. Updated at {restaurant['updatedAt']}")
In this file, there is a function called get_recently_updated_by_location. It is similar to code you would have in your application. The function takes a location, and then queries Amazon DocumentDB to find restaurants in that location. It sorts the matches in descending order of the updatedAt field so that the most recently updated restaurants come first.
At the bottom of the file, there is a statement to invoke the function to find the most recently updated restaurants in New York, NY. It then prints out the results.
Run the following command in your terminal to execute the script.
python application/get_recently_updated_by_location.py
You should see the following output in your terminal.
$ python application/get_recently_updated_by_location.py
Restaurant: Bill's Burgers. Updated at 2020-04-10 04:07:23
Restaurant: The Vineyard. Updated at 2020-03-13 03:48:35
Notice how the output matches both New York, NY, restaurants and returns them in order of when they were most recently updated.
Next, let's say you want to update a restaurant, and you have a way that a restaurant can add a new review to its promoted reviews.
In the application/ directory, there is a file named add_review_to_restaurant.py. The contents of that file are as follows.
import datetime
import os

import pymongo

USER = os.environ["DOCUMENTDB_USER"]
PASSWORD = os.environ["DOCUMENTDB_PASSWORD"]
HOST = os.environ["DOCUMENTDB_ENDPOINT"]

client = pymongo.MongoClient(
    f"mongodb://{USER}:{PASSWORD}@{HOST}:27017/?ssl=true&ssl_ca_certs=rds-combined-ca-bundle.pem&replicaSet=rs0&readPreference=secondaryPreferred"
)

db = client.restaurants


def add_review_to_restaurant(name, review):
    db.restaurants.update_one(
        {"name": name},
        {
            "$push": {"promotedReviews": review},
            "$set": {"updatedAt": datetime.datetime.now().isoformat(),},
        },
    )

    return


add_review_to_restaurant(
    "The Vineyard",
    {
        "reviewer": "elated_eric",
        "rating": 5,
        "review": "Sooo good! Can't wait to come back.",
    },
)
There is a function named add_review_to_restaurant that is similar to code that you would have in your application. This function takes a restaurant name and a review, and it adds the review to the restaurant.
Look at the database code in that function. It uses the update_one() method to update a single record. The first argument identifies the document to be updated. It does a match on the name field. The second argument contains the updates to be applied. In this instance, you are adding another element to the promotedReviews array using the $push operator. Additionally, you are updating the updatedAt timestamp to the current time.
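To see what these two operators do to a document, the following sketch applies the same update to an in-memory dictionary. It is a simplified model that covers only $push and $set on top-level fields, and it is not the database's implementation. The apply_update helper is hypothetical.

```python
import copy


def apply_update(document, update):
    """Return a copy of `document` with $push and $set applied.

    Simplified in-memory model covering only the two operators used in
    add_review_to_restaurant.py, on top-level fields.
    """
    result = copy.deepcopy(document)
    for field, value in update.get("$push", {}).items():
        # $push appends one element, creating the array if it is absent.
        result.setdefault(field, []).append(value)
    for field, value in update.get("$set", {}).items():
        # $set overwrites the field, or creates it if it is absent.
        result[field] = value
    return result


before = {
    "name": "The Vineyard",
    "updatedAt": "2020-03-13 03:48:35",
    "promotedReviews": [{"reviewer": "delighted_dan", "rating": 5}],
}
after = apply_update(
    before,
    {
        "$push": {"promotedReviews": {"reviewer": "elated_eric", "rating": 5}},
        "$set": {"updatedAt": "2020-05-28T16:38:23.485473"},
    },
)
print(len(before["promotedReviews"]), len(after["promotedReviews"]))
```

The existing reviews are left in place and the new review is appended to the end of the array, while updatedAt is replaced outright.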
At the bottom of the file is a statement to invoke the function with some example data. It adds a sample review to The Vineyard restaurant.
Run the script to add a new review to The Vineyard by running the following command in your terminal.
python application/add_review_to_restaurant.py
This command has no output. To see that the restaurant document was updated, run the application/get_recently_updated_by_location.py script again. Because you have updated The Vineyard, it should now show up first in the results.
Run the script with the following command in your terminal.
python application/get_recently_updated_by_location.py
You should see output similar to the following in your terminal. Your output will differ slightly because the timestamp for The Vineyard reflects when you ran your update command.
$ python application/get_recently_updated_by_location.py
Restaurant: The Vineyard. Updated at 2020-05-28T16:38:23.485473
Restaurant: Bill's Burgers. Updated at 2020-04-10 04:07:23
Success! The Vineyard is now returned before Bill's Burgers and shows the timestamp from when you updated it.
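Notice that the new timestamp uses the ISO 8601 form with a T separator and microseconds, whereas the sample data uses a space and whole seconds. Because updatedAt is stored as a string in these documents, sorting compares characters, so mixing the two formats can misorder timestamps that fall on the same day. One option is to format the value explicitly, sketched here under the assumption that you want to match the sample data. The format_timestamp helper is hypothetical.

```python
import datetime


def format_timestamp(moment):
    """Format a datetime like the sample data, e.g. "2020-03-13 03:48:35"."""
    return moment.strftime("%Y-%m-%d %H:%M:%S")


sample = datetime.datetime(2020, 5, 28, 16, 38, 23, 485473)
print(format_timestamp(sample))  # 2020-05-28 16:38:23
print(sample.isoformat())        # 2020-05-28T16:38:23.485473
```

With a single format, comparing two timestamp strings gives the same result as comparing the moments they represent.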
In this module, you learned more advanced data access with Amazon DocumentDB. First, you created a compound index to allow you to efficiently query on multiple fields. Then you used that index to query restaurants in a location by most recently updated. Finally, you updated a complex field in an existing document and verified that the query results reflected the change.
In the next module, you clean up the resources you created in this lesson.
5. Clean up the resources you created
In this lesson, you created a document database by using Amazon DocumentDB that serves as the database for a restaurant content-management service. Document databases are good matches for content-management systems in which you have heterogeneous data that needs to be kept together for fast access. With Amazon DocumentDB, you get a fully managed document database that simplifies database operations.
In this module, you clean up the resources you created in this lesson to avoid incurring additional charges.
First, delete your Amazon DocumentDB database. To do so, navigate to the Instances page in the Amazon DocumentDB console. Choose your database instance, and then choose Delete in the Actions dropdown.
For this lesson, you can decline to keep a final snapshot of your database. Choose Delete to confirm the deletion.
The Amazon DocumentDB instances page shows that your instance is being deleted.
Additionally, you need to delete your AWS Cloud9 development environment. To do so, navigate to the AWS Cloud9 console. Choose the environment you created for this lesson, and choose Delete.
In this module, you learned how to clean up the Amazon DocumentDB database and the AWS Cloud9 environment that you created in this lesson.
In this lesson, you learned how to create and use an Amazon DocumentDB database in your application. First, you created an Amazon DocumentDB database and configured network access so that you could connect to the database. Then you learned the terminology of document databases and loaded your database with example data. Finally, you saw how to use a document database in your application, including how to query your documents and how to update existing documents. You can use these patterns when building applications with Amazon DocumentDB.