Data Engineering Hub

AWS DynamoDB

Overview

This tutorial shows how to export a large MongoDB collection as a JSON file and upload it to AWS S3 using the AWS CLI. The AWS CLI is used for large files because the AWS Management Console imposes a file size limit on uploads to S3.

Requirements

To follow along you will need:

- mongoexport, which ships with the MongoDB Database Tools
- OpenSSL, to generate the md5 checksum
- The AWS CLI, configured with credentials that can write to your target S3 bucket
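
A quick way to confirm the tools are available is to ask each one for its version:

mongoexport --version
openssl version
aws --version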

1. Export the collection using mongoexport

In the example below, replace the --uri connection string, database name, and collection name with your own.


mongoexport --uri="mongodb+srv://username:password@example-mongodb.example.mongodb.net/" --db=example --collection=example --out=example.json

You can also put your credentials into a separate configuration file and reference it with the --config option, as shown below.


mongoexport --config=config.yaml --db=example --collection=example --type=json --out=example.json
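
The configuration file is YAML. A minimal sketch of config.yaml, assuming the full connection string (including the password) is stored in the uri field so that credentials stay out of the command line and shell history:

# config.yaml -- referenced via mongoexport --config=config.yaml
uri: mongodb+srv://username:password@example-mongodb.example.mongodb.net/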

2. Generate an md5 checksum

After exporting the file from MongoDB, we generate a hash that uniquely identifies the data. It will be used in the next step to verify that the data was uploaded to S3 correctly.


openssl md5 -binary example.json | base64

The generated md5 hash should look similar to: t8oeOvMA7tKvxzZoEcYawQ==
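If you would rather not copy the hash by hand, one option is to capture it in a shell variable and pass that variable to the upload command in the next step. A minimal sketch (the MD5_HASH variable name is just illustrative):

# Store the base64-encoded md5 digest for reuse in the upload command.
MD5_HASH=$(openssl md5 -binary example.json | base64)
echo "$MD5_HASH"

The value of MD5_HASH can then be supplied as --metadata md5="$MD5_HASH" when copying the file to S3.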

3. Upload the file to S3

Using the S3 copy command below, specify your file, the destination in S3, and the md5 hash generated in the previous step (replace "example" in the --metadata value with your hash).


aws s3 cp example.json s3://example-bucket/example.json --metadata md5="example"
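
To verify the upload, you can read the object's metadata back from S3 and compare the stored md5 value with the hash generated in step 2. A minimal check using the head-object command (bucket and key names match the placeholders above):

aws s3api head-object --bucket example-bucket --key example.json

The response includes a Metadata section containing the md5 value attached during the upload; if it matches the locally generated hash, the file arrived intact.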