Archive DynamoDB items to S3 automatically


Storing data such as JSON logs in DynamoDB is a great idea, as DynamoDB is very scalable. It is also easy to transfer data into a DynamoDB table, for example with Lambda and the AWS SDK, and analyzing the logs is straightforward because the AWS Console offers great filtering options for searching specific table items. By Martin Mueller.

This all sounds very good, but there is one catch: the cost. As the number of items grows, so does the bill. It is therefore advisable to delete items from the DynamoDB table after a certain time, e.g. 30 days, and archive them to S3. S3 is much cheaper, and the cost can be reduced even further by using a cheaper S3 storage class such as Glacier.
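To illustrate the expiry part, here is a minimal CDK sketch, assuming a CDK v2 TypeScript stack and a DynamoDB TTL attribute as the deletion mechanism; the table name, key, and `ttl` attribute are illustrative choices, not taken from the author's repo:

```typescript
import { RemovalPolicy } from 'aws-cdk-lib';
import { AttributeType, BillingMode, StreamViewType, Table } from 'aws-cdk-lib/aws-dynamodb';
import { Construct } from 'constructs';

// Hypothetical log table: items expire via TTL and the deletions flow into a stream.
export function createLogTable(scope: Construct): Table {
  return new Table(scope, 'LogTable', {
    partitionKey: { name: 'id', type: AttributeType.STRING },
    billingMode: BillingMode.PAY_PER_REQUEST,
    // 'ttl' must hold an epoch-seconds timestamp (e.g. now + 30 days) written with each item.
    timeToLiveAttribute: 'ttl',
    // OLD_IMAGE so a downstream Lambda still sees the full item after DynamoDB deletes it.
    stream: StreamViewType.OLD_IMAGE,
    removalPolicy: RemovalPolicy.DESTROY,
  });
}
```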

DynamoDB Streams invokes a Lambda, which writes the deleted item to S3. In the author's example, the DynamoDB items are JSON logs with a few properties; in your case the items may look different, but the basic concept stays the same. You can find the code in this GitHub repo.
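A rough idea of what such a Lambda could look like, assuming TypeScript with the AWS SDK v3 and a hypothetical `ARCHIVE_BUCKET` environment variable; the key layout and handler details are illustrative, not the author's actual code:

```typescript
import { DynamoDBStreamEvent } from 'aws-lambda';
import { S3Client, PutObjectCommand } from '@aws-sdk/client-s3';
import { unmarshall } from '@aws-sdk/util-dynamodb';
import { AttributeValue } from '@aws-sdk/client-dynamodb';

const s3 = new S3Client({});
const BUCKET = process.env.ARCHIVE_BUCKET!; // hypothetical env var set by the stack

export const handler = async (event: DynamoDBStreamEvent): Promise<void> => {
  // Only REMOVE records matter here; TTL deletions arrive with this event name.
  const removed = event.Records.filter((r) => r.eventName === 'REMOVE' && r.dynamodb?.OldImage);

  for (const record of removed) {
    // OldImage holds the item as it looked before deletion (requires the OLD_IMAGE stream view).
    const item = unmarshall(
      record.dynamodb!.OldImage as unknown as Record<string, AttributeValue>,
    );

    await s3.send(
      new PutObjectCommand({
        Bucket: BUCKET,
        // Key layout is illustrative; partitioning by date keeps later queries cheap.
        Key: `logs/${new Date().toISOString().slice(0, 10)}/${record.eventID}.json`,
        Body: JSON.stringify(item),
        ContentType: 'application/json',
      }),
    );
  }
};
```

Writing one S3 object per deleted item keeps the sketch simple; in practice you might aggregate a whole batch into a single object to cut down on PUT requests.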

What is pretty cool is that DynamoDB Streams provides a batching feature, so the Lambda can process the deleted items in batches. That reduces the number of Lambda invocations and therefore the cost. The default batching settings are not ideal for this use case, so the author used the AWS Console and the Lambda invocation metrics to tune them: batchSize is set to 10000 and maxBatchingWindow to its maximum, so the Lambda is really invoked only about every 5 minutes. Nice one!
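A hedged CDK sketch of that event source wiring, again assuming CDK v2 TypeScript; only batchSize and maxBatchingWindow come from the article, while the function name and retryAttempts are illustrative:

```typescript
import { Duration } from 'aws-cdk-lib';
import { IFunction, StartingPosition } from 'aws-cdk-lib/aws-lambda';
import { ITable } from 'aws-cdk-lib/aws-dynamodb';
import { DynamoEventSource } from 'aws-cdk-lib/aws-lambda-event-sources';

// Wire the archive Lambda to the table's stream with the largest allowed batches.
export function attachArchiveStream(table: ITable, archiveFn: IFunction): void {
  archiveFn.addEventSource(
    new DynamoEventSource(table, {
      startingPosition: StartingPosition.TRIM_HORIZON,
      // Up to 10,000 records per invocation, the maximum for DynamoDB streams.
      batchSize: 10000,
      // Wait up to 5 minutes before invoking, so the Lambda runs roughly that often.
      maxBatchingWindow: Duration.minutes(5),
      retryAttempts: 2,
    }),
  );
}
```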

[Read More]

Tags aws serverless learning nosql database