Apache Beam is an open source, unified programming model for defining and executing data processing pipelines, including ETL, batch and stream processing. We believe it has huge potential to improve traditional development patterns in both transactional and analytical data processing. Specifically, it can be applied to event-driven applications, data pipelines and streaming analytics.

Employing the dataflow programming model, Beam supports a wide range of I/O connectors, but we found gaps in the existing connectors, particularly for the Python SDK. This motivated us to start the Apache Beam Python I/O Connectors project.

We are happy to present the first release of the Apache Beam Python I/O connector for Amazon DynamoDB.

✨NEW

  • Add a composite transform (WriteToDynamoDB) that writes records to a DynamoDB table with the help of the batch_writer of the boto3 package (see the sketches after this list).
    • The batch writer automatically handles buffering and sending items in batches. It also handles any unprocessed items, resending them as needed.
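For context, here is a minimal sketch of the boto3 batch writer pattern the transform builds on; the table name and item fields are placeholders.

```python
import boto3

# Create a DynamoDB table resource; "my-table" is a placeholder table name.
dynamodb = boto3.resource("dynamodb")
table = dynamodb.Table("my-table")

# The batch writer buffers put/delete requests, flushes them in batches
# (up to the BatchWriteItem limit of 25 items), and resends any
# unprocessed items automatically.
with table.batch_writer() as writer:
    for i in range(100):
        writer.put_item(Item={"pk": str(i), "value": f"record-{i}"})
```

And a hypothetical end-to-end usage sketch of the new transform in a Beam pipeline; the import path and the table_name argument are assumptions for illustration and may differ from the released API.

```python
import apache_beam as beam

# Assumed import path for the connector; check the project's README for the
# actual module name and constructor options.
from dynamodb_pyio.io import WriteToDynamoDB

records = [{"pk": str(i), "value": f"record-{i}"} for i in range(100)]

with beam.Pipeline() as p:
    (
        p
        | "CreateRecords" >> beam.Create(records)
        # table_name is an assumed parameter name for the target table.
        | "WriteToDynamoDB" >> WriteToDynamoDB(table_name="my-table")
    )
```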