Jaehyeon Kim

Data Engineer. Data Streaming Enthusiast

Amazon Data Firehose is a fully managed service for delivering real-time streaming data to destinations such as Amazon Simple Storage Service (Amazon S3), Amazon Redshift, Amazon OpenSearch Service and Amazon OpenSearch Serverless. The Apache Beam Python I/O connector for Amazon Data Firehose (firehose_pyio) provides a data sink feature that facilitates integration with those services.

Apache Beam is an open-source, unified programming model for defining and executing data processing pipelines, including ETL, batch and stream processing. We believe it has great potential to improve traditional development patterns in both transactional and analytical data processing. Specifically, it can be applied to event-driven applications, data pipelines and streaming analytics.

Beam follows the dataflow programming paradigm and supports a range of I/O connectors, but we found gaps in the existing connectors, especially for the Python SDK. Those gaps motivated us to start the Apache Beam Python I/O Connectors project.