We are happy to present the first release of the Apache Beam Python I/O connector for Amazon Data Firehose.

✨NEW

  • Add a composite transform (WriteToFirehose) that puts records into a Firehose delivery stream in batches, using the put_record_batch method of the boto3 Firehose client (see the usage sketch after this list).
  • Provide options that control how individual records are handled.
    • jsonify - A flag that indicates whether to convert a record into JSON. Note that a record should be a bytes, bytearray, or file-like object; if it is not of a supported type (e.g. an integer), it can be converted into a JSON string by setting this flag to True.
    • multiline - A flag that indicates whether to append a newline character (\n) to each record. It is useful when saving records to a CSV or JSON Lines file.
  • Create a dedicated pipeline option (FirehoseOptions) that reads AWS-related values (e.g. aws_access_key_id) from pipeline arguments.
  • Implement metric objects that record the counts of total, succeeded, and failed elements.
  • Add unit and integration test cases. The moto and localstack-utils packages are used for unit and integration testing, respectively. In addition, a custom test client is created for testing the retry of failed elements, which the moto package does not support (a test sketch follows below).
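
Putting the pieces together, a pipeline that writes to Firehose might look like the sketch below. This is illustrative rather than definitive: the import path, the delivery_stream_name parameter, the AWS option names other than aws_access_key_id, and the metric names total/succeeded/failed are assumptions based on the feature list above.

```python
import apache_beam as beam
from apache_beam.metrics.metric import MetricsFilter

# Assumed import path for the connector's composite transform.
from firehose_pyio.io import WriteToFirehose

# AWS-related values are read from pipeline arguments by FirehoseOptions.
# Only aws_access_key_id is confirmed above; the other option names are assumptions.
pipeline_args = [
    "--aws_access_key_id=<access-key-id>",
    "--aws_secret_access_key=<secret-access-key>",
    "--region_name=us-east-1",
]

p = beam.Pipeline(argv=pipeline_args)
(
    p
    | "CreateRecords" >> beam.Create([{"id": i} for i in range(3)])
    | "WriteToFirehose"
    >> WriteToFirehose(
        delivery_stream_name="demo-delivery-stream",  # hypothetical stream name
        jsonify=True,    # dict records are not bytes-like, so serialize them to JSON
        multiline=True,  # append '\n' so the delivered objects are JSON Lines
    )
)
result = p.run()
result.wait_until_finish()

# Inspect the connector's counters after the run (metric names assumed).
for name in ("total", "succeeded", "failed"):
    for counter in result.metrics().query(MetricsFilter().with_name(name))["counters"]:
        print(counter.key.metric.name, counter.committed)
```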
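
On the testing side, moto can stand in for the Firehose service in unit tests. The sketch below exercises the boto3 calls the transform relies on; the mock_aws decorator is from moto 5.x (earlier versions used service-specific decorators such as mock_firehose), and the stream, bucket, and role names are illustrative.

```python
import boto3
from moto import mock_aws


@mock_aws
def test_put_record_batch():
    # The delivery stream needs a destination; back it with a mocked S3 bucket.
    s3 = boto3.client("s3", region_name="us-east-1")
    s3.create_bucket(Bucket="test-bucket")

    firehose = boto3.client("firehose", region_name="us-east-1")
    firehose.create_delivery_stream(
        DeliveryStreamName="test-delivery-stream",
        S3DestinationConfiguration={
            "RoleARN": "arn:aws:iam::123456789012:role/firehose-test-role",
            "BucketARN": "arn:aws:s3:::test-bucket",
        },
    )

    # put_record_batch is the call WriteToFirehose relies on under the hood.
    response = firehose.put_record_batch(
        DeliveryStreamName="test-delivery-stream",
        Records=[{"Data": b'{"id": 1}\n'}, {"Data": b'{"id": 2}\n'}],
    )
    assert response["FailedPutCount"] == 0
```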

See this post for more examples.