
Releases: apache/beam

Beam 2.67.0 release

01 Aug 06:31
Pre-release

We are happy to present the new 2.67.0 release of Beam.
This release includes both improvements and new functionality.
See the download page for this release.

For more information on changes in 2.67.0, check out the detailed release notes.

Highlights

  • [Python] Prism runner now enabled by default for most Python pipelines using the direct runner (#34612). This may break some tests, see #34612 for details on how to handle issues.

I/Os

  • Debezium IO upgraded to 3.1.1, which requires Java 17 (Java) (#34747).
  • Add support for streaming writes in IOBase (Python)
  • Implement support for streaming writes in FileBasedSink (Python)
  • Expose support for streaming writes in TextIO (Python)

New Features / Improvements

  • Added support for Processing time Timer in the Spark Classic runner (#33633).
  • Add pip-based install support for JupyterLab Sidepanel extension (#35397).
  • [IcebergIO] Create tables with specified table properties (#35496)
  • Add support for comma-separated options in Python SDK (Python) (#35580).
    Python SDK now supports comma-separated values for experiments and dataflow_service_options,
    matching Java SDK behavior while maintaining backward compatibility.
  • Milvus enrichment handler added (Python) (#35216).
    Beam now supports Milvus enrichment handler capabilities for vector, keyword,
    and hybrid search operations.
  • [Beam SQL] Add support for DATABASEs, with an implementation for Iceberg (#35637)
  • Respect BatchSize and MaxBufferingDuration when using JdbcIO.WriteWithResults. Previously, these settings were ignored (#35669).
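The comma-separated option handling noted above for experiments and dataflow_service_options can be pictured with a small standard-library sketch. This is not the SDK's implementation: the `split_repeated` and `parse_experiments` helpers and the `exp_a`/`exp_b` values are hypothetical, and the sketch only shows how a comma-separated value and repeated flags could resolve to the same list.

```python
import argparse


def split_repeated(values):
    """Flatten repeated flag values, splitting each one on commas (hypothetical helper)."""
    flat = []
    for value in values or []:
        flat.extend(part.strip() for part in value.split(",") if part.strip())
    return flat


def parse_experiments(argv):
    """Parse --experiments from argv; illustrative only, Beam's option classes are not used."""
    parser = argparse.ArgumentParser(allow_abbrev=False)
    # action="append" keeps the older style (one flag per value) working.
    parser.add_argument("--experiments", action="append", default=None)
    known, _ = parser.parse_known_args(argv)
    return split_repeated(known.experiments)


if __name__ == "__main__":
    print(parse_experiments(["--experiments=exp_a,exp_b"]))
    print(parse_experiments(["--experiments=exp_a", "--experiments=exp_b"]))
```

Under these assumptions both invocations yield the same list, which is the backward-compatible behavior the note describes.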

Breaking Changes

  • [Python] Prism runner now enabled by default for most Python pipelines using the direct runner (#34612). This may break some tests, see #34612 for details on how to handle issues.
  • Go: The pubsubio.Read transform now accepts ReadOptions as a value type instead of a pointer, and requires exactly one of Topic or Subscription to be set (they are mutually exclusive). Additionally, the ReadOptions struct now includes a Topic field for specifying the topic directly, replacing the previous topic parameter in the Read function signature (#35369).
  • SQL: The ParquetTable external table provider has changed its handling of the LOCATION property. To read from a directory, the path must now end with a trailing slash (e.g., LOCATION '/path/to/data/'). Previously, a trailing slash was not required. This change was made to enable support for glob patterns and single-file paths (#35582).
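The LOCATION rule for ParquetTable can be summarized with a short sketch. The `classify_location` function is hypothetical and is not Beam code; in particular, treating `*`, `?`, and `[` as the glob characters is an assumption made for illustration.

```python
def classify_location(location: str) -> str:
    """Classify a LOCATION value as 'glob', 'directory', or 'file' (hypothetical helper)."""
    # Assumption: these characters mark a glob pattern.
    if any(ch in location for ch in "*?["):
        return "glob"
    # Per the release note, a directory must now end with a trailing slash.
    if location.endswith("/"):
        return "directory"
    # Anything else is read as a single-file path.
    return "file"


if __name__ == "__main__":
    for loc in ("/path/to/data/", "/path/to/data", "/path/to/data/*.parquet"):
        print(loc, "->", classify_location(loc))
```

Under this reading, a path such as '/path/to/data' that used to be read as a directory would now be taken as a single file, which is why existing table definitions may need the trailing slash added.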

Bugfixes

  • [YAML] Fixed handling of missing optional fields in JSON parsing (#35179).
  • [Python] Fixed WriteToBigQuery transform using CopyJob not working with the WRITE_TRUNCATE write disposition (#34247).
  • [Python] Fixed dicomio tags mismatch in integration tests (#30760).
  • [Java] Fixed spammy logging issues that affected versions 2.64.0 to 2.66.0.

Known Issues

  • YAML Flatten incorrectly drops fields when the input PCollections' schemas differ (#35666). This issue exists in all versions since 2.52.0.

List of Contributors

According to git shortlog, the following people contributed to the 2.67.0 release. Thank you to all contributors!

Aditya Shukla, Ahmed Abualsaud, Arun Pandian, Boris Li, Chamikara Jayalath, Charles Nguyen, Chenzo, Danny McCormick, David Adeniji, Derrick Williams, Dmytro Tsyliuryk, Dustin Rhodes, Enrique Calderon, Gottipati Gautam, Hai Joey Tran, Hunor Portik, Jack McCluskey, Kenneth Knowles, Khorbaladze A., Marcio Sugar, Minh Son Nguyen, Mohamed Awnallah, Nathaniel Young, Nhon Dinh, Quentin Sommer, Rafael Raposo, Rakesh Kumar, Razvan Culea, Reuven Lax, Robert Bradshaw, Sam Whittle, Shunping Huang, Steven van Rossum, Talat UYARER, Tanu Sharma, Tarun Annapareddy, Tobi Kaymak, Tobias Kaymak, Valentyn Tymofieiev, Veronica Wasson, Vitaly Terentyev, XQ Hu, Yi Hu, akashorabek, arnavarora2004, changliiu, claudevdm, fozzie15, mvhensbergen, twosom

Beam 2.66.0 release

18 Jun 19:05

We are happy to present the new 2.66.0 release of Beam.
This release includes both improvements and new functionality.

For more information on changes in 2.66.0, check out the detailed release notes.

Beam 3.0.0 Development Highlights

  • [Java] Java 8 support is now deprecated. It remains supported until Beam 3.
    From now on, pipelines submitted by a Java 8 client use the Java 11 SDK container for
    remote pipeline execution (#35064).

Highlights

  • [Python] Several quality-of-life improvements to the vLLM model handler. If you use Beam RunInference with vLLM model handlers, we strongly recommend updating past this release.

I/Os

  • [IcebergIO] Now available with Beam SQL! (#34799)
  • [IcebergIO] Support reading with column pruning (#34856)
  • [IcebergIO] Support reading with pushdown filtering (#34827)
  • [IcebergIO] Create tables with a specified partition spec (#34966, #35268)
  • [IcebergIO] Dynamically create namespaces if needed (#35228)

New Features / Improvements

  • [Beam SQL] Introducing Beam Catalogs (#35223)
  • Added support for the Google Cloud Storage Requester Pays feature (Golang) (#30747).
  • [Python] Prism runner now auto-enabled for some Python pipelines using the direct runner (#34921).
  • [YAML] WriteToTFRecord and ReadFromTFRecord Beam YAML support
  • Python: Added JupyterLab 4.x extension compatibility for enhanced notebook integration (#34495).

Breaking Changes

  • Yapf version upgraded to 0.43.0 for formatting (Python) (#34801).
  • Python: Added JupyterLab 4.x extension compatibility for enhanced notebook integration (#34495).
  • Python: Argument abbreviation is no longer enabled within Beam. If you previously abbreviated arguments (e.g. --r for --runner), you will now need to specify the whole argument (#34934).
  • Java: Users of the ReadFromKafkaViaSDF transform might encounter pipeline graph compatibility issues when updating the pipeline. To mitigate, set the updateCompatibilityVersion option to the SDK version used for the original pipeline, for example --updateCompatibilityVersion=2.64.0.
  • Python: Updated the AlloyDBVectorWriterConfig API to align with the new PostgresVectorWriter transform. Here's a quick guide to update your code (#35225).
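The argument-abbreviation change listed above can be reproduced with the standard library's argparse, whose `allow_abbrev` setting controls prefix matching of long options. The sketch below is a standalone illustration, not Beam's option parser; the `build_parser` and `parse` helpers are hypothetical.

```python
import argparse


def build_parser(allow_abbrev: bool) -> argparse.ArgumentParser:
    """Build a tiny parser with a single --runner option (illustrative only)."""
    parser = argparse.ArgumentParser(allow_abbrev=allow_abbrev)
    parser.add_argument("--runner", default=None)
    return parser


def parse(argv, allow_abbrev):
    """Return (runner, unrecognized_args) for the given argv."""
    known, unknown = build_parser(allow_abbrev).parse_known_args(argv)
    return known.runner, unknown


if __name__ == "__main__":
    # With abbreviation enabled, --r is accepted as a prefix of --runner.
    print(parse(["--r=DirectRunner"], allow_abbrev=True))
    # With abbreviation disabled, --r is no longer matched and is left unparsed.
    print(parse(["--r=DirectRunner"], allow_abbrev=False))
```

In this sketch the abbreviated flag ends up in the list of unrecognized arguments when abbreviation is off, so a command line that relied on `--r` would need to spell out `--runner`.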

Bugfixes

  • (Java) Fixed CassandraIO ReadAll not letting a pipeline handle or retry exceptions (#34191).
  • [Python] Fixed vLLM model handlers breaking Beam logging (#35053).
  • [Python] Fixed vLLM connection leaks that caused a throughput bottleneck and underutilization of GPU (#35053).
  • [Python] Fixed vLLM server recovery mechanism in the event of a process termination (#35234).
  • (Python) Fixed cloudpickle overwriting class state each time the same object of a dynamic class is loaded (#35062).
  • [Python] Fixed pip install apache-beam[interactive] causing a crash on Google Colab (#35148).
  • [IcebergIO] Fixed Beam <-> Iceberg conversion logic for arrays of structs and maps of structs (#35230).

Known Issues

N/A

List of Contributors

According to git shortlog, the following people contributed to the 2.66.0 release. Thank you to all contributors!

Aditya Yadav, Adrian Stoll, Ahmed Abualsaud, Bhargavkonidena, Chamikara Jayalath, Charles Nguyen, Chenzo, Damon, Danny McCormick, Derrick Williams, Enrique Calderon, Hai Joey Tran, Jack McCluskey, Kenneth Knowles, Leonardo Cesar Borges, Michael Gruschke, Minbo Bae, Minh Son Nguyen, Niel Markwick, Radosław Stankiewicz, Rakesh Kumar, Robert Bradshaw, S. Veyrié, Sam Whittle, Shubham Jaiswal, Shunping Huang, Steven van Rossum, Tanu Sharma, Vardhan Thigle, Vitaly Terentyev, XQ Hu, Yi Hu, akashorabek, atask-g, atognolag, bullet03, changliiu, claudevdm, fozzie15, ikarapanca, kristynsmith, Pablo Rodriguez Defino, tvalentyn, twosom, wollowizard

Beam 2.65.0 Release

12 May 19:13

We are happy to present the new 2.65.0 release of Beam.
This release includes both improvements and new functionality.
See the download page for this release.

For more information on changes in 2.65.0, check out the detailed release notes.

Highlights

I/Os

  • Upgraded GoogleAdsAPI to v19 for GoogleAdsIO (Java) (#34497). Changed PTransform method from version-specified (v17()) to current() for better backward compatibility in the future.
  • Added support for writing to Pubsub with ordering keys (Java) (#21162)

New Features / Improvements

  • Added support for streaming side-inputs in the Spark Classic runner (#18136).

Breaking Changes

  • [Python] Cloudpickle is now the default pickle_library; previously
    dill was the default (#34695).
    For known issues, reporting new issues, and understanding cloudpickle
    behavior refer to #34903.
  • [Python] Reshuffle now preserves PaneInfo, where previously PaneInfo was lost
    after reshuffle. To opt out of this change, set the
    update_compatibility_version to a previous Beam version e.g. "2.64.0".
    (#34348).
  • [Python] PaneInfo is encoded by PaneInfoCoder, where previously PaneInfo was
    encoded with FastPrimitivesCoder falling back to PickleCoder. This only
    affects cases where PaneInfo is directly stored as an element.
    (#34824).
  • [Python] BigQueryFileLoads now adds a Reshuffle before triggering load jobs.
    This fixes a bug where there can be data loss in a streaming pipeline if there
    is a pending load job during autoscaling. To opt out of this change, set the
    update_compatibility_version to a previous Beam version e.g. "2.64.0".
    (#34657)
  • [YAML] Kafka source and sink will be automatically replaced with compatible managed transforms.
    For older Beam versions, streaming update compatibility can be maintained by specifying the pipeline
    option update_compatibility_version (#34767).
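Several of the changes above mention opting out by setting update_compatibility_version to a previous Beam version. A minimal sketch of assembling that flag is shown below; the `with_compat_version` helper is hypothetical, and the sketch stays in the standard library so it runs without Beam installed. The resulting list is the kind of argument list one would hand to the Python SDK's pipeline options.

```python
def with_compat_version(argv, version="2.64.0"):
    """Return a copy of argv that pins update_compatibility_version (hypothetical helper)."""
    flag = "--update_compatibility_version"
    # Respect an explicit choice that is already on the command line.
    if any(arg == flag or arg.startswith(flag + "=") for arg in argv):
        return list(argv)
    return list(argv) + [f"{flag}={version}"]


if __name__ == "__main__":
    # Example: opt out of the 2.65.0 Reshuffle and BigQueryFileLoads changes.
    print(with_compat_version(["--streaming"]))
```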

Deprecations

  • Beam ZetaSQL is deprecated and will be removed no earlier than Beam 2.68.0 (#34423).
    Users are recommended to switch to Calcite SQL dialect.

Bugfixes

  • Fixed Beam rows read from cross-language transforms (for example, ReadFromJdbc) in which negative 32-bit integers were incorrectly decoded as large integers (#34089).
  • (Java) Fixed SDF-based KafkaIO (ReadFromKafkaViaSDF) to properly handle custom deserializers that extend the Deserializer interface (#34505).
  • [Python] TypedDict typehints are now compatible with Mapping and Dict type annotations.
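The negative-integer bugfix above belongs to a familiar class of errors: the same 32 bits read once as signed and once as unsigned. The sketch below uses fixed-width big-endian packing purely for illustration; it is an assumption that this resembles the failure mode, and it does not reproduce Beam's actual row coder or wire format.

```python
import struct


def decode_as_signed(value: int) -> int:
    """Round-trip a 32-bit integer, decoding it with the correct signed format."""
    raw = struct.pack(">i", value)      # encode as signed 32-bit, big-endian
    return struct.unpack(">i", raw)[0]  # decode as signed: value is preserved


def decode_as_unsigned(value: int) -> int:
    """Round-trip a 32-bit integer, but (wrongly) decode it as unsigned."""
    raw = struct.pack(">i", value)      # encode as signed 32-bit, big-endian
    return struct.unpack(">I", raw)[0]  # decode as unsigned: negatives become large


if __name__ == "__main__":
    for n in (5, -1, -2):
        print(n, decode_as_signed(n), decode_as_unsigned(n))
```

Non-negative values survive either path, which is why a bug of this kind can go unnoticed until negative values appear in the data.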

Security Fixes

Known Issues

N/A

List of Contributors

According to git shortlog, the following people contributed to the 2.65.0 release. Thank you to all contributors!

Aaron Trelstad, Adrian Stoll, Ahmed Abualsaud, akashorabek, Arun Pandian, Bentsi Leviav, Bryan Dang, Celeste Zeng, Chamikara Jayalath, claudevdm, Danny McCormick, Derrick Williams, Ozzie Fernandez, Gabija Balvociute, Gayatri Kate, illoise, Jack McCluskey, Jan Lukavský, Jinho Lee, Justin Bandoro, Kenneth Knowles, XQ Hu, Luke Tsekouras, Martin Trieu, Matthew Suozzo, Naireen Hussain, Niel Markwick, Radosław Stankiewicz, Razvan Culea, Robert Bradshaw, Robert Burke, RuiLong J., Sam Whittle, Sarthak, Shubham Jaiswal, Shunping Huang, Steven van Rossum, Suvrat Acharya, [email protected], Talat Uyarer, TanuSharma2511, Tobias Kaymak, Tom Stepp, Valentyn Tymofieiev, twosom, Vitaly Terentyev, wollowizard, Yi Hu, Yifan Ye, Zilin Du

Beam 2.64.0 release

31 Mar 14:35

We are happy to present the new 2.64.0 release of Beam.
This release includes both improvements and new functionality.
See the download page for this release.

For more information on changes in 2.64.0, check out the detailed release notes.

Highlights

I/Os

  • [Java] Use API compatible with both com.google.cloud.bigdataoss:util 2.x and 3.x in BatchLoads (#34105)
  • [IcebergIO] Added new CDC source for batch and streaming, available as Managed.ICEBERG_CDC (#33504)
  • [IcebergIO] Address edge case where bundle retry following a successful data commit results in data duplication (#34264)

New Features / Improvements

  • [Python] Support custom coders in Reshuffle (#29908, #33356).
  • [Java] Upgrade SLF4J to 2.0.16. Update default Spark version to 3.5.0. (#33574)
  • [Java] Support for --add-modules JVM option is added through a new pipeline option JdkAddRootModules. This allows extending the module graph with optional modules such as SDK incubator modules. Sample usage: <pipeline invocation> --jdkAddRootModules=jdk.incubator.vector (#30281).
  • Managed API for Java and Python supports key I/O connectors Iceberg, Kafka, and BigQuery.
  • Prism now supports event time triggers for most common cases. (#31438)
    • Prism does not yet support triggered side inputs, or triggers on merging windows (such as session windows).

Breaking Changes

  • [Python] Reshuffle now correctly respects user-specified type hints, fixing a previous bug where it might use FastPrimitivesCoder wrongly. This change could break pipelines with incorrect type hints in Reshuffle. If you have issues after upgrading, temporarily set update_compatibility_version to a previous Beam version to use the old behavior. The recommended solution is to fix the type hints in your code. (#33932)
  • [Java] SparkReceiver 2 has been moved to SparkReceiver 3 that supports Spark 3.x. (#33574)
  • [Python] Correct parsing of collections.abc.Sequence type hints was added, which can lead to pipelines failing type hint checks that were previously passing erroneously. These issues will most commonly be seen when trying to consume a PCollection with a Sequence type hint after a GroupByKey or a CoGroupByKey (#33999).

Bugfixes

  • (Python) Fixed occasional pipeline hangs that affected Python 3.11 users (#33966).
  • (Java) Fixed TIME field encodings for BigQuery Storage API writes on GenericRecords (#34059).
  • (Java) Fixed a race condition in JdbcIO which could cause hangs trying to acquire a connection (#34058).
  • (Java) Fix BigQuery Storage Write compatibility with Avro 1.8 (#34281).
  • Fixed checkpoint recovery and streaming behavior in Spark Classic and Portable runner's Flatten transform by replacing queueStream with SingleEmitInputDStream (#34080, #18144, #20426)
  • (Java) Fixed Read caching of UnboundedReader objects to effectively cache across multiple DoFns and avoid checkpointing an unstarted reader (#34146, #33901).

List of Contributors

According to git shortlog, the following people contributed to the 2.64.0 release. Thank you to all contributors!

Ahmed Abualsaud
akashorabek
Arun Pandian
Bentsi Leviav
Chamikara Jayalath
Charles Nguyen
Claire McGinty
claudevdm
Damon
Danny McCormick
darshan-sj
Derrick Williams
fozzie15
Hai Joey Tran
Jack McCluskey
Jozef Vilcek
jrmccluskey
Kenneth Knowles
Liam Miller-Cushon
liferoad
Luv Agarwal
martin trieu
Matar
Matthew Suozzo
Michel Davit
Minbo Bae
Mohamed Awnallah
Naireen Hussain
Pablo Rodriguez Defino
Radosław Stankiewicz
Rakesh Kumar
Reuven Lax
Robert Bradshaw
Robert Burke
Rohit
Rohit Sinha
Sam Whittle
Saumil Patel
Shunping Huang
So-shi Nakachi
Steven van Rossum
Suvrat Acharya
Svetak Sundhar
synenka
Talat UYARER
tvalentyn
twosom
utkarshparekh
Vitaly Terentyev
XQ Hu
Yi Hu
Zilin Du

Beam 2.63.0 release

18 Feb 15:31

We are happy to present the new 2.63.0 release of Beam.
This release includes both improvements and new functionality.
See the download page for this release.

For more information on changes in 2.63.0, check out the detailed release notes.

I/Os

  • Support gcs-connector 3.x+ in GcsUtil (#33368)
  • Introduced --groupFilesFileLoad pipeline option to mitigate side-input related issues in BigQueryIO
    batch FILE_LOAD on certain runners (including Dataflow Runner V2) (Java) (#33587).

New Features / Improvements

  • Add BigQuery vector/embedding ingestion and enrichment components to apache_beam.ml.rag (Python) (#33413).
  • Upgraded to protobuf 4 (Java) (#33192).
  • [GCSIO] Added retry logic to each batch method of the GCS IO (Python) (#33539)
  • [GCSIO] Enable recursive deletion for GCSFileSystem Paths (Python) (#33611).
  • External, Process based Worker Pool support added to the Go SDK container. (#33572)
  • Support the Process Environment for execution in the Go SDK. (#33651)
  • Prism
    • Prism now uses the same single port for both pipeline submission and execution on workers. Requests are differentiated by worker-id. (#33438)
      • This avoids port starvation and provides clarity on port use when running Prism in non-local environments.
    • Support for @RequiresTimeSortedInputs added. (#33513)
    • Initial support for AllowedLateness added. (#33542)
    • The Go SDK's inprocess Prism runner (AKA the Go SDK default runner) now supports non-loopback mode environment types. (#33572)
    • Support the Process Environment for execution in Prism (#33651)
    • Support the AnyOf Environment for execution in Prism (#33705)
      • This improves support for developing Xlang pipelines, when using a compatible cross language service.
  • Partitions are now configurable for the DaskRunner in the Python SDK (#33805).
  • [Dataflow Streaming] Enable Windmill GetWork Response Batching by default (#33847).
    • With this change user workers will request batched GetWork responses from backend and backend will send multiple WorkItems in the same response proto.
    • The feature can be disabled by passing --windmillRequestBatchedGetWorkResponse=false

Breaking Changes

  • AWS V1 I/Os have been removed (Java). As part of this, x-lang Python Kinesis I/O has been updated to consume the V2 IO and it also no longer supports setting producer_properties (#33430).
  • Upgraded to protobuf 4 (Java) (#33192), but forced Debezium IO to use protobuf 3 (#33541) because Debezium clients are not protobuf 4 compatible. This may cause conflicts when using clients which are only compatible with protobuf 4.
  • Minimum Go version for Beam Go updated to 1.22.10 (#33609)

Bugfixes

  • Fix data loss issues when reading gzipped files with TextIO (Python) (#18390, #31040).
  • [BigQueryIO] Fixed an issue where Storage Write API sometimes doesn't pick up auto-schema updates (#33231)
  • Prism
    • Fixed an edge case where Bundle Finalization might not become enabled (#33493).
    • Fixed session window aggregation, which wasn't being performed per-key (#33542).
  • [Dataflow Streaming Appliance] Fixed commits failing with KeyCommitTooLargeException when a key outputs >180MB of results (#33588).
  • Fixed a Dataflow template creation issue that ignores template file creation errors (Java) (#33636)
  • Correctly documented Pane Encodings in the portability protocols (#33840).
  • Fixed the user mailing list address (#26013).
  • [Dataflow Streaming] Fixed an issue where Dataflow Streaming workers were reporting lineage metrics as cumulative rather than delta. (#33691)

List of Contributors

According to git shortlog, the following people contributed to the 2.63.0 release. Thank you to all contributors!

Ahmed Abualsaud,
Alex Merose,
Andrej Galad,
Andrew Crites,
Arun Pandian,
Bartosz Zablocki,
Chamikara Jayalath,
Claire McGinty,
Clay Johnson,
Damon Douglas,
Danish Amjad,
Danny McCormick,
Deep1998,
Derrick Williams,
Dmitry Labutin,
Dmytro Sadovnychyi,
Eduardo Ramírez,
Filipe Regadas,
Hai Joey Tran,
Jack McCluskey,
Jan Lukavský,
Jeff Kinard,
Jozef Vilcek,
Julien Tournay,
Kenneth Knowles,
Michel Davit,
Miguel Trigueira,
Minbo Bae,
Mohamed Awnallah,
Mohit Paddhariya,
Nahian-Al Hasan,
Naireen Hussain,
Niall Pemberton,
Radosław Stankiewicz,
Razvan Culea,
Robert Bradshaw,
Robert Burke,
Rohit Sinha,
S. Veyrié,
Sam Whittle,
Sergei Lilichenko,
Shingo Furuyama,
Shunping Huang,
Thiago Nunes,
Tim Heckman,
Tobias Bredow,
Tom Stepp,
Tony Tang,
VISHESH TRIPATHI,
Vitaly Terentyev,
Yi Hu,
XQ Hu,
akashorabek,
claudevdm

Beam 2.62.0 release

13 Jan 15:35

We are happy to present the new 2.62.0 release of Beam.
This release includes both improvements and new functionality.
See the download page for this release.

For more information on changes in 2.62.0, check out the detailed release notes.

New Features / Improvements

  • Added support for stateful processing in Spark Runner for streaming pipelines. Timer functionality is not yet supported and will be implemented in a future release (#33237).
  • The datetime module is now available for use in jinja templatization for yaml.
  • Improved batch performance of SparkRunner's GroupByKey (#20943).
  • Support OnWindowExpiration in Prism (#32211).
    • This enables initial Java GroupIntoBatches support.
  • Support OrderedListState in Prism (#32929).

I/Os

  • gcs-connector config options can be set via GcsOptions (Java) (#32769).
  • [Managed Iceberg] Support partitioning by time (year, month, day, hour) for types date, time, timestamp, and timestamp(tz) (#32939)
  • Upgraded the default version of Hadoop dependencies to 3.4.1. Hadoop 2.10.2 is still supported (Java) (#33011).
  • [BigQueryIO] Create managed BigLake tables dynamically (#33125)

Breaking Changes

  • Upgraded ZetaSQL to 2024.11.1 (#32902). Java11+ is now needed if Beam's ZetaSQL component is used.

Bugfixes

  • Fixed EventTimeTimer ordering in Prism (#32222).
  • [Managed Iceberg] Fixed a bug where DataFile metadata was assigned incorrect partition values (#33549).

Security Fixes

For the most up to date list of known issues, see https://github.com/apache/beam/blob/master/CHANGES.md

List of Contributors

According to git shortlog, the following people contributed to the 2.62.0 release. Thank you to all contributors!

Ahmed Abualsaud, Ahmet Altay, Alex Merose, Andrew Crites, Arnout Engelen, Attila Doroszlai, Bartosz Zablocki, Chamikara Jayalath, Claire McGinty, Claude van der Merwe, Damon Douglas, Danny McCormick, Gabija Balvociute, Hai Joey Tran, Hakampreet Singh Pandher, Ian Sullivan, Jack McCluskey, Jan Lukavský, Jeff Kinard, Jeffrey Kinard, Laura Detmer, Kenneth Knowles, Martin Trieu, Mattie Fu, Michel Davit, Naireen Hussain, Nick Anikin, Radosław Stankiewicz, Ravi Magham, Reeba Qureshi, Robert Bradshaw, Robert Burke, Rohit Sinha, S. Veyrié, Sam Whittle, Shingo Furuyama, Shunping Huang, Svetak Sundhar, Valentyn Tymofieiev, Vlado Djerek, XQ Hu, Yi Hu, twosom

Beam 2.61.0 release

14 Nov 14:46

We are happy to present the new 2.61.0 release of Beam.
This release includes both improvements and new functionality.

For more information on changes in 2.61.0, check out the detailed release notes.

Highlights

  • [Python] Introduce Managed Transforms API (#31495)
  • Flink 1.19 support added (#32648)

I/Os

  • [Managed Iceberg] Support creating tables if needed (#32686)
  • [Managed Iceberg] Now available in Python SDK (#31495)
  • [Managed Iceberg] Add support for TIMESTAMP, TIME, and DATE types (#32688)
  • BigQuery CDC writes are now available in Python SDK, only supported when using StorageWrite API at least once mode (#32527)
  • [Managed Iceberg] Allow updating table partition specs during pipeline runtime (#32879)
  • Added BigQueryIO as a Managed IO (#31486)
  • Support for writing to Solace message queues (SolaceIO.Write) added (Java) (#31905).

New Features / Improvements

  • Added support for read with metadata in MqttIO (Java) (#32195)
  • Added support for processing events which use a global sequence to the "ordered" extension (Java) (#32540)
  • Added new meta-transforms FlattenWith and Tee that allow one to introduce branching
    without breaking the linear/chaining style of pipeline construction.
  • Use Prism as a fallback to the Python Portable runner when running a pipeline with the Python Direct runner (#32876)

Deprecations

  • Removed support for Flink 1.15 and 1.16
  • Removed support for Python 3.8

Bugfixes

  • (Java) Fixed tearDown not invoked when DoFn throws on Portable Runners (#18592, #31381).
  • (Java) Fixed protobuf error with MapState.remove() in Dataflow Streaming Java Legacy Runner without Streaming Engine (#32892).
  • Added a flag to support conditionally disabling auto-commit in JdbcIO ReadFn (#31111)

Known Issues

N/A

For the most up to date list of known issues, see https://github.com/apache/beam/blob/master/CHANGES.md

List of Contributors

According to git shortlog, the following people contributed to the 2.61.0 release. Thank you to all contributors!

Ahmed Abualsaud, Ahmet Altay, Arun Pandian, Ayush Pandey, Chamikara Jayalath, Chris Ashcraft, Christoph Grotz, DKPHUONG, Damon, Danny Mccormick, Dmitry Ulyumdzhiev, Ferran Fernández Garrido, Hai Joey Tran, Hyeonho Kim, Idan Attias, Israel Herraiz, Jack McCluskey, Jan Lukavský, Jeff Kinard, Jeremy Edwards, Joey Tran, Kenneth Knowles, Maciej Szwaja, Manit Gupta, Mattie Fu, Michel Davit, Minbo Bae, Mohamed Awnallah, Naireen Hussain, Rebecca Szper, Reeba Qureshi, Reuven Lax, Robert Bradshaw, Robert Burke, S. Veyrié, Sam Whittle, Sergei Lilichenko, Shunping Huang, Steven van Rossum, Tan Le, Thiago Nunes, Vitaly Terentyev, Vlado Djerek, Yi Hu, claudevdm, fozzie15, johnjcasey, kushmiD, liferoad, martin trieu, pablo rodriguez defino, razvanculea, s21lee, tvalentyn, twosom

Beam 2.60.0 release

16 Oct 17:47

We are happy to present the new 2.60.0 release of Beam.
This release includes both improvements and new functionality.

For more information on changes in 2.60.0, check out the detailed release notes.

Highlights

  • Added support for using vLLM in the RunInference transform (Python) (#32528)
  • [Managed Iceberg] Added support for streaming writes (#32451)
  • [Managed Iceberg] Added auto-sharding for streaming writes (#32612)
  • [Managed Iceberg] Added support for writing to dynamic destinations (#32565)

New Features / Improvements

  • Dataflow worker can install packages from Google Artifact Registry Python repositories (Python) (#32123).
  • Added support for Zstd codec in SerializableAvroCodecFactory (Java) (#32349)
  • Added support for using vLLM in the RunInference transform (Python) (#32528)
  • Prism release binaries and container bootloaders are now being built with the latest Go 1.23 patch. (#32575)
  • Prism
    • Prism now supports Bundle Finalization. (#32425)
  • Significantly improved performance of Kafka IO reads that enable commitOffsetsInFinalize by removing the data reshuffle from SDF implementation. (#31682).
  • Added support for dynamic writing in MqttIO (Java) (#19376)
  • Optimized Spark Runner parDo transform evaluator (Java) (#32537)
  • [Managed Iceberg] More efficient manifest file writes/commits (#32666)

Breaking Changes

  • In Python, assert_that now throws if it is not in a pipeline context instead of silently succeeding (#30771)
  • In Python and YAML, ReadFromJson now overrides the default dtype from None to
    an explicit False. Most notably, string values like "123" are preserved
    as strings rather than silently coerced (and possibly truncated) to numeric
    values. To retain the old behavior, pass dtype=True (or any other value
    accepted by pandas.read_json).
  • Users of the KafkaIO Read transform that enable commitOffsetsInFinalize might encounter pipeline graph compatibility issues when updating the pipeline. To mitigate, set the updateCompatibilityVersion option to the SDK version used for the original pipeline, for example --updateCompatibilityVersion=2.58.1.
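The ReadFromJson change above is easiest to appreciate by seeing what silent coercion of a numeric-looking string can lose. The `coerce_like_inference` function below is a hypothetical stand-in for type inference and is not the algorithm pandas.read_json uses; it only illustrates why keeping "123"-style values as strings is the safer default.

```python
def coerce_like_inference(value: str):
    """Turn numeric-looking strings into numbers (hypothetical stand-in for dtype inference)."""
    try:
        return int(value)
    except ValueError:
        pass
    try:
        return float(value)
    except ValueError:
        return value  # not numeric: keep the original string


if __name__ == "__main__":
    original = "00123"  # e.g. an identifier that only looks like a number
    coerced = coerce_like_inference(original)
    # The leading zeros cannot be recovered once the value is a number.
    print(repr(original), "->", repr(coerced), "->", repr(str(coerced)))
```

With the new default the string is passed through untouched; per the note above, passing dtype=True restores the earlier inferring behavior.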

Deprecations

  • Python 3.8 is reaching EOL and support is being removed in Beam 2.61.0. The 2.60.0 release will warn users
    when running on 3.8. (#31192)

Bugfixes

  • (Java) Fixed custom delimiter issues in TextIO (#32249, #32251).
  • (Java, Python, Go) Fixed PeriodicSequence backlog bytes reporting, which was preventing Dataflow Runner autoscaling from functioning properly (#32506).
  • (Java) Fix improper decoding of rows with schemas containing nullable fields when encoded with a schema with equal encoding positions but modified field order. (#32388).

Known Issues

N/A

For the most up to date list of known issues, see https://github.com/apache/beam/blob/master/CHANGES.md

List of Contributors

According to git shortlog, the following people contributed to the 2.60.0 release. Thank you to all contributors!

Ahmed Abualsaud, Aiden Grossman, Arun Pandian, Bartosz Zablocki, Chamikara Jayalath, Claire McGinty, DKPHUONG, Damon Douglass, Danny McCormick, Dip Patel, Ferran Fernández Garrido, Hai Joey Tran, Hyeonho Kim, Igor Bernstein, Israel Herraiz, Jack McCluskey, Jaehyeon Kim, Jeff Kinard, Jeffrey Kinard, Joey Tran, Kenneth Knowles, Kirill Berezin, Michel Davit, Minbo Bae, Naireen Hussain, Niel Markwick, Nito Buendia, Reeba Qureshi, Reuven Lax, Robert Bradshaw, Robert Burke, Rohit Sinha, Ryan Fu, Sam Whittle, Shunping Huang, Svetak Sundhar, Udaya Chathuranga, Vitaly Terentyev, Vlado Djerek, Yi Hu, Claude van der Merwe, XQ Hu, Martin Trieu, Valentyn Tymofieiev, twosom

Beam 2.59.0 release

24 Aug 17:16

We are happy to present the new 2.59.0 release of Beam.
This release includes both improvements and new functionality.
See the download page for this release.

For more information on changes in 2.59.0, check out the detailed release notes.

Highlights

  • Added support for setting a configurable timeout when loading a model and performing inference in the RunInference transform using with_exception_handling (#32137)
  • Initial experimental support for using Prism with the Java and Python SDKs
    • Prism is presently targeting local testing usage, or other small scale execution.
    • For Java, use 'PrismRunner', or 'TestPrismRunner' as an argument to the --runner flag.
    • For Python, use 'PrismRunner' as an argument to the --runner flag.
    • Go already uses Prism as the default local runner.

I/Os

  • Improvements to the performance of BigQueryIO when using the withPropagateSuccessfulStorageApiWrites(true) method (Java) (#31840).
  • [Managed Iceberg] Added support for writing to partitioned tables (#32102)
  • Update ClickHouseIO to use the latest version of the ClickHouse JDBC driver (#32228).
  • Add ClickHouseIO dedicated User-Agent (#32252).

New Features / Improvements

  • BigQuery endpoint can be overridden via PipelineOptions, this enables BigQuery emulators (Java) (#28149).
  • Go SDK Minimum Go Version updated to 1.21 (#32092).
  • [BigQueryIO] Added support for withFormatRecordOnFailureFunction() for STORAGE_WRITE_API and STORAGE_API_AT_LEAST_ONCE methods (Java) (#31354).
  • Updated Go protobuf package to new version (Go) (#21515).
  • Added support for setting a configurable timeout when loading a model and performing inference in the RunInference transform using with_exception_handling (#32137)
  • Adds OrderedListState support for Java SDK via FnApi.
  • Initial support for using Prism from the Python and Java SDKs.

Bugfixes

  • Fixed incorrect service account impersonation flow for Python pipelines using BigQuery IOs (#32030).
  • Auto-disable broken and meaningless upload_graph feature when using Dataflow Runner V2 (#32159).
  • (Python) Upgraded google-cloud-storage to version 2.18.2 to fix a data corruption issue (#32135).
  • (Go) Fixed corruption on State API writes (#32245).

Known Issues

  • Prism is under active development and does not yet support all pipelines. See #29650 for progress.
    • In the 2.59.0 release, Prism passes most runner validation tests, with the exception of pipelines using the following features:
      OrderedListState, OnWindowExpiry (e.g. GroupIntoBatches), CustomWindows, MergingWindowFns, Trigger and WindowingStrategy associated features, Bundle Finalization, Looping Timers, and some Coder related issues such as with Python combiner packing, Java Schema transforms, and heterogeneous flatten coders. Processing Time timers do not yet have real time support.
    • If your pipeline is having difficulty with the Python or Java direct runners, but runs well on Prism, please let us know.

For the most up to date list of known issues, see https://github.com/apache/beam/blob/master/CHANGES.md

List of Contributors

According to git shortlog, the following people contributed to the 2.59.0 release. Thank you to all contributors!

Ahmed Abualsaud,Ahmet Altay,Andrew Crites,atask-g,Axel Magnuson,Ayush Pandey,Bartosz Zablocki,Chamikara Jayalath,cutiepie-10,Damon,Danny McCormick,dependabot[bot],Eddie Phillips,Francis O'Hara,Hyeonho Kim,Israel Herraiz,Jack McCluskey,Jaehyeon Kim,Jan Lukavský,Jeff Kinard,Jeffrey Kinard,jonathan-lemos,jrmccluskey,Kirill Berezin,Kiruphasankaran Nataraj,lahariguduru,liferoad,lostluck,Maciej Szwaja,Manit Gupta,Mark Zitnik,martin trieu,Naireen Hussain,Prerit Chandok,Radosław Stankiewicz,Rebecca Szper,Robert Bradshaw,Robert Burke,ron-gal,Sam Whittle,Sergei Lilichenko,Shunping Huang,Svetak Sundhar,Thiago Nunes,Timothy Itodo,tvalentyn,twosom,Vatsal,Vitaly Terentyev,Vlado Djerek,Yifan Ye,Yi Hu

Beam 2.58.1 release

16 Aug 18:44

We are happy to present the new 2.58.1 release of Beam.
This release includes both improvements and new functionality.
See the download page for this release.

New Features / Improvements

  • Fixed an issue where KafkaIO records read with ReadFromKafkaViaSDF are redistributed and may contain duplicates regardless of the configuration. This affects Java pipelines with the Dataflow v2 runner and xlang pipelines reading from Kafka (#32196).

Known Issues

  • Large Dataflow graphs using runner v2, or pipelines explicitly enabling the upload_graph experiment, will fail at construction time (#32159).
  • Python pipelines that run with 2.53.0-2.58.0 SDKs and read data from GCS might be affected by a data corruption issue (#32169). The issue will be fixed in 2.59.0 (#32135). To work around this, update the google-cloud-storage package to version 2.18.2 or newer.

For the most up to date list of known issues, see https://github.com/apache/beam/blob/master/CHANGES.md

List of Contributors

According to git shortlog, the following people contributed to the 2.58.1 release. Thank you to all contributors!

Danny McCormick

Sam Whittle