[Bug]: Registering DoFns and CombineFns Seems Excessively Slow #34693

Open
1 of 17 tasks
JonathanHopeDMRC opened this issue Apr 21, 2025 · 2 comments
@JonathanHopeDMRC

What happened?

SDK Version: 2.63
SDK Language: Go

Registering DoFns and CombineFns with Beam no doubt has some cost, but the current cost seems excessively high. I am working on a pipeline that currently has around 30 DoFns. Builds have been taking a very long time for a while, so I spent some time debugging them and found that the calls to beam.Register* and register.* were what slowed them down. To illustrate this, I moved all of those calls to a separate package and used actiongraph to measure the time they take relative to everything else:

❯ actiongraph -f /tmp/actiongraph6 top             
263.156s  58.50%  build pcmig/pkg/start
 26.691s  64.43%  link  pcmig/cmd/batch
 15.179s  67.81%  build github.com/apache/beam/sdks/v2/go/pkg/beam/io/fileio

As you can see, the package containing those calls takes roughly 10x longer to build than the next slowest action. It's possible that this is expected behavior, but I wanted to raise the issue just in case.
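
As a rough check on the 10x figure, the ratio can be computed directly from the actiongraph rows above. The following is only a minimal sketch: the `action` type and the `parseTop` and `slowdown` helpers are hypothetical names introduced here for illustration, not part of actiongraph or Beam, and the parser assumes each row has the form "duration, percentage, action name" as in the output quoted above.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// action is one row of the `actiongraph top` output quoted in this issue.
// (Hypothetical type, introduced only for this sketch.)
type action struct {
	Seconds float64
	Name    string
}

// parseTop parses rows such as "263.156s  58.50%  build pcmig/pkg/start".
// Rows with fewer than three whitespace-separated fields are skipped.
func parseTop(out string) ([]action, error) {
	var actions []action
	for _, line := range strings.Split(out, "\n") {
		fields := strings.Fields(line)
		if len(fields) < 3 {
			continue
		}
		secs, err := strconv.ParseFloat(strings.TrimSuffix(fields[0], "s"), 64)
		if err != nil {
			return nil, fmt.Errorf("parse duration %q: %w", fields[0], err)
		}
		// Everything after the percentage column is treated as the action name.
		actions = append(actions, action{Seconds: secs, Name: strings.Join(fields[2:], " ")})
	}
	return actions, nil
}

// slowdown reports how many times longer the first action took than the second.
// It returns 0 when there are fewer than two rows or the second took no time.
func slowdown(actions []action) float64 {
	if len(actions) < 2 || actions[1].Seconds == 0 {
		return 0
	}
	return actions[0].Seconds / actions[1].Seconds
}

func main() {
	// Rows copied from the actiongraph output in this issue.
	const top = `263.156s  58.50%  build pcmig/pkg/start
 26.691s  64.43%  link  pcmig/cmd/batch
 15.179s  67.81%  build github.com/apache/beam/sdks/v2/go/pkg/beam/io/fileio`

	actions, err := parseTop(top)
	if err != nil {
		panic(err)
	}
	fmt.Printf("%s is %.1fx slower than %s\n",
		actions[0].Name, slowdown(actions), actions[1].Name)
}
```

With the numbers above, 263.156s against 26.691s works out to a ratio just under 10, which is consistent with the "roughly 10x" estimate.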

Issue Priority

Priority: 2 (default / most bugs should be filed as P2)

Issue Components

  • Component: Python SDK
  • Component: Java SDK
  • Component: Go SDK
  • Component: Typescript SDK
  • Component: IO connector
  • Component: Beam YAML
  • Component: Beam examples
  • Component: Beam playground
  • Component: Beam katas
  • Component: Website
  • Component: Infrastructure
  • Component: Spark Runner
  • Component: Flink Runner
  • Component: Samza Runner
  • Component: Twister2 Runner
  • Component: Hazelcast Jet Runner
  • Component: Google Cloud Dataflow Runner
@chamikaramj
Contributor

cc: @damccorm @jrmccluskey

@chamikaramj
Contributor

(seems like a Go SDK issue not a YAML issue)
