Apache Beam: ParDo

The General Purpose Transform

Overview

If you want to apply a function that transforms each element of an input collection, use the ParDo transform. For each input element, the function can emit zero, one, or many output elements.

When You Should Use the ParDo Transform

Use ParDo any time you want to run a general-purpose processing function on each element of a PCollection, for example to filter elements, convert them to another type, or extract parts of them.
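As a rough mental model only, the per-element behavior described above can be sketched in plain Java without Beam. The names ParDoSketch, applyPerElement, and keepShortWords are hypothetical and are not part of the Beam API. A real ParDo runs in parallel across workers on a PCollection; this sketch simply loops over a List in one thread.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.BiConsumer;
import java.util.function.Consumer;

public class ParDoSketch {

    // Hypothetical helper (not Beam): calls fn once per input element.
    // fn receives the element and an emitter, and may emit zero, one,
    // or many outputs for that element.
    static <I, O> List<O> applyPerElement(List<I> input, BiConsumer<I, Consumer<O>> fn) {
        List<O> output = new ArrayList<>();
        for (I element : input) {
            fn.accept(element, output::add);
        }
        return output;
    }

    // Keeps only the words with at most maxLength characters.
    static List<String> keepShortWords(List<String> words, int maxLength) {
        return applyPerElement(words, (word, out) -> {
            if (word.length() <= maxLength) {
                out.accept(word);
            }
        });
    }

    public static void main(String[] args) {
        List<String> words = List.of("beam", "pipeline", "internationalization");
        System.out.println(keepShortWords(words, 10)); // prints [beam, pipeline]
    }
}
```

The emitter argument is what lets a single function act as a filter (emit nothing), a map (emit once), or a flat map (emit several times), which is why ParDo is considered a general-purpose transform.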

How to Use the ParDo Transform

To process elements with the ParDo transform, you need a Java class that extends the DoFn abstract class and contains your transformation logic. Your subclass must define a method annotated with @ProcessElement that holds the transformation code; Beam calls this method once for each input element (see the example below).

Example: Filter Words by Maximum Length

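  import org.apache.beam.sdk.transforms.DoFn;
  import org.apache.beam.sdk.transforms.ParDo;
  import org.apache.beam.sdk.values.PCollection;

  // The input PCollection to ParDo.
  PCollection<String> words = ...;

  // Apply a ParDo that keeps only the words of at most maxLength characters.
  PCollection<String> wordsBelowCutOff =
  words.apply(ParDo
      .of(new DoFn<String, String>() {
          // Beam invokes the @ProcessElement method once per input element.
          @ProcessElement
          public void processElement(ProcessContext c) {
            String word = c.element();
            int maxLength = 10;
            // Emit the word only if it is within the length limit.
            if (word.length() <= maxLength) {
              c.output(word);
            }
          }
      })
  );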

Conclusion

For other useful transforms, see the official Apache Beam documentation.

Apache Beam and Google Cloud Dataflow

Part 8 of 12

Dive into the world of scalable data processing with our comprehensive series on Apache Beam and Google Cloud Dataflow.

Nikhil Rao's Blog

For the love of Data