Apache Beam: ParDo

The General-Purpose Transform

Overview

ParDo is Beam's general-purpose transform for parallel processing. It applies a user-defined function to each element of an input PCollection and emits zero, one, or many output elements per input.

When You Should Use the ParDo Transform

Use ParDo any time you want to run a generic processing function on each element of a PCollection, for example to filter elements, reformat them, or extract parts of them.

How to Use the ParDo Transform

To process elements with ParDo, write a Java class that extends the DoFn abstract class and holds your transformation logic. Rather than overriding a method, the subclass defines a method annotated with @ProcessElement, which Beam calls once for each input element (see the example below).

Example: Filter Words by Maximum Length

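  import org.apache.beam.sdk.transforms.DoFn;
  import org.apache.beam.sdk.transforms.ParDo;
  import org.apache.beam.sdk.values.PCollection;

  // The input PCollection to ParDo.
  PCollection<String> words = ...;

  // Apply a ParDo that keeps only words of at most maxLength characters.
  PCollection<String> wordsBelowCutOff =
      words.apply(ParDo.of(new DoFn<String, String>() {
        // Beam calls the method annotated with @ProcessElement once per element.
        @ProcessElement
        public void processElement(ProcessContext c) {
          String word = c.element();
          int maxLength = 10;  // hard-coded cutoff; no side input is used here
          if (word.length() <= maxLength) {
            c.output(word);
          }
        }
      }));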
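
To see what the filter above would emit for a small input, its per-element logic can be modeled in plain Java without the Beam SDK. This is only a rough sketch: the class ParDoModel and the helper applyPerElement are hypothetical stand-ins invented for illustration, not Beam APIs, and the loop runs sequentially, whereas Beam may process elements in parallel across workers.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.BiConsumer;
import java.util.function.Consumer;

public class ParDoModel {

    // Hypothetical stand-in for ParDo (not a Beam API): calls fn once per
    // element and collects whatever fn emits through the output consumer.
    static <I, O> List<O> applyPerElement(List<I> input, BiConsumer<I, Consumer<O>> fn) {
        List<O> output = new ArrayList<>();
        for (I element : input) {
            fn.accept(element, output::add);
        }
        return output;
    }

    // Same filtering logic as the DoFn in the example:
    // emit only words of at most 10 characters.
    static List<String> wordsBelowCutOff(List<String> words) {
        final int maxLength = 10;
        return ParDoModel.<String, String>applyPerElement(words, (word, out) -> {
            if (word.length() <= maxLength) {
                out.accept(word); // plays the role of c.output(word)
            }
        });
    }

    public static void main(String[] args) {
        // "transformation" has 14 characters, so it is dropped.
        System.out.println(wordsBelowCutOff(List.of("beam", "transformation", "pipeline")));
    }
}
```

Because the function decides per element whether to call the output consumer, the same shape covers filtering (zero or one output), mapping (exactly one), and splitting (many outputs per input).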

Conclusion

For other useful transforms, see the official Apache Beam documentation.