Overview
To apply a function that transforms each element of an input collection, use the ParDo
transform.
When You Should Use the ParDo Transform
Use ParDo any time you want to run a general-purpose processing function on each element of a PCollection, for example to filter elements, convert them to another type, or extract parts of them.
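To make "a processing function on each element" concrete, here is a minimal plain-Java sketch of the idea. It is not the Beam API and runs on an ordinary List rather than a PCollection; the names ElementWiseSketch and applyToEach are hypothetical and exist only for illustration. The point it tries to show is that the function is called once per element and may emit zero, one, or several outputs each time.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.BiConsumer;
import java.util.function.Consumer;

// Plain-Java stand-in for element-wise processing. This is NOT the Beam API;
// the class and method names are hypothetical.
class ElementWiseSketch {

    // Calls fn once per input element. fn receives the element and an emitter,
    // and may call the emitter zero, one, or many times.
    static <I, O> List<O> applyToEach(List<I> input, BiConsumer<I, Consumer<O>> fn) {
        List<O> output = new ArrayList<>();
        for (I element : input) {
            fn.accept(element, output::add);
        }
        return output;
    }

    public static void main(String[] args) {
        List<String> words = List.of("beam", "pipeline", "parallelization");
        // Keep only words of at most 10 characters: zero or one output per element.
        List<String> shortWords = ElementWiseSketch.<String, String>applyToEach(
            words,
            (word, emit) -> {
                if (word.length() <= 10) {
                    emit.accept(word);
                }
            });
        System.out.println(shortWords); // prints [beam, pipeline]
    }
}
```

In a real pipeline the runner, not a for loop, decides where and when each element is processed, which is why the per-element function should not depend on the order of elements or on state shared between calls.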
How to Use the ParDo Transform
To process elements with the ParDo
transform, all you need is a Java class that extends the DoFn
abstract class and contains your transformation logic. Your subclass must define a method annotated with @ProcessElement
that holds the transformation code (see the example below).
Example: Filter Words by Maximum Length
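import org.apache.beam.sdk.transforms.DoFn;
import org.apache.beam.sdk.transforms.ParDo;
import org.apache.beam.sdk.values.PCollection;

// The input PCollection to ParDo.
PCollection<String> words = ...;
// Apply a ParDo that keeps only words no longer than a fixed cutoff.
PCollection<String> wordsBelowCutOff =
    words.apply(ParDo
        .of(new DoFn<String, String>() {
          // Called once for each element of the input PCollection.
          @ProcessElement
          public void processElement(ProcessContext c) {
            String word = c.element();
            // Hard-coded cutoff; longer words are dropped.
            int maxLength = 10;
            if (word.length() <= maxLength) {
              // Emit the word to the output PCollection.
              c.output(word);
            }
          }
        })
    );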
Conclusion
For other useful transforms, see the official Apache Beam documentation.