Apache Beam: ToString Transform
Overview
If you want to convert every element of a PCollection to a string, take a look at the ToString transforms. They cover everything from converting an object to a string by calling its toString() method to joining the key and value of a KV pair with a custom delimiter.
When You Should Use the ToString Transform
Use the ToString transforms when you want to perform one of the following conversions on your input data:
- Transform each element into a string using the Object.toString() method
- Transform an input element of type Iterable into a string using a delimiter
- Transform an input element of type KV into a string using a delimiter
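To make the three cases above concrete, here is a plain-Java sketch of the strings each conversion is expected to produce. It has no Beam dependency and is not Beam's implementation: the class ToStringSketch and its methods are hypothetical names chosen for illustration, and Map.Entry stands in for Beam's KV type.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.Arrays;
import java.util.Map;
import java.util.stream.Collectors;
import java.util.stream.StreamSupport;

// Hypothetical helper class: models the output format only, not Beam itself.
public class ToStringSketch {

    // Case 1: any element becomes its toString() representation.
    public static String element(Object value) {
        return value.toString();
    }

    // Case 2: the items of an Iterable are joined with a delimiter.
    public static String iterable(Iterable<?> values, String delimiter) {
        return StreamSupport.stream(values.spliterator(), false)
            .map(Object::toString)
            .collect(Collectors.joining(delimiter));
    }

    // Case 3: key and value are joined with a delimiter.
    // Map.Entry is used here as a stand-in for Beam's KV.
    public static String kv(Map.Entry<?, ?> pair, String delimiter) {
        return pair.getKey() + delimiter + pair.getValue();
    }

    public static void main(String[] args) {
        System.out.println(element(42));                                  // 42
        System.out.println(iterable(Arrays.asList("apple", "pear"), ",")); // apple,pear
        System.out.println(kv(new SimpleEntry<>("fall", "apple"), ","));   // fall,apple
    }
}
```

In a real pipeline the same conversions run once per element of the input PCollection, so the result is a collection of strings rather than a single string.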
How to Use the ToString Transform
Apply the built-in transform to a PCollection of elements, iterables (such as lists), or KVs. The output is always a PCollection<String>.
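As a rough illustration of the element and iterable cases, the sketch below applies ToString.elements() to a PCollection of integers and ToString.iterables(" | ") to a PCollection of lists. The class name ToStringVariants, the method names, the step names, and the sample data are made up for this example. It also assumes the Beam Java SDK and a runner such as the DirectRunner are on the classpath, and it sets the list coder explicitly rather than relying on coder inference.

```java
import java.util.Arrays;
import java.util.List;
import org.apache.beam.sdk.Pipeline;
import org.apache.beam.sdk.coders.ListCoder;
import org.apache.beam.sdk.coders.StringUtf8Coder;
import org.apache.beam.sdk.transforms.Create;
import org.apache.beam.sdk.transforms.ToString;
import org.apache.beam.sdk.values.PCollection;

// Hypothetical example class; names and sample data are illustrative only.
public class ToStringVariants {

    // Converts each Integer to a string via its toString() method.
    public static PCollection<String> numbersAsStrings(Pipeline pipeline) {
        PCollection<Integer> numbers =
            pipeline.apply("CreateNumbers", Create.of(1, 2, 3));
        return numbers.apply("NumbersToString", ToString.elements());
    }

    // Joins the items of each list with a custom delimiter.
    public static PCollection<String> listsAsStrings(Pipeline pipeline) {
        PCollection<List<String>> baskets =
            pipeline.apply(
                "CreateLists",
                Create.of(
                        Arrays.asList("apple", "pear"),
                        Arrays.asList("cherry", "strawberry"))
                    // Explicit coder, so the example does not depend on inference.
                    .withCoder(ListCoder.of(StringUtf8Coder.of())));
        return baskets.apply("ListsToString", ToString.iterables(" | "));
    }

    public static void main(String[] args) {
        // Assumes a runner (for example the DirectRunner) is available.
        Pipeline pipeline = Pipeline.create();
        numbersAsStrings(pipeline);  // elements "1", "2", "3"
        listsAsStrings(pipeline);    // elements "apple | pear", "cherry | strawberry"
        pipeline.run().waitUntilFinish();
    }
}
```

Calling ToString.iterables() without an argument uses a comma as the delimiter.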
Example: Convert KV to String
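// Imports needed by this snippet
import org.apache.beam.sdk.transforms.Create;
import org.apache.beam.sdk.transforms.ToString;
import org.apache.beam.sdk.values.KV;
import org.apache.beam.sdk.values.PCollection;

// Create key-value pairs ("pipeline" is an existing Pipeline instance)
PCollection<KV<String, String>> pairs =
    pipeline.apply(
        Create.of(
            KV.of("fall", "apple"),
            KV.of("spring", "strawberry"),
            KV.of("winter", "orange"),
            KV.of("summer", "peach"),
            KV.of("spring", "cherry"),
            KV.of("fall", "pear")));
// Use ToString on key-value pairs; kvs() joins key and value with a comma
PCollection<String> result = pairs.apply(ToString.kvs());
// Results in a PCollection containing (in no guaranteed order):
// fall,apple
// spring,strawberry
// ...etc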
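The example above relies on the default comma delimiter. A custom delimiter can be passed to ToString.kvs(String), as in the following minimal sketch. The class name ToStringKvDelimiter, the method name format, the step names, and the ": " delimiter are assumptions made for illustration, and a runner such as the DirectRunner is assumed to be on the classpath.

```java
import org.apache.beam.sdk.Pipeline;
import org.apache.beam.sdk.transforms.Create;
import org.apache.beam.sdk.transforms.ToString;
import org.apache.beam.sdk.values.KV;
import org.apache.beam.sdk.values.PCollection;

// Hypothetical example class; names and sample data are illustrative only.
public class ToStringKvDelimiter {

    // Joins each key and value with ": " instead of the default comma.
    public static PCollection<String> format(Pipeline pipeline) {
        PCollection<KV<String, String>> pairs =
            pipeline.apply(
                "CreatePairs",
                Create.of(
                    KV.of("fall", "apple"),
                    KV.of("spring", "strawberry")));
        return pairs.apply("PairsToString", ToString.kvs(": "));
    }

    public static void main(String[] args) {
        // Assumes a runner (for example the DirectRunner) is available.
        Pipeline pipeline = Pipeline.create();
        format(pipeline);  // elements "fall: apple", "spring: strawberry"
        pipeline.run().waitUntilFinish();
    }
}
```

Because a PCollection is unordered, the two output strings may arrive in either order.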
Conclusion
Check out the other useful transforms in the official Apache Beam documentation.