Apache Beam: ToString Transform

Overview

If you want to convert every element of a PCollection to a string, check out the ToString transforms. They handle everything from converting an object to a string by calling its toString() method to concatenating the key and value of a KV pair with a custom delimiter.

When You Should Use the ToString Transform

You should use the ToString transforms when you want to perform the following transformations on your input data:

  1. Transform each element into a string using the Object.toString() method (ToString.elements())

  2. Transform an input element of type Iterable into a string, joining its items with a delimiter (ToString.iterables())

  3. Transform an input element of type KV into a string, joining the key and value with a delimiter (ToString.kvs())
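To make case 2 concrete, here is a minimal sketch that joins the items of each list with " | " instead of the default ",". It assumes the Beam Java SDK core and the DirectRunner are on the classpath. It uses PAssert to check the output, and PAssert may also need JUnit and Hamcrest on the classpath. The class name ToStringIterablesExample and the sample data are illustrative only.

```java
import java.util.Arrays;
import java.util.List;
import org.apache.beam.sdk.Pipeline;
import org.apache.beam.sdk.PipelineResult;
import org.apache.beam.sdk.coders.ListCoder;
import org.apache.beam.sdk.coders.StringUtf8Coder;
import org.apache.beam.sdk.options.PipelineOptionsFactory;
import org.apache.beam.sdk.testing.PAssert;
import org.apache.beam.sdk.transforms.Create;
import org.apache.beam.sdk.transforms.ToString;
import org.apache.beam.sdk.values.PCollection;

// Illustrative class name; assumes the DirectRunner is available.
class ToStringIterablesExample {

  static PipelineResult.State run() {
    Pipeline pipeline = Pipeline.create(PipelineOptionsFactory.create());

    // Each input element is a whole list of fruits.
    List<List<String>> baskets =
        Arrays.asList(
            Arrays.asList("apple", "pear"),
            Arrays.asList("strawberry", "cherry"));

    // Set the coder explicitly rather than relying on inference for List<String>.
    PCollection<List<String>> lists =
        pipeline.apply(Create.of(baskets).withCoder(ListCoder.of(StringUtf8Coder.of())));

    // Join the items of each list with " | " instead of the default ",".
    PCollection<String> joined = lists.apply(ToString.iterables(" | "));

    // A PCollection has no guaranteed order, so compare ignoring order.
    PAssert.that(joined).containsInAnyOrder("apple | pear", "strawberry | cherry");

    return pipeline.run().waitUntilFinish();
  }

  public static void main(String[] args) {
    System.out.println(run());
  }
}
```

Calling ToString.iterables() with no argument would use "," as the delimiter instead.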

How to Use the ToString Transform

Just apply the built-in transform to a PCollection of elements, iterables, or KVs. The output is always a PCollection<String>.
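Example: Convert Elements to String

As a minimal sketch of the simplest case, the snippet below applies ToString.elements() to a PCollection of Integers, which calls toString() on each element. It assumes the Beam Java SDK core and the DirectRunner are on the classpath. PAssert, used to check the result, may also need JUnit and Hamcrest. The class name ToStringElementsExample is illustrative only.

```java
import org.apache.beam.sdk.Pipeline;
import org.apache.beam.sdk.PipelineResult;
import org.apache.beam.sdk.options.PipelineOptionsFactory;
import org.apache.beam.sdk.testing.PAssert;
import org.apache.beam.sdk.transforms.Create;
import org.apache.beam.sdk.transforms.ToString;
import org.apache.beam.sdk.values.PCollection;

// Illustrative class name; assumes the DirectRunner is available.
class ToStringElementsExample {

  static PipelineResult.State run() {
    Pipeline pipeline = Pipeline.create(PipelineOptionsFactory.create());

    // Any element type that has a coder works the same way; Integers are just an example.
    PCollection<Integer> numbers = pipeline.apply(Create.of(1, 2, 3));

    // ToString.elements() calls toString() on every element.
    PCollection<String> strings = numbers.apply(ToString.elements());

    // A PCollection has no guaranteed order, so compare ignoring order.
    PAssert.that(strings).containsInAnyOrder("1", "2", "3");

    return pipeline.run().waitUntilFinish();
  }

  public static void main(String[] args) {
    System.out.println(run());
  }
}
```

If the PAssert check fails, waitUntilFinish() throws an exception, so returning the final state is enough for a quick smoke test.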

Example: Convert KV to String

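    import org.apache.beam.sdk.Pipeline;
    import org.apache.beam.sdk.options.PipelineOptionsFactory;
    import org.apache.beam.sdk.transforms.Create;
    import org.apache.beam.sdk.transforms.ToString;
    import org.apache.beam.sdk.values.KV;
    import org.apache.beam.sdk.values.PCollection;

    public class ToStringKvExample {
      public static void main(String[] args) {
        Pipeline pipeline = Pipeline.create(PipelineOptionsFactory.create());

        // Create key-value pairs
        PCollection<KV<String, String>> pairs =
            pipeline.apply(
                Create.of(
                    KV.of("fall", "apple"),
                    KV.of("spring", "strawberry"),
                    KV.of("winter", "orange"),
                    KV.of("summer", "peach"),
                    KV.of("spring", "cherry"),
                    KV.of("fall", "pear")));

        // Use ToString on key-value pairs; the default delimiter is ","
        PCollection<String> result = pairs.apply(ToString.kvs());

        // results in a PCollection containing (in no guaranteed order)
        // fall,apple
        // spring,strawberry
        // ...etc

        pipeline.run().waitUntilFinish();
      }
    }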
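Example: Convert KV to String with a Custom Delimiter

The overloaded ToString.kvs(String delimiter) lets you choose the separator between key and value. The sketch below reuses two of the season/fruit pairs from above and joins them with ": ". It assumes the Beam Java SDK core and the DirectRunner are on the classpath. PAssert may also need JUnit and Hamcrest. The class name ToStringKvsDelimiterExample is illustrative only.

```java
import org.apache.beam.sdk.Pipeline;
import org.apache.beam.sdk.PipelineResult;
import org.apache.beam.sdk.options.PipelineOptionsFactory;
import org.apache.beam.sdk.testing.PAssert;
import org.apache.beam.sdk.transforms.Create;
import org.apache.beam.sdk.transforms.ToString;
import org.apache.beam.sdk.values.KV;
import org.apache.beam.sdk.values.PCollection;

// Illustrative class name; assumes the DirectRunner is available.
class ToStringKvsDelimiterExample {

  static PipelineResult.State run() {
    Pipeline pipeline = Pipeline.create(PipelineOptionsFactory.create());

    // Key-value pairs: season -> fruit
    PCollection<KV<String, String>> pairs =
        pipeline.apply(Create.of(KV.of("fall", "apple"), KV.of("spring", "strawberry")));

    // Join key and value with ": " instead of the default ","
    PCollection<String> result = pairs.apply(ToString.kvs(": "));

    // A PCollection has no guaranteed order, so compare ignoring order.
    PAssert.that(result).containsInAnyOrder("fall: apple", "spring: strawberry");

    return pipeline.run().waitUntilFinish();
  }

  public static void main(String[] args) {
    System.out.println(run());
  }
}
```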

Conclusion

Check out other useful transforms from the official Apache Beam documentation.