Apache Beam: WithKeys Transform

·

1 min read

Overview

If you need to convert a PCollection into a KV pair using dynamically generated keys derived from the input element, you should check out the WithKeys transform.

When You Should Use the WithKeys Transform

When you only want to convert a PCollection<V> to a PCollection<KV<K, V>>. You can then use aggregations or transforms your newly generated KV's.

How to Use the WithKeys Transform

Just apply the built-in Transform to any PCollection and supply a function which will take the input element as an argument and return a value which will be used as a key.

Example: Create a KV pair from a list of Words

    PCollection<String> words = pipeline.apply(Create.of("Hello", "World", "Beam", "is", "fun"));
    PCollection<KV<Integer, String>> lengthAndWord =
        words.apply(
            WithKeys.of(
                new SerializableFunction<String, Integer>() {
                  @Override
                  public Integer apply(String s) {
                    return s.length();
                  }
                }));
// returns
// KV{5, World}
// KV{5, Hello}
// ... etc

Conclusion

Check out other useful transforms from the official Apache Beam documentation.