Kafka Streams API

A Step Beyond Hello World

Ivan Ponomarev, Synthesized/MIPT

Our plan

Lecture 1.

  1. Kafka (brief reminder) and Data Streaming

  2. Application configuration. Stateless transformations

  3. Transformations with local state

Lecture 2.

  1. Stream-table dualism and Table joins

  2. Time and window operations

Tables vs Streams

User Location

stream table animation latestLocation
Michael G. Noll. Of Streams and Tables in Kafka and Stream Processing

Tables vs Streams

Number of places visited

stream table animation numVisitedLocations
Michael G. Noll. Of Streams and Tables in Kafka and Stream Processing

Tables vs Streams

Derivative and integral

\[\huge state(now) = \int\limits_{t=0}^{now} stream(t)\, \mathrm{d}t \quad\quad stream(t) = \frac{\mathrm{d}state(t)}{\mathrm{d}t}\]

Martin Kleppmann, “Designing Data Intensive Applications”
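
The "integral" direction can be sketched in plain Java, without Kafka: replaying a stream of events into a map yields the table, and a different aggregation of the same stream yields a different table. This is only an illustration of the idea; the class name, method names and sample data are made up for this sketch.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class StreamTableDuality {
    /** "Integral": replay (user, location) events into the latest-location table. */
    static Map<String, String> toTable(List<Map.Entry<String, String>> stream) {
        Map<String, String> table = new LinkedHashMap<>();
        for (Map.Entry<String, String> event : stream) {
            table.put(event.getKey(), event.getValue()); // upsert: a newer value overwrites the older one
        }
        return table;
    }

    /** A different aggregation of the same stream: number of places visited per user. */
    static Map<String, Long> countVisits(List<Map.Entry<String, String>> stream) {
        Map<String, Long> counts = new LinkedHashMap<>();
        for (Map.Entry<String, String> event : stream) {
            counts.merge(event.getKey(), 1L, Long::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        List<Map.Entry<String, String>> stream = List.of(
            Map.entry("alice", "Paris"),
            Map.entry("bob", "Sydney"),
            Map.entry("alice", "Rome"));
        System.out.println(toTable(stream));
        System.out.println(countVisits(stream));
    }
}
```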

Table-Table join

\[\huge (uv)'= u'v + uv'\]
table table

Table-Table join

\[\huge (uv)'= u'v + uv'\]
table table1

Table-Table join

\[\huge (uv)'= u'v + uv'\]
table table2
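
The product-rule analogy can be made concrete with a small plain-Java model: an update on either side is combined with the stored current value of the other side. This is a simplified sketch of the idea, not how Kafka Streams is implemented; the class, its methods and the output format are hypothetical, and deletions (tombstones) are ignored.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class IncrementalTableJoin {
    private final Map<String, Long> totals = new HashMap<>();    // current state of table u
    private final Map<String, String> scores = new HashMap<>();  // current state of table v
    final List<String> output = new ArrayList<>();               // changelog of the joined table

    /** u changes: combine the new left value with the stored right value (the u'v term). */
    void onTotal(String key, long total) {
        totals.put(key, total);
        String score = scores.get(key);
        if (score != null) {
            output.add(key + " -> (" + score + ") " + total);
        }
    }

    /** v changes: combine the stored left value with the new right value (the uv' term). */
    void onScore(String key, String score) {
        scores.put(key, score);
        Long total = totals.get(key);
        if (total != null) {
            output.add(key + " -> (" + score + ") " + total);
        }
    }

    public static void main(String[] args) {
        IncrementalTableJoin join = new IncrementalTableJoin();
        join.onTotal("m1:H", 100);   // no matching score yet, nothing is emitted
        join.onScore("m1:H", "1:0"); // emits a joined record
        join.onTotal("m1:H", 150);   // emits an updated joined record
        join.output.forEach(System.out::println);
    }
}
```

Both sides need their own storage because an update on one side can only be joined against the remembered state of the other.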

Table-Table join topology

join storages

Rewriting the totalling app using KTable

KTable<String, Long> totals = input.groupByKey().aggregate(
    () -> 0L,
    (k, v, a) -> a + Math.round(v.getAmount() * v.getOdds()),
    Materialized.with(Serdes.String(), Serdes.Long())
);

$ kafka-topics --zookeeper localhost --describe

Topic:
table2-demo-KSTREAM-AGGREGATE-STATE-STORE-0000000001-changelog
PartitionCount:10
ReplicationFactor:1
Configs:cleanup.policy=compact
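
Why a compacted changelog is enough to rebuild the table can be shown with a plain-Java model: every aggregate update is appended to a log, and replaying the log while keeping only the last value per key restores the state. The sketch below assumes a bet is just a key, an amount and odds; all names are hypothetical and tombstones are not modelled.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class TotalsChangelog {
    /** One changelog record: the key and the aggregate value after the update. */
    static final class Update {
        final String key;
        final long total;

        Update(String key, long total) {
            this.key = key;
            this.total = total;
        }
    }

    private final Map<String, Long> store = new HashMap<>(); // stands in for the local state store
    final List<Update> changelog = new ArrayList<>();        // stands in for the changelog topic

    /** Same arithmetic as the aggregator on the slide: total += round(amount * odds). */
    void onBet(String key, double amount, double odds) {
        long total = store.getOrDefault(key, 0L) + Math.round(amount * odds);
        store.put(key, total);
        changelog.add(new Update(key, total));
    }

    /** Restoring state: only the latest record per key matters, which is what compaction keeps. */
    static Map<String, Long> restore(List<Update> changelog) {
        Map<String, Long> restored = new LinkedHashMap<>();
        for (Update update : changelog) {
            restored.put(update.key, update.total);
        }
        return restored;
    }

    public static void main(String[] args) {
        TotalsChangelog totals = new TotalsChangelog();
        totals.onBet("m1:H", 100, 1.5);
        totals.onBet("m1:H", 50, 2.0);
        totals.onBet("m2:A", 10, 3.0);
        System.out.println(restore(totals.changelog));
    }
}
```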

Get a table of match scores

KStream<String, Score> scores =
    eventScores.flatMap((k, v) ->
        Stream.of(Outcome.H, Outcome.A).map(o ->
            KeyValue.pair(String.format("%s:%s", k, o), v))
            .collect(Collectors.toList()))
    .mapValues(EventScore::getScore);

KTable<String, Score> tableScores =
    scores.groupByKey(Grouped.with(...)).reduce((a, b) -> b);

$ kafka-topics --zookeeper localhost --describe

table2-demo-KSTREAM-REDUCE-STATE-STORE-0000000006-repartition
table2-demo-KSTREAM-REDUCE-STATE-STORE-0000000006-changelog

Demo: Combining the bet totals with the current score

KTable<String, String> joined =
    totals.join(tableScores,
            (total, eventScore) ->
                String.format("(%s)\t%d", eventScore, total));

Co-partitioning

Join works

copart norm

Number of partitions mismatch

Join does not work (Runtime Exception)

copart diff

Partitioning algorithm mismatch

Join fails silently!

copart diff algorithm
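
Both failure modes come down to the same key landing in different partition numbers on the two sides of the join. A minimal sketch, assuming two made-up hash functions as stand-ins for real partitioners (Kafka's default partitioner hashes the serialized key bytes with murmur2, which is not reproduced here):

```java
final class CoPartitioning {
    /** Hypothetical partitioner A: based on String.hashCode(). */
    static int byHashCode(String key, int numPartitions) {
        return Math.floorMod(key.hashCode(), numPartitions);
    }

    /** Hypothetical partitioner B: based on the sum of the characters. */
    static int byCharSum(String key, int numPartitions) {
        int sum = 0;
        for (char c : key.toCharArray()) {
            sum += c;
        }
        return Math.floorMod(sum, numPartitions);
    }

    public static void main(String[] args) {
        String key = "ab";
        // Same algorithm, same partition count: both topics agree, the join works
        System.out.println(byHashCode(key, 4) + " vs " + byHashCode(key, 4));
        // Same algorithm, different partition counts: the key lands in different partitions
        System.out.println(byHashCode(key, 4) + " vs " + byHashCode(key, 10));
        // Same partition count, different algorithms: nothing looks wrong, but records never meet
        System.out.println(byHashCode(key, 4) + " vs " + byCharSum(key, 4));
    }
}
```

The last case is the dangerous one: the partition counts match, so no check can catch it, and the task that owns partition N simply never sees both records for the same key.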

GlobalKTable

Fully replicated to every application instance

GlobalKTable<...> global = streamsBuilder.globalTable("global", ...);
globalktable

Foreign Key Joins: join + ForeignKeyExtractor

fkjoin

Operations on Streams and Tables: summary

streams stateful operations

Types of joins: Table-Table

table table

Types of joins: Table-Table

table table1

Types of joins: Table-Table

table table2

Types of joins: Stream-Table

stream table

Types of joins: Stream-Stream

stream stream

Saving timestamped values to RocksDB

WindowKeySchema.java

static Bytes toStoreKeyBinary(byte[] serializedKey,
                              long timestamp,
                              int seqnum) {
    ByteBuffer buf = ByteBuffer.allocate(
                                serializedKey.length
                                + TIMESTAMP_SIZE
                                + SEQNUM_SIZE);
    buf.put(serializedKey);
    buf.putLong(timestamp);
    buf.putInt(seqnum);
    return Bytes.wrap(buf.array());
}

Quick retrieval of key values for a time range

timestamped record
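
Why this key layout gives fast time-range retrieval: with the timestamp written big-endian right after the key bytes, a byte-wise sorted store keeps all records of one key ordered by time, so a time range becomes one contiguous scan. The sketch below uses a TreeMap with unsigned byte ordering as a stand-in for RocksDB. It assumes non-negative timestamps and keys of equal length; the class and method names are hypothetical.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.NavigableMap;
import java.util.TreeMap;

final class WindowKeyRangeScan {
    // Unsigned lexicographic byte order, standing in for a byte-wise sorted key-value store
    private static final Comparator<byte[]> BYTEWISE = Arrays::compareUnsigned;

    private final NavigableMap<byte[], String> store = new TreeMap<>(BYTEWISE);

    /** Same layout as on the slide: key bytes, then timestamp, then sequence number. */
    static byte[] storeKey(String key, long timestamp, int seqnum) {
        byte[] serializedKey = key.getBytes(StandardCharsets.UTF_8);
        return ByteBuffer.allocate(serializedKey.length + Long.BYTES + Integer.BYTES)
            .put(serializedKey)
            .putLong(timestamp)
            .putInt(seqnum)
            .array();
    }

    void put(String key, long timestamp, String value) {
        store.put(storeKey(key, timestamp, 0), value);
    }

    /** All values of the key with a timestamp in [from, to], found by a single range scan. */
    List<String> fetch(String key, long from, long to) {
        byte[] lower = storeKey(key, from, 0);
        byte[] upper = storeKey(key, to, Integer.MAX_VALUE);
        return new ArrayList<>(store.subMap(lower, true, upper, true).values());
    }

    public static void main(String[] args) {
        WindowKeyRangeScan scan = new WindowKeyRangeScan();
        scan.put("k1", 100, "a");
        scan.put("k1", 200, "b");
        scan.put("k1", 300, "c");
        scan.put("k2", 150, "d");
        System.out.println(scan.fetch("k1", 150, 250));
    }
}
```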

Demo: Windowed Joins

  • A "post-scorer" is a bettor who tries to place a winning bet at the very moment the score changes in a match

  • The timestamps of the bet and of the score-change event must "almost coincide".

livebet

Time, Forward!

KStream<String, Bet> bets = streamsBuilder.stream(BET_TOPIC,
    Consumed.with(
            Serdes...)
            .withTimestampExtractor(

                (record, previousTimestamp) ->
                    ((Bet) record.value()).getTimestamp()

            ));

(The timestamp can also be taken from wall-clock time or from the record metadata.)

Demo: Windowed Joins

From the score-change event we work out which bet would be the "correct" one:

Score current = Optional.ofNullable(stateStore.get(key))
                .orElse(new Score());
stateStore.put(key, value.getScore());

Outcome currentOutcome =
    value.getScore().getHome() > current.getHome()
        ? Outcome.H
        : Outcome.A;

Demo: Windowed Joins

KStream<String, String> join = bets.join(outcomes,
    (bet, sureBet) ->
        String.format("%s %dms before goal",
            bet.getBettor(),
            sureBet.getTimestamp() - bet.getTimestamp()),
    JoinWindows.of(Duration.ofSeconds(1)).before(Duration.ZERO),
    StreamJoined.with(Serdes....
    ));
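
The window condition used in this join can be spelled out as a plain predicate: a bet (this side) matches a score-change event (the other side) if the event's timestamp is no earlier than the bet and at most one second later. A minimal sketch of that check, with a hypothetical class and method name:

```java
import java.time.Duration;

final class JoinWindowCheck {
    /** True if the other record's timestamp lies in [thisTs - before, thisTs + after]. */
    static boolean withinWindow(long thisTs, long otherTs, Duration before, Duration after) {
        return otherTs >= thisTs - before.toMillis()
            && otherTs <= thisTs + after.toMillis();
    }

    public static void main(String[] args) {
        Duration before = Duration.ZERO;        // the goal may not precede the bet
        Duration after = Duration.ofSeconds(1); // the goal may follow the bet by up to 1 s
        long betTs = 10_000;
        System.out.println(withinWindow(betTs, 10_400, before, after)); // goal 400 ms after the bet
        System.out.println(withinWindow(betTs, 9_900, before, after));  // goal before the bet
        System.out.println(withinWindow(betTs, 11_500, before, after)); // goal too long after the bet
    }
}
```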

Tumbling window

TimeWindowedKStream<..., ...> windowed =
    stream.groupByKey()
        .windowedBy(TimeWindows.of(Duration.ofSeconds(20)));

Figure: tumbling window. Source: Kafka Streams in Action

Tumbling window

TimeWindowedKStream<..., ...> windowed =
    stream.groupByKey()
        .windowedBy(TimeWindows.of(Duration.ofSeconds(20)));

KTable<Windowed<...>, Long> count = windowed.count();

/*
* Windowed<K> interface:
* - K key()
* - Window window()
* -- Instant startTime()
* -- Instant endTime()
*/
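
What the windowed count computes can be modelled in a few lines of plain Java: each timestamp is mapped to the start of its window, and records are counted per window start. The sketch assumes a single key and non-negative epoch-millisecond timestamps; the names are hypothetical.

```java
import java.util.Map;
import java.util.TreeMap;

final class TumblingWindowCount {
    /** Start of the tumbling window that contains the timestamp. */
    static long windowStart(long timestampMs, long sizeMs) {
        return timestampMs - (timestampMs % sizeMs);
    }

    /** Number of records per window, keyed by window start. */
    static Map<Long, Long> countPerWindow(long[] timestampsMs, long sizeMs) {
        Map<Long, Long> counts = new TreeMap<>();
        for (long ts : timestampsMs) {
            counts.merge(windowStart(ts, sizeMs), 1L, Long::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        long[] timestamps = {1_000, 19_999, 20_000, 45_000};
        System.out.println(countPerWindow(timestamps, 20_000));
    }
}
```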

Hopping Window

TimeWindowedKStream<..., ...> windowed =
    stream.groupByKey()
        .windowedBy(TimeWindows.of(Duration.ofSeconds(20))
                        .advanceBy(Duration.ofSeconds(10)));

Figure: hopping window. Source: Kafka Streams in Action
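
With hopping windows a record belongs to several overlapping windows at once. A minimal sketch of the assignment, assuming windows start at multiples of the advance interval and never before time zero; the class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

final class HoppingWindows {
    /** Start times of all windows [start, start + size) that contain the timestamp. */
    static List<Long> windowStartsFor(long timestampMs, long sizeMs, long advanceMs) {
        List<Long> starts = new ArrayList<>();
        // Earliest window start, aligned to the advance interval, that can still contain the timestamp
        long start = Math.max(0, timestampMs - sizeMs + advanceMs) / advanceMs * advanceMs;
        while (start <= timestampMs) {
            starts.add(start);
            start += advanceMs;
        }
        return starts;
    }

    public static void main(String[] args) {
        // 20-second windows advancing by 10 seconds, as on the slide
        System.out.println(windowStartsFor(25_000, 20_000, 10_000));
        System.out.println(windowStartsFor(5_000, 20_000, 10_000));
    }
}
```

A tumbling window is the special case where the advance equals the window size, so every record falls into exactly one window.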

Session Window

SessionWindowedKStream<..., ...> windowed =
    stream.groupByKey()
        .windowedBy(SessionWindows.with(Duration.ofMinutes(5)));
streams session windows 02
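
Session windows have no fixed size: a session keeps growing while records for the key arrive within the inactivity gap of each other. The sketch below models this for one key and for timestamps that are already sorted; merging of sessions caused by out-of-order records is not modelled. The names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

final class SessionWindowing {
    /** Returns sessions as {start, end} pairs; the input must be sorted in ascending order. */
    static List<long[]> sessions(long[] sortedTimestampsMs, long gapMs) {
        List<long[]> result = new ArrayList<>();
        for (long ts : sortedTimestampsMs) {
            long[] last = result.isEmpty() ? null : result.get(result.size() - 1);
            if (last != null && ts - last[1] <= gapMs) {
                last[1] = ts;                    // within the gap: extend the current session
            } else {
                result.add(new long[] {ts, ts}); // gap exceeded: start a new session
            }
        }
        return result;
    }

    public static void main(String[] args) {
        long fiveMinutes = 5 * 60 * 1000;
        long[] timestamps = {0, 60_000, 120_000, 600_000};
        for (long[] session : sessions(timestamps, fiveMinutes)) {
            System.out.println(session[0] + " .. " + session[1]);
        }
    }
}
```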

Window Retention time vs. Grace Time

window retention
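
The role of the grace period can be sketched as a simple check: a late record is still added to its window as long as stream time has not passed the window end plus the grace period. Retention is a separate setting that controls how long the window data is kept in the store. This is a simplified model with a hypothetical class and method name, not the library's code.

```java
final class GracePeriodCheck {
    /** True if a record for the window ending at windowEndMs is still accepted at the given stream time. */
    static boolean accepts(long windowEndMs, long graceMs, long streamTimeMs) {
        return windowEndMs + graceMs > streamTimeMs;
    }

    public static void main(String[] args) {
        long windowEnd = 20_000; // window [0, 20 s)
        long grace = 5_000;      // 5 s grace period
        System.out.println(accepts(windowEnd, grace, 24_999)); // late, but within grace
        System.out.println(accepts(windowEnd, grace, 25_000)); // grace has elapsed, record is dropped
    }
}
```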

Sometimes you don’t need windows, but Punctuator

metronome
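class MyTransformer implements Transformer<...> {
    @Override
    public void init(ProcessorContext context) {
        // Invoke the callback every 10 seconds of wall-clock time,
        // whether or not new records arrive
        context.schedule(
            Duration.ofSeconds(10),
            PunctuationType.WALL_CLOCK_TIME,
            timestamp -> { ... });
    }

    // transform() and close() omitted
}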

It’s time to wrap up!

Kafka Streams in Action

KSIA
  • William Bejeck,
    “Kafka Streams in Action, Second Edition”, Spring 2023?

  • The first edition is out of date!

Kafka: The Definitive Guide

kafka the definitive guide
  • Gwen Shapira, Todd Palino, Rajini Sivaram, Krit Petty

  • November 2021

Other sources

Communities, conferences

Conclusions

  • The Kafka Streams API is a convenient abstraction over "raw" Kafka

  • To start using it, you need to understand stream processing concepts

  • The technology is evolving rapidly

    • + an active community; you have a chance to influence the process yourself

    • - public interfaces change very quickly

That’s all!

Thanks!