Apache Kafka

what it is and how it will change the architecture of your application

Ivan Ponomarev, Synthesized/MIPT

me
  • Tech Lead at KURS

  • ERP systems & Java background

  • Speaker at JPoint, DevOops, Heisenbug, JUG.MSK, etc.

  • Twitter @inponomarev

Our plan

kafka
  1. What is a log and why is it important

  2. What is Kafka and what can it do

  3. What is streaming architecture and what are stream processors capable of?

  4. Kafka and JavaScript

  5. How to try Kafka today

kafka

What is a log?

oak log
  • Append data to the end

  • Data cannot be changed

  • Read sequentially

 — What is our life?

 — What is our life?

oak log

 — A log!

How to automate a warehouse?

item | bin | qty
X    | A   | 8
X    | B   | 2
Y    | B   | 1

All is well, until…

emptyshelf

What could it be?

  • Stolen (broken, thrown away, etc.)

  • Lost: it’s somewhere nearby

What could it be?

  • Stolen (broken, thrown away, etc.)

  • Lost: it’s somewhere nearby

  • THIS IS YOUR DUMB PROGRAM NOT WORKING

upset cropped

What are we going to do?

  • "Current snapshot"? There’s nothing we can do.

  • "Database is a warehouse" is simple but doesn’t work

messy

How should we actually design the warehouse automation?

noway
  • "A database is a warehouse"

How should we actually design the warehouse automation?

noway
  • "A database is a warehouse"

okay
  • The database reflects the processes.

  • The current state is the result of executing those processes

Warehouse Ledger

date       | item | bin | qty | description
02.04.2020 | X    | A   | 10  | initial qty
02.04.2020 | X    | B   | 2   | initial qty
02.04.2020 | Y    | B   | 1   | initial qty

Warehouse Ledger

date       | item | bin | qty | description
02.04.2020 | X    | A   | 10  | initial qty
02.04.2020 | X    | B   | 2   | initial qty
02.04.2020 | Y    | B   | 1   | initial qty
09.04.2020 | X    | A   | -2  | John Doe on assignment #1

Warehouse Ledger

date       | item | bin | qty | description
02.04.2020 | X    | A   | 10  | initial qty
02.04.2020 | X    | B   | 2   | initial qty
02.04.2020 | Y    | B   | 1   | initial qty
09.04.2020 | X    | A   | -2  | John Doe on assignment #1
09.04.2020 | X    | B   | 2   | John Doe on assignment #1

What if we made a mistake with the accounting?

date       | item | bin | qty | description
09.04.2020 | X    | A   | -2  | John Doe on assignment #1
09.04.2020 | X    | B   | 2   | John Doe on assignment #1
09.04.2020 | X    | A   | 2   | Assignment #1 reversal
09.04.2020 | X    | B   | -2  | Assignment #1 reversal

What questions can already be answered?

  • How much do we have in stock of Y?

  • What is in bin B?

  • How many items did John Doe move on April 9?

  • What adjustments were made to the data?

Investigating the incident

  • On April 9, John Doe was to move the items from A to B.

  • Let’s see what is in A?

  • Let’s ask John?

What requirements can be easily implemented?

  • The load on the shelf is limited to 100 kg

  •  — Add the "weight" field to the Ledger!

  • We need to calculate warehouse workers' salaries

  •  — You don’t even need to add anything.

Solution architecture: the log does not work alone

erp architecture

Intermediate conclusions

The presence of a log allows you to

  • Add new functionality

  • Search for correlations of events, identify and investigate fraudulent behavior

  • Correct algorithmic errors and recalculate data

  • Our life is an append-only log

Our plan

kafka
  1. What is a log and why is it important

  2. What is Kafka and what can it do

  3. What is streaming architecture and what are stream processors capable of?

  4. Kafka and JavaScript

  5. How to try Kafka today

kafka

Our plan

kafka
  1. What is a log and why is it important

  2. What is Kafka and what can it do

    1. General information

    2. How a cluster works

    3. How writing works

    4. How reading works

    5. Retention and compaction

  3. What is streaming architecture and what are stream processors capable of?

  4. Kafka and JavaScript

  5. How to try Kafka today

kafka

Kafka is

kafka logo

In Kafka you can

okay
  • Write something to a named log (topic)

  • Read entries from the topic in FIFO order (within the partition)

  • Commit the offset of the latest processed entry

In Kafka you can’t

noway
  • Erase a record

  • Edit a record

  • Look up an arbitrary record in the log other than by its offset

Our plan

kafka
  1. What is a log and why is it important

  2. What is Kafka and what can it do

    1. General information

    2. How a cluster works

    3. How writing works

    4. How reading works

    5. Retention and compaction

  3. What is streaming architecture and what are stream processors capable of?

  4. Kafka and JavaScript

  5. How to try Kafka today

kafka

Kafka Cluster: Brokers and ZooKeeper

cluster anatomy

Topics, partitions and messages

topics partitions

Topics, partitions and messages

topics partitions1

Topics, partitions and messages

topics partitions2

Replication of partitions

broker topics0

Replication of partitions

broker topics1

Replication of partitions

broker topics2

Replication of partitions

broker topics3

Replication of partitions

broker topics4

Replication of partitions

broker topics5

Our plan

kafka
  1. What is a log and why is it important

  2. What is Kafka and what can it do

    1. General information

    2. How a cluster works

    3. How writing works

    4. How reading works

    5. Retention and compaction

  3. What is streaming architecture and what are stream processors capable of?

  4. Kafka and JavaScript

  5. How to try Kafka today

kafka

Anatomy of a message

message anatomy

Anatomy of a message

message anatomy2
// hash the keyBytes to choose a partition
return Utils.toPositive(Utils.murmur2(keyBytes)) % numPartitions;
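
Below is a minimal producer sketch (the broker address, topic name and key are illustrative assumptions, not from the talk): because the partition is chosen by hashing the key, all records with the same key land in the same partition and keep their relative order.

import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.Producer;
import org.apache.kafka.clients.producer.ProducerConfig;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.serialization.StringSerializer;

Properties props = new Properties();
props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class);
props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, StringSerializer.class);

try (Producer<String, String> producer = new KafkaProducer<>(props)) {
    // "item-moves" is a hypothetical topic; the key "X" determines the partition
    producer.send(new ProducerRecord<>("item-moves", "X", "moved 2 pcs from A to B"));
}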

Throughput vs Latency

batch.size and linger.ms

low throughput
high throughput
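
A sketch of these two knobs (values are illustrative only, extending the hypothetical producer Properties from the previous sketch): a larger batch.size together with a non-zero linger.ms trades a few milliseconds of latency for higher throughput.

// illustrative values, not recommendations
props.put(ProducerConfig.BATCH_SIZE_CONFIG, 65536); // accumulate up to 64 KB per partition batch
props.put(ProducerConfig.LINGER_MS_CONFIG, 20);     // wait up to 20 ms for a batch to fill before sending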

Writing to Kafka

acks = 0

 

prod1

Writing to Kafka

acks = 1

 

prod2

Writing to Kafka

acks = -1

min.insync.replicas

prod3
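
A durability sketch for the same hypothetical producer: acks=all (equivalent to -1) makes the leader wait for the in-sync replicas, and the topic- or broker-level min.insync.replicas setting defines how many of them must confirm the write.

props.put(ProducerConfig.ACKS_CONFIG, "all"); // same as acks=-1
// plus, on the topic or broker side, e.g.: min.insync.replicas=2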

Our plan

kafka
  1. What is a log and why is it important

  2. What is Kafka and what can it do

    1. General information

    2. How a cluster works

    3. How writing works

    4. How reading works

    5. Retention and compaction

  3. What is streaming architecture and what are stream processors capable of?

  4. Kafka and JavaScript

  5. How to try Kafka today

kafka

Reading from Kafka

ConsumerG0

Reading from Kafka

ConsumerG

Reading from Kafka

ConsumerG2

Reading from Kafka

ConsumerG3
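
A minimal consumer sketch (group id, topic name and broker address are assumptions): every instance started with the same group.id joins the same consumer group, and the topic's partitions are divided among the instances.

import java.time.Duration;
import java.util.List;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.serialization.StringDeserializer;

Properties props = new Properties();
props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
props.put(ConsumerConfig.GROUP_ID_CONFIG, "warehouse-processors");
props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class);
props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class);

try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
    consumer.subscribe(List.of("item-moves"));
    while (true) {
        // records arrive in FIFO order within each assigned partition
        for (ConsumerRecord<String, String> record : consumer.poll(Duration.ofMillis(500))) {
            System.out.printf("partition %d, offset %d: %s%n",
                    record.partition(), record.offset(), record.value());
        }
    }
}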

Offset Commit

offcommit1

Offset Commit

offcommit2

Offset Commit

offcommit3

Offset Commit

offcommit4

Offset Commit

offcommit5

Offset Commit

offcommit6

Offset Commit

offcommit7
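
A manual-commit sketch for the same hypothetical consumer: with auto-commit disabled, the offset is committed only after the records have actually been processed, so after a failure the group re-reads from the last committed position instead of losing data.

props.put(ConsumerConfig.ENABLE_AUTO_COMMIT_CONFIG, "false");

ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(500));
for (ConsumerRecord<String, String> record : records) {
    process(record); // process(...) is a hypothetical handler
}
consumer.commitSync(); // commit offsets of everything returned by the last poll()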

Our plan

kafka
  1. What is a log and why is it important

  2. What is Kafka and what can it do

    1. General information

    2. How a cluster works

    3. How writing works

    4. How reading works

    5. Retention and compaction

  3. What is streaming architecture and what are stream processors capable of?

  4. Kafka and JavaScript

  5. How to try Kafka today

kafka

How Retention works

retention0

How Retention works

retention1

How Retention works

retention2

How Retention works

retention3

How Retention works

retention4

How Retention works

retention5
tapeloop
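
A topic-creation sketch using the Admin API (topic name, counts and limits are assumptions): old segments are deleted once they exceed retention.ms or the partition grows beyond retention.bytes, much like a tape loop overwriting itself.

import java.util.List;
import java.util.Map;
import org.apache.kafka.clients.admin.Admin;
import org.apache.kafka.clients.admin.NewTopic;
import org.apache.kafka.common.config.TopicConfig;

try (Admin admin = Admin.create(Map.of("bootstrap.servers", "localhost:9092"))) {
    NewTopic topic = new NewTopic("item-moves", 3, (short) 3) // 3 partitions, 3 replicas
            .configs(Map.of(
                    TopicConfig.RETENTION_MS_CONFIG, "604800000",       // keep data for 7 days...
                    TopicConfig.RETENTION_BYTES_CONFIG, "1073741824")); // ...or up to 1 GiB per partition
    admin.createTopics(List.of(topic)).all().get(); // .get() throws checked exceptions; wrap or declare throws
}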

Topic compaction

log compaction
Source: Kafka Documentation
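
A compaction sketch (the "stock-levels" topic is hypothetical), plugging into the same Admin call as above: with cleanup.policy=compact Kafka keeps at least the latest record for every key, so the topic converges to a snapshot of current values.

NewTopic stockLevels = new NewTopic("stock-levels", 3, (short) 3)
        .configs(Map.of(TopicConfig.CLEANUP_POLICY_CONFIG, TopicConfig.CLEANUP_POLICY_COMPACT));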

Our plan

kafka
  1. What is a log and why is it important

  2. What is Kafka and what can it do

  3. What is streaming architecture and what are stream processors capable of?

  4. Kafka and JavaScript

  5. How to try Kafka today

kafka

Stream data processing architecture

streaming arch1

Existing stream processing frameworks

spark logo
samza logo
storm logo
flink logo
kafka logo

When I am asked which streaming framework to use

weuseflink1

When I am asked which streaming framework to use

weuseflink2

Stateless Transformation

yelling topology1.png
KStream<String, String> stream = streamsBuilder.stream(
     SRC_TOPIC, Consumed.with(Serdes.String(), Serdes.String()));

Stateless Transformation

yelling topology2.png
KStream<String, String> upperCasedStream =
    stream.mapValues(String::toUpperCase);

Stateless Transformation

yelling topology3.png
upperCasedStream.to(SINK_TOPIC,
     Produced.with(Serdes.String(), Serdes.String()));
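
A sketch of how the three snippets are wired into a running application (broker address and application.id value are assumptions; SRC_TOPIC and SINK_TOPIC are the constants used above). Every instance started with the same application.id joins the same consumer group and shares the partitions.

import java.util.Properties;
import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.KafkaStreams;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.StreamsConfig;
import org.apache.kafka.streams.kstream.Consumed;
import org.apache.kafka.streams.kstream.KStream;
import org.apache.kafka.streams.kstream.Produced;

Properties config = new Properties();
config.put(StreamsConfig.APPLICATION_ID_CONFIG, "yelling-app");
config.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");

StreamsBuilder streamsBuilder = new StreamsBuilder();
KStream<String, String> stream =
        streamsBuilder.stream(SRC_TOPIC, Consumed.with(Serdes.String(), Serdes.String()));
stream.mapValues(String::toUpperCase)
      .to(SINK_TOPIC, Produced.with(Serdes.String(), Serdes.String()));

KafkaStreams streams = new KafkaStreams(streamsBuilder.build(), config);
streams.start();
Runtime.getRuntime().addShutdownHook(new Thread(streams::close));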

Three lines of code, and what’s the big deal?

  • More messages per second? — more instances with the same 'application.id'!

w1

Adding nodes

w2

Limited only by the number of partitions

w4

Magic of Stateful Transformation

counting topology changelog1.png

Changes are replicated to a topic!

counting topology changelog2.png
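
A counting sketch (topic names are assumptions): count() keeps its state in a local store, and every update is also written to an internal changelog topic, so a restarted or relocated instance can rebuild its state from Kafka itself.

import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.kstream.Consumed;
import org.apache.kafka.streams.kstream.Grouped;
import org.apache.kafka.streams.kstream.KTable;
import org.apache.kafka.streams.kstream.Materialized;
import org.apache.kafka.streams.kstream.Produced;

StreamsBuilder builder = new StreamsBuilder();
KTable<String, Long> counts = builder
        .stream("words", Consumed.with(Serdes.String(), Serdes.String()))
        .groupBy((key, word) -> word, Grouped.with(Serdes.String(), Serdes.String()))
        .count(Materialized.as("word-counts")); // local store backed by a changelog topic
counts.toStream().to("word-counts-output", Produced.with(Serdes.String(), Serdes.Long()));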

Partitioning and local state

local partitioning oneworker.png

Partitioning and local state

local partitioning 1.png

Partitioning and local state

local partitioning 2.png

Partitioning and local state

local partitioning 25.png

Partitioning and local state

local partitioning 3.png

Partitioning and local state

local partitioning 4.png

Partitioning and local state

local partitioning 5.png

Partitioning and local state

local partitioning 6.png

What else can streams do?

Join sources!

join storages.png
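
A join sketch (topic names and String values are assumptions; builder is the same StreamsBuilder as in the counting sketch): a stream of page views is enriched with the latest user record, read as a KTable from a compacted topic. Both topics must be co-partitioned, i.e. keyed the same way with the same number of partitions.

KStream<String, String> pageviews =
        builder.stream("pageviews", Consumed.with(Serdes.String(), Serdes.String()));
KTable<String, String> users =
        builder.table("users", Consumed.with(Serdes.String(), Serdes.String()));

KStream<String, String> enriched = pageviews.leftJoin(users,
        (view, user) -> view + " by " + (user == null ? "unknown user" : user));
enriched.to("pageviews_enriched", Produced.with(Serdes.String(), Serdes.String()));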

Aggregate data in time windows

tumbling window
Source: Kafka Streams in Action
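
A windowed-aggregation sketch (assuming the enriched stream from the join sketch is re-keyed by region; extractRegion is a hypothetical helper): page views are counted in tumbling 30-second windows, the same shape as the KSQL query on the next slides.

import java.time.Duration;
import org.apache.kafka.streams.kstream.TimeWindows;
import org.apache.kafka.streams.kstream.Windowed;

KTable<Windowed<String>, Long> viewsPerRegion = enriched
        .selectKey((key, view) -> extractRegion(view))          // extractRegion(...) is hypothetical
        .groupByKey(Grouped.with(Serdes.String(), Serdes.String()))
        .windowedBy(TimeWindows.ofSizeWithNoGrace(Duration.ofSeconds(30))) // Kafka 3.0+ API
        .count();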

KSQL

CREATE STREAM pageviews_enriched AS
  SELECT pv.viewtime,
         pv.userid AS userid,
         pv.pageid,
         pv.timestring,
         u.gender,
         u.regionid,
         u.interests,
         u.contactinfo
  FROM pageviews_transformed pv
  LEFT JOIN users_5part u ON pv.userid = u.userid
  EMIT CHANGES;

KSQL

CREATE TABLE pageviews_per_region_per_30secs AS
  SELECT regionid,
         count(*)
  FROM pageviews_enriched
  WINDOW TUMBLING (SIZE 30 SECONDS)
  WHERE UCASE(gender)='FEMALE' AND LCASE(regionid)='region_6'
  GROUP BY regionid
  EMIT CHANGES;

Where are streaming systems needed?

  • Monitoring! Logs!

  • Track user activity

  • Anomaly detection (including fraud detection)

streams ok
tapeloop
streams noway

Things to keep in mind

  • Changing the data schema is nothing like an RDBMS migration.

  • Exactly-once delivery:

    • In normal (at-least-once) mode, a failure in Kafka Streams causes data to be re-read.

    • Exactly-once mode applies only when data is moved between Kafka topics.
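
A configuration sketch (using the config Properties from the earlier Streams example): when both input and output are Kafka topics, the processing guarantee can be switched from the default at-least-once.

// "exactly_once_v2" is the Kafka 3.0+ value; older releases use StreamsConfig.EXACTLY_ONCE
config.put(StreamsConfig.PROCESSING_GUARANTEE_CONFIG, StreamsConfig.EXACTLY_ONCE_V2);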

If you need to store data all the time

  • Lambda architecture

nikita salnikov

Nikita Salnikov-Tarnovsky, "Streaming application is not only code, but also 3-4 years of support in the product" (https://2019.jokerconf.com/2019/talks/2qw2ljhlfoeiipjf0gfzzb/)

Our plan

kafka
  1. What is a log and why is it important

  2. What is Kafka and what can it do

  3. What is streaming architecture and what are stream processors capable of?

  4. Kafka and JavaScript

  5. How to try Kafka today

kafka

Kafka and Javascript

streaming arch2

Kafka and Javascript

streaming arch3

kafka-node

kafka node
  • Leader in GitHub stars and usage

  • Pure JavaScript implementation

  • Limited feature set

node-rdkafka

node rdkafka
  • GitHub metrics are inferior to those of kafka-node

  • JavaScript/C++

  • Wrapper around librdkafka, which is a very mature project

  • Able to work with Confluent Cloud

Transformation/output stage in Node.js?

streaming arch4

Transformation/output stage in Node.js?

nodefluent
  • node-sinek — another client

  • kafka-streams — KStreams re-implementation for node.js

  • kafka-connect — a re-implementation of Kafka Connect

KSQL + Serverless

streaming arch5

Our plan

kafka
  1. What is a log and why is it important

  2. What is Kafka and what can it do

  3. What is streaming architecture and what are stream processors capable of?

  4. Kafka and JavaScript

  5. How to try Kafka today

kafka

You decided to try Kafka. Where to start?

'kafkacat' is the best CLI tool

conduktor

Conduktor — the best GUI tool

conduktor screen

It’s hard to run Kafka in production

koshelev

Kafka: The Definitive Guide

kafka the definitive guide
  • Gwen Shapira, Todd Palino, Rajini Sivaram, Krit Petty

  • November 2021

Communities and conferences

kafka summit

Telegram

Grefnevaya Kafka

Meetup in Moscow

Moscow Apache Kafka® Meetup by Confluent — quarterly

That’s all!

kafka logo
conduktor

Thanks!