@inponomarev
Ivan Ponomarev, Synthesized/MIPT
Append data to the end
Data cannot be changed
Read sequentially
— A log!
item | bin | qty |
---|---|---|
X | A | 8 |
X | B | 2 |
Y | B | 1 |
"Current snapshot"? There’s nothing we can do.
"Database is a warehouse" is simple but doesn’t work
date | item | bin | qty | description |
---|---|---|---|---|
02.04.2020 | X | A | 10 | initial qty |
02.04.2020 | X | B | 2 | initial qty |
02.04.2020 | Y | B | 1 | initial qty |
date | item | bin | qty | description |
---|---|---|---|---|
02.04.2020 | X | A | 10 | initial qty |
02.04.2020 | X | B | 2 | initial qty |
02.04.2020 | Y | B | 1 | initial qty |
09.04.2020 | X | A | -2 | John Doe on assignment #1 |
date | item | bin | qty | description |
---|---|---|---|---|
02.04.2020 | X | A | 10 | initial qty |
02.04.2020 | X | B | 2 | initial qty |
02.04.2020 | Y | B | 1 | initial qty |
09.04.2020 | X | A | -2 | John Doe on assignment #1 |
09.04.2020 | X | B | 2 | John Doe on assignment #1 |
date | item | bin | qty | description |
---|---|---|---|---|
09.04.2020 | X | A | -2 | John Doe on assignment #1 |
09.04.2020 | X | B | 2 | John Doe on assignment #1 |
09.04.2020 | X | A | 2 | Assignment #1 reversal |
09.04.2020 | X | B | -2 | Assignment #1 reversal |
How much do we have in stock of Y?
What is in bin B?
How many items did John Doe move on April 9?
What adjustments were made to the data?
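All of these questions are answered by folding the log. A minimal sketch of that idea in plain Java (the LedgerRecord type and the in-memory list are hypothetical, just mirroring the table above):

```java
import java.time.LocalDate;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// Hypothetical in-memory representation of the ledger rows above
record LedgerRecord(LocalDate date, String item, String bin, int qty, String description) {}

class LedgerQueries {
    // "How much do we have in stock of Y?" -- sum every qty entry for the item
    static int stockOf(List<LedgerRecord> ledger, String item) {
        return ledger.stream()
                .filter(r -> r.item().equals(item))
                .mapToInt(LedgerRecord::qty)
                .sum();
    }

    // "What is in bin B?" -- per-item totals of all entries touching the bin
    static Map<String, Integer> contentsOf(List<LedgerRecord> ledger, String bin) {
        return ledger.stream()
                .filter(r -> r.bin().equals(bin))
                .collect(Collectors.groupingBy(LedgerRecord::item,
                        Collectors.summingInt(LedgerRecord::qty)));
    }
}
```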
On April 9, John Doe was to move the items from A to B.
Should we look at what is in A?
Should we ask John?
The load on the shelf is limited to 100 kg
— Add the "weight" field to the Ledger!
It is necessary to calculate the salary of warehouse workers
— You don’t even need to add anything.
Having a log allows you to:
Add new functionality
Search for correlations of events, identify and investigate fraudulent behavior
Correct algorithmic errors and recalculate data
Our life is an append-only log
// hash the keyBytes to choose a partition
return Utils.toPositive(Utils.murmur2(keyBytes)) % numPartitions;
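Because the partition is derived from the key hash, all records with the same key land in the same partition and keep their relative order. A minimal producer sketch illustrating this (the broker address and the 'ledger' topic are assumptions):

```java
import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.serialization.StringSerializer;

public class KeyedProducerSketch {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");          // assumed local broker
        props.put("key.serializer", StringSerializer.class.getName());
        props.put("value.serializer", StringSerializer.class.getName());

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            // Both records share key "X", so murmur2(key) % numPartitions
            // sends them to the same partition, preserving their relative order
            producer.send(new ProducerRecord<>("ledger", "X", "A: -2"));
            producer.send(new ProducerRecord<>("ledger", "X", "B: +2"));
        }
    }
}
```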
batch.size and linger.ms
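The producer ships a batch either when batch.size bytes have accumulated for a partition or when linger.ms has elapsed, whichever comes first. A hedged illustration, extending the producer properties from the sketch above (the values are examples, not recommendations):

```java
props.put("batch.size", "65536"); // bytes buffered per partition before a send
props.put("linger.ms", "20");     // wait up to 20 ms for a batch to fill up
```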
acks = 0
acks = 1
acks = -1
min.insync.replicas
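A hedged sketch of these durability settings: acks is set on the producer, min.insync.replicas on the broker or topic (values are examples):

```java
// acks = 0   -- fire and forget, no acknowledgement
// acks = 1   -- the partition leader has written the record
// acks = all (same as -1) -- all in-sync replicas have written the record
props.put("acks", "all");

// Broker/topic side: with acks=all a write is rejected unless at least
// min.insync.replicas replicas are in sync, e.g. min.insync.replicas=2
```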
KStream<String, String> stream = streamsBuilder.stream(
    SRC_TOPIC, Consumed.with(Serdes.String(), Serdes.String()));
KStream<String, String> upperCasedStream =
    stream.mapValues(String::toUpperCase);
upperCasedStream.to(SINK_TOPIC,
    Produced.with(Serdes.String(), Serdes.String()));
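To actually run this topology, roughly the following wiring is needed (the application id and broker address are placeholders):

```java
import java.util.Properties;
import org.apache.kafka.streams.KafkaStreams;
import org.apache.kafka.streams.StreamsConfig;

Properties props = new Properties();
props.put(StreamsConfig.APPLICATION_ID_CONFIG, "uppercase-app");     // placeholder id
props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // assumed broker

KafkaStreams streams = new KafkaStreams(streamsBuilder.build(), props);
streams.start();
Runtime.getRuntime().addShutdownHook(new Thread(streams::close));
```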
More messages per second? — more instances with the same 'application.id'!
Join sources!
image::tumbling-window.png[width="70%"]

Source: Kafka Streams in Action
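The same kind of tumbling window can be written in the Kafka Streams DSL; a minimal sketch, assuming a 'pageviews' topic with String keys and values:

```java
import java.time.Duration;
import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.kstream.*;

// Count events per key in back-to-back, non-overlapping 30-second windows
KTable<Windowed<String>, Long> counts = streamsBuilder
        .stream("pageviews", Consumed.with(Serdes.String(), Serdes.String()))
        .groupByKey()
        .windowedBy(TimeWindows.of(Duration.ofSeconds(30)))
        .count();
```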
CREATE STREAM pageviews_enriched AS
SELECT pv.viewtime,
pv.userid AS userid,
pv.pageid,
pv.timestring,
u.gender,
u.regionid,
u.interests,
u.contactinfo
FROM pageviews_transformed pv
LEFT JOIN users_5part u ON pv.userid = u.userid
EMIT CHANGES;
CREATE TABLE pageviews_per_region_per_30secs AS
SELECT regionid,
count(*)
FROM pageviews_enriched
WINDOW TUMBLING (SIZE 30 SECONDS)
WHERE UCASE(gender)='FEMALE' AND LCASE(regionid)='region_6'
GROUP BY regionid
EMIT CHANGES;
Monitoring! Logs!
Track user activity
Anomaly detection (including fraud detection)
If you change the data schema, migration is not like in an RDBMS.
Exactly-once delivery:
In normal (at-least-once) mode, a Kafka Streams failure causes data to be read and processed again.
Exactly-once mode is possible only when data moves between Kafka topics.
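Enabling it is a single Kafka Streams setting; a minimal sketch (added to the streams properties shown earlier):

```java
// Reads, state updates and writes are then committed in one Kafka transaction;
// this only works when both the source and the sink are Kafka topics
props.put(StreamsConfig.PROCESSING_GUARANTEE_CONFIG, StreamsConfig.EXACTLY_ONCE);
```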
Lambda architecture
Nikita Salnikov-Tarnovsky: https://2019.jokerconf.com/2019/talks/2qw2ljhlfoeiipjf0gfzzb/[Streaming application is not only code, but also 3-4 years of support in the product]
Leader in stars and number of uses
Pure JavaScript implementation
Limited number of features.
GitHub metrics are inferior compared to kafka-node's
JavaScript/C++
Wrapper around librdkafka, which is a very mature project
Able to work with Confluent Cloud
'kafkacat' is the best CLI tool
Conduktor — the best GUI tool
Kafka Summit Conference: https://kafka-summit.org/
Grefnevaya Kafka
Moscow Apache Kafka® Meetup by Confluent — quarterly
Thanks!