A lightweight stream processing application that transforms raw motion and collision events from a physics simulation into actionable per-ball analytics using Kafka Streams

Project Overview

Motion Event Analytics Kstreams is a Java-based stream processing application built on Kafka Streams with Spring Boot. It consumes position and collision events generated by a physics simulation, processes them in real-time, and emits per-ball analytics to a Kafka topic.

The project bridges the gap between event generation and actionable insights. Rather than simply passing through raw data, it computes meaningful metrics such as current speed, average speed over sliding windows, and collision counts — enabling downstream monitoring, dashboards, and alerting systems. Unlike the Apache Flink version of this project, everything runs in a standalone JVM process with no cluster overhead.

Architecture

The application is organized following the Single Responsibility Principle with a clear package structure:

MotionAnalyticsStream (com.simulation.analytics)

The Spring Boot entry point that enables Kafka Streams via @EnableKafkaStreams and binds configuration properties. The main method boots the entire application context, which automatically starts the Kafka Streams topology.

AppProperties (com.simulation.analytics)

A configuration properties class annotated with @ConfigurationProperties(prefix = "app"). It reads topic names from environment variables via application.yml with sensible defaults — motion-position, motion-collision, and motion-ball-analytics.

AnalyticsTopology (com.simulation.analytics.config)

This is where the entire stream processing graph is defined. It builds a Kafka Streams topology using the StreamsBuilder DSL — two source streams consume from motion-position and motion-collision using StringSerde, JSON is parsed manually with Jackson ObjectMapper inside flatMap operators, position events compute speed via Math.hypot(vx, vy), and collision events fan out each collision into two MetricUpdate records (one per ball). Both streams are merged, grouped by a composite key (sessionId|ballId), and aggregated into a persistent KeyValueStore named "analytics-store". The aggregated state is then projected into BallAnalytics records and written to the output topic.

MetricUpdate (com.simulation.analytics.model)

A unified data model that represents metric updates from both position and collision events. Position updates carry coordinates and speed; collision updates carry a boolean flag and no positional data. This lets the pipeline treat both event types uniformly once they are converted.

AnalyticsState (com.simulation.analytics.model)

The per-key state container that holds everything needed for each ball: latest position, current speed, a deque of speed samples for the 10-second average, and a deque of collision timestamps for the 30-second count. Sliding windows are implemented manually — on every apply() call, old samples are trimmed from the front of each deque based on the current timestamp.

SpeedSample (com.simulation.analytics.model)

A simple serializable data point storing a speed measurement with its timestamp. These are collected inside AnalyticsState for the sliding average speed calculation.

JsonSerde (com.simulation.analytics.serde)

A generic Kafka Streams Serde implementation that uses Jackson ObjectMapper for JSON serialization and deserialization. It handles both the state store value and the output topic.

Data Models (com.simulation.analytics.model)

A handful of straightforward Java records: PositionEvent and CollisionEvent represent the incoming Kafka messages, while BallAnalytics is the output record that gets published back to Kafka with all computed metrics.

Data Flow

+---------------------+     +----------+     +---------------------------+     +---------------------+
|  JavaFX Simulation  |---->|  Kafka   |---->|  Kafka Streams (this)     |---->|  Kafka Analytics    |
|  (motion-event-     |     |          |     |                           |     |  Topic              |
|   producer)         |     | motion-  |     | - flatMap both streams    |     |  motion-ball-       |
|                     |     | position |     | - Composite key           |     |  analytics          |
|  Generates:         |     | motion-  |     |   sessionId|ballId        |     |                     |
|  - Position events  |     | collision|     | - aggregate() state       |     |  Consumers:         |
|  - Collision events |     +----------+     | - 10s avg speed           |     |  - Monitoring       |
+---------------------+                      | - 30s collision count     |     |  - Dashboards       |
                                             +---------------------------+     |  - Alerting         |
                                                                               +---------------------+

Processing Pipeline

The topology defined in AnalyticsTopology assembles Kafka Streams DSL operators into a processing graph. It consumes PositionEvent records from the motion-position Kafka topic using StringSerde, with JSON parsed manually in a flatMap operator. It consumes CollisionEvent records from the motion-collision topic the same way. Both event types are converted into a unified MetricUpdate stream — position events include speed computed via Math.hypot(vx, vy), and collision events produce two updates (one per ball).

The stream is keyed by sessionId|ballId for per-ball state management, grouped by key, and aggregated into AnalyticsState using a materialized KeyValueStore named "analytics-store".

Each call to AnalyticsState.apply() updates the position, appends speed or collision data, and trims samples older than the sliding window thresholds (10 seconds for speed, 30 seconds for collisions). The aggregated state is then projected into BallAnalytics records and written to the motion-ball-analytics Kafka topic. A custom JsonSerde handles serialization for both the state store and the output topic.

Metrics Emitted

Each output record contains:

  • sessionId / ballId — identifies the ball and simulation session
  • timestampMs — event timestamp
  • latestX / latestY — most recent position
  • currentSpeed — instantaneous speed (m/s)
  • avgSpeed10s — average speed over the last 10 seconds
  • collisionsCount30s — number of collisions in the last 30 seconds

Key Differences from the Apache Flink Version

While both versions solve the same problem, they differ significantly in their approach. The Flink version runs on a Flink cluster with dedicated JobManager and TaskManager processes, uses Flink's KeyedProcessFunction with managed state, and relies on Flink's built-in SlidingEventTimeWindows for windowed aggregations.

The Kafka Streams version, by contrast, runs as a standalone JVM process (java -jar or Docker), uses the Kafka Streams aggregate() operator with a persistent KeyValueStore, and implements sliding windows manually by trimming deques inside AnalyticsState. Serialization in the Flink version uses Flink's type system and POJO serializers, while the Kafka Streams version uses a custom JsonSerde built on Jackson.

Configuration shifts from pure environment variables to Spring Boot's application.yml with env-var overrides, making the Kafka Streams version feel more like a conventional microservice.

Prerequisites

  • Java 21 (required, as specified by maven.compiler.release=21 in pom.xml)
  • Kafka Streams(bundled as Maven dependencies)
  • Apache Kafka (external, for event streaming)
  • Maven (for building the project)

Clone and Build

Package the fat JAR using Maven:

git clone https://github.com/wagnerjfr/motion-event-analytics-kstreams.git
cd motion-event-analytics-kstreams
mvn -DskipTests clean package

The output JAR is created at ./target/motion-event-analytics-kstreams.jar.

Configuration

All settings are configured in src/main/resources/application.yml with environment variable overrides:

spring:
  kafka:
    streams:
      application-id: motion-analytics-kstreams
      bootstrap-servers: ${KAFKA_BOOTSTRAP_SERVERS:localhost:9092}
      properties:
        auto.offset.reset: latest
        cache.max.bytes.buffering: 0
        commit.interval.ms: 100
app:
  topics:
    position: ${KAFKA_TOPIC_POSITION:motion-position}
    collision: ${KAFKA_TOPIC_COLLISION:motion-collision}
    analytics: ${KAFKA_TOPIC_ANALYTICS:motion-ball-analytics}

Non-default Kafka Streams properties include auto.offset.reset: latest (only new messages consumed), cache.max.bytes.buffering: 0 (no caching for minimal latency), and commit.interval.ms: 100 (frequent offset commits). The application ID is motion-analytics-kstreams.

Infrastructure Setup

The project reuses the same Kafka infrastructure as the Flink version. No cluster is needed — just a Kafka broker and the application JAR.

Create the Docker network (one-time setup)

docker network create kafka-net

Start Kafka (with dual listeners for host and Docker access)

docker run -d --name kafka --network kafka-net -p 9092:9092 \
  -e KAFKA_NODE_ID=1 \
  -e KAFKA_PROCESS_ROLES=broker,controller \
  -e KAFKA_LISTENERS=PLAINTEXT_HOST://:9092,PLAINTEXT_DOCKER://:29092,CONTROLLER://:9093 \
  -e KAFKA_ADVERTISED_LISTENERS=PLAINTEXT_HOST://localhost:9092,PLAINTEXT_DOCKER://kafka:29092 \
  -e KAFKA_INTER_BROKER_LISTENER_NAME=PLAINTEXT_DOCKER \
  -e KAFKA_CONTROLLER_LISTENER_NAMES=CONTROLLER \
  -e KAFKA_LISTENER_SECURITY_PROTOCOL_MAP=CONTROLLER:PLAINTEXT,PLAINTEXT_HOST:PLAINTEXT,PLAINTEXT_DOCKER:PLAINTEXT \
  -e KAFKA_CONTROLLER_QUORUM_VOTERS=1@kafka:9093 \
  -e KAFKA_OFFSETS_TOPIC_REPLICATION_FACTOR=1 \
  -e KAFKA_TRANSACTION_STATE_LOG_REPLICATION_FACTOR=1 \
  -e KAFKA_TRANSACTION_STATE_LOG_MIN_ISR=1 \
  -e KAFKA_GROUP_INITIAL_REBALANCE_DELAY_MS=0 \
  apache/kafka:3.8.0

Create required Kafka topics

docker exec -it kafka /opt/kafka/bin/kafka-topics.sh --create --topic motion-position \
  --bootstrap-server localhost:9092 --partitions 3 --replication-factor 1
docker exec -it kafka /opt/kafka/bin/kafka-topics.sh --create --topic motion-collision \
  --bootstrap-server localhost:9092 --partitions 3 --replication-factor 1

Run the Kafka Streams application (from host)

KAFKA_BOOTSTRAP_SERVERS=localhost:9092 java -jar target/motion-event-analytics-kstreams.jar

Or using Docker:

docker build -t motion-event-analytics-kstreams .
docker run -d --name motion-analytics --network kafka-net \
  -e KAFKA_BOOTSTRAP_SERVERS=kafka:29092 motion-event-analytics-kstreams

Generate events with the producer

Clone and run the motion-event-producer project:

git clone git@github.com:wagnerjfr/motion-event-producer.git
cd motion-event-producer
mvn -DskipTests compile

Run the producer:

APP_TRANSPORT_MODE=kafka \
KAFKA_BOOTSTRAP_SERVERS=localhost:9092 \
mvn javafx:run

Verify analytics output

docker exec -it kafka /opt/kafka/bin/kafka-console-consumer.sh --topic motion-ball-analytics \
  --bootstrap-server localhost:9092 --from-beginning

Output:

None

Why Kafka Streams?

Kafka Streams is ideal for this use case because it provides:

No cluster required: Unlike Flink, which needs a JobManager and TaskManager cluster, Kafka Streams runs as a library inside any JVM process. Deployment is as simple as java -jar or a Docker container.

Stateful processing via state stores: Tracking per-ball state (latest position, speed history, collision history) requires maintaining state across events. Kafka Streams' persistent KeyValueStore handles this reliably with local RocksDB-backed storage and changelog topics for fault tolerance.

Native Kafka integration: Kafka Streams builds directly on the Kafka consumer and producer APIs. No separate connector or bridge is needed — the topology reads from and writes to Kafka topics natively.

Embeddable and ops-friendly: Spring Boot integration makes this application look like any other microservice. Operations teams can monitor it with existing tools — health checks, metrics, logging — without learning a new cluster management interface.

Low-latency processing: With caching disabled and frequent offset commits, the application processes events with minimal delay. Events generated by the simulation at ~100ms intervals are reflected in analytics output within milliseconds.

Exactly-once semantics: Kafka Streams provides exactly-once processing guarantees out of the box, ensuring that each event is processed precisely once even in the face of failures or restarts.

Further Readings

📣 Call to Action

If you are interested in following along with my journey, I invite you to dive into all the details provided below:

Thanks for reading