Streams Developer Guide on Apache Kafka

https://kafka.apache.org/43/streams/developer-guide/

Writing a Streams Application
https://kafka.apache.org/43/streams/developer-guide/write-streams-app/

Any Java or Scala application that makes use of the Kafka Streams library is considered a Kafka Streams application. The computational logic of a Kafka Streams application is defined as a processor topology, which is a graph of stream processors (nodes) and streams (edges). You can define the processor topology with the Kafka Streams APIs:

Kafka Streams DSL: a high-level API that provides the most common data transformation operations such as map, filter, join, and aggregations out of the box.

Configuring a Streams Application
https://kafka.apache.org/43/streams/developer-guide/config-streams/

Kafka and Kafka Streams configuration options must be configured before using Streams. You can configure Kafka Streams by specifying parameters in a java.util.Properties instance. Create a java.util.Properties instance, then set the parameters. For example:

    import java.util.Properties;
    import org.apache.kafka.streams.StreamsConfig;

    Properties settings = new Properties();
    // Set a few key parameters
    settings.put(StreamsConfig.APPLICATION_ID_CONFIG, "my-first-streams-application");
    settings.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "kafka-broker1:9092");
    // Any further settings
    settings.put(... , ...);

Configuration parameter reference

This section contains the most common Streams configuration parameters.

Streams DSL
https://kafka.apache.org/43/streams/developer-guide/dsl-api/

The Kafka Streams DSL (Domain Specific Language) is built on top of the Streams Processor API. It is recommended for most users, especially beginners. Most data processing operations can be expressed in just a few lines of DSL code.

Contents:

* Aggregating
* Joining
* Join co-partitioning requirements
* KStream-KStream Join
* KTable-KTable Equi-Join
* KTable-KTable Foreign-Key Join
* KStream-KTable Join
* KStream-GlobalKTable Join
* Windowing
* Hopping time windows
* Tumbling time windows
* Sliding time windows
* Session Windows
* Window Final Results
* Applying processors (Processor API integration)
* Transformers removal and migration to processors
* Naming Operators in a Streams DSL application
* Controlling KTable update rate
* Using timestamp-based semantics for table processors
* Writing streams back to Kafka
* Testing a Streams application
* Kafka Streams DSL for Scala
* Sample Usage
* Implicit Serdes
* User-Defined Serdes

Overview

In comparison to the Processor API, only the DSL supports:

Processor API
https://kafka.apache.org/43/streams/developer-guide/processor-api/

The Processor API allows developers to define and connect custom processors and to interact with state stores. With the Processor API, you can define arbitrary stream processors that process one received record at a time, and connect these processors with their associated state stores to compose the processor topology that represents a customized processing logic.
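
To make the "one received record at a time, backed by a state store" idea concrete, here is a toy model in plain Java. It deliberately does not use any Kafka Streams classes: `ToyProcessorSketch`, `CountingProcessor`, `process`, and `countFor` are hypothetical names, and a `HashMap` merely stands in for a state store. Treat it as an illustration of the processing model, not of the real Processor API.

```java
import java.util.HashMap;
import java.util.Map;

public class ToyProcessorSketch {

    /** Hypothetical stand-in for a stateful processor; not a Kafka Streams class. */
    static final class CountingProcessor {
        // A plain in-memory map plays the role of the state store.
        private final Map<String, Long> store = new HashMap<>();

        /** Handles exactly one record and returns the updated count for its key. */
        long process(String key, String value) {
            // The value is ignored here; only occurrences of the key are counted.
            return store.merge(key, 1L, Long::sum);
        }

        /** Reads the current count for a key, or 0 if the key was never seen. */
        long countFor(String key) {
            return store.getOrDefault(key, 0L);
        }
    }

    public static void main(String[] args) {
        CountingProcessor processor = new CountingProcessor();
        // Records are fed in one at a time, as a topology would deliver them.
        for (String key : new String[] {"a", "b", "a"}) {
            System.out.println(key + " -> " + processor.process(key, "ignored"));
        }
    }
}
```

In a real topology the map would be replaced by a state store attached to the processor, and the runtime, not a loop, would deliver the records.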

Overview

The Processor API can be used to implement both stateless and stateful operations; the latter are achieved through the use of state stores.

Naming Operators in a Streams DSL application
https://kafka.apache.org/43/streams/developer-guide/dsl-topology-naming/

You can now give names to processors when using the Kafka Streams DSL. In the Processor API (PAPI), there are Processors and State Stores, and you are required to name each one explicitly. At the DSL layer, there are operators. A single DSL operator may compile down to multiple Processors and State Stores and, if required, repartition topics. With the Kafka Streams DSL, however, all these names are generated for you.

Data Types and Serialization
https://kafka.apache.org/43/streams/developer-guide/datatypes/

Every Kafka Streams application must provide Serdes (Serializer/Deserializer) for the data types of record keys and record values (e.g. java.lang.String) to materialize the data when necessary. Operations that require such Serdes information include: stream(), table(), to(), repartition(), groupByKey(), groupBy(). You can provide Serdes by using either of these methods, but you must use at least one:

* By setting default Serdes in the java.util.Properties config instance.
* By specifying explicit Serdes when calling the appropriate API methods, thus overriding the defaults.

Testing a Streams Application
https://kafka.apache.org/43/streams/developer-guide/testing/

Importing the test utilities

To test a Kafka Streams application, Kafka provides a test-utils artifact that can be added as a regular dependency to your test code base.

Example pom.xml snippet when using Maven:

    <dependency>
        <groupId>org.apache.kafka</groupId>
        <artifactId>kafka-streams-test-utils</artifactId>
        <version>4.3.0</version>
        <scope>test</scope>
    </dependency>

Testing a Streams application

The test-utils package provides a TopologyTestDriver that can be used to pipe data through a Topology that is assembled either manually using the Processor API or via the DSL using StreamsBuilder.

Interactive Queries
https://kafka.apache.org/43/streams/developer-guide/interactive-queries/

Interactive queries allow you to leverage the state of your application from outside your application. Kafka Streams enables your applications to be queryable. The full state of your application is typically split across many distributed instances of your application, and across many state stores that are managed locally by these application instances. There are local and remote components to interactively querying the state of your application.

Local state

An application instance can query the locally managed portion of the state and directly query its own local state stores.

Memory Management
https://kafka.apache.org/43/streams/developer-guide/memory-mgmt/

You can specify the total memory (RAM) size used for internal caching and compacting of records. This caching happens before the records are written to state stores or forwarded downstream to other nodes. The record caches are implemented slightly differently in the DSL and the Processor API.
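
As a hedged illustration of sizing the record cache, the sketch below builds a java.util.Properties instance with literal configuration keys so that it runs without the Kafka libraries on the classpath. It assumes the cache is controlled by the `statestore.cache.max.bytes` key (older releases used `cache.max.bytes.buffering`), so check the configuration reference for your version. `CacheConfigSketch` and `withRecordCache` are hypothetical names, and the 10 MB figure is arbitrary.

```java
import java.util.Properties;

public class CacheConfigSketch {

    // Assumed key name for the total record cache size; verify against your Kafka version.
    static final String CACHE_KEY = "statestore.cache.max.bytes";

    /** Builds Streams settings with the record cache limited to cacheBytes. */
    static Properties withRecordCache(long cacheBytes) {
        Properties settings = new Properties();
        // Same application id and bootstrap server as the configuration example in this guide.
        settings.put("application.id", "my-first-streams-application");
        settings.put("bootstrap.servers", "kafka-broker1:9092");
        // Total bytes available for record caching in this application instance.
        settings.put(CACHE_KEY, cacheBytes);
        return settings;
    }

    public static void main(String[] args) {
        Properties settings = withRecordCache(10 * 1024 * 1024L);
        System.out.println(CACHE_KEY + " = " + settings.get(CACHE_KEY));
    }
}
```

In an application that depends on the Kafka Streams library, the constants defined on StreamsConfig would normally be used instead of literal strings.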

Record caches in the DSL

You can specify the total memory (RAM) size of the record cache for an instance of the processing topology.

Running Streams Applications
https://kafka.apache.org/43/streams/developer-guide/running-app/

You can run Java applications that use the Kafka Streams library without any additional configuration or requirements. Kafka Streams also provides the ability to receive notification of the various states of the application. The ability to monitor the runtime status is discussed in the monitoring guide.

Starting a Kafka Streams application

You can package your Java application as a fat JAR file and then start the application like this:

Managing Streams Application Topics
https://kafka.apache.org/43/streams/developer-guide/manage-topics/

A Kafka Streams application continuously reads from Kafka topics, processes the read data, and then writes the processing results back into Kafka topics. The application may also auto-create other Kafka topics in the Kafka brokers, for example state store changelog topics. This section describes the differences between these topic types and how to manage the topics and your applications. Kafka Streams distinguishes between user topics and internal topics.

User topics

User topics exist externally to an application and are read from or written to by the application, including:

Streams Security
https://kafka.apache.org/43/streams/developer-guide/security/

Kafka Streams natively integrates with Kafka's security features and supports all of the client-side security features in Kafka. Streams leverages the Java Producer and Consumer APIs.
To secure your stream processing applications, configure the security settings in the corresponding Kafka producer and consumer clients, and then specify the corresponding configuration settings in your Kafka Streams application. Kafka supports cluster encryption and authentication, including a mix of authenticated and unauthenticated, and encrypted and non-encrypted clients.

Application Reset Tool
https://kafka.apache.org/43/streams/developer-guide/app-reset-tool/

You can reset an application and force it to reprocess its data from scratch by using the application reset tool. This can be useful for development and testing, or when fixing bugs. The application reset tool handles the Kafka Streams user topics (input and output) and internal topics differently when resetting the application. Here's what the application reset tool does for each topic type:

* Input topics: Reset offsets to the specified position (by default to the beginning of the topic).

Streams Rebalance Protocol
https://kafka.apache.org/43/streams/developer-guide/streams-rebalance-protocol/

The Streams Rebalance Protocol is a broker-driven rebalancing system designed specifically for Kafka Streams applications. Following the pattern of KIP-848, which moved rebalance coordination of plain consumers from clients to brokers, KIP-1071 extends this model to Kafka Streams workloads.

Overview

Instead of clients computing new assignments during rebalance events that involve all members of the group, assignments are computed continuously on the broker.
Instead of using a consumer group, the Streams application registers as a streams group with the broker, which manages and exposes all metadata required for coordination of the Streams application instances.

Kafka Streams Groups Tool
https://kafka.apache.org/43/streams/developer-guide/kafka-streams-group-sh/

Use kafka-streams-groups.sh to manage Streams groups for the Streams Rebalance Protocol (KIP-1071): list and describe groups, inspect members and offsets/lag, reset or delete offsets for input topics, and delete groups (optionally including internal topics).

Overview

A Streams group is a broker-coordinated group type for Kafka Streams that uses Streams-specific RPCs and metadata, distinct from classic consumer groups. The CLI surfaces Streams-specific states, assignments, and input-topic offsets to simplify visibility and administration.

Migrating from Streams Scala to Java API
https://kafka.apache.org/43/streams/developer-guide/scala-migration/

⚠️ DEPRECATION NOTICE: The kafka-streams-scala library is deprecated as of Kafka 4.3 and will be removed in Kafka 5.0. This guide will help you migrate your Scala applications to use the Java Streams API directly. For more information, see KIP-1244.

Migration Overview

The Java Streams API works well from Scala with minimal adjustments. The main differences are:

* Use Java types directly instead of Scala wrapper classes
* Configure Serdes explicitly via StreamsConfig or pass them to methods

Example: Word Count Application

Scala Wrapper Approach (Deprecated)

    import java.