Interface StateStore

All Known Subinterfaces:
KeyValueStore<K,V>, SessionStore<K,AGG>, SessionStoreWithHeaders<K,AGG>, TimestampedKeyValueStore<K,V>, TimestampedKeyValueStoreWithHeaders<K,V>, TimestampedWindowStore<K,V>, TimestampedWindowStoreWithHeaders<K,V>, VersionedBytesStore, VersionedKeyValueStore<K,V>, WindowStore<K,V>

public interface StateStore
A storage engine for managing state maintained by a stream processor.

If the store is implemented as a persistent store, it must use the store name as its directory name and write all data into this store directory. The store directory must be created within the state directory. The state directory can be obtained via StateStoreContext#stateDir(), using the StateStoreContext provided via init(...).

Using nested store directories within the state directory isolates different state stores. If a state store wrote into the state directory directly, it might conflict with other state stores, and thus data might get corrupted and/or Streams might fail with an error. Furthermore, Kafka Streams relies on the store name being used as the store directory name to perform internal cleanup tasks.
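
As a minimal sketch of the directory layout described above, the helper below resolves a store directory as a child of the state directory, named after the store. The class and method names (StoreDirectories, resolveStoreDir) are hypothetical and not part of Kafka Streams; the stateDir argument stands in for whatever directory stateDir() returns.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper for illustration only; not part of the Kafka Streams API.
final class StoreDirectories {
    private StoreDirectories() { }

    // Returns <stateDir>/<storeName>, creating the directory if it does not exist.
    static Path resolveStoreDir(Path stateDir, String storeName) {
        Path base = stateDir.normalize();
        Path storeDir = base.resolve(storeName).normalize();
        // Reject names such as "" or "../other" that would not yield a
        // dedicated child directory inside the state directory.
        if (!storeDir.startsWith(base) || storeDir.equals(base)) {
            throw new IllegalArgumentException("Invalid store name: " + storeName);
        }
        try {
            Files.createDirectories(storeDir);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return storeDir;
    }
}
```

A store would then keep all of its files under the returned path, so that two stores named "counts" and "totals" never share files.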

This interface does not specify any query capabilities, which, of course, would be query-engine specific. Instead, it specifies only the minimum functionality required to reload a storage engine from its changelog, as well as basic lifecycle management.

  • Method Details

    • name

      String name()
      The name of this store.
      Returns:
      the storage name
    • init

      void init(StateStoreContext stateStoreContext, StateStore root)
      Initializes this state store.

      The implementation of this function must register the root store in the stateStoreContext via the StateStoreContext.register(StateStore, StateRestoreCallback, CommitCallback) function. The first StateStore parameter should always be the passed-in root object, and the second parameter should be an instance of the user's implementation of the StateRestoreCallback interface, which is used to restore the state store from the changelog.

      Note that if the state store engine itself supports bulk writes, users can instead implement BatchingStateRestoreCallback, which extends StateRestoreCallback, to provide bulk-load restoration logic rather than restoring one record at a time.

      Throws:
      IllegalStateException - if the store gets registered after initialization has already finished
      StreamsException - if the store's changelog does not contain the partition
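
      The registration step described above can be sketched roughly as follows. The nested interfaces are deliberately reduced stand-ins for the Kafka Streams types of the same name (the real ones live in the Kafka Streams library, have more methods, and register also accepts a CommitCallback, which is omitted here). InitSketch, InMemoryStore and RecordingContext are hypothetical names. Treating a null value as a delete is an assumption of this sketch.

```java
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;

final class InitSketch {
    // Reduced stand-ins for the real Kafka Streams interfaces (assumption:
    // only the methods needed for this illustration are declared).
    interface StateRestoreCallback {
        void restore(byte[] key, byte[] value);
    }

    interface StateStore {
        String name();
        void init(StateStoreContext context, StateStore root);
    }

    interface StateStoreContext {
        void register(StateStore root, StateRestoreCallback callback);
    }

    // Hypothetical in-memory store that rebuilds its contents during restoration.
    static final class InMemoryStore implements StateStore {
        private final String name;
        private final Map<String, String> data = new HashMap<>();
        private boolean open;

        InMemoryStore(String name) {
            this.name = name;
        }

        @Override
        public String name() {
            return name;
        }

        @Override
        public void init(StateStoreContext context, StateStore root) {
            // Register the passed-in root, not `this`: the store may be wrapped.
            context.register(root, (key, value) -> {
                String k = new String(key, StandardCharsets.UTF_8);
                if (value == null) {
                    data.remove(k); // sketch assumption: null value means delete
                } else {
                    data.put(k, new String(value, StandardCharsets.UTF_8));
                }
            });
            open = true;
        }

        String get(String key) {
            return data.get(key);
        }

        boolean isOpen() {
            return open;
        }
    }

    // Hypothetical context that just remembers what was registered.
    static final class RecordingContext implements StateStoreContext {
        StateStore registeredRoot;
        StateRestoreCallback callback;

        @Override
        public void register(StateStore root, StateRestoreCallback callback) {
            this.registeredRoot = root;
            this.callback = callback;
        }
    }
}
```

      In this sketch, a caller can replay changelog records by invoking the recorded callback once per record. A bulk-loading store would take a whole batch of records in a single call instead.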
    • flush

      @Deprecated default void flush()
      Deprecated.
      Flushes any cached data.
    • commit

      default void commit(Map<TopicPartition,Long> changelogOffsets)
      Commit all written records to this StateStore.

      This method CANNOT be called by users from processors. Doing so will throw an UnsupportedOperationException.

      Instead, users should call ProcessorContext#commit() to request a Task commit.

      If managesOffsets() returns true, the given changelogOffsets will be guaranteed to be persisted to disk along with the written records.

      For a regular StateStore, changelogOffsets will usually contain a single partition. For a Global StateStore with multiple partitions, however, it may contain multiple partitions. All provided partitions MUST be persisted to disk.

      Implementations SHOULD ensure that changelogOffsets are committed to disk atomically with the records they represent, if possible.

      Empty map: If changelogOffsets is empty, implementations that manage offsets MUST remove all previously committed offsets. After an empty commit, committedOffset(TopicPartition) should return null for all partitions. This is used during corruption recovery to clear stale offsets so that restoration can restart from the beginning.

      Null values: If a value in changelogOffsets is null, implementations that manage offsets MUST remove the committed offset for that partition. After such a commit, committedOffset(TopicPartition) should return null for the affected partition.

      Parameters:
      changelogOffsets - The changelog offset(s) corresponding to the most recently written records. An empty map signals that all committed offsets should be cleared. A null value for a partition signals that its committed offset should be removed.
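
      The empty-map and null-value rules above can be illustrated with a small bookkeeping sketch. It is an assumption-laden simplification: partitions are identified by plain strings instead of TopicPartition, offsets are held in memory only (a real implementation would persist them, ideally atomically with the records), and the class name OffsetBookkeepingSketch is hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration of the offset-clearing rules; not a real StateStore.
final class OffsetBookkeepingSketch {
    // Partition keys are plain strings here (for example "changelog-0"),
    // standing in for TopicPartition.
    private final Map<String, Long> committed = new HashMap<>();

    void commit(Map<String, Long> changelogOffsets) {
        if (changelogOffsets.isEmpty()) {
            // Empty map: drop every previously committed offset.
            committed.clear();
            return;
        }
        for (Map.Entry<String, Long> entry : changelogOffsets.entrySet()) {
            if (entry.getValue() == null) {
                // Null value: drop the offset for this partition only.
                committed.remove(entry.getKey());
            } else {
                committed.put(entry.getKey(), entry.getValue());
            }
        }
    }

    // Returns null when no offset is committed for the partition.
    Long committedOffset(String partition) {
        return committed.get(partition);
    }
}
```

      Because Map.of(...) rejects null values, a caller that needs to pass a null offset in this sketch would build the argument with a HashMap.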
    • committedOffset

      default Long committedOffset(TopicPartition partition)
      Returns the most recently committed offset for the given TopicPartition.

      If managesOffsets() and persistent() both return true, this method will return the offset that corresponds to the changelog record most recently written to this store, for the given partition.

      Parameters:
      partition - The partition to get the committed offset for.
      Returns:
      The last committed offset for the partition; or null if no offset has been committed for the partition, or if either persistent() or managesOffsets() returns false.
    • close

      void close()
      Closes the storage engine. Note that this function needs to be idempotent, since it may be called several times on the same state store.

      Users only need to implement this function; they should NEVER need to call this API explicitly, as the library calls it automatically when necessary.
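
      One common way to satisfy the idempotency requirement is to guard the cleanup with a flag that can only flip once. The sketch below is illustrative only: IdempotentCloseSketch is a hypothetical class, and the counter stands in for releasing real resources such as file handles.

```java
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical illustration of an idempotent close(); not a real StateStore.
final class IdempotentCloseSketch {
    private final AtomicBoolean open = new AtomicBoolean(true);
    // Counts how many times resources were actually released.
    private final AtomicInteger releaseCount = new AtomicInteger();

    void close() {
        // compareAndSet succeeds only for the first call; later calls are no-ops.
        if (open.compareAndSet(true, false)) {
            releaseCount.incrementAndGet(); // stand-in for the real cleanup
        }
    }

    boolean isOpen() {
        return open.get();
    }

    int releaseCount() {
        return releaseCount.get();
    }
}
```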

    • persistent

      boolean persistent()
      Returns whether the storage is persistent.
      Returns:
      true if the storage is persistent; false otherwise
    • managesOffsets

      @Deprecated default boolean managesOffsets()
      Deprecated.
      New implementations should always return true and manage their own offsets. In the future, this method will be removed and it will be assumed to always return true.
      Determines if this StateStore manages its own offsets.

      If this method returns true, then offsets provided to commit(Map) will be retrievable using committedOffset(TopicPartition).

      If this method returns false, offsets provided to commit(Map) will be ignored, and committedOffset(TopicPartition) will be expected to always return null.

      This method is provided to enable custom StateStores to opt in to managing their own offsets. This is required to ensure that custom StateStores provide the consistency guarantees that Kafka Streams expects when operating under an exactly-once processing.guarantee.

      New implementations are required to implement this method and return true. Existing implementations should upgrade to managing their own offsets as soon as possible, as the legacy offset management is deprecated and will be removed in a future version.

      Returns:
      Whether this StateStore manages its own offsets.
    • isOpen

      boolean isOpen()
      Returns whether this store is open for reading and writing.
      Returns:
      true if the store is open
    • query

      @Evolving default <R> QueryResult<R> query(Query<R> query, PositionBound positionBound, QueryConfig config)
      Execute a query. Returns a QueryResult containing either result data or a failure.

      If the store doesn't know how to handle the given query, the result shall be a failure with FailureReason.UNKNOWN_QUERY_TYPE. If the store couldn't satisfy the given position bound, the result shall be a failure with FailureReason.NOT_UP_TO_BOUND.

      Note to store implementers: if your store does not support position tracking, you can correctly respond with FailureReason.NOT_UP_TO_BOUND if the argument is anything but PositionBound.unbounded(). Be sure to explain in the failure message that bounded positions are not supported.

      Type Parameters:
      R - The result type
      Parameters:
      query - The query to execute
      positionBound - The position the store must be at or past
      config - Per-query configuration parameters, such as whether the store should collect detailed execution info for the query
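
      The failure handling described above might look roughly like the following for a store without position tracking. All types here are hypothetical, reduced stand-ins defined inside QuerySketch: Result plays the role of QueryResult, a boolean replaces PositionBound, and KeyQuery is an example query type invented for this sketch. Checking the query type before the position bound is a choice made for the sketch, not something the text above prescribes.

```java
import java.util.HashMap;
import java.util.Map;

final class QuerySketch {
    // Stand-in for the two failure reasons named above.
    enum FailureReason { UNKNOWN_QUERY_TYPE, NOT_UP_TO_BOUND }

    // Stand-in for a typed query; R is the result type.
    interface Query<R> { }

    // Hypothetical query type: look up one key.
    static class KeyQuery implements Query<String> {
        final String key;

        KeyQuery(String key) {
            this.key = key;
        }
    }

    // Stand-in for QueryResult: holds either a value or a failure.
    static final class Result<R> {
        final R value;
        final FailureReason failureReason;
        final String failureMessage;

        private Result(R value, FailureReason reason, String message) {
            this.value = value;
            this.failureReason = reason;
            this.failureMessage = message;
        }

        static <R> Result<R> success(R value) {
            return new Result<>(value, null, null);
        }

        static <R> Result<R> failure(FailureReason reason, String message) {
            return new Result<>(null, reason, message);
        }

        boolean isSuccess() {
            return failureReason == null;
        }
    }

    // Hypothetical store that does not track positions.
    static final class Store {
        private final Map<String, String> data = new HashMap<>();

        void put(String key, String value) {
            data.put(key, value);
        }

        @SuppressWarnings("unchecked")
        <R> Result<R> query(Query<R> query, boolean unboundedPosition) {
            if (!(query instanceof KeyQuery)) {
                return Result.failure(FailureReason.UNKNOWN_QUERY_TYPE,
                    "Unsupported query type: " + query.getClass().getName());
            }
            if (!unboundedPosition) {
                // Explain that bounded positions are not supported at all.
                return Result.failure(FailureReason.NOT_UP_TO_BOUND,
                    "This store does not track positions; only unbounded position bounds are supported.");
            }
            Result<String> found = Result.success(data.get(((KeyQuery) query).key));
            return (Result<R>) found; // safe here: KeyQuery fixes R to String
        }
    }
}
```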
    • getPosition

      @Evolving default Position getPosition()
      Returns the position the state store is at with respect to the input topic/partitions.