Worker Lifecycle

Each worker connects to the current leader broker over a single TCP connection, on which all message deliveries are multiplexed.

If the TCP connection goes down for any reason, the node must look for a new leader broker and retry the connection.
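The reconnect behaviour described above could be sketched roughly as follows. This is only an illustration, not the actual worker code: find_leader is a hypothetical discovery hook (the text does not say how a worker locates the leader), and max_attempts and delay_seconds are assumed parameters.

```python
import socket
import time
from typing import Callable, Optional, Tuple

Address = Tuple[str, int]


def connect_to_leader(
    find_leader: Callable[[], Optional[Address]],
    max_attempts: int = 5,
    delay_seconds: float = 1.0,
    connect: Callable[[Address], socket.socket] = socket.create_connection,
) -> socket.socket:
    """Look up the current leader and open one TCP connection to it.

    find_leader is a hypothetical hook returning (host, port), or None
    when no leader is known. It is called again on every attempt,
    because the leader may have changed since the connection was lost.
    """
    last_error: Optional[Exception] = None
    for attempt in range(max_attempts):
        address = find_leader()
        if address is not None:
            try:
                return connect(address)
            except OSError as exc:
                last_error = exc  # leader unreachable, try again
        if attempt < max_attempts - 1:
            time.sleep(delay_seconds)  # assumed fixed delay between attempts
    raise ConnectionError("no leader broker reachable") from last_error
```

Leader lookup is repeated inside the loop rather than done once up front, so that a leader change during an outage is picked up on the next attempt.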

When a node boots, it creates a unique ID (the "process id").

From the broker's point of view, if a node does not reconnect within a configurable amount of time it is considered dead, and that worker's process id cannot be used any more.
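As a minimal sketch of the rule above, a broker could remember when each process id lost its connection and refuse ids that stayed away too long. Everything here is an assumption made for illustration: the WorkerRegistry class, the max_disconnect_seconds parameter, and the use of a random UUID as the process id are not taken from the actual implementation.

```python
import time
import uuid
from typing import Callable, Dict, Set


def new_process_id() -> str:
    """Generate a fresh process id at boot (a random UUID is an assumption)."""
    return uuid.uuid4().hex


class WorkerRegistry:
    """Hypothetical broker-side bookkeeping of worker process ids."""

    def __init__(self, max_disconnect_seconds: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.max_disconnect_seconds = max_disconnect_seconds
        self.clock = clock
        self.disconnected_since: Dict[str, float] = {}
        self.dead: Set[str] = set()

    def on_disconnect(self, process_id: str) -> None:
        # Remember when the connection was lost (keep the earliest time).
        self.disconnected_since.setdefault(process_id, self.clock())

    def expire(self) -> None:
        # Declare dead every worker that stayed away longer than the timeout.
        now = self.clock()
        for process_id, since in list(self.disconnected_since.items()):
            if now - since > self.max_disconnect_seconds:
                del self.disconnected_since[process_id]
                self.dead.add(process_id)

    def on_connect(self, process_id: str) -> bool:
        # Return False when the id belongs to a worker already declared dead.
        self.expire()
        if process_id in self.dead:
            return False
        self.disconnected_since.pop(process_id, None)
        return True
```

A worker whose id is refused would presumably have to restart and obtain a new process id before connecting again.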

Status of a Worker

A worker can be in one of the following states:

  • CONNECTED: a valid TCP connection is active
  • DISCONNECTED: no TCP connection is active, but there are tasks assigned to the node
  • DEAD: no connection has been present for longer than the configured timeout

When a node transitions from DISCONNECTED to DEAD, recovery is scheduled for each task present on the node, according to the retry policy of that task.
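The three states and the DISCONNECTED to DEAD transition could be modelled roughly as below. This is a sketch under stated assumptions: the Task and Worker classes, the mark_dead function and the schedule_recovery callback are hypothetical names, and the retry policy is reduced to a simple max_attempts counter, which may differ from the real policy.

```python
import enum
from dataclasses import dataclass, field
from typing import Callable, List


class WorkerStatus(enum.Enum):
    CONNECTED = "CONNECTED"
    DISCONNECTED = "DISCONNECTED"
    DEAD = "DEAD"


@dataclass
class Task:
    task_id: int
    attempts: int = 0
    max_attempts: int = 3  # assumed stand-in for the task's retry policy


@dataclass
class Worker:
    process_id: str
    status: WorkerStatus = WorkerStatus.CONNECTED
    tasks: List[Task] = field(default_factory=list)


def mark_dead(worker: Worker,
              schedule_recovery: Callable[[Task], None]) -> List[Task]:
    """Move a DISCONNECTED worker to DEAD and hand its tasks to recovery.

    Returns the tasks whose (assumed) retry policy does not allow
    another attempt, so the caller can decide what to do with them.
    """
    if worker.status is not WorkerStatus.DISCONNECTED:
        raise ValueError("only a DISCONNECTED worker can become DEAD")
    worker.status = WorkerStatus.DEAD
    not_retried: List[Task] = []
    for task in worker.tasks:
        if task.attempts < task.max_attempts:
            schedule_recovery(task)
        else:
            not_retried.append(task)
    worker.tasks = []  # a dead worker no longer owns any task
    return not_retried
```

For example, calling mark_dead(worker, recovery_queue.append) with a plain list as recovery_queue would collect the tasks to be retried.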