Worker Lifecycle
Each worker connects to the current leader broker over a single TCP connection, on which all message deliveries are multiplexed.
If the TCP connection goes down for any reason, the node must look for a new leader broker and retry the connection.
When a node boots, it creates a unique id ("process id").
From the broker's point of view, if a node does not reconnect within a configurable amount of time it is considered dead, and its current process id can no longer be used.
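The connect-and-retry behaviour described above can be sketched from the worker side as follows. This is only an illustration under assumptions: the names `new_process_id`, `connect_to_leader`, `find_leader`, `open_connection` and `retry_delay_s` are hypothetical, and leader discovery and the wire protocol are left to the caller because the text does not specify them.

```python
import time
import uuid


def new_process_id():
    # Assumption: a random UUID is unique enough to serve as the
    # per-boot "process id"; the real format is not specified here.
    return uuid.uuid4().hex


def connect_to_leader(find_leader, open_connection,
                      retry_delay_s=1.0, max_attempts=None,
                      sleep=time.sleep):
    """Look up the current leader and open the single connection to it.

    find_leader() returns an address; open_connection(address) returns
    a connection object or raises OSError. Both are supplied by the
    caller (hypothetical hooks, not a real API).
    """
    attempts = 0
    while max_attempts is None or attempts < max_attempts:
        attempts += 1
        try:
            # The leader may have changed since the last attempt,
            # so it is looked up again on every retry.
            address = find_leader()
            return open_connection(address)
        except OSError:
            sleep(retry_delay_s)
    raise ConnectionError("no leader broker reachable")
```

A worker would call `connect_to_leader` once at boot and again whenever the connection drops, presenting the same process id each time for as long as the broker has not declared it dead.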
Status of a Worker
A Worker can be in one of the following states:
- CONNECTED: a valid TCP connection is active
- DISCONNECTED: no TCP connection is active, but there are tasks assigned to the node
- DEAD: no connection has been present for longer than the configured timeout
When a Node transitions from DISCONNECTED to DEAD, recovery is scheduled for each task present on the node, according to the retry policy of the task.
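The state transitions above can be sketched from the broker's side as a small state machine. This is a minimal illustration, not the broker's actual implementation: `WorkerStatus`, `WorkerRecord`, `BrokerView`, `dead_timeout_s` and `recovery_queue` are hypothetical names, time is passed in explicitly, and applying each task's retry policy is left to whatever consumes the recovery queue.

```python
import enum
from dataclasses import dataclass, field


class WorkerStatus(enum.Enum):
    CONNECTED = "CONNECTED"
    DISCONNECTED = "DISCONNECTED"
    DEAD = "DEAD"


@dataclass
class WorkerRecord:
    process_id: str
    status: WorkerStatus = WorkerStatus.CONNECTED
    last_seen: float = 0.0
    tasks: list = field(default_factory=list)


class BrokerView:
    def __init__(self, dead_timeout_s):
        self.dead_timeout_s = dead_timeout_s  # the configurable timeout
        self.workers = {}
        self.recovery_queue = []  # tasks awaiting recovery

    def on_connect(self, process_id, now):
        rec = self.workers.get(process_id)
        if rec is not None and rec.status is WorkerStatus.DEAD:
            # A dead worker's process id cannot be used any more.
            raise ValueError("process id belongs to a dead worker")
        if rec is None:
            rec = self.workers[process_id] = WorkerRecord(process_id)
        rec.status = WorkerStatus.CONNECTED
        rec.last_seen = now

    def on_disconnect(self, process_id, now):
        rec = self.workers[process_id]
        rec.status = WorkerStatus.DISCONNECTED
        rec.last_seen = now

    def check_timeouts(self, now):
        """Mark timed-out workers DEAD and queue their tasks for recovery."""
        scheduled = []
        for rec in self.workers.values():
            timed_out = now - rec.last_seen >= self.dead_timeout_s
            if rec.status is WorkerStatus.DISCONNECTED and timed_out:
                rec.status = WorkerStatus.DEAD
                # Assumption: the retry policy is applied later, when
                # each queued task is actually picked up for recovery.
                scheduled.extend(rec.tasks)
                rec.tasks = []
        self.recovery_queue.extend(scheduled)
        return scheduled
```

In this sketch the broker would call `check_timeouts` periodically; only the DISCONNECTED to DEAD transition moves tasks into the recovery queue, so a worker that reconnects in time keeps its assignments.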
