From Kerrighed

Definitions

  • owner or manager: the node owning the "master" copy (state WRITE_OWNER or READ_OWNER). This node is the last one that did a grab on the object, and it is the node used to serialize accesses to that object.
  • prob owner: the node that the local node believes to be the owner. Have a look at this paper to learn more about prob owners and prob owner chains.
  • default owner: the node used to initialize the prob owner. All nodes agree on a given default owner for an object. It can change depending on the object and the set. To avoid useless communications, we try to define the default owner as the node that will do the first access to the object.
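
As a rough illustration of how these three notions fit together, the C sketch below models the prob owner hints of a small cluster as an array. Every hint starts at the default owner, and a request follows the chain of hints until it reaches a node that points to itself (the owner). All names here (NR_NODES, init_prob_owner, find_owner, grab_object) are hypothetical and are not taken from the Kerrighed sources. The hint update rule in grab_object, where every node on the chain repoints to the requester, is an assumption borrowed from classic prob owner schemes, not a description of what Kerrighed actually does.

```c
#include <assert.h>

#define NR_NODES 4   /* hypothetical cluster size */

/* Every node starts out believing the default owner owns the object. */
static void init_prob_owner(int prob_owner[NR_NODES], int default_owner)
{
    assert(default_owner >= 0 && default_owner < NR_NODES);
    for (int i = 0; i < NR_NODES; i++)
        prob_owner[i] = default_owner;
}

/*
 * Follow the prob owner chain from `start` until reaching the node that
 * points to itself, which is the owner. `hops` receives the number of
 * forwards needed. Assumes the chain contains no cycle.
 */
static int find_owner(const int prob_owner[NR_NODES], int start, int *hops)
{
    int node = start;

    assert(start >= 0 && start < NR_NODES);
    *hops = 0;
    while (prob_owner[node] != node) {
        node = prob_owner[node];
        (*hops)++;
    }
    return node;
}

/*
 * `requester` grabs the object and becomes the new owner.
 * Assumed update rule: each node visited on the chain, including the
 * previous owner, repoints its hint to the requester.
 * Returns the previous owner.
 */
static int grab_object(int prob_owner[NR_NODES], int requester)
{
    int node = requester;

    assert(requester >= 0 && requester < NR_NODES);
    for (;;) {
        int next = prob_owner[node];

        prob_owner[node] = requester;
        if (next == node)       /* reached the previous owner */
            return node;
        node = next;
    }
}
```

With node 0 as default owner, a grab by node 2 makes node 2 the owner. Node 3, whose hint still points to node 0, then reaches the owner in two hops (3 to 0, then 0 to 2). This is why choosing the default owner as the first node to touch the object avoids forwarding: the first request finds the owner locally.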

States

  • INV_COPY: the object is not present. The local node is not the manager of the object. This is the default initial state for an object on a non-manager node.
  • INV_OWNER: the object is not present. The local node is the manager of the object. This is the default initial state for an object on the manager node.
  • INV_FILLING: the object is not present. No copy exists cluster-wide and we are doing a first touch on the local node. The object stays in this state while it is being filled with data. This state only makes sense when filling the object can block (a read from disk, for instance).
  • READ_COPY: the object is present. The local node is not the manager. The object can only be read.
  • READ_OWNER: the object is present. The local node is the manager. The object can only be read.
  • WRITE_OWNER: the object is present. The local node is the manager. The object can be read and written. When an object is in this state, there are no other copies of this object cluster-wide.
  • WRITE_GHOST: the object is present but not used locally. This is mainly the case after a flush, which moves an object from one node to another node that does not use it. This state is mainly used for some optimisations.
  • WAIT_ACK_INV: the object is present but cannot be accessed. The local node sent a write request to the manager. The manager sent invalidation requests to the copy holders and sent an object copy to the local node. The local node then switches to the WAIT_ACK_INV state to wait for the invalidation ACKs. When all the ACKs have been received, the local node switches to the WRITE_OWNER state.
  • WAIT_ACK_WRITE: the object is present. The local node is the manager. The object was in READ_OWNER state and the local node asked for write access on the object. The local node has sent invalidation requests to copy holders and is now waiting for acknowledgements.
  • WAIT_OBJ_READ: the object is not present. The local node is not the manager. The local node asked for a read access to an object. A read access request has been sent to the manager. The node is now waiting for a read copy.
  • WAIT_OBJ_WRITE: the object may be present, but only as a read-only copy. The local node is not the manager. The local node asked for a write access to an object. A write access request has been sent to the manager. The node is now waiting for a write copy.
  • WAIT_CHG_OWN_ACK: the object is present. The local node was the manager and asked to flush the object. A copy of the object has been sent to a remote node which will become the new owner and the local node is now waiting for an acknowledgement.
  • WAIT_OBJ_RM_ACK: quite close to WAIT_ACK_INV. When the manager of an object receives a global remove request, it sends local remove requests to all nodes, switches to WAIT_OBJ_RM_ACK, and waits for the ACKs.
  • WAIT_OBJ_RM_ACK2: when the node that is the default owner receives an object remove request, the object switches to this state until all copies have been removed and the object manager acknowledges that the cluster-wide removal is over. This serializes the removal of an object and a potential concurrent first touch on the same object.
  • WAIT_OBJ_RM_DONE: when a global remove request is made by a node that is not the manager (through the ctnr_remove_object function), the node sends a global remove request to the manager, switches to this state, and waits for the manager's ACK.
  • INV_NO_COPY:
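
To make the write path through these states more concrete, here is a minimal C sketch of what a non-manager node might go through when it asks for write access: INV_COPY or READ_COPY, then WAIT_OBJ_WRITE, then WAIT_ACK_INV, then WRITE_OWNER. The state names come from the list above, but the enum values, struct obj_entry, and all function names are hypothetical. It also assumes that the manager's reply tells the local node how many invalidation ACKs to expect, and that the node goes straight to WRITE_OWNER when there is no copy holder to invalidate. Neither point is stated in the descriptions above.

```c
#include <assert.h>
#include <stdbool.h>

/* State names from the list above; the numeric values are arbitrary. */
enum obj_state {
    INV_COPY, INV_OWNER, INV_FILLING,
    READ_COPY, READ_OWNER, WRITE_OWNER, WRITE_GHOST,
    WAIT_ACK_INV, WAIT_ACK_WRITE,
    WAIT_OBJ_READ, WAIT_OBJ_WRITE,
    WAIT_CHG_OWN_ACK,
    WAIT_OBJ_RM_ACK, WAIT_OBJ_RM_ACK2, WAIT_OBJ_RM_DONE,
    INV_NO_COPY
};

/* Hypothetical per-object bookkeeping on the local node. */
struct obj_entry {
    enum obj_state state;
    int pending_acks;       /* invalidation ACKs still expected */
};

/* States the list describes as readable. */
static bool can_read(enum obj_state s)
{
    return s == READ_COPY || s == READ_OWNER || s == WRITE_OWNER;
}

/* Only WRITE_OWNER is described as readable and writable. */
static bool can_write(enum obj_state s)
{
    return s == WRITE_OWNER;
}

/* A non-manager node asks for write access and waits for a write copy. */
static void request_write(struct obj_entry *e)
{
    assert(e->state == INV_COPY || e->state == READ_COPY);
    e->state = WAIT_OBJ_WRITE;
}

/*
 * The object copy arrives from the manager. nr_copy_holders is assumed
 * to be carried in the reply so the node knows how many ACKs to expect.
 */
static void receive_write_copy(struct obj_entry *e, int nr_copy_holders)
{
    assert(e->state == WAIT_OBJ_WRITE);
    assert(nr_copy_holders >= 0);
    e->pending_acks = nr_copy_holders;
    e->state = nr_copy_holders > 0 ? WAIT_ACK_INV : WRITE_OWNER;
}

/* One copy holder acknowledged its invalidation. */
static void receive_inv_ack(struct obj_entry *e)
{
    assert(e->state == WAIT_ACK_INV && e->pending_acks > 0);
    if (--e->pending_acks == 0)
        e->state = WRITE_OWNER;
}
```

Starting from READ_COPY with two other copy holders, the entry stays in WAIT_ACK_INV after the first ACK and only becomes writable (WRITE_OWNER) after the second, which matches the description of WAIT_ACK_INV as "present but cannot be accessed".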