PROPOSAL: Validator Performance Management

Executive Summary

In its current state, the PoS validator network is largely permissioned, in that the set of validators selected during testnet has mostly persisted. In the spirit of gradual decentralization, certain steps need to be taken, with the end state being that the validators assume care of the network.

To arrive at a state of decentralized self-governance, the validators will have to self-regulate network participation according to an agreed set of parameters.

The Polygon Governance Team now wants to propose, discuss, and gather consensus around a framework to further that aim.

Introduction

Self-regulation in this context refers to the setting and administration of conditions for the admission, participation, and as the case may be, the forced exit of validators from the “club” - with the last part formalizing previously-achieved consensus on the subject.

This includes setting parameters in a fair, transparent, and self-enforcing standard for:

  • measuring performance
  • compliance with conditions of participation
  • choosing and acting on remedial measures, and
  • actions to address breaches of compliance if and when they arise

The purpose of this initial proposal is to invite and assemble the wisdom of the validators and to collectively arrive at answers to questions that will eventually lead to a state of network self-governance.

The Preliminary Path to Self-Regulation

The Performance Management Proposal is divided into two parts. Part A proposes the preliminary parameters for network monitoring. Part B proposes the preliminary standards for remedial action for non-compliance with the standards proposed in Part A, up to and including a forced exit of a validator node from the network through the unbonding of their stake.

Part A: Network monitoring

The aim of Part A is to develop a fair framework to manage validator performance through a self-enforced performance standard across the network.

There could be reasons, both technical and social, that lead to temporary conditions in which validator nodes underperform the common standard. To be fair, a process that leads to the forced exit of a validator for technical underperformance should accommodate these realities and should be approached with some caution.

Q1) In a self-governed network, what are the parameters for the technical performance measurement of a validator node?

The proposed parameters are:

  • checkpoints signed, expressed as a percentage, and
  • a time interval over which the checkpoint compliance, once established, is measured.

Q2) In a self-governed network, what is a non-compliant validator?

The proposed parameters are:

  • fewer than 98% of checkpoints signed, and
  • measured over a continuous 14 day interval.
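
As a rough illustration of the Q2 rule, the check could be sketched in Python as below. This is a minimal sketch under assumptions, not part of the proposal: it assumes the 98% figure applies to the aggregate of all checkpoints in the most recent 14 day window (the proposal does not say whether the figure is aggregate or per day), and that a validator with fewer than 14 days of history is not yet judged. The names `DailyCheckpoints`, `signed_percent`, and `is_non_compliant` are hypothetical.

```python
from dataclasses import dataclass
from typing import List

THRESHOLD_PERCENT = 98.0  # from Q2: fewer than 98% of checkpoints signed
WINDOW_DAYS = 14          # from Q2: continuous 14 day interval


@dataclass
class DailyCheckpoints:
    """Hypothetical per-day record for one validator."""
    signed: int  # checkpoints the validator signed that day
    total: int   # checkpoints submitted on the network that day


def signed_percent(window: List[DailyCheckpoints]) -> float:
    """Aggregate percentage of checkpoints signed across the window."""
    total = sum(day.total for day in window)
    signed = sum(day.signed for day in window)
    if total == 0:
        # Assumption: with no checkpoints in the window, nothing was missed.
        return 100.0
    return 100.0 * signed / total


def is_non_compliant(history: List[DailyCheckpoints]) -> bool:
    """history is ordered oldest first; only the last 14 days are measured."""
    if len(history) < WINDOW_DAYS:
        # Assumption: too little data to measure a full interval.
        return False
    window = history[-WINDOW_DAYS:]
    return signed_percent(window) < THRESHOLD_PERCENT
```

For example, a validator that signs 47 of 48 checkpoints every day for 14 days sits at roughly 97.9% and would be flagged, while one at exactly 98% would not.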

Q3) In a self-governed network, how should the technical performance measurement period be monitored?

The proposed parameters are:

  • initially, to manually monitor performance on the Polygon Web Wallet v2 page; and
  • in a future proposal, establish an automated Performance Deficiency Report (“PDR”) that measures performance over the 14 day interval.

Q4) In a self-governed network, who should be responsible to monitor the technical performance of validators?

The proposed parameters are:

  • initially, the Performance Monitor (“PM”) role will continue to be filled by members of the Polygon team; and
  • in a future proposal, the validators will transition to assume the responsibility for the PM and self-monitor performance.

Q5) In a self-governed network, how should non-compliance with the performance standard be recognized and communicated to a validator operator? And by whom?

The proposed parameters are:

  • the PM will maintain a call-out list of the validator node operators' contact information;
  • validator node operators will have a positive responsibility to keep their contact information accurate and up to date;
  • the PM will periodically test the call-out list to confirm its currency;
  • on the generation of a PDR, the PM will communicate a Notice of Deficiency (“NOD”) directly to the delinquent validator node operator by the means recorded in the call-out list; and
  • in a future proposal, the NOD will be automatically self-generated and delivered to the delinquent validator.

Before moving to the next step of adoption of the parameters by a vote of the validator community, here is a poll to gather a measure of soft consensus.

  • Yes - I agree with the parameters in Part A
  • Yes - I agree but see my comments below for consideration for inclusion
  • No - I do not agree with the parameters in Part A - see my comments below

Part B: Remedial measures and corrective action

The aim of Part B is to develop a fair framework to manage validator performance through a self-enforced performance standard across the network. The framework incorporates the technical performance parameters from Part A and adds remedial measures for underperformance, up to and including the forced exit of validators by unbonding their stake.

The health of the validator network is connected to its efficiency, and its efficiency is connected to validator checkpoints and validator communications. When a validator is offline, or does not respond to communications when prompted, this can have an adverse effect on the network and by extension it can affect the success of the other members of the validator community.

In a prior post, “Off-boarding Offline Validator”, the community already expressed a preference for using a multi-sig kick mechanism. When triggered, this would unbond the stake of a validator if the occasion required it. This proposal is to establish a consensus across the validator community on what the parameters should be and what qualifies as the “occasion” to unbond the stake of a validator. That post also describes the technical implementation of validator offboarding.

Q6) In a self-governed network, what is the process for a remedial response to a non-compliant validator?

The proposed parameters are:

  • the Grace Period (“GP”) is 7 days;
  • on issuance of a Notice of Deficiency (“NOD”) from the Performance Monitor (“PM”) the operator will have a grace period to correct the deficiency noted in the NOD;
  • if the deficiency is corrected within the GP there is no further action;
  • if the deficiency is not corrected within the initial GP, then the delinquent validator will be issued a Final Notice (“FN”) of the intent of the community to implement a forced exit procedure by offboarding the validator from the network by unbonding their stake;
  • the FN is followed by a second GP;
  • if the deficiency is corrected within the second GP there is no further action; and
  • if the deficiency is not corrected at the end of the second GP, the validator’s stake will be unbonded and the validator will be off-boarded from the network.
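
The steps above can be read as a small decision procedure. The Python sketch below is one possible reading, offered for discussion rather than as the agreed mechanism. It assumes each grace period ends exactly 7 days after the notice that starts it, that the second GP starts on the day the FN is issued, and that whether the deficiency is corrected is decided elsewhere and passed in as a flag. The names `Action` and `next_action` are hypothetical.

```python
from datetime import date, timedelta
from enum import Enum
from typing import Optional

GRACE_PERIOD = timedelta(days=7)  # from Q6: the Grace Period ("GP") is 7 days


class Action(Enum):
    NO_FURTHER_ACTION = "no further action"
    WAIT = "grace period still running"
    ISSUE_FINAL_NOTICE = "issue Final Notice (FN)"
    UNBOND = "unbond stake and off-board validator"


def next_action(
    nod_issued: date,
    today: date,
    corrected: bool,
    fn_issued: Optional[date] = None,
) -> Action:
    """Return the next step for a validator that has received a NOD."""
    if corrected:
        # Corrected within either GP: the process stops.
        return Action.NO_FURTHER_ACTION
    if fn_issued is None:
        # First GP: counted from the Notice of Deficiency.
        if today < nod_issued + GRACE_PERIOD:
            return Action.WAIT
        return Action.ISSUE_FINAL_NOTICE
    # Second GP: assumed to be counted from the Final Notice.
    if today < fn_issued + GRACE_PERIOD:
        return Action.WAIT
    return Action.UNBOND
```

Under these assumptions, a NOD issued on 1 January with no correction leads to an FN on 8 January and, if the FN is issued that same day, to unbonding on 15 January.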

Before moving to the next step of adoption of the parameters by a vote in the validator community, here is a poll to gather a measure of soft consensus.

  • Yes - I agree with the parameters in Part B
  • Yes - I agree but see my comments below for consideration and inclusion
  • No - I do not agree with the parameters in Part B - see my comments below

Conclusion

In summary, under the proposed parameters, a validator operator who has been underperforming the common standard for 14 consecutive days will have a further 14 days, made up of two 7 day grace periods, to correct the deficiency before the process to unbond their stake is implemented.

The parameters in Part A and Part B are proposals for ideation by the validator community specifically and the Polygon community at large. Suggestions for suitable alternate parameters are invited and encouraged during the incubation period to adjust the proposals into a consensus prior to moving to the next step of adoption of the parameters by a community vote. Once adopted, the framework will allow further decentralization of the network by means of validator self-regulation.

Our concern with Part A allowing only 14 days to be at 98% checkpoints signed or better is that it takes the system time to catch up and get a validator back up to 100% checkpoints signed. Perhaps a little more wiggle room is needed, like 17 days.

Our belief is that Part B should allow a validator more time to comply, like 10 days.