Quick summary
- You may have been experiencing some issues with the Polygon PoS chain, including errors with public RPCs and an increased number of reorgs.
- This is the highest priority for the Polygon PoS team, and we’re actively working on short-term and long-term mitigation strategies to resolve these issues. We appreciate your patience.
Details
The Polygon PoS team is actively investigating a series of issues, and we appreciate the community’s support in bringing them to our attention. While the team works on these problems, you may currently be experiencing degraded Polygon PoS performance, including:
- Occasional RPC failures: Public RPCs may be under stress and occasionally fail, or may not be synced to the latest canonical block. To work around this, we recommend configuring multiple RPC endpoints and falling back between them (see the sketch after this list).
- Frequent reorgs: You may see frequent shallow reorgs. We have been actively monitoring the network, working to reduce validator latency via the bloXroute BDN (Blockchain Distribution Network), and ensuring other full nodes have a correct view of the chain. We are also working with the Etherscan/Polygonscan team to make sure their nodes are connected to the network with minimal latency.
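As an illustration of that fallback approach, the sketch below queries several endpoints and uses whichever reports the highest block number. It assumes a Node 18+ / TypeScript environment with a global fetch; the endpoint URLs other than polygon-rpc.com are illustrative placeholders, not recommendations.

```typescript
// Query each endpoint for its latest block number and use the most up-to-date one.
// The endpoint URLs below (except polygon-rpc.com) are illustrative placeholders;
// substitute the RPC providers you actually use.
const RPC_ENDPOINTS = [
  "https://polygon-rpc.com",
  "https://example-polygon-rpc-1.invalid",
  "https://example-polygon-rpc-2.invalid",
];

async function latestBlock(url: string): Promise<number> {
  const res = await fetch(url, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ jsonrpc: "2.0", id: 1, method: "eth_blockNumber", params: [] }),
  });
  const { result } = await res.json();
  return parseInt(result, 16); // hex quantity -> number
}

async function pickFreshestRpc(): Promise<string> {
  // Tolerate individual endpoint failures; keep whichever responds with the highest height.
  const heights = await Promise.allSettled(RPC_ENDPOINTS.map(latestBlock));
  let best = { url: RPC_ENDPOINTS[0], height: -1 };
  heights.forEach((h, i) => {
    if (h.status === "fulfilled" && h.value > best.height) {
      best = { url: RPC_ENDPOINTS[i], height: h.value };
    }
  });
  if (best.height < 0) throw new Error("No RPC endpoint responded");
  return best.url;
}
```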
While investigating these issues, and with the help of the community, the team has arrived at the following observations so far:
- Over the last few days, the mempool size on Polygon PoS nodes has been growing rapidly, causing high memory usage. That delayed block processing and caused a few nodes, including RPC nodes, to fall out of sync.
- With nodes processing a heavy load of transactions from the mempool, the average block time on a few nodes has increased to ~2.7-3 seconds. That has caused backup block producers (BPs) to kick in and start producing blocks whenever the primary block producer missed the block time deadline. With multiple producers creating blocks at the same height, reorg frequency has increased over the last few days. Please refer to the backup BP mechanism and wiggle documentation for more context on the backup block producer design; a simplified sketch of that mechanism follows this list.
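For readers unfamiliar with the backup BP design, the following is a simplified, hypothetical sketch of how a wiggle-style delay can lead to competing blocks at the same height. It is not Bor’s actual implementation, and the block time and wiggle values are illustrative only.

```typescript
// Simplified sketch (not Bor's actual code) of how backup producers with a "wiggle"
// delay can lead to competing blocks at the same height when the primary is slow.
// The blockTime and wiggle values used here are illustrative, not the chain's real parameters.

interface Producer { name: string; position: number } // position 0 = primary, 1+ = backups

function produceDelayMs(p: Producer, blockTimeMs: number, wiggleMs: number): number {
  // The primary may produce as soon as the block time elapses; each backup waits
  // an extra multiple of the wiggle before stepping in.
  return blockTimeMs + p.position * wiggleMs;
}

// If the primary's actual processing time exceeds blockTime + wiggle, the first
// backup also publishes a block at the same height, and nodes that saw the
// backup's block first must reorg once the primary's block propagates (or vice versa).
function willFork(primaryProcessingMs: number, blockTimeMs: number, wiggleMs: number): boolean {
  return primaryProcessingMs > produceDelayMs({ name: "backup-1", position: 1 }, blockTimeMs, wiggleMs);
}

// Example: with a 2s block time and a 500ms wiggle, a primary that takes ~2.8s
// to produce overlaps with backup-1, so a shallow reorg is likely.
console.log(willFork(2800, 2000, 500)); // true
```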
Mitigation Plan
We’re currently working on both short-term and long-term mitigations for these issues. A few short-term mitigations:
- Additional full nodes have been added to the polygon-rpc.com aggregator to ensure redundancy of RPC nodes for users.
- Experiments with increased block time and wiggle time on the Mumbai testnet to account for delays in block processing. Refer to this forum post for more details.
- We’re investigating and making code fixes for the mempool size and quota violation issues.
A few notes on the mid-to-long-term plan:
- Based on the results of the experiments on the Mumbai testnet, the team will propose the relevant configuration changes to the community and implement the ones that resolve the issue.
- Over the last six months, the Polygon PoS team has been working on a significant upgrade to the architecture of the chain, and the learnings from these issues are being incorporated into that effort. The upgrade may include a simplified architecture that reduces communication complexity between nodes, a superior consensus approach, simplified staking, improved bridge component design, and more. More information will be shared as this effort progresses.
Please rest assured that these issues are the highest priority for the Polygon PoS team. We’ll keep you updated as we mitigate them and use these learnings to strengthen the chain in the future.
Update as of Mar 2nd, 3pm EST:
We’re seeing performance improvements across the board as we work on the following issues, with the goal of restoring the network to its optimal state:
- RPC: One major issue has been resolved, and we’re mitigating an as-yet-undiagnosed memory growth issue by reducing the transaction pool size.
  - We have addressed an issue where gas prices did not match between RPC nodes and validators, which was causing instability.
  - We’re tracking RPC nodes (memory use and total activity) more closely so we can catch emerging issues in our own nodes and take corrective action.
  - The as-yet-undiagnosed growth issue is being profiled and investigated; this is why we’ve reduced the current transaction pool size. Once we solve it, we’ll return to the normal configuration.
- Reorg issues: Proposing a block time increase.
  - We’ve proposed a block interval and wiggle time increase on the forum, which will solve the reorg issues.
- Transaction confirmation time: A few dapps have reported that they are experiencing an increase in transaction confirmation times.
  - We have seen transaction confirmation times decrease over the last few days, but we need to dig deeper into this problem to ensure they keep decreasing and return to the normal range.
  - Due to the decentralized nature of the network, there is little we can do on the Bor side. However, we are working to reduce transaction sync latency for major RPC providers like Alchemy.
  - One possible short-term workaround is to increase the gas price on your transactions (see the sketch below).
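As an illustration of that workaround, here is a minimal sketch using ethers.js v5 (assumed to be installed). The RPC URL, PRIVATE_KEY environment variable, recipient address, amount, and the ~20% bump factor are all placeholders to adapt to your own setup.

```typescript
import { ethers } from "ethers";

// Minimal sketch: send a transaction with a gas price bumped above the node's suggestion.
// All values below (private key source, recipient, amount, bump factor) are placeholders.
async function sendWithHigherGasPrice() {
  const provider = new ethers.providers.JsonRpcProvider("https://polygon-rpc.com");
  const wallet = new ethers.Wallet(process.env.PRIVATE_KEY as string, provider);

  // Start from the node's suggested gas price and bump it by ~20% so the
  // transaction is more attractive to block producers during congestion.
  const suggested = await provider.getGasPrice();
  const bumped = suggested.mul(120).div(100);

  const tx = await wallet.sendTransaction({
    to: "0x0000000000000000000000000000000000000000", // placeholder recipient
    value: ethers.utils.parseEther("0.01"),
    gasPrice: bumped,
  });
  console.log(`Sent ${tx.hash} at ${ethers.utils.formatUnits(bumped, "gwei")} gwei`);
  await tx.wait(); // resolves once the transaction is mined
}
```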