Investigating Reorg and RPC Issues on Polygon PoS

delroy · February 25, 2022, 6:21pm

Quick summary

You may have been experiencing some issues with the Polygon PoS chain, including errors with public RPCs and an increase in reorgs.
This is the highest priority for the Polygon PoS team, and we're actively working on short-term and long-term mitigation strategies to resolve these issues. We appreciate your patience.

Details

The Polygon PoS team is actively investigating a series of issues, and we appreciate the community's support in bringing them to our attention. While the team works on these problems, you may experience degraded Polygon PoS performance, including:

Occasional RPC failures: Public RPCs may be under stress and occasionally fail, or may not be synced to the latest canonical block. To work around this, we recommend trying multiple RPC nodes.
Frequent reorgs: You might see frequent shallow (non-deep) reorgs. We have been actively monitoring the network, ensuring that all validators have reduced latency by using bloXroute's BDN (Blockchain Distribution Network), and ensuring that other full nodes have the correct view of the chain. We are also working with the Etherscan/Polygonscan team to make sure their nodes are connected to the network with minimal latency.
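As a rough illustration of "trying multiple RPC nodes", the Python sketch below asks each endpoint for its latest block with the standard Ethereum JSON-RPC method `eth_blockNumber`, then picks the first endpoint (in your preferred order) that answers and is not lagging behind the best tip seen. The function names and the `max_lag` threshold of 5 blocks are hypothetical choices for this example, not part of any Polygon tooling.

```python
import json
import urllib.request
from typing import Callable, Dict, Optional, Sequence


def eth_block_number(url: str, timeout: float = 5.0) -> int:
    """Query one JSON-RPC endpoint for its latest block number via eth_blockNumber."""
    payload = json.dumps(
        {"jsonrpc": "2.0", "id": 1, "method": "eth_blockNumber", "params": []}
    ).encode()
    req = urllib.request.Request(
        url, data=payload, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        body = json.load(resp)
    return int(body["result"], 16)  # the result is a hex string such as "0x1b4"


def pick_endpoint(
    urls: Sequence[str],
    get_height: Callable[[str], int] = eth_block_number,
    max_lag: int = 5,  # hypothetical tolerance, in blocks, behind the best tip seen
) -> Optional[str]:
    """Return the first endpoint that answers and is within max_lag blocks of the highest tip."""
    heights: Dict[str, int] = {}
    for url in urls:
        try:
            heights[url] = get_height(url)
        except (OSError, ValueError, KeyError, TypeError):
            continue  # timeouts, HTTP errors and malformed replies all count as "unavailable"
    if not heights:
        return None
    tip = max(heights.values())
    for url in urls:  # keep the caller's preference order
        if url in heights and tip - heights[url] <= max_lag:
            return url
    return None
```

For example, `pick_endpoint(["https://polygon-rpc.com", "https://<your-backup-rpc>"])` would return the first of these that responds and is within 5 blocks of the highest height reported. Checking height as well as availability matters because a node that answers but is out of sync can still serve stale state.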

Through our investigation, and with the help of the community, the team has made the following observations so far:

In the last few days, the mempool size on Polygon PoS nodes has been growing rapidly, causing high memory usage. This delayed block processing, and a few nodes, including RPC nodes, fell out of sync.
With nodes processing a high load of transactions from the mempool, the average block time on a few nodes has increased to ~2.7-3 seconds. This caused backup block producers (BPs) to kick in and produce blocks whenever the primary block producer missed the block time deadline. With multiple producers creating blocks at the same height, the reorg frequency has increased over the last few days. Please refer to the backup BP mechanism and wiggle for more context on the backup block producer design.
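To make the backup BP mechanism a little more concrete, here is a deliberately simplified Python model. It assumes that the producer at succession position k (0 = primary) may seal a block `period + k * wiggle` seconds after the parent block, and it ignores network propagation and the backups' own processing time. This linear schedule and the numbers used with it are illustrative assumptions, not Bor's actual implementation or mainnet parameters.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class TimingModel:
    period: float  # target block interval in seconds (assumed value)
    wiggle: float  # extra delay granted per backup position, in seconds (assumed value)

    def slot(self, succession: int) -> float:
        """Seconds after the parent block at which producer `succession` may seal (0 = primary)."""
        return self.period + succession * self.wiggle


def competing_producers(model: TimingModel, primary_ready_after: float, backups: int) -> List[int]:
    """Successions that end up producing a block at the same height in this simplified model.

    primary_ready_after: seconds after the parent until the primary's block is ready and seen.
    A backup at succession k competes if its slot opens before the primary's block is seen.
    """
    producers = [0]  # the primary still produces its block, just late
    for k in range(1, backups + 1):
        if model.slot(k) < primary_ready_after:
            producers.append(k)
    return producers
```

With a hypothetical 2 s period and 0.5 s wiggle, `competing_producers(TimingModel(2.0, 0.5), 2.7, 3)` returns `[0, 1]`: a primary that needs ~2.7 s, like the slow nodes described above, lets the first backup seal a competing block at the same height, which is the situation that produces reorgs. Raising the block interval or wiggle time pushes backup slots later so that a slow primary still wins, which is the intuition behind experimenting with increased block time and wiggle time.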

Mitigation Plan

We're currently working on both short-term and long-term ways to mitigate these issues. A few short-term mitigations:

Additional full nodes have been added to the polygon-rpc.com aggregator to provide redundant RPC nodes for users.
We're experimenting with increased block time and wiggle time on the Mumbai testnet to account for delays in block processing. Refer to this forum post for more details.
We’re investigating and making code fixes for the mempool size and quota violation issues.

A few notes on the mid- to long-term plan:

Based on the results of these experiments on the Mumbai testnet, the team will propose the relevant configuration changes to the community and implement the ones that solve the issue.
Over the last six months, the Polygon PoS team has been integrating the learnings from these issues into a significant upgrade to the chain's architecture. This upgrade may include a simplified architecture that reduces communication complexity between nodes, a superior consensus approach, simplified staking and bridge component design, and more. More information will follow as this effort progresses.

Please rest assured that these issues are the highest priority for the Polygon PoS team. We'll keep you updated as we mitigate them and use these learnings to strengthen the chain in the future.

Update as of Mar 2nd, 3pm EST:

We're currently seeing performance improvements across the board as we work on the following issues, with the goal of restoring the network to its optimal state:

RPC: One major issue has been solved, and we've reduced the txn pool size while we investigate an as-yet undiagnosed growth issue.
- We have addressed an issue where gas prices were not matching between RPC nodes and validators, which was causing instability.
- We’re tracking RPC nodes (memory use and total activity) more tightly to catch emerging issues in our own nodes and take corrective actions.
- An as-yet undiagnosed growth issue is being profiled and investigated, which is why we've reduced the current txn pool size. Once we solve it, we'll return the pool to its normal size.
Reorg issues: proposing a block time increase
- We’ve proposed a block interval and wiggle time increase on the forum which will solve the reorg issues.
Transaction confirmation time: A few dapps have reported an increase in transaction confirmation time.
- We have seen transaction confirmation time decrease over the last few days, but we need to dig deeper into this problem to bring it back to its normal range.
- Due to the decentralized nature of the network, there is little we can do on the Bor side. However, we are working to reduce transaction sync latency for major RPC providers such as Alchemy.
- One possible short-term solution is to increase the fees for your transactions.
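If you do raise fees, the Python sketch below shows one common way to size EIP-1559 fee fields. The `2 * base_fee + tip` heuristic for `maxFeePerGas` is a widespread wallet and library default rather than a Polygon rule, and the 10% bump for replacing a stuck transaction mirrors go-ethereum's default `txpool.pricebump`; we assume Bor behaves the same, so check your node's configuration. The function names are hypothetical, and all values are integers in wei.

```python
from typing import Tuple

GWEI = 10**9  # 1 gwei expressed in wei


def suggest_fees(base_fee: int, tip: int) -> Tuple[int, int]:
    """Initial (maxFeePerGas, maxPriorityFeePerGas) using the common 2 * baseFee + tip heuristic."""
    return 2 * base_fee + tip, tip


def bumped_fees(max_fee: int, max_priority_fee: int, bump_percent: int = 10) -> Tuple[int, int]:
    """Raise both fee fields by at least bump_percent, rounding up to whole wei.

    The 10% default is an assumption based on geth's default replacement price bump.
    """
    def bump(value: int) -> int:
        return -(-value * (100 + bump_percent) // 100)  # ceiling division on integers

    new_tip = bump(max_priority_fee)
    new_max = max(bump(max_fee), new_tip)  # maxFeePerGas must not be below the priority fee
    return new_max, new_tip
```

A replacement transaction must reuse the original nonce; if the bump is too small, geth-based nodes reject it with a "replacement transaction underpriced" error.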

andonis · February 27, 2022, 9:35am

Hi,

I noticed that the maxpeers setting in the start.sh file has a big impact on CPU usage.

A simple solution for validators with less powerful CPUs would be to reduce maxpeers to 50 or 100. They would then rely more on the BDN network (bloXroute). This is especially relevant when the sentry and the validator run on the same machine. What do you think?

Infinityvoid.io · March 1, 2022, 5:21am

@delroy
Is there an estimated time for a fix? People aren't able to buy/sell NFTs on the Polygon network, and it's impacting sales for projects on the Polygon network.

Looking forward to your reply

wethepeoplenft · March 3, 2022, 5:21am

The most pressing thing that needs to be addressed is bridging on your Polygon website:
- it says insufficient funds even when you have enough ETH to pay the gas
- it just spins

ssharma · March 3, 2022, 12:57pm

I have tried increasing the fees for my transaction. I am trying to deploy a contract on the Mumbai testnet, but it still fails with a "transaction underpriced" error. Any suggestions on what I might be missing?

iansh · March 3, 2022, 3:07pm

I’m getting the same thing on Mumbai. Any transaction returns as underpriced…

xtremetom · March 7, 2022, 12:15pm

I was able to work on the testnet by switching providers.

3blksonpoly · March 9, 2022, 8:10am

Could you please provide more details? I haven't been able to deploy my smart contracts for 2 days. Cheers!

xtremetom · March 9, 2022, 10:46am

I believe I ran into most of the issues on Infura. I switched to Alchemy and Mumbai deployments are OK. There was an issue last night (12 hrs ago) where transactions just weren't getting into the Alchemy mempool. However, I have since been able to deploy without issue.

JH8080 · March 10, 2022, 2:01pm

Hi, is anyone experiencing issues depositing from Ethereum to Polygon via the bridge?
I deposited some funds earlier today. The transaction status shows ‘success’ on Etherscan, but my WETH isn't displaying on the Polygon chain. It's already been 5 hours.

andonis · March 10, 2022, 2:27pm

There is a bug in the bridge. Maybe the latest Heimdall update will fix it (https://github.com/maticnetwork/heimdall/pull/781), maybe not.

JH8080 · March 10, 2022, 2:32pm

Thank you for the reply! So my ETH shouldn't be missing, right? It's been a few hours.

andonis · March 10, 2022, 2:52pm

Maybe. Your ETH might be missing.