Pre-PIP Discussion: Addressing Reorgs and Gas Spikes

Thanks for opening this for discussion.

It is a shame that Polygon network continues to be so unaware of the network conditions that it is effectively creating, due to the randomness of tx propagation and how this affects the behavior of MEV bots on the network.

I created this pull request nearly one year ago:

It was closed because apparently the Polygon team was already planning to address this, but there was never any update.

If you look at periods of high network congestion, you are guaranteed to find that much of it is due to competitive spam.

This doesn’t need to be handed off to a third party, for-profit company like Thogard’s PFL in order to fix. In fact, doing so goes completely against the open source ethos that Polygon has worked so hard to uphold.

Tl;dr there are some legacy design decisions and parameter values (like the tx_fetcher constants) that need to be addressed. They are not difficult to address, but so far nobody at Polygon has followed through with looking into this.

Please consider investigating these measures before proposing a hard fork.

@adamb thanks for this but I’m not sure why you would say that Polygon is ‘unaware’ of this issue when the PR you’re referencing was recently re-opened and acknowledged.

While this does seem to be a significant spam scenario, it surely is not the only one, and we have recently seen other scenarios create periods of high gas load.

Given this, it seems reasonable to prioritize changes that mitigate the overall negative effects of all spam, regardless of the cause. And while periods of spam do sometimes lead to full blocks, which can crowd out legitimate transactions, the most common negative effect is artificially high gas prices and occasional massive price spikes.

This is why it makes sense to prioritize changing the base gas price calculation above other mitigations.

Polygon will of course continue to research and address other issues that impact the performance and throughput of the system.

Thanks again.

@Thogard thanks again for your analysis on this issue.

The proposed change should have a positive effect on the predictability of gas price, if for no other reason than just keeping the price more stable. But I would not assert that this is the primary goal of the change.

Informally, I think of the rate of change in the calculation as governing the allowed capacity of price change over time / blocks. When EIP-1559 was initially implemented in Polygon, the rate was set to be the same as Ethereum since the bias is generally towards maximum compatibility with the rest of the ecosystem. But since Polygon produces blocks at 6X the rate of Ethereum and the calculation is exponential across blocks, the capacity for price change over time is greatly magnified. This is what allows for massive spikes that have a huge negative impact on the usability of the system.
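To put rough numbers on that magnification, here is a minimal sketch. It assumes every block in the window is completely full (so the base fee rises by the maximum 1/denominator each block) and ignores the protocol's integer rounding. The function name `max_growth` is made up for illustration; the 6 blocks come from the 6X figure above, and 16 is the denominator proposed in this thread.

```python
def max_growth(denominator: int, blocks: int) -> float:
    """Upper bound on baseFee growth across `blocks` consecutive full blocks.

    Each full block multiplies the base fee by (1 + 1/denominator), so the
    growth compounds exponentially with the number of blocks.
    """
    return (1 + 1 / denominator) ** blocks


# One Ethereum block interval vs. the ~6 Polygon blocks produced in the same time.
ethereum = max_growth(denominator=8, blocks=1)      # 1.125  (+12.5%)
polygon_now = max_growth(denominator=8, blocks=6)   # ~2.03  (roughly doubles)
polygon_new = max_growth(denominator=16, blocks=6)  # ~1.44  (+44%)

print(f"Ethereum, 1 block, d=8:  x{ethereum:.3f}")
print(f"Polygon, 6 blocks, d=8:  x{polygon_now:.3f}")
print(f"Polygon, 6 blocks, d=16: x{polygon_new:.3f}")
```

Under these assumptions the base fee can roughly double in the time Ethereum allows a 12.5% rise, and moving the denominator to 16 narrows that window to about 44%.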

Hope this helps.

Just one thing to add about reducing sprint length: this is the first step toward introducing real, not probabilistic, finality to the V1 chain. @sandeep

Well, because it was closed after @ferranbt essentially promised that it was being worked on internally, and then nothing happened, and I had to pester folks to reopen it and take it seriously.

Go ask some users whether they have trouble getting their transactions included during peak network load. I hear users complaining about this pretty much every time we have an “event”, like price volatility or an NFT project hogging blockspace.

We started work on it. I believe I am the person you could ask about it: a staff engineer in V1+Edge.
Thank you for your analysis. We decided to start by introducing a bunch of txpool-related metrics. At the moment it’s not obvious which change will affect the system, or how, so we need a few key metrics to measure and optimize. As next steps we’re going to work on, I believe, 2 of them: txpool efficiency (ratio of normal to normal + spam transactions), and median/95/99/max time for a transaction to be included in the chain.
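As a rough illustration of those two metrics, here is a sketch over made-up sample data. It is not the actual bor instrumentation. The function names, the nearest-rank percentile method, and the idea that transactions have already been labelled normal or spam are all assumptions for the example.

```python
def txpool_efficiency(normal: int, spam: int) -> float:
    """Ratio of normal transactions to all (normal + spam) transactions."""
    total = normal + spam
    # An empty sample has no meaningful ratio; 0.0 is an arbitrary choice here.
    return normal / total if total else 0.0


def percentile(ordered: list, p: int):
    """Nearest-rank percentile of an ascending list, for integer p in 1..100."""
    # Integer ceiling of p * n / 100, clamped so the rank is at least 1.
    rank = max(1, (p * len(ordered) + 99) // 100)
    return ordered[rank - 1]


def inclusion_time_summary(seconds: list) -> dict:
    """Median / p95 / p99 / max of time-to-inclusion samples (non-empty list)."""
    ordered = sorted(seconds)
    return {
        "median": percentile(ordered, 50),
        "p95": percentile(ordered, 95),
        "p99": percentile(ordered, 99),
        "max": ordered[-1],
    }


# Hypothetical numbers: 80 normal and 20 spam txs, inclusion times of 1..100 s.
print(txpool_efficiency(normal=80, spam=20))
print(inclusion_time_summary(list(range(1, 101))))
```

The hard part in practice is deciding what counts as "spam"; the sketch takes that classification as given.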

Thanks for your reply.

Can you come up with any reason why the sentry should NOT send a full transaction packet to the validator? Sending TransactionHashes only adds an unnecessary roundtrip.

There are things that need measurement and there are things that (IMO) are just common sense.

FastLane’s mechanism of action isn’t the removal of random propagation alone - it’s also the 250ms head start it gives to the MEV auction winner. That 250ms advantage greatly exceeds the variance in the p2p layer that’s leftover even when the direct vs announced mechanism is removed at the sentry level.

The direct-only PR alone would not solve the problem. It would help it, but it wouldn’t solve it.

FastLane solves it and gives validators MEV revenue. MEV-Bor would also solve it.

Outside of MEV, there is no way to fully remove the randomness in the p2p layer. It can be reduced, but not removed entirely. Furthermore, transaction announcements exist for a reason - to prevent redundant data transmission. Many validators run 4+ sentries. Your PR would increase the load on the validators by a non-trivial amount as the validator node would be blasted with a redundant copy of every full transaction from each of its sentries. (The FastLane sentry patch does the opposite - it announces all txs, meaning the validator only receives the full tx once. Gotta keep those data costs down :slight_smile: )
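A back-of-the-envelope sketch of that redundancy argument, for anyone who wants to plug in their own numbers. It only counts payload bytes arriving at the validator for one transaction, ignoring message framing and the fetch request itself. The 200-byte transaction size is an arbitrary assumption, and the function names are made up for the example; the 32 bytes is the size of a transaction hash.

```python
HASH_BYTES = 32  # size of a transaction hash


def direct_bytes(tx_size: int, sentries: int) -> int:
    """Every sentry pushes the full transaction body to the validator."""
    return tx_size * sentries


def announced_bytes(tx_size: int, sentries: int) -> int:
    """Every sentry announces the hash; the validator fetches the body once."""
    return HASH_BYTES * sentries + tx_size


# Assumed figures: a 200-byte transaction and a validator behind 4 sentries.
tx_size, sentries = 200, 4
print("direct:   ", direct_bytes(tx_size, sentries), "bytes")
print("announced:", announced_bytes(tx_size, sentries), "bytes")
```

This says nothing about latency: the announce path saves bytes at the cost of the extra round trip discussed earlier in the thread, which is the trade-off being argued over.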

Regardless, it’s a fact that giving validators access to MEV revenue is inevitable. Whether it’s via validators responding to your DMs to set up a back-room MEV revenue split with your bot or via validators joining the FastLane protocol, most validators will start (or have already started) to receive MEV revenue. The FastLane path just has the extra benefits of solving the searcher spam issue and being non-harmful to users (no sandwich attacks, no front running, etc) .

Thank you for the response. Funny story - ironically enough, I think I was actually the first external (non-polygon) person to raise awareness on this issue and target the 12.5% as being inappropriate for polygon due to the block times. (You can search the polygon discord for the text “12.5%” to see my very informal complaining from before EIP-1559 went live.)

I understand the proposal and its mechanism for action - it slows down the baseFee’s ability to grow past a user’s transaction’s maxFeePerGas parameter. But my concern is still that pricing users out via rising baseFee is exactly what EIP-1559 is supposed to do, and allowing users to set a high maxFeePerGas to handle growing baseFees was a specific design feature for EIP-1559. This proposed solution is a good one that will help users… but I can’t help but feel like we’re putting duct tape on something that needs to be structurally changed.

If we take for granted that we want the chain to be fully compatible with Ethereum’s EIPs and that we can’t change EIP-1559 other than modifying some of the variables, then it seems to me that the real fix should be done at the wallet / gas price estimator level. A higher maxFeePerGas estimate from the user’s wallet will also help users avoid the scenario where baseFee exceeds the maxFeePerGas and their transaction gets priced out of block inclusion.

What program does metamask use when making a gasPrice estimate?

If they use bor’s (geth’s) built in estimator, I can submit a PR to the official bor repo by this weekend to fix it (smooth/boost the maxFeePerGas based on estimated time to tx inclusion in a block and accounting for the size of the mempool and the maxFeePerGas of txs at set thresholds to predict the steady baseFee rise due to regular usage). But I’m not sure if that’s what they use.
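To make the idea concrete, here is a much simpler sketch than the estimator described above. It is not bor's or geth's built-in estimator, and `suggest_max_fee_per_gas` is a hypothetical name. It pads maxFeePerGas for the worst case where every block until inclusion is full, and assumes the caller already has an estimate of how many blocks that will take.

```python
def suggest_max_fee_per_gas(
    base_fee_wei: int,
    priority_fee_wei: int,
    blocks_until_inclusion: int,
    denominator: int = 8,
) -> int:
    """Pad maxFeePerGas so it survives `blocks_until_inclusion` full blocks.

    Assumes the worst case: each block is full, so baseFee grows by
    1/denominator per block (integer division, as fee math is done in wei).
    """
    worst_case_base_fee = base_fee_wei
    for _ in range(blocks_until_inclusion):
        worst_case_base_fee += worst_case_base_fee // denominator
    return worst_case_base_fee + priority_fee_wei


GWEI = 10**9
# Hypothetical inputs: 100 gwei baseFee, 30 gwei tip, inclusion expected within 6 blocks.
fee = suggest_max_fee_per_gas(100 * GWEI, 30 * GWEI, blocks_until_inclusion=6)
print(fee / GWEI, "gwei")
```

Since the user only pays the actual baseFee plus tip, a padded maxFeePerGas costs nothing extra when the spike does not materialize. A real estimator would derive `blocks_until_inclusion` from mempool depth rather than take it as an input.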

Is there someone at metamask we can reach out to for more information on their gas price estimation methodology?
(Edit: this would be in addition to your proposals, both of which I support)

You used to use “CPU load” as a reason to FUD against this change, now it’s about bandwidth. You realize that they’re all in the same data center, right? Lmfao.

This is just plain and simple FUD from someone who wants to build a monopoly.

The Lens Protocol Core Team is supportive of both changes.

We especially encourage validators to support changing the BaseFeeChangeDenominator from 8 to 16.

For services operating relays, the current 1559 configuration can cause the gas price to change much faster than most current systems can react, leading to increased uncertainty about whether a given transaction will need to have its transaction fee setting updated in order to be included within an acceptable timeframe.

This new reduced growth rate will bring the rate of change within the capabilities of current systems, and will provide a better UX for users using gasless systems such as Lens API, Biconomy, Gelato Relay and OpenZeppelin Defender.

Polygon PoS is already the blockchain of choice for applications that use these systems to make applications with the best end user experience, and we hope validators support these proposed changes to help Polygon PoS continue to grow its lead in this key area.

Just checking on this - if the reason is bandwidth usage, I can collect metrics and report back. I suspect it will be increased, but only slightly, and will only be between the sentries and validator which are almost certainly within the same intranet.

If the issue is cpu load, I can collect metrics too and I suspect it will be decreased.

To me this just seems like a very logical change so I’m unsure why it’s being opposed, with the exception of thogard’s opposition which is very obvious to me.

The last reviewer who worked at Polygon seemed to agree, and simply did not want to merge because he had other plans for the sentry/validator setup that seem like they have not come to fruition.

Fwiw, I also support the changes in this hard fork for all the reasons that others have listed.

Fun fact - literally none of the seven validators currently active or pending w/ PFL are in the same data center.

CPU load is still a concern. As is bandwidth.

Re: monopoly - every single PFL node could crash and every validator using PFL would still continue to function just fine. That’s the whole point of our design.

Everyone here understands that higher MEV payments to validators means less money leftover for you. Your motive behind your PR is transparent. My motive is also transparent, but my solution actually fixes the problem and pays validators for doing so. Yours does neither - it simply continues to reward and incentivize spam (albeit to a lesser degree than before), places higher load on validators, and cuts them off from decentralized MEV revenue. But it leaves more money for you.

I am saying that validators are in the same d/c as their sentries. Is this not the case? I couldn’t care less whether they are in the same data center as each other, or as the PFL sentries.

Flashbots has done a great job with their vision of making MEV extraction open, transparent, and decentralized. Why do you insist on reinventing the wheel with such ineptitude?

In most scenarios, that is not the case.

Our system is significantly more open than pre-merge Flashbots was. It’s also ecosystem friendly and blocks sandwich attacks. Furthermore, a fork of flashbots was already tried on polygon. We learned from their challenges.

You know this. But again - you are the quintessential spammer who has the most to lose from our success.

  1. You benefit from the problem that we are setting out to solve (incentivization of spam).
  2. You benefit from validators not receiving MEV revenue
  3. You haven’t researched or tested any of your proposals

So forgive me for not taking your feedback on PFL seriously.

Does anyone not affiliated with PFL care to comment?

Thank you for your comments @adamb @Thogard. I think the debate around PFL / @adamb’s PR could warrant its own post. For now, I don’t think this conversation should be continued in this thread as it is concerning the SprintLength and BaseFeeChangeDenominator changes put forward.

@LensCoreTeam thanks so much for the feedback.
