Description
Is there an existing issue?
- I have searched the existing issues
Experiencing problems? Have you tried our Stack Exchange first?
- This is not a support question.
Description of bug
Yesterday my validator incurred 4 offenses related to the execution worker hitting the lenient timeout, which led to multiple disputes for voting invalid on a valid candidate. This is the same issue as #10121.
What happened
During Polkadot session 12175, my validator, luizv/StakeHarvest/v2, triggered 7 disputes by voting invalid against valid candidates. This led to 4 offenses, on blocks 29084281 (3 times) and 29084282 (1 time). After these 4 offenses over two blocks, my validator was off-chain disabled for 1 session.
Logs
The following lines repeated a few times before the validator was off-chain disabled.
1765831076.164627 dot polkadot[4071360]: 2025-12-15 17:37:56 execution worker exceeded lenient timeout for execution, child worker likely stalled worker_pid=4081918 validation_code_hash=0x8835cd86a709e1491dccb7562a391e6f7f45f79ee6726b309235d9b0225052e0
1765831076.164627 dot polkadot[4071360]: 2025-12-15 17:37:56 execution worker concluded, error occurred: candidate validation: invalid: hard timeout artifact_id=ArtifactId { code_hash: 0x8835cd86a709e1491dccb7562a391e6f7f45f79ee6726b309235d9b0225052e0, executor_params_prep_hash: 0x50b16a713c0f8774ba3a0757c722b48c6eb4c1972f6e58c54fd0851f21a08633 } worker=Worker(1v1) worker_rip=true
1765831076.164627 dot polkadot[4071360]: 2025-12-15 17:37:56 Failed to validate candidate para_id=Id(3369) candidate_hash=0x28a6afca5f97a03efc20eec7edb79838d49cd1587a79a59a7dd01465a0f27243 error=Invalid(HardTimeout) traceID=54034606565267263698501495153445279800
1765831076.164627 dot polkadot[4071360]: 2025-12-15 17:37:56 Detected invalid candidate as an approval checker. reason=Timeout candidate_hash=0x28a6afca5f97a03efc20eec7edb79838d49cd1587a79a59a7dd01465a0f27243 para_id=Id(3369) traceID=54034606565267263698501495153445279800
1765831076.164627 dot polkadot[4071360]: 2025-12-15 17:37:56 New dispute initiated for candidate. candidate_hash=0x28a6afca5f97a03efc20eec7edb79838d49cd1587a79a59a7dd01465a0f27243 session=12175 traceID=54034606565267263698501495153445279800
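For anyone triaging similar logs, here is a minimal sketch for counting how often each candidate failed validation. It is not part of the node or any official tooling. The function name `count_validation_failures` and the regular expression are my own assumptions, and the field layout is inferred only from the sample lines above, so it may need adjusting for other node versions.

```python
import re
from collections import Counter

# Matches "Failed to validate candidate" lines in the format shown above.
# Assumption: the field order (para_id, candidate_hash, error) is taken
# from the sample log lines in this report and nothing else.
FAILED_RE = re.compile(
    r"Failed to validate candidate\s+"
    r"para_id=Id\((?P<para_id>\d+)\)\s+"
    r"candidate_hash=(?P<candidate_hash>0x[0-9a-fA-F]+)\s+"
    r"error=(?P<error>\S+)"
)


def count_validation_failures(lines):
    """Count failed validations keyed by (para_id, candidate_hash, error)."""
    counts = Counter()
    for line in lines:
        match = FAILED_RE.search(line)
        if match:
            key = (
                int(match["para_id"]),
                match["candidate_hash"],
                match["error"],
            )
            counts[key] += 1
    return counts
```

Passing the node's log lines (for example, an open file object) to `count_validation_failures` returns a `Counter`, and `most_common()` on the result lists the candidates that failed most often.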
Steps to reproduce
No idea how to reproduce it, but
Operational Context
- The disputes occurred during normal node operation.
- The problem occurred around 45 minutes after a successful node restart.
- That restart was needed because I changed the connection from a gateway (4 links, load-balanced) to a single directly connected router, which also went through without issues.
- The restart was successful and the node operated well for ~40 minutes. The other 3 nodes with the same configuration did not show any problems.
- No power issues either.
- No CPU or memory issues seen in Prometheus monitoring.
- My hardware is highly performant, and the node had been performing very well, with an A+ rating on turboflakes.
- I run two nodes on this server, one KSM and one DOT. The DOT node was the one that incurred the offenses.
- Specs:
💻 Version: latest, 1.20.2
💻 CPU architecture: x86_64
💻 Target environment: gnu
💻 CPU: INTEL(R) XEON(R) GOLD 6526Y
💻 CPU cores: 16
💻 Memory: 63554MB
💻 Kernel: 6.12.0-55.24.1.el10_0.x86_64
💻 Linux distribution: Red Hat Enterprise Linux 10.0 (Coughlan)