Execution Worker unexpectedly timed out #10670

@luizv

Is there an existing issue?

  • I have searched the existing issues

Experiencing problems? Have you tried our Stack Exchange first?

  • This is not a support question.

Description of bug

Yesterday my validator incurred 4 offenses related to the Execution Worker hitting the lenient timeout, which led to multiple disputes for voting invalid on a valid candidate. Same issue as #10121.

What happened
During Polkadot's session 12175, my validator, luizv/StakeHarvest/v2, triggered 7 disputes by voting invalid on valid candidates. This led to 4 offenses, on blocks 29084281 (3 times) and 29084282 (1 time). After these 4 offenses over two blocks, my validator was off-chain disabled for 1 session.

Logs
The following messages were logged a couple of times before the validator was set as off-chain disabled.

1765831076.164627 dot polkadot[4071360]: 2025-12-15 17:37:56 execution worker exceeded lenient timeout for execution, child worker likely stalled worker_pid=4081918 validation_code_hash=0x8835cd86a709e1491dccb7562a391e6f7f45f79ee6726b309235d9b0225052e0

1765831076.164627 dot polkadot[4071360]: 2025-12-15 17:37:56 execution worker concluded, error occurred: candidate validation: invalid: hard timeout artifact_id=ArtifactId { code_hash: 0x8835cd86a709e1491dccb7562a391e6f7f45f79ee6726b309235d9b0225052e0, executor_params_prep_hash: 0x50b16a713c0f8774ba3a0757c722b48c6eb4c1972f6e58c54fd0851f21a08633 } worker=Worker(1v1) worker_rip=true

1765831076.164627 dot polkadot[4071360]: 2025-12-15 17:37:56 Failed to validate candidate para_id=Id(3369) candidate_hash=0x28a6afca5f97a03efc20eec7edb79838d49cd1587a79a59a7dd01465a0f27243 error=Invalid(HardTimeout) traceID=54034606565267263698501495153445279800

1765831076.164627 dot polkadot[4071360]: 2025-12-15 17:37:56 Detected invalid candidate as an approval checker. reason=Timeout candidate_hash=0x28a6afca5f97a03efc20eec7edb79838d49cd1587a79a59a7dd01465a0f27243 para_id=Id(3369) traceID=54034606565267263698501495153445279800

1765831076.164627 dot polkadot[4071360]: 2025-12-15 17:37:56 New dispute initiated for candidate. candidate_hash=0x28a6afca5f97a03efc20eec7edb79838d49cd1587a79a59a7dd01465a0f27243 session=12175 traceID=54034606565267263698501495153445279800
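
For anyone triaging similar logs, here is a minimal sketch of how the events above could be counted per type and the disputed candidates collected. The function names and the `EVENTS` table are hypothetical helpers, not part of the node; the markers are only the message texts quoted in this report, so other versions may word them differently.

```python
import re
from collections import Counter

# Assumption: key=value pairs in the log line are separated by whitespace,
# as in the lines quoted in this report.
KV_RE = re.compile(r"(\w+)=(\S+)")

# Hypothetical event names mapped to message texts copied from the logs above.
EVENTS = {
    "lenient_timeout": "execution worker exceeded lenient timeout",
    "hard_timeout": "candidate validation: invalid: hard timeout",
    "validation_failed": "Failed to validate candidate",
    "dispute": "New dispute initiated for candidate",
}


def classify(line):
    """Return the event name whose marker text appears in the line, else None."""
    for name, marker in EVENTS.items():
        if marker in line:
            return name
    return None


def fields(line):
    """Extract key=value pairs such as candidate_hash=0x... or para_id=Id(3369)."""
    return dict(KV_RE.findall(line))


def summarize(lines):
    """Count events per type and collect the candidate hashes that were disputed."""
    counts = Counter()
    disputed = set()
    for line in lines:
        event = classify(line)
        if event is None:
            continue
        counts[event] += 1
        if event == "dispute":
            candidate = fields(line).get("candidate_hash")
            if candidate:
                disputed.add(candidate)
    return counts, disputed
```

Feeding the node's log lines into `summarize` gives a per-event count and the set of disputed candidate hashes, which can then be compared against the disputes reported on chain for the session.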

Steps to reproduce

No idea how to reproduce it, but here is the context:

Operational Context

  • The disputes occurred during normal node operation.

  • The problem occurred around 45 minutes after a successful node restart.

  • The restart was done to switch the connection from a gateway (4 links, load-balanced) to a single directly connected router, and the switch itself also went through with no issues.

  • The restart was successful and the node operated well for ~40 minutes. The other 3 nodes with the same configuration didn't present any problems.

  • Also no power issues.

  • No CPU or memory issues were observed in Prometheus monitoring.

  • My hardware is really performant and my node was performing really well, with an A+ rating on turboflakes.

  • I run two nodes on this server, one KSM and one DOT. The DOT node was the one that incurred the offenses.

  • Specs:

💻 Version: latest, 1.20.2
💻 CPU architecture: x86_64
💻 Target environment: gnu
💻 CPU: INTEL(R) XEON(R) GOLD 6526Y
💻 CPU cores: 16
💻 Memory: 63554MB
💻 Kernel: 6.12.0-55.24.1.el10_0.x86_64
💻 Linux distribution: Red Hat Enterprise Linux 10.0 (Coughlan)

    Labels

    I10-unconfirmed: Issue might be valid, but it's not yet known.
    I2-bug: The node fails to follow expected behavior.
