NUCLEO_F429ZI occasionally fails to set up ethernet #9470

@michalpasztamobica

Description

Target: NUCLEO_F429ZI
Toolchain: any
Tools:
mbed-cli 1.8.3
mbed-os @ af52c30 <- this is today's latest, but I also tried releases 5.11.1, 5.10.3 and 5.9.4

Steps to reproduce:

The issue is rather hard to reproduce - which is probably why it went unnoticed for so long.

Basically, any repeated loop of:

  1. setting up Ethernet on NUCLEO_F429ZI
  2. resetting the board without bringing Ethernet down

will eventually lead to step (1) failing, because the PHY_LINKED_STATUS flag does not get set in PHY_BSR (the transceiver's Basic Status Register).

We sometimes (roughly once in 100 runs) see this happen in the features-netsocket-* greentea tests of mbed-os. The most recent example is NUCLEO_F429ZI-IAR.mbed-os-tests-netsocket-dns.mbed-os-tests-netsocket-dns failing with:

[1547720172.68][CONN][RXD] :156::FAIL: Expected 0 Was -3004
[1547720172.75][CONN][RXD] >>> failure with reason 'Assertion Failed' during 'Test Setup Handler'

(-3004 stands for NSAPI_ERROR_NO_CONNECTION)

We see this much more often with our icetea tests, as they reset the board before every test case and the inability to connect is much more visible.

Finally, I wrote my own simple icetea test for cliapp, which repeatedly calls:

Bench.reset_dut(self, 1)
interfaceUp(self, ["dut1"])

With NUCLEO_F429ZI this test fails after a random number of iterations, reporting that no connection can be established. On average it takes about 100 resets (one run failed after 17 resets, another after 130). Note that we do not explicitly deinitialize Ethernet before resetting.
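The reset/bring-up loop described above can be sketched independently of any test harness. This is only an illustration, not the actual icetea test: `run_reset_loop`, `reset_dut` and `interface_up` are hypothetical names, and the two callables stand in for whatever the harness uses to reset the board and bring the interface up.

```python
from typing import Callable, Optional


def run_reset_loop(
    reset_dut: Callable[[], None],
    interface_up: Callable[[], bool],
    max_iterations: int = 1000,
) -> Optional[int]:
    """Reset the board and bring Ethernet up, over and over.

    Returns the 1-based iteration at which interface_up() first failed,
    or None if every iteration succeeded.
    """
    for iteration in range(1, max_iterations + 1):
        reset_dut()             # reset without bringing the interface down first
        if not interface_up():  # e.g. the bring-up reported -3004
            return iteration
    return None


if __name__ == "__main__":
    # Stand-in DUT for demonstration only: the 17th bring-up fails.
    attempts = {"count": 0}

    def fake_reset() -> None:
        pass

    def fake_interface_up() -> bool:
        attempts["count"] += 1
        return attempts["count"] != 17

    print(run_reset_loop(fake_reset, fake_interface_up))
```

Returning the failing iteration rather than a plain pass/fail makes it easy to compare runs, since the failure point varies so much from one run to the next.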

I ran this test on K64F, which has a different EMAC driver, and it did not fail a single time in 1000 runs.

Most importantly, however, I ran this test on UBLOX_EVK_ODIN_W2, which uses exactly the same STM32_EMAC driver as NUCLEO_F429ZI, and it also never failed in 1000 runs. I checked that the two boards (UBLOX and NUCLEO) have identical Ethernet PHY configuration and should execute exactly the same source code.
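To put the 1000 clean runs in perspective, here is a back-of-the-envelope estimate. It assumes that failures are independent and that the other boards would fail at the same rate as NUCLEO_F429ZI (about one failure per 100 resets); both are simplifying assumptions, not measurements.

```python
failure_rate = 1 / 100  # assumed per-reset failure probability (about 1 in 100)
runs = 1000             # clean runs observed on K64F and UBLOX_EVK_ODIN_W2

# Probability of zero failures in `runs` independent attempts
p_no_failure = (1 - failure_rate) ** runs
print(f"P(no failure in {runs} runs) = {p_no_failure:.1e}")
```

Under these assumptions the chance of 1000 clean runs is well below 0.01%, so the clean results on the other boards are very unlikely to be luck.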

To minimize the possibility that this is a network configuration issue, I ran the tests on multiple platforms (K64F, UBLOX_EVK_ODIN_W2 and NUCLEO_F429ZI) on two different networks (ARM's internal testing network and my local office network). Only F429ZI had connectivity issues, and it happened on both networks. Locally, I also assigned a static IP address, but even then the issue was reproducible.

Digging into the code, I found that the STM32 EMAC driver calls back to the higher layer depending on the PHY_LINKED_STATUS flag. I therefore suppose that the root cause is this flag not always being set.
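The dependency between the flag and the callback can be modeled in a few lines. This is a simplified Python model, not the driver's source: `poll_link`, `read_phy_register` and `on_link_change` are hypothetical names, and the register address and bit mask are the usual IEEE 802.3 Clause 22 values, assumed here to match the target's PHY configuration.

```python
from typing import Callable

PHY_BSR = 0x01              # Basic Status Register address (assumed, Clause 22)
PHY_LINKED_STATUS = 0x0004  # link-status bit within the BSR (assumed, Clause 22)


def link_is_up(bsr_value: int) -> bool:
    """Return True if the link-status bit is set in a BSR reading."""
    return bool(bsr_value & PHY_LINKED_STATUS)


def poll_link(
    read_phy_register: Callable[[int], int],
    on_link_change: Callable[[bool], None],
    was_up: bool,
) -> bool:
    """Read the BSR once and notify the upper layer only on a state change."""
    is_up = link_is_up(read_phy_register(PHY_BSR))
    if is_up != was_up:
        on_link_change(is_up)
    return is_up


if __name__ == "__main__":
    events = []
    # A PHY whose BSR never reports link: the upper layer is never notified.
    state = poll_link(lambda reg: 0x7809, events.append, was_up=False)
    print(state, events)
```

In this model, if the bit never appears in the register reading, no "link up" notification is ever issued, which would look to the application like a connection that cannot be established.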

Issue request type

[ ] Question
[ ] Enhancement
[x] Bug
