
Conversation

LDong-Arm
Contributor

@LDong-Arm LDong-Arm commented Nov 23, 2020

Summary of changes

Fixes: #13528

[Q/O/]SPIFBlockDevice::erase() uses sfdp_iterate_next_largest_erase_type() to determine the largest erase size suitable for the requested address and size, as larger erase units give better erase performance than many smaller chunks. But there are several issues (a sketch of the corrected selection logic follows this list):

  • sfdp_iterate_next_largest_erase_type():
    • An alignment check is missing. If the start address is aligned to a smaller erase unit but not to a larger one, this function wrongly returns the larger one whenever the total size to erase is at least the larger unit. (Part of the cause of SPIFBlockDevice: erase block calculation issue #13528)
    • Wrong comparison: total size > erase size should be >=. Erasing a single unit, bd->erase(addr, bd->get_erase_size(addr));, results in the following on all [Q/O/]SPIFBlockDevice targets:
      [ERR ][SFDP]: No erase type was found for current region addr
      
      Note: Only visible with trace enabled, and erase carries on anyway due to another issue on this list... (Seen in SPIFBlockDevice: erase block calculation issue #13528 and other places such as the Mbed port for MCUboot).
    • Incorrect region boundary check (though we rarely reach the end of the storage).
    • If no applicable erase type is found, it incorrectly returns the last type in the loop instead of an error. (This is why, despite the second issue above, erase carries on anyway.)
    • It tries to optimise computation by removing any unused erase types (from the type mask, i.e. the list of supported erase types), but those types could be the correct ones to use when erasing subsequent chunks.
  • [Q/O/]SPIFBlockDevice::erase(): It first checks the address and size, and only continues if they are aligned. But after that, it tries to "handle" unaligned erases by erasing the whole block. Such "handling" is not only unnecessary, but also hides the erroneous return value of sfdp_iterate_next_largest_erase_type() mentioned above, resulting in data being wrongly erased. (Part of the cause of SPIFBlockDevice: erase block calculation issue #13528)
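
To make the fixed behaviour concrete, here is a minimal, self-contained sketch of the corrected selection logic; the names (EraseType, pick_largest_erase_type) are illustrative, not the actual Mbed OS internals:

#include <cstdint>
#include <cstdio>
#include <vector>

struct EraseType {
    uint32_t size; // erase unit size in bytes
};

// Return the index of the largest erase type that (1) the start address is
// aligned to, (2) fits in the remaining size (">=", not ">"), and (3) does not
// cross the region boundary; -1 signals an error instead of silently falling
// back to the last type checked.
int pick_largest_erase_type(const std::vector<EraseType> &types,
                            uint32_t addr, uint32_t remaining,
                            uint32_t region_end)
{
    int best = -1;
    for (size_t i = 0; i < types.size(); i++) {
        uint32_t sz = types[i].size;
        if (addr % sz != 0) {
            continue; // alignment check (previously missing)
        }
        if (remaining < sz) {
            continue; // ">=" comparison: erasing exactly one unit is valid
        }
        if (addr + sz - 1 > region_end) {
            continue; // region boundary check
        }
        if (best < 0 || sz > types[best].size) {
            best = static_cast<int>(i);
        }
    }
    return best; // -1 if nothing fits: the caller must treat it as an error
}

int main()
{
    std::vector<EraseType> types = {{4096}, {32768}}; // 4KB and 32KB units
    uint32_t addr = 8192, remaining = 61440;          // the example from #13528
    while (remaining > 0) {
        int idx = pick_largest_erase_type(types, addr, remaining, 0xFFFFF);
        if (idx < 0) {
            return 1; // propagate the error rather than erasing blindly
        }
        std::printf("erase %u bytes at %u\n", types[idx].size, addr);
        addr += types[idx].size;
        remaining -= types[idx].size;
    }
    return 0;
}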

Huge thanks to @boraozgen for identifying several of them.

This PR fixes those issues. A unit test for the optimal erase type determination algorithm is added too.

Impact of changes

[Q/O/]SPIFBlockDevice::erase()'s erase type determination now works, and it correctly erases the requested address ranges.

Migration actions required

None.

Documentation

None.


Pull request type

[x] Patch update (Bug fix / Target update / Docs update / Test update / Refactor)
[ ] Feature update (New feature / Functionality change / New API)
[ ] Major update (Breaking change E.g. Return code change / API behaviour change)

Test results

[ ] No Tests required for this change (E.g. docs only update)
[x] Covered by existing mbed-os tests (Greentea or Unittest)
[ ] Tests / results supplied as part of this PR

Reviewers

@ARMmbed/mbed-os-core @boraozgen


… is checked

[Q/O/]SPIFBlockDevice::erase() begins with an alignment check,
after which unaligned erases should not happen or be allowed.

If the erase address is not aligned to the value returned by
sfdp_iterate_next_largest_erase_type(), it indicates an
internal error in erase table parsing which should not be
hidden.
@ciarmcom added the release-type: patch label Nov 23, 2020
@ciarmcom requested review from a team November 23, 2020 16:00
@ciarmcom
Member

@LDong-Arm, thank you for your changes.
@ARMmbed/mbed-os-hal @ARMmbed/mbed-os-core @ARMmbed/mbed-os-maintainers please review.

@boraozgen
Contributor

Hi @LDong-Arm, thanks for working on this issue.

  • I did not check the code yet, but I wanted to try it out as a black box. As far as I have seen, the issue with the unwanted erasure of the beginning of the block seems to be solved. However, I noticed that in those unaligned cases the algorithm falls back to the smallest erase block and does not use the more efficient larger blocks. Is this what you intended? I can give details of my test if you like.

  • How about some unit tests for this? It would guarantee the intended operation.

  • A quick look at the code shows that there is a lot of duplicate code between the [Q/O/]SPIF classes. Could these be merged together?

@LDong-Arm
Contributor Author

LDong-Arm commented Nov 24, 2020

Hi @boraozgen, thanks for the feedback.

However, I noticed that in those unaligned cases the algorithm falls back to the smallest erase block and does not use the more efficient larger blocks. Is this what you intended? I can give details of my test if you like.

The algorithm is expected to choose the largest block that satisfies the alignment (and size) requirements. In the example you gave in #13528, storage.erase(8192, 61440), the new algorithm should erase 6 * 4KB blocks, 1 * 32KB block, and 1 * 4KB block, just as you originally suggested (which I think is optimal). Or did you already run the code and see it choose the wrong blocks in the traces?
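
For reference, a hand-worked trace of that decomposition (my own arithmetic assuming only 4KB and 32KB erase types apply, not actual driver output):

// addr 8192  is only 4KB-aligned                -> six 4KB erases reach the 32KB boundary
// addr 32768 is 32KB-aligned, 36864 bytes remain -> one 32KB erase
// addr 65536 has 4096 bytes remaining            -> one 4KB erase
// Total: 6 * 4KB + 1 * 32KB + 1 * 4KB = 61440 bytes in 8 commands, not 15.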

  • How about some unit tests for this? It would guarantee the intended operation.

I completely agree. This and a few other SFDP and BlockDevice algorithm issues would've been caught by unit tests, but we can raise a separate issue.

  • A quick look at the code shows that there is a lot of duplicate code between the [Q/O/]SPIF classes. Could these be merged together?

That would be a good refactoring; we can raise a separate issue.

@boraozgen
Contributor

Okay, I think I've isolated the issue further: when the bitfield gets reduced, it does not get reset before the next chunk is processed. This leads to the following case:

I use this chip; see page 5 for the memory map.

When I call spif.erase(0, 0x4000);, the 8K blocks are used:

[140ms][][DBG ][Main]: Erasing from 0 to 0x4000
[4042ms][][DBG ][SPIF]: erase - addr: 0, in_size: 16384
[6491ms][][DBG ][SPIF]: erase - addr: 0, size:16384, Inst: 0xd8h, erase size: 8192 ,
[6499ms][][DBG ][SPIF]: erase - Region: 0, Type:1
[6504ms][][DBG ][SPIF]: Erase Inst: 0xd8h, addr: 0, size: 16384
[8436ms][][DBG ][SPIF]: erase - addr: 8192, size:8192, Inst: 0xd8h, erase size: 8192 ,
[8444ms][][DBG ][SPIF]: erase - Region: 0, Type:1
[8449ms][][DBG ][SPIF]: Erase Inst: 0xd8h, addr: 8192, size: 8192

However, when I call spif.erase(0x1000, 0x4000);, it falls back (correctly) to 4K for the first block, but does not try the 8K block for the second part:

[8473ms][][DBG ][Main]: Erasing from 0x1000 to 0x5000
[8962ms][][DBG ][SPIF]: erase - addr: 4096, in_size: 16384
[79902ms][][DBG ][SPIF]: erase - addr: 4096, size:16384, Inst: 0x20h, erase size: 4096 ,
[79910ms][][DBG ][SPIF]: erase - Region: 0, Type:0
[79916ms][][DBG ][SPIF]: Erase Inst: 0x20h, addr: 4096, size: 16384
[407377ms][][DBG ][SPIF]: erase - addr: 8192, size:12288, Inst: 0x20h, erase size: 4096 ,
[407385ms][][DBG ][SPIF]: erase - Region: 0, Type:0
[407390ms][][DBG ][SPIF]: Erase Inst: 0x20h, addr: 8192, size: 12288
[408426ms][][DBG ][SPIF]: erase - addr: 12288, size:8192, Inst: 0x20h, erase size: 4096 ,
[408434ms][][DBG ][SPIF]: erase - Region: 0, Type:0
[408439ms][][DBG ][SPIF]: Erase Inst: 0x20h, addr: 12288, size: 8192
[408902ms][][DBG ][SPIF]: erase - addr: 16384, size:4096, Inst: 0x20h, erase size: 4096 ,
[408910ms][][DBG ][SPIF]: erase - Region: 0, Type:0
[408915ms][][DBG ][SPIF]: Erase Inst: 0x20h, addr: 16384, size: 4096

Removing the following line seems to fix it. I think that line is meant to optimize for speed by avoiding re-checking the same types. I could not think of a side effect. Any thoughts?

bitfield &= ~type_mask;
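
To restate the failure mode as a minimal sketch (hypothetical values, not the actual driver state):

#include <cstdint>

int main()
{
    // Hypothetical bitfield: bit 0 = 4KB erase type, bit 1 = 8KB erase type.
    uint8_t bitfield = 0b11;

    // Chunk 1: addr 0x1000 is only 4KB-aligned, so the 8KB type is rejected
    // for this chunk, and the old code clears its bit permanently:
    uint8_t type_mask = 0b10; // the type rejected for this chunk
    bitfield &= ~type_mask;   // bitfield is now 0b01

    // Chunk 2: addr 0x2000 IS 8KB-aligned, but the 8KB bit is already gone,
    // so only the 4KB type remains visible - the fallback seen in the trace.
    return bitfield == 0b01 ? 0 : 1;
}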

@boraozgen
Contributor

I completely agree. This and a few other SFDP and BlockDevice algorithm issues would've been caught by unit tests, but we can raise a separate issue.

How about starting with a unit test for this algorithm? I'm not familiar with the Mbed unit test infrastructure, otherwise I would suggest one. It would also be easier for others like me to extend the tests.

@evedon
Contributor

evedon commented Nov 24, 2020

How about starting with a unit test for this algorithm? I'm not familiar with the Mbed unit test infrastructure, otherwise I would suggest one. It would also be easier for others like me to extend the tests.

I agree. Let's implement a unit test for this algorithm in this PR.
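
For what it's worth, a sketch of what such a test could look like (written with GoogleTest, which the Mbed OS unittests use, against the illustrative pick_largest_erase_type() helper from the earlier sketch; the real test exercises sfdp_iterate_next_largest_erase_type() directly):

#include <vector>
#include "gtest/gtest.h"

TEST(SfdpEraseTypeSelection, ChoosesLargestSuitableType)
{
    std::vector<EraseType> types = {{4096}, {32768}};
    // 32KB-aligned address with >= 32KB remaining: expect the 32KB type.
    EXPECT_EQ(1, pick_largest_erase_type(types, 32768, 36864, 0xFFFFF));
    // Only 4KB-aligned: must fall back to the 4KB type.
    EXPECT_EQ(0, pick_largest_erase_type(types, 8192, 61440, 0xFFFFF));
    // Erasing exactly one unit must be accepted (">=", not ">").
    EXPECT_EQ(0, pick_largest_erase_type(types, 4096, 4096, 0xFFFFF));
    // Nothing fits: expect -1, not the last type checked.
    EXPECT_EQ(-1, pick_largest_erase_type(types, 4097, 100, 0xFFFFF));
}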

@LDong-Arm
Contributor Author

Removing the following line seems to fix it. I think that line is meant to optimize for speed by avoiding re-checking the same types. I could not think of a side effect. Any thoughts?

bitfield &= ~type_mask;

Very good find. I believe the original implementation didn't take our scenario into account. The speed improvement here (which is not functionally correct) is rather minimal anyway.

The supported erase types of a given flash region are indicated
by bits of the variable `type_mask`. Even if an erase type
is unused for the current chunk (e.g. size too large, unaligned, etc.),
its bit should NOT be cleared - the same erase type might
actually be useful for the next chunk.

The function argument is now a value instead of a reference.
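
A tiny sketch of why passing by value matters here (illustrative, not the actual code):

#include <cstdint>

// By-value parameter: each call may narrow its own copy of the bitfield
// without destroying the caller's list of supported erase types.
static uint8_t choose(uint8_t bitfield)
{
    bitfield &= 0b01; // any narrowing stays local to this call
    return bitfield;
}

int main()
{
    uint8_t region_types = 0b11; // region supports the 4KB and 8KB types
    choose(region_types);        // first chunk
    return region_types == 0b11 ? 0 : 1; // still intact for the next chunk
}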
@LDong-Arm
Contributor Author

Now it works correctly on my target.

Comment on lines -420 to -423
if (idx == -1) {
tr_error("No erase type was found for current region addr");
}
return largest_erase_type;
Contributor Author


The old implementation incorrectly returned the last erase type it checked, instead of an error.
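
A sketch of the corrected pattern (the actual fix in this PR may differ in detail):

if (idx == -1) {
    tr_error("No erase type was found for current region addr");
    return -1; // propagate the error so the caller aborts the erase
}
return largest_erase_type;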

@LDong-Arm
Contributor Author

LDong-Arm commented Nov 25, 2020

@boraozgen @evedon I've added a unit test as requested, and fixed another issue in sfdp_iterate_next_largest_erase_type() (it returned the last type in the loop instead of an error, if no suitable type was found), please review, thanks!

To run all unit tests (I'm not aware of a way to run only one test, but all tests are extremely quick to run):

mbed test --unittest

More info here. I recommend running them on Linux if possible, to avoid dependency issues.

evedon
evedon previously approved these changes Nov 25, 2020
Contributor

@evedon left a comment


Looks good. Excellent work.

As a starting point, only sfdp_iterate_next_largest_erase_type(),
which the pull request is intended to fix, is tested. More test
cases shall be added in the future.
@mergify bot dismissed evedon's stale review November 26, 2020 10:30

Pull request has been modified.

@LDong-Arm
Contributor Author

Thanks @evedon. I've just pushed the final changes, fixing a few compiler warnings around integers and chrono.

@0xc0170
Contributor

0xc0170 commented Nov 26, 2020

CI started

@mbed-ci

mbed-ci commented Nov 26, 2020

Jenkins CI Test : ✔️ SUCCESS

Build Number: 1 | 🔒 Jenkins CI Job | 🌐 Logs & Artifacts

Detailed summary:

Job | Status
jenkins-ci/mbed-os-ci_unittests | ✔️
jenkins-ci/mbed-os-ci_build-example-ARM | ✔️
jenkins-ci/mbed-os-ci_build-greentea-GCC_ARM | ✔️
jenkins-ci/mbed-os-ci_cmake-example-GCC_ARM | ✔️
jenkins-ci/mbed-os-ci_build-greentea-ARM | ✔️
jenkins-ci/mbed-os-ci_cmake-example-ARM | ✔️
jenkins-ci/mbed-os-ci_build-example-GCC_ARM | ✔️
jenkins-ci/mbed-os-ci_build-cloud-example-GCC_ARM | ✔️
jenkins-ci/mbed-os-ci_build-cloud-example-ARM | ✔️
jenkins-ci/mbed-os-ci_cmake-example-test | ✔️
jenkins-ci/mbed-os-ci_dynamic-memory-usage | ✔️
jenkins-ci/mbed-os-ci_greentea-test | ✔️
jenkins-ci/mbed-os-ci_cloud-client-pytest | ✔️

@0xc0170 0xc0170 merged commit 61e4b55 into ARMmbed:master Nov 26, 2020
@mergify mergify bot removed the ready for merge label Nov 26, 2020
@mbedmain added the release-version: 6.6.0 and Release-pending labels and removed the release-type: patch and Release-pending labels Dec 11, 2020