
Conversation

LDong-Arm
Contributor

@LDong-Arm LDong-Arm commented Nov 23, 2020

Summary of changes

Fixes: #13528

[Q/O/]SPIFBlockDevice::erase() uses sfdp_iterate_next_largest_erase_type() to determine the largest erase size suitable for the requested address and size, as larger erase units give better erase performance than many smaller chunks. But there are several issues (a sketch of the corrected selection logic follows this list):

  • sfdp_iterate_next_largest_erase_type():
    • An alignment check is missing. If the start address is aligned to a smaller erase unit but not to a larger one, this function wrongly returns the larger one whenever the total size to erase is at least the larger unit. (Part of the cause of SPIFBlockDevice: erase block calculation issue #13528)
    • Wrong comparison: total size > erase size should be >=. Erasing a single unit, bd->erase(addr, bd->get_erase_size(addr));, results in the following on all [Q/O/]SPIFBlockDevice targets:
      [ERR ][SFDP]: No erase type was found for current region addr
      
      Note: Only visible with trace enabled, and erase carries on anyway due to another issue on this list... (Seen in SPIFBlockDevice: erase block calculation issue #13528 and other places such as the Mbed port for MCUboot).
    • Incorrect region boundary check (though we rarely reach the end of the storage).
    • If no applicable erase type is found, it incorrectly returns the last type in the loop instead of an error. (This is why, despite the second issue above, erase carries on anyway.)
    • It tries to optimise computation by removing any unused erase types (from the type mask, i.e. the list of supported erase types), but those types could be the correct ones to use when erasing subsequent chunks.
  • [Q/O/]SPIFBlockDevice::erase(): It first checks the address and size, and only continues if they are aligned. But after that, it tries to "handle" unaligned erases by erasing the whole block. Such "handling" is not only unnecessary, but also hides the erroneous return value of sfdp_iterate_next_largest_erase_type() mentioned above, resulting in data being wrongly erased. (Part of the cause of SPIFBlockDevice: erase block calculation issue #13528)
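
To make the fixed behaviour concrete, here is a minimal, self-contained sketch of the corrected selection logic; the names (EraseType, pick_largest_erase_type) are illustrative, not the actual Mbed OS internals:

#include <cstdint>
#include <cstdio>
#include <vector>

struct EraseType {
    uint32_t size; // erase unit size in bytes
};

// Return the index of the largest erase type that (1) the start address is
// aligned to, (2) fits in the remaining size (">=", not ">"), and (3) does not
// cross the region boundary; -1 signals an error instead of silently falling
// back to the last type checked.
int pick_largest_erase_type(const std::vector<EraseType> &types,
                            uint32_t addr, uint32_t remaining,
                            uint32_t region_end)
{
    int best = -1;
    for (size_t i = 0; i < types.size(); i++) {
        uint32_t sz = types[i].size;
        if (addr % sz != 0) {
            continue; // alignment check (previously missing)
        }
        if (remaining < sz) {
            continue; // ">=" comparison: erasing exactly one unit is valid
        }
        if (addr + sz - 1 > region_end) {
            continue; // region boundary check
        }
        if (best < 0 || sz > types[best].size) {
            best = static_cast<int>(i);
        }
    }
    return best; // -1 if nothing fits: the caller must treat it as an error
}

int main()
{
    std::vector<EraseType> types = {{4096}, {32768}}; // 4KB and 32KB units
    uint32_t addr = 8192, remaining = 61440;          // the example from #13528
    while (remaining > 0) {
        int idx = pick_largest_erase_type(types, addr, remaining, 0xFFFFF);
        if (idx < 0) {
            return 1; // propagate the error rather than erasing blindly
        }
        std::printf("erase %u bytes at %u\n", types[idx].size, addr);
        addr += types[idx].size;
        remaining -= types[idx].size;
    }
    return 0;
}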

Huge thanks to @boraozgen for identifying several of them.

This PR fixes those issues. A unit test for the optimal erase type determination algorithm is added too.

Impact of changes

[Q/O/]SPIFBlockDevice::erase()'s erase type determination now works, and it correctly erases the requested address ranges.

Migration actions required

None.

Documentation

None.


Pull request type

[x] Patch update (Bug fix / Target update / Docs update / Test update / Refactor)
[ ] Feature update (New feature / Functionality change / New API)
[ ] Major update (Breaking change E.g. Return code change / API behaviour change)

Test results

[ ] No Tests required for this change (E.g. docs only update)
[x] Covered by existing mbed-os tests (Greentea or Unittest)
[ ] Tests / results supplied as part of this PR

Reviewers

@ARMmbed/mbed-os-core @boraozgen


… is checked

[Q/O/]SPIFBlockDevice::erase() begins with an alignment check,
after which unaligned erases should not happen or be allowed.

If the erase address is not aligned to the value returned by
sfdp_iterate_next_largest_erase_type(), it indicates an
internal error in erase table parsing which should not be
hidden.
@ciarmcom added the release-type: patch label Nov 23, 2020
@ciarmcom requested review from a team November 23, 2020 16:00
@ciarmcom
Member

@LDong-Arm, thank you for your changes.
@ARMmbed/mbed-os-hal @ARMmbed/mbed-os-core @ARMmbed/mbed-os-maintainers please review.

@boraozgen
Contributor

Hi @LDong-Arm, thanks for working on this issue.

  • I did not check the code yet, but I wanted to try it out as a black box. As far as I have seen, the issue with the unwanted erasure of the beginning of the block seems to be solved. However, I noticed that in those unaligned cases the algorithm falls back to the smallest erase block and does not use the more efficient larger blocks. Is this what you intended? I can give details of my test if you like.

  • How about some unit tests for this? It would guarantee the intended operation.

  • A quick look at the code shows that there is a lot of duplicate code between the [Q/O/]SPIF classes. Could these be merged together?

@LDong-Arm
Contributor Author

LDong-Arm commented Nov 24, 2020

Hi @boraozgen, thanks for the feedback.

However, I noticed that in those unaligned cases the algorithm falls back to the smallest erase block and does not use the more efficient larger blocks. Is this what you intended? I can give details of my test if you like.

The algorithm is expected to choose the largest block that satisfies the alignment (and size) requirements. In the example you gave in #13528, storage.erase(8192, 61440), the new algorithm should erase 6 * 4KB blocks, 1 * 32KB block, and 1 * 4KB block, just as you originally suggested (which I think is optimal). Or did you already run the code and see it choose the wrong blocks in the traces?
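
For reference, a hand-worked trace of that decomposition (my own arithmetic assuming only 4KB and 32KB erase types apply, not actual driver output):

// addr 8192  is only 4KB-aligned                -> six 4KB erases reach the 32KB boundary
// addr 32768 is 32KB-aligned, 36864 bytes remain -> one 32KB erase
// addr 65536 has 4096 bytes remaining            -> one 4KB erase
// Total: 6 * 4KB + 1 * 32KB + 1 * 4KB = 61440 bytes in 8 commands, not 15.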

  • How about some unit tests for this? It would guarantee the intended operation.

I completely agree. This and a few other SFDP and BlockDevice algorithm issues would've been caught by unit tests, but we can raise a separate issue.

  • A quick look at the code shows that there is a lot of duplicate code between the [Q/O/]SPIF classes. Could these be merged together?

That would be a good refactoring; we can raise a separate issue.

@boraozgen
Contributor

Okay, I think I've isolated the issue further: when the bitfield gets reduced, it does not get reset before the next chunk is processed. This leads to the following case:

I use this chip; see page 5 for the memory map.

When I call spif.erase(0, 0x4000);, the 8K blocks are used:

[140ms][][DBG ][Main]: Erasing from 0 to 0x4000
[4042ms][][DBG ][SPIF]: erase - addr: 0, in_size: 16384
[6491ms][][DBG ][SPIF]: erase - addr: 0, size:16384, Inst: 0xd8h, erase size: 8192 ,
[6499ms][][DBG ][SPIF]: erase - Region: 0, Type:1
[6504ms][][DBG ][SPIF]: Erase Inst: 0xd8h, addr: 0, size: 16384
[8436ms][][DBG ][SPIF]: erase - addr: 8192, size:8192, Inst: 0xd8h, erase size: 8192 ,
[8444ms][][DBG ][SPIF]: erase - Region: 0, Type:1
[8449ms][][DBG ][SPIF]: Erase Inst: 0xd8h, addr: 8192, size: 8192

However, when I call spif.erase(0x1000, 0x4000);, it falls back (correctly) to 4K for the first block, but does not try the 8K block for the second part:

[8473ms][][DBG ][Main]: Erasing from 0x1000 to 0x5000
[8962ms][][DBG ][SPIF]: erase - addr: 4096, in_size: 16384
[79902ms][][DBG ][SPIF]: erase - addr: 4096, size:16384, Inst: 0x20h, erase size: 4096 ,
[79910ms][][DBG ][SPIF]: erase - Region: 0, Type:0
[79916ms][][DBG ][SPIF]: Erase Inst: 0x20h, addr: 4096, size: 16384
[407377ms][][DBG ][SPIF]: erase - addr: 8192, size:12288, Inst: 0x20h, erase size: 4096 ,
[407385ms][][DBG ][SPIF]: erase - Region: 0, Type:0
[407390ms][][DBG ][SPIF]: Erase Inst: 0x20h, addr: 8192, size: 12288
[408426ms][][DBG ][SPIF]: erase - addr: 12288, size:8192, Inst: 0x20h, erase size: 4096 ,
[408434ms][][DBG ][SPIF]: erase - Region: 0, Type:0
[408439ms][][DBG ][SPIF]: Erase Inst: 0x20h, addr: 12288, size: 8192
[408902ms][][DBG ][SPIF]: erase - addr: 16384, size:4096, Inst: 0x20h, erase size: 4096 ,
[408910ms][][DBG ][SPIF]: erase - Region: 0, Type:0
[408915ms][][DBG ][SPIF]: Erase Inst: 0x20h, addr: 16384, size: 4096

Removing the following line seems to fix it. I think that line is meant to optimize for speed by avoiding re-checking the same types. I could not think of a side effect. Any thoughts?

bitfield &= ~type_mask;
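
To restate the failure mode as a minimal sketch (hypothetical values, not the actual driver state):

#include <cstdint>

int main()
{
    // Hypothetical bitfield: bit 0 = 4KB erase type, bit 1 = 8KB erase type.
    uint8_t bitfield = 0b11;

    // Chunk 1: addr 0x1000 is only 4KB-aligned, so the 8KB type is rejected
    // for this chunk, and the old code clears its bit permanently:
    uint8_t type_mask = 0b10; // the type rejected for this chunk
    bitfield &= ~type_mask;   // bitfield is now 0b01

    // Chunk 2: addr 0x2000 IS 8KB-aligned, but the 8KB bit is already gone,
    // so only the 4KB type remains visible - the fallback seen in the trace.
    return bitfield == 0b01 ? 0 : 1;
}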

@boraozgen
Contributor

I completely agree. This and a few other SFDP and BlockDevice algorithm issues would've been caught by unit tests, but we can raise a separate issue.

How about starting with a unit test for this algorithm? I'm not familiar with the Mbed unit test infrastructure, otherwise I would suggest one. It would also be easier for others like me to extend the tests.

@evedon
Contributor

evedon commented Nov 24, 2020

How about starting with a unit test for this algorithm? I'm not familiar with the Mbed unit test infrastructure, otherwise I would suggest one. It would also be easier for others like me to extend the tests.

I agree. Let's implement a unit test for this algorithm in this PR.
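
For what it's worth, a sketch of what such a test could look like (written with GoogleTest, which the Mbed OS unittests use, against the illustrative pick_largest_erase_type() helper from the earlier sketch; the real test exercises sfdp_iterate_next_largest_erase_type() directly):

#include <vector>
#include "gtest/gtest.h"

TEST(SfdpEraseTypeSelection, ChoosesLargestSuitableType)
{
    std::vector<EraseType> types = {{4096}, {32768}};
    // 32KB-aligned address with >= 32KB remaining: expect the 32KB type.
    EXPECT_EQ(1, pick_largest_erase_type(types, 32768, 36864, 0xFFFFF));
    // Only 4KB-aligned: must fall back to the 4KB type.
    EXPECT_EQ(0, pick_largest_erase_type(types, 8192, 61440, 0xFFFFF));
    // Erasing exactly one unit must be accepted (">=", not ">").
    EXPECT_EQ(0, pick_largest_erase_type(types, 4096, 4096, 0xFFFFF));
    // Nothing fits: expect -1, not the last type checked.
    EXPECT_EQ(-1, pick_largest_erase_type(types, 4097, 100, 0xFFFFF));
}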

@LDong-Arm
Contributor Author

Removing the following line seems to fix it. I think that line is meant to optimize for speed by avoiding re-checking the same types. I could not think of a side effect. Any thoughts?

bitfield &= ~type_mask;

Very good find. I believe the original implementation didn't take our scenario into account. The speed improvement here (which is not functionally correct) is rather minimal anyway.

The supported erase types of a given flash region are indicated
by bits of the variable `type_mask`. Even if an erase type
is unused for the current chunk (e.g. size too large, unaligned, etc.),
its bit should NOT be cleared - the same erase type might
actually be useful for the next chunk.

The function argument is now a value instead of a reference.
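
A tiny sketch of why passing by value matters here (illustrative, not the actual code):

#include <cstdint>

// By-value parameter: each call may narrow its own copy of the bitfield
// without destroying the caller's list of supported erase types.
static uint8_t choose(uint8_t bitfield)
{
    bitfield &= 0b01; // any narrowing stays local to this call
    return bitfield;
}

int main()
{
    uint8_t region_types = 0b11; // region supports the 4KB and 8KB types
    choose(region_types);        // first chunk
    return region_types == 0b11 ? 0 : 1; // still intact for the next chunk
}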
@LDong-Arm
Contributor Author

Now it works correctly on my target.

Comment on lines -420 to -423
if (idx == -1) {
tr_error("No erase type was found for current region addr");
}
return largest_erase_type;
Contributor Author


The old implementation incorrectly returned the last erase type it checked, instead of an error.
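
A sketch of the corrected pattern (the actual fix in this PR may differ in detail):

if (idx == -1) {
    tr_error("No erase type was found for current region addr");
    return -1; // propagate the error so the caller aborts the erase
}
return largest_erase_type;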

@LDong-Arm
Contributor Author

LDong-Arm commented Nov 25, 2020

@boraozgen @evedon I've added a unit test as requested, and fixed another issue in sfdp_iterate_next_largest_erase_type() (it returned the last type in the loop instead of an error, if no suitable type was found), please review, thanks!

To run all unit tests (I'm not aware of a way to run only one test, but all tests are extremely quick to run):

mbed test --unittest

More info here. I recommend running them on Linux if possible, to avoid dependency issues.

evedon
evedon previously approved these changes Nov 25, 2020
Contributor

@evedon left a comment


Looks good. Excellent work.

As a starting point, only sfdp_iterate_next_largest_erase_type(),
which the pull request is intended to fix, is tested. More test
cases shall be added in the future.
@mergify bot dismissed evedon's stale review November 26, 2020 10:30

Pull request has been modified.

@LDong-Arm
Contributor Author

Thanks @evedon. I've just pushed the final changes, fixing a few compiler warnings around integers and chrono.

@0xc0170
Contributor

0xc0170 commented Nov 26, 2020

CI started

@mbed-ci

mbed-ci commented Nov 26, 2020

Jenkins CI Test : ✔️ SUCCESS

Build Number: 1 | 🔒 Jenkins CI Job | 🌐 Logs & Artifacts

Detailed summary:

Job | Status
jenkins-ci/mbed-os-ci_unittests | ✔️
jenkins-ci/mbed-os-ci_build-example-ARM | ✔️
jenkins-ci/mbed-os-ci_build-greentea-GCC_ARM | ✔️
jenkins-ci/mbed-os-ci_cmake-example-GCC_ARM | ✔️
jenkins-ci/mbed-os-ci_build-greentea-ARM | ✔️
jenkins-ci/mbed-os-ci_cmake-example-ARM | ✔️
jenkins-ci/mbed-os-ci_build-example-GCC_ARM | ✔️
jenkins-ci/mbed-os-ci_build-cloud-example-GCC_ARM | ✔️
jenkins-ci/mbed-os-ci_build-cloud-example-ARM | ✔️
jenkins-ci/mbed-os-ci_cmake-example-test | ✔️
jenkins-ci/mbed-os-ci_dynamic-memory-usage | ✔️
jenkins-ci/mbed-os-ci_greentea-test | ✔️
jenkins-ci/mbed-os-ci_cloud-client-pytest | ✔️

@0xc0170 0xc0170 merged commit 61e4b55 into ARMmbed:master Nov 26, 2020
@mergify mergify bot removed the ready for merge label Nov 26, 2020
@mbedmain added the release-version: 6.6.0 and Release-pending labels and removed the release-type: patch and Release-pending labels Dec 11, 2020