Skip to content

Conversation

@NMertsch
Copy link
Contributor

@NMertsch NMertsch commented Apr 30, 2025

Summary

Align license names with the SPDX License List.

Fixes #2270
Accompanies beeware/briefcase-template#191

Details

I changed the non-standard license names listed in the cookiecutter template to SPDX terms (formatted as "[Full Name] (Identifier)"):

  • BSD-3-Clause
    • old: BSD license
    • new: BSD 3-Clause "New" or "Revised" License (BSD-3-Clause)
  • MIT
    • old: MIT license
    • new: MIT License (MIT)
  • Apache-2.0
    • old: Apache Software License
    • new: Apache License 2.0 (Apache-2.0)
  • GPL-2.0
    • old: GNU General Public License v2 (GPLv2)
    • new: GNU General Public License v2.0 only (GPL-2.0)
  • GPL-2.0+
    • old: GNU General Public License v2 or later (GPLv2+)
    • new: GNU General Public License v2.0 or later (GPL-2.0+)
  • GPL-3.0
    • old: GNU General Public License v3 (GPLv3)
    • new: GNU General Public License v3.0 only (GPL-3.0)
  • GPL-3.0+
    • old: GNU General Public License v3 or later (GPLv3+)
    • new: GNU General Public License v3.0 or later (GPL-3.0+)

PR Checklist:

  • All new features have been tested
  • All new features have been documented
  • I have read the CONTRIBUTING.md file
  • will abide by the code of conduct

briefcase-template-repo: https://github.com/NMertsch/briefcase-template
briefcase-template-ref: 2270-improve-license-selection-wording

@freakboy3742
Copy link
Member

As noted with the template PR - it may be preferable to use SPDX labels as identifiers (for the purposes of -Q etc).

@NMertsch
Copy link
Contributor Author

NMertsch commented May 1, 2025

Thank you @freakboy3742, I hope my changes are aligned with your expectations. I tested it locally with my briefcase-template fork: briefcase new --template path/to/briefcase-template.

I'm slightly bothered by all the duplicate strings which have to match each other. Do you think it makes sense to use a dataclass like this one to model licenses?

from dataclasses import dataclass

@dataclass(frozen=True)
class SpdxLicense:
    id: str
    name: str

@NMertsch
Copy link
Contributor Author

NMertsch commented May 1, 2025

Tests fail in both PRs because they use the main branch of the other repo. Is there something I can do about that?

@NMertsch NMertsch marked this pull request as ready for review May 1, 2025 22:29
@freakboy3742
Copy link
Member

Tests fail in both PRs because they use the main branch of the other repo. Is there something I can do about that?

Yes - as noted on the template PR, we can make the template PR test against this branch; we can then land that PR, then re-test this one once we're happy with it.

I've made the configuration tweak on the template PR so we can run the test suite there; that looks like it's passing, so once this PR is ready to merge (an initial scan looks promising; I'll do a full review shortly), I'll merge both.

@freakboy3742
Copy link
Member

I'm slightly bothered by all the duplicate strings which have to match each other. Do you think it makes sense to use a dataclass like this one to model licenses?

Your instincts about the duplicated text are definitely sound; however, I'm not convinced a dataclass is what is missing here. A dataclass certainly could be part of the solution, but the real problem is that the big dictionary of known licenses:

        {
             "BSD-3-Clause": 'BSD-3-Clause: BSD 3-Clause "New" or "Revised" License',
             "MIT": "MIT: MIT License",
             "Apache-2.0": "Apache-2.0: Apache License 2.0",
             "GPL-2.0": "GPL-2.0: GNU General Public License v2.0 only",
             "GPL-2.0+": "GPL-2.0+: GNU General Public License v2.0 or later",
             "GPL-3.0": "GPL-3.0: GNU General Public License v3.0 only",
             "GPL-3.0+": "GPL-3.0+: GNU General Public License v3.0 or later",
             "Proprietary": "Proprietary",
             "Other": "Other",
         }

is replicated in multiple places. A cleanup to define this dictionary of constants in new.py and then using that dictionary in convert and the tests definitely seems like a worthwhile cleanup.

Arguably, the "hint patterns" could also be captured in this data source dictionary - at which point, maybe a dataclass (or at least a class wrapper) makes sense, because you could write a match() function that wraps up the "does this random text match this license?" logic into something tied to the license. However, my instinct is that this is probably overkill - licenses are something that are selected once, and then never used again - we're not going to get a whole lot of reuse from building a "comprehensive SPDX license validation framework".

As an aside - there's 2 pieces of work that are related to this. They don't need to be resolved as part of this PR, but given you're now familiar with the licenses part of the codebase, and you might be looking for a next place to contribute:

  1. Support PEP 639 license metadata #2146 - PEP639 has changed how licenses can/should be represented in pyproject.toml; we need to update how we're handling license fields. Annoyingly, there's already code in Briefcase to deprecate how licenses are represented, because when we added the convert command, PEP639 hadn't been finalised, and the jury was out on the "right" way to display a license.
  2. Derive new app wizard properties from cookiecutter metadata #392 - Ideally, the list of known licenses wouldn't be hard-coded in Briefcase and the template - the template would be the canonical source (so that an end-user could modify the template to add an new licence if they wanted). Or, I guess we could go direct to SPDX (or some other codified source of licence text) for a list of canonical licenses that a Briefcase project could use.

Copy link
Member

@freakboy3742 freakboy3742 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks great, and works great in my testing.

As noted in my other comment, refactoring the "dictionary of constants" into a common constant definition is really the only note I've got here; as part of that refactor, I'd also be inclined to modify how the human-readable version of the labels are rendered.

The SPDX identifier is a good canonical identifier, but actual humans are more likely to understand "BSD 3-Clause "new" or "revised" license" than "BSD-3-Clause". On that basis, I'd suggest the human labels should be of the form:

BSD 3-Clause "New" or "Revised" License (BSD-3-Clause)

That way, the SDPX label is visible, but it's presented more as "clarifying" text, rather than "authoritative" (if that makes sense?)

@NMertsch
Copy link
Contributor Author

NMertsch commented May 2, 2025

As an aside - there's 2 pieces of work that are related to this. They don't need to be resolved as part of this PR, but given you're now familiar with the licenses part of the codebase, and you might be looking for a next place to contribute: [...]

I'd love to resolve these as well, but my vacation is over soon and I won't have time for it in the coming weeks. Do you plan to host another Beeware sprint at EuroPython in July, and can these tasks wait until then?

(I'll look at the other comments later/tomorrow. Thanks already for being so helpful and responsive. I definitely learnt quite a bit from you in the past days!)

@freakboy3742
Copy link
Member

I'd love to resolve these as well, but my vacation is over soon and I won't have time for it in the coming weeks. Do you plan to host another Beeware sprint at EuroPython in July, and can these tasks wait until then?

Yes, I'll be at the EuroPython sprints (I'm also teaching a tutorial). There's no urgency on these tasks though; assuming nobody resolves them before EuroPython, you'll be more than welcome to work on them.

(I'll look at the other comments later/tomorrow. Thanks already for being so helpful and responsive. I definitely learnt quite a bit from you in the past days!)

My pleasure - thanks for the great ideas and contributions!

@NMertsch
Copy link
Contributor Author

NMertsch commented May 2, 2025

A cleanup to define this dictionary of constants in new.py and then using that dictionary in convert and the tests definitely seems like a worthwhile cleanup.

Done in c8eb3dd.

The SPDX identifier is a good canonical identifier, but actual humans are more likely to understand "BSD 3-Clause "new" or "revised" license" than "BSD-3-Clause". On that basis, I'd suggest the human labels should be of the form:

BSD 3-Clause "New" or "Revised" License (BSD-3-Clause)

That way, the SDPX label is visible, but it's presented more as "clarifying" text, rather than "authoritative" (if that makes sense?)

It makes sense to me, but I'm not certain that we both think of the same thing. Is 59a0d4e what you imagined?

Arguably, the "hint patterns" could also be captured in this data source dictionary - at which point, maybe a dataclass (or at least a class wrapper) makes sense [...]

The hint patterns need to be checked in a specific order, so I think there are two things that should not me mixed:

  • Combination of license ID, license name, and display string (e.g. as a dataclass)
  • Hint patterns and their order (e.g. as dict with that dataclass as key type)

My idea would be a file like this:

# licenses.py
from dataclasses import dataclass

@dataclass(frozen=True)
class LicenseOption:
    """A license option for briefcase projects."""

    spdx_id: str | None
    """The SPDX License Identifier, or ``None`` for non-SPDX options."""
    name: str
    """The SPDX License Name, or a suitable name to identify a non-SPDX option."""

    def __str__(self) -> str:
        """The string representation of this license option."""
        return self.name if self.spdx_id is None else f"{self.name} (self.spdx_id)"

Bsd3Clause = LicenseOption("BSD-3-Clause", 'BSD 3-Clause "New" or "Revised" License')
Mit = LicenseOption("MIT", "MIT License")
Apache2 = LicenseOption("Apache-2.0", "Apache License 2.0")
[...]
Other = LicenseOption(None, "Other")

LICENSE_OPTIONS = [Bsd3Clause, Mit, Apache2, ..., Other]

The license selection prompts in new.py and convert.py could then use {str(license_): license_ for license in LICENSE_OPTIONS} as options, and the current dictionary hint_patterns: dict[str, list[str]] = {"Apache-2.0": ["Apache"], ...} could then be refactored to hint_patterns: dict[LicenseOption, list[str]] = {Apache2: ["Apache"], ...}.

If this needs another round of refinement, I'm fine with keeping this PR as it is, and deferring such ideas to a future PR (e.g. as part of working in #392 or #2146).

@freakboy3742
Copy link
Member

A cleanup to define this dictionary of constants in new.py and then using that dictionary in convert and the tests definitely seems like a worthwhile cleanup.

Done in c8eb3dd.

Looks good - although the uses in the tests (especially examples like here and here, where we are literally checking that the known license list is the one that has been passed in as the list of question options, seem like additional opportunities for cleanup.

It makes sense to me, but I'm not certain that we both think of the same thing. Is 59a0d4e what you imagined?

Yes - that's what I had in mind.

Arguably, the "hint patterns" could also be captured in this data source dictionary - at which point, maybe a dataclass (or at least a class wrapper) makes sense [...]

The hint patterns need to be checked in a specific order:

This is true, but as of Python 3.5, we can guarantee that dictionary insertion order is the same as iteration order, so the dependencies in licenses can be encoded safely in the definition - but, even if that wasn't true, we can define a list, and convert to a dictionary as required; that doesn't change the "set of matches" that are associated with finding a single license, and having if license.matches(full_text) logic.

If this needs another round of refinement, I'm fine with keeping this PR as it is, and deferring such ideas to a future PR (e.g. as part of working in #392 or #2146).

I can see what you're aiming at - but as I said previously, my instinct is that this is probably overkill, even in the context of the additional PRs. My bet for now would be to leave it until it's clear that there's a benefit to be gained.

@NMertsch
Copy link
Contributor Author

NMertsch commented May 3, 2025

the uses in the tests (especially examples like here and here, where we are literally checking that the known license list is the one that has been passed in as the list of question options, seem like additional opportunities for cleanup.

I hope 69f6d08 is what you had in mind. Otherwise please make the change yourself or be more explicit :)

Is there anything else missing here?

Copy link
Member

@freakboy3742 freakboy3742 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The most recent change wasn't quite what I had in mind - validating the full API call is still worth it, to ensure that the full list of options being presented is as expected; however, that's an easy tweak for me to make.

I've also pulled out the default license in the new command as a constant; it's only used in one place, but I think it's worth being explicit.

Lastly, I've added a removal changenote - this is an edge case, but it's worth being explicit that if anyone is using the -Q license=XXX option in a script somewhere, their script will need to change.

With that - I think this is ready to land! Thanks for the pair of PRs!

@freakboy3742 freakboy3742 merged commit ce6918c into beeware:main May 4, 2025
57 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

License selection mentions ambiguous "BSD license"

2 participants