IRI expansion with missing @base does not conform to RFC 3986 #187

@RinkeHoekstra

RFC 3986 section 5.1 specifies that relative URIs should be expanded against the document's base URI. In absence of an explicit base, there are prescribed steps to determine the base IRI for a given document:

5.1.1. Base URI Embedded in Content
5.1.2. Base URI from the Encapsulating Entity
5.1.3. Base URI from the Retrieval URI
5.1.4. Default Base URI

The current implementation in pyLD ignores the last two requirements. For 5.1.3 this is understandable, as the library only operates on a data payload. However, 5.1.4 is the catch-all that would ensure that @id values are always expanded to absolute IRIs.
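To make the fallback order concrete, here is a minimal sketch using only the Python standard library. It is not PyLD code: `choose_base`, its parameters, and the `http://example.invalid/` default are hypothetical names chosen for illustration.

```python
from urllib.parse import urljoin


def choose_base(embedded=None, encapsulating=None, retrieval=None,
                default="http://example.invalid/"):
    """Pick a base URI in the order of RFC 3986 sections 5.1.1 to 5.1.4.

    All parameter names and the default value are assumptions made for
    this sketch; an application would supply its own default base.
    """
    for candidate in (embedded, encapsulating, retrieval):
        if candidate:
            return candidate
    # 5.1.4: nothing else is known, so fall back to the default base.
    return default


# With a retrieval URI (5.1.3), relative references resolve against it.
base = choose_base(
    retrieval="https://w3c.github.io/json-ld-api/tests/expand/0060-in.jsonld")
doc_id = urljoin(base, "../document-relative")
doc_type = urljoin(base, "#document-relative")

# With no information at all, the default base (5.1.4) still yields
# an absolute IRI.
fallback_id = urljoin(choose_base(), "document-relative")
```

The point of the catch-all is visible in the last line: even when only a payload is available, the result is absolute and can be used as an RDF subject.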

Without this fallback, relative @id values in documents whose context does not explicitly specify a base are not expanded to absolute IRIs. As a result, the to_rdf function ignores them when producing N-Quads output. This is a showstopper for RDFLib/rdflib#2308.

The JSON-LD spec does allow for a means to prevent expansion against a base by setting @base to null (see https://www.w3.org/TR/json-ld/#base-iri) but does not specify that null is the default.

This violates expansion test t0060 (0060-in.jsonld) from the JSON-LD API test suite.

The output should be something similar to (with a different application-specific base):

[
  {
    "@id": "https://w3c.github.io/json-ld-api/tests/document-relative",
    "@type": [ "https://w3c.github.io/json-ld-api/tests/expand/0060-in.jsonld#document-relative" ],
    "http://example.com/vocab#property": [
      {
        "@id": "http://example.org/document-base-overwritten",
        "@type": [ "http://example.org/test/#document-base-overwritten" ],
        "http://example.com/vocab#property": [
          {
            "@id": "https://w3c.github.io/json-ld-api/tests/document-relative",
            "@type": [ "https://w3c.github.io/json-ld-api/tests/expand/0060-in.jsonld#document-relative" ]
          },
          {
            "@id": "../document-relative",
            "@type": [ "#document-relative" ],
            "http://example.com/vocab#property": [ { "@value": "only @base is cleared" } ]
          }
        ]
      }
    ]
  }
]

But the output of pyld is:

[
  {
    "@id": "../document-relative",
    "@type": [
      "#document-relative"
    ],
    "http://example.com/vocab#property": [
      {
        "@id": "http://example.org/document-base-overwritten",
        "@type": [
          "http://example.org/test/#document-base-overwritten"
        ],
        "http://example.com/vocab#property": [
          {
            "@id": "../document-relative",
            "@type": [
              "#document-relative"
            ]
          },
          {
            "@id": "../document-relative",
            "@type": [
              "#document-relative"
            ],
            "http://example.com/vocab#property": [
              {
                "@value": "only @base is cleared"
              }
            ]
          }
        ]
      }
    ]
  }
]

The resulting N-Quads output contains only a single triple:

<http://example.org/document-base-overwritten> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://example.org/test/#document-base-overwritten> .
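To show why only one triple survives, here is a simplified sketch of a triple generator that skips any statement containing a relative IRI. This is not PyLD's to_rdf implementation: `is_absolute` and `triples` are hypothetical helpers, and the sketch ignores blank nodes, datatypes, and graphs.

```python
from urllib.parse import urlsplit

RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
PROP = "http://example.com/vocab#property"


def is_absolute(iri):
    """Treat an IRI as absolute when it carries a scheme (simplification)."""
    return bool(urlsplit(iri).scheme)


def triples(node):
    """Yield triples for an expanded node, dropping any with a relative IRI."""
    subject = node.get("@id", "")
    for type_iri in node.get("@type", []):
        if is_absolute(subject) and is_absolute(type_iri):
            yield (subject, RDF_TYPE, type_iri)
    for predicate, values in node.items():
        if predicate.startswith("@"):
            continue
        for value in values:
            if "@value" in value:
                if is_absolute(subject) and is_absolute(predicate):
                    yield (subject, predicate, value["@value"])
            else:
                obj = value.get("@id", "")
                if all(map(is_absolute, (subject, predicate, obj))):
                    yield (subject, predicate, obj)
                yield from triples(value)  # descend into the nested node


# The expanded document as currently produced (copied from the output above).
relative = {"@id": "../document-relative", "@type": ["#document-relative"]}
expanded = {
    **relative,
    PROP: [{
        "@id": "http://example.org/document-base-overwritten",
        "@type": ["http://example.org/test/#document-base-overwritten"],
        PROP: [
            dict(relative),
            {**relative, PROP: [{"@value": "only @base is cleared"}]},
        ],
    }],
}

result = list(triples(expanded))
```

Under these assumptions, `result` holds just the rdf:type statement for `http://example.org/document-base-overwritten`, because every other statement has a relative subject or object.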

This is not a duplicate of #143 as that issue is about a case where the @base is specified.

The problem appears to reside here:

pyld/lib/pyld/jsonld.py

Lines 3186 to 3202 in 316fbc2

# handle @base
if '@base' in ctx:
    base = ctx['@base']
    if base is None:
        base = None
    elif _is_absolute_iri(base):
        base = base
    elif _is_relative_iri(base):
        base = prepend_base(active_ctx.get('@base'), base)
    else:
        raise JsonLdError(
            'Invalid JSON-LD syntax; the value of "@base" in a '
            '@context must be a string or null.',
            'jsonld.SyntaxError', {'context': ctx},
            code='invalid base IRI')
    rval['@base'] = base
    defined['@base'] = True

Here, when the context contains neither an @base nor an explicit null base (see https://www.w3.org/TR/json-ld/#base-iri), a default base needs to be set.
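One possible shape for such a fallback is sketched below as a standalone function, so it can be read without the rest of jsonld.py. It is an illustration under stated assumptions, not a patch: `resolve_context_base`, `DEFAULT_BASE`, and the `_MISSING` sentinel are hypothetical, and the standard library's `urljoin` stands in for PyLD's `prepend_base`.

```python
from urllib.parse import urljoin, urlsplit

# Hypothetical application-specific default base (RFC 3986 section 5.1.4).
DEFAULT_BASE = "http://example.invalid/base/"

# Sentinel so that "no active base given" differs from "active base is null".
_MISSING = object()


def resolve_context_base(ctx, active_base=_MISSING, default_base=DEFAULT_BASE):
    """Return the base IRI in effect after processing the local context."""
    if active_base is _MISSING:
        # No base is known yet: start from the default.
        active_base = default_base
    if '@base' not in ctx:
        # The context does not mention @base: keep the inherited base.
        return active_base
    base = ctx['@base']
    if base is None:
        # An explicit null disables expansion against a base.
        return None
    if not isinstance(base, str):
        raise ValueError('the value of "@base" must be a string or null')
    if urlsplit(base).scheme or active_base is None:
        # Absolute IRI, or nothing to resolve a relative one against.
        return base
    return urljoin(active_base, base)
```

The sentinel keeps the explicit `"@base": null` case intact: only a base that was never set falls back to the default, so documents that deliberately clear the base still behave as the JSON-LD spec describes.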
