domel
Contributor

@domel domel commented Sep 8, 2025

See #37

This PR adds ns/rdf-xsd.ttl, an informational, machine-readable Turtle file that declares the XML Schema datatypes which RDF gives special recognition (per RDF Concepts). The goal is to provide a canonical source for tools and datasets that today rely on ad-hoc “xsd vocabularies”.

A new file: ns/rdf-xsd.ttl with:

  • An owl:Ontology header (title, date, short description).
  • For each included XSD datatype:
    a rdfs:Datatype
    rdfs:isDefinedBy <http://www.w3.org/2001/XMLSchema#>
    rdfs:label (human-readable name)
    rdfs:comment (a brief informative description of the value/lexical space)

A small number of RDFS subclass statements (written ⊑ below, meaning rdfs:subClassOf) that reflect well-known XSD 1.1 derivation relationships, e.g.:
xsd:integer rdfs:subClassOf xsd:decimal

  • Numeric tower: byte ⊑ short ⊑ int ⊑ long ⊑ integer ⊑ decimal
  • Unsigned chain and bounds: unsignedByte ⊑ unsignedShort ⊑ unsignedInt ⊑ unsignedLong ⊑ nonNegativeInteger
  • Sign-restricted chains: positiveInteger ⊑ nonNegativeInteger, negativeInteger ⊑ nonPositiveInteger
  • String tokenization chain: normalizedString ⊑ string, token ⊑ normalizedString, language ⊑ token, Name ⊑ token, NCName ⊑ Name, NMTOKEN ⊑ token
  • Temporal/duration refinements: dateTimeStamp ⊑ dateTime, yearMonthDuration ⊑ duration, dayTimeDuration ⊑ duration
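The integer ⊑ chains above can be sanity-checked against the value spaces: for each pair, the subtype's value range should lie inside the supertype's range. A minimal Python sketch, assuming the min/max bounds given for these types in XSD 1.1 Part 2 (`None` marks an unbounded side); the `BOUNDS` table and helper names are illustrative and are not read from rdf-xsd.ttl.

```python
# Value-space bounds (minInclusive, maxInclusive) of some XSD 1.1 integer types.
# None means "unbounded on that side". Illustrative table, not read from rdf-xsd.ttl.
BOUNDS = {
    "integer": (None, None),
    "long": (-2**63, 2**63 - 1),
    "int": (-2**31, 2**31 - 1),
    "short": (-2**15, 2**15 - 1),
    "byte": (-2**7, 2**7 - 1),
    "nonNegativeInteger": (0, None),
    "positiveInteger": (1, None),
    "nonPositiveInteger": (None, 0),
    "negativeInteger": (None, -1),
    "unsignedLong": (0, 2**64 - 1),
    "unsignedInt": (0, 2**32 - 1),
    "unsignedShort": (0, 2**16 - 1),
    "unsignedByte": (0, 2**8 - 1),
}

# The integer-type ⊑ (rdfs:subClassOf) pairs listed above, as (sub, super).
SUBCLASS_OF = [
    ("byte", "short"), ("short", "int"), ("int", "long"), ("long", "integer"),
    ("unsignedByte", "unsignedShort"), ("unsignedShort", "unsignedInt"),
    ("unsignedInt", "unsignedLong"), ("unsignedLong", "nonNegativeInteger"),
    ("positiveInteger", "nonNegativeInteger"),
    ("negativeInteger", "nonPositiveInteger"),
]

def range_within(sub, sup):
    """True if the interval `sub` is contained in the interval `sup`."""
    (lo1, hi1), (lo2, hi2) = sub, sup
    lower_ok = lo2 is None or (lo1 is not None and lo1 >= lo2)
    upper_ok = hi2 is None or (hi1 is not None and hi1 <= hi2)
    return lower_ok and upper_ok

def check_chains():
    """Return the (sub, super) pairs whose ranges are NOT nested (expected: none)."""
    return [(s, t) for s, t in SUBCLASS_OF if not range_within(BOUNDS[s], BOUNDS[t])]

print(check_chains())  # []
```

This only checks the numeric ranges; the ⊑ statements themselves come from the XSD derivation hierarchy, not from this computation.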

Datatypes covered:
string, boolean, decimal, integer, double, float, date, time, dateTime, dateTimeStamp, gYear, gMonth, gDay, gYearMonth, gMonthDay, duration, yearMonthDuration, dayTimeDuration, byte, short, int, long, unsignedByte, unsignedShort, unsignedInt, unsignedLong, positiveInteger, nonNegativeInteger, negativeInteger, nonPositiveInteger, hexBinary, base64Binary, anyURI, language, normalizedString, token, NMTOKEN, Name, NCName.

Facets covered:
length, minLength, maxLength, pattern, enumeration, whiteSpace, maxInclusive, maxExclusive, minExclusive, minInclusive, totalDigits, fractionDigits, assertions, explicitTimezone.

@domel
Contributor Author

domel commented Sep 8, 2025

At this stage, the repository contains a minimal version of rdf-xsd.ttl: only the canonical list of datatypes, a few subclass relationships, labels, and brief comments.

I have two open questions for the group before moving further:

  1. Constraining facets as RDF properties?
    Should we include the constraining facets defined in XSD 1.1 Datatypes (e.g., xsd:maxInclusive, xsd:minExclusive, etc.) as rdf:Property declarations in the Turtle file?
    Example:
    xsd:maxInclusive a rdf:Property ;
        rdfs:label "max inclusive" ;
        rdfs:comment "The inclusive upper bound of an ordered datatype." .
    
  2. Use of facets for datatype restrictions?
    If we do declare these facets, should we also apply them to model the restricted datatypes directly in Turtle using owl:onDatatype + owl:withRestrictions?
    Example:
     xsd:int a rdfs:Datatype ;
         rdfs:label "int" ;
         owl:equivalentClass [
             a rdfs:Datatype ;
             owl:onDatatype xsd:long ;
             owl:withRestrictions (
                 [ xsd:minInclusive -2147483648 ]
                 [ xsd:maxInclusive 2147483647 ]
             )
         ] .
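To make the intended reading of owl:withRestrictions concrete: the restricted datatype's value space is the set of base-datatype values that satisfy every listed facet. A minimal Python sketch covering only the four range facets (pattern, length, and the others are omitted); `satisfies_facets` and the (facet, bound) list encoding are hypothetical, for illustration only.

```python
# Each range facet maps to a predicate: does value v satisfy the facet with bound b?
RANGE_FACETS = {
    "minInclusive": lambda v, b: v >= b,
    "minExclusive": lambda v, b: v > b,
    "maxInclusive": lambda v, b: v <= b,
    "maxExclusive": lambda v, b: v < b,
}

def satisfies_facets(value, restrictions):
    """True if `value` meets every (facet, bound) pair, i.e. it lies in the value
    space of the restricted datatype (assuming it is already in the base type)."""
    return all(RANGE_FACETS[facet](value, bound) for facet, bound in restrictions)

# xsd:int read as xsd:long restricted by the two facets in the example above.
XSD_INT = [("minInclusive", -2147483648), ("maxInclusive", 2147483647)]

print(satisfies_facets(2147483647, XSD_INT))  # True
print(satisfies_facets(2147483648, XSD_INT))  # False
```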
    

Feedback on whether this is in scope for rdf-xsd.ttl would be very helpful before proceeding.

@domel domel requested review from afs, gkellogg and niklasl September 8, 2025 16:00
@niklasl

niklasl commented Sep 8, 2025

I think this looks very good!

I would also support adding the constraint properties themselves. I cannot argue for defining OWL restrictions on these as in-scope for the WG though (but if straightforward, I would not object to it as a courtesy). Could a maintenance group add this later on if asked for?

@domel
Contributor Author

domel commented Sep 9, 2025

Now the Turtle file has facets.

@domel domel marked this pull request as ready for review September 9, 2025 11:40
@TallTed
Member

TallTed commented Sep 12, 2025

Please confirm: have I inferred correctly that ⊑ means rdfs:subClassOf?

I think it's worth including this — or the correct meaning of that symbol — before the Numeric tower and subsequent bullet points. And maybe adding ⊑ decimal to the current tower.

@domel
Contributor Author

domel commented Sep 13, 2025

@TallTed Yes, ⊑ is intended to denote rdfs:subClassOf. Thanks for pointing this out. I agree it would be good to clarify this explicitly before the numeric tower.

As for the numeric tower itself, both xsd:decimal and xsd:integer are already included correctly in rdf-xsd.ttl. The PR description may have been a bit imprecise. I will update it to avoid confusion.

@afs
Contributor

afs commented Sep 13, 2025

Numeric tower: byte ⊑ short ⊑ int ⊑ long ⊑ integer ⊑ decimal

XSD calls the relationship "derived" types.

Member

@TallTed TallTed left a comment

I've not closely read the whole PR. Some of the suggestions here are more questions, and may impact multiple lines on which I've not specifically commented. Another review is likely, after these are acted upon.

ns/rdf-xsd.ttl Outdated
rdfs:label "NCName" ;
rdfs:comment "XML 'non-colonized' names matching the NCName production (i.e., Name without the colon). Derived from xsd:Name." .

xsd:length a rdf:Property ;
Contributor

@afs afs Sep 17, 2025

Constraining facets as RDF properties?

I would say these over complicate this file as are not needed.

  1. Properties are an RDF feature; doesn't declaring xsd:length etc. mean minting URIs in someone else's namespace?
  2. RDF Concepts does not mention facets.

owl:onDatatype + owl:withRestrictions?

Too much OWL for this base RDF file.

This seems to be SHACL!

Member

I would say these over complicate this file as are not needed.

Questionably "not needed". We cannot assume that future readers have any foundation that we have not built for them!

Contributor

Then you agree with my point! (NB in referring to "Constraining facets as RDF properties?")

Facets are not mentioned as part of RDF Datatypes. They are an XML Schema way to derive datatypes.

We cannot assume that future readers have any foundation that we have not built for them!

We build on other standards. "Future readers" are not coming with zero context to this file. This file is exact details: the reader is most likely someone (or a machine) that wants exact details to add to their knowledge.

Contributor Author

I agree with the others in this thread that facets can indeed be useful. In practice, they are often used together with OWL when defining constraints. At the moment this is something of a grey area, and including them would close that gap. And of course, I generally prefer the “more rather than less” approach: if someone does not want to use them, they can simply ignore them.

Member

We cannot assume that future readers have any foundation that we have not built for them!

We build on other standards. "Future readers" are not coming with zero-context to this file.

Foundation building can certainly include pointers to existing standards upon which we build our standards — but we cannot assume that future readers have already found such existing standards, and so omit our pointers!

(I think we're in agreement, even if we're not quite communicating what we've agreed.)

@domel domel requested review from TallTed and afs September 27, 2025 12:30
@domel domel requested a review from afs September 28, 2025 15:32
@afs
Contributor

afs commented Sep 28, 2025

Suggestion: split this into two PRs: one for the datatype definitions, so that it can proceed, and one for the facets.

would close that gap [with OWL].

The introduction of facet vocabulary only happened in this PR. Facets deserve an issue and WG discussion because they are a new feature. They would also presumably need tests, leading to an implementation report, and clarity about the relationship with OWL (i.e., text in RDF Schema).

Do they form part of D-entailment?
What about SPARQL?

if someone does not want to use them, they can simply ignore them.

I don't see it like that. It is the nature of these namespace documents that it is very hard to remove defined terms. Therefore, we should be sure that the terms are the right ones, with the right definitions.

@domel
Contributor Author

domel commented Sep 28, 2025

I’m in favor of splitting this into two PRs: one that contains only the XSD datatype definitions (so that work can proceed now), and a second, focused PR for facets. If @niklasl agrees, we can do exactly that.

D-entailment. Facets would not be part of D-entailment. RDF 1.2’s D-entailment hinges on recognized datatypes; facets are an XML Schema mechanism for deriving datatypes and should remain outside RDF’s entailment regime unless the WG explicitly decides otherwise. Any facet vocabulary we add should therefore be clearly marked as informative with no normative semantic consequences for RDF entailment.

SPARQL. Likewise, there is no effect on SPARQL semantics. SPARQL does not treat facets as built-in operators or properties; tools might reference such IRIs for documentation or code generation, but there would be no change to SPARQL evaluation or algebra.

Why a (non-normative) facet vocabulary still helps. My working assumption, shared, I believe, with @niklasl in earlier exchanges, is that facets appear in predicate position in RDF graphs (e.g., within OWL 2 datatype restrictions) and are therefore reasonably modeled as rdf:Property for documentation and discoverability purposes. In practice, people already use them this way; see, for example, this discussion: #64 (comment)

@afs
Contributor

afs commented Sep 29, 2025

Why a (non-normative) facet vocabulary still helps.

I would have thought this file was normative.

@ektrah
Member

ektrah commented Sep 29, 2025

If IRIs of the form http://www.w3.org/2001/XMLSchema#xxx must first have their semantics defined for use in RDF, another option would be to maintain them in a registry like the rdf: and rdfs: vocabularies. This would even allow other specifications to "unlock" xsd: terms for use in RDF by adding them to the registry. The rdf-xsd.ttl file would simply be the machine-readable version of the registry.

@domel
Contributor Author

domel commented Sep 29, 2025

@ektrah https://w3c.github.io/rdf-schema/spec/#ch_datatypes
https://w3c.github.io/rdf-concepts/spec/#xsd-datatypes

@domel
Contributor Author

domel commented Sep 29, 2025

Why a (non-normative) facet vocabulary still helps.

I would have thought this file was normative.

I would very much prefer this file to be normative. However, if we are unable to reach a clear shared position in the near term, a non-normative facet vocabulary is still a meaningful improvement over the status quo: it gives implementers and content authors a stable reference while we continue the discussion. Non-normative status need not be permanent; we can iterate with issues, tests, and implementations, and once consensus emerges, promote it to normative in a subsequent revision. In short, normative is ideal; if that’s not yet achievable, publishing a non-normative vocabulary is better than publishing nothing.

rdfs:isDefinedBy <https://www.w3.org/TR/xmlschema11-2/#date> ;
rdfs:label "date" ;
rdfs:comment "Calendar dates as top-open day intervals. Lexical form: YYYY-MM-DD with optional timezone." .
rdfs:comment "Calendar dates as top-open day intervals. Lexical form: YYYY-MM-DD with optional timezone. Timezone lexical forms: 'Z' or an offset of the form ±hh:mm with 00 ≤ mm ≤ 59 and |offset| ≤ 14:00 (e.g., '2025-09-27Z', '2025-09-27+02:00')." .
Member

Is there any default treatment of dates that omit the timezone?

Is there a reason behind omitting such a date in the examples?

Contributor Author

I don't understand. The old and new files contain the same fragment, rdfs:comment "Calendar dates as top-open day intervals. Lexical form: YYYY-MM-DD with optional timezone. Timezone lexical forms: 'Z' or an offset of the form ±hh:mm with 00 ≤ mm ≤ 59 and |offset| ≤ 14:00 (e.g., '2025-09-27Z', '2025-09-27+02:00')." .
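On the question of dates without a timezone: the timezone is optional in xsd:date, so a form such as '2025-09-27' is as valid lexically as the two examples in the comment. A minimal Python sketch of the XSD 1.1 date lexical shape, transcribed by hand from XSD 1.1 Part 2; it checks lexical shape only (it does not reject impossible dates such as 2025-02-30, which the spec excludes by a separate constraint), and the names are illustrative.

```python
import re

# Timezone fragment: 'Z' or an offset +hh:mm / -hh:mm no larger than 14:00.
TZ = r"(Z|[+-]((0[0-9]|1[0-3]):[0-5][0-9]|14:00))"

# xsd:date: optional '-' sign, year of 4+ digits, month, day, then an OPTIONAL timezone.
DATE = re.compile(
    r"-?([1-9][0-9]{3,}|0[0-9]{3})-(0[1-9]|1[0-2])-(0[1-9]|[12][0-9]|3[01])" + TZ + "?"
)

def is_date_lexical(s: str) -> bool:
    return DATE.fullmatch(s) is not None

for s in ["2025-09-27", "2025-09-27Z", "2025-09-27+02:00", "2025-09-27+14:30"]:
    print(s, is_date_lexical(s))
```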

xsd:time a rdfs:Datatype ;
rdfs:isDefinedBy <https://www.w3.org/TR/xmlschema11-2/#time> ;
rdfs:label "time" ;
rdfs:comment "Times that recur each day or occur on some day. Lexical form: hh:mm:ss(.s+)? with optional timezone. Timezone lexical forms: 'Z' or ±hh:mm with |offset| ≤ 14:00 (e.g., '23:59:59Z', '08:15:30.5-05:00')." .
Member

Is there any default treatment of times that omit the timezone?

Is there a reason behind omitting such a time in the examples?
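Likewise for xsd:time: the timezone is optional, so '23:59:59' is a valid lexical form alongside the examples in the comment. A minimal Python sketch of the XSD 1.1 time lexical shape, transcribed by hand (lexical check only; names are illustrative). It includes the end-of-day form '24:00:00' that XSD 1.1 also permits.

```python
import re

# Timezone fragment: 'Z' or an offset up to ±14:00.
TZ = r"(Z|[+-]((0[0-9]|1[0-3]):[0-5][0-9]|14:00))"

# xsd:time: hh:mm:ss with optional fractional seconds (or the end-of-day 24:00:00),
# followed by an OPTIONAL timezone.
TIME = re.compile(
    r"(([01][0-9]|2[0-3]):[0-5][0-9]:[0-5][0-9](\.[0-9]+)?|24:00:00(\.0+)?)" + TZ + "?"
)

def is_time_lexical(s: str) -> bool:
    return TIME.fullmatch(s) is not None

for s in ["23:59:59Z", "08:15:30.5-05:00", "23:59:59", "25:00:00"]:
    print(s, is_time_lexical(s))
```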

xsd:dateTime a rdfs:Datatype ;
rdfs:isDefinedBy <https://www.w3.org/TR/xmlschema11-2/#dateTime> ;
rdfs:label "dateTime" ;
rdfs:comment "Instants of time, optionally with timezone. Lexical form: YYYY-MM-DDThh:mm:ss(.s+)? with optional timezone. Values with different explicit offsets can be equal as instants. Timezone lexical forms: 'Z' or ±hh:mm with |offset| ≤ 14:00 (e.g., '2025-09-27T14:03:00+02:00')." .
Member

Is there any default treatment of datetimes that omit the timezone?

Is there a reason behind omitting such a datetime in the examples?
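On the comment's note that values with different explicit offsets can be equal as instants: the two lexical forms below differ but denote the same point on the timeline. A rough analogy using Python's standard `datetime` module; Python's timezone-aware comparison is not the XSD 1.1 ordering algorithm (it differs notably for values without a timezone), so this only illustrates the equal-instants case.

```python
from datetime import datetime

a = datetime.fromisoformat("2025-09-27T14:03:00+02:00")
b = datetime.fromisoformat("2025-09-27T12:03:00+00:00")

# Different lexical forms and offsets...
print(a.isoformat() != b.isoformat())  # True
# ...but the same instant once both are normalized to UTC.
print(a == b)                          # True
```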

xsd:double a rdfs:Datatype ;
rdfs:isDefinedBy <https://www.w3.org/TR/xmlschema11-2/#double> ;
rdfs:label "double" ;
rdfs:comment "IEEE 754 64-bit floating-point. Lexical forms use decimal or scientific notation with optional leading sign and exponent; special tokens 'INF', '+INF', '-INF', and 'NaN' are permitted; '+0' and '-0' are distinct lexical forms. Mapping from lexical forms to values is specific to this datatype and its precision. Lexical form (regex-style): (\\+|-)?([0-9]+(\\.[0-9]*)?|\\.[0-9]+)([Ee](\\+|-)?[0-9]+)?|(\\+|-)?INF|NaN." .
Member

I cannot mentally process this regex for the lexical form. Can we provide an informative more human-friendly version, leaving the regex to be the normative version? (and same for similar datatypes?)

Contributor Author

The regex comes from the XSD 1.1 Datatypes spec. I'm not sure it's possible to capture all the nuances in an informal description.
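One way to get a more human-friendly reading of the xsd:double regex without replacing it: restate the same pattern with each branch commented. A minimal Python sketch using `re.VERBOSE`; it is a hand transcription of the regex quoted in the rdfs:comment (itself taken from XSD 1.1 Part 2), so the quoted regex, not this code, remains the reference.

```python
import re

# Hand transcription of the xsd:double lexical regex quoted in the rdfs:comment:
#   (\+|-)?([0-9]+(\.[0-9]*)?|\.[0-9]+)([Ee](\+|-)?[0-9]+)?|(\+|-)?INF|NaN
# re.VERBOSE ignores layout whitespace and allows a comment on each branch.
DOUBLE = re.compile(r"""
      [+-]?                       # optional sign
      (?: [0-9]+ (?:\.[0-9]*)?    # digits, optionally with '.' and a fraction: 1  1.  1.5
        | \.[0-9]+                # or a fraction with no integer part: .5
      )
      (?: [Ee] [+-]? [0-9]+ )?    # optional exponent: E10  e-3
    | [+-]? INF                   # infinities: INF  +INF  -INF
    | NaN                         # not-a-number (no sign allowed)
""", re.VERBOSE)

def is_double_lexical(s: str) -> bool:
    return DOUBLE.fullmatch(s) is not None

for s in ["-1.5E10", ".5", "5.", "+INF", "NaN", "+NaN", "1e"]:
    print(s, is_double_lexical(s))
```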

rdfs:subClassOf xsd:dateTime ;
rdfs:isDefinedBy <https://www.w3.org/TR/xmlschema11-2/#dateTimeStamp> ;
rdfs:label "dateTimeStamp" ;
rdfs:comment "dateTime with a required explicit timezone offset. Lexical form: YYYY-MM-DDThh:mm:ss(.s+)?(Z|±hh:mm); the timezone offset is mandatory." .
Member

Is there a reason behind the omission of examples here and following (and earlier, if I failed to notice some were missing)?

Contributor Author

Consistency
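Examples make the dateTimeStamp ⊑ dateTime relationship easy to see: the lexical shapes are identical except that the timezone fragment is mandatory for dateTimeStamp, so every dateTimeStamp form is also a dateTime form, but not vice versa. A minimal Python sketch, transcribed by hand from XSD 1.1 Part 2 (lexical shape only; names are illustrative).

```python
import re

# Timezone fragment: 'Z' or an offset up to ±14:00.
TZ = r"(Z|[+-]((0[0-9]|1[0-3]):[0-5][0-9]|14:00))"

# Shared date 'T' time body (fractional seconds optional; 24:00:00 allowed as end of day).
BODY = (
    r"-?([1-9][0-9]{3,}|0[0-9]{3})-(0[1-9]|1[0-2])-(0[1-9]|[12][0-9]|3[01])"
    r"T(([01][0-9]|2[0-3]):[0-5][0-9]:[0-5][0-9](\.[0-9]+)?|24:00:00(\.0+)?)"
)
DATE_TIME = re.compile(BODY + TZ + "?")   # xsd:dateTime: timezone optional
DATE_TIME_STAMP = re.compile(BODY + TZ)   # xsd:dateTimeStamp: timezone required

for s in ["2025-09-27T14:03:00+02:00", "2025-09-27T14:03:00Z", "2025-09-27T14:03:00"]:
    print(s, bool(DATE_TIME.fullmatch(s)), bool(DATE_TIME_STAMP.fullmatch(s)))
```

Only the last, offset-free form is a dateTime but not a dateTimeStamp.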

@domel
Contributor Author

domel commented Oct 3, 2025

OK. I (temporarily) removed facets. I think we are ready to merge. @TallTed, can you remove the blocking?

Member

@TallTed TallTed left a comment

I think the only way to remove the block is to Approve, but please note that I have not carefully reviewed the changes.

@domel
Contributor Author

domel commented Oct 3, 2025

Understood, thanks for flagging. To unblock progress, I propose we approve this PR now with the explicit caveat that the review was not exhaustive. We can then follow up with a series of smaller, focused PRs that will be easier to review and track.

@afs
Contributor

afs commented Oct 6, 2025

I think the only way to remove the block is to Approve

You should be able to re-request a review. This sets it to "awaiting" (the orange dot).

@afs afs removed the request for review from gkellogg October 6, 2025 08:24
@domel domel requested a review from TallTed October 6, 2025 11:44
@domel domel merged commit 15ac8d0 into main Oct 6, 2025
1 check passed
@pfps
Contributor

pfps commented Oct 9, 2025

A small number of RDFS subclass axioms that reflect well-known XSD 1.1 derivation relationships, e.g.:
xsd:integer rdfs:subClassOf xsd:decimal

These are not RDFS axioms, i.e., they are not RDFS-entailed from the empty graph. I don't believe that there is any indication in the file nor in any WG document that they are axioms, but the claim above needs to be removed.
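To illustrate the distinction: the subclass statements are ordinary asserted triples in rdf-xsd.ttl, so an RDFS reasoner derives their transitive consequences (entailment rule rdfs11) only from a graph that contains them, not from the empty graph. A minimal Python sketch applying just that rule to the numeric-tower triples; representing the graph as a set of (subject, object) pairs is an illustrative assumption.

```python
def rdfs11_closure(subclass_pairs):
    """Apply rule rdfs11 (transitivity of rdfs:subClassOf) until no new pairs appear."""
    closure = set(subclass_pairs)
    while True:
        new = {(a, d) for (a, b) in closure for (c, d) in closure if b == c} - closure
        if not new:
            return closure
        closure |= new

# The numeric-tower rdfs:subClassOf triples asserted in rdf-xsd.ttl, as (subject, object).
asserted = {
    ("xsd:byte", "xsd:short"), ("xsd:short", "xsd:int"),
    ("xsd:int", "xsd:long"), ("xsd:long", "xsd:integer"),
    ("xsd:integer", "xsd:decimal"),
}

print(("xsd:byte", "xsd:decimal") in rdfs11_closure(asserted))  # True
print(rdfs11_closure(set()))  # set(): rdfs11 derives nothing from an empty set of triples
```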

@niklasl

niklasl commented Oct 9, 2025

A small number of RDFS subclass axioms that reflect well-known XSD 1.1 derivation relationships, e.g.:
xsd:integer rdfs:subClassOf xsd:decimal

These are not RDFS axioms, i.e., they are not RDFS-entailed from the empty graph. I don't believe that there is any indication in the file nor in any WG document that they are axioms, but the claim above needs to be removed.

@pfps That's true. By "remove the claim", do you mean editing the description of this PR? Or is it enough that we agree here in the comment section?

@niklasl

niklasl commented Oct 9, 2025

Also, I'm not sure how to get from recognizing these to importing (owl:imports) this resource. Ideally an RDF serialization of this document would be served when negotiating on http://www.w3.org/2001/XMLSchema#, but that may be administratively unrealistic. For that reason, I would like it (which I presume can end up under http://www.w3.org/ns/rdf-xsd) to be linked from http://www.w3.org/2000/01/rdf-schema# using rdfs:seeAlso, so tools can at least "follow the 'scent' with their noses".

@pfps
Contributor

pfps commented Oct 9, 2025

I think that the "axiom" bit should be scrubbed from the description, or at least modified.


6 participants