Add canonical Turtle for RDF-recognized XSD datatypes (ns/rdf-xsd.ttl)
#64
At this stage, the repository contains … I have two open questions for the group before moving further:
Feedback on whether this is in scope for …
I think this looks very good! I would also support adding the constraint properties themselves. I cannot argue for defining OWL restrictions on these as in scope for the WG, though (but if it is straightforward, I would not object to it as a courtesy). Could a maintenance group add this later if asked?
Now the Turtle file has facets.
Please confirm that I've inferred correctly, that I think it's worth including this, or the correct meaning of that …
@TallTed Yes, ⊑ is intended to denote … As for the numeric tower itself, both …
XSD describes this relationship as derivation ("derived" types).
I've not closely read the whole PR. Some of the suggestions here are more questions, and may impact multiple lines on which I've not specifically commented. Another review is likely, after these are acted upon.
ns/rdf-xsd.ttl (outdated)
rdfs:label "NCName" ;
rdfs:comment "XML 'non-colonized' names matching the NCName production (i.e., Name without the colon). Derived from xsd:Name." .

xsd:length a rdf:Property ;
Constraining facets as RDF properties?
I would say these overcomplicate this file, as they are not needed.
- Properties are an RDF feature; isn't the URI xsd:length etc. minting URIs in someone else's namespace?
- RDF Concepts does not mention facets.
owl:onDatatype + owl:withRestrictions?
Too much OWL for this base RDF file.
This seems to be SHACL!
> I would say these overcomplicate this file, as they are not needed.
Questionably "not needed". We cannot assume that future readers have any foundation that we have not built for them!
Then you agree with my point! (NB: I was referring to "Constraining facets as RDF properties?")
Facets are not mentioned as part of RDF Datatypes. They are an XML Schema way to derive datatypes.
> We cannot assume that future readers have any foundation that we have not built for them!

We build on other standards. "Future readers" are not coming to this file with zero context. This file provides exact details; the reader is most likely someone, or a machine, that wants those exact details to add to their knowledge.
I agree with the others in this thread that facets can indeed be useful. In practice, they are often employed together with OWL when defining constraints. At the moment this is something of a grey area, and including them would close that gap. And of course, I always prefer the "better more than less" approach: if someone does not want to use them, they can simply ignore them.
> We cannot assume that future readers have any foundation that we have not built for them!

> We build on other standards. "Future readers" are not coming with zero-context to this file.
Foundation building can certainly include pointers to existing standards upon which we build our standards — but we cannot assume that future readers have already found such existing standards, and so omit our pointers!
(I think we're in agreement, even if we're not quite communicating what we've agreed.)
Suggestion: split this into two PRs: one for the datatype definitions, so that work can proceed, and one for the facets.
The facet vocabulary was only introduced in this PR. Facets deserve an issue and WG discussion because they are a new feature. They would also presumably need tests, leading to an implementation report, and clarity about their relationship with OWL (i.e., text in RDF Schema). Do they form part of D-entailment?
I don't see it like that. It is the nature of these namespace documents that it is very hard to remove defined terms. Therefore, we should be sure that the terms are the right ones, with the right definitions.
I'm in favor of splitting this into two PRs: one that contains only the XSD datatype definitions (so that work can proceed now), and a second, focused PR for facets. If @niklasl agrees, we can do exactly that.

D-entailment. Facets would not be part of D-entailment. RDF 1.2's D-entailment hinges on recognized datatypes; facets are an XML Schema mechanism for deriving datatypes and should remain outside RDF's entailment regime unless the WG explicitly decides otherwise. Any facet vocabulary we add should therefore be clearly marked as informative, with no normative semantic consequences for RDF entailment.

SPARQL. Likewise, there is no effect on SPARQL semantics. SPARQL does not treat facets as built-in operators or properties; tools might reference such IRIs for documentation or code generation, but there would be no change to SPARQL evaluation or algebra.

Why a (non-normative) facet vocabulary still helps. My working assumption, shared, I believe, with @niklasl in earlier exchanges, is that facets appear in predicate position in RDF graphs (e.g., within OWL 2 datatype restrictions) and are therefore reasonably modeled as …
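To make "facets appear in predicate position" concrete, here is a minimal Python sketch, assuming the usual OWL 2 mapping of a datatype restriction to triples: a blank node with owl:onDatatype, plus owl:withRestrictions pointing to an RDF list whose members each carry one facet IRI as a predicate. The ex:Percentage datatype, the hand-written triple list, and the helper functions are all hypothetical; a real tool would use an RDF library rather than tuples, and nothing here implies any effect on D-entailment or SPARQL.

```python
from typing import Dict, List, Tuple

# Hypothetical in-memory encoding of this Turtle:
#   ex:Percentage owl:equivalentClass [
#       a rdfs:Datatype ;
#       owl:onDatatype xsd:integer ;
#       owl:withRestrictions ( [ xsd:minInclusive 0 ] [ xsd:maxInclusive 100 ] )
#   ] .
Triple = Tuple[str, str, object]

GRAPH: List[Triple] = [
    ("ex:Percentage", "owl:equivalentClass", "_:r"),
    ("_:r", "rdf:type", "rdfs:Datatype"),
    ("_:r", "owl:onDatatype", "xsd:integer"),
    ("_:r", "owl:withRestrictions", "_:l1"),
    ("_:l1", "rdf:first", "_:f1"),
    ("_:l1", "rdf:rest", "_:l2"),
    ("_:l2", "rdf:first", "_:f2"),
    ("_:l2", "rdf:rest", "rdf:nil"),
    ("_:f1", "xsd:minInclusive", 0),    # facet IRI used as a predicate
    ("_:f2", "xsd:maxInclusive", 100),  # facet IRI used as a predicate
]


def objects(graph: List[Triple], s: object, p: str) -> List[object]:
    """All objects of triples with subject s and predicate p."""
    return [o for (s2, p2, o) in graph if s2 == s and p2 == p]


def restriction_facets(graph: List[Triple], node: str) -> Dict[str, object]:
    """Walk the owl:withRestrictions list and collect {facet: value}."""
    facets: Dict[str, object] = {}
    cell = objects(graph, node, "owl:withRestrictions")[0]
    while cell != "rdf:nil":
        item = objects(graph, cell, "rdf:first")[0]
        for (s, p, o) in graph:
            if s == item:
                facets[p] = o  # the predicate IS the facet
        cell = objects(graph, cell, "rdf:rest")[0]
    return facets


def satisfies(value: int, facets: Dict[str, object]) -> bool:
    """Check only the two range facets used in this example."""
    lo = facets.get("xsd:minInclusive")
    hi = facets.get("xsd:maxInclusive")
    return (lo is None or value >= lo) and (hi is None or value <= hi)


facets = restriction_facets(GRAPH, "_:r")
print(facets)
print(satisfies(42, facets), satisfies(101, facets))
```

Whether xsd:minInclusive and friends are declared as rdf:Property in ns/rdf-xsd.ttl does not change how they are used here; the question in this thread is only whether the file should name them.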
I would have thought this file was normative.
If IRIs of the form …
I would very much prefer this file to be normative. However, if we are unable to reach a clear shared position in the near term, a non-normative facet vocabulary is still a meaningful improvement over the status quo: it gives implementers and content authors a stable reference while we continue the discussion. Non-normative status need not be permanent; we can iterate with issues, tests, and implementations, and once consensus emerges, promote it to normative in a subsequent revision. In short, normative is ideal; if that's not yet achievable, publishing a non-normative vocabulary is better than publishing nothing.
rdfs:isDefinedBy <https://www.w3.org/TR/xmlschema11-2/#date> ;
rdfs:label "date" ;
rdfs:comment "Calendar dates as top-open day intervals. Lexical form: YYYY-MM-DD with optional timezone." .
rdfs:comment "Calendar dates as top-open day intervals. Lexical form: YYYY-MM-DD with optional timezone. Timezone lexical forms: 'Z' or an offset of the form ±hh:mm with 00 ≤ mm ≤ 59 and |offset| ≤ 14:00 (e.g., '2025-09-27Z', '2025-09-27+02:00')." .
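The timezone rule in this comment can be checked mechanically. Below is a minimal Python sketch, assuming the regex is a faithful transcription of the xsd:date lexical form in XSD 1.1 Part 2 (worth checking against the spec). The hypothetical parse_date helper returns None for the offset when no timezone is present, rather than defaulting to UTC, because a date with no timezone is not the same as one with 'Z'.

```python
import re
from typing import Optional, Tuple

# xsd:date lexical form (assumed transcription of XSD 1.1 Part 2): the date
# part, then an OPTIONAL timezone: 'Z', or +/-hh:mm with hh in 00..13 and
# mm in 00..59, or exactly +/-14:00. Day-of-month validity (e.g. Feb 30)
# is NOT checked here.
DATE = re.compile(
    r"(?P<date>-?([1-9][0-9]{3,}|0[0-9]{3})-(0[1-9]|1[0-2])-(0[1-9]|[12][0-9]|3[01]))"
    r"(?P<tz>Z|[+-]((0[0-9]|1[0-3]):[0-5][0-9]|14:00))?"
)


def parse_date(lexical: str) -> Tuple[str, Optional[int]]:
    """Return (date part, offset in minutes); the offset is None if absent."""
    m = DATE.fullmatch(lexical)
    if m is None:
        raise ValueError(f"not an xsd:date lexical form: {lexical!r}")
    tz = m.group("tz")
    if tz is None:
        return m.group("date"), None  # no timezone given
    if tz == "Z":
        return m.group("date"), 0
    sign = -1 if tz[0] == "-" else 1
    return m.group("date"), sign * (int(tz[1:3]) * 60 + int(tz[4:6]))


for lex in ["2025-09-27", "2025-09-27Z", "2025-09-27+02:00", "2025-09-27-14:00"]:
    print(lex, parse_date(lex))
```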
Is there any default treatment of dates that omit the timezone?
Is there a reason behind omitting such a date in the examples?
I don't understand: the old and new files contain the same fragment: rdfs:comment "Calendar dates as top-open day intervals. Lexical form: YYYY-MM-DD with optional timezone. Timezone lexical forms: 'Z' or an offset of the form ±hh:mm with 00 ≤ mm ≤ 59 and |offset| ≤ 14:00 (e.g., '2025-09-27Z', '2025-09-27+02:00')." .
xsd:time a rdfs:Datatype ;
rdfs:isDefinedBy <https://www.w3.org/TR/xmlschema11-2/#time> ;
rdfs:label "time" ;
rdfs:comment "Times that recur each day or occur on some day. Lexical form: hh:mm:ss(.s+)? with optional timezone. Timezone lexical forms: 'Z' or ±hh:mm with |offset| ≤ 14:00 (e.g., '23:59:59Z', '08:15:30.5-05:00')." .
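For comparison, here is a minimal Python sketch of the full xsd:time lexical form, assuming this transcription of XSD 1.1 Part 2 is accurate. It shows two details that the shorthand hh:mm:ss(.s+)? leaves out: the special end-of-day form 24:00:00, and seconds limited to 00 through 59. The name is_time is a hypothetical helper.

```python
import re

# xsd:time lexical form (assumed transcription of XSD 1.1 Part 2):
# hh:mm:ss with optional fractional seconds, or the end-of-day form
# 24:00:00, followed by an OPTIONAL timezone.
TIME = re.compile(
    r"(([01][0-9]|2[0-3]):[0-5][0-9]:[0-5][0-9](\.[0-9]+)?|24:00:00(\.0+)?)"
    r"(Z|[+-]((0[0-9]|1[0-3]):[0-5][0-9]|14:00))?"
)


def is_time(lexical: str) -> bool:
    # XSD patterns are implicitly anchored, hence fullmatch.
    return TIME.fullmatch(lexical) is not None


examples = {
    "23:59:59Z": True,
    "08:15:30.5-05:00": True,
    "08:15:30": True,   # no timezone: still a valid lexical form
    "24:00:00": True,   # end-of-day form
    "24:00:01": False,
    "8:15:30": False,   # hours need two digits
    "08:15:60": False,  # no second 60 in the lexical space
}
for lex, expected in examples.items():
    assert is_time(lex) == expected, lex
```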
Is there any default treatment of times that omit the timezone?
Is there a reason behind omitting such a time in the examples?
xsd:dateTime a rdfs:Datatype ;
rdfs:isDefinedBy <https://www.w3.org/TR/xmlschema11-2/#dateTime> ;
rdfs:label "dateTime" ;
rdfs:comment "Instants of time, optionally with timezone. Lexical form: YYYY-MM-DDThh:mm:ss(.s+)? with optional timezone. Values with different explicit offsets can be equal as instants. Timezone lexical forms: 'Z' or ±hh:mm with |offset| ≤ 14:00 (e.g., '2025-09-27T14:03:00+02:00')." .
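The statement that values with different explicit offsets can be equal as instants can be illustrated with Python's timezone-aware datetime. This is only an analogy: XSD 1.1 Part 2 defines its own equality and ordering rules, including for values with no timezone. The sketch uses "+00:00" rather than "Z" because datetime.fromisoformat accepts "Z" only from Python 3.11 onward.

```python
from datetime import datetime

# Two xsd:dateTime lexical forms with different explicit offsets.
a = "2025-09-27T14:03:00+02:00"
b = "2025-09-27T12:03:00+00:00"

# Aware datetimes compare by the instant they denote (both are normalized
# to UTC for the comparison), even though the lexical forms differ.
da, db = datetime.fromisoformat(a), datetime.fromisoformat(b)
print(a == b)    # the strings differ
print(da == db)  # the instants are the same
print(da.utcoffset(), db.utcoffset())  # each value keeps its own offset
```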
Is there any default treatment of datetimes that omit the timezone?
Is there a reason behind omitting such a datetime in the examples?
xsd:double a rdfs:Datatype ;
rdfs:isDefinedBy <https://www.w3.org/TR/xmlschema11-2/#double> ;
rdfs:label "double" ;
rdfs:comment "IEEE 754 64-bit floating-point. Lexical forms use decimal or scientific notation with optional leading sign and exponent; special tokens 'INF', '+INF', '-INF', and 'NaN' are permitted; '+0' and '-0' are distinct lexical forms. Mapping from lexical forms to values is specific to this datatype and its precision. Lexical form (regex-style): (\\+|-)?([0-9]+(\\.[0-9]*)?|\\.[0-9]+)([Ee](\\+|-)?[0-9]+)? |(\\+|-)?INF|NaN." .
I cannot mentally process this regex for the lexical form. Can we provide an informative, more human-friendly version, leaving the regex as the normative version? (And the same for similar datatypes?)
The regex comes from the XSD 1.1 Datatypes spec. I'm not sure it's possible to capture all the nuances in an informal description.
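As a possible middle ground, the same regex can be laid out with one comment per piece while the pattern itself stays unchanged. A minimal Python sketch: it undoes Turtle's doubled backslashes and omits the space that appears before the INF alternative in the quoted comment, on the assumption that the space is a line-wrap artifact (inside a regex it would be a literal character). The name is_double_lexical is a hypothetical helper.

```python
import re

# The xsd:double lexical regex, written with re.VERBOSE so that each
# piece can carry a comment. Whitespace and '#' comments are ignored.
DOUBLE = re.compile(
    r"""
      (\+|-)?                  # optional sign
      ( [0-9]+ (\.[0-9]*)?     # digits, optionally followed by '.' and more digits
      | \.[0-9]+ )             # ... or '.' followed by at least one digit
      ([Ee] (\+|-)? [0-9]+)?   # optional exponent: e/E, optional sign, digits
    | (\+|-)? INF              # infinities: INF, +INF, -INF
    | NaN                      # not-a-number (no sign allowed)
    """,
    re.VERBOSE,
)


def is_double_lexical(s: str) -> bool:
    # XSD patterns are implicitly anchored, hence fullmatch.
    return DOUBLE.fullmatch(s) is not None


for s in ["-1.5E3", ".5", "1.", "+INF", "NaN", "1e", "inf", "+NaN"]:
    print(f"{s!r:10} {is_double_lexical(s)}")
```

The regex remains the precise definition; the comments only make its structure readable.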
rdfs:subClassOf xsd:dateTime ;
rdfs:isDefinedBy <https://www.w3.org/TR/xmlschema11-2/#dateTimeStamp> ;
rdfs:label "dateTimeStamp" ;
rdfs:comment "dateTime with a required explicit timezone offset. Lexical form: YYYY-MM-DDThh:mm:ss(.s+)?(Z|±hh:mm); the timezone offset is mandatory." .
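Because xsd:dateTimeStamp is xsd:dateTime with the explicitTimezone facet fixed to "required", the two lexical spaces differ only in whether the trailing timezone is optional, which is also why the rdfs:subClassOf xsd:dateTime statement holds. A minimal Python sketch, assuming the regex below is an accurate transcription of the XSD 1.1 Part 2 lexical form; classify is a hypothetical helper.

```python
import re

# xsd:dateTime lexical form without its timezone (assumed transcription of
# XSD 1.1 Part 2), plus the timezone fragment as a separate piece.
DATE_TIME = (
    r"-?([1-9][0-9]{3,}|0[0-9]{3})-(0[1-9]|1[0-2])-(0[1-9]|[12][0-9]|3[01])"
    r"T(([01][0-9]|2[0-3]):[0-5][0-9]:[0-5][0-9](\.[0-9]+)?|24:00:00(\.0+)?)"
)
TZ = r"(Z|[+-]((0[0-9]|1[0-3]):[0-5][0-9]|14:00))"

DATETIME_RE = re.compile(DATE_TIME + TZ + "?")  # timezone optional
DATETIMESTAMP_RE = re.compile(DATE_TIME + TZ)   # explicitTimezone = required


def classify(lexical: str) -> str:
    if DATETIMESTAMP_RE.fullmatch(lexical):
        return "dateTimeStamp (and dateTime)"
    if DATETIME_RE.fullmatch(lexical):
        return "dateTime only"
    return "neither"


for lex in ["2025-09-27T14:03:00+02:00", "2025-09-27T14:03:00.25Z",
            "2025-09-27T14:03:00", "2025-09-27"]:
    print(lex, "->", classify(lex))
```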
Is there a reason behind the omission of examples here and following (and earlier, if I failed to notice some were missing)?
Consistency
OK. I (temporarily) removed the facets. I think we are ready to merge. @TallTed, can you remove the block?
I think the only way to remove the block is to Approve, but please note that I have not carefully reviewed the changes.
Understood, thanks for flagging. To unblock progress, I propose we approve this PR now with the explicit caveat that the review was not exhaustive. We can then follow up with a series of smaller, focused PRs that will be easier to review and track.
You should be able to re-request a review. This sets it to "awaiting" (the orange dot).
These are not RDFS axioms, i.e., they are not RDFS-entailed from the empty graph. I don't believe there is any indication in the file or in any WG document that they are axioms, but the claim above needs to be removed.
@pfps That's true. By "remove the claim", do you mean editing the description of this PR? Or is it enough that we agree here in the comment section?
Also, I'm not sure how to go from recognizing these to "import" (…
I think that the "axiom" bit should be scrubbed from the description, or at least modified.
See #37
This PR adds ns/rdf-xsd.ttl, an informational, machine-readable Turtle file that declares the XML Schema datatypes which RDF gives special recognition (per RDF Concepts). The goal is to provide a canonical source for tools and datasets that today rely on ad-hoc “xsd vocabularies”.
A new file, ns/rdf-xsd.ttl, with:
- An owl:Ontology header (title, date, short description).
- Datatype declarations using a rdfs:Datatype, rdfs:isDefinedBy <http://www.w3.org/2001/XMLSchema#>, and rdfs:label (human-readable name).
- A small number of RDFS subclass statements that reflect well-known XSD 1.1 derivation relationships, e.g.: xsd:integer rdfs:subClassOf xsd:decimal
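The subclass statements matter because RDFS rule rdfs11 (transitivity of rdfs:subClassOf) lets a reasoner chain them. A minimal Python sketch, assuming a hypothetical set of direct edges that follow the XSD 1.1 derivation hierarchy for the integer types (the description only quotes xsd:integer rdfs:subClassOf xsd:decimal, so the file may contain fewer). Since these statements are not RDFS axioms, such inferences only hold for a graph that actually includes them.

```python
from typing import Dict, Set

# Hypothetical direct rdfs:subClassOf edges following the XSD 1.1
# derivation chain for the built-in integer types.
DIRECT: Dict[str, Set[str]] = {
    "xsd:byte": {"xsd:short"},
    "xsd:short": {"xsd:int"},
    "xsd:int": {"xsd:long"},
    "xsd:long": {"xsd:integer"},
    "xsd:unsignedByte": {"xsd:unsignedShort"},
    "xsd:unsignedShort": {"xsd:unsignedInt"},
    "xsd:unsignedInt": {"xsd:unsignedLong"},
    "xsd:unsignedLong": {"xsd:nonNegativeInteger"},
    "xsd:positiveInteger": {"xsd:nonNegativeInteger"},
    "xsd:nonNegativeInteger": {"xsd:integer"},
    "xsd:negativeInteger": {"xsd:nonPositiveInteger"},
    "xsd:nonPositiveInteger": {"xsd:integer"},
    "xsd:integer": {"xsd:decimal"},
}


def superclasses(dt: str) -> Set[str]:
    """All superclasses reachable by repeatedly applying rule rdfs11."""
    seen: Set[str] = set()
    stack = [dt]
    while stack:
        for sup in DIRECT.get(stack.pop(), set()):
            if sup not in seen:
                seen.add(sup)
                stack.append(sup)
    return seen


print(sorted(superclasses("xsd:byte")))
print(sorted(superclasses("xsd:unsignedByte")))
```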
Datatypes covered:
string, boolean, decimal, integer, double, float, date, time, dateTime, dateTimeStamp, gYear, gMonth, gDay, gYearMonth, gMonthDay, duration, yearMonthDuration, dayTimeDuration, byte, short, int, long, unsignedByte, unsignedShort, unsignedInt, unsignedLong, positiveInteger, nonNegativeInteger, negativeInteger, nonPositiveInteger, hexBinary, base64Binary, anyURI, language, normalizedString, token, NMTOKEN, Name, NCName.
Facets covered:
length, minLength, maxLength, pattern, enumeration, whiteSpace, maxInclusive, maxExclusive, minExclusive, minInclusive, totalDigits, fractionDigits, assertions, explicitTimezone.
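Most of the integer datatypes in the list above differ from xsd:integer only in their minInclusive and maxInclusive facet values. A minimal Python sketch listing which of them admit a given integer value; the bounds are hard-coded from XSD 1.1 Part 2, and RANGES and admitting are hypothetical names that are not part of ns/rdf-xsd.ttl.

```python
from typing import Dict, List, Optional, Tuple

# (minInclusive, maxInclusive) facet values of the built-in XSD integer
# types; None means the type is unbounded on that side.
RANGES: Dict[str, Tuple[Optional[int], Optional[int]]] = {
    "xsd:integer": (None, None),
    "xsd:long": (-2**63, 2**63 - 1),
    "xsd:int": (-2**31, 2**31 - 1),
    "xsd:short": (-2**15, 2**15 - 1),
    "xsd:byte": (-2**7, 2**7 - 1),
    "xsd:nonNegativeInteger": (0, None),
    "xsd:positiveInteger": (1, None),
    "xsd:nonPositiveInteger": (None, 0),
    "xsd:negativeInteger": (None, -1),
    "xsd:unsignedLong": (0, 2**64 - 1),
    "xsd:unsignedInt": (0, 2**32 - 1),
    "xsd:unsignedShort": (0, 2**16 - 1),
    "xsd:unsignedByte": (0, 2**8 - 1),
}


def admitting(n: int) -> List[str]:
    """Names of the datatypes whose range facets allow the value n."""
    return sorted(
        dt
        for dt, (lo, hi) in RANGES.items()
        if (lo is None or n >= lo) and (hi is None or n <= hi)
    )


print(admitting(200))
print(admitting(-1))
```

These facets constrain the value space; the lexical forms of these types are handled separately by their patterns.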