[feature request / refactor] Produce structured data containing human-readable code descriptions (refactor RULES.md) #1324

@lauriemerrell

Describe the problem

Cal-ITP produces https://reports.calitp.org/, where we report on various aspects of GTFS data quality. One of the things we currently display on the site is a grid of validator notices output for a given feed in a given month. We like to display a human-readable notice description so that the notice can be understood by agencies and the general public, who may not be familiar with validator code names.

Currently, to update those human-readable descriptions, we have to manually scrape the data from RULES.md for each validator version and turn it into a CSV that we can import through our pipeline.

To make the CSV, I:

  • Ran a regex over RULES.md to extract each notice code with its short description
  • Manually annotated each code with its rule severity, because the current format has no single table listing code, description, and severity together (severity appears only in the title of each table, which makes it harder to scrape)
  • Manually removed Markdown and HTML (RULES.md uses an inconsistent mixture of both)
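
For reference, the steps above can be roughly sketched as below. This is only an illustration: the `SAMPLE` excerpt, the notice codes in it, and the "Table of errors" style headings are assumptions for the sake of the example, and the real RULES.md layout may differ between validator versions.

```python
import csv
import io
import re

# Hypothetical excerpt; codes and layout are made up for illustration.
SAMPLE = """\
### Table of errors

| Name | Description |
|------|-------------|
| [`example_error_one`](#example_error_one) | Trips in a block have <b>overlapping</b> stop times. |
| [`example_error_two`](#example_error_two) | Values of `shape_dist_traveled` decrease in `shapes.txt`. |

### Table of warnings

| Name | Description |
|------|-------------|
| [`example_warning_one`](#example_warning_one) | Two distinct routes share the same name. |
"""

# Severity is only available from the table heading, so track it while scanning.
HEADING = re.compile(r"^#+\s*Table of (\w+?)s?\s*$", re.IGNORECASE)
ROW = re.compile(r"^\|\s*\[`(?P<code>\w+)`\]\([^)]*\)\s*\|\s*(?P<desc>.*?)\s*\|\s*$")


def strip_markup(text):
    """Remove the HTML tags and simple Markdown used in descriptions."""
    text = re.sub(r"<[^>]+>", "", text)                   # HTML tags
    text = re.sub(r"\[([^\]]+)\]\([^)]*\)", r"\1", text)  # [label](url) -> label
    return re.sub(r"[`*]", "", text).strip()              # code and emphasis marks


def scrape(markdown):
    """Return one dict per table row, annotated with the enclosing severity."""
    rows, severity = [], None
    for line in markdown.splitlines():
        line = line.strip()
        heading = HEADING.match(line)
        if heading:
            severity = heading.group(1).upper()
            continue
        row = ROW.match(line)
        if row and severity:
            rows.append({
                "code": row.group("code"),
                "severity": severity,
                "short_desc": strip_markup(row.group("desc")),
            })
    return rows


def rows_to_csv(rows):
    """Serialize scraped rows to CSV text."""
    buf = io.StringIO()
    writer = csv.DictWriter(
        buf, fieldnames=["code", "severity", "short_desc"], lineterminator="\n"
    )
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()


print(rows_to_csv(scrape(SAMPLE)))
```

Even in this small form, the scraper depends on heading wording and table layout staying stable, which is the fragility described above.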

This also opens up issues like #1322, because RULES.md is maintained separately as a text file and is not tied to the actual validator code.

It would be nice if the human-readable description of each rule were available as structured data (CSV or JSON) that the validator itself could output, rather than requiring reference to the RULES.md file (analogous to the new notice_schema.json file that can be output by the JAR).

Proposed solution

Rule descriptions could be attributes within the rule implementation itself, and then RULES.md could be programmatically generated based on those attributes, rather than RULES.md being the source of truth but maintained separately.
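
To make the idea concrete, here is a minimal sketch in Python, used only to keep the example short; an actual implementation would presumably live in the validator's own codebase. Everything in it is hypothetical: the `rule` decorator, `RuleDoc`, `REGISTRY`, the example rule classes, and the `SEVERITY_ORDER` values are assumptions, not existing validator APIs.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RuleDoc:
    """Documentation carried by a rule implementation (hypothetical shape)."""
    code: str
    severity: str
    short_desc: str


# Hypothetical registry filled as rule classes are defined.
REGISTRY = []

# Assumed severity levels and display order.
SEVERITY_ORDER = ("ERROR", "WARNING", "INFO")


def rule(code, severity, short_desc):
    """Class decorator that attaches documentation to a rule implementation."""
    def wrap(cls):
        cls.doc = RuleDoc(code, severity, short_desc)
        REGISTRY.append(cls.doc)
        return cls
    return wrap


@rule("example_warning_one", "WARNING", "Two distinct routes share the same name.")
class ExampleWarningOne:
    def validate(self, feed):
        ...  # rule logic would live next to its description


@rule("example_error_one", "ERROR", "Trips in a block have overlapping stop times.")
class ExampleErrorOne:
    def validate(self, feed):
        ...


def render_rules_md(docs):
    """Generate Markdown tables, one per severity, from the registered docs."""
    lines = []
    for severity in SEVERITY_ORDER:
        group = sorted((d for d in docs if d.severity == severity), key=lambda d: d.code)
        if not group:
            continue
        lines += [f"### {severity}", "", "| Name | Description |", "|------|-------------|"]
        lines += [f"| `{d.code}` | {d.short_desc} |" for d in group]
        lines.append("")
    return "\n".join(lines)


print(render_rules_md(REGISTRY))
```

With this arrangement the description cannot drift from the rule it documents, and RULES.md becomes a build artifact rather than a hand-edited file.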

Alternatives you've considered

No response

Additional context

It would be really nice to have fields like code, severity, short_desc, detailed_desc, and formatted_desc, where formatted_desc could contain Markdown. (RULES.md has a shorter rule description in the tables at the top and then a slightly longer description below.)
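
As a rough sketch of what such a record could look like when exported, the Python below uses the five field names from this issue and serializes them to both JSON and CSV. The `RuleDescription` class, the helper functions, and the example values are hypothetical and only meant to show the shape of the data.

```python
import csv
import io
import json
from dataclasses import asdict, dataclass, fields


@dataclass(frozen=True)
class RuleDescription:
    code: str
    severity: str
    short_desc: str      # one-line text, as in the summary tables
    detailed_desc: str   # longer plain-text explanation
    formatted_desc: str  # same explanation, may contain Markdown


# Made-up example record for illustration only.
EXAMPLE = [
    RuleDescription(
        code="example_error_one",
        severity="ERROR",
        short_desc="Trips in a block have overlapping stop times.",
        detailed_desc="Two trips in the same block overlap in time on the same service day.",
        formatted_desc="Two trips in the same **block** overlap in time on the same service day.",
    )
]


def rules_to_json(rules):
    """Serialize rule descriptions to a JSON array of objects."""
    return json.dumps([asdict(r) for r in rules], indent=2)


def rules_to_csv(rules):
    """Serialize rule descriptions to CSV with one column per field."""
    buf = io.StringIO()
    names = [f.name for f in fields(RuleDescription)]
    writer = csv.DictWriter(buf, fieldnames=names, lineterminator="\n")
    writer.writeheader()
    writer.writerows(asdict(r) for r in rules)
    return buf.getvalue()


print(rules_to_json(EXAMPLE))
print(rules_to_csv(EXAMPLE))
```

Keeping plain and formatted text in separate fields would let a consumer such as a report site use the plain text directly, without stripping Markdown or HTML.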

Metadata

Labels

  • enhancement (New feature request or improvement on an existing feature)
  • status: Work in progress (A PR that would close this issue has been opened.)

Projects

Status

Done
