Potential false positives for equal_shape_distance_diff_coordinates #1258

@isabelle-dr

Problem
We hear from users that equal_shape_distance_diff_coordinates (which is currently an error) is often present in datasets that contain shapes, and the work needed to fix this issue in the datasets gives an incentive for users not to use shapes at all.

This rule was initially implemented in PR #1083, alongside two others:

  • decreasing_shape_distance: error, and
  • equal_shape_distance_same_coordinates: warning,

with the intention of validating the shapes.txt Reference:

Values must increase along with shape_pt_sequence; they must not be used to show reverse travel along a route.

What to do
Revisit whether the conditions that trigger equal_shape_distance_diff_coordinates should really be an error: talk to the community, and analyze production data.
Consider lowering the severity to a warning and opening a discussion in the specification to make it clearer.

Next Steps from Most Recent Comment

After a discussion with @qcdyx, the strategy to solve this issue is:

  • Assume that a portion of these notices comes from a precision issue in the software that creates the shape files: two very close shape points have distinct lat/lon values but the same shape_dist_traveled value.
  • Pull the actualDistanceBetweenShapePoints field from all datasets in the Mobility Database that trigger equal_shape_distance_diff_coordinates.
  • Plot it on a histogram, with frequency on the y-axis and the latitude and longitude diff value on the x-axis. Then assess based on what we see.
Spreadsheet values (example from @cka-y's past analytics work in https://github.com/MobilityData/mobility-database-catalogs/pull/275/files)

  • ID of each feed
  • URL of each feed
  • csvRowNumber
  • shapeDistTraveled
  • shapePtLat
  • shapePtLon
  • prevCsvRowNumber
  • prevShapeDistTraveled
  • prevShapePtLat
  • prevShapePtLon
  • actualDistanceBetweenShapePoints
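
As a rough illustration of how one spreadsheet row could be assembled, here is a minimal Python sketch. It assumes the distance is a great-circle (haversine) distance in meters on a spherical Earth; the validator may compute actualDistanceBetweenShapePoints differently. The helper names `haversine_m` and `build_row` and the input dictionaries are hypothetical.

```python
import math

# Mean Earth radius in meters; an assumption made for this sketch.
EARTH_RADIUS_M = 6_371_000.0


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters between two lat/lon points (degrees)."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def build_row(feed_id, feed_url, prev, curr):
    """Build one spreadsheet row from two consecutive shape points.

    `prev` and `curr` are hypothetical dicts with the keys csvRowNumber,
    shapeDistTraveled, shapePtLat and shapePtLon.
    """
    return {
        "feedId": feed_id,
        "feedUrl": feed_url,
        "csvRowNumber": curr["csvRowNumber"],
        "shapeDistTraveled": curr["shapeDistTraveled"],
        "shapePtLat": curr["shapePtLat"],
        "shapePtLon": curr["shapePtLon"],
        "prevCsvRowNumber": prev["csvRowNumber"],
        "prevShapeDistTraveled": prev["shapeDistTraveled"],
        "prevShapePtLat": prev["shapePtLat"],
        "prevShapePtLon": prev["shapePtLon"],
        "actualDistanceBetweenShapePoints": haversine_m(
            prev["shapePtLat"], prev["shapePtLon"],
            curr["shapePtLat"], curr["shapePtLon"],
        ),
    }
```

The column names mirror the list above; `feedId` and `feedUrl` stand in for "ID of each feed" and "URL of each feed".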

Once this spreadsheet is created, we can see if there's a common threshold for actualDistanceBetweenShapePoints (how far apart are they typically for feeds generating this error?)

  • Do we have a clear threshold that has the majority of the values below it?
  • If so, would it be reasonable to treat values below the threshold as equal_shape_distance_same_coordinates (which is a warning)?
  • If so, does this need a spec amendment?
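
The threshold question could be explored with something like the sketch below. It is only an illustration: the function names, bin edges, and sample distances are made up, and a real analysis would run on the values pulled from the Mobility Database.

```python
from bisect import bisect_right


def histogram(distances_m, bin_edges_m):
    """Count distances per bin.

    Bin i covers [bin_edges_m[i], bin_edges_m[i + 1]); the last bin is
    open-ended. Values below the first edge are ignored.
    """
    counts = [0] * len(bin_edges_m)
    for d in distances_m:
        i = bisect_right(bin_edges_m, d) - 1
        if i >= 0:
            counts[i] += 1
    return counts


def share_below(distances_m, threshold_m):
    """Fraction of distances strictly below the candidate threshold."""
    if not distances_m:
        return 0.0
    return sum(d < threshold_m for d in distances_m) / len(distances_m)


# Made-up distances in meters, for illustration only.
sample = [0.2, 0.5, 0.9, 1.1, 3.0, 40.0]
print(histogram(sample, [0, 1, 5, 10]))  # [3, 2, 0, 1]
print(share_below(sample, 1.0))          # 0.5
```

If a large share of real values falls below some small distance, that distance is a candidate threshold for the questions above.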

Additional Context
These three rules were initially created to replace the decreasing_or_equal_shape_distance notice, because that notice was triggered by two things that deserved to be treated differently:

  1. shape_dist_traveled decreases between two consecutive shape points (which is a clear violation of the spec)
  2. shape_dist_traveled is equal between two consecutive shape points (also a violation but is not as big of a problem)

By digging deeper into number 2 above, we noticed that we were seeing two cases in production data:
2.1 shape_dist_traveled is equal between two consecutive shape points and the lat/long coordinates are equal (which seems fine)
2.2 shape_dist_traveled is equal between two consecutive shape points and the lat/long coordinates are not equal (which seems like a problem, but it could be caused by the scheduling software that rounds shape_dist_traveled when the two shape points are really close)

We went ahead and made our own interpretation of the specification based on what we saw in the production data: condition 2.1 would be a warning, whereas conditions 1 and 2.2 would be errors. This is slightly less strict than the spec, which states that values "must increase".
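
The interpretation described above can be summarized in a short Python sketch. This is not the validator's actual implementation; the `classify` function and the tuple layout are hypothetical and only restate conditions 1, 2.1, and 2.2.

```python
def classify(prev, curr):
    """Classify two consecutive shape points.

    `prev` and `curr` are (shape_dist_traveled, lat, lon) tuples.
    Returns (notice_code, severity), or None when the distance increases.
    """
    prev_dist, prev_lat, prev_lon = prev
    dist, lat, lon = curr

    # Condition 1: the distance decreases, a clear violation of the spec.
    if dist < prev_dist:
        return ("decreasing_shape_distance", "ERROR")

    if dist == prev_dist:
        # Condition 2.1: same distance and same coordinates.
        if (lat, lon) == (prev_lat, prev_lon):
            return ("equal_shape_distance_same_coordinates", "WARNING")
        # Condition 2.2: same distance but different coordinates.
        return ("equal_shape_distance_diff_coordinates", "ERROR")

    # Distance strictly increases: nothing to report.
    return None
```

For example, `classify((5.0, 45.0, -73.0), (5.0, 45.00001, -73.0))` returns the equal_shape_distance_diff_coordinates error even though the two points are roughly a meter apart, which is the situation this issue is about.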

Metadata

Labels

  • GTFS Reference: Used for adding or changing rules that belong in the GTFS reference
  • bug: Something isn't working (crash, a rule has a problem)
  • enhancement: New feature request or improvement on an existing feature

Projects

Status

Done
