Skip to content

Conversation

@leonardehrenfried
Copy link
Member

@leonardehrenfried leonardehrenfried commented Jul 6, 2023

Summary

Each vertex in the OTP graph has a materialized label and the vast majority of vertices are of type OsmVertex which have the duplicate label prefix of "osm:node:". This prefix is repeated millions of times in a large graph.

In order to cut down on the duplication this PR implements the following:

  • convert label from a string to an interface so there can be multiple implementations
  • use just the node id as the vertex label for OSM vertices
  • make getLabel abstract in order to simplify the inheritance chain

Memory savings

The speed test suggests that this saves about 5-8% of memory. For example the Norwegian graph in the speed test went from 3.58 to 3.4 GB of memory.

Test removed

The test GeometryProcessorTest has been disabled for years and even if we could make it work, it is already covered by other tests. It would have needed quite a bit of refactoring to get it to compile again so I decided to delete it.

@leonardehrenfried leonardehrenfried added !Optimization The feature is to improve performance. !Technical Debt Improve code quality, no functional changes. +Bump Serialization Id Add this label if you want the serialization id automatically bumped after merging the PR labels Jul 6, 2023
@leonardehrenfried leonardehrenfried requested a review from a team as a code owner July 6, 2023 12:05
@codecov
Copy link

codecov bot commented Jul 6, 2023

Codecov Report

Patch coverage: 74.10% and no project coverage change.

Comparison is base (74025a3) 65.64% compared to head (1452a51) 65.64%.

Additional details and impacted files
@@            Coverage Diff             @@
##             dev-2.x    #5223   +/-   ##
==========================================
  Coverage      65.64%   65.64%           
- Complexity     14712    14724   +12     
==========================================
  Files           1766     1769    +3     
  Lines          68580    68601   +21     
  Branches        7291     7290    -1     
==========================================
+ Hits           45017    45036   +19     
  Misses         21064    21064           
- Partials        2499     2501    +2     
Impacted Files Coverage Δ
...opentripplanner/ext/geocoder/GeocoderResource.java 0.00% <0.00%> (ø)
.../org/opentripplanner/ext/geocoder/LuceneIndex.java 68.57% <0.00%> (ø)
...builder/module/AddTransitModelEntitiesToGraph.java 57.22% <0.00%> (-0.34%) ⬇️
...raph_builder/module/islandpruning/GraphIsland.java 3.84% <0.00%> (-0.16%) ⬇️
...builder/module/islandpruning/PrunedStopIsland.java 0.00% <0.00%> (ø)
...street/model/vertex/OsmBoardingLocationVertex.java 100.00% <ø> (ø)
...street/model/vertex/TransitBoardingAreaVertex.java 0.00% <0.00%> (ø)
.../street/model/vertex/TransitPathwayNodeVertex.java 0.00% <0.00%> (ø)
...rg/opentripplanner/visualizer/GraphVisualizer.java 0.00% <0.00%> (ø)
...aph_builder/module/islandpruning/PruneIslands.java 77.54% <50.00%> (ø)
... and 24 more

... and 2 files with indirect coverage changes

☔ View full report in Codecov by Sentry.
📢 Do you have feedback about the report comment? Let us know in this issue.

@vesameskanen
Copy link
Contributor

Very good improvement!

Using the new label type sometimes requires more verbose code than when using plain strings. Would it make sense to add a couple of simple helpers?

  • In Vertex base class, a new method getLabelString()
  • in Graph class, another vertex getter: getVertex(string label)

@vesameskanen vesameskanen self-requested a review July 9, 2023 07:30
@leonardehrenfried
Copy link
Member Author

* In Vertex base class, a new method getLabelString()

This one I find a good idea and will implement it.

* in Graph class, another vertex getter: getVertex(string label)

I'm not so sure about that one since it would have to know about all label formats and be able to parse them. What use case do you have in mind?

@vesameskanen
Copy link
Contributor

The use case is a very common (especially in tests) expression like:

graph.getVertex(VertexLabel.string("shilshole_22nd"))

It would then appear as :

graph.getVertex("shilshole_22nd")

So the overloaded getVertex version would assume that the label is of type VertexLabel.string. Maybe this is a bit vague idea, as one label type gets a special treatment.

@leonardehrenfried
Copy link
Member Author

Ok, that makes sense.

vesameskanen
vesameskanen previously approved these changes Jul 10, 2023
@leonardehrenfried leonardehrenfried changed the title Deduplicate vertex label values to save memory Deduplicate vertex labels to save memory Jul 11, 2023
@leonardehrenfried leonardehrenfried merged commit 19f8a2c into dev-2.x Jul 12, 2023
@leonardehrenfried leonardehrenfried deleted the vertex-labels branch July 12, 2023 19:44
t2gran pushed a commit that referenced this pull request Jul 12, 2023
t2gran pushed a commit that referenced this pull request Jul 12, 2023
@t2gran t2gran added this to the 2.4 (next release) milestone Jul 18, 2023
@leonardehrenfried leonardehrenfried changed the title Deduplicate vertex labels to save memory De-duplicate vertex labels to save memory Oct 25, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

+Bump Serialization Id Add this label if you want the serialization id automatically bumped after merging the PR !Optimization The feature is to improve performance. !Technical Debt Improve code quality, no functional changes.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants