- 
                Notifications
    You must be signed in to change notification settings 
- Fork 1.1k
De-duplicate vertex labels to save memory #5223
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
568f899    to
    c5f4125      
    Compare
  
    | Codecov ReportPatch coverage:  
 Additional details and impacted files@@            Coverage Diff             @@
##             dev-2.x    #5223   +/-   ##
==========================================
  Coverage      65.64%   65.64%           
- Complexity     14712    14724   +12     
==========================================
  Files           1766     1769    +3     
  Lines          68580    68601   +21     
  Branches        7291     7290    -1     
==========================================
+ Hits           45017    45036   +19     
  Misses         21064    21064           
- Partials        2499     2501    +2     
 ☔ View full report in Codecov by Sentry. | 
| Very good improvement! Using the new label type sometimes requires more verbose code than when using plain strings. Would it make sense to add a couple of simple helpers? 
 | 
| 
 This one I find a good idea and will implement it. 
 I'm not so sure about that one since it would have to know about all label formats and be able to parse them. What use case do you have in mind? | 
| The use case is a very common (especially in tests) expression like: graph.getVertex(VertexLabel.string("shilshole_22nd")) It would then appear as : graph.getVertex("shilshole_22nd") So the overloaded getVertex version would assume that the label is of type VertexLabel.string. Maybe this is a bit vague idea, as one label type gets a special treatment. | 
| Ok, that makes sense. | 
7f4650e    to
    bac6bad      
    Compare
  
    
Summary
Each vertex in the OTP graph has a materialized label and the vast majority of vertices are of type
OsmVertexwhich have the duplicate label prefix of "osm:node:". This prefix is repeated millions of times in a large graph.In order to cut down on the duplication this PR implements the following:
getLabelabstract in order to simplify the inheritance chainMemory savings
The speed test suggests that this saves about 5-8% of memory. For example the Norwegian graph in the speed test went from 3.58 to 3.4 GB of memory.
Test removed
The test
GeometryProcessorTesthas been disabled for years and even if we could make it work, it is already covered by other tests. It would have needed quite a bit of refactoring to get it to compile again so I decided to delete it.