back2source: Refine ScanCode.io d2d pipeline for Java #1404

@pombredanne

Update the pipeline steps for the binary-to-source analysis of Java code to use strings and symbols.

The current implementation matches .java and .class files using paths, classpaths, Java packages and compiler conventions. There are cases where these techniques will not produce a correct match. For instance, the .class code may not be compiled from Java at all, but generated directly as bytecode with the ASM library or similar bytecode engineering tools. This is common with Hibernate and other data frameworks, and with SOAP or web services frameworks that generate code from @ annotations or XML documents.
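As a rough illustration of the path and compiler conventions mentioned above, here is a minimal sketch and not the actual ScanCode.io code. The function name `candidate_source_paths` is hypothetical. It relies only on the javac convention that nested and anonymous classes compile to `Outer$Inner.class` and `Outer$1.class`, next to the top-level class.

```python
from pathlib import PurePosixPath


def candidate_source_paths(class_path, extensions=(".java",)):
    """
    Return the source paths that could have produced ``class_path``
    according to path and compiler naming conventions only.
    Hypothetical helper for illustration.
    """
    path = PurePosixPath(class_path)
    if path.suffix != ".class":
        return []
    # Nested and anonymous classes are named Outer$Inner.class or
    # Outer$1.class, so keep only the part before the first "$".
    top_level_name = path.stem.split("$", 1)[0]
    return [str(path.with_name(top_level_name + ext)) for ext in extensions]
```

A class generated directly as bytecode has no such source counterpart, so this kind of lookup returns a candidate path that does not exist in the source tree.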
To recap, the approach would be to:

  • Collect source symbols with the "purl2sym" collect_symbols* pipelines, or with custom processing for XML
  • Collect symbols from the binaries, either using lief or using binary strings as can be collected with the scancode-toolkit (we are missing a plugin for this)
  • Match the source symbols to the binary symbols, sort the candidates by number of matches, and report the correct matches to create a relation between a source and a binary
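The three steps above could be sketched as follows. This is a simplified stand-in under stated assumptions, not the proposed pipeline: it uses only the Python standard library instead of the collect_symbols* pipelines, lief or a scancode-toolkit plugin, and all function names, the regular expressions and the `min_shared` threshold are hypothetical choices made for illustration.

```python
import re

# Java-like identifiers of at least 4 characters; shorter ones are too noisy.
IDENTIFIER = re.compile(r"[A-Za-z_$][A-Za-z0-9_$]{3,}")
# Runs of printable ASCII in binary data, similar to what `strings` reports.
PRINTABLE_RUN = re.compile(rb"[\x20-\x7e]{4,}")


def source_symbols(source_text):
    """Return the set of identifier-like tokens found in source text."""
    return set(IDENTIFIER.findall(source_text))


def binary_symbols(binary_bytes):
    """Return identifier-like tokens found in the printable strings of a binary."""
    symbols = set()
    for run in PRINTABLE_RUN.findall(binary_bytes):
        symbols.update(IDENTIFIER.findall(run.decode("ascii")))
    return symbols


def rank_sources(binary_bytes, sources, min_shared=2):
    """
    Rank candidate sources for one binary by number of shared symbols.
    ``sources`` maps a source path to its text. Returns a list of
    (path, shared_count) tuples, highest count first, ties sorted by path.
    """
    bin_syms = binary_symbols(binary_bytes)
    scored = []
    for path, text in sources.items():
        shared = bin_syms & source_symbols(text)
        if len(shared) >= min_shared:
            scored.append((path, len(shared)))
    return sorted(scored, key=lambda item: (-item[1], item[0]))
```

For each binary, the top-ranked entry would be the candidate for a source-to-binary relation. A real implementation would likely also need to filter out language keywords and very common identifiers before counting.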
