Update the pipeline steps for the binary-to-source analysis for Java to use strings and symbols.
The current implementation matches .java and .class files using paths, the classpath, Java packages, and compiler conventions. These techniques do not always produce a correct match. For instance, a .class file may not be compiled from Java source at all: it may have been generated directly as bytecode with the ASM library or similar bytecode engineering tools. This is common with Hibernate and other data frameworks, and with SOAP or web services tooling that generates code from @ annotations or XML documents.
To recap:
- Hibernate or JPA code generated from XML or annotations
- XML Schema, SOAP and web services code generated from XML, like with JAXB. See https://stackoverflow.com/questions/11463231/how-to-generate-jaxb-classes-from-xsd
- Other non-Java code, like Kotlin, Scala, Groovy, or AspectJ.
Here the approach would be to:
- Collect source symbols with the "purl2sym" collect_symbols* pipelines, or with custom processing for XML
- Collect symbols from the binaries, either using lief or using binary strings as collectable in the scancode-toolkit (a plugin for this is currently missing)
- Match the source symbols to the binary symbols, sort candidates by number of matches, and report the correct matches to create a relation between a source and a binary
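The collection and matching steps above could be sketched roughly as follows. This is only an illustration, not the scancode.io implementation: `extract_identifier_strings` and `rank_source_matches` are hypothetical helper names, the regex is a crude stand-in for a real strings or symbols collector, and the assumption is that source symbols have already been collected into a mapping of source path to symbols.

```python
import re

# Assumption: identifier-like ASCII runs of 4+ characters are a good enough
# proxy for the names stored in a .class file. A real collector would parse
# the constant pool or reuse an existing strings/symbols plugin.
_IDENTIFIER_RUN = re.compile(rb"[A-Za-z_$][A-Za-z0-9_$]{3,}")


def extract_identifier_strings(data: bytes) -> set[str]:
    """Return the set of identifier-like ASCII strings found in raw bytes."""
    return {match.decode("ascii") for match in _IDENTIFIER_RUN.findall(data)}


def rank_source_matches(binary_symbols, sources_symbols, min_shared=1):
    """
    Rank candidate sources for one binary by the number of shared symbols.

    ``sources_symbols`` maps a source path to an iterable of its symbols.
    Returns (source_path, shared_count, ratio) tuples, best match first,
    where ratio is the share of the binary's symbols found in that source.
    """
    binary_set = set(binary_symbols)
    ranked = []
    for source_path, symbols in sources_symbols.items():
        shared = binary_set & set(symbols)
        if len(shared) < min_shared:
            continue
        ratio = len(shared) / len(binary_set) if binary_set else 0.0
        ranked.append((source_path, len(shared), ratio))
    # Most shared symbols first; ties are broken by path for stable output.
    ranked.sort(key=lambda item: (-item[1], item[0]))
    return ranked


if __name__ == "__main__":
    # Hypothetical data, for illustration only.
    class_bytes = b"\x01\x00\x08Customer\x01\x00\x05getId\x01\x00\x05setId"
    sources = {
        "Customer.hbm.xml": ["Customer", "getId", "setId"],
        "Order.kt": ["Order", "getId"],
    }
    for row in rank_source_matches(extract_identifier_strings(class_bytes), sources):
        print(row)
```

Deciding which of the ranked candidates counts as a "correct" match (for example a minimum count or ratio) is left open here; `min_shared` is just a placeholder for that threshold.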