Add initial task graph (push model) #3463
Signed-off-by: Ben Sherman <[email protected]>
Nice, this is a good start, but as mentioned already I think we should go beyond the tracking via file path hashes. Also, I think it would be desirable to keep this graph independent of the current process DAG. The former is usually used to have the graph resolved ahead of the execution, whereas the task graph is used to determine the execution provenance. Regarding point 2, there could be two choices: a) each task reports the upstream tasks in the TraceRecord
A possible JSON meta file could look like this
e2b4a93 to f32ea0b
Signed-off-by: Ben Sherman <[email protected]>
Signed-off-by: Ben Sherman <[email protected]>
Okay, I added the input/output files to the trace record with some basic metadata. I didn't need to do anything with the
Will this feature be compatible with the -resume option? For example, if Now let's say we updated the filter in step 3) but do not want to start again from step 1) or step 2), because nothing in those tasks has changed. Will it be able to pick up right at step 3)?
The task graph should work just fine with resume. It simply receives tasks from the task processor and it doesn't care whether they are new or cached.
cefb067 to e523afd
0d59b4c to b93634e
Closing this PR because it uses a push model (Nextflow pushes task metadata to Tower during execution) whereas we want to use a pull model (Tower pulls task metadata from work directory after pipeline execution). I will create a new PR for the new approach.
First pass at producing a task graph. Whenever a task is submitted (or discovered from cache), it is added as a node to the task graph with its hash, name, and list of predecessors. Each task has a list of input files, and the file paths point to their originating task.
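To make the bookkeeping concrete, here is a minimal Python sketch of the idea described above. It is not the code in this PR: the names `TaskNode`, `TaskGraph`, `add_task`, and `to_mermaid`, as well as the example task hashes and file paths, are hypothetical. It assumes each task is registered only after the tasks that produced its inputs, and that an input with no known producer (for example a source file) simply adds no edge.

```python
from dataclasses import dataclass, field


@dataclass
class TaskNode:
    """One vertex of the task graph (hypothetical structure)."""
    task_hash: str
    name: str
    inputs: list[str]
    outputs: list[str]
    # Hashes of the upstream tasks that produced this task's inputs.
    predecessors: list[str] = field(default_factory=list)


class TaskGraph:
    def __init__(self):
        self.nodes = {}      # task hash -> TaskNode
        self.producers = {}  # output file path -> hash of the producing task

    def add_task(self, task_hash, name, inputs, outputs):
        """Register a task. New and cached tasks are treated identically."""
        predecessors = []
        for path in inputs:
            producer = self.producers.get(path)
            # Inputs without a known producer contribute no edge.
            if producer is not None and producer not in predecessors:
                predecessors.append(producer)
        node = TaskNode(task_hash, name, list(inputs), list(outputs), predecessors)
        self.nodes[task_hash] = node
        for path in outputs:
            self.producers[path] = task_hash
        return node

    def to_mermaid(self):
        """Render the graph as Mermaid flowchart text."""
        lines = ["flowchart TD"]
        for node in self.nodes.values():
            lines.append(f'    t{node.task_hash}["{node.name}"]')
        for node in self.nodes.values():
            for pred in node.predecessors:
                lines.append(f"    t{pred} --> t{node.task_hash}")
        return "\n".join(lines)


# Hypothetical three-task pipeline.
graph = TaskGraph()
graph.add_task("a1", "ALIGN (1)", ["/data/s1.fq"], ["/work/a1/s1.bam"])
graph.add_task("b2", "FILTER (1)", ["/work/a1/s1.bam"], ["/work/b2/s1.filtered.bam"])
graph.add_task(
    "c3",
    "REPORT",
    ["/work/b2/s1.filtered.bam", "/work/a1/s1.bam"],
    ["/work/c3/report.html"],
)
print(graph.to_mermaid())
```

Because edges are derived purely from which task produced each input path, the sketch does not need to know whether a task was freshly submitted or restored from cache.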
Currently, you can produce the task graph by setting dag.type = 'task' in the config and using the Mermaid renderer:
Some lingering questions:
Since operators don't have working directories or hashes, I think it would be better to say "you should only create/edit files within processes if you want to have a complete provenance graph".
TraceObserver events (so that plugins can access it), and also in the .command.trace file (see also TraceRecord) produced by each task, which would become a JSON file to better handle things like input/output files.