Skip to content

A networkx backend that uses joblib to run graph algorithms in parallel.

License

Notifications You must be signed in to change notification settings

networkx/nx-parallel

Repository files navigation

nx-parallel

nx-parallel is a NetworkX backend that uses joblib for parallelization. This project aims to provide parallelized implementations of various NetworkX functions to improve performance. Refer NetworkX backends documentation to learn more about the backend architecture in NetworkX.

Algorithms in nx-parallel

Script used to generate the above list
import _nx_parallel as nxp
d = nxp.get_funcs_info() # temporarily add `from .update_get_info import *` to _nx_parallel/__init__.py
for func in d:
    print(f"- [{func}]({d[func]['url']})")

Installation

You can install the stable version of nx-parallel using pip or conda:

pip install nx-parallel

conda install nx-parallel

For more, see INSTALL.md.

Usage

You can run your networkx code with nx-parallel backend by:

export NETWORKX_AUTOMATIC_BACKENDS="parallel" && python nx_code.py

Note that for all functions inside nx_code.py that do not have an nx-parallel implementation, their original networkx implementation will be executed. You can also use the nx-parallel backend in your code for only some specific function calls in the following ways:

import networkx as nx
import nx_parallel as nxp

G = nx.path_graph(4)
H = nxp.ParallelGraph(G)

# method 1 : passing ParallelGraph object in networkx function (Type-based dispatching)
nx.betweenness_centrality(H)

# method 2 : using the 'backend' kwarg
nx.betweenness_centrality(G, backend="parallel")

# method 3 : using nx-parallel implementation with networkx object
nxp.betweenness_centrality(G)

# method 4 : using nx-parallel implementation with ParallelGraph object
nxp.betweenness_centrality(H)

You can also measure the performance gains of parallel algorithms by comparing them to their sequential counter parts. Following is a simple benchmarking setup:

import networkx as nx
import nx_parallel as nxp
from timeit import timeit

G = nx.erdos_renyi_graph(1000, 0.01)
H = nxp.ParallelGraph(G)

sequential = timeit(lambda: nx.betweenness_centrality(G), number=1)
print(f"Sequential: {sequential:.2f}s")

# by default, nx-parallel uses all available cores
parallel = timeit(lambda: nx.betweenness_centrality(H), number=1)
print(f"Parallel:   {parallel:.2f}s")

Output:

Sequential: 1.29s
Parallel:   0.62s

Setting Configurations

You can modify the default NetworkX configurations to control how parallel execution behaves.

Example :

import networkx as nx
import nx_parallel as nxp

G = nx.path_graph(4)

with nx.config.backends.parallel(n_jobs=2, verbose=10):
    nx.betweenness_centrality(G, backend="parallel")

For more on how to play with configurations in nx-parallel, see Config.md. Additionally, refer to the NetworkX's official backend and config docs for more.

You can also enable logging to observe which backend is used and how tasks are scheduled. Enable and configure logging in the following way:

import logging

nxl = logging.getLogger("networkx")
nxl.addHandler(logging.StreamHandler())
nxl.setLevel(logging.DEBUG)

With logging enabled, the example output is as follows:

Converting input graphs from 'networkx' backend to 'parallel' backend for call to 'betweenness_centrality'
Using backend 'parallel' for call to 'betweenness_centrality' with arguments: (G=<nx_parallel.interface.ParallelGraph object at 0x1027cc5f0>, k=None, normalized=True, weight=None, endpoints=False, seed=<random.Random object at 0x1588a9e20>)
[Parallel(n_jobs=2)]: Using backend LokyBackend with 2 concurrent workers.
[Parallel(n_jobs=2)]: Batch computation too fast (0.16860580444335938s.) Setting batch_size=2.
[Parallel(n_jobs=2)]: Done   2 out of   2 | elapsed:    0.2s finished

Refer to Introspection and Logging section in NetworkX's backend documentation for more.

Notes

  1. Some functions in networkx have the same name but different implementations, so to avoid these name conflicts at the time of dispatching networkx differentiates them by specifying the name parameter in the _dispatchable decorator of such algorithms. So, method 3 and method 4 are not recommended. But, you can use them if you know the correct name. For example:

    # using `name` parameter - nx-parallel as an independent package
    
    # run the parallel implementation in `connectivity/connectivity`
    nxp.all_pairs_node_connectivity(H)
    
    # runs the parallel implementation in `approximation/connectivity`
    nxp.approximate_all_pairs_node_connectivity(H)

    Also, if you are using nx-parallel as a backend then mentioning the subpackage to which the algorithm belongs is recommended to ensure that networkx dispatches to the correct implementation. For example:

    # with subpackage - nx-parallel as a backend
    nx.all_pairs_node_connectivity(H)
    nx.approximation.all_pairs_node_connectivity(H)
  2. Right now there isn't much difference between nx.Graph and nxp.ParallelGraph so method 3 would work fine but it is not recommended because in the future that might not be the case.

Feel free to contribute to nx-parallel. You can find the contributing guidelines here. If you'd like to implement a feature or fix a bug, we'd be happy to review a pull request. Please make sure to explain the changes you made in the pull request description. And feel free to open issues for any problems you face, or for new features you'd like to see implemented.

This project is managed under the NetworkX organisation, so the code of conduct of NetworkX applies here as well.

All code in this repository is available under the Berkeley Software Distribution (BSD) 3-Clause License (see LICENSE).

Thank you :)

About

A networkx backend that uses joblib to run graph algorithms in parallel.

Resources

License

Contributing

Stars

Watchers

Forks

Sponsor this project

  •  

Contributors 10

Languages