# daisy

`dask + lazy = daisy`

## What is `daisy`?

`daisy` is an experiment to finally use `lazy` for something useful. `daisy` is meant to be an alternative to `dask.delayed()` for automatically creating computation graphs from functions.

## Example

Given the following setup:

```
from daisy import autodask, inline, register_get
from dask import delayed
from dask.threaded import get
from lazy import strict
import numpy as np


@inline
def f(a, b):
    return a + b


def g(a, b):
    return f(f(a, b), f(a, b))


autodask_g = autodask(g, inline=True)
delayed_g = delayed(g)
register_get(get)

arr = np.arange(1000000)
```

To start, let’s make sure these all do the same thing:

```
>>> (g(arr, arr) == delayed_g(arr, arr).compute()).all()
True
>>> (g(arr, arr) == autodask_g(arr, arr)).all()
True
```

Now we will run some quick, not-very-scientific profiling runs:

```
In [1]: %timeit g(arr, arr)
100 loops, best of 3: 9.34 ms per loop
In [2]: %timeit delayed_g(arr, arr).compute()
100 loops, best of 3: 10.2 ms per loop
In [3]: %timeit strict(autodask_g(arr, arr))
100 loops, best of 3: 3.63 ms per loop
```

### Why is this faster?

This is a very good case for `autodask` because we can dramatically reduce the amount of work being done. In the normal function and `dask.delayed` cases we will call `f(a, b)` twice, and then add those results together. In the `autodask` case we will just directly execute `a + b` once, and then add that result to itself. We have totally removed `f` from the graph, and instead use `+` directly.
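The transformation described above can be sketched in plain NumPy. `g_naive` is what the normal function and `dask.delayed` effectively execute, while `g_optimized` is (roughly) what `autodask` runs after inlining `f` and sharing the common subexpression; the names here are illustrative, not part of daisy's API.

```
import numpy as np

arr = np.arange(1000000)

def g_naive(a, b):
    # The common subexpression a + b is evaluated twice.
    return (a + b) + (a + b)

def g_optimized(a, b):
    # Evaluate a + b once, then reuse the result.
    t = a + b
    return t + t
```

Both functions return the same array, but `g_optimized` performs two array additions instead of three, which is where the speedup comes from.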

We have used a very large input here to see a speedup. One goal I have is to reduce the overhead so that this works for smaller inputs and smaller expressions. I would also like to try this with real workloads to see whether the reduction in work produces equally dramatic speedups.