# daisy

`dask + lazy = daisy`

## What is `daisy`?

`daisy` is an experiment to finally use `lazy` for something useful. `daisy` is meant to be an alternative to `dask.delayed()` for automatically creating computation graphs from functions.

## Example

Given the following setup:

```
from daisy import autodask, inline, register_get
from dask import delayed
from dask.threaded import get
from lazy import strict
import numpy as np


@inline
def f(a, b):
    return a + b


def g(a, b):
    return f(f(a, b), f(a, b))


autodask_g = autodask(g, inline=True)
delayed_g = delayed(g)
register_get(get)

arr = np.arange(1000000)
```

To start, let’s make sure these all do the same thing:

```
>>> (g(arr, arr) == delayed_g(arr, arr).compute()).all()
True
>>> (g(arr, arr) == autodask_g(arr, arr)).all()
True
```

Now we will run some quick, not-very-scientific profiling runs:

```
In [1]: %timeit g(arr, arr)
100 loops, best of 3: 9.34 ms per loop
In [2]: %timeit delayed_g(arr, arr).compute()
100 loops, best of 3: 10.2 ms per loop
In [3]: %timeit strict(autodask_g(arr, arr))
100 loops, best of 3: 3.63 ms per loop
```

### Why is this faster?

This is a very good case for `autodask` because we can dramatically reduce the amount of work being done. In the normal function and `dask.delayed` cases we will call `f(a, b)` twice, and then add those results together. In the `autodask` case we will just directly execute `a + b` once, and then add that result to itself. We have totally removed `f` from the graph, and instead use `+` directly.
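The transformation described above can be sketched in plain NumPy. `g_naive` is what the normal function and `dask.delayed` effectively execute, while `g_optimized` is (roughly) what `autodask` runs after inlining `f` and sharing the common subexpression; the names here are illustrative, not part of daisy's API.

```
import numpy as np

arr = np.arange(1000000)

def g_naive(a, b):
    # The common subexpression a + b is evaluated twice.
    return (a + b) + (a + b)

def g_optimized(a, b):
    # Evaluate a + b once, then reuse the result.
    t = a + b
    return t + t
```

Both functions return the same array, but `g_optimized` performs two array additions instead of three, which is where the speedup comes from.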

We have used a very large input here to see a speedup. One goal I have is to reduce the overhead so that this works for smaller inputs and smaller expressions. I would also like to try this with real workloads to see whether the reduction in work produces equally dramatic speedups.