API Reference

Main

daisy.autodask(func, *, inline)

Mark that a function should lazily build up a call graph to be executed by dask.

Parameters:	func (callable) – The function to transform.
	inline (bool) – Should the function be inlined into other `autodask` functions? This should normally be True unless the function is strict on its arguments.
Returns:	transformed – `func` with the transformations needed to build the call graph.
Return type:	callable

Notes

autodask transforms a function to build up a call graph which can be executed by dask. This is very similar to dask.delayed(), which provides an imperative API to dask.

Functional purity

autodask may only be applied to functions which are pure functions of their inputs. This means that the function must always be safe to memoize.

There is no guarantee about the execution order of autodask deferred code. Repeated calls to a function with the same arguments may be computed only once.
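
As a minimal sketch of what "safe to memoize" means, the example below uses only the standard library's functools.lru_cache; it does not use autodask. The names area and next_label are hypothetical. Caching the pure function is invisible to callers, while caching the impure one changes what they observe.

```python
import functools
import itertools

_ids = itertools.count()  # shared mutable state


def area(width, height):
    # Pure: the result depends only on the arguments.
    return width * height


def next_label(prefix):
    # Impure: reads and mutates shared state on every call.
    return f"{prefix}-{next(_ids)}"


cached_area = functools.lru_cache(maxsize=None)(area)
cached_label = functools.lru_cache(maxsize=None)(next_label)

# Memoizing the pure function is invisible to callers.
pure_unchanged = cached_area(3, 4) == area(3, 4)

# Memoizing the impure function changes what callers observe:
# the uncached calls differ, the cached calls are collapsed into one.
uncached_labels = [next_label("job"), next_label("job")]
cached_labels = [cached_label("job"), cached_label("job")]
```

A function like next_label is the kind that should not be decorated, because a deferred call with repeated arguments may run only once.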

Note

Things to be on the lookout for when checking if a function is pure:

IO
Mutating structures
Reading or writing to shared state (please stop this)
Randomness

IO may be okay if you are alright with only executing the call once and in an undefined order. You may force the partial order of execution by explicitly passing the results of one IO call into the other calls that must follow it.
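
The ordering trick can be sketched with a plain dask-style graph dict. The get helper below is a hypothetical, simplified evaluator written for this example, not dask's scheduler, and write_header and write_body are made-up stand-ins for IO calls. The body task consumes the header task's result, so it can only run afterwards.

```python
def get(graph, key):
    """Evaluate ``key`` in a dask-style graph dict, computing each key once.

    Hypothetical helper for illustration; not dask's scheduler.
    """
    cache = {}

    def resolve(term):
        if isinstance(term, str) and term in graph:
            if term not in cache:
                cache[term] = resolve(graph[term])
            return cache[term]
        if isinstance(term, tuple) and term and callable(term[0]):
            func, *args = term
            return func(*[resolve(arg) for arg in args])
        return term  # literal value

    return resolve(key)


log = []  # stands in for an external resource such as a file


def write_header(title):
    log.append(f"header:{title}")
    return len(log)  # token proving the header was written


def write_body(header_token, text):
    # header_token exists only to create the dependency on write_header.
    log.append(f"body:{text}")
    return header_token + 1


# The body task is listed first, but it still runs second because it
# consumes the header task's result.
graph = {
    "body": (write_body, "header", "hello"),
    "header": (write_header, "report"),
}
lines_written = get(graph, "body")
```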

Building up our task graph

Unlike dask.delayed(), autodask is lazy by default. This means that f(a, b) will automatically turn into a dask task graph like:

{'name': (f, a, b)}

Note

f, a, and b may also be deferred computations themselves.

Dask will perform best if we can encode more information into the task graph before feeding it to dask. To do this, we can pass inline=True to autodask before decorating. If a function is inlineable then instead of deferring the computation, we will enter the code and add the body of that function to the dask graph. For example, imagine we have defined f like:

@autodask(inline=True)
def f(a, b):
    return a + b + 1

When calling this function we know that it is safe to replace the task (f, a, b) with the task graph:

{'name_1': (add, a, b),
 'result': (add, 'name_1', 1)}

This will give dask more information to optimize the expression. We can also use this to collapse shared work. For example, imagine we have

@autodask(inline=True)
def g(a, b):
    return f(a, b) + f(a, b)

Because f is inlineable, we will enter the code and see what it adds to the graph. Because we are doing the same work twice, we can reduce it to a simpler task graph that will look more like:

{'name_1': (add, a, b),
 'f_result': (add, 'name_1', 1),
 'result': (add, 'f_result', 'f_result')}

This shows that we will not duplicate the work needed to compute f(a, b) twice.
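
To make the saving concrete, the sketch below evaluates the folded graph next to a graph that spells out both f(a, b) calls separately, and counts how often add runs. The get helper is a hypothetical, simplified evaluator written for this example, not dask's scheduler, and the key names in the second graph are made up.

```python
import operator

calls = []  # records every addition that actually runs


def add(a, b):
    calls.append((a, b))
    return operator.add(a, b)


def get(graph, key):
    """Evaluate ``key`` in a dask-style graph dict, computing each key once.

    Hypothetical helper for illustration; not dask's scheduler.
    """
    cache = {}

    def resolve(term):
        if isinstance(term, str) and term in graph:
            if term not in cache:
                cache[term] = resolve(graph[term])
            return cache[term]
        if isinstance(term, tuple) and term and callable(term[0]):
            func, *args = term
            return func(*[resolve(arg) for arg in args])
        return term  # literal value

    return resolve(key)


def run(graph):
    calls.clear()
    value = get(graph, "result")
    return value, len(calls)


inputs = {"a": 1, "b": 2}

# The folded graph from above: f(a, b) appears once and is used twice.
folded = dict(inputs, **{
    "name_1": (add, "a", "b"),
    "f_result": (add, "name_1", 1),
    "result": (add, "f_result", "f_result"),
})

# The same computation with each f(a, b) call spelled out separately.
unfolded = dict(inputs, **{
    "left_1": (add, "a", "b"),
    "left": (add, "left_1", 1),
    "right_1": (add, "a", "b"),
    "right": (add, "right_1", 1),
    "result": (add, "left", "right"),
})

folded_value, folded_adds = run(folded)
unfolded_value, unfolded_adds = run(unfolded)
```

Both graphs produce the same value, but the folded one performs three additions where the unfolded one performs five.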

When it is unsafe to pass inline=True

There is no default for inline because it is a very important decision! On the one hand, we almost always want to pass inline=True; however, there are cases when inlining is not possible, and attempting to do so will give much worse performance.

Functions cannot be inlined into the graph if they are strict on their inputs. This means that to return a final deferred computation they must scrutinize at least one of the inputs and normalize it to a concrete value.

Many operations will force computation. Here are some common cases:

Branching on the input

def f(x):
    if p(x):
        return x + 1
    else:
        return x - 1

Iterating over the input with a for loop

def f(xs):
    total = 0
    for x in xs:
        total += x
    return total

Explicitly strictly evaluating an input

def f(x):
    return strict(x)
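
The sketch below illustrates why branching is strict. Deferred is a hypothetical stand-in for a deferred value, written for this example; it is not daisy's thunk type. Arithmetic can build a new deferred node without looking at the input, but an if statement needs a concrete True or False, so the input must be computed on the spot.

```python
import operator


class Deferred:
    """Hypothetical stand-in for a deferred value; not daisy's thunk type."""

    def __init__(self, func, *args):
        self.func = func
        self.args = args
        self.forced = False  # set once the value has been computed

    def force(self):
        self.forced = True
        return self.func(*[force(arg) for arg in self.args])

    def __add__(self, other):
        return Deferred(operator.add, self, other)

    def __sub__(self, other):
        return Deferred(operator.sub, self, other)

    def __bool__(self):
        # Python needs a concrete True or False to pick a branch.
        return bool(self.force())


def force(value):
    return value.force() if isinstance(value, Deferred) else value


def lazy_friendly(x):
    # Only builds a bigger deferred expression; x is never inspected.
    return x + 1


def strict_on_input(x):
    # Branching scrutinizes x, so x is computed immediately.
    if x:
        return x + 1
    return x - 1


first = Deferred(lambda: 41)
lazy_result = lazy_friendly(first)
forced_by_lazy_call = first.forced

second = Deferred(lambda: 41)
strict_result = strict_on_input(second)
forced_by_strict_call = second.forced
```

Here forced_by_lazy_call is False and forced_by_strict_call is True: only the branching function had to compute its input while the graph was still being built.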

Differences with dask.delayed

Lazy by default vs eager by default

While both autodask and dask.delayed() serve the same purpose, they go about it in different ways. dask.delayed() is strict by default. This means that by default, most functions will be entered immediately instead of creating a task. This can be bad if the function does not know how to work with the dask.delayed.Delayed object or is strict on an input. Here is an example of a function in the dask.delayed() API:

import dask

@dask.delayed
def f(a, b):
    # lazy call: this will create a node like ``(g, a, b)`` in the
    # resulting task graph
    c = dask.delayed(g)(a, b)

    # strict call: this will enter the code of ``h`` immediately and add the
    # body to the graph. This may not be safe!
    return h(c, b)

autodask takes a different approach and is lazy by default. This means that by default function calls just create a new task for the graph and are not executed eagerly. Here is the same function in the autodask API:

@autodask
def f(a, b):
    # lazy call: this will create a node like ``(g, a, b)`` in the
    # resulting task graph **unless ``g`` is an inline function**!
    c = g(a, b)

    # strict call: this will enter the code of ``h`` immediately and add the
    # body to the graph. This may not be safe!
    return inline(h)(c, b)

One advantage of the autodask approach is that the potentially unsafe operation is called out explicitly, while we choose a more conservative graph construction strategy by default. We also allow functions to opt in to inlining if they know it is safe to do so.

Magic

autodask uses much darker magic than dask.delayed(). This is nice because it allows us to do things like translate:

@autodask(inline=True)
def f(a, b):
    return a is b

into a dask graph like:

{'result': (operator.is_, a, b)}

We can also defer things like comprehensions and even literal construction.
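
Python evaluates `is` immediately and does not let objects overload it, so a graph builder that relies only on operator overloading could not defer the example above. The sketch below illustrates that limitation with a hypothetical Node class. It does not show how autodask itself captures `is`.

```python
import operator


class Node:
    """Hypothetical graph-building value that relies on operator overloading."""

    def __init__(self, func, *args):
        self.task = (func, *args)

    def __eq__(self, other):
        # ``==`` can be intercepted, so it can be turned into a task.
        return Node(operator.eq, self, other)

    __hash__ = object.__hash__


a = Node(int, "1")
b = Node(int, "1")

deferred_eq = a == b  # a new Node: the comparison has not run yet
eager_is = a is b     # a plain bool: ``is`` ran immediately

# Once the identity check is written as an explicit task, it runs only
# when the task is executed.
task = (operator.is_, a, a)
func, *args = task
is_when_executed = func(*args)
```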

Warning

The magic required for autodask may be too much for people. It will not be easy to debug! dask.delayed() is a much more reasonable solution for most cases. You have been warned.

Miscellaneous

class daisy.autodaskthunk

A thunk which is evaluated with dask.

Parameters:	func (callable) – The code for the closure.
	*args – The free variables.
	**kwargs – The free variables.

daisy.ltree_to_dask(node)

Convert a lazy.tree.LTree into a dask task graph.

Parameters:	node (LTree) – The node to convert into a dask graph.
Returns:	dask – The equivalent dask task graph.
Return type:	dict[str, any]

Notes

This function does common subexpression folding to produce a minimal graph.
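
Common subexpression folding can be sketched on a simplified expression tree. The example below assumes a tree made of nested (func, *args) tuples rather than a real LTree, and tree_to_graph is a hypothetical helper. It shows the general technique, not the implementation of ltree_to_dask.

```python
import operator


def tree_to_graph(tree):
    """Flatten a nested ``(func, *args)`` tree into a dask-style graph.

    Structurally equal subtrees share one key. Hypothetical sketch; the
    real function operates on LTree nodes.
    """
    graph = {}
    names = {}  # subtree -> key already assigned to it

    def visit(node):
        if not (isinstance(node, tuple) and node and callable(node[0])):
            return node  # literal leaf
        if node in names:
            return names[node]  # seen before: reuse the existing key
        func, *args = node
        task = (func, *[visit(arg) for arg in args])
        key = f"{func.__name__}_{len(graph)}"
        graph[key] = task
        names[node] = key
        return key

    root = visit(tree)
    return graph, root


# (1 + 2 + 1) appears twice in the tree but only once in the graph.
f_tree = (operator.add, (operator.add, 1, 2), 1)
tree = (operator.add, f_tree, f_tree)

graph, root = tree_to_graph(tree)
```

The resulting graph has three tasks instead of five, and the root task refers to the shared key twice.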

Parameters:	get (callable[dict, str, any]) – The get function.
Returns:	get – The `get` function unchanged.
Return type:	callable[dict, str, any]