API Reference
Main
daisy.autodask(func, *, inline)
Mark that a function should lazily build up a call graph to be executed by dask.
Parameters:
Returns: transformed – func with the transformations needed to build the call graph.
Return type:
Notes
autodask transforms a function to build up a call graph which can be executed by dask. This is very similar to dask.delayed(), which provides an imperative API to dask.
Functional purity
autodask may only be applied to functions which are pure functions of their inputs. This means that a function must always be safe to memoize.
There is no guarantee about the execution order of autodask deferred code. Repeated calls to a function with the same arguments may only be computed a single time.
Note
Things to be on the lookout for when checking if a function is pure:
- IO
- Mutating structures
- Reading or writing to shared state (please stop this)
- Randomness
IO may be okay if you are alright with only executing the call once and in an undefined order. You may force the partial order of execution by explicitly passing the results of one IO call into the other calls that must follow it.
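To see why a function "must always be safe to memoize", here is a minimal sketch that uses functools.lru_cache as a stand-in for "may only be computed a single time". The names next_id, add_one, and the memoized wrappers are hypothetical and are not part of daisy or dask.

```python
import functools
import itertools

_counter = itertools.count()


def next_id(prefix):
    """Impure: reads and mutates shared state, so equal inputs give different results."""
    return f"{prefix}-{next(_counter)}"


def add_one(x):
    """Pure: the result depends only on the argument."""
    return x + 1


# Assumption: memoization models a repeated call being computed only once.
memoized_next_id = functools.lru_cache(maxsize=None)(next_id)
memoized_add_one = functools.lru_cache(maxsize=None)(add_one)

direct = [next_id("job"), next_id("job")]                    # two distinct ids
cached = [memoized_next_id("job"), memoized_next_id("job")]  # the same id twice
```

For add_one the memoized and direct calls agree, so deferring it is safe. For next_id the observable behaviour changes once repeated calls are collapsed, which is the kind of function that should not be decorated.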
Building up our task graph
Unlike dask.delayed(), autodask is lazy by default. This means that f(a, b) will automatically turn into a dask task graph like:
    {'name': (f, a, b)}
Note
f, a, and b may also be deferred computations themselves.
Dask will perform best if we can encode more information into the task graph before feeding it to dask. To do this, we can pass inline=True to autodask before decorating. If a function is inlineable then instead of deferring the computation, we will enter the code and add the body of that function to the dask graph. For example, imagine we have defined f like:
    @autodask(inline=True)
    def f(a, b):
        return a + b + 1
When calling this function we know that it is safe to replace the task (f, a, b) with the task graph:
    {'name_1': (add, a, b), 'result': (add, 'name_1', 1)}
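As a rough check that the inlined graph computes the same value as the original function, the sketch below evaluates a graph of this shape with a tiny recursive evaluator. simple_get is a hypothetical helper written for this example, not dask's scheduler, and the 'a' and 'b' keys holding concrete inputs are an assumption made so the graph is self-contained.

```python
from operator import add


def simple_get(dsk, key):
    """Evaluate ``key`` in a dask-style graph (a sketch, not dask's scheduler)."""
    cache = {}

    def is_key(expr):
        try:
            return expr in dsk
        except TypeError:  # unhashable values can never be keys
            return False

    def evaluate(expr):
        if is_key(expr):
            # Each key is computed at most once and then reused.
            if expr not in cache:
                cache[expr] = evaluate(dsk[expr])
            return cache[expr]
        if isinstance(expr, tuple) and expr and callable(expr[0]):
            func, *args = expr
            return func(*(evaluate(arg) for arg in args))
        return expr  # a literal value

    return evaluate(key)


def f(a, b):
    return a + b + 1


inlined = {
    'a': 2,
    'b': 3,
    'name_1': (add, 'a', 'b'),
    'result': (add, 'name_1', 1),
}
value = simple_get(inlined, 'result')
```

Here value equals f(2, 3), but the graph exposes two separate add tasks where the non-inlined form would expose a single opaque call to f.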
Inlining gives dask more information to optimize the expression. We can also use it to collapse shared work. For example, imagine we have:
    @autodask(inline=True)
    def g(a, b):
        return f(a, b) + f(a, b)
Because f is inlineable, we will enter the code and see what it adds to the graph. Because we are doing the same work twice, we can reduce it to a simpler task graph that will look more like:
    {'name_1': (add, a, b), 'f_result': (add, 'name_1', 1), 'result': (add, 'f_result', 'f_result')}
This shows that we do not duplicate work: f(a, b) is computed only once even though it appears twice.
When it is unsafe to pass inline=True
There is no default for inline because it is a very important decision! On the one hand, we almost always want to pass inline=True; however, there are cases when inlining is not possible, and attempting to do so will give much worse performance.
Functions cannot be inlined into the graph if they are strict on their inputs. This means that to return a final deferred computation they must scrutinize at least one of the inputs and normalize it to a concrete value.
There are many operations which will force computation. Here are some common cases:
Branching on the input
    def f(x):
        if p(x):
            return x + 1
        else:
            return x - 1
Iterating over the input with a for loop
    def f(xs):
        total = 0
        for x in xs:
            total += x
        return total
Explicitly strictly evaluating an input
    def f(x):
        return strict(x)
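The difference between a function that can stay lazy and one that is strict on its input can be sketched with a toy deferred value. Deferred below is a hypothetical stand-in written for this example; it is not daisy's thunk type and makes no claim about how autodask works internally.

```python
class Deferred:
    """Toy lazy value that records whether it has been forced."""

    def __init__(self, compute):
        self._compute = compute
        self.forced = False

    def force(self):
        self.forced = True
        return self._compute()

    def __add__(self, other):
        # Arithmetic can stay lazy: wrap the work in a new Deferred.
        return Deferred(lambda: self.force() + other)

    def __sub__(self, other):
        return Deferred(lambda: self.force() - other)

    def __bool__(self):
        # Branching needs a concrete True/False, so the value must be computed.
        return bool(self.force())


def lazy_friendly(x):
    return x + 1


def strict_on_input(x):
    if x:  # scrutinizes x, forcing it immediately
        return x + 1
    return x - 1


x1 = Deferred(lambda: 41)
y1 = lazy_friendly(x1)
forced_by_lazy = x1.forced        # nothing has been computed yet

x2 = Deferred(lambda: 41)
y2 = strict_on_input(x2)
forced_by_strict = x2.forced      # computed while the graph was still being built
```

lazy_friendly returns a new deferred value without touching its input, while strict_on_input has to compute x2 just to decide which branch to take. Inlining a function of the second kind would therefore force computation during graph construction.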
Differences with dask.delayed
Lazy by default vs eager by default
While both autodask and dask.delayed() serve the same purpose, they go about it in different ways.
dask.delayed() is strict by default. This means that by default, most functions will be entered immediately instead of creating a task. This can be bad if the function does not know how to work with the dask.delayed.Delayed object or is strict on an input. Here is an example of a function in the dask.delayed() API:
    import dask

    @dask.delayed
    def f(a, b):
        # lazy call: this will create a node like ``(g, a, b)`` in the
        # resulting task graph
        c = dask.delayed(g)(a, b)
        # strict call: this will enter the code ``h`` immediately and add the
        # body to the graph. This may not be safe!
        return h(c, b)
autodask takes a different approach and is lazy by default. This means that by default function calls just create a new task for the graph and are not executed eagerly. Here is the same function in the autodask API:
    from daisy import autodask, inline

    @autodask
    def f(a, b):
        # lazy call: this will create a node like ``(g, a, b)`` in the
        # resulting task graph **unless ``g`` is an inline function**!
        c = g(a, b)
        # strict call: this will enter the code ``h`` immediately and add the
        # body to the graph. This may not be safe!
        return inline(h)(c, b)
One advantage of the autodask approach is that the potentially unsafe operation is called out explicitly, while we choose a more conservative graph construction strategy by default. We also allow functions to opt in to inlining if they know it is safe to do so.
Magic
autodask uses much darker magic than dask.delayed(). This is nice because it allows us to do things like translate:
    @autodask(inline=True)
    def f(a, b):
        return a is b
into a dask graph like:
    {'result': (operator.is_, a, b)}
We can also defer things like comprehensions and even literal construction.
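One reason a translation like the `a is b` example needs more than operator overloading: Python gives objects no hook to intercept `is`. The sketch below illustrates this with a hypothetical Proxy class (not part of daisy or dask) that can capture `+` but not `is`, and then shows the function form operator.is_ that a task tuple can hold.

```python
import operator


class Proxy:
    """Hypothetical lazy proxy that records operations instead of running them."""

    def __init__(self, task):
        self.task = task

    def __add__(self, other):
        # `+` dispatches to __add__, so a proxy can capture it as a task.
        return Proxy((operator.add, self, other))


a = Proxy("a")
b = Proxy("b")

captured = a + b        # a Proxy wrapping (operator.add, a, b)
not_captured = a is b   # a plain bool: identity checks cannot be overloaded

# The function form is what a task graph can store and call later.
sentinel = object()
graph = {"result": (operator.is_, sentinel, sentinel)}
func, *args = graph["result"]
result = func(*args)
```

Because `a is b` on proxies is evaluated immediately and compares the proxies themselves, deferring it presumably requires rewriting the function rather than wrapping its arguments.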
Warning
The magic required for autodask may be too much for people. It will not be easy to debug! dask.delayed() is a much more reasonable solution for most cases. You have been warned.
See also
daisy.inline(), lazy.strict(), dask.delayed()
class daisy.inline(func)
A box that denotes that a function should be inlined in autodask.
Parameters: func (callable) – The function to wrap.
Notes
inline can allow non-autodask functions to be inlined into the task graph. This is nice if you know that a function is a pure computation of its inputs and does not need to scrutinize an input to return a final computation.
Functions cannot be inlined into the graph if they are strict on their inputs. This means that to return a final deferred computation they must scrutinize at least one of the inputs and normalize it to a concrete value.
There are many operations which will force computation. Here are some common cases:
Branching on the input
    def f(x):
        if p(x):
            return x + 1
        else:
            return x - 1
Iterating over the input with a for loop
    def f(xs):
        total = 0
        for x in xs:
            total += x
        return total
Explicitly strictly evaluating an input
    def f(x):
        return strict(x)
See also
daisy.register_get(get)
Register the get function which will be used to evaluate autodaskthunk-generated dask graphs.
By default, dask.get() will be used.
Parameters: get (callable[dict, str, any]) – The get function.
Returns: get – The get function unchanged.
Return type: callable[dict, str, any]
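A get function with the callable[dict, str, any] shape takes a graph and a key and returns the computed value. The sketch below wraps one get in another that logs requested keys. basic_get and traced are hypothetical helpers written for this example; basic_get is only a stand-in so the snippet runs without dask, and the register_get call is shown as a comment because it assumes daisy is installed.

```python
def basic_get(dsk, key):
    """Tiny stand-in evaluator for dask-style graphs (not dask.get)."""

    def evaluate(expr):
        if isinstance(expr, str) and expr in dsk:
            return evaluate(dsk[expr])
        if isinstance(expr, tuple) and expr and callable(expr[0]):
            func, *args = expr
            return func(*map(evaluate, args))
        return expr  # a literal value

    return evaluate(key)


def traced(get, log):
    """Wrap ``get`` so that each requested key is appended to ``log``."""

    def traced_get(dsk, key):
        log.append(key)
        return get(dsk, key)

    return traced_get


requested = []
my_get = traced(basic_get, requested)

# Assumption: with daisy installed, this would be registered with
#     daisy.register_get(my_get)
value = my_get({"x": 20, "y": (max, "x", 22)}, "y")
```

Since register_get is documented as returning the get function unchanged, it should also be usable as a decorator on a get function's definition.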
Miscellaneous
class daisy.autodaskthunk
A thunk which is evaluated with dask.
Parameters:
- func (callable) – The code for the closure.
- *args – The free variables.
- **kwargs – The free variables.
daisy.ltree_to_dask(node)
Convert a lazy.tree.LTree into a dask task graph.
Parameters: node (LTree) – The node to convert into a dask graph.
Returns: dask – The equivalent dask task graph.
Return type: dict[str, any]
Notes
This function does common subexpression folding to produce a minimal graph.
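Common subexpression folding can be sketched as follows: while walking an expression tree, structurally equal subexpressions are given the same key, so each appears only once in the graph. This is an illustration of the general idea under simplifying assumptions (expressions are nested tuples, leaves are plain literals), not the actual implementation of ltree_to_dask. The fold helper and the name_N key scheme are hypothetical.

```python
from operator import add


def fold(expr, graph, names):
    """Add ``expr`` to ``graph``, reusing the key of any equal subexpression."""
    if not isinstance(expr, tuple):
        # Leaf literal. Assumption: leaves never collide with generated keys.
        return expr
    func, *args = expr
    # Fold the arguments first so equal subtrees map to equal tasks.
    task = (func, *(fold(arg, graph, names) for arg in args))
    if task not in names:
        key = f"name_{len(names)}"
        names[task] = key
        graph[key] = task
    return names[task]


# g(a, b) = f(a, b) + f(a, b), with f(a, b) = a + b + 1 and a=2, b=3.
f_ab = (add, (add, 2, 3), 1)
g_ab = (add, f_ab, f_ab)

graph, names = {}, {}
root = fold(g_ab, graph, names)
```

The unfolded expression contains five add calls, but the resulting graph has only three tasks because both occurrences of f(a, b) resolve to the same key.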