Daft Functions Roadmap #4824
Replies: 2 comments 11 replies
Hi @kevinzwang, is the plan that the concept of Expression will go away in the future and everything will uniformly be called a Function, split into system built-in functions (BIF) and user-defined functions (UDF)? If so, what is the difference between a BIF and a UDF? What I can think of is:
import daft
from daft import col

@daft.func
def my_udf(x: int) -> str:  # return dtype is inferred from type hint
    return f"{x}"  # format the argument x, not the builtin input
df.with_column("y", my_udf(col("x")))
Take the built-in […]
df = daft.from_pydict({
    "json": [
        '{"a": 1, "b": 2}',
        '{"a": 3, "b": 4}',
    ],
})
df = df.with_column("a", df["json"].json.query(".a"))
If we abandon the concept of Expression and switch to BIF, will it evolve into the following usage style? (It is assumed that […])
df = df.with_column("a", json_query(df["json"], ".a"))
This example is mainly to show that for users, […]
Do we think calling these like normal functions would be confusing for customers, given that the DataFrame API accepts ColumnInputType in many places?
df.select("a").with_column("b", do_work("a"))  # !!! ERROR !!! do_work will be evaluated on string literal
Here "a" is a column reference in one context (select) but a string literal in another (do_work).
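To make the ambiguity concrete, here is a toy sketch in plain Python. It does not use Daft at all: `Col`, `Lit`, `with_column_arg`, and `function_arg` are hypothetical names invented for illustration, and the coercion rules are my assumption of the behavior described above, not Daft's actual implementation.

```python
from dataclasses import dataclass
from typing import Any, Union


@dataclass(frozen=True)
class Col:
    """Toy stand-in for a column reference (not Daft's real class)."""
    name: str


@dataclass(frozen=True)
class Lit:
    """Toy stand-in for a literal value (not Daft's real class)."""
    value: Any


def with_column_arg(arg: Union[str, Col, Lit]) -> Union[Col, Lit]:
    """DataFrame-style coercion: a bare string is read as a column name."""
    return Col(arg) if isinstance(arg, str) else arg


def function_arg(arg: Any) -> Union[Col, Lit]:
    """Function-style coercion: any non-expression value is read as a literal."""
    return arg if isinstance(arg, (Col, Lit)) else Lit(arg)


select_view = with_column_arg("a")      # the same string "a" becomes a column...
do_work_view = function_arg("a")        # ...or a string literal, depending on context
explicit_view = function_arg(Col("a"))  # an explicit reference is unambiguous
```

Under these assumed rules, the only way to get a column into the function-style call is to wrap it explicitly, which is the source of the surprise in the snippet above.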
Recently, we created a new daft.functions module which we hope to expand upon. The work here will serve two purposes:
[…] col("a").str.capitalize()) which users have found confusing
For each relevant method on daft.Expression or its namespaces, we will do the following:
[…] Expression. Keep the original but add a deprecation warning once the move is complete. We'll remove it in v0.6.
[…] daft.functions with the same name
[…] explode is only applicable to expressions)
[…] if_else, there may not need to be an expression method variant.
[…] Expr enum or as a FunctionExpr but should really be implemented as a ScalarFunction
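The "keep the original but add a deprecation warning" step could look roughly like the sketch below. This is a minimal illustration using only the standard library `warnings` module. `StrNamespace` and the free function `capitalize` are hypothetical stand-ins that operate on plain strings rather than columns, and the warning text is made up, so none of this is Daft's actual code.

```python
import warnings


def capitalize(value: str) -> str:
    """Hypothetical function-style entry point (stand-in for a daft.functions function)."""
    return value.capitalize()


class StrNamespace:
    """Toy stand-in for an expression namespace such as .str."""

    def __init__(self, value: str) -> None:
        self._value = value

    def capitalize(self) -> str:
        # The old spelling keeps working, warns, then delegates to the new function.
        warnings.warn(
            "str.capitalize() is deprecated; use capitalize() instead",
            DeprecationWarning,
            stacklevel=2,
        )
        return capitalize(self._value)


# Record warnings so the deprecation can be observed alongside the result.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    result = StrNamespace("hello").capitalize()
```

Delegating from the old method to the new function keeps a single implementation, so the two spellings cannot drift apart before the old one is removed.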
Tasks
str namespace
dt namespace
embedding namespace
float namespace
url namespace
list namespace
struct namespace
map namespace
image namespace
partitioning namespace
json namespace
binary namespace