feat: support count(1) in dataframe #4977

huleilei · 2025-08-14T08:52:00Z

Changes Made

Not count(1) in dataframe is not supported:

In [49]: df.count(1).show()
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
Cell In[49], line 1
----> 1 df.count(1).show()

    [... skipping hidden 1 frame]

File /data00/code/tmp2/Daft/daft/dataframe/dataframe.py:3011, in DataFrame.count(self, *cols)
   3008     raise ValueError("Cannot call count() with both * and column names")
   3010 # Otherwise, perform a column-wise count on the specified columns
-> 3011 return self._apply_agg_fn(Expression.count, cols)

File /data00/code/tmp2/Daft/daft/dataframe/dataframe.py:2854, in DataFrame._apply_agg_fn(self, fn, cols, group_by)
   2852     groupby_name_set = set() if group_by is None else group_by.to_name_set()
   2853     cols = tuple(c for c in self.column_names if c not in groupby_name_set)
-> 2854 exprs = self._wildcard_inputs_to_expressions(cols)
   2855 return self._agg([fn(c) for c in exprs], group_by)

File /data00/code/tmp2/Daft/daft/dataframe/dataframe.py:1596, in DataFrame._wildcard_inputs_to_expressions(self, columns)
   1594 """Handles wildcard argument column inputs."""
   1595 column_input: Iterable[ColumnInputType] = columns[0] if len(columns) == 1 else columns  # type: ignore
-> 1596 return column_inputs_to_expressions(column_input)

File /data00/code/tmp2/Daft/daft/utils.py:126, in column_inputs_to_expressions(columns)
    123 from daft.expressions import col
    125 column_iter: Iterable[ColumnInputType] = [columns] if is_column_input(columns) else columns  # type: ignore
--> 126 return [col(c) if isinstance(c, str) else c for c in column_iter]

TypeError: 'int' object is not iterable

Using this pr:

In [5]: df.count(1).show()
╭────────╮
│ count  │
│ ---    │
│ UInt64 │
╞════════╡
│ 4      │
╰────────╯

(Showing first 1 of 1 rows)

Related Issues

Checklist

Documented in API Docs (if applicable)
Documented in User Guide (if applicable)
If adding a new documentation page, doc is added to docs/mkdocs.yml navigation
Documentation builds and is formatted properly (tag @/ccmao1130 for docs review)

greptile-apps

Greptile Summary

This PR adds support for SQL-compatible count(1) syntax to Daft DataFrames. The change extends the existing count method to recognize integer arguments as equivalent to count() or count('*'), performing a total row count operation rather than counting non-null values in specific columns.

The implementation modifies the count method in daft/dataframe/dataframe.py by adding a new condition or (len(cols) == 1 and isinstance(cols[0], int)) to the existing special-case handling for empty arguments and asterisk strings. This allows integer literals like 1 to be treated the same way as the wildcard * syntax.

In SQL databases, COUNT(1), COUNT(*), and COUNT() are functionally equivalent operations that count all rows regardless of null values, which differs from COUNT(column_name) that only counts non-null values in specific columns. This change brings Daft's DataFrame API closer to standard SQL behavior, making it more intuitive for users familiar with SQL conventions.

The change integrates seamlessly with Daft's existing expression system, where the integer literal gets processed through the same aggregation pipeline as other count operations. A corresponding test case in tests/dataframe/test_count.py validates that df.count(1) correctly returns the total row count.

Confidence score: 4/5

This PR is safe to merge with minimal risk as it's a straightforward feature addition with clear SQL semantics
Score reflects simple logic changes that extend existing functionality without breaking backward compatibility
Pay close attention to daft/dataframe/dataframe.py to ensure the new condition integrates properly with existing count logic

_{2 files reviewed, 1 comment}

_{Edit Code Review Bot Settings | Greptile}

daft/dataframe/dataframe.py

Jay-ju · 2025-08-14T08:55:20Z

daft/dataframe/dataframe.py

+        if (
+            len(cols) == 0
+            or (len(cols) == 1 and isinstance(cols[0], str) and cols[0] == "*")
+            or (len(cols) == 1 and isinstance(cols[0], int))


Just out of curiosity, does count(1) have to select a specific column, or is it also selected according to the optimizer?

From the following code, it can be seen that the first field is taken

def count(self) -> LogicalPlanBuilder: # TODO(Clark): Add dedicated logical/physical ops when introducing metadata-based count optimizations. first_col = col(self.schema().column_names()[0]) builder = self._builder.aggregate([first_col.count(CountMode.All)._expr], []) builder = builder.select([first_col.alias("count")._expr]) return LogicalPlanBuilder(builder)

@huleilei Yes, so is it possible to make an optimization here? It seems more appropriate to select a column with the min size and read the count. WDYT cc @rchowell

Yes, I agree. The method of selecting fields needs to be optimized.

In SQL, COUNT(<lit>) will return the total number of rows regardless of the <lit> value. I would translate any literal value to COUNT(*) and this PR would be complete.

I agree that choosing the column with the smallest "size" makes sense, like choose an int32 column instead of some binary. I'm not sure this is really necessary or does actually improve perf? Maybe worth testing.

Schema (PostgreSQL v17)

CREATE TABLE T (x INT); INSERT INTO T VALUES (1), (2), (3);

Query #1

SELECT COUNT(1.0), COUNT('xyz'), COUNT(-1), COUNT(false) FROM T;

count count count count

3 3 3 3

View on DB Fiddle

codecov · 2025-08-14T09:40:02Z

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 73.92%. Comparing base (23549cd) to head (711a89d).
⚠️ Report is 2 commits behind head on main.

Additional details and impacted files

@@            Coverage Diff             @@
##             main    #4977      +/-   ##
==========================================
- Coverage   76.46%   73.92%   -2.55%     
==========================================
  Files         917      916       -1     
  Lines      127012   130215    +3203     
==========================================
- Hits        97124    96263     -861     
- Misses      29888    33952    +4064

Files with missing lines	Coverage Δ
daft/dataframe/dataframe.py	`86.75% <100.00%> (ø)`

... and 77 files with indirect coverage changes

🚀 New features to boost your workflow:

❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

feat: support count(1) in dataframe

ec058ce

github-actions bot added the feat label Aug 14, 2025

huleilei marked this pull request as draft August 14, 2025 08:52

greptile-apps bot reviewed Aug 14, 2025

View reviewed changes

daft/dataframe/dataframe.py Show resolved Hide resolved

Jay-ju reviewed Aug 14, 2025

View reviewed changes

Merge branch 'main' into count_1

711a89d

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

feat: support count(1) in dataframe #4977

feat: support count(1) in dataframe #4977

huleilei commented Aug 14, 2025 •

edited

Loading

Uh oh!

greptile-apps bot left a comment

Uh oh!

Uh oh!

Jay-ju Aug 14, 2025

Uh oh!

huleilei Aug 14, 2025

Uh oh!

Jay-ju Aug 15, 2025

Uh oh!

huleilei Aug 15, 2025

Uh oh!

rchowell Aug 15, 2025

Uh oh!

codecov bot commented Aug 14, 2025 •

edited

Loading

Uh oh!

Uh oh!

feat: support count(1) in dataframe #4977

Are you sure you want to change the base?

feat: support count(1) in dataframe #4977

Conversation

huleilei commented Aug 14, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Changes Made

Related Issues

Checklist

Uh oh!

greptile-apps bot left a comment

Choose a reason for hiding this comment

Greptile Summary

Confidence score: 4/5

Uh oh!

Uh oh!

Jay-ju Aug 14, 2025

Choose a reason for hiding this comment

Uh oh!

huleilei Aug 14, 2025

Choose a reason for hiding this comment

Uh oh!

Jay-ju Aug 15, 2025

Choose a reason for hiding this comment

Uh oh!

huleilei Aug 15, 2025

Choose a reason for hiding this comment

Uh oh!

rchowell Aug 15, 2025

Choose a reason for hiding this comment

Uh oh!

codecov bot commented Aug 14, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Codecov Report

Uh oh!

Uh oh!

huleilei commented Aug 14, 2025 •

edited

Loading

codecov bot commented Aug 14, 2025 •

edited

Loading