[Protocol Change Request] Geospatial types #4726

Open
wants to merge 2 commits into
base: master

Conversation

stefankandic
Contributor

Which Delta project/connector is this regarding?

  • Spark
  • Standalone
  • Flink
  • Kernel
  • Other (protocol RFC)

Description

Adds the protocol changes for the Geospatial types (see #4725) to the RFC folder.

How was this patch tested?

N/A

Does this PR introduce any user-facing changes?

N/A


For geospatial types, the most useful statistics for data skipping are **bounding boxes**. A bounding box is a minimal axis-aligned rectangle, parallelepiped, or hyper-parallelepiped, that fully contains all points of all geometries or geographies in a given set. It is typically represented by two points: the **lower-left** and **upper-right** corners. Each point must include **X** and **Y** coordinates, but may also optionally include **Z** (elevation) and **M** (measure) values.

Since a bounding box naturally defines spatial **minimum** and **maximum** values, we can leverage the existing min and max column statistics fields to store the points. The values are encoded in **[Well-Known Text (WKT)](https://libgeos.org/specifications/wkt/)** format. For example:
Collaborator

Suggested change
Since a bounding box naturally defines spatial **minimum** and **maximum** values, we can leverage the existing min and max column statistics fields to store the points. The values are encoded in **[Well-Known Text (WKT)](https://libgeos.org/specifications/wkt/)** format. For example:
Since a bounding box naturally defines spatial **minimum** and **maximum** values, these values are stored in the `minValues` and `maxValues` fields of the `stats` field in `add` and `remove` actions. The values are encoded in **[Well-Known Text (WKT)](https://libgeos.org/specifications/wkt/)** format. For example:

Since a bounding box naturally defines spatial **minimum** and **maximum** values, we can leverage the existing min and max column statistics fields to store the points. The values are encoded in **[Well-Known Text (WKT)](https://libgeos.org/specifications/wkt/)** format. For example:

```json
{
```
Collaborator

Can you also include the outer stats struct in the example?
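For readers following the thread, here is a rough sketch of what an example with the outer `stats` struct could look like. It is not the PR's actual example: the column name `location`, the file path, and all values are hypothetical. It assumes the usual Delta layout in which `stats` is a JSON-encoded string inside the `add` action, with the WKT corner points placed in `minValues` and `maxValues` as the suggestion above describes.

```python
import json

# Hypothetical per-file statistics for one geospatial column named "location".
# The corner points follow the WKT POINT form used elsewhere in this thread.
stats = {
    "numRecords": 3,
    "minValues": {"location": "POINT(-122.5 37.7)"},  # lower-left corner
    "maxValues": {"location": "POINT(-122.3 37.9)"},  # upper-right corner
    "nullCount": {"location": 0},
}

# Assumption: `stats` is serialized as a JSON string inside the action.
add_action = {
    "add": {
        "path": "part-00000.parquet",  # hypothetical file name
        "partitionValues": {},
        "size": 1024,
        "modificationTime": 0,
        "dataChange": True,
        "stats": json.dumps(stats),
    }
}

print(json.dumps(add_action, indent=2))
```

Decoding `add_action["add"]["stats"]` with `json.loads` gives back the struct with both corner points.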


### Per-file statistics

For geospatial types, the most useful statistics for data skipping are **bounding boxes**. A bounding box is a minimal axis-aligned rectangle, parallelepiped, or hyper-parallelepiped, that fully contains all points of all geometries or geographies in a given set. It is typically represented by two points: the **lower-left** and **upper-right** corners. Each point must include **X** and **Y** coordinates, but may also optionally include **Z** (elevation) and **M** (measure) values.
Collaborator

> It is typically represented by two points: the lower-left and upper-right corners. Each point must include X and Y coordinates, but may also optionally include Z (elevation) and M (measure) values.

We need to be more precise here; this doesn't actually specify the format that must be used.
E.g.:
A bounding box is represented by two points:

  • the lower-left corner, defined as the minimum value
  • the upper-right corner, defined as the maximum value

A point is represented by two floating-point values, encoded in Well-Known Text (WKT) format, e.g.: `POINT(-122.419 37.774)`
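To make the proposed wording concrete, here is a minimal sketch of how a writer might derive the two corner points for a set of 2D coordinates and encode them as WKT. The helper names `bounding_box` and `to_wkt_point` are hypothetical and not part of Delta. The sketch covers X and Y only; it ignores Z and M values and any special handling that geography types may need.

```python
from typing import Iterable, Tuple

Point = Tuple[float, float]


def bounding_box(points: Iterable[Point]) -> Tuple[Point, Point]:
    """Return (lower_left, upper_right) of the axis-aligned box around points."""
    pts = list(points)
    if not pts:
        raise ValueError("cannot compute a bounding box of an empty set")
    xs = [x for x, _ in pts]
    ys = [y for _, y in pts]
    # The minimum and maximum are taken per axis, so the corners need not
    # coincide with any input point.
    return (min(xs), min(ys)), (max(xs), max(ys))


def to_wkt_point(p: Point) -> str:
    """Encode an (x, y) pair as a WKT POINT string."""
    # repr() gives the shortest decimal string that round-trips the float.
    return f"POINT({p[0]!r} {p[1]!r})"


lower_left, upper_right = bounding_box(
    [(-122.419, 37.774), (-122.5, 37.9), (-122.3, 37.7)]
)
min_value = to_wkt_point(lower_left)   # "POINT(-122.5 37.7)"
max_value = to_wkt_point(upper_right)  # "POINT(-122.3 37.9)"
```

Neither corner here is one of the input points, which is why the proposal describes the minimum and maximum as corners of the box rather than as the smallest or largest geometry.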
