Dataset selection criteria #43

@jveitchmichaelis

@bw4sz, following your comment in the issue, we can think about which options to give users for filtering data. In my mind there are some obvious ones we've talked about:

  • Label provenance (similar to GBIF/iNat): was the annotation done by hand, or was it automated or semi-automated?
  • Validation: field, LiDAR, or none?
  • Real or synthetic?
  • Geographic coverage? We could expose a very coarse tag like terrestrial ecoregion ([0,13]), which is still privacy-preserving
  • Spatial scale? Again, we could expose GSD as metadata without breaking privacy
  • License? Are all datasets CC-BY, or do some use other licenses?
  • Tree coverage (overall %) and label coverage (we can estimate this with the Restor model)

We'd also need to define categories for some of these, though some seem straightforward. We also need to decide how we tag data; at the image level, I guess?
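
To make the tagging question a bit more concrete, here is a rough sketch of what image-level tags and a filter over them might look like. Everything in it is an assumption for discussion: the names `ImageTags`, `Provenance`, `Validation`, and `select_images`, the category values, and the field units are all hypothetical and not part of any existing code.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Iterable, List, Optional, Set


class Provenance(Enum):
    # Hypothetical categories for how labels were produced.
    MANUAL = "manual"
    SEMI_AUTOMATED = "semi_automated"
    AUTOMATED = "automated"


class Validation(Enum):
    # Hypothetical categories for how labels were validated.
    FIELD = "field"
    LIDAR = "lidar"
    NONE = "none"


@dataclass(frozen=True)
class ImageTags:
    """Image-level metadata; field names and units are illustrative only."""
    image_id: str
    provenance: Provenance
    validation: Validation
    synthetic: bool
    ecoregion: int          # coarse code in [0, 13], as suggested above
    gsd_m: float            # ground sample distance, assumed metres per pixel
    license: str
    tree_cover_pct: float
    label_cover_pct: float

    def __post_init__(self) -> None:
        if not 0 <= self.ecoregion <= 13:
            raise ValueError(f"ecoregion must be in [0, 13], got {self.ecoregion}")


def select_images(
    images: Iterable[ImageTags],
    *,
    provenance: Optional[Set[Provenance]] = None,
    validation: Optional[Set[Validation]] = None,
    allow_synthetic: bool = True,
    ecoregions: Optional[Set[int]] = None,
    max_gsd_m: Optional[float] = None,
    licenses: Optional[Set[str]] = None,
    min_label_cover_pct: float = 0.0,
) -> List[ImageTags]:
    """Keep images that pass every criterion; a criterion left as None is skipped."""
    selected = []
    for img in images:
        if provenance is not None and img.provenance not in provenance:
            continue
        if validation is not None and img.validation not in validation:
            continue
        if img.synthetic and not allow_synthetic:
            continue
        if ecoregions is not None and img.ecoregion not in ecoregions:
            continue
        if max_gsd_m is not None and img.gsd_m > max_gsd_m:
            continue
        if licenses is not None and img.license not in licenses:
            continue
        if img.label_cover_pct < min_label_cover_pct:
            continue
        selected.append(img)
    return selected


# Made-up example records, purely for illustration.
catalog = [
    ImageTags("a", Provenance.MANUAL, Validation.FIELD, False, 4, 0.1, "CC-BY-4.0", 40.0, 90.0),
    ImageTags("b", Provenance.AUTOMATED, Validation.NONE, True, 4, 0.1, "CC-BY-4.0", 55.0, 100.0),
    ImageTags("c", Provenance.MANUAL, Validation.LIDAR, False, 7, 0.5, "CC-BY-4.0", 20.0, 60.0),
]

selected = select_images(
    catalog,
    provenance={Provenance.MANUAL},
    allow_synthetic=False,
    max_gsd_m=0.2,
)
```

Treating an unset criterion as "no filter" keeps the default behaviour equal to using every dataset, so users only opt in to the restrictions they care about.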