
Lebovits/issu 1016 ndvi #1171


Open · wants to merge 4 commits into staging

Conversation

nlebovits (Collaborator) commented:

New Feature: Add NDVI and One-Year NDVI Change to ETL Pipeline

Description

This PR addresses issue #1016. It uses Planetary Computer to query Sentinel-2 satellite imagery for the most recent summer and the summer prior to that. It returns a raster with the most recent summer's NDVI at 10m resolution, plus the one-year change in NDVI from the prior summer, each as a separate band. It saves the raster as a file in the tmp directory (currently around 65MB), then uses exactextract to get the mean NDVI and mean NDVI change for each parcel in the primary feature layer dataset (all 500k+ properties in Philadelphia). The code checks whether the data for the most recent summer already exists in the tmp folder; if it does, it skips straight to extraction. The full pipeline without cached data runs in 10-15 minutes; with cached data, it takes about 3 minutes. I've also added unit tests in test/test_ndvi.py.
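For reviewers skimming the description, here is a minimal sketch of the workflow it describes, assuming the standard Planetary Computer STAC endpoint and odc-stac/rioxarray loading. All function and variable names here are illustrative, not the PR's actual implementation:

```python
import odc.stac
import planetary_computer
import pystac_client
import rioxarray  # noqa: F401  (registers the .rio accessor used below)
import xarray as xr

STAC_URL = "https://planetarycomputer.microsoft.com/api/stac/v1"


def build_ndvi_raster(bbox, year, out_path="/tmp/ndvi.tif"):
    """Write a two-band raster: band 1 = latest summer NDVI, band 2 = one-year change."""
    catalog = pystac_client.Client.open(
        STAC_URL, modifier=planetary_computer.sign_inplace
    )

    def summer_ndvi(y):
        items = catalog.search(
            collections=["sentinel-2-l2a"],
            bbox=bbox,  # (minx, miny, maxx, maxy) in lon/lat
            datetime=f"{y}-06-01/{y}-08-31",
            query={"eo:cloud_cover": {"lt": 10}},
        ).item_collection()
        ds = odc.stac.stac_load(
            items, bands=["B04", "B08"], bbox=bbox, resolution=10, chunks={}
        ).astype("float32")
        ds = ds.median(dim="time")  # cloud-reduced summer composite
        # NDVI = (NIR - red) / (NIR + red)
        return (ds["B08"] - ds["B04"]) / (ds["B08"] + ds["B04"])

    current = summer_ndvi(year)
    change = current - summer_ndvi(year - 1)

    stacked = xr.concat([current, change], dim="band").assign_coords(band=[1, 2])
    stacked.rio.to_raster(out_path)
    return out_path
```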

nlebovits requested a review from adamzev on April 13, 2025
vercel bot commented Apr 13, 2025

vacant-lots-proj: ✅ Ready (preview updated Apr 14, 2025, 0:03am UTC)

adamzev (Contributor) left a comment:

I like the number and small size of the helper classes. With our services, I'm planning to move to a pattern where the service takes the feature layer but then calls transform and merge functions that rely only on gdfs, so they can be tested without worrying about the feature layer class.

I added some suggestions to reduce how much we need to mock, and I also had a concern about whether the row length of results could ever be different from that of primary_featurelayer.gdf.

This should be some very interesting and helpful functionality!
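As a rough illustration of that pattern (all names here are hypothetical placeholders, not code from this PR), the service would be a thin wrapper over gdf-only functions:

```python
import geopandas as gpd


def transform_ndvi(gdf: gpd.GeoDataFrame) -> gpd.GeoDataFrame:
    """Pure gdf-in/gdf-out transform; easy to unit test with a small GeoDataFrame."""
    ...  # placeholder body


def merge_ndvi(gdf: gpd.GeoDataFrame, results: gpd.GeoDataFrame) -> gpd.GeoDataFrame:
    """Pure gdf-in/gdf-out merge; also testable without the FeatureLayer class."""
    ...  # placeholder body


def ndvi_service(primary_featurelayer):
    """Thin service wrapper: unwrap the feature layer, delegate to the pure functions."""
    gdf = primary_featurelayer.gdf
    primary_featurelayer.gdf = merge_ndvi(gdf, transform_ndvi(gdf))
    return primary_featurelayer
```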

from ..constants.services import CENSUS_BGS_URL


def get_current_summer_year() -> int:
adamzev (Contributor) commented:

This may produce surprising behavior around the end of May/early June if we are trying to rerun a pipeline or recreate old results. We may want to add an optional date as a parameter so we can control this when needed. That would also eliminate the need for freezegun in the tests.

I haven't looked yet at how this is used elsewhere, but is there a lag between when it becomes summer and when data is available for that summer that we need to worry about?
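A minimal sketch of that suggestion, assuming a June cutoff for "summer" (the cutoff and default behavior here are illustrative, not taken from the PR):

```python
from datetime import date
from typing import Optional


def get_current_summer_year(today: Optional[date] = None) -> int:
    """Return the year of the most recent summer relative to `today`.

    Pass a fixed date in tests or when recreating historical results,
    which removes the need for freezegun.
    """
    today = today or date.today()
    # Assumed cutoff: before June, treat last year's summer as the most recent one.
    return today.year if today.month >= 6 else today.year - 1
```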

return today.year


def get_bbox_from_census_data() -> Tuple[Polygon, Tuple[float, float, float, float]]:
adamzev (Contributor) commented:

To make this easier to test, we may want this function to take an optional gdf rather than always reading from CENSUS_BGS_URL.
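One way that could look (a sketch only; the return values are assumed from the type hint above, and the bbox-polygon construction is illustrative):

```python
from typing import Optional, Tuple

import geopandas as gpd
from shapely.geometry import Polygon

from ..constants.services import CENSUS_BGS_URL


def get_bbox_from_census_data(
    gdf: Optional[gpd.GeoDataFrame] = None,
) -> Tuple[Polygon, Tuple[float, float, float, float]]:
    """Return a bounding polygon and (minx, miny, maxx, maxy) for the census block groups.

    Tests can pass a small in-memory gdf instead of patching geopandas.read_file.
    """
    if gdf is None:
        gdf = gpd.read_file(CENSUS_BGS_URL)
    minx, miny, maxx, maxy = gdf.total_bounds
    bounds = (float(minx), float(miny), float(maxx), float(maxy))
    return Polygon.from_bounds(*bounds), bounds
```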

self.assertFalse(is_cache_valid(wrong_year_path, 2025))
self.assertTrue(is_cache_valid(self.test_raster_path, 2025))

@patch("geopandas.read_file")
adamzev (Contributor) commented:

See if the patches/mocks can be removed after allowing an optional df to be passed to this function.

@patch("new_etl.data_utils.ndvi.odc.stac.stac_load")
@patch("new_etl.data_utils.ndvi.get_current_summer_year")
@patch("new_etl.data_utils.ndvi.pystac_client.Client.open")
def test_generate_ndvi_data_failure(
adamzev (Contributor) commented:

Unless you are actually concerned the correct exception won't be hit, I'd just remove this test.


@patch("tempfile.gettempdir")
@patch("new_etl.data_utils.ndvi.get_current_summer_year")
@patch("new_etl.data_utils.ndvi.get_bbox_from_census_data")
adamzev (Contributor) commented:

It is hard to reason about tests when so much is patched. See if we can eliminate at least two of the patches based on my suggestions.
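For example, assuming the parameterized helpers sketched earlier in this thread (an optional `today` date and an optional `gdf`), a test like this would need neither freezegun nor the geopandas.read_file patch; the names and expected values below are hypothetical:

```python
import unittest
from datetime import date

import geopandas as gpd
from shapely.geometry import box

from new_etl.data_utils.ndvi import get_bbox_from_census_data, get_current_summer_year


class TestNdviHelpersWithoutPatching(unittest.TestCase):
    def test_current_summer_year_with_fixed_date(self):
        # A fixed date replaces freezegun (expected value assumes the June cutoff above).
        self.assertEqual(get_current_summer_year(today=date(2025, 4, 13)), 2024)

    def test_bbox_with_injected_gdf(self):
        # A small in-memory gdf replaces the geopandas.read_file patch.
        bgs_gdf = gpd.GeoDataFrame(
            geometry=[box(-75.3, 39.9, -74.9, 40.1)], crs="EPSG:4326"
        )
        polygon, bbox = get_bbox_from_census_data(gdf=bgs_gdf)
        self.assertEqual(polygon.bounds, bbox)
```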


# Update the primary feature layer with the new columns from results
# (instead of creating a new one)
for column in results.columns:
adamzev (Contributor) commented:

Given that results is computed on a cleaned version of the gdf and we're adding columns into a version that hasn't been cleaned, is there a possibility of misalignment or different row lengths here when merging in the columns?

It is probably worth making this logic a separate function, trying out some filtering on a df, and making sure things merge okay.
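A sketch of what that separate function might look like (names are hypothetical): join on the index rather than assigning columns positionally, and surface any row-count mismatch explicitly:

```python
import warnings

import geopandas as gpd
import pandas as pd


def merge_result_columns(
    gdf: gpd.GeoDataFrame, results: pd.DataFrame, columns: list[str]
) -> gpd.GeoDataFrame:
    """Left-join selected result columns back onto the original gdf by index."""
    if len(results) != len(gdf):
        # results came from a cleaned copy, so a mismatch is possible; make it visible.
        warnings.warn(
            f"results has {len(results)} rows but gdf has {len(gdf)}; joining on index"
        )
    merged = gdf.join(results[columns], how="left")
    assert len(merged) == len(gdf), "join unexpectedly changed the row count"
    return merged
```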


This PR has been marked as stale because it has been open for 7 days with no activity.

github-actions bot added the stale label on Apr 21, 2025