Skip to content

Consider alternative job tracking / logging architecture #303

@sharkinsspatial

Description

@sharkinsspatial

We've hit scaling issues with logging via Aurora Serverless on several occasions https://github.com/NASA-IMPACT/hls_development/issues/232 and #301. Though some of this could be alleviated with improved database architecture and maintenance it might be worth considering solutions that don't require any database to reduce a central point of failure when performing massive scale processing (as will be likely during a reprocessing campaign).

The architecture we are using currently was designed more than 5 years ago so it is definitely worth revisiting and refactoring based on lessons we've learned and new ideas.

In reality, a lot of the operations we currently do for processing state tracking through a combination of step functions and Aurora Serverless could likely be accomplished with a combination of step functions and writing intermediate files to S3 (and having other processes check for the presence of those files).

With assistance from @ceholden and @chuckwondo I'd like to draw some new architecture proposals which incorporate this concept and review them for the following questions

  1. Will we hit AWS S3 quota limits with this type of architecture?
  2. What will the predicted costs for the potentially heavy S3 PUT and GET requests this architecture might generate?
  3. Should we build this as a completely new orchestration pipeline or just refactor our existing pipeline?

Metadata

Metadata

Labels

enhancementNew feature or request

Type

No type

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests

Issue actions