Consider alternative job tracking / logging architecture #303

We've hit scaling issues when logging via Aurora Serverless on several occasions (https://github.com/NASA-IMPACT/hls_development/issues/232 and https://github.com/NASA-IMPACT/hls-orchestration/issues/301). Though some of this could be alleviated with improved database architecture and maintenance, it might be worth considering solutions that don't require a database at all, which would remove a central point of failure when processing at massive scale (as is likely during a reprocessing campaign).

The architecture we are currently using was designed more than 5 years ago, so it is worth revisiting and refactoring based on the lessons we've learned and new ideas.

In practice, much of the processing state tracking we currently do with a combination of Step Functions and Aurora Serverless could likely be accomplished with Step Functions and intermediate files written to S3 (with other processes checking for the presence of those files).
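To make the idea concrete, here is a minimal sketch of what "marker file" state tracking could look like. It is only an illustration, not a proposal for the final design: the `state/<job_id>/<step>.json` key layout, the function names, and the step names are all hypothetical, and `s3` is assumed to be a boto3 S3 client (for example `boto3.client("s3")`), of which only `put_object` and `list_objects_v2` are used.

```python
import json
from typing import Any


def marker_key(job_id: str, step: str) -> str:
    """Build the S3 key for a step's marker file (hypothetical layout)."""
    return f"state/{job_id}/{step}.json"


def mark_step_done(s3: Any, bucket: str, job_id: str, step: str, detail: dict) -> str:
    """Record that a step finished by writing a small JSON marker object."""
    key = marker_key(job_id, step)
    s3.put_object(Bucket=bucket, Key=key, Body=json.dumps(detail).encode("utf-8"))
    return key


def step_is_done(s3: Any, bucket: str, job_id: str, step: str) -> bool:
    """Check for the marker by listing with the full key as the prefix.

    The exact key sorts first among all keys sharing that prefix, so
    MaxKeys=1 is enough to detect it.
    """
    key = marker_key(job_id, step)
    resp = s3.list_objects_v2(Bucket=bucket, Prefix=key, MaxKeys=1)
    return any(obj["Key"] == key for obj in resp.get("Contents", []))
```

A downstream process (or a Step Functions task) would call `step_is_done` before starting work and `mark_step_done` when it finishes, so the only shared state is the set of objects in the bucket.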

With assistance from @ceholden and @chuckwondo, I'd like to draft some new architecture proposals that incorporate this concept and review them against the following questions:

1. Will we hit AWS S3 quota limits with this type of architecture?
2. What are the predicted costs of the potentially heavy S3 `PUT` and `GET` request volume this architecture might generate?
3. Should we build this as a completely new orchestration pipeline or just refactor our existing pipeline?
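For question 2, a back-of-the-envelope estimate could be computed along these lines. This is a rough sketch only: the function name is hypothetical, the per-1,000-request prices are parameters that would need to be filled in from the current S3 pricing page for our region, and the request counts and prices in the example are placeholders rather than measurements from our pipeline.

```python
def estimate_request_cost(
    puts: int,
    gets: int,
    put_price_per_1k: float,
    get_price_per_1k: float,
) -> float:
    """Estimate S3 request charges in dollars.

    S3 bills requests per 1,000, so each count is scaled by 1,000
    before applying its price. Storage and data transfer are ignored.
    """
    return puts / 1000 * put_price_per_1k + gets / 1000 * get_price_per_1k


# Placeholder inputs: one marker PUT and two existence checks per job
# for a made-up campaign size, with assumed (not verified) prices.
jobs = 1_000_000
cost = estimate_request_cost(
    puts=jobs,
    gets=2 * jobs,
    put_price_per_1k=0.005,   # assumption, check current pricing
    get_price_per_1k=0.0004,  # assumption, check current pricing
)
print(f"Estimated request cost: ${cost:.2f}")
```

Plugging in real job counts per campaign and the number of marker reads and writes per job from each proposal would let us compare the proposals directly.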
