Description
We've hit scaling issues with logging via Aurora Serverless on several occasions (https://github.com/NASA-IMPACT/hls_development/issues/232 and #301). Though some of this could be alleviated with improved database architecture and maintenance, it might be worth considering solutions that don't require a database at all, to remove a central point of failure when performing massive-scale processing (as will likely be the case during a reprocessing campaign).
The architecture we are using currently was designed more than 5 years ago, so it is definitely worth revisiting and refactoring based on lessons we've learned and new ideas.
In reality, a lot of the operations we currently do for processing state tracking through a combination of step functions and Aurora Serverless could likely be accomplished with a combination of step functions and writing intermediate files to S3 (and having other processes check for the presence of those files).
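To make the marker-file idea concrete, here is a minimal sketch, not a proposal for the actual design. Everything in it is an assumption for illustration: the `state/<granule_id>/<step>.done` key layout, the step names, and the `InMemoryStore` stand-in that lets the sketch run without AWS credentials. With boto3, `put` would presumably map to `put_object` and `exists` to a `head_object` call that treats a 404 as "not present".

```python
from typing import Dict, List, Optional, Protocol


class ObjectStore(Protocol):
    """Minimal interface we need from S3 (hypothetical, for illustration)."""

    def put(self, key: str, body: bytes) -> None: ...

    def exists(self, key: str) -> bool: ...


class InMemoryStore:
    """Stand-in for an S3 bucket so the sketch runs locally."""

    def __init__(self) -> None:
        self._objects: Dict[str, bytes] = {}

    def put(self, key: str, body: bytes) -> None:
        self._objects[key] = body

    def exists(self, key: str) -> bool:
        return key in self._objects


def marker_key(granule_id: str, step: str) -> str:
    # Assumed key layout: one empty marker object per (granule, step).
    return f"state/{granule_id}/{step}.done"


def mark_done(store: ObjectStore, granule_id: str, step: str) -> None:
    """Record that a step finished by writing an empty marker object."""
    store.put(marker_key(granule_id, step), b"")


def is_done(store: ObjectStore, granule_id: str, step: str) -> bool:
    """A step counts as done if its marker object is present."""
    return store.exists(marker_key(granule_id, step))


def next_step(store: ObjectStore, granule_id: str, steps: List[str]) -> Optional[str]:
    """Return the first step without a marker, or None if all are done."""
    for step in steps:
        if not is_done(store, granule_id, step):
            return step
    return None


# Hypothetical step names, not the real pipeline stages.
STEPS = ["download", "process", "publish"]
```

A step function task would call `mark_done` on success, and a downstream task (or a retry) would call `next_step` to decide where to resume, so no database row is needed to track progress.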
With assistance from @ceholden and @chuckwondo I'd like to draw up some new architecture proposals which incorporate this concept and review them against the following questions:
- Will we hit AWS S3 quota limits with this type of architecture?
- What will the predicted costs be for the potentially heavy S3 `PUT` and `GET` requests this architecture might generate?
- Should we build this as a completely new orchestration pipeline or just refactor our existing pipeline?
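
As a starting point for the first two questions, a back-of-the-envelope calculation could look like the sketch below. All numbers are assumptions to be checked against current AWS documentation and our real workload: the default prices ($0.005 per 1,000 `PUT` and $0.0004 per 1,000 `GET`, S3 Standard in us-east-1), the per-prefix request rates (3,500 `PUT` and 5,500 `GET` per second), and the granule and request counts in the example, which are purely hypothetical.

```python
import math


def request_cost_usd(
    puts: int,
    gets: int,
    put_price_per_1k: float = 0.005,   # assumed S3 Standard price, verify
    get_price_per_1k: float = 0.0004,  # assumed S3 Standard price, verify
) -> float:
    """Estimate request charges only (no storage or data transfer)."""
    return puts / 1000 * put_price_per_1k + gets / 1000 * get_price_per_1k


def min_prefixes(requests_per_second: float, per_prefix_limit: float) -> int:
    """Smallest number of key prefixes needed to stay under a per-prefix rate."""
    return max(1, math.ceil(requests_per_second / per_prefix_limit))


# Hypothetical campaign: 10 million granules, 5 marker writes and
# 20 marker checks per granule.
granules = 10_000_000
total_puts = granules * 5
total_gets = granules * 20

estimated_cost = request_cost_usd(total_puts, total_gets)

# Hypothetical peak of 10,000 marker writes per second against an assumed
# 3,500 PUT/s per-prefix limit.
prefixes_needed = min_prefixes(10_000, 3_500)
```

Under these assumed inputs the request charges come to roughly $330 and the peak write rate would need to be spread over at least 3 prefixes. The point of the sketch is the shape of the calculation; the real inputs would come from the architecture proposals.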