feat(engine): Batch trigger reloaded#2779
Merged
New batch trigger system with larger payloads, streaming ingestion, larger batch sizes, and a fair processing system.
This PR introduces a new FairQueue abstraction, inspired by our own RunQueue, that enables multi-tenant fair queueing with concurrency limits. The new BatchQueue is built on top of the FairQueue and handles processing batch triggers in a fair manner, with per-environment concurrency limits defined per org. Additionally, there is a global concurrency limit to prevent the BatchQueue system from creating too many runs too quickly, which can cause downstream issues.
For this new BatchQueue system we have a completely new batch trigger creation and ingestion system. Previously this was a single endpoint with a single JSON body that defined details about the batch as well as all the items in the batch.
We're introducing a two-phase batch trigger ingestion system. In the first phase, the BatchTaskRun record is created (and possibly rate limited). The second phase is another endpoint that accepts an NDJSON body with each line being a single item/run with payload and options.
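As a rough illustration of the second phase, here is a minimal sketch of how a client might serialize batch items into an NDJSON body and how a server could parse it back. The `BatchItem` shape and the helper names `toNdjson` and `parseNdjson` are hypothetical and are not taken from this PR; the actual item fields and the streaming parser in the PR may differ.

```typescript
// Hypothetical item shape for illustration; the real API fields may differ.
interface BatchItem {
  task: string;
  payload: unknown;
  options?: Record<string, unknown>;
}

// Serialize items as NDJSON: one JSON document per line, newline-terminated.
// JSON.stringify escapes newlines inside strings, so each item stays on one line.
function toNdjson(items: BatchItem[]): string {
  return items.map((item) => JSON.stringify(item) + "\n").join("");
}

// Parse an NDJSON body back into items, skipping blank lines.
function parseNdjson(body: string): BatchItem[] {
  return body
    .split("\n")
    .filter((line) => line.trim().length > 0)
    .map((line) => JSON.parse(line) as BatchItem);
}
```

Because each line is an independent JSON document, a server can validate and enqueue items one at a time instead of buffering the whole batch in memory, which is presumably what makes larger batches practical.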
At ingestion time all items are added to a queue, in order, and then processed by the BatchQueue system.
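The settings listed later in this description refer to a DRR (deficit round robin) scheduler with a quantum and a maximum deficit. The sketch below shows the general DRR idea under simplifying assumptions: every item costs one credit, and `maxDeficit` is treated as a cap on accumulated credits. The class and method names are hypothetical, and this is not the FairQueue implementation from the PR.

```typescript
interface QueuedItem {
  envId: string;
  id: string;
}

// Deficit round robin over per-environment FIFO queues (illustrative only).
class DrrScheduler {
  private readonly quantum: number;
  private readonly maxDeficit: number;
  private readonly queues = new Map<string, QueuedItem[]>();
  private readonly deficits = new Map<string, number>();

  constructor(quantum: number, maxDeficit: number) {
    this.quantum = quantum;
    this.maxDeficit = maxDeficit;
  }

  enqueue(item: QueuedItem): void {
    const queue = this.queues.get(item.envId) ?? [];
    queue.push(item);
    this.queues.set(item.envId, queue);
  }

  // One scheduling round: each environment with pending items receives
  // `quantum` credits (capped at `maxDeficit`, an assumed interpretation),
  // then dequeues items in order while it has credits left.
  nextRound(): QueuedItem[] {
    const selected: QueuedItem[] = [];
    for (const [envId, queue] of this.queues) {
      if (queue.length === 0) {
        this.deficits.set(envId, 0); // idle environments do not bank credits
        continue;
      }
      let deficit = Math.min(
        (this.deficits.get(envId) ?? 0) + this.quantum,
        this.maxDeficit
      );
      while (queue.length > 0 && deficit >= 1) {
        selected.push(queue.shift()!);
        deficit -= 1;
      }
      this.deficits.set(envId, queue.length === 0 ? 0 : deficit);
    }
    return selected;
  }
}
```

With a quantum of 2, an environment holding five items and another holding one would be served two items and one item in the first round, so the small batch is not stuck behind the large one.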
New batch trigger rate limits
This PR implements a new batch-trigger-specific rate limit, configured on the Organization.batchRateLimitConfig column, with defaults taken from these environment variables:
- BATCH_RATE_LIMIT_REFILL_RATE defaults to 10
- BATCH_RATE_LIMIT_REFILL_INTERVAL the duration interval, defaults to "10s"
- BATCH_RATE_LIMIT_MAX defaults to 1200
This rate limiter is scoped to the environment ID and controls how many runs can be submitted via batch triggers per interval. The SDK handles retrying when a request is rate limited.
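The refill rate, refill interval, and max settings suggest a token-bucket style limiter. The sketch below is an assumption about how those three values could interact, using the defaults quoted above; the `TokenBucket` class is hypothetical and in-memory, whereas the real limiter in the PR is per environment and may be implemented differently.

```typescript
// Illustrative token bucket: starts full, gains `refillRate` tokens every
// `refillIntervalMs`, and never holds more than `max` tokens.
class TokenBucket {
  private readonly refillRate: number;
  private readonly refillIntervalMs: number;
  private readonly max: number;
  private tokens: number;
  private lastRefill: number;

  constructor(refillRate: number, refillIntervalMs: number, max: number, now: number) {
    this.refillRate = refillRate;
    this.refillIntervalMs = refillIntervalMs;
    this.max = max;
    this.tokens = max;
    this.lastRefill = now;
  }

  // Try to take `count` tokens (one per run in the batch) at time `now` (ms).
  tryConsume(count: number, now: number): boolean {
    const intervals = Math.floor((now - this.lastRefill) / this.refillIntervalMs);
    if (intervals > 0) {
      this.tokens = Math.min(this.max, this.tokens + intervals * this.refillRate);
      this.lastRefill += intervals * this.refillIntervalMs;
    }
    if (count > this.tokens) return false;
    this.tokens -= count;
    return true;
  }
}

// Defaults from the description: refill 10 tokens every 10s, at most 1200.
const bucket = new TokenBucket(10, 10_000, 1200, 0);
```

Under this reading, an environment could submit a burst of up to 1200 runs at once, then sustain roughly 10 runs per 10 seconds until the bucket refills.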
Batch queue concurrency limits
The new column Organization.batchQueueConcurrencyConfig now defines an org-specific processingConcurrency value, with a fallback to the env var BATCH_CONCURRENCY_LIMIT_DEFAULT, which defaults to 10. This controls how many batch queue items are processed concurrently per environment.
There is also a global rate limit for the batch queue, set via BATCH_QUEUE_GLOBAL_RATE_LIMIT, which is disabled by default. If set, the entire batch queue system won't process more than BATCH_QUEUE_GLOBAL_RATE_LIMIT items per second. This allows controlling the maximum number of runs created per second via batch triggers.
Batch trigger settings
- STREAMING_BATCH_MAX_ITEMS controls the maximum number of items in a single batch
- STREAMING_BATCH_ITEM_MAXIMUM_SIZE controls the maximum size of each item in a batch
- BATCH_CONCURRENCY_DEFAULT_CONCURRENCY controls the default environment concurrency
- BATCH_QUEUE_DRR_QUANTUM how many credits each environment gets each round for the DRR scheduler
- BATCH_QUEUE_MAX_DEFICIT the maximum deficit for the DRR scheduler
- BATCH_QUEUE_CONSUMER_COUNT how many queue consumers to run
- BATCH_QUEUE_CONSUMER_INTERVAL_MS how frequently they poll for items in the queue
Configuration Recommendations by Use Case
High-throughput priority (fairness acceptable at 0.98+):
Strict fairness priority (throughput can be lower):
Todo