feat!: use actor.json as source of truth to support single actor and broader monorepo structures#96
feat!: use actor.json as source of truth to support single actor and broader monorepo structures#96ruocco-l wants to merge 29 commits into
Conversation
…eric builder token
metalwarrior665
left a comment
There was a problem hiding this comment.
I think this is good and obviously needed change. It won't be that big of a hassle for other teams since adding new Actors is quite rare. We need to polish this though.
| '.actor/', | ||
| // In root .actor/ mode, .actor/ changes must trigger builds | ||
| ...(isSingleActorRepo ? [] : ['.actor/']), |
There was a problem hiding this comment.
We can remove this completely, there is no reason we should commit changes to top level .actor in multiactor
|
|
||
| let cachedBuilderUsername: string | undefined; | ||
|
|
||
| export const resolveBuilderTokenUsername = async (): Promise<string> => { |
There was a problem hiding this comment.
What is the use-case to need this? Just so you don't have to think about the username?
| }; | ||
|
|
||
| const readActorName = async (actorJsonPath: string): Promise<string> => { | ||
| const actorJson: { name?: string } = JSON.parse(await fs.readFile(actorJsonPath, 'utf-8')); |
There was a problem hiding this comment.
I would rather introduce a new JSON file for our own custom needs than stitch non-spec fields to actor.json.
There was a problem hiding this comment.
that would also enable us to cover miniactors, monorepos or single actor stuff without doing the
user-name_actor-name convention.
It could all be config stuff
| actorName: fullName, | ||
| folder: actorDir, | ||
| isStandalone: folderType === 'standalone-actors', | ||
| tokenEnvVar, |
There was a problem hiding this comment.
I'm not a big fan of passing the token around because it is more likely to leak, we should be able to just resolve it in place like we did no?
There was a problem hiding this comment.
This doesn't pass the token, just the name of the env variable where it's stored
|
We should also sync with @gullmar about other repos. The monorepo where each Actor has its own src is also probably not ideally supported. |
| const resolveOwner = async (folderName: string): Promise<string> => { | ||
| const ownerMatch = folderName.match(/^(.+)_[^_]+$/); | ||
| if (ownerMatch) return ownerMatch[1]; | ||
| return resolveBuilderTokenUsername(); | ||
| }; |
There was a problem hiding this comment.
So the owner still has to be defined in the folder name 😅 That kind of defeats the purpose of this entire effort 🫠
I would either want the conventionally named folder names only (owner + name in folder name), or to have both the owner and actor name in actor.json, but not both the conventionally named folder names for owners and actor.json for names at once 🙏
There was a problem hiding this comment.
No. IF the owner is there we automatically assign and look for the appropriate token. If not we fallback to whatever is the owner of the BUILDER token. This is still useful if you have a default account where there are all the miniactors (for which is ok to fallback the BUILDER token) and have one miniactor that is build under a different account (for whatever reason), in which case you will specify it with the folder name owner_whatever-actor-name-since-it's-not-used.
But I agree that just adding the owner to the actor.json is much cleaner
There was a problem hiding this comment.
I agree that for most repos, the single BUILDER token would work fine, since we usually have everything under one account. So, for the most part, this would only have to cover edge cases. Even then it seems like there should be a better way to define the edge case then by folder name - be it actor.json or a global config file 🤔
| }; | ||
|
|
||
| const readActorName = async (actorJsonPath: string): Promise<string> => { | ||
| const actorJson: { name?: string } = JSON.parse(await fs.readFile(actorJsonPath, 'utf-8')); |
|
Alright alright, let's do the config |
|
One more related long-term thing. When we designed this library, the core idea was "convention over configuration". We wanted everything to be at hardcoded places since that is always better than having to config and it worked well for that use-case. Now if we want many (or arbitrary) different setups, we cannot keep patching it like this because we would have to keep adding On the other hand, the library has to remain somewhat opinionated, like what file changes trigger build etc, otherwise it has basically no value, it would just be some configs + few API calls. I think for now, this PR is ok because top-level .actor is the template thing, but there probably already are things in the "changed Actors" that break. But for anything more, I would wait for th full rewrite. That should also make it truly open-source usable. |
metalwarrior665
left a comment
There was a problem hiding this comment.
I gave it another thought and I think it wouldn't be too hard to go all the way for really configurable config. Otherwise, we would have to potentially do another breaking change.
We should support (let's look of we can find more) at least 3 reference repos:
- typical Store monorepo
- template-like repo (e.g. WCC)
- monorepo with packages and actors importing them - e-commerce
I'm thinking it could work like this:
- Each Actor config points to the folder where .actor is (like now) or directly to the .actor folder.
- Then from actor.json, we read
dockerContextDirand that contains all folders/files that can influence the changes for that Actor. We can allow config to haveoverrideActorContextif they want to e.g. narrow it down. - Then instead of the current logic that if "code changes" we add mark all Actors as changed, we just do the changed files logic for each mini Actor independently, that cleans up the logic as well.
This way, we handle current Store monorepo (Actor folder + top-level context), template (just top-level), e-commerce (actor folder with its code, packages, shared)
What do you think?
| { | ||
| "actors": [ | ||
| { | ||
| "folder": "actors/web-scraper", |
There was a problem hiding this comment.
I would add actorName too rather than deriving it from the folder
There was a problem hiding this comment.
I thought about it, but I think can cause some confusion when you (usually when you publish the actor) play a little with the naming for SEO reasons (or similar). I think whatever is in the actor.json should be respected as the truth and not be overridden by some obscure logic from the testing package.
There was a problem hiding this comment.
The testing lib will not set or change the name, it just check that it exists. The name.in actor.json is not a single source of truth so I would not use it, better to have all here
| }); | ||
| }, | ||
| ) | ||
| .command( |
There was a problem hiding this comment.
I would remove this, Claude will one-shot this and you will want to manually check it anyway.
There was a problem hiding this comment.
I think this sort of CLI initialization utility perfectly fits into a library like here 🤔
However unless we make this a completely seamless end to end migration command, then I'm also against it. Right now as far as I can tell, it doesn't really migrate the ENV vars. I will open up a separate comment for that.
There was a problem hiding this comment.
| if (!Array.isArray(config.actors)) { | ||
| throw new Error(`Config file "${CONFIG_FILE_NAME}" must have an "actors" array at the top level.`); | ||
| } |
There was a problem hiding this comment.
I would recommend using Zod for parsing the JSON config 👀
| }); | ||
| }, | ||
| ) | ||
| .command( |
There was a problem hiding this comment.
I think this sort of CLI initialization utility perfectly fits into a library like here 🤔
However unless we make this a completely seamless end to end migration command, then I'm also against it. Right now as far as I can tell, it doesn't really migrate the ENV vars. I will open up a separate comment for that.
| }); | ||
| }, | ||
| ) | ||
| .command( |
There was a problem hiding this comment.
metalwarrior665
left a comment
There was a problem hiding this comment.
I think we are going the right way, I have quite a few comments, the one with pre-filtering files has the biggest refactoring potential.
I don't think we are in any hurry so let's give this the time needed :)
| | { impact: 'cosmetic'; semanticallyVerified: boolean } | ||
| | { impact: 'functional' }; | ||
|
|
||
| const isFileInContext = (lowercaseFilePath: string, actor: ActorConfig): boolean => { |
There was a problem hiding this comment.
Not sure why here it called it just actor :)
| const isFileInContext = (lowercaseFilePath: string, actor: ActorConfig): boolean => { | |
| const isFileInContext = (lowercaseFilePath: string, actorConfig: ActorConfig): boolean => { |
| | { impact: 'functional' }; | ||
|
|
||
| const isFileInContext = (lowercaseFilePath: string, actor: ActorConfig): boolean => { | ||
| if (actor.overrideActorContext) { |
There was a problem hiding this comment.
I would resolve overrideActorContext at the config parsing so we carry over just the context path array, this function doesn't need to know if it is overriden or docker context
|
|
||
| if (lowercaseFilePath.endsWith('changelog.md')) { | ||
| return { impact: 'cosmetic', semanticallyVerified: false, includes: 'all-actors' }; | ||
| if (!isFileInContext(lowercaseFilePath, actor)) { |
There was a problem hiding this comment.
| if (!isFileInContext(lowercaseFilePath, actor)) { | |
| if (!isFileInActorContext(lowercaseFilePath, actor)) { |
| /** | ||
| * Also works for folders | ||
| */ | ||
| const isIgnoredTopLevelFile = (lowercaseFilePath: string) => { |
There was a problem hiding this comment.
This function needs to be rewritten so the top level files are vs actor context so it works for standalone Actors too. Basically all the checks should be context aware. I think what we can simply do is to first "resolve context" by filtering the changed file paths from git and stripping the context path from them so they are "hoisted" to top level. E.g. if you have standalone-actors/my-actor/readme.md and context is standalone-actors/my-actor at the point we get changed file paths, we just strip this to just readme.md as it is relative to context for that Actor.
|
|
||
| return IGNORED_TOP_LEVEL_FILES.some((ignoredFile) => sanitizedLowercaseFilePath.startsWith(ignoredFile)); | ||
| // Strip deprecated code/ and shared/ prefixes — repos like apify-store/amazon use these | ||
| const sanitized = lowercaseFilePath.replace(/^code\//, '').replace(/^shared\//, ''); |
There was a problem hiding this comment.
Follow-up to my previous comment, we can remove this hardcode and the repo owner will just have to add code and shared to context of all Actors
| * Check if a file falls inside another actor's folder. | ||
| * Root actors (folder === "") never exclude files from siblings. | ||
| */ | ||
| const isExcludedBySibling = (lowercaseFilePath: string, actor: ActorConfig, allActors: ActorConfig[]): boolean => { |
There was a problem hiding this comment.
I think this can be simplified to just remove all .actor changes that aren't this Actor's .actor, we don't need to know what other Actors exist.
Theoretically, we could refactor to use the context to generate a list of file/directory paths for each Actor that would already exclude the sibling .actor and pass those resolved paths around instead of the context. This would allow us to do some filtering even before we know about changed files so the logic would be split into 2 stages, simplifying tests etc. We can prefilter ignored files, siblings, docker etc. Not a hard requirement but you can ask Claude what it thinks :)
There was a problem hiding this comment.
I didn't do it, it feels like a small improvement and I personally like the flow, it's simple enough
| * across all actors: functional > cosmetic > ignored. Files that are outside-context for every | ||
| * actor are treated as ignored. | ||
| */ | ||
| const updateFileImpact = ( |
There was a problem hiding this comment.
I would change the logging, remove the impact priority (it's weird to override it like this) and forget the repo-wide approach and do it more in line with the Actor-centric approach
- Group all changes that are same across Actors. Most changes will be shared by all.
- Log changes for all groups separately. Something like "shared changes for Actors instagram-scraper, instagram-post-scraper: file1, file2", then "Changes specific to instagram-scraper: actors/instagram-scraper/.actor/input_schema.json" etc,
There was a problem hiding this comment.
6980b49 much better imo, thank you for the input
| ); | ||
| } | ||
|
|
||
| const actorDotDir = folder ? `${folder}/.actor` : '.actor'; |
There was a problem hiding this comment.
I would just make folder required so it is explicit
| ); | ||
| } | ||
|
|
||
| const actorDotDir = folder ? `${folder}/.actor` : '.actor'; |
There was a problem hiding this comment.
We should validate that it has actor.json here, that will also catch some typos
There was a problem hiding this comment.
This is basically already there, the code tries to parse the json and if it can't it throws
| actorName: `${owner}/${actorName}`, | ||
| folder: actorDir, | ||
| isStandalone: folderType === 'standalone-actors', | ||
| actorName: entry.actorName, |
There was a problem hiding this comment.
| actorName: entry.actorName, | |
| actorFullName: entry.actorName, |
There was a problem hiding this comment.
This is actually something that I have been struggling since the start of this PR. The problem with this is that there is some incosistency with what is actorName and what is actorId (in both Apify and this project). I took the liberty of doing one single commit where I try to unify the names so we now have actorFullName for the human-readable name, actorRawId for the platform's ID, and actorId as the Apify conventional id (both raw id and full name ).
There was a problem hiding this comment.
I understand that this can be a confusing change, so maybe we can export it in a seprate pr, or just revert it and leave it as is
| return readConfigFile(); | ||
| }; | ||
|
|
||
| const CONFIG_FILE_NAME = '.test-tools-actors-config.json'; |
There was a problem hiding this comment.
I'd call this apify-test-tools.json or apify-test-tools.config.json so it's clear it's for apify-test-tools
Closes #84
The CI tooling previously derived actor names and ownership entirely from folder naming conventions (
owner_actor-name), which was fragile — usernames with multiple underscores caused mismatches, and it was impossible to have a folder name that didn't encode the owner. This PR makes three interconnected changes to fix that:actor.jsonis now the source of truth for actor names. Each actor folder must contain.actor/actor.jsonwith a"name"field. The old convention of reverse-engineering the actor name from the folder name (splitting on the last underscore) is gone.maybeParseActorFoldernow returns the folder path directly (e.g.actors/shopify) instead of attempting to reconstructowner/actor-namefrom the folder. This eliminates the class of bugs where folder names didn't round-trip cleanly to actor names.getRepoActorsnow resolves which env var holds the correct token for each actor (APIFY_TOKEN_<OWNER>orBUILDER_APIFY_TOKENas fallback) and validates the actor exists on the platform. The resolvedtokenEnvVaris carried inActorConfigand used byApifyBuilder.fromActorConfig— replacing the oldfromActorNamewhich re-derived the token independently. This catches misconfigured folder names or missing actors before any build/delete work starts, with a clear error message.Additionally:
actors/orstandalone-actors/directories can now be built if they have a root.actor/actor.json. The owner is resolved fromBUILDER_APIFY_TOKEN. In this mode,.actor/changes correctly trigger builds (they're ignored in multi-actor repos where.actor/is just forapify pushCLI).BUILDER_APIFY_TOKENfallback. Both token resolution andApifyBuilderfall back toBUILDER_APIFY_TOKENwhen the per-owner env var is missing, instead of failing immediately.Breaking change
Actor folders must now contain
.actor/actor.jsonwith a"name"field. Teams already using this tooling will need to add this file to every actor folder. The folder name is still used to resolve the owner (everything before the last underscore), but the actor name itself comes exclusively fromactor.json.Note oncirc_leactorscirc_lematching now depends onactor.json's"name"field matching what comes afterapify-managed---in thecirc_leaccount. As long as the"name"inactor.jsonis correct (i.e. matches what the old folder convention would have produced), the fragility is the same as before — no better, no worse. If a repo's builds were already working correctly, adding the right"name"toactor.jsonwill not change behavior.circ_le logic has been removed from main.
Tests
diff-changestests for ownerless folder matching, single-actor.actor/build triggers, and multi-actor.actor/ignore behavior.utilstest suite (10 cases) covering token resolution priority,BUILDER_APIFY_TOKENfallback, validation failure on missing/unreachable actors, missingactor.jsonname field, single-actor repo mode, ownerless folders, standalone actors, and special characters in owner names (e.g.luigi.ruocco->APIFY_TOKEN_LUIGI_RUOCCO).Update: mandatory config file replaces all discovery logic
The discovery/guessing logic from the earlier commits has been fully replaced with a mandatory root-level config file (
.test-tools-actors-config.json). Combined with each actor's.actor/actor.json, these are now the only sources of truth.Config file shape:
{ "actors": [ { "folder": "actors/web-scraper", "owner": "myteam", "tokenEnvVar": "APIFY_TOKEN_MYTEAM", "isStandalone": false } ] }What this replaces:
resolveBuilderTokenUsername,resolveOwner,resolveTokenEnvVar,validateActorExists,readActorName— all deleted.apify-clientimport removed fromutils.ts(onlybuild.tsneeds it now).BUILDER_APIFY_TOKENfallback — removed.tokenEnvVarin the config is final; if the env var is not set at build time, it fails hard.isSingleActorRepobranching in diff detection — removed..actor/changes at root level now always trigger builds for all non-standalone actors.New
init-configcommand:Scans the repo via
git ls-filesfor.actor/actor.jsonfiles, generates the config with placeholders. Warns if a root-level.actor/actor.jsoncoexists with subfolder actors.Updated tests:
.actor/changes now trigger builds in multi-actor repos (updated indiff-changes.test.tsandshould-built-and-test.test.ts).utils.test.ts(18 tests) coveringreadConfigFileandgenerateConfigFile.Update 2: dockerContextDir-based change detection
Config file shape:
{ "actors": [ { "folder": "actors/web-scraper", "actorName": "myteam/web-scraper", "tokenEnvVar": "APIFY_TOKEN_MYTEAM", "overrideActorContext": ["actors/web-scraper", "packages/shared"] } ] }What changed:
ownerreplaced byactorName— fullowner/nameformat (e.g."apify/web-scraper"). The actor name is no longer derived fromactor.json'snamefield —actorNamein the config is the single source of truth.isStandaloneremoved — replaced by sibling exclusion. Each actor'sfolderdefines its ownership boundary. Files inside another actor's folder are automatically excluded, achieving the same effect without an explicit flag.dockerContextDir-based scoping — each actor's.actor/actor.jsondockerContextDirfield defines which files can affect its build. Change detection uses this boundary instead of hardcodedactors/andstandalone-actors/regex matching.overrideActorContext— optional config field. When set, replacesdockerContextDirfor change detection. Useful for actors that depend on shared packages outside their Docker build context..dockerignorefiltering — if a.dockerignoreexists at the root of an actor'sdockerContextDir, matching files are ignored during change detection. Patterns are resolved relative todockerContextDir, matching Docker's own behavior. Added as a self-contained commit for easy revert if it gets pushback.init-configcommand removed — the config must be written manually.Change detection algorithm (per file, per actor):
dockerContextDiroroverrideActorContext) → skip if no match.dockerignorefiltering → ignored if matchedfolder) → skip