Improve UI5 performance for large codebases #186
Applying inSameWebApp too early in the pipeline was causing a cross-product on all data flow nodes in a WebApp. It's typically better to apply the other conditions (e.g. name matching), then filter on the same web-app as a final step.
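The effect of applying the same-web-app condition last can be illustrated outside of CodeQL. The following is only a Python analogy, not the query code from this PR: the `Node` type, its fields, and both functions are hypothetical names introduced for the illustration. It counts how many candidate pairs each strategy has to examine.

```python
from collections import defaultdict
from typing import NamedTuple


class Node(NamedTuple):
    web_app: str  # hypothetical: identifier of the enclosing web app
    name: str     # hypothetical: the value used for name matching


def match_webapp_first(sources, sinks):
    """Pair every source and sink in the same web app, then filter on name."""
    examined = 0
    matches = []
    for src in sources:
        for snk in sinks:
            if src.web_app == snk.web_app:
                examined += 1  # size of the intermediate same-web-app relation
                if src.name == snk.name:
                    matches.append((src, snk))
    return matches, examined


def match_name_first(sources, sinks):
    """Join on name first, then keep only the pairs in the same web app."""
    sinks_by_name = defaultdict(list)
    for snk in sinks:
        sinks_by_name[snk.name].append(snk)
    examined = 0
    matches = []
    for src in sources:
        for snk in sinks_by_name.get(src.name, []):
            examined += 1  # only name-compatible pairs reach this point
            if src.web_app == snk.web_app:
                matches.append((src, snk))
    return matches, examined


# Two web apps, each with 100 uniquely named nodes on both sides.
sources = [Node(app, f"n{i}") for app in ("A", "B") for i in range(100)]
sinks = [Node(app, f"n{i}") for app in ("A", "B") for i in range(100)]

slow, slow_examined = match_webapp_first(sources, sinks)
fast, fast_examined = match_name_first(sources, sinks)
```

Both strategies return the same 200 matches, but the web-app-first version examines 20000 pairs while the name-first version examines 400. In CodeQL the join order is chosen by the optimizer, so this sketch only shows why the size of the intermediate relation matters.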
Avoid creating a cross-product of tokens identified in a string. Instead, create an ordering of tokens based on begin/end locations and process those linearly to find the first/next/contains tokens.
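The idea of ordering tokens by location and scanning that ordering can be sketched in Python. This is an analogy under stated assumptions, not the QL implementation: the `Token` and `TokenIndex` types are hypothetical, offsets are assumed to be inclusive, "next" is assumed to mean every token beginning at the earliest offset after the current token ends, and "contained" is assumed to mean lying entirely within the current token's span.

```python
from bisect import bisect_left, bisect_right
from typing import NamedTuple


class Token(NamedTuple):
    begin: int  # offset of the first character (assumed inclusive)
    end: int    # offset of the last character (assumed inclusive)
    text: str


def token_ordering(tokens):
    """Sort by begin offset; on ties the longer (outer) token comes first."""
    return sorted(tokens, key=lambda t: (t.begin, -t.end))


class TokenIndex:
    def __init__(self, tokens):
        self.ordered = token_ordering(tokens)
        self.begins = [t.begin for t in self.ordered]

    def _starting_at_index(self, i):
        # All tokens sharing the begin offset of the token at position i.
        if i >= len(self.ordered):
            return []
        hi = bisect_right(self.begins, self.begins[i])
        return self.ordered[i:hi]

    def first(self):
        """All tokens starting at the smallest begin offset."""
        return self._starting_at_index(0)

    def next(self, tok):
        """All tokens starting at the earliest offset after `tok` ends."""
        return self._starting_at_index(bisect_right(self.begins, tok.end))

    def contained_in(self, tok):
        """Tokens other than `tok` that lie entirely within its span."""
        lo = bisect_left(self.begins, tok.begin)
        hi = bisect_right(self.begins, tok.end)
        return [u for u in self.ordered[lo:hi] if u.end <= tok.end and u != tok]


# Hypothetical tokens for the string {path: 'x'}, including overlapping ones.
brace_open = Token(0, 0, "{")
ident = Token(1, 4, "path")
colon = Token(5, 5, ":")
string = Token(7, 9, "'x'")
quote = Token(7, 7, "'")   # begins at the same offset as `string`
inner = Token(8, 8, "x")   # contained in `string`
brace_close = Token(10, 10, "}")

index = TokenIndex([inner, brace_close, colon, quote, string, ident, brace_open])
```

After one sort, each lookup is a binary search plus a scan over the tokens it actually returns or inspects, rather than a comparison of every token against every other token. Here `index.next(colon)` yields both `string` and `quote`, since they begin at the same offset, and `index.next(string)` skips the contained `inner` token and yields `brace_close`.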
jeongsoolee09
left a comment
Left a question, but otherwise I think it's a brilliant solution to work around the joining issue!
- Consider all tokens beginning at the same location as eligible to be the next token.
- Implement strictContains to reflect previous behaviour.
@jeongsoolee09 Resurrecting your question, as it got lost.
The new implementation of Note, I've also updated to address some minor inconsistencies with the previous implementation.
jeongsoolee09
left a comment
Makes sense now and completely understood. Thank you!
We observed high memory usage and slow evaluation on codebases with a large number of files and string literals.
- 0f554bd: addresses a join order problem when determining whether elements occur in the same web-app. We prefer to apply the inSameWebapp condition after computing all other conditions, to avoid a cross-product on data flow nodes in the same web app.
- ed87aa9: the BindingStringParser had performance issues on large string literals with a large number of tokens. This was because calculating token operations such as the next token, token containment and the first token created a cross-product on all tokens in the given string, before applying further conditions.
We avoid this problem by creating a tokenOrdering predicate that provides an ordering over all the tokens in the string. This is then used to implement the token operations more efficiently. This approach was complicated by the fact that we can have overlapping and contained tokens. Longer term, I think we would prefer to eliminate this overlap to simplify this code.
We observed this performance issue on a codebase with a large number of *.chunk.js files (which are generated by webpack code splitting). Unfortunately, these files contain a lot of large string literals containing JSON, which we currently pick up as potential binding strings. As a future improvement, it would be better if we could exclude these from the binding string calculation altogether, perhaps by using data flow to determine whether a string literal flows to a relevant API.