fix: escape invalid backslash sequences in generated docstrings #712

Open
Alan4506 wants to merge 3 commits into smithy-lang:develop from Alan4506:fix/docstring-escape-literal-backslash

Conversation

@Alan4506
Contributor

@Alan4506 Alan4506 commented Jun 4, 2026

Background

Generated docstrings are Python string literals, so every backslash must begin a valid escape sequence; otherwise the literal is malformed. Python 3.12+ emits a SyntaxWarning for unrecognized escape sequences, and strict type checkers (pyright) report reportInvalidStringEscapeSequence as an error, which fails codegen's post-generation type check.
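To see the warning without running the full codegen, a small check like the one below can compile a snippet of source text and collect what the compiler reports. This is only a sketch: `invalid_escape_warnings` is a hypothetical helper, not part of the code generator, and it accepts both SyntaxWarning (Python 3.12+) and DeprecationWarning (earlier versions).

```python
import warnings


def invalid_escape_warnings(source: str) -> list[str]:
    """Compile `source` and return the messages of any warnings it triggers.

    Hypothetical helper for illustration only.
    """
    with warnings.catch_warnings(record=True) as caught:
        # Show every warning, including ones hidden by default filters.
        warnings.simplefilter("always")
        compile(source, "<generated>", "exec")
    return [
        str(w.message)
        for w in caught
        if issubclass(w.category, (SyntaxWarning, DeprecationWarning))
    ]


# Raw strings hold the source text exactly as it would appear in a .py file.
bad = r'doc = """[\]"""'    # lone backslash before ]
good = r'doc = """[\\]"""'  # paired backslash

print(bool(invalid_escape_warnings(bad)))   # True
print(bool(invalid_escape_warnings(good)))  # False
```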

MarkdownConverter.postProcessPandocOutput is responsible for cleaning up the backslash escapes that pandoc adds. After #707 the single strip regex still produced invalid Python, reproducible with the Secrets Manager model.

Problem

The original line in MarkdownConverter.java:

output = output.replaceAll("\\\\([\\[\\]'{}()<>`@_*|!~$#^])", "$1");

In a Python string literal, a backslash that does not start a valid escape sequence must be doubled (each \\ pair is one literal backslash). The trouble is that pandoc escapes literal backslashes (\\) and Markdown-significant characters (\*) independently, and when the two sit next to each other the backslashes pile up into runs of varying length. A single strip regex cannot get every run right: it either consumes one half of a valid pair or leaves a lone backslash behind. Both happen with the Secrets Manager model, which documents password character sets that contain a literal \:

  • Line 1085 of the model file: the documentation contains a literal \\, which pandoc preserves. The regex matched the second backslash followed by ] and stripped it, turning the valid \\ into a lone \ and breaking a previously valid escape.
  • Line 1128 of the model file: the documentation contains a lone backslash followed by a space, which the regex does not match, so it was left untouched — still an invalid escape.
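Both failure modes can be reproduced with a Python port of the old pattern. The inputs below are hypothetical stand-ins for the pandoc output rather than the exact text from the model file, and `old_strip` is a name made up for this sketch.

```python
import re

# Python port of the character class used by the original Java replaceAll call.
OLD_STRIP = re.compile(r"\\([\[\]'{}()<>`@_*|!~$#^])")


def old_strip(text: str) -> str:
    """Drop a backslash when it directly precedes a Markdown-significant character."""
    return OLD_STRIP.sub(r"\1", text)


# Case 1: a valid pair followed by ] loses its second backslash.
print(old_strip(r"[\\]"))   # [\]

# Case 2: a lone backslash followed by a space is not matched at all.
print(old_strip(r"[ \ ]"))  # [ \ ]
```

In both cases the result contains a single backslash that does not start a valid Python escape.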

Generated code before this fix

src/aws_sdk_secretsmanager/client.py (GetRandomPassword docstring):

in passwords: `` !\"#$$%&'()*+,-./:;<=>?@[\]^_`{|}~ ``

src/aws_sdk_secretsmanager/models.py (ExcludePunctuation docstring):

`` ! " # $ % & ' ( ) * + , - . / : ; < = > ? @ [ \ ] ^ _ ` { | } ~ ``.

pyright (run by PythonFormatter during codegen) fails the build:

src/aws_sdk_secretsmanager/client.py:636:51 - error: Unsupported escape sequence in string literal (reportInvalidStringEscapeSequence)
src/aws_sdk_secretsmanager/models.py:3311:54 - error: Unsupported escape sequence in string literal (reportInvalidStringEscapeSequence)

On import, these also emit SyntaxWarning: invalid escape sequence '\]' and '\ '.

The fix

Replace the strip regex with normalizeBackslashEscapes, which handles each maximal run of backslashes (plus the character that follows it) as a unit instead of matching one backslash at a time. Because backslashes must be paired in a Python literal, the floor(n / 2) complete pairs in a run of n backslashes are always kept; if n is odd, the one leftover backslash binds to the next character and is:

  • kept if it forms a valid Python escape (\", \n, octal, hex, Unicode, line continuation),
  • dropped if the next character is one pandoc only escapes for Markdown (e.g. [, *), or
  • doubled otherwise, so a real lone backslash becomes a valid \\.

Working on whole runs keeps the result correct no matter how many backslashes pandoc emits, including odd-length runs where a literal backslash sits next to a Markdown character.
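The rules above can be sketched in Python as follows. This is an illustration of the run-based idea, not the Java implementation in this PR: the function name is a Python-style placeholder, the two character sets are assumptions (the Markdown set is borrowed from the old regex's character class), and \x, \u, \U and \N are accepted without validating their operands.

```python
import re

# Assumption: characters pandoc escapes only for Markdown's sake.
MARKDOWN_ONLY = set("[]'{}()<>`@_*|!~$#^")

# Assumption: characters that make a leftover backslash a valid Python escape.
# Simplified: operands of \x, \u, \U and \N are not checked.
VALID_AFTER_BACKSLASH = set("\"'abfnrtv01234567xuUN\n")

# A maximal run of backslashes plus the character after it (if any).
RUN = re.compile(r"(\\+)(.?)", re.DOTALL)


def normalize_backslash_escapes(text: str) -> str:
    def fix(match: re.Match) -> str:
        run, nxt = match.group(1), match.group(2)
        pairs = "\\\\" * (len(run) // 2)   # complete pairs are always kept
        if len(run) % 2 == 0:
            return pairs + nxt
        if nxt and nxt in VALID_AFTER_BACKSLASH:
            return pairs + "\\" + nxt      # keep a valid escape
        if nxt and nxt in MARKDOWN_ONLY:
            return pairs + nxt             # drop a Markdown-only escape
        return pairs + "\\\\" + nxt        # double a real lone backslash

    return RUN.sub(fix, text)


print(normalize_backslash_escapes(r"[\\]"))   # [\\]
print(normalize_backslash_escapes(r"[ \ ]"))  # [ \\ ]
print(normalize_backslash_escapes(r"\\\*"))   # \\*
```

The third call shows the odd-length case: two backslashes form a pair and are kept, while the third only escaped the * for Markdown and is dropped.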

Generated code after this fix

src/aws_sdk_secretsmanager/client.py:

in passwords: `` !\"#$$%&'()*+,-./:;<=>?@[\\]^_`{|}~ ``

src/aws_sdk_secretsmanager/models.py:

`` ! " # $ % & ' ( ) * + , - . / : ; < = > ? @ [ \\ ] ^ _ ` { | } ~ ``.

Both are now valid Python: [\\] renders as [\] (a single literal backslash), matching the intended documentation.
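As a quick sanity check of that claim, the literal can be evaluated with the standard library. The snippet uses a plain string literal as a stand-in for the generated docstring.

```python
import ast

# The characters as they would appear in the generated .py file.
source_literal = r'"[\\]"'

value = ast.literal_eval(source_literal)
print(value)       # [\]
print(len(value))  # 3
```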

Testing

  • Regenerated the Secrets Manager client: no reportInvalidStringEscapeSequence errors, and the module imports under python -W error::SyntaxWarning with no warning. Also inspected the generated code as above.
  • Regenerated DynamoDB, SQS, and API Gateway clients: no errors.
  • :core:test --tests "*MarkdownConverterTest*" passes, including regression tests for paired backslashes, lone backslashes, valid escapes, and backslashes adjacent to Markdown characters.
  • Regenerated existing clients in aws-sdk-python with no error during code generation.

By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.

@Alan4506 Alan4506 requested a review from a team as a code owner June 4, 2026 22:06
@Alan4506 Alan4506 force-pushed the fix/docstring-escape-literal-backslash branch from 486a844 to 153a05e Compare June 5, 2026 03:33
@ubaskota
Contributor

ubaskota commented Jun 5, 2026

Are you able to add tests for these behaviors? I think it'd be valuable to add some tests with different scenarios.

@Alan4506
Contributor Author

Alan4506 commented Jun 5, 2026

Are you able to add tests for these behaviors? I think it'd be valuable to add some tests with different scenarios.

@ubaskota Thanks! Added tests in 8b30624, and verified that all tests passed.

ubaskota
ubaskota previously approved these changes Jun 5, 2026
Contributor

@ubaskota ubaskota left a comment


LGTM!

@Alan4506 Alan4506 force-pushed the fix/docstring-escape-literal-backslash branch from 017b3d1 to bba6f23 Compare June 5, 2026 21:46
