This query detects non-explicit control and whitespace characters in Java literals. Such characters are often introduced accidentally and can be invisible or hard to recognize, leading to bugs when the actual contents of the string contain control characters.
To avoid issues, use the encoded versions of control characters (e.g. ASCII \n, \t, or Unicode U+000D, U+0009).
This makes the literals (e.g. string literals) more readable, and also helps to make the surrounding code less error-prone and more maintainable.
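As a rough illustration of the advice above, the following sketch rewrites raw control and invisible characters in a piece of text as their escaped forms. The class `LiteralEscaper` and its methods are hypothetical names introduced for this example and are not part of the query. The sketch assumes that only the characters discussed in this document need escaping, so it does not handle backslashes or quotes.

```java
final class LiteralEscaper {
    private LiteralEscaper() {}

    // Hypothetical helper: rewrites raw control and invisible characters as escapes.
    static String escape(String text) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            switch (c) {
                // Printable whitespace gets its short escape sequence.
                case '\t': out.append("\\t"); break;
                case '\n': out.append("\\n"); break;
                case '\f': out.append("\\f"); break;
                case '\r': out.append("\\r"); break;
                default:
                    if (needsUnicodeEscape(c)) {
                        // Emit a backslash, the letter u, and four hex digits.
                        out.append(String.format("\\u%04X", (int) c));
                    } else {
                        out.append(c);
                    }
            }
        }
        return out.toString();
    }

    // Assumption: C0 controls, DEL, and a handful of invisible characters.
    private static boolean needsUnicodeEscape(char c) {
        return c < 0x20 || c == 0x7F
            || (c >= 0x200B && c <= 0x200D)
            || c == 0x2028 || c == 0x2029 || c == 0x2060 || c == 0xFEFF;
    }

    public static void main(String[] args) {
        System.out.println(escape("A\tB"));
        System.out.println(escape("foo" + (char) 0x200B + "bar"));
    }
}
```

Pasting the escaped output back into a literal keeps the same runtime value while making the special character visible in the source.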
The following examples illustrate good and bad code:
Bad:
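char tabulationChar = ' '; // Non compliant: the literal holds a raw tab character (may render as a space)
String tabulationCharInsideString = "A B"; // Non compliant: raw tab character between "A" and "B"
String fooZeroWidthSpacebar = "foobar"; // Non compliant: invisible zero-width space between "foo" and "bar"
Good: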
char escapedTabulationChar = '\t'; // Compliant
String escapedTabulationCharInsideString = "A\tB"; // Compliant
String fooUnicodeSpacebar = "foo\u0020bar"; // Compliant
String foo2Spacebar = "foo  bar"; // Compliant
String foo3Spacebar = "foo   bar"; // Compliant
This query detects Java literals that contain reserved control characters and/or non-printable whitespace characters, such as:
- Decimal and hexadecimal representations of ASCII control characters (code points 0-8, 11, 14-31, and 127).
- Invisible characters (e.g. zero-width space, zero-width joiner).
- Unicode C0 control codes, plus the delete character (U+007F), such as:

  | Escaped Unicode | ASCII Decimal | Description |
  | --- | --- | --- |
  | \u0000 | 0 | null character |
  | \u0001 | 1 | start of heading |
  | \u0002 | 2 | start of text |
  | \u0003 | 3 | end of text |
  | \u0004 | 4 | end of transmission |
  | \u0005 | 5 | enquiry |
  | \u0006 | 6 | acknowledge |
  | \u0007 | 7 | bell |
  | \u0008 | 8 | backspace |
  | \u000B | 11 | vertical tab |
  | \u000E | 14 | shift out |
  | \u000F | 15 | shift in |
  | \u0010 | 16 | data link escape |
  | \u0011 | 17 | device control 1 |
  | \u0012 | 18 | device control 2 |
  | \u0013 | 19 | device control 3 |
  | \u0014 | 20 | device control 4 |
  | \u0015 | 21 | negative acknowledge |
  | \u0016 | 22 | synchronous idle |
  | \u0017 | 23 | end of transmission block |
  | \u0018 | 24 | cancel |
  | \u0019 | 25 | end of medium |
  | \u001A | 26 | substitute |
  | \u001B | 27 | escape |
  | \u001C | 28 | file separator |
  | \u001D | 29 | group separator |
  | \u001E | 30 | record separator |
  | \u001F | 31 | unit separator |
  | \u007F | 127 | delete |
- Zero-width Unicode characters (e.g. zero-width space, zero-width joiner), such as:

  | Escaped Unicode | Description |
  | --- | --- |
  | \u200B | zero-width space |
  | \u200C | zero-width non-joiner |
  | \u200D | zero-width joiner |
  | \u2028 | line separator |
  | \u2029 | paragraph separator |
  | \u2060 | word joiner |
  | \uFEFF | zero-width no-break space |
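The character classes listed above can be approximated with a small classifier. This is only a sketch, not the query's actual implementation: the class `HiddenCharacters` and its method names are hypothetical, and it covers only the code points that appear in the two tables. It also assumes the input is the raw source text between the quotes of a literal, so an escape sequence shows up as a backslash followed by ordinary characters and is never flagged.

```java
final class HiddenCharacters {
    private HiddenCharacters() {}

    // C0 control codes other than tab, line feed, form feed and carriage return,
    // plus the delete character: code points 0-8, 11, 14-31 and 127.
    static boolean isFlaggedControl(int cp) {
        boolean c0 = cp >= 0x00 && cp <= 0x1F;
        boolean printableWhitespace = cp == 0x09 || cp == 0x0A || cp == 0x0C || cp == 0x0D;
        return (c0 && !printableWhitespace) || cp == 0x7F;
    }

    // The invisible characters from the second table.
    static boolean isFlaggedInvisible(int cp) {
        switch (cp) {
            case 0x200B: // zero-width space
            case 0x200C: // zero-width non-joiner
            case 0x200D: // zero-width joiner
            case 0x2028: // line separator
            case 0x2029: // paragraph separator
            case 0x2060: // word joiner
            case 0xFEFF: // zero-width no-break space
                return true;
            default:
                return false;
        }
    }

    static boolean isFlagged(int cp) {
        return isFlaggedControl(cp) || isFlaggedInvisible(cp);
    }

    // Returns the index of the first flagged character in the raw literal text, or -1.
    static int firstFlaggedIndex(String rawLiteralText) {
        for (int i = 0; i < rawLiteralText.length(); ) {
            int cp = rawLiteralText.codePointAt(i);
            if (isFlagged(cp)) {
                return i;
            }
            i += Character.charCount(cp);
        }
        return -1;
    }
}
```

Working on the raw source text rather than the runtime value matters here, because once a literal has been compiled an escaped character and a raw one are indistinguishable.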
The following list outlines the explicit exclusions from query scope:
- any number of simple space characters (U+0020, ASCII 32).
- an escape character sequence (e.g. \t), or the Unicode equivalent (e.g. \u0009), for printable whitespace characters:

  | Character Sequence | Escaped Unicode | ASCII Decimal | Description |
  | --- | --- | --- | --- |
  | \t | \u0009 | 9 | horizontal tab |
  | \n | \u000A | 10 | line feed |
  | \f | \u000C | 12 | form feed |
  | \r | \u000D | 13 | carriage return |
  |  | \u0020 | 32 | space |
- character literals (i.e. single quotes) containing control characters.
- literals defined within "likely" test methods, such as:
  - JUnit test methods
  - methods annotated with @Test
  - methods of a class annotated with @Test
  - methods with names containing "test"
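The "likely test method" exclusion can be pictured with a reflection-based approximation. The query itself analyses source code, so this is only an illustrative sketch: `LikelyTestMethods`, its nested `Test` annotation (a stand-in for a real testing framework's annotation) and the sample classes are all hypothetical. It assumes that any annotation whose simple name is `Test` counts, and that the name match on "test" is case-insensitive; neither detail is confirmed by this document. JUnit conventions that rely on a base class are not covered.

```java
import java.lang.annotation.Annotation;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.reflect.Method;
import java.util.Locale;

final class LikelyTestMethods {
    private LikelyTestMethods() {}

    // Stand-in for a testing framework's annotation (assumption for this sketch).
    @Retention(RetentionPolicy.RUNTIME)
    @interface Test {}

    static boolean isLikelyTest(Method method) {
        return hasTestAnnotation(method.getAnnotations())
            || hasTestAnnotation(method.getDeclaringClass().getAnnotations())
            // Assumption: the name check ignores case.
            || method.getName().toLowerCase(Locale.ROOT).contains("test");
    }

    // Convenience overload that looks up a no-argument method by name.
    static boolean isLikelyTest(Class<?> owner, String methodName) {
        try {
            return isLikelyTest(owner.getDeclaredMethod(methodName));
        } catch (NoSuchMethodException e) {
            throw new IllegalArgumentException("No such method: " + methodName, e);
        }
    }

    private static boolean hasTestAnnotation(Annotation[] annotations) {
        for (Annotation annotation : annotations) {
            if (annotation.annotationType().getSimpleName().equals("Test")) {
                return true;
            }
        }
        return false;
    }

    // Sample classes used only to exercise the heuristic.
    static class Samples {
        @Test void checksParser() {}
        void testRoundTrip() {}
        void parse() {}
    }

    @Test
    static class AnnotatedSuite {
        void run() {}
    }
}
```

With these samples, `checksParser`, `testRoundTrip` and `AnnotatedSuite.run` would be treated as likely tests, while `Samples.parse` would not.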
- Unicode: Unicode Control Characters.
- Wikipedia: Unicode C0 control codes.
- Wikipedia: Unicode characters with property "WSpace=yes" or "White_Space=yes".
- Java API Specification: Java String Literals.
- Java API Specification: Java Class Charset.