Log sanitization with use of SearchValues v2 #3340

@mdchennu

Problem Statement.
We want to update the Sanitize method to use SearchValues. All characters that need to be sanitized can be added to a SearchValues instance.

Specific Characters to Sanitize:

  • '\r' (Carriage Return, U+000D)
  • '\n' (Line Feed, U+000A)
  • '\t' (Tab, U+0009)
  • All other ASCII and C1 control characters: U+0000-U+0008, U+000B-U+000C, U+000E-U+001F, U+007F (DEL), and the C1 range U+0080-U+009F
  • All Unicode characters where char.IsControl(c) is true (Unicode category Cc)
  • All Unicode characters where CharUnicodeInfo.GetUnicodeCategory(c) == UnicodeCategory.Format (Unicode category Cf), e.g.:
    • U+200B (Zero Width Space)
    • U+200C (Zero Width Non-Joiner)
    • U+200D (Zero Width Joiner)
    • U+200E (Left-to-Right Mark)
    • U+200F (Right-to-Left Mark)
    • U+202A-U+202E (Directional formatting)
    • U+2060-U+2064 and U+2066-U+206F (Various format characters; U+2065 is unassigned and therefore not in category Cf)
    • U+FEFF (Zero Width No-Break Space)
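
The enumeration above reduces to two checks per character. The following is a minimal sketch of that predicate; the class and method names (SanitizeCharacterSet, NeedsSanitizing) are hypothetical and are not taken from the existing code.

```csharp
using System.Globalization;

public static class SanitizeCharacterSet
{
    // Hypothetical helper: returns true when c belongs to the set listed in this issue.
    public static bool NeedsSanitizing(char c)
    {
        // char.IsControl matches category Cc: U+0000-U+001F and U+007F-U+009F,
        // which already includes '\r', '\n', and '\t'.
        if (char.IsControl(c))
        {
            return true;
        }

        // Category Cf covers zero-width and directional characters
        // such as U+200B, U+200E, U+202A, and U+FEFF.
        return CharUnicodeInfo.GetUnicodeCategory(c) == UnicodeCategory.Format;
    }
}
```

Because every explicitly listed code point falls into either Cc or Cf, the individual ranges do not need to be hard-coded in the fallback path.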

Implementation Clarifications:

  • The referenced SanitizeEntryFromFilePath method from dotnet/runtime should be used as a performance pattern reference only. The actual character set for sanitization should follow this issue's explicit enumeration.
  • The new implementation must produce identical output to the current method, including string formatting (e.g., using "\u{(int)c:X4}" for control/format characters).
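
As an illustration of the output format mentioned above, here is a small sketch. It assumes the intended output is a literal backslash, the letter u, and four uppercase hexadecimal digits; the EscapeFormatSketch class and Escape method are hypothetical names, and the current method remains the source of truth for the exact format.

```csharp
public static class EscapeFormatSketch
{
    // Assumption: the escaped form is backslash + 'u' + 4 uppercase hex digits.
    // The doubled backslash produces a single literal backslash in the result.
    public static string Escape(char c) => $"\\u{(int)c:X4}";
}
```

Under that assumption, a line feed becomes the six characters \u000A and a zero width space becomes \u200B, written out as visible text.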

Steps:

  1. Update the Sanitize method to use SearchValues for the values above: create a SearchValues for the common ASCII control characters, and handle the remaining Unicode control and format characters separately with a fallback to char.IsControl() and CharUnicodeInfo.GetUnicodeCategory().
  2. Update the Sanitize method to follow the structure of the method called SanitizeEntryFromFilePath: dotnet/runtime/src/libraries/Common/src/System/IO/Archiving.Utils.Windows.cs#L11
  3. Update tests accordingly. If tests covering the following edge cases are not present, add them: strings with no sanitizable characters (performance fast path), strings with only ASCII control characters, strings with Unicode format characters, very long strings, and null/empty string handling.
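
The steps above could be sketched roughly as follows. This is not the existing Sanitize method and not a copy of SanitizeEntryFromFilePath; it only shows the general shape (search first, allocate only when something is found). It assumes .NET 8 or later for SearchValues. The class name LogSanitizerSketch, the helper names, the escape format, and the choice to return an empty string for null input are all assumptions that would need to be checked against the current behavior.

```csharp
using System;
using System.Buffers;
using System.Globalization;
using System.Text;

public static class LogSanitizerSketch
{
    // ASCII control characters: U+0000-U+001F plus U+007F (DEL).
    private static readonly SearchValues<char> s_asciiControlChars =
        SearchValues.Create(BuildAsciiControlChars());

    private static char[] BuildAsciiControlChars()
    {
        var chars = new char[33];
        for (int i = 0; i < 32; i++)
        {
            chars[i] = (char)i;
        }
        chars[32] = '\u007F';
        return chars;
    }

    // Fallback check covering categories Cc and Cf for any character.
    private static bool NeedsSanitizing(char c) =>
        char.IsControl(c) ||
        CharUnicodeInfo.GetUnicodeCategory(c) == UnicodeCategory.Format;

    // Returns the index of the first character to escape, or -1 if there is none.
    private static int IndexOfFirstSanitizable(ReadOnlySpan<char> span)
    {
        // Vectorized search for ASCII control characters.
        int asciiHit = span.IndexOfAny(s_asciiControlChars);

        // Only the text before the ASCII hit can contain an earlier non-ASCII match.
        ReadOnlySpan<char> prefix = asciiHit < 0 ? span : span.Slice(0, asciiHit);
        for (int i = 0; i < prefix.Length; i++)
        {
            char c = prefix[i];
            if (c > '\u007F' && NeedsSanitizing(c))
            {
                return i;
            }
        }
        return asciiHit;
    }

    public static string Sanitize(string? input)
    {
        // Assumption: null and empty inputs both yield an empty string.
        if (string.IsNullOrEmpty(input))
        {
            return string.Empty;
        }

        ReadOnlySpan<char> span = input;
        int first = IndexOfFirstSanitizable(span);
        if (first < 0)
        {
            // Fast path: nothing to escape, so return the original instance.
            return input;
        }

        var builder = new StringBuilder(input.Length + 16);
        builder.Append(span.Slice(0, first));
        for (int i = first; i < span.Length; i++)
        {
            char c = span[i];
            if (NeedsSanitizing(c))
            {
                // Assumed escape format: backslash, 'u', four uppercase hex digits.
                builder.Append($"\\u{(int)c:X4}");
            }
            else
            {
                builder.Append(c);
            }
        }
        return builder.ToString();
    }
}
```

In this sketch the SearchValues lookup handles the ASCII range, while characters above U+007F go through the char.IsControl and GetUnicodeCategory fallback. The scalar loop over the prefix is the simplest correct option; whether it is fast enough for very long strings is something the new tests and a benchmark would have to confirm.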

Only update the Sanitize method as described above; do not update any other methods.
