Description
Problem Statement.
We want to update the Sanitize method:
Line 533 in c152004:
private static string Sanitize(string input)
Specific Characters to Sanitize:
- '\r' (Carriage Return, U+000D)
- '\n' (Line Feed, U+000A)
- '\t' (Tab, U+0009)
- All other C0 (ASCII) and C1 control characters: U+0000-U+0008, U+000B-U+000C, U+000E-U+001F, U+007F-U+009F (U+0080-U+009F are C1 controls and lie outside ASCII)
- All Unicode characters where char.IsControl(c) is true (Unicode category Cc)
- All Unicode characters where CharUnicodeInfo.GetUnicodeCategory(c) == UnicodeCategory.Format (Unicode category Cf), e.g.:
- U+200B (Zero Width Space)
- U+200C (Zero Width Non-Joiner)
- U+200D (Zero Width Joiner)
- U+200E (Left-to-Right Mark)
- U+200F (Right-to-Left Mark)
- U+202A-U+202E (Directional formatting)
- U+2060-U+206F (Various format characters)
- U+FEFF (Zero Width No-Break Space)
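The enumeration above reduces to one per-character predicate: a character is sanitized when it is in Unicode category Cc or Cf. The following is a minimal sketch of that predicate, not the repository's code; the class name SanitizeCharacterSet and the method name NeedsSanitizing are hypothetical.

```csharp
using System.Globalization;

// Hypothetical helper; these names are illustrative and do not come from the repository.
internal static class SanitizeCharacterSet
{
    // Returns true for characters in Unicode category Cc (control) or Cf (format).
    internal static bool NeedsSanitizing(char c)
    {
        // char.IsControl covers U+0000-U+001F and U+007F-U+009F,
        // which already includes '\r', '\n', and '\t'.
        if (char.IsControl(c))
        {
            return true;
        }

        // Category Cf covers U+200B-U+200F, U+202A-U+202E, U+FEFF, and similar characters.
        return CharUnicodeInfo.GetUnicodeCategory(c) == UnicodeCategory.Format;
    }
}
```

Because the predicate works on single UTF-16 code units, format characters outside the Basic Multilingual Plane are not detected; that matches the char-based wording of this issue.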
Implementation Clarifications:
- The referenced SanitizeEntryFromFilePath method from dotnet/runtime should be used as a performance pattern reference only. The actual character set for sanitization should follow this issue's explicit enumeration.
- The new implementation must produce identical output to the current method, including string formatting (e.g., using "\u{(int)c:X4}" for control/format characters).
Steps:
- Update the Sanitize method to use a SearchValues collection that covers the values above. Create a SearchValues for the common ASCII control characters, and handle Unicode format characters separately with a fallback to char.IsControl() and CharUnicodeInfo.GetUnicodeCategory().
- Update the Sanitize method to follow the structure of the method called SanitizeEntryFromFilePath: dotnet/runtime/src/libraries/Common/src/System/IO/Archiving.Utils.Windows.cs#L11
- Update tests accordingly. If tests covering the following edge cases are not present, add them: strings with no sanitizable characters (performance fast path), strings with only ASCII control characters, strings with Unicode format characters, very long strings, and null/empty string handling.
Only update the Sanitize method as described above; do not update any other methods.
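The steps above can be sketched roughly as follows. This is an illustration under stated assumptions, not the repository's implementation: the class name SanitizeSketch and the helper names are hypothetical, null or empty input is assumed to be returned unchanged, and the escape is assumed to be a literal backslash, the letter u, and four uppercase hex digits, as the format string quoted in this issue suggests. SearchValues requires .NET 8 or later.

```csharp
#nullable enable
using System;
using System.Buffers;
using System.Globalization;
using System.Text;

// Hypothetical sketch; names and null handling are assumptions, not the repository's code.
internal static class SanitizeSketch
{
    // Control characters U+0000-U+001F and U+007F-U+009F, searched with vectorized code.
    private static readonly SearchValues<char> ControlChars =
        SearchValues.Create(BuildControlChars().AsSpan());

    private static string BuildControlChars()
    {
        var sb = new StringBuilder();
        for (int i = 0; i <= 0x9F; i++)
        {
            if (char.IsControl((char)i)) sb.Append((char)i);
        }
        return sb.ToString();
    }

    private static bool NeedsSanitizing(char c) =>
        char.IsControl(c) ||
        CharUnicodeInfo.GetUnicodeCategory(c) == UnicodeCategory.Format;

    // Returns the index of the first character to escape, or -1 if there is none.
    private static int IndexOfFirstSanitizable(ReadOnlySpan<char> span)
    {
        int control = span.IndexOfAny(ControlChars);
        // Only the part before the first control character can hide an earlier Cf character.
        ReadOnlySpan<char> head = control < 0 ? span : span.Slice(0, control);
        // Cf characters are never ASCII, so skip ahead to the first non-ASCII character.
        int nonAscii = head.IndexOfAnyExceptInRange('\u0000', '\u007F');
        if (nonAscii >= 0)
        {
            for (int i = nonAscii; i < head.Length; i++)
            {
                if (CharUnicodeInfo.GetUnicodeCategory(head[i]) == UnicodeCategory.Format) return i;
            }
        }
        return control;
    }

    internal static string? Sanitize(string? input)
    {
        // Assumption: null and empty input are returned unchanged.
        if (string.IsNullOrEmpty(input)) return input;

        ReadOnlySpan<char> span = input.AsSpan();
        int first = IndexOfFirstSanitizable(span);
        if (first < 0) return input; // Fast path: nothing to escape, no allocation.

        var sb = new StringBuilder(input.Length + 8);
        sb.Append(span.Slice(0, first)); // Copy the clean prefix in one call.
        for (int i = first; i < span.Length; i++)
        {
            char c = span[i];
            if (NeedsSanitizing(c))
            {
                sb.Append("\\u").Append(((int)c).ToString("X4", CultureInfo.InvariantCulture));
            }
            else
            {
                sb.Append(c);
            }
        }
        return sb.ToString();
    }
}
```

In this sketch, an input with nothing to escape comes back as the same string instance, which is what a fast-path test can assert on. Whether the escape text and null handling match the current method has to be checked against the existing tests before adopting any of this.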