
Multi-byte characters not rendered correctly when output from a C or C++ program #223

Open
@tfpf


If a C or C++ program writes multi-byte characters to the console, they are not rendered correctly. The following shell script demonstrates the problem.

#! /usr/bin/env sh

pacman -S --needed mingw-w64-ucrt-x86_64-gcc mingw-w64-ucrt-x86_64-python

printf '#include <stdio.h>\nint main(void) { puts("∈√≈≡⊥"); }\n' >msys2.c
gcc msys2.c
./a
echo $(./a)

printf '#include <cstdio>\nint main(void) { std::puts("∈√≈≡⊥"); }\n' >msys2.cc
g++ msys2.cc
./a
echo $(./a)

printf 'import sys\n\nprint("∈√≈≡⊥")' >msys2.py
python msys2.py

echo "∈√≈≡⊥"

Here's the output.

warning: mingw-w64-ucrt-x86_64-gcc-14.1.0-3 is up to date -- skipping
warning: mingw-w64-ucrt-x86_64-python-3.11.9-1 is up to date -- skipping
 there is nothing to do
∈√≈≡⊥
∈√≈≡⊥
∈√≈≡⊥
∈√≈≡⊥
∈√≈≡⊥
∈√≈≡⊥

Key Observations

  • When multi-byte characters are written by a C or C++ program, the characters displayed don't appear to be related to the ones written, and can themselves be multi-byte.
  • If the output is captured with command substitution and then echoed (see echo $(./a) above), the characters are displayed correctly.
  • When the same characters are written from Python or sh, they are displayed correctly.

I am using 64-bit MSYS2 20230526. I haven't tried this with the latest version, because I couldn't find any existing bug report for this issue even after searching for a while.
