Output unicode strings in Windows console app

Posted by: admin, November 29, 2017

Questions:

Hi, I was trying to output a Unicode string to the console with iostreams and failed.

I found the question "Using unicode font in c++ console app", and this snippet from it works:

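SetConsoleOutputCP(CP_UTF8);
wchar_t s[] = L"èéøÞǽлљΣæča";
// First call returns the required buffer size in bytes, including the terminating null.
int bufferSize = WideCharToMultiByte(CP_UTF8, 0, s, -1, NULL, 0, NULL, NULL);
char* m = new char[bufferSize];
WideCharToMultiByte(CP_UTF8, 0, s, -1, m, bufferSize, NULL, NULL);
wprintf(L"%S", m);
delete[] m; // release the conversion buffer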

However, I did not find any way to output unicode correctly with iostreams. Any suggestions?

This does not work:

SetConsoleOutputCP(CP_UTF8);
utf8_locale = locale(old_locale,new boost::program_options::detail::utf8_codecvt_facet());
wcout.imbue(utf8_locale);
wcout << L"¡Hola!" << endl;

EDIT
I could not find any solution other than wrapping this snippet in a stream operator.
I hope somebody has a better idea.

//Unicode output for a Windows console 
ostream &operator-(ostream &stream, const wchar_t *s) 
{ 
    int bufSize = WideCharToMultiByte(CP_UTF8, 0, s, -1, NULL, 0, NULL, NULL);
    char *buf = new char[bufSize];
    WideCharToMultiByte(CP_UTF8, 0, s, -1, buf, bufSize, NULL, NULL);
    wprintf(L"%S", buf);
    delete[] buf; 
    return stream; 
} 

ostream &operator-(ostream &stream, const wstring &s) 
{ 
    stream - s.c_str();
    return stream; 
} 
Answers:

I have verified a solution using Visual Studio 2010, based on an MSDN article and an MSDN blog post. The trick is an obscure call to _setmode(..., _O_U16TEXT).

Solution:

#include <iostream>
#include <io.h>
#include <fcntl.h>

int wmain(int argc, wchar_t* argv[])
{
    _setmode(_fileno(stdout), _O_U16TEXT);
    std::wcout << L"Testing unicode -- English -- Ελληνικά -- Español." << std::endl;
}

Answers:

std::wcout apparently has its locale set differently from the CRT's. Here’s how it can be fixed:

int _tmain(int argc, _TCHAR* argv[])
{
    char* locale = setlocale(LC_ALL, "English"); // Set the CRT locale to English and get its name.
    std::locale lollocale(locale);
    setlocale(LC_ALL, locale); // Restore the CRT.
    std::wcout.imbue(lollocale); // Now set the std::wcout to have the locale that we got from the CRT.
    std::wcout << L"¡Hola!";
    std::cin.get();
    return 0;
}

I just tested it, and it displays the string here absolutely fine.

Answers:

SetConsoleCP() and chcp do not do the same thing!

Take this program snippet:

SetConsoleCP(65001);  // 65001 = UTF-8
static const char s[] = "tränenüberströmt™\n";
DWORD slen = lstrlenA(s);  // byte count; lstrlenA because s is a narrow string
WriteConsoleA(GetStdHandle(STD_OUTPUT_HANDLE), s, slen, &slen, NULL);

The source code must be saved as UTF-8 without BOM (Byte Order Mark; Signature). Then, the Microsoft compiler cl.exe takes the UTF-8 strings as-is.
If this code is saved with a BOM, cl.exe transcodes the string to ANSI (i.e. CP1252), which does not match CP65001 (= UTF-8).
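Whether a source file carries that signature can be checked by looking at its first three bytes. A minimal sketch, assuming the file contents have already been read into a std::string; has_utf8_bom is a hypothetical helper name, not part of any Windows or compiler API:

```cpp
#include <string>

// Hypothetical helper: returns true if `data` starts with the
// UTF-8 byte order mark, i.e. the three bytes EF BB BF.
bool has_utf8_bom(const std::string& data)
{
    return data.size() >= 3 &&
           static_cast<unsigned char>(data[0]) == 0xEF &&
           static_cast<unsigned char>(data[1]) == 0xBB &&
           static_cast<unsigned char>(data[2]) == 0xBF;
}
```

Reading the first few bytes of the .cpp file in binary mode and passing them to this function is enough to tell which of the two cases described above applies.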

Change the display font to Lucida Console; otherwise, UTF-8 output will not work at all.

  • Type: chcp
  • Answer: 850
  • Type: test.exe
  • Answer: tr├ñnen├╝berstr├ÂmtÔäó
  • Type: chcp
  • Answer: 65001 – This setting has been changed by SetConsoleCP() but with no useful effect.
  • Type: chcp 65001
  • Type: test.exe
  • Answer: tränenüberströmt™ – All OK now.

Tested with: German Windows XP SP3
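
The garbled line in the session above is what the UTF-8 bytes of the string look like when the console renders them with code page 850: for example, ä is the two bytes C3 A4 in UTF-8, and code page 850 displays C3 as ├ and A4 as ñ. A minimal sketch of that reinterpretation follows. The names cp850_glyph and cp850_view are hypothetical, and the table is deliberately partial: it covers only the bytes that occur in this example.

```cpp
#include <string>

// Partial code page 850 table (an assumption of this sketch: only the
// bytes from the example string are listed). Each entry is the glyph
// that CP850 shows for the byte, itself written as UTF-8 bytes.
static const char* cp850_glyph(unsigned char b)
{
    switch (b) {
        case 0x84: return "\xC3\xA4";     // ä
        case 0xA2: return "\xC3\xB3";     // ó
        case 0xA4: return "\xC3\xB1";     // ñ
        case 0xB6: return "\xC3\x82";     // Â
        case 0xBC: return "\xE2\x95\x9D"; // ╝
        case 0xC3: return "\xE2\x94\x9C"; // ├
        case 0xE2: return "\xC3\x94";     // Ô
        default:   return "?";            // byte not covered by this sketch
    }
}

// Shows how a console set to CP850 would display the given raw bytes.
// ASCII bytes pass through unchanged; the result is UTF-8 text.
std::string cp850_view(const std::string& bytes)
{
    std::string out;
    for (unsigned char b : bytes) {
        if (b < 0x80)
            out += static_cast<char>(b);
        else
            out += cp850_glyph(b);
    }
    return out;
}
```

Passing the UTF-8 bytes of "trän" ("tr\xC3\xA4n") to cp850_view gives the UTF-8 text for "tr├ñn", which matches the start of the garbled output.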

Answers:

I don’t think there is an easy answer. Looking at Console Code Pages and the SetConsoleCP function, it seems that you will need to set up an appropriate code page for the character set you’re going to output.

Answers:

Recently I wanted to stream Unicode from Python to the Windows console, and here is the minimum I needed to do:

  • Set the console font to one that covers Unicode symbols. There is not a wide choice: Console properties > Font > Lucida Console
  • Change the current console code page: run chcp 65001 in the console or use the corresponding function in the C++ code
  • Write to the console using WriteConsoleW

Look through an interesting article about Java Unicode on the Windows console.

Besides, in Python you cannot write to the default sys.stdout in this case; you will need to substitute it with something that uses os.write(1, binarystring) or a direct call to a wrapper around WriteConsoleW. It seems that in C++ you will need to do the same.

Answers:

First, sorry: I probably don’t have the required fonts, so I cannot test this yet.

Something looks a bit fishy here:

// the following is said to be working
SetConsoleOutputCP(CP_UTF8); // output is in UTF8
wchar_t s[] = L"èéøÞǽлљΣæča";
int bufferSize = WideCharToMultiByte(CP_UTF8, 0, s, -1, NULL, 0, NULL, NULL);
char* m = new char[bufferSize]; 
WideCharToMultiByte(CP_UTF8, 0, s, -1, m, bufferSize, NULL, NULL);
wprintf(L"%S", m); // <-- upper case %S in wprintf() is used for MultiByte/utf-8
                   //     lower case %s in wprintf() is used for WideChar
printf("%s", m); // <-- does this work as well? try it to verify my assumption

while

// the following is said to have problem
SetConsoleOutputCP(CP_UTF8);
utf8_locale = locale(old_locale,
                     new boost::program_options::detail::utf8_codecvt_facet());
wcout.imbue(utf8_locale);
wcout << L"¡Hola!" << endl; // <-- you are passing wide char.
// have you tried passing the multibyte equivalent by converting to utf8 first?
int bufferSize = WideCharToMultiByte(CP_UTF8, 0, s, -1, NULL, 0, NULL, NULL);
char* m = new char[bufferSize]; 
WideCharToMultiByte(CP_UTF8, 0, s, -1, m, bufferSize, NULL, NULL);
cout << m << endl;

what about

// without setting locale to UTF8, you pass WideChars
wcout << L"¡Hola!" << endl;
// set locale to UTF8 and use cout
SetConsoleOutputCP(CP_UTF8);
cout << utf8_encoded_by_converting_using_WideCharToMultiByte << endl;
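
For reference, the conversion that these snippets delegate to WideCharToMultiByte(CP_UTF8, ...) can be sketched without the Windows API. This is an illustration only, not a replacement for the API call: to_utf8 and append_utf8 are hypothetical names, and replacing unpaired surrogates with U+FFFD is an assumption of this sketch. It accepts both 16-bit wchar_t (UTF-16, as on Windows) and 32-bit wchar_t.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>

// Appends one Unicode code point to `out`, encoded as UTF-8.
static void append_utf8(std::string& out, std::uint32_t cp)
{
    if (cp < 0x80) {
        out += static_cast<char>(cp);
    } else if (cp < 0x800) {
        out += static_cast<char>(0xC0 | (cp >> 6));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else if (cp < 0x10000) {
        out += static_cast<char>(0xE0 | (cp >> 12));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else {
        out += static_cast<char>(0xF0 | (cp >> 18));
        out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
}

// Hypothetical helper: converts a wide string to a UTF-8 std::string.
std::string to_utf8(const std::wstring& ws)
{
    std::string out;
    for (std::size_t i = 0; i < ws.size(); ++i) {
        std::uint32_t cp = static_cast<std::uint32_t>(ws[i]);
        // Combine a UTF-16 surrogate pair into a single code point.
        if (cp >= 0xD800 && cp <= 0xDBFF && i + 1 < ws.size()) {
            std::uint32_t lo = static_cast<std::uint32_t>(ws[i + 1]);
            if (lo >= 0xDC00 && lo <= 0xDFFF) {
                cp = 0x10000 + ((cp - 0xD800) << 10) + (lo - 0xDC00);
                ++i;
            }
        }
        // Unpaired surrogates and out-of-range values become U+FFFD.
        if ((cp >= 0xD800 && cp <= 0xDFFF) || cp > 0x10FFFF)
            cp = 0xFFFD;
        append_utf8(out, cp);
    }
    return out;
}
```

With a helper like this, the last suggestion would read cout << to_utf8(L"¡Hola!") << endl; and because the result is a std::string, there is no new[]/delete[] buffer to manage.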

Answers:

I had a similar problem. The question "Output Unicode to console Using C++, in Windows" contains the gem that you need to run chcp 65001 in the console before running your program.

There may be some way of doing this programmatically, but I don’t know what it is.

Answers:

Correctly displaying Western European characters in the windows console

Long story short:

  1. use chcp to find which codepage works for you. In my case it was chcp 28591 for Western Europe.
  2. optionally make it the default: REG ADD HKCU\Console /v CodePage /t REG_DWORD /d 28591

History of the discovery

I had a similar problem, with Java. It is just cosmetic, since it involves log lines sent to the console; but it is still annoying.

The output from our Java application is supposed to be in UTF-8, and it displays correctly in Eclipse’s console. But in the Windows console, it shows box-drawing characters: Inicializaci├│n and art├¡culos instead of Inicialización and artículos.

I stumbled upon a related question and mixed some of the answers to get to the solution that worked for me. The solution is to change the code page used by the console and to use a font that supports Unicode (like Consolas or Lucida Console). You can select the font in the system menu of the Windows console:

  1. Start a console, either by
    1. pressing Win + R, typing cmd and hitting the Return key, or
    2. hitting the Win key and typing cmd followed by the Return key.
  2. Open the system menu, either by
    1. clicking the icon in the upper left corner, or
    2. hitting the Alt + Space key combination.
  3. Select “Default” to change the behavior of all subsequent console windows.
  4. Click the “Font” tab.
  5. Select Consolas or Lucida Console.
  6. Click OK.

Regarding the codepage, for a one-off case, you can get it done with the command chcp and then you have to investigate which codepage is correct for your set of characters. Several answers suggested UTF-8 codepage, which is 65001, but that codepage didn’t work for my Spanish characters.

Another answer suggested a batch script for interactively selecting the code page you want from a list. There I found the code page I needed for ISO-8859-1: 28591. So you could execute

chcp 28591

before each execution of your application. You might check which code page is right for you in the Code Page Identifiers MSDN page.

Yet another answer indicated how to persist the selected codepage as the default for your windows console. It involves changing the registry, so consider yourself warned that you might brick your machine by using this solution.

REG ADD HKCU\Console /v CodePage /t REG_DWORD /d 28591

This creates the CodePage value with the 28591 data inside the HKCU\Console registry key. And that did work for me.

Please note that HKCU (“HKEY_CURRENT_USER”) applies only to the current user. If you want to change it for all users on that computer, you’ll need to use the regedit utility and find or create the corresponding Console key (you will probably have to create a Console key inside HKEY_USERS\.DEFAULT).