Putting UTF-8 into C/C++ Source Code
After much googling, I could not find any tool for converting a UTF-8 string into an escaped C/C++ string literal suitable for pasting into an ASCII source file. So I wrote this Perl script, which produces a fairly readable escaped string:
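use strict;

# Run with "perl -n": each input line arrives in $_ as raw bytes.
chomp;
print '"';
my $prev_esc = 0;    # true if the previous byte was written as a \x escape
print map
{
    if (ord $_ > 0x7f) {
        # Non-ASCII byte: emit it as a hex escape.
        $prev_esc = 1;
        sprintf('\\x%02x', ord $_);
    } else {
        # A hex digit right after an escape would be absorbed into it,
        # so close the literal and open a new one first.
        my $need_break = $prev_esc && /[0-9A-Fa-f]/;
        $prev_esc = 0;
        # Backslash-escape quotes and backslashes so the literal stays valid.
        (my $c = $_) =~ s/(["\\])/\\$1/;
        ($need_break ? '" "' : '') . $c;
    }
}
split('', $_);
print '"' . "\n";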
Run it with Perl's -n option, and it will print an escaped string literal for each line of input:
$ perl -n utf8esc.pl
Grüße aus Bärenhöfe
"Gr\xc3\xbc\xc3\x9f" "e aus B\xc3\xa4renh\xc3\xb6" "fe"
Hit Ctrl-D on a blank line to exit.
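Unfortunately, C/C++ has the strange rule that a “\x” escape consumes every hex digit that follows it, with no length limit, even though the resulting value has to fit in a char (at most 0xff for an 8-bit char). A hex digit placed directly after an escaped byte would therefore be read as part of the escape. That is why the script breaks the string into separate adjacent literals, which the compiler concatenates back into a single string.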
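To check that a split literal really ends up as the intended bytes, here is a minimal C sketch. It assumes an 8-bit char, and the names greeting, expected and greeting_is_intact are made up for illustration. It compares the escaped form of "Grüße" against the same UTF-8 bytes written out one at a time.

```c
#include <stddef.h>
#include <string.h>

/* "Grüße" as UTF-8 bytes. The literal is split after \x9f so that the
   following 'e' (a valid hex digit) is not absorbed into the escape.
   Adjacent literals are concatenated by the compiler. */
static const char greeting[] = "Gr\xc3\xbc\xc3\x9f" "e";

/* The same bytes spelled out one by one, for comparison. */
static const unsigned char expected[] = {
    'G', 'r', 0xc3, 0xbc, 0xc3, 0x9f, 'e'
};

/* Returns 1 if the split literal holds exactly the expected bytes. */
int greeting_is_intact(void)
{
    return strlen(greeting) == sizeof expected
        && memcmp(greeting, expected, sizeof expected) == 0;
}
```

Calling greeting_is_intact() from main should return 1. Written without the split, as "Gr\xc3\xbc\xc3\x9fe", the sequence \x9fe would be parsed as one escape whose value does not fit in a char, which compilers typically reject or warn about.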