March 26, 2008

_bstr_t and Converting to Unicode (C++)

I am trying the following code and it does not convert to unicode. Can you help?

    HRESULT hr;
    IChilkatCharset2Ptr cs;
    char dest[1024];
    hr = ::CoInitialize(NULL);
    hr = cs.CreateInstance("Chilkat.Charset2");
    if (FAILED(hr))
        return FALSE;
    hr = cs->UnlockComponent("AnythingWorksFor30DayTrial");
    cs->FromCharset = "iso-8859-1";
    cs->ToCharset = "utf-8";
    _bstr_t indata = _bstr_t("Din Saveme konto behøver din opmærksomhed");
    _variant_t v = indata;
    _bstr_t outdata = cs->ConvertToUnicode(v);

A _bstr_t is an object that wraps a Unicode string, stored in memory as ucs-2 (2 bytes/char). This line of code is where an implicit ANSI-to-Unicode conversion happens:

_bstr_t indata = _bstr_t("Din Saveme konto behøver din opmærksomhed");

Assuming you saved your C++ source file using the ANSI charset, the compiler generated code to initialize the _bstr_t from an ANSI string. At this point, there is nothing more to do. You already have Unicode.
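To make the implicit conversion concrete, here is a minimal portable sketch of what widening an ANSI string to 2-byte characters amounts to. It assumes the ANSI charset is iso-8859-1, where every byte value equals its Unicode code point. The helper name widen_latin1 is hypothetical and is not part of _bstr_t or Chilkat. On Windows the real conversion depends on the active ANSI code page, so treat this as an illustration only.

```cpp
#include <string>

// Hypothetical helper (assumption: the ANSI bytes are iso-8859-1).
// Each Latin-1 byte value is identical to its Unicode code point,
// so widening the byte to a 16-bit code unit is the whole conversion.
std::u16string widen_latin1(const std::string& ansi) {
    std::u16string wide;
    wide.reserve(ansi.size());
    for (unsigned char byte : ansi) {
        wide.push_back(static_cast<char16_t>(byte));
    }
    return wide;
}
```

For example, the Latin-1 bytes of "behøver" (where ø is the single byte 0xF8) widen to seven 16-bit characters, with ø becoming U+00F8.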

(If you saved your C++ source file in a non-ANSI charset such as utf-8, the compiler still generates code to convert from ANSI to Unicode, but because the bytes are actually utf-8, they will not be interpreted correctly. Utf-8 is a multi-byte encoding of Unicode. See Charset 101.)
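The following portable sketch illustrates why utf-8 bytes come out wrong when they are treated as ANSI. It assumes a Western ANSI charset (iso-8859-1) and handles only well-formed utf-8 sequences of one to three bytes. Both helper names are hypothetical and exist only for this illustration.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: treat every byte as one iso-8859-1 character.
// This mimics an ANSI-to-Unicode conversion applied to the wrong input.
std::u16string misread_as_latin1(const std::string& bytes) {
    std::u16string out;
    for (unsigned char byte : bytes) {
        out.push_back(static_cast<char16_t>(byte));
    }
    return out;
}

// Hypothetical helper: decode utf-8 sequences of 1 to 3 bytes.
// Anything else (4-byte or truncated sequences) becomes U+FFFD.
// Continuation bytes are not validated; this is a sketch only.
std::u16string decode_utf8_bmp(const std::string& utf8) {
    std::u16string out;
    std::size_t i = 0;
    while (i < utf8.size()) {
        const unsigned char b0 = static_cast<unsigned char>(utf8[i]);
        if (b0 < 0x80) {
            // Single byte: plain us-ascii.
            out.push_back(static_cast<char16_t>(b0));
            i += 1;
        } else if ((b0 & 0xE0) == 0xC0 && i + 1 < utf8.size()) {
            // Two bytes: 110xxxxx 10xxxxxx
            const unsigned char b1 = static_cast<unsigned char>(utf8[i + 1]);
            out.push_back(static_cast<char16_t>(((b0 & 0x1F) << 6) | (b1 & 0x3F)));
            i += 2;
        } else if ((b0 & 0xF0) == 0xE0 && i + 2 < utf8.size()) {
            // Three bytes: 1110xxxx 10xxxxxx 10xxxxxx
            const unsigned char b1 = static_cast<unsigned char>(utf8[i + 1]);
            const unsigned char b2 = static_cast<unsigned char>(utf8[i + 2]);
            out.push_back(static_cast<char16_t>(
                ((b0 & 0x0F) << 12) | ((b1 & 0x3F) << 6) | (b2 & 0x3F)));
            i += 3;
        } else {
            out.push_back(static_cast<char16_t>(0xFFFD));
            i += 1;
        }
    }
    return out;
}
```

In utf-8, ø is the two bytes 0xC3 0xB8. Decoded as utf-8 they produce the single character U+00F8, but read as ANSI they produce two characters, U+00C3 and U+00B8, which display as "Ã¸".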

Let’s say you still want to call ConvertToUnicode. What would the code look like?

    cs->FromCharset = "ucs-2";
    // The ToCharset does not apply when calling ConvertToUnicode, so this is not necessary:
    //cs->ToCharset = "utf-8";
    // The _bstr_t contains Unicode (ucs-2), so the FromCharset (above) is ucs-2
    _bstr_t indata = _bstr_t("Din Saveme konto behøver din opmærksomhed");
    // The _variant_t now contains the _bstr_t, but the string itself is unchanged.
    _variant_t v = indata;
    // We're now calling ConvertToUnicode -- converting from ucs-2 to ucs-2.
    // Internally, ConvertToUnicode is just making a copy of the string (no conversion necessary)
    _bstr_t outdata = cs->ConvertToUnicode(v);

