March 26, 2008

_bstr_t and Converting to Unicode (C++)

Question:
I am trying the following code and it does not convert to unicode. Can you help?

    HRESULT hr;
    IChilkatCharset2Ptr cs;
    char dest[1024];

    hr = ::CoInitialize(NULL);
    hr = cs.CreateInstance("Chilkat.Charset2");
    if (FAILED(hr))
        return FALSE;

    hr = cs->UnlockComponent("AnythingWorksFor30DayTrial");

    cs->FromCharset = "iso-8859-1";
    cs->ToCharset = "utf-8";
    _bstr_t indata = _bstr_t("Din Saveme konto behøver din opmærksomhed");
    _variant_t v = indata;
    _bstr_t outdata = cs->ConvertToUnicode(v);

    CoUninitialize();

Answer:
A _bstr_t is an object that contains a Unicode string. It stores the string in memory as Unicode (ucs-2, 2 bytes per character). The implicit ANSI-to-Unicode conversion happens on this line of code:

_bstr_t indata = _bstr_t("Din Saveme konto behøver din opmærksomhed");

Assuming you saved your C++ source file using the ANSI charset, the string literal is compiled as ANSI bytes, and the _bstr_t is initialized by converting those ANSI bytes to Unicode. At this point, there is nothing more to do. You already have Unicode.
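
To make the implicit conversion concrete, here is a minimal, portable sketch of what an ANSI-to-Unicode widening amounts to when the ANSI bytes happen to be iso-8859-1. It does not use _bstr_t or any Windows or Chilkat API. The names latin1_to_ucs2 and demo_latin1 are hypothetical, invented for this illustration, and the real conversion uses the system ANSI code page, which may differ from iso-8859-1 for some byte values.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper (not a Chilkat or Microsoft API).
// Every iso-8859-1 byte value equals its Unicode code point, so converting
// to 16-bit characters is a plain zero-extension of each byte.
std::u16string latin1_to_ucs2(const std::string& bytes) {
    std::u16string out;
    out.reserve(bytes.size());
    for (unsigned char b : bytes) {
        out.push_back(static_cast<char16_t>(b));
    }
    return out;
}

// Usage sketch: the byte 0xF8 is "ø" in iso-8859-1 and becomes U+00F8.
void demo_latin1() {
    assert(latin1_to_ucs2("beh\xF8ver") == u"beh\u00F8ver");
}
```

Once the bytes have been widened this way, the string is already Unicode, which is why a further "convert to Unicode" step has nothing left to do.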

(If you saved your C++ source file in a non-ANSI charset, such as utf-8, then the compiler still generates code to convert from ANSI to Unicode, but since the bytes are utf-8, they will not be interpreted correctly. Utf-8 is a multi-byte encoding of Unicode; see Charset 101.)
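
The misinterpretation can be shown with a small, portable sketch. It assumes the ANSI charset behaves like iso-8859-1 for the bytes involved, and the helper names (widen_as_latin1, decode_utf8, demo_utf8_misread) are hypothetical, not part of any Chilkat or Microsoft API. The decoder only handles 1-byte and 2-byte utf-8 sequences, which is enough for the Danish characters in the question.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper: treats every byte as one single-byte character,
// which is what an ANSI-to-Unicode conversion does with its input.
std::u16string widen_as_latin1(const std::string& bytes) {
    std::u16string out;
    for (unsigned char b : bytes) {
        out.push_back(static_cast<char16_t>(b));
    }
    return out;
}

// Hypothetical helper: decodes 1-byte and 2-byte utf-8 sequences only.
// Anything else is replaced with U+FFFD to keep the sketch short.
std::u16string decode_utf8(const std::string& bytes) {
    std::u16string out;
    std::size_t i = 0;
    while (i < bytes.size()) {
        unsigned char b0 = static_cast<unsigned char>(bytes[i]);
        if (b0 < 0x80) {
            out.push_back(static_cast<char16_t>(b0));
            i += 1;
        } else if ((b0 & 0xE0) == 0xC0 && i + 1 < bytes.size()) {
            unsigned char b1 = static_cast<unsigned char>(bytes[i + 1]);
            out.push_back(static_cast<char16_t>(((b0 & 0x1F) << 6) | (b1 & 0x3F)));
            i += 2;
        } else {
            out.push_back(u'\uFFFD');
            i += 1;
        }
    }
    return out;
}

// Usage sketch: "ø" saved as utf-8 is the two bytes 0xC3 0xB8.
void demo_utf8_misread() {
    const std::string utf8 = "\xC3\xB8";
    assert(decode_utf8(utf8) == u"\u00F8");             // correct: one character
    assert(widen_as_latin1(utf8) == u"\u00C3\u00B8");   // wrong: two characters
}
```

Read as single-byte characters, the one utf-8 character turns into two unrelated characters, which is the garbling described above.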

Let’s say you still want to call ConvertToUnicode. What would the code look like?

    cs->FromCharset = "ucs-2";
    // The ToCharset does not apply when calling ConvertToUnicode, so this is not necessary:
    //cs->ToCharset = "utf-8";
    // The _bstr_t contains Unicode (ucs-2), so the FromCharset (above) is ucs-2
    _bstr_t indata = _bstr_t("Din Saveme konto behøver din opmærksomhed");
    // The _variant_t now contains the _bstr_t, but nothing has changed.
    _variant_t v = indata;
    // We're now calling ConvertToUnicode -- converting from ucs-2 to ucs-2.
    // Internally, ConvertToUnicode is just making a copy of the string (no conversion necessary)
    _bstr_t outdata = cs->ConvertToUnicode(v);
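
The original code set ToCharset to "utf-8", which suggests the goal may have been utf-8 bytes rather than a Unicode _bstr_t. As a rough, portable illustration of what that encoding step involves, here is a minimal sketch that encodes ucs-2 characters as utf-8. It is not the Chilkat API: ucs2_to_utf8 and demo_to_utf8 are hypothetical names, and the sketch assumes the input contains no surrogate pairs (true ucs-2 has none).

```cpp
#include <cassert>
#include <string>

// Hypothetical helper (not a Chilkat or Microsoft API).
// Encodes each 16-bit character as 1, 2, or 3 utf-8 bytes.
// Assumption: the input is plain ucs-2 with no surrogate pairs.
std::string ucs2_to_utf8(const std::u16string& in) {
    std::string out;
    for (char16_t c : in) {
        if (c < 0x80) {
            out.push_back(static_cast<char>(c));
        } else if (c < 0x800) {
            out.push_back(static_cast<char>(0xC0 | (c >> 6)));
            out.push_back(static_cast<char>(0x80 | (c & 0x3F)));
        } else {
            out.push_back(static_cast<char>(0xE0 | (c >> 12)));
            out.push_back(static_cast<char>(0x80 | ((c >> 6) & 0x3F)));
            out.push_back(static_cast<char>(0x80 | (c & 0x3F)));
        }
    }
    return out;
}

// Usage sketch: U+00F8 ("ø") becomes the two bytes 0xC3 0xB8.
void demo_to_utf8() {
    assert(ucs2_to_utf8(u"beh\u00F8ver") == "beh\xC3\xB8ver");
}
```

The result is a byte string, not a Unicode string, so it belongs in a byte container rather than in another _bstr_t.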


Copyright 2000-2008 Chilkat Software, Inc. All rights reserved.