How Character Encoding (Charset) Affects RSA Operations

When using RSA for encryption, decryption, signing, or signature validation, the character encoding (charset) of the text matters significantly because RSA operations work on raw bytes, not text characters.


Text to Bytes Conversion

Before any RSA operation, text needs to be converted to bytes using a specific character encoding:

* UTF-8: The most common, variable-length encoding.

* UTF-16: Variable-length encoding (2 or 4 bytes per character), used internally by Windows.

* ISO-8859-1: Single-byte per character, primarily for Western European languages.

# Example: Text to bytes conversion
message = "Hello, RSA!"
utf8_bytes = message.encode('utf-8')
utf16_bytes = message.encode('utf-16')
print(utf8_bytes)   # b'Hello, RSA!'
print(utf16_bytes)  # b'\xff\xfeH\x00e\x00l\x00l\x00o\x00,\x00 \x00R\x00S\x00A\x00!\x00' (BOM first, on a little-endian platform)

RSA Encryption and Charset

When encrypting text:

  1. The plaintext is converted to bytes using a chosen charset.
  2. RSA then encrypts those bytes using the public key.
  3. The resulting ciphertext is raw bytes, usually Base64-encoded for transport.
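
The three steps above can be sketched with nothing but the standard library. This is a toy illustration, not a usable implementation: the key is built from two well-known Mersenne primes, there is no padding (real systems use OAEP or similar), and the names `encrypt_text` and `decrypt_to_bytes` are made up for this example.

```python
import base64

# Toy key for illustration only: special-form primes and no padding. NOT secure.
p = 2**521 - 1
q = 2**607 - 1
n = p * q
e = 65537
d = pow(e, -1, (p - 1) * (q - 1))  # private exponent (needs Python 3.8+)

def encrypt_text(text, charset="utf-8"):
    data = text.encode(charset)                    # step 1: text -> bytes
    m = int.from_bytes(data, "big")
    if m >= n:
        raise ValueError("message too long for this modulus")
    c = pow(m, e, n)                               # step 2: textbook RSA, public key
    raw = c.to_bytes((n.bit_length() + 7) // 8, "big")
    return base64.b64encode(raw).decode("ascii")   # step 3: Base64 for transport

def decrypt_to_bytes(token):
    c = int.from_bytes(base64.b64decode(token), "big")
    m = pow(c, d, n)                               # private-key operation
    # Leading zero bytes would be lost here; real padding schemes avoid that.
    return m.to_bytes((m.bit_length() + 7) // 8, "big")

token = encrypt_text("Hello, RSA!")
recovered = decrypt_to_bytes(token)
print(recovered.decode("utf-8"))  # Hello, RSA!
```

Note that the charset is chosen before RSA ever sees the data: encrypting the same text as UTF-16 would feed different bytes into step 2 and yield a different ciphertext.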

RSA Decryption and Charset

When decrypting:

  1. The encrypted bytes are decrypted back to raw bytes.
  2. These bytes need to be converted back to text using the same charset used during encryption.

If you decode with a different charset (e.g., UTF-16 instead of UTF-8), the resulting text will be garbled or the decoding step will raise an error.
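
A small sketch of both failure modes, using plain bytes as a stand-in for what an RSA decryption would return (the RSA step itself is omitted here, since it hands back exactly the bytes that were encrypted). The sample string "café" is an arbitrary choice for illustration.

```python
# Stand-in for decrypted bytes; the sender is assumed to have used UTF-8.
decrypted = "café".encode("utf-8")

correct = decrypted.decode("utf-8")
garbled = decrypted.decode("iso-8859-1")  # wrong charset: no error, wrong text

print(correct)  # café
print(garbled)  # cafÃ©

# Decoding as UTF-16 fails outright here: 5 bytes cannot form whole 2-byte units.
failed = False
try:
    decrypted.decode("utf-16")
except UnicodeDecodeError:
    failed = True
print(failed)  # True
```

The silent case is the more dangerous one, because nothing signals that the text is wrong.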


RSA Signing and Charset

When signing a message:

  1. The message text is converted to bytes.
  2. The byte representation is hashed (e.g., SHA-256).
  3. The hash is signed with the private key.

If the signer and the verifier use charsets that produce different bytes for the message, the hash will be different:

* The UTF-8 encoding of "Hello, RSA!" hashes differently from its UTF-16 encoding.
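
A quick check with `hashlib` makes this concrete. The comparison with ISO-8859-1 is added here as an extra illustration: for pure ASCII text it produces the same bytes as UTF-8, so the mismatch only appears when the byte sequences actually differ.

```python
import hashlib

message = "Hello, RSA!"

digest_utf8 = hashlib.sha256(message.encode("utf-8")).hexdigest()
digest_utf16 = hashlib.sha256(message.encode("utf-16")).hexdigest()
digest_latin1 = hashlib.sha256(message.encode("iso-8859-1")).hexdigest()

print(digest_utf8 == digest_utf16)   # False: different bytes, different hash
print(digest_utf8 == digest_latin1)  # True: ASCII-only text encodes identically
```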


RSA Signature Verification and Charset

To verify a signature:

  1. The original message must be converted to bytes using the same encoding.
  2. The verifier hashes those bytes and uses the public key to check that the signature was produced over the same hash.

If the encoding is different, the verification fails.
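
The following sketch shows a verification succeeding and failing depending only on the charset. It uses the same kind of toy, unpadded RSA as a teaching aid: the Mersenne-prime key is not secure, real signatures use a padding scheme such as PSS or PKCS#1 v1.5, and the helper names `sign`, `verify`, and `digest_as_int` are hypothetical.

```python
import hashlib

# Toy key for illustration only: special-form primes and no padding. NOT secure.
p = 2**521 - 1
q = 2**607 - 1
n = p * q
e = 65537
d = pow(e, -1, (p - 1) * (q - 1))  # private exponent (needs Python 3.8+)

def digest_as_int(text, charset):
    # Encode the text, hash the bytes, and read the digest as an integer.
    return int.from_bytes(hashlib.sha256(text.encode(charset)).digest(), "big")

def sign(text, charset):
    # Private-key operation applied to the hash of the encoded message.
    return pow(digest_as_int(text, charset), d, n)

def verify(text, signature, charset):
    # Public-key operation recovers the signed hash; compare with a fresh hash.
    return pow(signature, e, n) == digest_as_int(text, charset)

message = "Hello, RSA!"
signature = sign(message, "utf-8")

print(verify(message, signature, "utf-8"))   # True
print(verify(message, signature, "utf-16"))  # False: same text, different bytes
```

In practice this is why protocols fix the encoding (commonly UTF-8) as part of the signature specification rather than leaving it to each side.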


Summary

| Operation    | Encoding matters? | Why                                                                    |
|--------------|-------------------|------------------------------------------------------------------------|
| Encryption   | Yes               | Byte conversion changes the raw data for encryption                    |
| Decryption   | Yes               | Must decode with the same charset used in encryption                   |
| Signing      | Yes               | Hashing is done on bytes; different charsets produce different hashes  |
| Verification | Yes               | Signature is validated against the byte-encoded message                |