How Character Encoding of URL-Encoded HTTP Query Parameters Matters

When data is sent via HTTP query parameters (the part of a URL after "?"), it is typically URL-encoded to make it safe for transmission over the internet. This encoding converts characters into a format that can be safely included in a URL.


URL Encoding Basics

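URL encoding (also called percent-encoding) replaces each unsafe character with one or more "%XX" sequences, where XX is the two-digit hexadecimal value of one byte. An ASCII character produces a single sequence; a non-ASCII character produces one sequence for every byte of its encoded form. For example: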

  • Space (" ") → "%20"
  • Exclamation mark ("!") → "%21"
  • Non-ASCII character ("✓") → "%E2%9C%93" (its three UTF-8 bytes)

Original: Hello World!
URL Encoded: Hello%20World%21
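
The mappings above can be reproduced with Python's standard library. This is a minimal sketch; the variable names are illustrative only, and quote() is relied on here for its default of UTF-8 when given a str.

```python
from urllib.parse import quote, unquote

original = "Hello World!"

# quote() percent-encodes everything except letters, digits and "_.-~/"
encoded = quote(original)

# unquote() reverses the transformation
decoded = unquote(encoded)

# A non-ASCII character yields one %XX sequence per UTF-8 byte
check_mark = quote("✓")

print(encoded)     # Hello%20World%21
print(decoded)     # Hello World!
print(check_mark)  # %E2%9C%93
```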

Character Encoding Matters

The character encoding (charset) used to transform text into bytes directly affects the URL encoding:

  • UTF-8: Multi-byte encoding, most common for web applications.
  • ISO-8859-1 (Latin-1): Single-byte encoding, sometimes used in older systems.
  • UTF-16: Rare in URLs, but possible; every character takes at least two bytes, so the URL-encoded values are larger.

Example: URL encoding the word "café" in Python

from urllib.parse import quote

# UTF-8 encoding: "é" becomes two bytes (C3 A9)
utf8_encoded = quote("café".encode('utf-8'))
print(utf8_encoded)  # Output: caf%C3%A9

# ISO-8859-1 encoding: "é" becomes one byte (E9)
iso_encoded = quote("café".encode('iso-8859-1'))
print(iso_encoded)   # Output: caf%E9

  • UTF-8 → "café" becomes "caf%C3%A9" (two bytes for "é": "C3 A9")
  • ISO-8859-1 → "café" becomes "caf%E9" (one byte for "é": "E9")

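If the server expects UTF-8 but receives ISO-8859-1, it will misinterpret the bytes: the lone byte E9 is not a valid UTF-8 sequence, so the "é" is typically rejected or replaced with a substitute character.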
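
The effect of a mismatch can be seen by decoding the two "café" strings with the wrong charset. This is a small sketch using urllib.parse.unquote; the variable names are made up for illustration, and the substitute character appears because unquote() uses errors="replace" by default.

```python
from urllib.parse import unquote

# ISO-8859-1 bytes read as UTF-8: E9 is invalid, so it becomes U+FFFD
latin1_as_utf8 = unquote("caf%E9", encoding="utf-8")

# UTF-8 bytes read as ISO-8859-1: each byte turns into its own character
utf8_as_latin1 = unquote("caf%C3%A9", encoding="iso-8859-1")

# Matching charsets round-trip cleanly
matched = unquote("caf%C3%A9", encoding="utf-8")

print(latin1_as_utf8)  # caf\ufffd (shown as a replacement character)
print(utf8_as_latin1)  # cafÃ©
print(matched)         # café
```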


UTF-8 is the Typical Charset for Query Parameters

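RFC 3986 (the Uniform Resource Identifier specification) does not mandate a single charset for percent-encoded data, but it recommends that new URI schemes convert text to UTF-8 before percent-encoding, and UTF-8 has become the de facto standard on the web. AJAX requests built with JavaScript's encodeURIComponent or URLSearchParams always use UTF-8, and browsers (Chrome, Firefox, Safari) use UTF-8 for form submissions and URL parameters on pages that are themselves served as UTF-8. APIs and RESTful services generally expect UTF-8 unless specified otherwise.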
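
As an illustration of UTF-8 as the default, Python's urllib.parse builds and parses query strings in UTF-8 unless told otherwise. The parameter names "q" and "lang" below are hypothetical. Note that urlencode() writes spaces as "+" (the form-encoding convention) rather than "%20".

```python
from urllib.parse import urlencode, parse_qs

# Hypothetical parameters for illustration
params = {"q": "café ✓", "lang": "fr"}

# urlencode() uses UTF-8 by default for str values
query = urlencode(params)

# parse_qs() also decodes as UTF-8 by default
parsed = parse_qs(query)

print(query)   # q=caf%C3%A9+%E2%9C%93&lang=fr
print(parsed)  # {'q': ['café ✓'], 'lang': ['fr']}
```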


Problems if the Charset Doesn't Match

If the server expects UTF-8 but receives a different encoding, non-ASCII characters may be corrupted, while plain ASCII text usually survives because most charsets encode it identically. Symbols like "€", "✓", or non-Latin characters could be misread. For example, an API expecting "café" in UTF-8 will see garbled text if the client sends it in ISO-8859-1.
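
One way a server might guard against this is to decode strictly and reject values that are not valid UTF-8, rather than silently accepting corrupted text. The sketch below is one possible approach, not a standard recipe; decode_query_value is a hypothetical helper, and it does not translate "+" into a space.

```python
from urllib.parse import unquote_to_bytes

def decode_query_value(raw: str) -> str:
    """Percent-decode raw and require the resulting bytes to be valid UTF-8."""
    data = unquote_to_bytes(raw)
    try:
        return data.decode("utf-8")  # strict: raises on invalid sequences
    except UnicodeDecodeError as exc:
        raise ValueError(f"query value is not valid UTF-8: {raw!r}") from exc

print(decode_query_value("caf%C3%A9"))  # café

try:
    decode_query_value("caf%E9")  # ISO-8859-1 bytes
except ValueError as err:
    print("rejected:", err)
```

Rejecting the request makes the mismatch visible to the client instead of storing garbled data.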


Specifying Character Encoding

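A URL has no field for declaring the charset of its query string, so client and server must agree on it in advance (for example, in the API documentation). When URL-encoded data is sent in the request body instead, as in an HTML form POST, a different charset is sometimes declared in the Content-Type header, although not every server honors the parameter:

Content-Type: application/x-www-form-urlencoded; charset=ISO-8859-1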
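
When both sides have agreed on a non-UTF-8 charset, the same charset has to be passed explicitly when encoding and decoding. A minimal sketch with urllib.parse follows; LEGACY_CHARSET and the "name" parameter are assumptions made for the example.

```python
from urllib.parse import urlencode, parse_qs

# Assumption: client and server have agreed on this charset out of band
LEGACY_CHARSET = "iso-8859-1"

# Client side: encode the value with the agreed charset
query = urlencode({"name": "café"}, encoding=LEGACY_CHARSET)

# Server side: decode with the same charset
parsed = parse_qs(query, encoding=LEGACY_CHARSET)

print(query)   # name=caf%E9
print(parsed)  # {'name': ['café']}
```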