Toolcroft

String Byte-Size Counter - UTF-8, UTF-16, and UTF-32 Lengths

Count how many bytes a string occupies in UTF-8, UTF-16 LE/BE, and UTF-32 encodings. See which characters are multi-byte and their individual byte sizes.

| Encoding | Count |
|---|---|
| JS .length (code units) | 16 |
| Unicode code points | 15 |
| UTF-8 bytes | 18 |
| UTF-16 LE bytes | 32 |
| UTF-16 BE bytes | 32 |
| UTF-32 bytes | 60 |

Multi-byte Characters

| Char | Codepoint | UTF-8 bytes | UTF-16 bytes |
|---|---|---|---|
| 🌍 | U+1F30D | 4 | 4 |
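
A per-character breakdown like the one above can be approximated in a few lines of Python. This is a minimal sketch, not the tool's actual implementation: the function name `multibyte_chars` and the sample string are assumptions, and it lists every occurrence of a multi-byte character rather than deduplicating.

```python
def multibyte_chars(text: str) -> list[tuple[str, str, int, int]]:
    """List (char, code point, UTF-8 bytes, UTF-16 bytes) for each non-ASCII character."""
    rows = []
    for ch in text:  # iterating a Python str yields one code point at a time
        utf8_len = len(ch.encode("utf-8"))
        if utf8_len > 1:  # more than one UTF-8 byte means the character is not ASCII
            utf16_len = len(ch.encode("utf-16-le"))  # "-le" codec adds no byte-order mark
            rows.append((ch, f"U+{ord(ch):04X}", utf8_len, utf16_len))
    return rows


# Assumed sample input: 14 ASCII characters followed by one emoji.
for row in multibyte_chars("Hello, World! \U0001F30D"):
    print(row)
```

For the assumed sample, the only row reported is the globe emoji, U+1F30D, at 4 bytes in both UTF-8 and UTF-16.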

Understanding String Byte Sizes

Different text encodings represent the same string using different numbers of bytes. An ASCII character takes 1 byte in UTF-8, 2 bytes in UTF-16, and 4 bytes in UTF-32, while an emoji outside the Basic Multilingual Plane takes 4 bytes in all three.
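
The counts can be reproduced with Python's built-in codecs. The sketch below is illustrative only: `byte_sizes` is a hypothetical helper name, and the sample string is an assumed input that happens to give the same totals as the results table at the top of this page.

```python
def byte_sizes(text: str) -> dict[str, int]:
    """Return length and byte counts of `text` in several encodings."""
    utf16 = text.encode("utf-16-le")  # the -le/-be codecs do not prepend a BOM
    return {
        "utf16_code_units": len(utf16) // 2,  # what JavaScript's .length reports
        "code_points": len(text),             # Python str length counts code points
        "utf8_bytes": len(text.encode("utf-8")),
        "utf16_bytes": len(utf16),            # identical for LE and BE byte order
        "utf32_bytes": len(text.encode("utf-32-le")),
    }


sample = "Hello, World! \U0001F30D"  # assumed sample input
print(byte_sizes(sample))
```

For this sample the result is 16 code units, 15 code points, 18 UTF-8 bytes, 32 UTF-16 bytes, and 60 UTF-32 bytes. The emoji accounts for the gap between code units and code points because it is stored as a surrogate pair in UTF-16.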

Why This Matters

Database column widths, network payload sizes, and file storage estimates all depend on the encoding used. This tool lets you quickly check the exact byte footprint of any string.

Encoding comparison table

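| Character type | Example | UTF-8 bytes | UTF-16 bytes | UTF-32 bytes |
|---|---|---|---|---|
| ASCII letter | A | 1 | 2 | 4 |
| Extended Latin | é | 2 | 2 | 4 |
| CJK character |  | 3 | 2 | 4 |
| Emoji (BMP) |  | 3 | 2 | 4 |
| Emoji (surrogate pair) | 😀 | 4 | 4 | 4 |
| ZWJ family emoji | 👨‍👩‍👧 | 18 | 16 | 20 |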
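
The rows of the comparison table can be checked directly. In the sketch below, the CJK and BMP emoji characters are assumed examples chosen for illustration, and `sizes` is a hypothetical helper. The ZWJ family emoji is a sequence of five code points (three emoji joined by two U+200D zero-width joiners), so its size is the sum of its parts: 4+3+4+3+4 = 18 bytes in UTF-8, 8 code units = 16 bytes in UTF-16, and 5 × 4 = 20 bytes in UTF-32.

```python
def sizes(text: str) -> tuple[int, int, int]:
    """Return (UTF-8, UTF-16, UTF-32) byte counts, without a byte-order mark."""
    return (
        len(text.encode("utf-8")),
        len(text.encode("utf-16-le")),
        len(text.encode("utf-32-le")),
    )


examples = {
    "ASCII letter": "A",
    "Extended Latin": "\u00E9",              # é, precomposed form
    "CJK character": "\u4E2D",               # assumed example (U+4E2D)
    "Emoji (BMP)": "\u2600",                 # assumed example (U+2600)
    "Emoji (surrogate pair)": "\U0001F600",  # 😀
    # man + ZWJ + woman + ZWJ + girl: five code points rendered as one glyph
    "ZWJ family emoji": "\U0001F468\u200D\U0001F469\u200D\U0001F467",
}

for label, text in examples.items():
    print(label, sizes(text))
```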

Database column sizing

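  • MySQL VARCHAR(255) with utf8mb4: each character can take up to 4 bytes, so the column can hold up to 1,020 bytes of data (plus a 2-byte length prefix). MySQL also limits a row to 65,535 bytes across all columns, which caps how many such columns fit in one table. Emoji require utf8mb4 - the older utf8 (utf8mb3) charset only supports characters of up to 3 bytes; inserting an emoji fails with an error in strict SQL mode and truncates the value at the emoji, with a warning, in non-strict mode.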
  • PostgreSQL TEXT: stored in the database encoding, which is UTF-8 in most installations; a single value can be up to 1 GB. Character-length limits (VARCHAR(n)) count characters (Unicode code points in a UTF-8 database), not bytes.
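  • SQL Server NVARCHAR: stores UTF-16LE, 2 bytes per BMP character. NVARCHAR(255) holds up to 255 UTF-16 code units (510 bytes of data); as a variable-length type it uses only the space the value needs, plus 2 bytes of overhead. Supplementary characters (surrogate pairs) consume 4 bytes each and count as two units toward the limit; a supplementary character (_SC) collation is needed for string functions such as LEN to treat them as single characters.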
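
A pre-insert check along these lines can be sketched as follows. `column_report` is a hypothetical helper, not part of any database driver, and it assumes that MySQL and PostgreSQL limits count code points while SQL Server's NVARCHAR(n) counts UTF-16 code units.

```python
def column_report(text: str, max_chars: int = 255) -> dict[str, object]:
    """Summarize how `text` relates to a character-limited column (hypothetical helper)."""
    code_points = len(text)
    utf16_units = len(text.encode("utf-16-le")) // 2
    return {
        # MySQL utf8mb4 and PostgreSQL VARCHAR(n) limits count characters (code points)
        "fits_by_code_points": code_points <= max_chars,
        # SQL Server NVARCHAR(n) counts UTF-16 code units, so surrogate pairs cost two
        "fits_by_utf16_units": utf16_units <= max_chars,
        # Code points above U+FFFF need 4 bytes in UTF-8, hence utf8mb4 in MySQL
        "needs_utf8mb4": any(ord(ch) > 0xFFFF for ch in text),
        "utf8_bytes": len(text.encode("utf-8")),
        # Upper bound on storage if every character used 4 bytes
        "worst_case_utf8_bytes": 4 * max_chars,
    }


# 254 ASCII letters plus one emoji: 255 code points but 256 UTF-16 code units.
print(column_report("a" * 254 + "\U0001F600"))
```

The example string fits a 255-character limit counted in code points but not one counted in UTF-16 code units, which is the kind of off-by-one that only shows up once users start entering emoji.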

Network payload notes

  • HTTP/2 header compression (HPACK): header values are transmitted as byte strings, optionally Huffman-coded, so size limits apply to bytes rather than characters. Large headers (e.g., JWT tokens in Authorization headers) add up; a typical JWT is 300–1,500 bytes.
  • Content-Length header: this value must match the byte length of the body as encoded on the wire (usually UTF-8), not the character count. A 100-character string of CJK characters is 300 bytes in UTF-8.
  • JWT size limits: AWS API Gateway and many proxies enforce header size limits on the order of 8–10 KB (exact values vary by product and configuration). Large JWTs with many claims can hit these limits, causing 431 (Request Header Fields Too Large) or 400 errors.
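
The byte-versus-character distinction in these notes can be illustrated with a short sketch. The helper names and the 8 KB default budget are assumptions for illustration; real limits depend on the server or proxy in use, and the header estimate is the uncompressed HTTP/1.1-style size, not the HPACK-compressed size.

```python
def content_length(body: str, charset: str = "utf-8") -> int:
    """Byte length of the body as it would be sent on the wire."""
    return len(body.encode(charset))


def header_bytes(headers: dict[str, str]) -> int:
    """Rough uncompressed size, counting 'Name: value' plus CRLF for each header."""
    return sum(
        len(f"{name}: {value}\r\n".encode("utf-8"))  # header values are normally ASCII
        for name, value in headers.items()
    )


def exceeds_budget(headers: dict[str, str], limit: int = 8 * 1024) -> bool:
    """True if the headers are larger than an assumed limit (8 KB by default)."""
    return header_bytes(headers) > limit


body = "\u4E2D" * 100  # 100 CJK characters (assumed example character)
print(len(body), content_length(body))

token = "x" * 1000  # placeholder standing in for a JWT
print(header_bytes({"Authorization": "Bearer " + token}))
```

The 100-character body encodes to 300 bytes in UTF-8, so a Content-Length of 100 would be wrong. The same approach gives a quick estimate of how much of a header budget a bearer token consumes.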