About UTF-8 Encoding

UTF-8 (Unicode Transformation Format, 8-bit) is a variable-width character encoding that uses 1 to 4 bytes per character. It is backward compatible with ASCII and is the dominant encoding on the web. Rather than showing Unicode code points, this tool outputs the actual UTF-8 byte values: the bytes stored in memory and transmitted over networks.

✓ Real UTF-8 byte encoding
✓ Unicode & emoji support
✓ Decimal & hexadecimal formats
✓ Live byte count tracking

Complete Guide to UTF-8 Encoding

Free Online Text to UTF-8 Byte Converter

Convert text to actual UTF-8 byte values or decode UTF-8 bytes back to text instantly. This tool shows the real bytes that computers use to store and transmit text, not just code points. Perfect for developers, network engineers, and anyone debugging character encoding issues.

Key Features

Text to UTF-8 Encoding

  • Convert any text to UTF-8 bytes
  • Full Unicode character support
  • Handles emojis and special symbols
  • Real-time conversion as you type
  • Decimal and hex byte output

🔓 UTF-8 to Text Decoding

  • Decode UTF-8 bytes to readable text
  • Validates byte sequences
  • Error detection & messages
  • Handles space/comma separators
  • Supports hex input (0xFF format)
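The decoding behaviour listed above could be approximated as follows. This is only a sketch, not this tool's actual implementation (which runs in JavaScript in the browser). The function names `parse_byte_tokens` and `decode_utf8` are hypothetical, and treating tokens without a `0x` prefix as decimal is an assumption.

```python
import re


def parse_byte_tokens(raw: str) -> bytes:
    """Parse byte values separated by spaces and/or commas.

    Assumption: tokens prefixed with 0x are hexadecimal, all others decimal.
    """
    tokens = [t for t in re.split(r"[\s,]+", raw.strip()) if t]
    values = []
    for token in tokens:
        base = 16 if token.lower().startswith("0x") else 10
        value = int(token, base)  # raises ValueError for non-numeric tokens
        if not 0 <= value <= 255:
            raise ValueError(f"{token!r} is not a byte value (0-255)")
        values.append(value)
    return bytes(values)


def decode_utf8(raw: str) -> str:
    """Decode a textual byte list, reporting invalid UTF-8 as ValueError."""
    data = parse_byte_tokens(raw)
    try:
        # The default strict mode rejects truncated or malformed sequences.
        return data.decode("utf-8")
    except UnicodeDecodeError as exc:
        raise ValueError(
            f"invalid UTF-8 at byte offset {exc.start}: {exc.reason}"
        ) from exc


if __name__ == "__main__":
    print(decode_utf8("226 130 172"))       # €
    print(decode_utf8("0xE2, 0x82, 0xAC"))  # €
```

Strict decoding is used here so that a truncated sequence such as a lone `0xC3` produces an error message rather than a silent replacement character.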

⚡ Real-Time Processing

  • Instant conversion on input
  • 300ms debounce for performance
  • Live byte count display
  • No button clicks required

💾 Export Options

  • Download as .txt file
  • Export as .html file
  • Save as .json format
  • One-click copy to clipboard

What is UTF-8?

UTF-8 (Unicode Transformation Format, 8-bit) is a variable-width character encoding that can represent every character in the Unicode standard. It is backward compatible with ASCII (the first 128 characters are encoded identically) and uses 1 to 4 bytes per character. UTF-8 is now the dominant character encoding on the web and supports all languages, symbols, and emojis.

UTF-8 Byte Ranges:

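1 byte (0x00-0x7F): Basic ASCII characters (A, B, 0-9, etc.)

2 bytes (0xC2-0xDF + 0x80-0xBF): Latin extended, Greek, Cyrillic, Arabic, Hebrew. The lead bytes 0xC0 and 0xC1 would only produce overlong encodings and are never valid.

3 bytes (0xE0-0xEF + 2×0x80-0xBF): Most Asian languages (Chinese, Japanese, Korean), symbols

4 bytes (0xF0-0xF4 + 3×0x80-0xBF): Less common and historic scripts, musical notation, emojis. Lead bytes above 0xF4 would encode values beyond U+10FFFF and are never valid.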
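As a rough illustration of these ranges, the lead byte alone tells a decoder how long the sequence is. The sketch below assumes the strict lead-byte ranges of the current standard (0xC2-0xDF for 2 bytes, 0xF0-0xF4 for 4 bytes), since 0xC0, 0xC1, and 0xF5 and above never occur in valid UTF-8. The function names are illustrative only.

```python
def sequence_length(lead: int) -> int:
    """Return how many bytes a UTF-8 sequence has, judged by its lead byte."""
    if 0x00 <= lead <= 0x7F:
        return 1  # plain ASCII
    if 0xC2 <= lead <= 0xDF:
        return 2
    if 0xE0 <= lead <= 0xEF:
        return 3
    if 0xF0 <= lead <= 0xF4:
        return 4
    # 0x80-0xBF are continuation bytes; 0xC0, 0xC1, 0xF5-0xFF are never valid.
    raise ValueError(f"0x{lead:02X} cannot start a UTF-8 sequence")


def is_continuation(byte: int) -> bool:
    """Continuation bytes always have the bit pattern 10xxxxxx."""
    return 0x80 <= byte <= 0xBF


if __name__ == "__main__":
    for lead in (0x41, 0xC3, 0xE2, 0xF0):
        print(f"0x{lead:02X} starts a {sequence_length(lead)}-byte sequence")
```

This check covers lead bytes only. A full validator would also reject a few specific second bytes (for example, those that produce overlong or surrogate values).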

UTF-8 vs Code Points

Many tools claim to show “UTF-8” but actually show Unicode code points (the abstract number assigned to each character). This tool shows the actual UTF-8 bytes: the real data stored in files and sent over networks. Here's the difference:

Example: “€” (Euro sign)

Code point: U+20AC (decimal: 8364), one number

UTF-8 bytes: 0xE2 0x82 0xAC (decimal: 226 130 172), three bytes

Example: “😀” (Grinning face)

Code point: U+1F600 (decimal: 128512), one number

UTF-8 bytes: 0xF0 0x9F 0x98 0x80 (decimal: 240 159 152 128), four bytes
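The same contrast can be reproduced in a few lines of Python, where `ord()` yields the code point and `str.encode("utf-8")` yields the bytes. The helper name `describe` is hypothetical and exists only for this sketch.

```python
def describe(ch: str) -> dict:
    """Show one character both as a code point and as UTF-8 bytes."""
    encoded = ch.encode("utf-8")
    return {
        "code_point": f"U+{ord(ch):04X}",  # the abstract Unicode number
        "code_point_decimal": ord(ch),
        "utf8_hex": " ".join(f"0x{b:02X}" for b in encoded),
        "utf8_decimal": list(encoded),      # the bytes actually stored
    }


if __name__ == "__main__":
    for ch in ("\u20ac", "\U0001F600"):  # Euro sign, grinning face
        print(describe(ch))
```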

How UTF-8 Encoding Works

  1. ASCII characters (U+0000 to U+007F): Encoded as a single byte, identical to ASCII. Example: 'A' → 0x41
  2. 2-byte characters (U+0080 to U+07FF): First byte has the form 110xxxxx, second 10xxxxxx. Example: 'é' → 0xC3 0xA9
  3. 3-byte characters (U+0800 to U+FFFF, excluding the surrogate range U+D800 to U+DFFF): First byte has the form 1110xxxx, followed by two 10xxxxxx bytes. Example: '€' → 0xE2 0x82 0xAC
  4. 4-byte characters (U+10000 to U+10FFFF): First byte has the form 11110xxx, followed by three 10xxxxxx bytes. Example: '😀' → 0xF0 0x9F 0x98 0x80
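The four rules above translate almost directly into bit operations. The following is a teaching sketch of a hand-written encoder; in practice Python's built-in `str.encode("utf-8")` does this work. The name `encode_code_point` is an assumption made for the example.

```python
def encode_code_point(cp: int) -> bytes:
    """Encode a single Unicode code point as UTF-8 by hand."""
    if cp < 0 or cp > 0x10FFFF or 0xD800 <= cp <= 0xDFFF:
        raise ValueError(f"U+{cp:04X} is not an encodable code point")
    if cp <= 0x7F:
        # 0xxxxxxx
        return bytes([cp])
    if cp <= 0x7FF:
        # 110xxxxx 10xxxxxx
        return bytes([0xC0 | (cp >> 6), 0x80 | (cp & 0x3F)])
    if cp <= 0xFFFF:
        # 1110xxxx 10xxxxxx 10xxxxxx
        return bytes([
            0xE0 | (cp >> 12),
            0x80 | ((cp >> 6) & 0x3F),
            0x80 | (cp & 0x3F),
        ])
    # 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
    return bytes([
        0xF0 | (cp >> 18),
        0x80 | ((cp >> 12) & 0x3F),
        0x80 | ((cp >> 6) & 0x3F),
        0x80 | (cp & 0x3F),
    ])


if __name__ == "__main__":
    for cp in (0x41, 0xE9, 0x20AC, 0x1F600):
        manual = encode_code_point(cp)
        # Cross-check against the built-in encoder.
        assert manual == chr(cp).encode("utf-8")
        print(f"U+{cp:04X} -> {manual.hex(' ').upper()}")
```

Each continuation byte carries 6 payload bits (`& 0x3F`), which is why the ranges grow from 7 bits (1 byte) to 11, 16, and 21 bits.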

Common Use Cases

Debugging Encoding Issues: See the actual bytes stored in files to diagnose mojibake, garbled text, or encoding mismatches.

Network Analysis: Verify how text is encoded when transmitted over HTTP, WebSocket, or other protocols.

Database Debugging: Check UTF-8 byte sequences stored in databases to troubleshoot character set issues.

Education: Learn how UTF-8 encoding works at the byte level and understand variable-width encoding.

File Analysis: Understand how text editors and systems store characters in UTF-8 encoded files.
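To make the debugging use case concrete: mojibake typically appears when UTF-8 bytes are decoded with a different, usually single-byte, encoding. The sketch below assumes Windows-1252 (Python's `cp1252` codec) as the mistaken encoding. Other code pages produce different garbage, and the function names are illustrative.

```python
def simulate_mojibake(text: str, wrong_encoding: str = "cp1252") -> str:
    """Encode as UTF-8, then misread the bytes using another encoding."""
    return text.encode("utf-8").decode(wrong_encoding)


def repair_mojibake(garbled: str, wrong_encoding: str = "cp1252") -> str:
    """Reverse the mistake: recover the raw bytes, then decode as UTF-8.

    This only works when every garbled character maps back to a byte in
    the wrong encoding and the resulting bytes form valid UTF-8.
    """
    return garbled.encode(wrong_encoding).decode("utf-8")


if __name__ == "__main__":
    garbled = simulate_mojibake("caf\u00e9")  # "café"
    print(garbled)                             # cafÃ©
    print(repair_mojibake(garbled))            # café
```

Seeing `Ã©` where `é` was expected is a strong hint that the two bytes 0xC3 0xA9 were interpreted one byte at a time.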

Quick Reference: UTF-8 Byte Examples

Character | UTF-8 Bytes (Hex) | Byte Count
A | 41 | 1 byte
é | C3 A9 | 2 bytes
€ | E2 82 AC | 3 bytes
中 | E4 B8 AD | 3 bytes
😀 | F0 9F 98 80 | 4 bytes
🌍 | F0 9F 8C 8D | 4 bytes

Programming Examples

Get UTF-8 Bytes in Different Languages:

JavaScript:

new TextEncoder().encode('€') // Uint8Array [226, 130, 172]
new TextDecoder().decode(new Uint8Array([226, 130, 172])) // '€'

Python:

'€'.encode('utf-8') # b'\xe2\x82\xac'
b'\xe2\x82\xac'.decode('utf-8') # '€'

Java:

"€".getBytes(StandardCharsets.UTF_8) // [-30, -126, -84] (signed)
new String(bytes, StandardCharsets.UTF_8) // "€"

🔒 100% Privacy Guaranteed

All UTF-8 encoding and decoding is performed entirely in your web browser using JavaScript. Your text and data never leave your device: nothing is uploaded to servers, stored in databases, logged, or transmitted to any third party.

Learn More About UTF-8

Want to understand how UTF-8 encoding works under the hood? Read our in-depth guide covering variable-width encoding, byte patterns, step-by-step encoding examples, and best practices.

Read: What is UTF-8?