DevBench

Unicode Checker

Text, offline-ready

Unicode Checker breaks any string into individual characters and shows each one's Unicode code point (U+XXXX), official name, general category, script, UTF-8 byte sequence, and HTML entity. It highlights invisible characters (zero-width space, soft hyphen, directional marks), homoglyph lookalikes, and unexpected scripts, which helps with security auditing and with debugging encoding issues.

Related: Inspect, Regex, Text → Hex

Your files and inputs stay in your browser — nothing is uploaded or stored.

Characters: 61
Codepoints (distinct): 32
UTF-8 bytes: 78
Non-ASCII: 8
2 suspicious characters detected — invisible formatting, directional overrides, or zero-width characters that may cause display or security issues.
Scripts detected: Latin, Han (CJK), Emoji
Glyph  Codepoint  Name
H      U+0048     LATIN CAPITAL LETTER H
e      U+0065     LATIN SMALL LETTER E
l      U+006C     LATIN SMALL LETTER L
l      U+006C     LATIN SMALL LETTER L
o      U+006F     LATIN SMALL LETTER O
,      U+002C     COMMA
SP     U+0020     SPACE
世     U+4E16     U+4E16
界     U+754C     U+754C
!      U+0021     EXCLAMATION MARK
SP     U+0020     SPACE
🌍     U+1F30D    U+1F30D
CTL    U+000A     LINE FEED (LF)
Z      U+005A     LATIN CAPITAL LETTER Z
e      U+0065     LATIN SMALL LETTER E
r      U+0072     LATIN SMALL LETTER R
o      U+006F     LATIN SMALL LETTER O
ZWSP   U+200B     Zero Width Space
W      U+0057     LATIN CAPITAL LETTER W
i      U+0069     LATIN SMALL LETTER I
d      U+0064     LATIN SMALL LETTER D
t      U+0074     LATIN SMALL LETTER T
h      U+0068     LATIN SMALL LETTER H
ZWSP   U+200B     Zero Width Space
S      U+0053     LATIN CAPITAL LETTER S
p      U+0070     LATIN SMALL LETTER P
a      U+0061     LATIN SMALL LETTER A
c      U+0063     LATIN SMALL LETTER C
e      U+0065     LATIN SMALL LETTER E
SP     U+0020     SPACE
h      U+0068     LATIN SMALL LETTER H
e      U+0065     LATIN SMALL LETTER E
r      U+0072     LATIN SMALL LETTER R
e      U+0065     LATIN SMALL LETTER E
.      U+002E     FULL STOP
CTL    U+000A     LINE FEED (LF)
S      U+0053     LATIN CAPITAL LETTER S
m      U+006D     LATIN SMALL LETTER M
a      U+0061     LATIN SMALL LETTER A
r      U+0072     LATIN SMALL LETTER R
t      U+0074     LATIN SMALL LETTER T
SP     U+0020     SPACE
“      U+201C     LEFT DOUBLE QUOTATION MARK
q      U+0071     LATIN SMALL LETTER Q
u      U+0075     LATIN SMALL LETTER U
o      U+006F     LATIN SMALL LETTER O
t      U+0074     LATIN SMALL LETTER T
e      U+0065     LATIN SMALL LETTER E
s      U+0073     LATIN SMALL LETTER S
”      U+201D     RIGHT DOUBLE QUOTATION MARK
SP     U+0020     SPACE
&      U+0026     AMPERSAND
SP     U+0020     SPACE
e      U+0065     LATIN SMALL LETTER E
m      U+006D     LATIN SMALL LETTER M
—      U+2014     EM DASH
d      U+0064     LATIN SMALL LETTER D
a      U+0061     LATIN SMALL LETTER A
s      U+0073     LATIN SMALL LETTER S
h      U+0068     LATIN SMALL LETTER H
.      U+002E     FULL STOP

Unicode Checker — inspect every character

The Unicode Checker analyses text character by character, showing for each one the Unicode code point (U+XXXX), UTF-8 byte sequence, general category (letter, digit, punctuation, symbol, etc.), script (Latin, Cyrillic, Arabic, etc.), and HTML entity. It also flags invisible and potentially dangerous characters such as zero-width joiners, bidirectional control characters (including right-to-left overrides), and the BOM (byte order mark, U+FEFF).
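
As a rough illustration of this kind of per-character breakdown, here is a minimal Python sketch built on the standard library. It is not the checker's own implementation: the function name `inspect` and the row layout are made up for this example. The script property is left out because Python's `unicodedata` module does not expose it, and falling back to the code point when a character has no name is an assumption.

```python
import unicodedata
from html.entities import codepoint2name


def inspect(text):
    """Return one dict per code point in `text` (hypothetical helper)."""
    rows = []
    for ch in text:
        cp = ord(ch)
        label = f"U+{cp:04X}"
        # Prefer a named HTML entity; otherwise use a numeric reference.
        if cp in codepoint2name:
            entity = f"&{codepoint2name[cp]};"
        else:
            entity = f"&#x{cp:X};"
        rows.append({
            "codepoint": label,
            # Control characters have no name in unicodedata, so fall
            # back to the code point label (an assumption of this sketch).
            "name": unicodedata.name(ch, label),
            "category": unicodedata.category(ch),
            "utf8": ch.encode("utf-8").hex(" ").upper(),
            "entity": entity,
        })
    return rows


if __name__ == "__main__":
    for row in inspect("H\U0001F30D&"):
        print(row)
```

For the globe emoji U+1F30D this reports category `So`, UTF-8 bytes `F0 9F 8C 8D`, and the numeric entity `&#x1F30D;`, while `&` gets the named entity `&amp;`.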

Why check for invisible Unicode characters?

Invisible Unicode characters such as zero-width spaces (U+200B), zero-width non-joiners (U+200C), soft hyphens (U+00AD), and right-to-left overrides (U+202E) are often copy-pasted from untrusted sources and can cause subtle bugs: password mismatches, URL spoofing, broken string comparisons, and hidden Trojan Source attacks in code. The checker highlights all suspicious code points in red.
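
One simple way to find such characters in your own code is to look at the Unicode general category. The sketch below treats every character in category `Cf` (format) as suspicious, which covers U+200B, U+200C, U+00AD, U+202E, and the BOM U+FEFF. This rule is an assumption made for illustration and not necessarily the checker's: it will also flag legitimate uses, for example zero-width joiners inside emoji sequences. The function name `find_invisible` is hypothetical.

```python
import unicodedata


def find_invisible(text):
    """List (index, code point, name) for every format character (Cf)."""
    hits = []
    for index, ch in enumerate(text):
        if unicodedata.category(ch) == "Cf":
            hits.append(
                (index, f"U+{ord(ch):04X}", unicodedata.name(ch, "<unnamed>"))
            )
    return hits


if __name__ == "__main__":
    # Two zero-width spaces hidden between the words.
    sample = "Zero\u200bWidth\u200bSpace"
    for hit in find_invisible(sample):
        print(hit)
```

On the sample string this reports U+200B at indexes 4 and 10. A stricter policy could use an explicit allow list of code points instead of the whole category.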

UTF-8 byte sequences

UTF-8 is a variable-length encoding: ASCII characters (U+0000–U+007F) use one byte, U+0080–U+07FF use two bytes, U+0800–U+FFFF use three bytes, and U+10000–U+10FFFF (emoji, rare CJK extensions) use four bytes. Knowing the byte length matters when allocating storage (e.g. MySQL's utf8mb4 charset is required for four-byte characters, because the older utf8/utf8mb3 charset stores at most three bytes per character) and when calculating byte offsets in binary protocols.
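
The length rule can be written down directly and checked against Python's own encoder. In the sketch below, `utf8_length` is a hypothetical helper, and `SAMPLE` is a reconstruction of the input shown in the character table above, written with escapes so the invisible characters stay visible in source. The helper only applies the range table; it does not reject surrogate code points (U+D800–U+DFFF), which cannot be encoded as UTF-8.

```python
def utf8_length(cp):
    """Number of UTF-8 bytes needed for code point `cp`."""
    if cp < 0:
        raise ValueError("code point must be non-negative")
    if cp <= 0x7F:
        return 1
    if cp <= 0x7FF:
        return 2
    if cp <= 0xFFFF:
        return 3
    if cp <= 0x10FFFF:
        return 4
    raise ValueError("code point beyond U+10FFFF")


# Reconstructed sample input (an assumption based on the table rows).
SAMPLE = (
    "Hello, \u4e16\u754c! \U0001F30D\n"
    "Zero\u200bWidth\u200bSpace here.\n"
    "Smart \u201cquotes\u201d & em\u2014dash."
)

if __name__ == "__main__":
    total = sum(utf8_length(ord(ch)) for ch in SAMPLE)
    print("characters:", len(SAMPLE))
    print("distinct code points:", len(set(SAMPLE)))
    print("UTF-8 bytes:", total)
    print("non-ASCII:", sum(1 for ch in SAMPLE if ord(ch) > 0x7F))
    # The range table and the real encoder should agree.
    assert total == len(SAMPLE.encode("utf-8"))
```

For this sample the totals come out to 61 characters, 32 distinct code points, 78 UTF-8 bytes, and 8 non-ASCII characters, matching the summary the checker displays.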