What is a Unicode codepoint?

A Unicode codepoint is the number that the Unicode standard assigns to a character. As of Unicode 15, the standard defines over 149,000 characters across more than 160 scripts. Codepoints are written as U+ followed by four to six hexadecimal digits (e.g. U+0041 for 'A', U+1F600 for 😀). The checker shows the codepoint, official Unicode name, and category for every character in your input.
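As a rough illustration of the same lookup outside the browser, the sketch below uses Python's standard `unicodedata` module. The helper name `describe` is hypothetical, and the checker's own implementation may differ; note that Python reports the category as a two-letter code (such as `Lu`) rather than a long label.

```python
import unicodedata


def describe(ch: str) -> tuple[str, str, str]:
    """Return (codepoint, official name, general category) for one character."""
    codepoint = f"U+{ord(ch):04X}"            # at least four hex digits, uppercase
    name = unicodedata.name(ch, "<unnamed>")  # control characters have no name here
    category = unicodedata.category(ch)       # two-letter code, e.g. "Lu" or "So"
    return codepoint, name, category


for ch in "A😀":
    print(describe(ch))
# ('U+0041', 'LATIN CAPITAL LETTER A', 'Lu')
# ('U+1F600', 'GRINNING FACE', 'So')
```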

What are invisible characters and why are they dangerous?

Invisible characters include zero-width space (U+200B), zero-width non-joiner (U+200C), soft hyphen (U+00AD), byte order mark (U+FEFF), and the directional marks (U+200E left-to-right, U+200F right-to-left). Most editors render nothing for them, but they are still part of the string. They can make two strings that look identical compare as unequal, cause security issues (for example, spoofed domain names or identifiers), and shift word boundaries unexpectedly.
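The comparison failure is easy to reproduce. The following is a minimal sketch, assuming a hypothetical helper `find_invisible` that reports the position of each invisible character; it checks the codepoints listed above plus the general category `Cf` (Format), which is a broader net than the checker may use.

```python
import unicodedata

# Codepoints named in the text above; this is not an exhaustive list.
INVISIBLE = {0x200B, 0x200C, 0x00AD, 0xFEFF, 0x200E, 0x200F}


def find_invisible(text: str) -> list[tuple[int, str]]:
    """Return (index, codepoint) pairs for invisible/format characters."""
    hits = []
    for index, ch in enumerate(text):
        # Category "Cf" also catches related characters such as bidi overrides.
        # It includes U+200D, which is legitimate inside emoji sequences.
        if ord(ch) in INVISIBLE or unicodedata.category(ch) == "Cf":
            hits.append((index, f"U+{ord(ch):04X}"))
    return hits


sample = "Zero\u200bWidth\u200bSpace"
print(sample == "ZeroWidthSpace")  # False, although both render the same
print(find_invisible(sample))      # [(4, 'U+200B'), (10, 'U+200B')]
```

Whether to strip, reject, or merely report such characters depends on the application; stripping every `Cf` character would break some emoji and some scripts that rely on joiners.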

What is a homoglyph?

A homoglyph is a character that looks visually similar to another. For example, the Cyrillic 'о' (U+043E) is indistinguishable from the Latin 'o' (U+006F) in most fonts. Homoglyphs are used in phishing (substituting Cyrillic letters in domain names) and in obfuscation attacks. The checker flags characters from unexpected scripts.
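One way to spot a mixed-script word is sketched below. Python's standard library does not expose the Unicode Script property, so this example approximates it with the first word of each character's official name; that heuristic and the helper names `rough_script` and `letter_scripts` are assumptions for illustration, not the checker's actual method.

```python
import unicodedata


def rough_script(ch: str) -> str:
    """Heuristic: first word of the Unicode name, e.g. 'LATIN' or 'CYRILLIC'."""
    name = unicodedata.name(ch, "")
    return name.split(" ")[0] if name else "UNKNOWN"


def letter_scripts(word: str) -> set[str]:
    """Collect the approximate scripts of all letters in a word."""
    return {rough_script(ch) for ch in word if ch.isalpha()}


spoofed = "g\u043eogle"  # second letter is CYRILLIC SMALL LETTER O
print("o" == "\u043e")                  # False
print(sorted(letter_scripts("google"))) # ['LATIN']
print(sorted(letter_scripts(spoofed)))  # ['CYRILLIC', 'LATIN']
```

A word whose letters come from more than one script is not always malicious, so a result like this is better treated as a prompt for review than as proof of an attack.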

What does the UTF-8 bytes column show?

For each character, the checker shows the UTF-8 byte sequence: the actual bytes stored on disk or transmitted over the network. 'A' is one byte (0x41). '€' is three bytes (0xE2 0x82 0xAC). Most emoji are four bytes each, and emoji built from several codepoints are longer. This is useful for calculating the byte length of a string against database varchar limits that are measured in bytes and for the HTTP Content-Length header.
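The byte counts above can be checked with a few lines of Python. This is a small sketch; the helper name `utf8_hex` is hypothetical.

```python
def utf8_hex(ch: str) -> str:
    """Return the UTF-8 encoding of a string as space-separated hex bytes."""
    return " ".join(f"{b:02X}" for b in ch.encode("utf-8"))


for ch in ("A", "€", "🌍"):
    print(ch, utf8_hex(ch))
# A 41
# € E2 82 AC
# 🌍 F0 9F 8C 8D

text = "Hello, 世界! 🌍"
print(len(text))                  # 12 codepoints
print(len(text.encode("utf-8")))  # 19 bytes
```

The gap between 12 codepoints and 19 bytes is the reason a string can fit a character limit and still exceed a byte limit.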

Is my text safe here?

Yes. Analysis runs in your browser. Nothing is sent to a server.

Unicode Checker

Unicode Checker breaks any string into individual characters and shows each one's Unicode codepoint (U+xxxx), official name, general category, script, UTF-8 byte sequence, and HTML entity. It highlights invisible characters (zero-width space, soft hyphen, directional marks), homoglyph lookalikes, and unexpected scripts — essential for security auditing and debugging encoding issues.

Your files and inputs stay in your browser — nothing is uploaded or stored.

2 suspicious characters detected — invisible formatting, directional overrides, or zero-width characters that may cause display or security issues.

Scripts detected: Latin, Han (CJK), Emoji

Glyph	Codepoint	Name	Category	Script	UTF-8	HTML
H	U+0048	LATIN CAPITAL LETTER H	Letter, Uppercase	Latin	48	H
e	U+0065	LATIN SMALL LETTER E	Letter, Lowercase	Latin	65	e
l	U+006C	LATIN SMALL LETTER L	Letter, Lowercase	Latin	6C	l
l	U+006C	LATIN SMALL LETTER L	Letter, Lowercase	Latin	6C	l
o	U+006F	LATIN SMALL LETTER O	Letter, Lowercase	Latin	6F	o
,	U+002C	COMMA	Punctuation, Other	ASCII	2C	,
	U+0020	SPACE	Separator, Space	ASCII	20
世	U+4E16	CJK UNIFIED IDEOGRAPH-4E16	Letter, Other	Han (CJK)	E4 B8 96	世
界	U+754C	CJK UNIFIED IDEOGRAPH-754C	Letter, Other	Han (CJK)	E7 95 8C	界
!	U+0021	EXCLAMATION MARK	Punctuation, Other	ASCII	21	!
	U+0020	SPACE	Separator, Space	ASCII	20
🌍	U+1F30D	EARTH GLOBE EUROPE-AFRICA	Symbol, Other	Emoji	F0 9F 8C 8D	🌍
`CTL`	U+000A	LINE FEED (LF)	Control	ASCII	0A
Z	U+005A	LATIN CAPITAL LETTER Z	Letter, Uppercase	Latin	5A	Z
e	U+0065	LATIN SMALL LETTER E	Letter, Lowercase	Latin	65	e
r	U+0072	LATIN SMALL LETTER R	Letter, Lowercase	Latin	72	r
o	U+006F	LATIN SMALL LETTER O	Letter, Lowercase	Latin	6F	o
`U+200B`	U+200B	ZERO WIDTH SPACE	Format	Common / Other	E2 80 8B
W	U+0057	LATIN CAPITAL LETTER W	Letter, Uppercase	Latin	57	W
i	U+0069	LATIN SMALL LETTER I	Letter, Lowercase	Latin	69	i
d	U+0064	LATIN SMALL LETTER D	Letter, Lowercase	Latin	64	d
t	U+0074	LATIN SMALL LETTER T	Letter, Lowercase	Latin	74	t
h	U+0068	LATIN SMALL LETTER H	Letter, Lowercase	Latin	68	h
`U+200B`	U+200B	ZERO WIDTH SPACE	Format	Common / Other	E2 80 8B
S	U+0053	LATIN CAPITAL LETTER S	Letter, Uppercase	Latin	53	S
p	U+0070	LATIN SMALL LETTER P	Letter, Lowercase	Latin	70	p
a	U+0061	LATIN SMALL LETTER A	Letter, Lowercase	Latin	61	a
c	U+0063	LATIN SMALL LETTER C	Letter, Lowercase	Latin	63	c
e	U+0065	LATIN SMALL LETTER E	Letter, Lowercase	Latin	65	e
	U+0020	SPACE	Separator, Space	ASCII	20
h	U+0068	LATIN SMALL LETTER H	Letter, Lowercase	Latin	68	h
e	U+0065	LATIN SMALL LETTER E	Letter, Lowercase	Latin	65	e
r	U+0072	LATIN SMALL LETTER R	Letter, Lowercase	Latin	72	r
e	U+0065	LATIN SMALL LETTER E	Letter, Lowercase	Latin	65	e
.	U+002E	FULL STOP	Punctuation, Other	ASCII	2E	.
`CTL`	U+000A	LINE FEED (LF)	Control	ASCII	0A
S	U+0053	LATIN CAPITAL LETTER S	Letter, Uppercase	Latin	53	S
m	U+006D	LATIN SMALL LETTER M	Letter, Lowercase	Latin	6D	m
a	U+0061	LATIN SMALL LETTER A	Letter, Lowercase	Latin	61	a
r	U+0072	LATIN SMALL LETTER R	Letter, Lowercase	Latin	72	r
t	U+0074	LATIN SMALL LETTER T	Letter, Lowercase	Latin	74	t
	U+0020	SPACE	Separator, Space	ASCII	20
“	U+201C	LEFT DOUBLE QUOTATION MARK	Punctuation, Initial	Common / Other	E2 80 9C	“
q	U+0071	LATIN SMALL LETTER Q	Letter, Lowercase	Latin	71	q
u	U+0075	LATIN SMALL LETTER U	Letter, Lowercase	Latin	75	u
o	U+006F	LATIN SMALL LETTER O	Letter, Lowercase	Latin	6F	o
t	U+0074	LATIN SMALL LETTER T	Letter, Lowercase	Latin	74	t
e	U+0065	LATIN SMALL LETTER E	Letter, Lowercase	Latin	65	e
s	U+0073	LATIN SMALL LETTER S	Letter, Lowercase	Latin	73	s
”	U+201D	RIGHT DOUBLE QUOTATION MARK	Punctuation, Final	Common / Other	E2 80 9D	”
	U+0020	SPACE	Separator, Space	ASCII	20
&	U+0026	AMPERSAND	Punctuation, Other	ASCII	26	&
	U+0020	SPACE	Separator, Space	ASCII	20
e	U+0065	LATIN SMALL LETTER E	Letter, Lowercase	Latin	65	e
m	U+006D	LATIN SMALL LETTER M	Letter, Lowercase	Latin	6D	m
—	U+2014	EM DASH	Punctuation, Dash	Common / Other	E2 80 94	—
d	U+0064	LATIN SMALL LETTER D	Letter, Lowercase	Latin	64	d
a	U+0061	LATIN SMALL LETTER A	Letter, Lowercase	Latin	61	a
s	U+0073	LATIN SMALL LETTER S	Letter, Lowercase	Latin	73	s
h	U+0068	LATIN SMALL LETTER H	Letter, Lowercase	Latin	68	h
.	U+002E	FULL STOP	Punctuation, Other	ASCII	2E	.

Unicode Checker — inspect every character

Why check for invisible Unicode characters?

UTF-8 byte sequences
