JSON and Unicode work together to support every language on earth. But encoding issues still cause bugs. Here's what you need to know.
JSON and UTF-8
The JSON specification (RFC 8259) requires UTF-8 encoding for JSON exchanged between systems. Every JSON string is a sequence of Unicode characters, so any language fits natively:
{
"greeting": "Hello",
"chinese": "你好",
"japanese": "こんにちは",
"arabic": "مرحبا",
"emoji": "👋🌍"
}
Unicode Escape Sequences
JSON supports \u escape sequences for any character; code points outside the Basic Multilingual Plane (such as most emoji) take two escapes forming a surrogate pair:
{
"chinese": "\u4f60\u597d",
"emoji": "\ud83d\udc4b",
"copyright": "\u00a9"
}
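Escaped and literal forms decode to the same JavaScript string, so the two styles are interchangeable on input. A small sketch:

```javascript
// Escaped and literal forms decode to the same JavaScript string
const escaped = JSON.parse('{"chinese": "\\u4f60\\u597d"}');
const literal = JSON.parse('{"chinese": "你好"}');
console.log(escaped.chinese === literal.chinese); // true

// JSON.stringify keeps non-ASCII characters literal by default
console.log(JSON.stringify({ copyright: "©" })); // {"copyright":"©"}
```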
Common Encoding Problems
Mojibake (Garbled Text)
When UTF-8 JSON is decoded as Latin-1:
Expected: 你好
Got: ä½ å¥½
Fix: Always set Content-Type: application/json; charset=utf-8 in HTTP headers, and make sure the receiving side actually decodes the bytes as UTF-8.
BOM Issues
A Byte Order Mark at the start of a file can break JSON.parse():
// Strip a leading BOM (U+FEFF) before parsing
const clean = jsonStr.replace(/^\uFEFF/, '');
const data = JSON.parse(clean);
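You can see the failure and the workaround in a few lines (a sketch; V8-based runtimes like Node.js reject a leading BOM):

```javascript
const withBom = '\uFEFF{"ok": true}';

// JSON.parse does not skip the BOM and throws a SyntaxError
try {
  JSON.parse(withBom);
} catch (e) {
  console.log(e.name); // SyntaxError
}

// Stripping U+FEFF first lets the document parse cleanly
const parsed = JSON.parse(withBom.replace(/^\uFEFF/, ''));
console.log(parsed.ok); // true
```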
Surrogate Pairs
Emoji and other characters outside the Basic Multilingual Plane are encoded as surrogate pairs in JSON escapes:
{
"emoji": "\uD83D\uDE00"
}
JavaScript handles this automatically with JSON.parse().
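The decoded string holds one emoji but two UTF-16 code units, which matters for length checks. A small sketch:

```javascript
const obj = JSON.parse('{"emoji": "\\uD83D\\uDE00"}');

console.log(obj.emoji);             // 😀
console.log(obj.emoji.length);      // 2  (UTF-16 code units)
console.log([...obj.emoji].length); // 1  (spread iterates by code point)
console.log(obj.emoji.codePointAt(0).toString(16)); // 1f600
```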
Best Practices
Always send the Content-Type: application/json; charset=utf-8 header, and save JSON files as UTF-8 without a byte order mark.
Working with Different Languages
// Safe encoding
const data = { text: "中文 日本語 한국어" };
const json = JSON.stringify(data); // Handles all characters
// Reading from file
const fs = require('fs');
const content = fs.readFileSync('data.json', 'utf-8');
const parsed = JSON.parse(content);
Use our JSON Validator to check for encoding issues in your JSON files.