JSON Encoding and Unicode: Handling International Characters


JSON and Unicode work together to support every language on earth. But encoding issues still cause bugs. Here's what you need to know.

JSON and UTF-8

The JSON specification (RFC 8259) requires UTF-8 encoding for JSON text exchanged between systems. All JSON strings are sequences of Unicode characters:

{
  "greeting": "Hello",
  "chinese": "你好",
  "japanese": "こんにちは",
  "arabic": "مرحبا",
  "emoji": "👋🌍"
}

Unicode Escape Sequences

JSON supports \uXXXX escape sequences for any character; code points outside the Basic Multilingual Plane are written as two escapes forming a surrogate pair:

{
  "chinese": "\u4f60\u597d",
  "emoji": "\ud83d\udc4b",
  "copyright": "\u00a9"
}
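The escaped and literal forms decode to identical strings, so either can be used interchangeably. A quick check in Node.js (or any modern JS runtime):

```javascript
// The escaped form and the literal form parse to the same string.
const escaped = JSON.parse('{"chinese": "\\u4f60\\u597d"}');
const literal = JSON.parse('{"chinese": "你好"}');
console.log(escaped.chinese === literal.chinese); // true
```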

Common Encoding Problems

Mojibake (Garbled Text)

When UTF-8 JSON is decoded as Latin-1:

Expected: 你好

Got: ä½ å¥½

Fix: Always set Content-Type: application/json; charset=utf-8 in HTTP headers.

BOM Issues

A Byte Order Mark at the start of a file can break JSON.parse():

// Remove a leading BOM (U+FEFF) before parsing

const clean = jsonStr.replace(/^\uFEFF/, '');

const data = JSON.parse(clean);

Surrogate Pairs

Emoji and rare characters use surrogate pairs in JSON:

{
  "emoji": "\uD83D\uDE00"
}

JavaScript handles this automatically with JSON.parse().

Best Practices

  • Always use UTF-8 — No exceptions
  • Set charset in headers — application/json; charset=utf-8
  • Don't escape unnecessarily — Modern parsers handle Unicode directly
  • Test with real data — Include Chinese, Arabic, emoji in test cases
  • Use a validator — Check encoding issues early

Working with Different Languages

// Safe encoding

const data = { text: "中文 日本語 한국어" };

const json = JSON.stringify(data); // Handles all characters

// Reading from file

const fs = require('fs');

const content = fs.readFileSync('data.json', 'utf-8');

const parsed = JSON.parse(content);

Use our JSON Validator to check for encoding issues in your JSON files.

Related Tools