JSON Encoding and Unicode: Handling International Characters


JSON and Unicode work together to support every language on earth. But encoding issues still cause bugs. Here's what you need to know.

JSON and UTF-8

The JSON specification (RFC 8259) requires UTF-8 encoding for JSON text exchanged between systems. All JSON strings are sequences of Unicode characters:

{
  "greeting": "Hello",
  "chinese": "你好",
  "japanese": "こんにちは",
  "arabic": "مرحبا",
  "emoji": "👋🌍"
}

Unicode Escape Sequences

JSON supports \uXXXX escape sequences for any character; code points outside the Basic Multilingual Plane are written as two escapes forming a surrogate pair:

{
  "chinese": "\u4f60\u597d",
  "emoji": "\ud83d\udc4b",
  "copyright": "\u00a9"
}
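The escaped and literal forms decode to identical strings, so either can be used interchangeably. A quick check in Node.js (or any modern JS runtime):

```javascript
// The escaped form and the literal form parse to the same string.
const escaped = JSON.parse('{"chinese": "\\u4f60\\u597d"}');
const literal = JSON.parse('{"chinese": "你好"}');
console.log(escaped.chinese === literal.chinese); // true
```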

Common Encoding Problems

Mojibake (Garbled Text)

When UTF-8 JSON is decoded as Latin-1:

Expected: 你好

Got: ä½ å¥½

Fix: Always set Content-Type: application/json; charset=utf-8 in HTTP headers.

BOM Issues

A Byte Order Mark at the start of a file can break JSON.parse():

// Remove a leading BOM (U+FEFF) before parsing

const clean = jsonStr.replace(/^\uFEFF/, '');

const data = JSON.parse(clean);

Surrogate Pairs

Emoji and rare characters use surrogate pairs in JSON:

{
  "emoji": "\uD83D\uDE00"
}

JavaScript handles this automatically with JSON.parse().

Best Practices

  • Always use UTF-8 — No exceptions
  • Set charset in headers — application/json; charset=utf-8
  • Don't escape unnecessarily — Modern parsers handle Unicode directly
  • Test with real data — Include Chinese, Arabic, emoji in test cases
  • Use a validator — Check encoding issues early

Working with Different Languages

// Safe encoding

const data = { text: "中文 日本語 한국어" };

const json = JSON.stringify(data); // Handles all characters

// Reading from file

const fs = require('fs');

const content = fs.readFileSync('data.json', 'utf-8');

const parsed = JSON.parse(content);

Use our JSON Validator to check for encoding issues in your JSON files.

Related Tools