International Components for Unicode

IDNA Demo


Results of Operation
Mode    Text    Code Points
Input    (empty)
ToASCII(input)  
ToUnicode(ToASCII(input))  
ToUnicode(input)  
ToASCII(ToUnicode(input))  

About this demo

This CGI program demonstrates the IDNA implementation. The IDNA specification (RFC 3490) defines two operations: ToASCII and ToUnicode. Domain labels containing non-ASCII code points must be processed by the ToASCII operation before they are passed to resolver libraries. Domain names obtained from resolver libraries must be processed by the ToUnicode operation before they are displayed to the user. IDNA requires that implementations process input strings first with Nameprep, which is a profile of Stringprep, and then with Punycode.
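The two-stage processing described above can be sketched in a few lines. This is not the demo's own code (the demo is built on ICU); it is a minimal illustration that assumes Python's standard library, whose `idna` and `punycode` codecs and `encodings.idna.nameprep` function follow the same RFCs. The helper names and the sample domain `bücher.example` are chosen here for illustration only.

```python
from encodings import idna as idna2003  # standard-library IDNA helpers (RFC 3490)

ACE_PREFIX = "xn--"  # prefix that marks a Punycode-encoded label


def label_to_ascii_staged(label: str) -> str:
    """Spell out ToASCII for ONE non-ASCII label: Nameprep, then Punycode.

    Hypothetical helper for illustration. Pure-ASCII labels are not
    handled here, because ToASCII leaves them without the ACE prefix.
    """
    prepared = idna2003.nameprep(label)                    # case folding, NFKC, checks
    encoded = prepared.encode("punycode").decode("ascii")  # Punycode step
    return ACE_PREFIX + encoded


def domain_to_ascii(domain: str) -> str:
    """Apply ToASCII to every label of a domain name via the idna codec."""
    return domain.encode("idna").decode("ascii")


def domain_to_unicode(domain: str) -> str:
    """Apply ToUnicode to every label of an ASCII domain name."""
    return domain.encode("ascii").decode("idna")


if __name__ == "__main__":
    ascii_form = domain_to_ascii("bücher.example")
    print(ascii_form)                       # form to hand to a resolver
    print(domain_to_unicode(ascii_form))    # form to show to the user
    print(label_to_ascii_staged("Bücher"))  # the same label, stage by stage
```

The last call shows why the order matters: Nameprep folds "Bücher" to "bücher" before Punycode runs, so differently cased spellings of a non-ASCII label end up as the same ASCII label.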

In the demo above, different combinations of ToASCII and ToUnicode are applied to the input. The demo also gives a simple illustration of how a GUI can visually indicate boundaries between different scripts, to help avoid spoofing. The code is rough and meant only for illustration. It could be refined to call out more characters that are visually confusable. For example, many CJK Radicals are identical in appearance to CJK Ideographs. Mixtures of simplified and traditional characters can also be visually highlighted, to help signal possible user errors.
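As a rough sketch of the script-boundary idea, a label can be split into runs of characters that share a script, with a visible separator placed between runs. The demo's own logic is not shown on this page, so everything below is an assumption: Python's standard library has no Unicode script property, and this sketch substitutes the first word of each character's Unicode name (for example "LATIN" or "CYRILLIC") as a crude proxy. A real implementation would use proper script data, such as that provided by ICU.

```python
import unicodedata
from itertools import groupby


def rough_script(ch: str) -> str:
    """Crude script guess: the first word of the character's Unicode name.

    This is a stand-in heuristic, not the Unicode Script property.
    Digits, hyphens and other non-letters are lumped together as COMMON.
    """
    if not ch.isalpha():
        return "COMMON"
    name = unicodedata.name(ch, "")
    return name.split(" ")[0] if name else "UNKNOWN"


def script_runs(label: str) -> list[tuple[str, str]]:
    """Split a label into consecutive (script, text) runs."""
    return [
        (script, "".join(chars))
        for script, chars in groupby(label, key=rough_script)
    ]


def mark_boundaries(label: str, sep: str = "|") -> str:
    """Insert a visible separator wherever the guessed script changes."""
    return sep.join(text for _, text in script_runs(label))


if __name__ == "__main__":
    spoof = "p\u0430ypal"  # second letter is CYRILLIC SMALL LETTER A
    print(script_runs(spoof))
    print(mark_boundaries(spoof))
```

A label written in a single script comes back unchanged, while the mixed Latin and Cyrillic label is broken into three runs, which is the kind of cue a GUI could render with color or spacing.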


Examples
You can either paste Unicode text into the box above or use Unicode escapes. For example, you can enter "ä" directly, write it as "\u00E4", or use the decomposition "a\u0308". You can also copy some interesting Unicode text samples from the following pages:
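The three spellings of "ä" mentioned above can be compared with a short sketch. It assumes Python's standard library rather than the demo's ICU code, and `expand_escapes` is a hypothetical helper that only handles four-digit `\uXXXX` escapes; how the demo itself parses escapes is not shown here. Because Nameprep normalizes its input with NFKC, the precomposed and decomposed forms should yield the same ToASCII result.

```python
import re
import unicodedata

# Matches a literal backslash-u followed by exactly four hex digits.
_ESCAPE = re.compile(r"\\u([0-9A-Fa-f]{4})")


def expand_escapes(text: str) -> str:
    """Replace each \\uXXXX escape with the code point it names (hypothetical helper)."""
    return _ESCAPE.sub(lambda m: chr(int(m.group(1), 16)), text)


def to_ascii(domain: str) -> str:
    """ToASCII for a whole domain name via the standard idna codec."""
    return domain.encode("idna").decode("ascii")


if __name__ == "__main__":
    typed = "ä"                                # pasted directly
    escaped = expand_escapes(r"\u00E4")        # typed as an escape
    decomposed = expand_escapes(r"a\u0308")    # "a" plus combining diaeresis

    print(typed == escaped)                    # same single code point
    print(len(decomposed))                     # two code points before normalization
    print(unicodedata.normalize("NFKC", decomposed) == typed)
    print(to_ascii(typed) == to_ascii(decomposed))
```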

Unicode version used by IDNA 3.2 — Powered by ICU 74.1