ICU Locale Explorer > Help

This file contains the help text for Locale Explorer, organized by topic. If your question is not covered here, you can post it on the icu-support mailing list (http://www.icu-project.org/contacts.html). Please post any comments or suggestions for improving this help file to the ICU bug database.

Display Problems

The output of this dynamic page is dependent on your browser settings, installed fonts, and which encoding you choose. If text displays incorrectly, you may need to change the encoding at the bottom of each Locale Explorer page.

Display Q&A

Why do I see boxes or question marks?
It may be that you do not have the proper font selected, or the font you have selected is missing some letters. On Netscape, go into Edit -> Preferences and select the Fonts sub panel. On Microsoft Internet Explorer, go into the Internet Options dialog and click on Fonts.
Also make sure the right encoding is selected.
No text, "Garbage" text, Wrong Encoding
An encoding is the specific method used to transfer letters to your browser. Each encoding has a specific charset, or set of characters that it can handle. UTF-8 is the preferred encoding, as it can directly transfer any Unicode value (and therefore any text that the Locale Explorer tries to show). However, if your browser does not understand UTF-8 or does not have the appropriate fonts, you may choose from among several other industry-standard encodings. Be aware that some encodings cannot represent certain characters. If you try to view a Greek locale using the iso-8859-1 (Latin-1, Western European) encoding, you will simply see many \uXXXX escapes (see below). Check your browser settings to see which encodings it supports.
The Locale Explorer will attempt to negotiate an encoding with your browser; however, you may force it to use a different encoding by clicking on the name of the current encoding in the 'Your Settings' box at the bottom of each Locale Explorer page.
I see '\u1234' instead of a letter.
This is because the encoding you have chosen does not support the Unicode character selected. Since the purpose of the Locale Explorer is educational, it displays the Unicode value instead of a question mark. Note that the Sort, Numberformat, and Dateformat demos all accept the \uXXXX notation in their input fields. ("\u" was chosen because this format is also used in ICU locale bundle files, and is a standard feature of Java as well.)
Text is 'backwards', letters are drawn incorrectly. (Non-Latin scripts)
Drawing of some scripts, such as Arabic, depends heavily on the browser and OS having the correct fonts and drawing support. Scripts which are written right-to-left (such as Arabic or Hebrew) also depend on the browser and OS understanding how to draw text in this direction. The Locale Explorer does not attempt to correct for these situations.
How do I install fonts?
On the Macintosh, drag fonts into your System Folder icon and restart your browser. On Windows, bring up the Fonts control panel. On X Windows, use 'man mkfontdir' and 'man xset' for more information.
Why do I see accents following characters, such as 'o"', or accents following \uXXXX unicode characters?
The Locale Explorer uses a module called the Decomposition Callback when it cannot output a character in a specific encoding. It attempts a 'Unicode canonical character decomposition' to break accented and ligature characters apart, and then looks for possible substitutions for the decomposed forms.
As an example, set the Locale Explorer encoding to iso-8859-1 and view Hungarian. 'Monday' in Hungarian ends with an 'o with double acute accent'. This is decomposed into 'o' + 'double-acute'. The 'o' can be output as just a normal o. The double-acute is turned into a double quote ("), which is then output. So the result is 'o"', which is certainly more readable than an error character such as '?'.

For this reason, if your fonts are deficient, you may actually get better results (in some cases) by choosing iso-8859-1 or some other encoding, than utf-8.
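
The fallback described above can be sketched in Python with the standard library. This is only a rough illustration and not the Locale Explorer's actual Decomposition Callback: the function name encode_with_fallback and the SUBSTITUTES table are hypothetical, since this help text does not list the real substitutions. The sketch also assumes characters in the Basic Multilingual Plane, so that four hex digits are enough for the \uXXXX form.

```python
import unicodedata

# Hypothetical substitution table (an assumption for this sketch):
# combining marks mapped to ASCII look-alikes.
SUBSTITUTES = {
    "\u030B": '"',  # combining double acute accent
    "\u0308": '"',  # combining diaeresis
    "\u0301": "'",  # combining acute accent
}


def _encodable(ch: str, encoding: str) -> bool:
    """True if the target encoding can represent this character."""
    try:
        ch.encode(encoding)
        return True
    except UnicodeEncodeError:
        return False


def encode_with_fallback(text: str, encoding: str) -> str:
    """Replace characters the encoding cannot represent.

    Order of attempts: the character itself, then its canonical
    decomposition with look-alike substitutes, then a \\uXXXX escape.
    """
    out = []
    for ch in text:
        if _encodable(ch, encoding):
            out.append(ch)
            continue
        for part in unicodedata.normalize("NFD", ch):
            if _encodable(part, encoding):
                out.append(part)
            elif part in SUBSTITUTES:
                out.append(SUBSTITUTES[part])
            else:
                out.append("\\u%04X" % ord(part))
    return "".join(out)


# Hungarian 'Monday' ends in o with double acute (U+0151).
print(encode_with_fallback("h\u00e9tf\u0151", "iso-8859-1"))
# A Greek capital delta has no decomposition, so it falls back to an escape.
print(encode_with_fallback("\u0394", "iso-8859-1"))
```

With UTF-8 as the target encoding, every character is encodable and the text passes through unchanged.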

(missing resource) - MISSING_RESOURCE_ERROR

The resource you asked for did not exist, nor could a suitable default be found.

AM & PM

Used in formatting and parsing times. Specifies the strings to be used for 'am' and 'pm'. Ignored if am/pm are not used in that locale.

Collation rules

Specifies the rules to be used for collating (comparing and sorting) text. The rules may be found in the ICU documentation for TableCollator.

Regions

Specifies the display names for the given region (ISO 3166) codes. Used for converting the region code of a locale into a displayable name.

Currency Elements

These strings are used in parsing/formatting numbers which are designated as currency values.

Currency Symbol: This is used whenever a 'currency symbol' (\u00A4) is encountered in a currency number pattern.
Int'l Currency Symbol: Three-letter international code for a currency. Used where a doubled currency symbol (\u00A4\u00A4) is encountered in a number pattern.
Currency separator: This is used as the decimal separator in currency formatting/parsing, instead of the DecimalSeparator from the NumberElements list.

Date & Time Options

First day of the week: A number indicating which day of the week is considered the 'first' day, for calendar purposes. It is 1-based, with 1 being Sunday, 2 being Monday, and so on up to 7 being Saturday.
Minimal Days in First Week: The minimal number of days required in the first week of the year. For example, if the first week is defined as one that contains the first day of the first month of a year, this value will be 1. If it must be a full week, the value will be 7.
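
As a rough illustration of how these two values interact, the following Python sketch computes where week 1 of a year starts and which week a given date falls in. The function names week1_start and week_of_year are hypothetical and this is not ICU's implementation; it only follows the rule that week 1 is the first week containing at least the minimal number of days from the new year. Days of the week use the 1 = Sunday to 7 = Saturday numbering described above.

```python
from datetime import date, timedelta


def week1_start(year: int, first_day_of_week: int, minimal_days: int) -> date:
    """Return the date on which week 1 of `year` begins."""
    jan1 = date(year, 1, 1)
    # Convert Python's Monday=0..Sunday=6 to 1=Sunday..7=Saturday.
    dow = (jan1.weekday() + 1) % 7 + 1
    # Number of days from the start of the week containing January 1 to January 1.
    offset = (dow - first_day_of_week) % 7
    week_start = jan1 - timedelta(days=offset)
    # That week holds (7 - offset) days of the new year.
    if 7 - offset >= minimal_days:
        return week_start
    return week_start + timedelta(days=7)


def week_of_year(day: date, first_day_of_week: int, minimal_days: int):
    """Return (week_year, week_number) for `day`."""
    year = day.year
    if day >= week1_start(year + 1, first_day_of_week, minimal_days):
        year += 1  # late December can already belong to next year's week 1
    elif day < week1_start(year, first_day_of_week, minimal_days):
        year -= 1  # early January can still belong to last year's final week
    start = week1_start(year, first_day_of_week, minimal_days)
    return year, (day - start).days // 7 + 1


SUNDAY, MONDAY = 1, 2
print(week1_start(1998, MONDAY, 4))                 # 1997-12-29
print(week1_start(1998, SUNDAY, 4))                 # 1998-01-04
print(week_of_year(date(1998, 1, 1), SUNDAY, 4))    # (1997, 53)
```

The printed values reproduce the 1998 example quoted under 'Week of Year' in the Localized Date Pattern Chars topic.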

Date & Time Patterns

The first 8 items are different lengths of either date or time patterns. See the localPatternChars (in the locale) for the meanings of special characters.

Quoting rules: Single quotes (') enclose parts of the pattern that should be treated literally. Inside a quoted string, two single quotes ('') are replaced with a single one ('). For example, the pattern 'class of 'yyyy' at 'h' o''clock' produces: class of 1939 at 6 o'clock. (The literal strings are 'class of ', ' at ', and ' o''clock'.)
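
The quoting rules can be sketched as a small Python tokenizer that splits a pattern into literal runs and pattern runs. The names split_pattern and render are hypothetical, and render simply looks up each pattern run in a caller-supplied table instead of doing real date formatting. The sketch treats a doubled quote outside a quoted string as a literal quote too, and treats an unterminated quote as literal to the end of the pattern; both choices are assumptions. The year is written with the lowercase y listed under Localized Date Pattern Chars.

```python
def split_pattern(pattern: str):
    """Split a pattern into ("literal", text) and ("pattern", text) runs."""
    parts = []
    buf = []
    in_quote = False
    i = 0

    def flush(kind):
        # Emit the buffered run, if any, and start a new one.
        if buf:
            parts.append((kind, "".join(buf)))
            buf.clear()

    while i < len(pattern):
        ch = pattern[i]
        if ch == "'":
            if i + 1 < len(pattern) and pattern[i + 1] == "'":
                # Two single quotes stand for one literal quote.
                if in_quote:
                    buf.append("'")
                else:
                    flush("pattern")
                    parts.append(("literal", "'"))
                i += 2
                continue
            # A lone quote opens or closes a literal run.
            flush("literal" if in_quote else "pattern")
            in_quote = not in_quote
        else:
            buf.append(ch)
        i += 1
    flush("literal" if in_quote else "pattern")
    return parts


def render(pattern: str, values: dict) -> str:
    """Substitute each pattern run using `values`; literals pass through."""
    return "".join(
        text if kind == "literal" else values.get(text, text)
        for kind, text in split_pattern(pattern)
    )


example = "'class of 'yyyy' at 'h' o''clock'"
print(split_pattern(example))
print(render(example, {"yyyy": "1939", "h": "6"}))  # class of 1939 at 6 o'clock
```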

Day

This resource contains the full and short (abbreviated) names of the days of the week, starting with Sunday.

Eras

Display strings for the eras. (There are two for the default Gregorian calendar: BC and AD.)

Languages

Display names for language codes. For example, "en" is "English" in English, but it is "inglés" in Spanish.

Windows Locale ID

Hexadecimal Locale ID for this locale as used by Microsoft Windows. See http://www.microsoft.com/globaldev/ for a description of these IDs.

Month

Full and short (abbreviated) month names, starting with January.

Number Elements

Symbols used in number formatting and parsing. Used in NumberPatterns.

Decimal Separator: Separates the integer and fractional parts of the number.
Grouping Separator: Groups (for example) units of thousands: 10^6 = 1,000,000. The grouping separator is commonly used for thousands, but in some countries for ten-thousands. The interval is a constant number of digits between the grouping characters, such as 100,000,000 or 1,0000,0000. If you supply a pattern with multiple grouping characters, the interval between the last one and the end of the integer is the one that is used. So "#,##,###,####" == "######,####" == "##,####,####".
Pattern Separator: The character used for separating variant expressions. (For example: 0.00;(0.00) where the second pattern is the form to use for negative numbers.)
Percent: Symbol used to indicate a percentage (1/100th) amount. (If present, the value is also multiplied by 100 before formatting, so that 1.23 => 123%.)
ZeroDigit: Symbol used to indicate a digit in the pattern, or zero if that place would otherwise be empty. For example, with a zero digit of '0', the pattern "000" would format "34" as "034", but the pattern "0" would format "34" as just "34". The digits 1-9 are also expected to follow the code point of this specified 0 value.
Digit: Symbol used to indicate any digit value. If that digit is zero, then it is not shown.
Minus Sign: Symbol used to denote a negative value.
Exponential: Symbol separating the mantissa and exponent values.
PerMill: Symbol used to indicate a per-mille (1/1000th) amount. (If present, the value is also multiplied by 1000 before formatting, so that 1.23 => 1230‰.)
Infinity: The infinity sign. Corresponds to the IEEE infinity bit pattern.
Not a number: The NaN sign. Corresponds to the IEEE NaN bit pattern.
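
The grouping rule above (only the interval between the last grouping character and the end of the integer counts) can be sketched in Python. The function names grouping_size and group_digits are hypothetical, and the sketch assumes the English symbols ',' and '.' and a pattern with no prefix, suffix, or quoted literals.

```python
def grouping_size(pattern: str) -> int:
    """Digits between the last grouping separator and the end of the integer part."""
    integer_part = pattern.split(".", 1)[0]
    last = integer_part.rfind(",")
    return 0 if last < 0 else len(integer_part) - last - 1


def group_digits(value: int, size: int, sep: str = ",") -> str:
    """Insert `sep` every `size` digits, counting from the right."""
    digits = str(abs(value))
    if size > 0:
        chunks = []
        while len(digits) > size:
            chunks.insert(0, digits[-size:])
            digits = digits[:-size]
        chunks.insert(0, digits)
        digits = sep.join(chunks)
    return ("-" if value < 0 else "") + digits


# All three patterns from the text yield the same interval of 4.
for p in ("#,##,###,####", "######,####", "##,####,####"):
    print(p, grouping_size(p))

print(group_digits(100000000, grouping_size("#,###")))   # 100,000,000
print(group_digits(100000000, grouping_size("#,####")))  # 1,0000,0000
```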

Number Patterns

Patterns for formatting different types of numbers. Note that the NumberElements resource affects how these patterns are interpreted.

Decimal: The normal locale-specific way to write a base 10 number.
Currency: Use \u00A4 where the local currency symbol should be. Doubling the currency symbol (\u00A4\u00A4) will output the international currency symbol (a 3-letter code).
Percent: Pattern for use with percentage formatting.
Scientific: Pattern for use with scientific (exponent) formatting.

Quoting rules: Single quotes (') enclose parts of the pattern that should be treated literally. Inside a quoted string, two single quotes ('') are replaced with a single one ('). For example, the pattern '#'# formats 123 as #123, because the quoted # is a literal rather than a digit placeholder, and the pattern #' o''clock' formats 6 as 6 o'clock.
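
How the pattern separator and the percent symbol from the Number Elements topic affect a pattern can be sketched in Python. The function format_simple is hypothetical and heavily simplified: it assumes the English symbols, ignores grouping, quoting, and optional '#' fraction digits, and relies on Python's own rounding. It is meant only to show the choice of subpattern and the multiplication by 100.

```python
import re

# Prefix, number part (made of # 0 , .), and suffix of one subpattern.
_SUBPATTERN = re.compile(r"^([^#0,.]*)([#0,.]+)(.*)$")


def format_simple(value: float, pattern: str) -> str:
    """Format `value` with a simplified positive;negative number pattern."""
    positive, _, negative = pattern.partition(";")
    use_negative = value < 0 and negative != ""
    match = _SUBPATTERN.match(negative if use_negative else positive)
    if match is None:
        raise ValueError("pattern has no number part: %r" % pattern)
    prefix, number, suffix = match.groups()

    # A percent symbol also multiplies the value by 100.
    if "%" in prefix + suffix:
        value *= 100

    # Each '0' after the decimal separator is a required fraction digit.
    _, _, fraction = number.partition(".")
    text = f"{abs(value):.{fraction.count('0')}f}"

    # Without an explicit negative subpattern, fall back to a minus sign.
    sign = "-" if value < 0 and not use_negative else ""
    return sign + prefix + text + suffix


print(format_simple(-1.5, "0.00;(0.00)"))  # (1.50)
print(format_simple(1.23, "0%"))           # 123%
print(format_simple(-2, "0.0"))            # -2.0
```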

Locale Codes

Two- and three-letter ISO codes for the language and region, as well as the variant codes.

Version

Version of the ICU data files.

Localized Date Pattern Chars

These characters are replaced with the appropriate values when a date or time is being formatted.

Characters may be used multiple times. For example, if y is used for the year, 'yy' might produce '99', whereas 'yyyy' produces '1999'.

For most numerical characters, the number of characters specifies the field width. For example, if h is the hour, 'h' might produce '5', but 'hh' produces '05'. For some characters, the count specifies whether an abbreviated or full form should be used.

Note: In the following list, the default (English) form is used as an example, but see the actual locale for the correct characters!

G Era - Replaced with the Era string for the current date.
y Year - Use two for the short year, or four for the full year.
M Month - Use one or two for the numerical month, three for the abbreviation, or four for the full name.
d Date - Day of the month. Use one, or two for zero padding.
k Hour of day, 1-based [1-24] - Midnight appears as '24'.
H Hour of day, 0-based [0-23]
m Minute - Use one or two.
s Second - Use one or two.
S Millisecond - Use one, two, or three. Shows the MOST significant digits.
E Day of week - Use three for the short day, or four for the full name.
D Day of year - Use one to three.
F Day of week in month - Use one. This is 1 for the first occurrence of that weekday in the month, 2 for the second, and so on.
w Week of year - Use one or two.
"Values calculated for the WEEK_OF_YEAR field range from 1 to 53. Week 1 for a year is the first week that contains at least getMinimalDaysInFirstWeek() days from that year. It depends on the values of getMinimalDaysInFirstWeek(), getFirstDayOfWeek(), and the day of the week of January 1. Weeks between week 1 of one year and week 1 of the following year are numbered sequentially from 2 to 52 or 53 (if needed). For example, January 1, 1998 was a Thursday. If getFirstDayOfWeek() is MONDAY and getMinimalDaysInFirstWeek() is 4 (these are the values reflecting ISO 8601 and many national standards), then week 1 of 1998 starts on December 29, 1997, and ends on January 4, 1998. However, if getFirstDayOfWeek() is SUNDAY, then week 1 of 1998 starts on January 4, 1998, and ends on January 10, 1998. The first three days of 1998 are then part of week 53 of 1997." (from ICU User's Guide)
W Week of month - Use one.
a AM or PM
h Hour in AM/PM, 1-based [1-12] - Noon and midnight show up as "12".
K Hour in AM/PM, 0-based [0-11]
z Timezone. Use three for the short timezone (e.g. PST) or four for the full name (e.g. Pacific Standard Time). If there is no name for the zone, it shows up as GMT+/-hh:mm.
A Year (of "Week of Year"). [May differ from Calendar year, see comments under 'Week of Year', above.]
e Timezone. Use three for the short timezone (e.g. PST) or four for the full name (e.g. Pacific Standard Time). If there is no name for the zone, it shows up as GMT+/-hh:mm.
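
The effect of repeating a pattern character can be sketched in Python for a handful of the letters above. The function format_date and its English name tables are hypothetical, only a subset of the letters is handled, and quoted literals are not supported; this is an illustration of the counting rule, not ICU's formatter.

```python
import re
from datetime import datetime

MONTHS = ["January", "February", "March", "April", "May", "June", "July",
          "August", "September", "October", "November", "December"]
DAYS = ["Monday", "Tuesday", "Wednesday", "Thursday", "Friday",
        "Saturday", "Sunday"]  # same order as datetime.weekday()


def _field(letter: str, count: int, dt: datetime) -> str:
    """Format one run of `count` identical pattern letters."""
    if letter == "y":
        return str(dt.year % 100).zfill(2) if count == 2 else str(dt.year).zfill(count)
    if letter == "M":
        name = MONTHS[dt.month - 1]
        if count >= 4:
            return name
        return name[:3] if count == 3 else str(dt.month).zfill(count)
    if letter == "E":
        name = DAYS[dt.weekday()]
        return name if count >= 4 else name[:3]
    if letter == "a":
        return "AM" if dt.hour < 12 else "PM"
    numeric = {
        "d": dt.day,
        "H": dt.hour,             # 0-23
        "k": dt.hour or 24,       # 1-24, midnight is 24
        "h": dt.hour % 12 or 12,  # 1-12, noon and midnight are 12
        "K": dt.hour % 12,        # 0-11
        "m": dt.minute,
        "s": dt.second,
    }
    if letter in numeric:
        # For numeric fields the count is the minimum field width.
        return str(numeric[letter]).zfill(count)
    return letter * count  # letters this sketch does not handle stay as-is


def format_date(pattern: str, dt: datetime) -> str:
    """Replace each run of a repeated letter with the formatted field."""
    return re.sub(r"([A-Za-z])\1*",
                  lambda m: _field(m.group(1), len(m.group(0)), dt),
                  pattern)


when = datetime(1999, 7, 4, 5, 3, 9)
print(format_date("yy", when), format_date("yyyy", when))  # 99 1999
print(format_date("h", when), format_date("hh", when))     # 5 05
print(format_date("EEEE, MMM d, yyyy", when))              # Sunday, Jul 4, 1999
```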

Time Zones

Localized names for time zones. The columns are, in order:

Canonical name for the time zone.
Display long name for the normal time zone.
Abbreviation for the normal time zone.
Display long name for the time zone on summer/daylight savings time.
Abbreviation for the time zone on summer/daylight savings time.
A city in the specified time zone.

Collation (sorting) Example

This example demonstrates sorting (collation) in this locale. Type in some lines of text to be sorted, and click the Sort button. (The notes below explain what happens.) You see four different columns as output. The first is the original text for comparison. The lines are numbered to show their original position. The remaining columns show sorting by different strengths (available as a parameter to the collation function). Groups of lines that sort precisely the same are separated by an underline. Since collation treats these lines as identical, lines in the same group could appear in any order (depending on the precise sorting algorithm used).

The demo shows three different strengths used when comparing any two strings:

Primary Only means that only major differences are considered, such as different base letters (e.g. A vs B).
Primary & Secondary means that a second level of differences is also considered, such as accents (e.g. A vs Á). However, these are only relevant if there are no primary differences anywhere else in the strings.
Primary - Tertiary means that a third level of differences is also considered, such as case (e.g. A vs a). However, these are only relevant if there are no primary differences and no secondary differences anywhere else in the strings.

Primary - Tertiary is the only collation people will normally use; however, these levels are also used in searching, for a "loose" match. In some languages, such as French, secondary differences are counted from the end of the strings. You see this if you look at the difference in sorting between English and French for the lines: "côté", "coté", "côte", and "cote".
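
The three strengths and the French rule can be approximated with a short Python sketch that builds a separate key for each level. This is a simplification and not the collation algorithm ICU uses: the function name sort_key is hypothetical, base letters are compared by lowercase code point, accents by their combining marks, and case by a lowercase-first flag.

```python
import unicodedata


def sort_key(text: str, strength: int = 3, french: bool = False):
    """Build a (primary, secondary, tertiary) key truncated to `strength` levels."""
    primary, secondary, tertiary = [], [], []
    for ch in unicodedata.normalize("NFD", text):
        if unicodedata.combining(ch) and primary:
            secondary[-1] += ch           # accent attaches to the previous letter
        else:
            primary.append(ch.lower())    # base letter, ignoring case
            secondary.append("")          # no accent seen yet
            tertiary.append(ch.isupper())  # False sorts first: lowercase first
    if french:
        secondary.reverse()               # compare accents from the end
    return tuple([primary, secondary, tertiary][:strength])


words = ["c\u00f4t\u00e9", "cot\u00e9", "c\u00f4te", "cote"]
print(sorted(words, key=sort_key))                            # accents compared left to right
print(sorted(words, key=lambda w: sort_key(w, french=True)))  # accents compared from the end

# At primary strength, "a", "A" and "\u00c1" compare as identical,
# so a stable sort keeps them in their input order.
print(sorted(["b", "A", "a", "\u00c1"], key=lambda w: sort_key(w, strength=1)))
```

Because the levels are compared one after another, a secondary or tertiary difference only matters when every earlier level is equal across the whole string, which is the behavior described above.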

Note: If you want to enter a character for sorting that is not available from your keyboard, you can type it in by character code using "\u" notation (for example, "ä" is \u00E4), or paste it in from a Unicode chart page, such as this one [link http://www.macchiato.com/unicode/charts.html].

References. For more information, please see these web pages.

The UTR #10 Unicode Collation Algorithm [link http://www.unicode.org/unicode/reports/tr10/Sample/]
The UCA demo [link http://www.unicode.org/unicode/reports/tr10/Sample/]
The Draft ICU User Guide
- Collation [link http://www.icu-project.org/userguide/Collate_Intro.html]
The ICU API documentation on Collation
- Collator [link http://www.icu-project.org/apiref/icu4c/ucol_8h.html#_details]

Number Pattern Demo

In this example you can try creating localized patterns and formatting numbers using those patterns.

The top form shows the pattern you are working with. It is the same kind of pattern as the pre-set patterns found in the NumberPatterns resource. Also, see the NumberElements resource for important information on the characters used in each pattern.

The left-hand side shows the number that will be formatted. You may change this number (and click Change) to see its effect on the formatted number.

The right hand side shows the formatted number. You may also change the formatted version of the number (and click Change) to see it converted back onto the left hand side.

Note that you may type in Unicode values directly. For example, typing '\u0416' will be replaced with the Cyrillic letter "Zhe", which is at Unicode code point U+0416.
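
A minimal Python sketch of how such \uXXXX input could be expanded is shown below. The function name expand_escapes is hypothetical, and the sketch assumes exactly four hex digits per escape, so it does not combine surrogate pairs into supplementary characters.

```python
import re
import unicodedata

# A backslash, the letter u, and exactly four hexadecimal digits.
_ESCAPE = re.compile(r"\\u([0-9A-Fa-f]{4})")


def expand_escapes(text: str) -> str:
    """Replace each \\uXXXX escape in `text` with the character it names."""
    return _ESCAPE.sub(lambda m: chr(int(m.group(1), 16)), text)


typed = r"\u0416 and \u00E4"
expanded = expand_escapes(typed)
print(expanded)
print(unicodedata.name(expanded[0]))  # CYRILLIC CAPITAL LETTER ZHE
```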

Date & Time Pattern Demo

In this example you can try creating localized patterns and formatting dates using those patterns.

The top form shows the pattern you are working with. It is the same kind of pattern as the pre-set patterns found in the DateTimePatterns resource. The characters used in the pattern are the localized pattern characters for that locale. They are reprinted at the bottom of the demo for your convenience.

Below the pattern is the current date/time (at left), and the formatted version using your pattern (at right).

Note that you may type in Unicode values directly. For example, typing '\u0416' will be replaced with the Cyrillic letter "Zhe", which is at Unicode code point U+0416.

Transliteration Help

Clicking 'Transliterate it for me!' causes any text that is NOT in "Latin-1" to be transliterated. All transliterated text shows up in green. To remove transliteration, simply change your encoding to something else.

End of help