Preface
I recently joined a new project team, where I am responsible for the preliminary research and selection of front-end technology. This involves a requirement that is both familiar and unfamiliar: internationalization & localization. It is familiar because I have dabbled in it on previous projects; it is unfamiliar because those earlier implementations never went beyond the "it exists" stage. I am taking this opportunity to study and organize the topic carefully, in preparation for the upcoming technology selection.
This article explains the concepts of internationalization and localization, as well as one very important related concept: the language tag (also called language code or culture).
What is internationalization?
As I understand it, internationalization means that the application supports multiple languages and cultural conventions (number, currency and date formats, character comparison algorithms, etc.), while localization means that the application can identify the user's cultural conventions and automatically adapt to the corresponding language and culture.
In the past, I often thought internationalization was just string substitution, such as replacing "Hello!" with "What's up, man!". In fact, it covers the following five aspects:
- String replacement: for example, "Hello!" is replaced with "What's up, man!".
- Number formatting: for example, 1200.01 is written as 1,200.01 in English, 1 200,01 in French, and 1.200,01 in German.
- Currency formatting: for example, RMB is written as ¥1,200.01 and the US dollar as $1,200.01, while the euro is written as €1,200.01 in English and 1.200,01 € in German. Note: exchange-rate conversion is not included here.
- Date formatting: for example, September 15, 2016 is written as 9/15/2016 in US English, 15/09/2016 in French, and 15.9.2016 in German.
- Character comparison algorithm: for example, when comparing ä and z, both English and German sort ä before z, while Swedish sorts z before ä.
The key to localization - Language Tag
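As a rough illustration of why an identifier for the user's language and culture matters, the formatting and sorting differences listed in the previous section can be reproduced with the standard ECMAScript Intl API. This is only a sketch: it assumes a runtime with full ICU locale data, and the variable names are made up for the example. The strings passed to the constructors (en-US, de-DE, sv and so on) are language tags.

```javascript
// Assumption: the runtime ships full ICU locale data (recent Node.js and browsers do).
const amount = 1200.01;
const day = new Date(Date.UTC(2016, 8, 15)); // September 15, 2016 (months are 0-based)

// Number formatting: grouping and decimal separators differ per locale.
const numbers = {
  en: new Intl.NumberFormat("en-US").format(amount), // "1,200.01"
  fr: new Intl.NumberFormat("fr-FR").format(amount), // "1 200,01" (grouped with a no-break space character)
  de: new Intl.NumberFormat("de-DE").format(amount), // "1.200,01"
};

// Currency formatting: the position of the symbol also depends on the locale.
const euro = { style: "currency", currency: "EUR" };
const euros = {
  en: new Intl.NumberFormat("en-US", euro).format(amount), // "€1,200.01"
  de: new Intl.NumberFormat("de-DE", euro).format(amount), // "1.200,01 €" (with a no-break space)
};

// Date formatting: field order and separators differ per locale.
const utc = { timeZone: "UTC" };
const dates = {
  en: new Intl.DateTimeFormat("en-US", utc).format(day), // "9/15/2016"
  de: new Intl.DateTimeFormat("de-DE", utc).format(day), // "15.9.2016"
};

// Collation: a negative result means "ä" sorts before "z", a positive one means after.
const order = {
  de: new Intl.Collator("de").compare("ä", "z"),
  sv: new Intl.Collator("sv").compare("ä", "z"),
};

console.log(numbers, euros, dates, order);
```

With full locale data, order.de should be negative and order.sv positive, matching the ä/z example.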
Since the application has to adapt automatically to the user's language and culture, there must be some identifier to base that on, right? I think everyone is familiar with zh-CN, en and the like, and they are exactly the identifiers we need! When we use an existing i18n library for internationalization/localization, we typically write a resource file like the following:
{
  "en": { "name": "Enter Name" },
  "zh-CN": { "name": "Enter a name" }
}
But are there any other keys besides en and zh-CN? What rules govern their composition? Let's take a closer look at these language tags!
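To make the role of these keys concrete, here is a minimal sketch of how a lookup over such a resource file might work. It is not the API of any particular i18n library: the names messages, fallbackChain and translate are hypothetical, and the fallback strategy (dropping subtags from the right, then trying a default tag) is a simplified assumption in the spirit of the lookup scheme described in RFC 4647.

```javascript
// Hypothetical resource bundle, shaped like the JSON in the text.
const messages = {
  "en": { name: "Enter Name" },
  "zh-CN": { name: "Enter a name" },
};

// Build a fallback chain by dropping subtags from the right:
// "zh-Hans-CN" -> ["zh-Hans-CN", "zh-Hans", "zh"].
function fallbackChain(tag) {
  const parts = tag.split("-");
  const chain = [];
  for (let n = parts.length; n > 0; n--) {
    chain.push(parts.slice(0, n).join("-"));
  }
  return chain;
}

// Return the message from the first bundle in the chain that defines the key.
// Tags are compared case-insensitively; defaultTag is tried last.
function translate(bundles, tag, key, defaultTag = "en") {
  const byLowerTag = new Map(
    Object.keys(bundles).map((k) => [k.toLowerCase(), bundles[k]])
  );
  for (const candidate of [...fallbackChain(tag), defaultTag]) {
    const bundle = byLowerTag.get(candidate.toLowerCase());
    if (bundle && Object.prototype.hasOwnProperty.call(bundle, key)) {
      return bundle[key];
    }
  }
  return key; // nothing matched: fall back to the key itself
}

console.log(translate(messages, "zh-CN", "name")); // "Enter a name"
console.log(translate(messages, "en-GB", "name")); // "Enter Name" (falls back to "en")
```

A tag with no matching bundle at all, such as fr here, ends up on the default tag.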
Syntax Rules
Note: the grammar below is written in ABNF (for ABNF syntax, please refer to "Grammar Specifications: BNF and ABNF").
Language-Tag = langtag / privateuse / grandfathered
langtag = language ["-" script] ["-" region] *("-" variant) *("-" extension) ["-" privateuse]
As you can see, Language-Tag is divided into three subcategories: langtag, privateuse and grandfathered. Let's first look at the two that are rarely needed!
privateuse
The meaning of a privateuse tag is not defined by the subtag registry; it is defined, maintained and used privately by the teams that agree on it.
Format:
privateuse = "x" 1*("-" (1*8alphanum))
Example: x-zh-CN is a privateuse tag, and its meaning is not necessarily the same as that of the language tag zh-CN.
Note: privateuse tags should only be used within a small group and must not be applied on a large scale.
grandfathered
These tags exist for backward compatibility. Tags registered before RFC 4646 do not all match the syntax and semantics of the current registry, so the grandfathered production keeps them valid.
grammar:
grandfathered = irregular / regular

irregular = "en-GB-oed"    ; irregular tags do not match
          / "i-ami"        ; the 'langtag' production and
          / "i-bnn"        ; would not otherwise be
          / "i-default"    ; considered 'well-formed'
          / "i-enochian"   ; These tags are all valid,
          / "i-hak"        ; but most are deprecated
          / "i-klingon"    ; in favor of more modern
          / "i-lux"        ; subtags or subtag
          / "i-mingo"      ; combination
          / "i-navajo"
          / "i-pwn"
          / "i-tao"
          / "i-tay"
          / "i-tsu"
          / "sgn-BE-FR"
          / "sgn-BE-NL"
          / "sgn-CH-DE"

regular   = "art-lojban"   ; these tags match the 'langtag'
          / "cel-gaulish"  ; production, but their subtags
          / "no-bok"       ; are not extended language
          / "no-nyn"       ; or variant subtags: their meaning
          / "zh-guoyu"     ; is defined by their registration
          / "zh-hakka"     ; and all of these are deprecated
          / "zh-min"       ; in favor of a more modern
          / "zh-min-nan"   ; subtag or sequence of subtags
          / "zh-xiang"
Note: almost all grandfathered tags can be replaced by current registry subtags or combinations of them (for example, i-tao can be replaced by tao), so unless you have a special reason, please use the current tags.
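The replacement idea in the note above can be sketched as a small lookup table. Only the i-tao to tao pair comes from this article; the other rows are recalled from the IANA registry and are included as illustrative assumptions that should be checked against the registry before use. The names preferred and modernize are hypothetical.

```javascript
// A few grandfathered tags and their preferred modern replacements.
// "i-tao" -> "tao" is from the text; treat the other rows as assumptions
// to be verified against the IANA Language Subtag Registry.
const preferred = new Map([
  ["i-tao", "tao"],
  ["i-klingon", "tlh"],
  ["i-navajo", "nv"],
  ["zh-guoyu", "cmn"],
  ["zh-hakka", "hak"],
  ["zh-min-nan", "nan"],
  ["zh-xiang", "hsn"],
]);

// Return the modern replacement for a grandfathered tag,
// or the tag unchanged when no replacement is listed.
function modernize(tag) {
  return preferred.get(tag.toLowerCase()) ?? tag;
}

console.log(modernize("i-tao")); // "tao"
console.log(modernize("zh-CN")); // "zh-CN" (not grandfathered, returned as-is)
```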
Next comes the highlight: langtag. First, let's look at the first subtag of langtag, language.
Primary language subtag
A subtag like en is the primary language subtag, which identifies the language of the resource.
grammar:
language = 2*3ALPHA ["-" extlang] / 4ALPHA / 5*8ALPHA
extlang = 3ALPHA *2("-" 3ALPHA)
As you can see, language takes three forms, and the one that made me most curious is the first, 2*3ALPHA ["-" extlang]. In this form, the leading 2*3ALPHA part is called the macrolanguage, a general label that covers a group of related languages, while the specific language/dialect is given by extlang. A language identified by the extlang part is also called an encompassed language.
For example, zh-cmn and zh-yue denote encompassed languages, in which zh is the macrolanguage and cmn and yue are extlang subtags.
Here is something interesting. We regard Mandarin and Cantonese as dialects of Chinese, but in the West they are considered entirely different languages. So zh-cmn and zh-yue are marked as redundant in the registry, and it is recommended to use cmn, yue and so on directly. However, for historical reasons, we still use zh-CN to mean cmn-CN.
In addition, only 7 subtags can be used as the macrolanguage prefix in front of an extlang (ar, kok, ms, sw, uz, zh and sgn).
Several other subtags similar to cmn are as follows:
cmn  Mandarin (Guanhua, Putonghua)
wuu  Wu (Jiangsu-Zhejiang dialect, Shanghainese)
czh  Huizhou (Huizhou dialect, Yanzhou dialect; also classified as the Hui-Yan subgroup of Wu)
hak  Hakka
yue  Yue (Cantonese)
nan  Min Nan (Hokkien, *)
cpx  Pu-Xian (Putian dialect, Xinghua dialect)
cdo  Min Dong (Eastern Min)
mnp  Min Bei (Northern Min)
czo  Min Zhong (Central Min)
gan  Gan (Jiangxi dialect)
hsn  Xiang (Hunan dialect)
cjy  Jinyu (Shanxi dialect, Shaanxi dialect)
Note: by convention, language subtags are written in all lowercase.
Script subtag
Specifies the script (writing system) in which the language or dialect of the resource is written; for example, zh-Hans is Chinese written in Simplified Chinese characters.
grammar:
script = 4ALPHA
Note: by convention, the first letter is capitalized and the remaining letters are lowercase.
Region subtag
Specifies the country or region whose variety of the language/dialect, and whose cultural conventions, apply.
grammar:
region = 2ALPHA / 3DIGIT
Note: by convention, region subtags are written in all uppercase.
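The three casing conventions noted so far (lowercase language, title-case script, uppercase region) can be sketched as a small formatter. The function name formatCase is hypothetical, and the sketch assumes the input is already a syntactically valid tag. The built-in Intl.getCanonicalLocales applies the same casing and is the safer choice in real code.

```javascript
// Apply the conventional casing to each subtag of a (presumed valid) tag.
function formatCase(tag) {
  let afterSingleton = false; // subtags after "u", "x", etc. stay lowercase
  return tag
    .split("-")
    .map((subtag, i) => {
      const lower = subtag.toLowerCase();
      if (i === 0 || afterSingleton) return lower; // language, or extension content
      if (subtag.length === 1) {
        afterSingleton = true; // singleton that starts an extension or private use
        return lower;
      }
      if (/^[a-z]{4}$/.test(lower)) {
        return lower[0].toUpperCase() + lower.slice(1); // script: Hans, Latn
      }
      if (/^[a-z]{2}$/.test(lower)) return lower.toUpperCase(); // region: CN, DE
      return lower; // extlang, variant, numeric region
    })
    .join("-");
}

console.log(formatCase("ZH-hans-cn")); // "zh-Hans-CN"
console.log(Intl.getCanonicalLocales("ZH-hans-cn")[0]); // "zh-Hans-CN"
```

Tracking the singleton matters because two-letter subtags inside an extension, such as co in de-DE-u-co-phonebk, are not regions and stay lowercase.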
Variant subtag
Specifies additional information that the other subtags cannot express.
grammar:
variant = 5*8alphanum / (DIGIT 3alphanum)
Example: in de-CH-1996, 1996 is the variant subtag; the whole tag means German as used in Switzerland, written in the orthography reformed in 1996.
Extension subtag
Provides a mechanism for extending langtag.
grammar:
extension = singleton 1*("-" (2*8alphanum))
singleton = DIGIT / %x41-57 / %x59-5A / %x61-77 / %x79-7A
Currently, the registered singleton values are u (Unicode locale extensions) and t (transformed content).
Example: de-DE-u-co-phonebk means German as used in Germany, with sorting and similar operations performed using phonebook collation.
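To see what the u-co-phonebk extension changes in practice, here is a small sketch using the standard Intl.Collator. It assumes the runtime's locale data includes the German phonebook collation; if it does not, the collator silently falls back to the default ordering, which is why the resolved collation is printed. The sample names are made up.

```javascript
const names = ["Muller", "Müller", "Mueller"];

// Default German collation: "ü" is compared like "u" (differing only by accent).
const standard = new Intl.Collator("de");

// Phonebook collation requested through the "u" extension: "ü" is compared like "ue".
const phonebook = new Intl.Collator("de-u-co-phonebk");

// Reports which collation was actually selected ("phonebk" when supported).
console.log(phonebook.resolvedOptions().collation);

// Sort copies of the list with each collator to compare the resulting orders.
console.log([...names].sort(standard.compare));
console.log([...names].sort(phonebook.compare));
```

When phonebook collation is available, Müller should sort before Muller in the second list and after it in the first.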
For more information about language tags, please refer to BCP 47.
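Putting the grammar rules of this section together, a langtag can be split into its subtags with a regular expression. This is a minimal sketch under stated assumptions: LANGTAG and parseLanguageTag are hypothetical names, grandfathered tags are deliberately not handled, and the code only checks syntax, not whether each subtag is actually registered.

```javascript
// One named group per part of the langtag production (case-insensitive).
const LANGTAG = new RegExp(
  "^(?<language>[a-z]{2,3}(?:-[a-z]{3}){0,3}|[a-z]{4}|[a-z]{5,8})" + // language [-extlang]
  "(?:-(?<script>[a-z]{4}))?" +                                      // script
  "(?:-(?<region>[a-z]{2}|\\d{3}))?" +                               // region
  "(?<variants>(?:-(?:[a-z\\d]{5,8}|\\d[a-z\\d]{3}))*)" +            // *variant
  "(?<extensions>(?:-[\\da-wy-z](?:-[a-z\\d]{2,8})+)*)" +            // *extension
  "(?:-(?<privateuse>x(?:-[a-z\\d]{1,8})+))?$",                      // privateuse
  "i"
);

// Parse a tag into its parts, or return null when it is not a langtag/privateuse tag.
function parseLanguageTag(tag) {
  if (/^x(?:-[a-z\d]{1,8})+$/i.test(tag)) {
    return { type: "privateuse", privateuse: tag };
  }
  const match = LANGTAG.exec(tag);
  if (!match) return null; // e.g. grandfathered tags such as "i-klingon"
  const g = match.groups;
  const [language, ...extlang] = g.language.split("-");
  return {
    type: "langtag",
    language,
    extlang,
    script: g.script ?? null,
    region: g.region ?? null,
    variants: g.variants ? g.variants.slice(1).split("-") : [],
    extensions: g.extensions ? g.extensions.slice(1) : null, // raw, e.g. "u-co-phonebk"
    privateuse: g.privateuse ?? null,
  };
}

console.log(parseLanguageTag("zh-Hans-CN"));
console.log(parseLanguageTag("de-DE-u-co-phonebk"));
```

For real applications the built-in Intl.Locale is usually preferable; the regular expression is spelled out here only because it mirrors the ABNF one production at a time.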
How to choose a Language Tag
After gritting our teeth through so much of the standard, we may still not know how to put together an appropriate language tag :( In fact, there is only one principle for choosing and combining subtags:
Keep the language tag as short and lean as possible, provided it is still sufficient to distinguish it from the other language tags in the current context.
Example 1: Mandarin and Cantonese appear together
<p lang="cmn">
  Xiao Chen said: "Old man, how do I get to Oriental Plaza?"
  The old man replied: "<span lang="yue">What did you say? I don't understand.</span>"
</p>
Example 2: a mainlander speaking English, a Hong Konger speaking Mandarin, and an American speaking English
<p lang="cmn">
  Xiao Chen said: "<span lang="en-CN">Hi, where are you come from?</span>"
  Mr. Li said: "<span lang="cmn-HK">Your English is as ordinary as mine, ha ha!</span>"
  Simon said: "<span lang="en">Hey, what's up!</span>"
</p>
Now another question: how do we know which values are defined for each subtag?
They are defined in the IANA Language Subtag Registry.
If you find the registry inconvenient to search, use the Language Subtag Lookup tool instead!
In addition, if you are not sure which languages or dialects are used in a given country or region, you can look it up on Ethnologue: click an area on the map to get the corresponding subtag information.
Summary
We now have a more complete picture of internationalization and localization, and a deeper understanding of language tags. Are you eager to roll up your sleeves and write some code? Stay tuned for the next article, "JS Magic Hall: Incomplete Internationalization & Localization Manual".
Acknowledgements
Should the statement at the head of the web page be lang="zh" or lang="zh-cn"?
Language Subtag Registry
BCP 47
Language on the Web
Choosing a Language Tag
Language tags in HTML and XML