.


:




:

































 

 

 

 





; :

. , : , , [34];

, . , .

, , . , . Å (A ) Å, μ - .

, . , , , .

. , .

The character Й (U+0419) can be represented as the base character И (U+0418) followed by the combining character ◌̆ (U+0306).

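Graphic characters in Unicode are divided into spacing and non-spacing (zero-width) characters. Non-spacing characters take up no room in the line when displayed. They include, in particular, accent marks and other diacritics. Both spacing and non-spacing characters have their own codes. Spacing characters are also called base characters, and non-spacing ones are called combining characters; the latter cannot occur on their own. For example, the character á can be represented as the sequence of the base character a (U+0061) and the combining character ◌́ (U+0301), or as the single precomposed character á (U+00E1).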

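A special kind of combining character is the variation selector. Variation selectors affect only those characters for which such variants are defined. In version 5.0, variants are defined for a number of mathematical symbols, for characters of the traditional Mongolian alphabet and for characters of the Mongolian square script.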

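Because the same characters can be represented by different codes, comparing strings byte by byte becomes impossible. Normalization forms solve this problem by converting text to a defined standard form. The conversion replaces characters with equivalent ones using tables and rules. "Decomposition" is the replacement (splitting) of one character by several constituent characters, and "composition", conversely, is the replacement (joining) of several constituent characters by one character.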

The Unicode standard defines 4 text normalization forms: NFD, NFC, NFKD and NFKC.

NFD

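NFD (normalization form D, where "D" stands for decomposition) is canonical decomposition: an algorithm that recursively replaces precomposed characters with sequences of several composite characters according to the decomposition tables.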

Examples:

Å (U+00C5) → A (U+0041) + ◌̊ (U+030A)
ṩ (U+1E69) → s (U+0073) + ◌̣ (U+0323) + ◌̇ (U+0307)
ḍ̇ (U+1E0B U+0323) → d (U+0064) + ◌̣ (U+0323) + ◌̇ (U+0307)
q̣̇ (U+0071 U+0307 U+0323) → q (U+0071) + ◌̣ (U+0323) + ◌̇ (U+0307)
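The decompositions listed above can be checked with Python's standard `unicodedata` module. This is only a small sketch; the helper names `code_points` and `nfd` are made up for illustration.

```python
import unicodedata

def code_points(text: str) -> str:
    """Format a string as a space-separated list of U+XXXX code points."""
    return " ".join(f"U+{ord(ch):04X}" for ch in text)

def nfd(text: str) -> str:
    """Apply canonical decomposition (NFD)."""
    return unicodedata.normalize("NFD", text)

# The four source strings from the examples, written as escapes.
samples = ["\u00c5", "\u1e69", "\u1e0b\u0323", "q\u0307\u0323"]

for sample in samples:
    print(code_points(sample), "->", code_points(nfd(sample)))
```

Note that the last two samples end up with the same mark order (U+0323 before U+0307): NFD also sorts combining marks by their combining class.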

NFC

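NFC (normalization form C, where "C" stands for composition) is an algorithm that performs canonical decomposition followed by canonical composition. First, canonical decomposition (the NFD algorithm) brings the text to form D. Then canonical composition, the inverse of NFD, processes the text from beginning to end according to the following rules:

a character S is a starter if it has combining class zero in the Unicode character database;

in any character sequence that begins with a starter S, a character C is blocked from S only if there is some character B between S and C that either is a starter or has the same or a higher combining class than C. This rule applies only to strings that have been canonically decomposed;

a character is a primary composite if it has a canonical decomposition in the Unicode character database (or a canonical decomposition for Hangul) and is not in the exclusion list;

a character X can be primary combined with a character Y if and only if there is a primary composite Z that is canonically equivalent to the sequence <X, Y>;

if the next character C is not blocked from the last starter L encountered and can be primary combined with it, then L is replaced by the composite L-C and C is removed.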

Example:

o (U+006F) + ◌̂ (U+0302) → ô (U+00F4)
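As a rough illustration of the composition rules, the sketch below uses Python's `unicodedata` module to look up combining classes and to run NFC. The blocked sequence (a, U+0305, U+0301) is an example chosen here, not one from the text: U+0305 has the same combining class as U+0301 and does not combine with "a", so it blocks the acute accent.

```python
import unicodedata

def nfc(text: str) -> str:
    """Apply canonical decomposition followed by canonical composition (NFC)."""
    return unicodedata.normalize("NFC", text)

# Combining class 0 marks a starter; non-zero classes mark combining characters.
for ch in ("o", "\u0323", "\u0307", "\u0302"):
    print(f"U+{ord(ch):04X} combining class {unicodedata.combining(ch)}")

# o + COMBINING CIRCUMFLEX ACCENT composes into the single character U+00F4.
print(nfc("o\u0302") == "\u00f4")

# U+0305 sits between "a" and U+0301 with an equal combining class,
# so U+0301 is blocked and the sequence stays as it is.
blocked = "a\u0305\u0301"
print(nfc(blocked) == blocked)
```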

NFKD

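NFKD (normalization form KD) is compatibility decomposition: an algorithm that performs canonical decomposition and then replaces characters according to the compatibility decomposition tables. The compatibility decomposition tables provide replacements with nearly equivalent characters[35]:

font variants of Latin letters (ℍ and ℌ);

circled characters (①);

width variants (ｶ and カ);

rotated forms (︷ and {);

superscripts and subscripts (⁹ and ₉);

fractions (¼);

others (™).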

Examples:

ℍ (U+210D) → H (U+0048)
① (U+2460) → 1 (U+0031)
ｶ (U+FF76) → カ (U+30AB)
︷ (U+FE37) → { (U+007B)
⁹ (U+2079) → 9 (U+0039)
¼ (U+00BC) → 1⁄4 (U+0031 U+2044 U+0034)
™ (U+2122) → TM (U+0054 U+004D)
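The compatibility replacements above can be reproduced with `unicodedata.normalize("NFKD", ...)`. A minimal sketch; the helper name `nfkd_code_points` is hypothetical.

```python
import unicodedata

def nfkd_code_points(text: str) -> str:
    """Return the NFKD form of text as a list of U+XXXX code points."""
    decomposed = unicodedata.normalize("NFKD", text)
    return " ".join(f"U+{ord(ch):04X}" for ch in decomposed)

# Source characters from the examples: double-struck H, circled one,
# halfwidth katakana KA, vertical left curly bracket, superscript nine,
# vulgar fraction one quarter, trade mark sign.
samples = ["\u210d", "\u2460", "\uff76", "\ufe37", "\u2079", "\u00bc", "\u2122"]

for sample in samples:
    print(f"U+{ord(sample):04X} -> {nfkd_code_points(sample)}")
```

Unlike canonical decomposition, these replacements lose information (for example, that a digit was a superscript), so NFKD is better suited to searching and matching than to storing text.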

NFKC

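NFKC (normalization form KC) is compatibility composition: an algorithm that performs compatibility decomposition (the NFKD algorithm) followed by canonical composition (as in the NFC algorithm).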

Examples

Source text | NFD | NFC | NFKD | NFKC
ﬁ U+FB01 | ﬁ U+FB01 | ﬁ U+FB01 | fi U+0066 U+0069 | fi U+0066 U+0069
2⁵ U+0032 U+2075 | 2⁵ U+0032 U+2075 | 2⁵ U+0032 U+2075 | 25 U+0032 U+0035 | 25 U+0032 U+0035
ẛ̣ U+1E9B U+0323 | ẛ̣ U+017F U+0323 U+0307 | ẛ̣ U+1E9B U+0323 | ṩ U+0073 U+0323 U+0307 | ṩ U+1E69
й U+0439 | й U+0438 U+0306 | й U+0439 | й U+0438 U+0306 | й U+0439
ё U+0451 | ё U+0435 U+0308 | ё U+0451 | ё U+0435 U+0308 | ё U+0451
А U+0410 | А U+0410 | А U+0410 | А U+0410 | А U+0410
が U+304C | が U+304B U+3099 | が U+304C | が U+304B U+3099 | が U+304C
Ⅷ U+2167 | Ⅷ U+2167 | Ⅷ U+2167 | VIII U+0056 U+0049 U+0049 U+0049 | VIII U+0056 U+0049 U+0049 U+0049
ç U+00E7 | ç U+0063 U+0327 | ç U+00E7 | ç U+0063 U+0327 | ç U+00E7
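Rows of the comparison above can be regenerated with a short Python sketch that applies all four forms to one string. The names `FORMS`, `code_points` and `all_forms` are illustrative only.

```python
import unicodedata

FORMS = ("NFD", "NFC", "NFKD", "NFKC")

def code_points(text: str) -> str:
    """Format a string as a space-separated list of U+XXXX code points."""
    return " ".join(f"U+{ord(ch):04X}" for ch in text)

def all_forms(text: str) -> dict:
    """Map each normalization form name to the code points it produces."""
    return {form: code_points(unicodedata.normalize(form, text)) for form in FORMS}

# fi ligature, "2" with superscript five, long s with dot above and below,
# Roman numeral eight.
for sample in ("\ufb01", "2\u2075", "\u1e9b\u0323", "\u2167"):
    print(code_points(sample))
    for form, result in all_forms(sample).items():
        print(f"  {form}: {result}")
```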

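The Unicode standard supports scripts written left to right (LTR) as well as scripts written right to left (RTL), such as Arabic and Hebrew. In both cases characters are stored in their "natural" order; displaying them in the required writing direction is the responsibility of the application.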

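In addition, Unicode supports mixed texts that combine fragments with different writing directions. This capability is called bidirectional text (BiDi). Some simplified text processors (for example, in mobile phones) may support Unicode without supporting bidirectionality. All Unicode characters are divided into several categories: those written left to right, those written right to left, and those written in either direction. Characters in the last category (mostly punctuation marks) take on the direction of the surrounding text when displayed.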
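The direction category of a character can be inspected in Python with `unicodedata.bidirectional`, which returns the character's bidirectional class. The grouping into three buckets below is a simplification made for this sketch (the standard distinguishes more classes), and the function name `direction_group` is hypothetical.

```python
import unicodedata

def direction_group(ch: str) -> str:
    """Collapse the Unicode bidirectional class of ch into three rough groups."""
    bidi_class = unicodedata.bidirectional(ch)
    if bidi_class == "L":            # strong left-to-right
        return "ltr"
    if bidi_class in ("R", "AL"):    # strong right-to-left (Hebrew, Arabic)
        return "rtl"
    return "other"                   # weak and neutral classes, e.g. punctuation

for ch in ("A", "\u05d0", "\u0627", "!"):
    print(f"U+{ord(ch):04X}", unicodedata.bidirectional(ch), direction_group(ch))
```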

: ,

, :

,

,

,

,

,

,

,

,

,

,

( , ),

,

,

,

,

(),

,

,

( , , )

.

, : , , , , , , .

, .

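Trademarks, by contrast, are not encoded, even widely used ones (for example, the Apple logo in the MacRoman encoding (0xF0) or the Windows logo in the Wingdings font (0xFF)). In Unicode fonts, logos may be placed only in the private use area.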

ISO/IEC 10646

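The Unicode Consortium works closely with the working group ISO/IEC/JTC1/SC2/WG2, which develops international standard 10646 (ISO/IEC 10646). The Unicode standard and ISO/IEC 10646 are kept synchronized, although each standard uses its own terminology and documentation system.

Cooperation between the Unicode Consortium and the International Organization for Standardization (ISO) began in 1991. In 1993 ISO published the standard DIS 10646.1. To synchronize with it, the Consortium approved version 1.1 of the Unicode standard, which added further characters from DIS 10646.1. As a result, the encoded character values in Unicode 1.1 and DIS 10646.1 matched completely.

The cooperation between the two organizations continued. In 2000 the Unicode 3.0 standard was synchronized with ISO/IEC 10646-1:2000. The upcoming third version of ISO/IEC 10646 will be synchronized with Unicode 4.0. These specifications may even be published as a single standard.

Like the UTF-16 and UTF-32 formats in the Unicode standard, ISO/IEC 10646 has two main character encoding forms: UCS-2 (2 bytes per character, similar to UTF-16) and UCS-4 (4 bytes per character, similar to UTF-32). UCS stands for universal multiple-octet (multi-byte) coded character set. UCS-2 can be regarded as a subset of UTF-16 (UTF-16 without surrogate pairs), and UCS-4 is a synonym for UTF-32.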
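The difference between UCS-2 and UTF-16 comes down to surrogate pairs. The sketch below shows the standard UTF-16 arithmetic for code points above U+FFFF; the function name `utf16_units` is made up for illustration, and input validation (for example, rejecting code points in the surrogate range itself) is left out for brevity.

```python
def utf16_units(code_point: int) -> list:
    """Return the 16-bit code units that represent code_point in UTF-16."""
    if code_point < 0x10000:
        # Fits in one unit; this is the part UCS-2 can also represent.
        return [code_point]
    offset = code_point - 0x10000           # 20 significant bits remain
    high = 0xD800 + (offset >> 10)          # top 10 bits -> high surrogate
    low = 0xDC00 + (offset & 0x3FF)         # bottom 10 bits -> low surrogate
    return [high, low]

for cp in (0x0416, 0x20AC, 0x1F600):
    print(f"U+{cp:04X}", [f"{unit:04X}" for unit in utf16_units(cp)])
```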

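Differences between the Unicode standard and ISO/IEC 10646:

minor differences in terminology;

ISO/IEC 10646 does not include sections needed for a full implementation of Unicode support:

there is no data on the binary encoding of characters;

there is no description of the collation and rendering algorithms;

there is no list of character properties (for example, no list of the properties needed to implement support for bidirectional writing).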

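Unicode has several transformation formats (Unicode transformation format, UTF): UTF-8, UTF-16 (UTF-16BE, UTF-16LE) and UTF-32 (UTF-32BE, UTF-32LE). A UTF-7 format was also developed for transmission over seven-bit channels, but because of its incompatibility with ASCII it did not become widespread and is not part of the standard. On 1 April 2005 two joke formats were proposed: UTF-9 and UTF-18 (RFC 4042).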

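Microsoft Windows NT and the systems based on it, Windows 2000 and Windows XP, mainly use UTF-16LE. UNIX-like operating systems such as GNU/Linux, BSD and Mac OS X use UTF-8 for files and UTF-32 or UTF-8 for processing characters in memory.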

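Punycode is another way of encoding sequences of Unicode characters, producing so-called ACE sequences that consist only of alphanumeric characters, as permitted in domain names.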

UTF-8

Main article: UTF-8

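UTF-8 is the representation of Unicode that offers the best compatibility with older systems that used 8-bit characters. Text consisting only of characters numbered below 128 becomes plain ASCII text when written in UTF-8. Conversely, in UTF-8 text any byte with a value below 128 represents the ASCII character with the same code. All other Unicode characters are represented by sequences 2 to 6 bytes long (in practice only up to 4 bytes, because Unicode has no characters with codes above 10FFFF and there are no plans to introduce any), in which the first byte always has the form 11xxxxxx and the remaining bytes have the form 10xxxxxx. UTF-8 does not use surrogate pairs; 4 bytes are enough to write any Unicode character.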

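The UTF-8 format was invented on 2 September 1992 by Ken Thompson and Rob Pike and implemented in Plan 9[36]. The UTF-8 standard is now formally defined in RFC 3629 and in ISO/IEC 10646 Annex D.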

UTF-8 byte sequences are derived from Unicode code points as follows:

Unicode range: UTF-8 byte sequence

0x00000000 to 0x0000007F: 0xxxxxxx

0x00000080 to 0x000007FF: 110xxxxx 10xxxxxx

0x00000800 to 0x0000FFFF: 1110xxxx 10xxxxxx 10xxxxxx

0x00010000 to 0x001FFFFF: 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx

The following are theoretically possible, but not included in the standard:

0x00200000 to 0x03FFFFFF: 111110xx 10xxxxxx 10xxxxxx 10xxxxxx 10xxxxxx

0x04000000 to 0x7FFFFFFF: 1111110x 10xxxxxx 10xxxxxx 10xxxxxx 10xxxxxx 10xxxxxx
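The bit patterns above translate fairly directly into code. The following is a minimal sketch of an encoder for the one- to four-byte forms only; the function name `utf8_encode` is made up, and rejecting surrogates and values above 0x10FFFF follows RFC 3629 rather than anything stated in the table. In practice `str.encode("utf-8")` does the same job.

```python
def utf8_encode(code_point: int) -> bytes:
    """Encode one Unicode code point using the 1- to 4-byte UTF-8 patterns."""
    if not 0 <= code_point <= 0x10FFFF or 0xD800 <= code_point <= 0xDFFF:
        raise ValueError(f"not an encodable code point: {code_point:#x}")
    if code_point <= 0x7F:                      # 0xxxxxxx
        return bytes([code_point])
    if code_point <= 0x7FF:                     # 110xxxxx 10xxxxxx
        return bytes([0xC0 | (code_point >> 6),
                      0x80 | (code_point & 0x3F)])
    if code_point <= 0xFFFF:                    # 1110xxxx 10xxxxxx 10xxxxxx
        return bytes([0xE0 | (code_point >> 12),
                      0x80 | ((code_point >> 6) & 0x3F),
                      0x80 | (code_point & 0x3F)])
    return bytes([0xF0 | (code_point >> 18),    # 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
                  0x80 | ((code_point >> 12) & 0x3F),
                  0x80 | ((code_point >> 6) & 0x3F),
                  0x80 | (code_point & 0x3F)])

for cp in (0x41, 0x416, 0x20AC, 0x1F600):
    print(f"U+{cp:04X}", utf8_encode(cp).hex(" "))
```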

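Although UTF-8 makes it possible to write the same character in several ways, only the shortest one is correct. The other forms must be rejected for security reasons.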
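To illustrate the shortest-form rule: the slash character "/" (U+002F) has the one-byte form 2F, while C0 AF and E0 80 AF are overlong forms of the same code point. Python's built-in strict UTF-8 decoder rejects them, as the sketch below shows; the helper name `is_valid_utf8` is hypothetical.

```python
def is_valid_utf8(data: bytes) -> bool:
    """Return True if data decodes as strict UTF-8 (overlong forms are rejected)."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True

candidates = {
    "shortest form": b"\x2f",
    "overlong, 2 bytes": b"\xc0\xaf",
    "overlong, 3 bytes": b"\xe0\x80\xaf",
}

for label, data in candidates.items():
    print(label, data.hex(" "), is_valid_utf8(data))
```

Accepting the overlong forms would let "/" slip past a filter that only looks for the byte 2F, which is why decoders are expected to reject them.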

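In a UTF-16 data stream the low-order byte may be written either before the high-order byte (UTF-16 little-endian) or after it (UTF-16 big-endian). Similarly, there are two variants of the four-byte encoding: UTF-32LE and UTF-32BE.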

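To indicate that Unicode is in use, the character U+FEFF (zero-width no-break space), also called the byte order mark (BOM), may be sent at the start of a text file or stream. This makes it possible to tell UTF-16LE from UTF-16BE, because the character U+FFFE does not exist. It is also sometimes used to mark the UTF-8 format, although the notion of byte order does not apply to that format. Files that follow this convention begin with the following byte sequences: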

UTF-8: EF BB BF

UTF-16BE: FE FF

UTF-16LE: FF FE

UTF-32BE: 00 00 FE FF

UTF-32LE: FF FE 00 00

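Unfortunately, this method cannot reliably distinguish UTF-16LE from UTF-32LE, because Unicode permits the character U+0000 (although real texts rarely begin with it).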

Files in the UTF-16 and UTF-32 encodings that do not contain a BOM must use big-endian byte order (unicode.org).
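BOM sniffing along these lines might be sketched as follows, using the BOM constants from Python's `codecs` module. The function name `detect_bom` is made up. The UTF-32 marks are tested before the UTF-16 ones because FF FE 00 00 also starts with FF FE; that ordering is a heuristic that assumes a UTF-16LE text does not begin with U+0000.

```python
import codecs
from typing import Optional

# Longer marks first, so UTF-32LE is not mistaken for UTF-16LE.
BOM_TABLE = (
    (codecs.BOM_UTF32_BE, "utf-32-be"),
    (codecs.BOM_UTF32_LE, "utf-32-le"),
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
)

def detect_bom(data: bytes) -> Optional[str]:
    """Return the encoding implied by a leading BOM, or None if there is none."""
    for bom, encoding in BOM_TABLE:
        if data.startswith(bom):
            return encoding
    return None

print(detect_bom(b"\xef\xbb\xbfhello"))    # UTF-8 mark
print(detect_bom(b"\xff\xfeh\x00i\x00"))   # UTF-16LE mark
print(detect_bom(b"\xff\xfe\x00\x00"))     # treated as UTF-32LE
print(detect_bom(b"hello"))                # no mark
```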







Date added: 2016-11-24; views: 509



© 2015-2024 lektsii.org
