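Function ReadUTF16Char(): UInt32
    Var Leading: Word     // leading (first) 16-bit code unit
    Var Trailing: Word    // trailing (second) 16-bit code unit

    Leading = ReadWord()
    If (Leading < $D800) Or (Leading > $DFFF) Then
        // Not a surrogate: the code unit is the code point itself.
        Return WordToUInt32(Leading)
    Else If (Leading >= $DC00) Then
        // A trailing surrogate cannot start a character.
        Error("Invalid code sequence.")
    Else
        Var Code: UInt32
        // The leading surrogate carries the upper 10 bits.
        Code = WordToUInt32(Leading And $3FF) Shl 10
        Trailing = ReadWord()
        If ((Trailing < $DC00) Or (Trailing > $DFFF)) Then
            Error("Invalid code sequence.")
        Else
            // The trailing surrogate carries the lower 10 bits.
            Code = Code Or WordToUInt32(Trailing And $3FF)
            Return (Code + $10000)
        End If
    End If
End Function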
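The pseudocode above can be tried out with a short Python sketch. It is only an illustration under stated assumptions: the function name `decode_utf16_units` is hypothetical, and instead of a `ReadWord()` call it takes an already prepared sequence of 16-bit code units and returns the decoded code points.

```python
def decode_utf16_units(units):
    """Decode an iterable of 16-bit code units into a list of code points.

    Hypothetical helper mirroring the pseudocode: the iterator plays the
    role of ReadWord().
    """
    it = iter(units)
    code_points = []
    for leading in it:
        if leading < 0xD800 or leading > 0xDFFF:
            # Not a surrogate: the code unit is the code point itself.
            code_points.append(leading)
        elif leading >= 0xDC00:
            # A trailing surrogate cannot start a character.
            raise ValueError("invalid code sequence")
        else:
            # The leading surrogate carries the upper 10 bits.
            code = (leading & 0x3FF) << 10
            trailing = next(it, None)
            if trailing is None or not (0xDC00 <= trailing <= 0xDFFF):
                raise ValueError("invalid code sequence")
            # The trailing surrogate carries the lower 10 bits.
            code_points.append((code | (trailing & 0x3FF)) + 0x10000)
    return code_points
```

For instance, the code units 0xD83D 0xDE00 form one surrogate pair and decode to the single code point 0x1F600.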

 

UTF-8

UTF-8 (Unicode Transformation Format, 8-bit) is a widely used encoding that represents Unicode in a form compatible with 8-bit text encoding.

UTF-16, UTF-8 : .

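Text that consists only of Unicode characters with code points below 128 becomes plain ASCII text when written in UTF-8. Conversely, in UTF-8 text any byte with a value below 128 represents the ASCII character with the same code. All other Unicode characters are represented by sequences of 2 to 6 bytes (in practice at most 4, since code points above 2^21 are not planned), in which the first byte always has the form 11xxxxxx and the remaining bytes have the form 10xxxxxx.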

, UTF-8 , ASCII US-ASCII, a 1. .

, , .

, ( ) , UTF-8 UTF-16.

, UTF-16 , .

, UTF-16, UCS-2.

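UTF-8 was invented on 2 September 1992 by Ken Thompson and Rob Pike and implemented in Plan 9. The UTF-8 standard is now formally specified in RFC 3629 and ISO/IEC 10646 Annex D.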

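Note: characters encoded in UTF-8 can be up to six bytes long; however, the Unicode standard defines no characters above 0x10FFFF, so a Unicode character occupies at most 4 bytes in UTF-8.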

 

UTF-8 can encode values from 0 to 0x7FFFFFFF (which fit in a 32-bit integer).

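1. The unit of encoding is the 8-bit byte (octet). A character is encoded as a sequence of 1 to 6 bytes.

2. ASCII characters (0x00 to 0x7F inclusive) are encoded as themselves, in a single byte.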

3. . , .

4. ASCII- 1. ( , , ). , .

5. 6 ( ).

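In the first byte of a multi-byte sequence, the number of leading 1 bits equals the number of bytes in the sequence, and these bits are followed by a 0 bit. The remaining bits of the first byte carry data. Each continuation byte carries 6 bits of the code. The bytes 11111110 (0xFE) and 11111111 (0xFF) are never used in UTF-8.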

, .

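To encode a character, its code is split into groups of 6 bits; the code is treated as a 32-bit value. The last byte of the sequence receives bits 0..5. The bytes before it receive, in turn, bits 6..11, 12..17, 18..23 and 24..29. The remaining high bits go into the first byte.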

, .

. .

Unicode range (HEX)      Bytes in UTF-8   Notes
00000000 - 0000007F      1                ASCII
00000080 - 000007FF      2
00000800 - 0000FFFF      3
00010000 - 001FFFFF      4
00200000 - 03FFFFFF      5                not used in Unicode
04000000 - 7FFFFFFF      6                not used in Unicode
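The ranges in the table map directly onto a small encoder. The following Python sketch is illustrative only: `utf8_encode` is a hypothetical name, it implements the historical 1 to 6 byte scheme from the table rather than the 4-byte limit of RFC 3629, and it does not reject surrogate code points.

```python
def utf8_encode(code_point):
    """Encode one code value (0..0x7FFFFFFF) using the 1-6 byte scheme."""
    if not 0 <= code_point <= 0x7FFFFFFF:
        raise ValueError("code value out of range")
    if code_point < 0x80:
        # ASCII is stored as a single byte, unchanged.
        return bytes([code_point])
    # (upper limit of the range, sequence length, prefix of the first byte)
    ranges = (
        (0x7FF, 2, 0xC0),
        (0xFFFF, 3, 0xE0),
        (0x1FFFFF, 4, 0xF0),
        (0x3FFFFFF, 5, 0xF8),
        (0x7FFFFFFF, 6, 0xFC),
    )
    for limit, length, prefix in ranges:
        if code_point <= limit:
            break
    tail = []
    for _ in range(length - 1):
        # Each continuation byte is 10xxxxxx and carries 6 bits.
        tail.append(0x80 | (code_point & 0x3F))
        code_point >>= 6
    # Whatever is left fits into the payload bits of the first byte.
    return bytes([prefix | code_point] + tail[::-1])
```

For code points that Python itself accepts, the result should agree with the built-in `str.encode("utf-8")`, which makes the sketch easy to cross-check.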

, .

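A character must always be encoded in UTF-8 with the shortest possible sequence; longer encodings of the same code are not allowed. For example, the ASCII character 1 (0x31) could otherwise also be written as 11000000 10110001 (0xC0 0xB1) or as 11100000 10000000 10110001 (0xE0 0x80 0xB1). The prefix bytes with an all-zero payload are 110 00000 (0xC0), 1110 0000 (0xE0), 11110 000 (0xF0), 111110 00 (0xF8) and 1111110 0 (0xFC); the continuation byte with an all-zero payload is 10 000000 (0x80).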
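A decoder can reject such over-long forms by comparing the decoded value with the smallest value that actually needs the given sequence length. The Python sketch below is a minimal illustration under assumptions: `utf8_decode_one` is a hypothetical name, it expects non-empty input, follows the 1 to 6 byte scheme, and does not apply the further restrictions of RFC 3629.

```python
def utf8_decode_one(data):
    """Decode the first character of `data`; return (code, length)."""
    first = data[0]
    if first < 0x80:
        return first, 1
    # (mask selecting the prefix, prefix value, length, smallest legal code)
    forms = (
        (0xE0, 0xC0, 2, 0x80),
        (0xF0, 0xE0, 3, 0x800),
        (0xF8, 0xF0, 4, 0x10000),
        (0xFC, 0xF8, 5, 0x200000),
        (0xFE, 0xFC, 6, 0x4000000),
    )
    for mask, prefix, length, minimum in forms:
        if (first & mask) == prefix:
            break
    else:
        # Continuation bytes, 0xFE and 0xFF cannot start a sequence.
        raise ValueError("invalid leading byte")
    if len(data) < length:
        raise ValueError("truncated sequence")
    code = first & ~mask & 0xFF  # payload bits of the first byte
    for byte in data[1:length]:
        if (byte & 0xC0) != 0x80:
            raise ValueError("invalid continuation byte")
        code = (code << 6) | (byte & 0x3F)
    if code < minimum:
        # The value would have fit into a shorter sequence.
        raise ValueError("over-long encoding")
    return code, length
```

With this check, `b"1"` decodes to 0x31, while both 0xC0 0xB1 and 0xE0 0x80 0xB1 raise an error.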

UTF-8 32- . , Unicode 0x001FFFFF . 32- , UTF-8 .

. . , . , . 6 . 36 42 .

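UTF-8 text can never contain bytes with the values 254 (0xFE) and 255 (0xFF). Since Unicode defines no characters with codes above 2^21, the byte values from 248 to 253 (0xF8 to 0xFD) are also unused in UTF-8. If artificially lengthened (padded with leading zeros) UTF-8 sequences are forbidden, the byte values 192 and 193 (0xC0 and 0xC1) are unused as well.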
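These unused byte values give a cheap first check on data that claims to be UTF-8. The sketch below is only an illustration; `find_impossible_bytes` and `NEVER_USED` are hypothetical names, and the set contains just the values named above (RFC 3629 additionally excludes 0xF5 to 0xF7, which this sketch does not cover).

```python
# Byte values that cannot occur in UTF-8 text according to the paragraph
# above: 0xC0, 0xC1 and 0xF8..0xFF.
NEVER_USED = frozenset([0xC0, 0xC1]) | frozenset(range(0xF8, 0x100))


def find_impossible_bytes(data):
    """Return (offset, value) for every byte that UTF-8 never uses."""
    return [(i, b) for i, b in enumerate(data) if b in NEVER_USED]
```

An empty result does not prove that the data is valid UTF-8; a non-empty one shows that it is not.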

BOM

Many Windows programs (including Notepad) add the bytes 0xEF, 0xBB, 0xBF at the beginning of any document saved as UTF-8.

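This is the byte order mark (BOM); it is also often called a signature (hence the names UTF-8 and UTF-8 with Signature). The signature lets programs detect automatically that a file is encoded in UTF-8, but files that carry it may be handled incorrectly by older programs, in particular by XML parsers. Programs such as Notepad++, Notepad2 and Kate let the user choose whether to add the signature when saving UTF files.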

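Example: a file containing the single character a.

Encoded as UTF-8 with Signature, the file contains the bytes: 0xEF 0xBB 0xBF 0x61

Encoded as UTF-8 (without signature), the file contains the byte: 0x61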
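The same example can be reproduced with Python's standard codecs: `utf-8-sig` writes and strips the signature, while plain `utf-8` does neither. The helper name `strip_utf8_signature` is hypothetical and only shows how the three bytes can be removed by hand.

```python
import codecs

text = "a"
with_signature = text.encode("utf-8-sig")   # bytes EF BB BF 61
without_signature = text.encode("utf-8")    # byte 61


def strip_utf8_signature(data):
    """Return `data` without a leading UTF-8 signature, if there is one."""
    if data.startswith(codecs.BOM_UTF8):
        return data[len(codecs.BOM_UTF8):]
    return data
```

Decoding signed data with plain `utf-8` keeps the mark as a leading U+FEFF character, which is one way older software ends up mishandling such files.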

The BOM is the Unicode character 0xFEFF. A BOM is also used in UTF-16 and UTF-32.

 

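The byte order mark (BOM) is a Unicode character used to indicate the byte order of a text file. Its code point is U+FEFF. According to the specification its use is optional, but if a BOM is used, it must be placed at the beginning of the text file. Besides indicating the byte order, the BOM can also show which Unicode encoding the text uses.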

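Unicode text can be encoded as 16-bit or 32-bit integers, so a program reading it must know the byte order of those integers. The BOM indicates that order.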


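According to the Unicode specification, U+FEFF inside a data stream (not at its beginning) is interpreted as a zero-width no-break space. Since Unicode 3.2 this use is deprecated in favor of U+2060 Word Joiner[1], and U+FEFF should be used only as a BOM.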

Encoding          hex                  dec                  ISO-8859-1
UTF-8[t 1]        EF BB BF             239 187 191          ï»¿
UTF-16 (BE)       FE FF                254 255              þÿ
UTF-16 (LE)       FF FE                255 254              ÿþ
UTF-32 (BE)       00 00 FE FF          0 0 254 255          ␀␀þÿ
UTF-32 (LE)       FF FE 00 00          255 254 0 0          ÿþ␀␀
UTF-7[t 1]        2B 2F 76 38          43 47 118 56         +/v8
                  2B 2F 76 39          43 47 118 57         +/v9
                  2B 2F 76 2B          43 47 118 43         +/v+
                  2B 2F 76 2F[t 2]     43 47 118 47         +/v/
UTF-1[t 1]        F7 64 4C             247 100 76           ÷dL
UTF-EBCDIC[t 1]   DD 73 66 73          221 115 102 115      Ýsfs
SCSU[t 1]         0E FE FF[t 3]        14 254 255           ␎þÿ
BOCU-1[t 1]       FB EE 28             251 238 40           ûî(
GB-18030[t 1]     84 31 95 33          132 49 149 51        �1�3

␀ stands for the NUL character; ␎ stands for the Shift Out character.

1. ↑ : 1234567 , , .[2][3]

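2. In UTF-7, before base-64 encoding, the fourth byte of the BOM is 001111xx in binary, where xx depends on the next character (the first character after the BOM). Strictly speaking, the fourth byte is therefore not purely part of the BOM but also carries information about the next (non-BOM) character. For xx=00, 01, 10, 11 this byte is, respectively, 38, 39, 2B, 2F when encoded as base64. If no character follows in base64, 38 is used for the fourth byte and the next byte is 2D.

3. SCSU allows other encodings of U+FEFF; the form shown is the signature recommended in UTR #6.[4]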
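The table can be turned into a simple signature check. The Python sketch below covers only the UTF-8, UTF-16 and UTF-32 rows, using the BOM constants from the standard `codecs` module; the names `BOMS` and `sniff_bom` are hypothetical. The UTF-32 entries are tested first because the UTF-32 (LE) mark FF FE 00 00 begins with the UTF-16 (LE) mark FF FE.

```python
import codecs

# Longer marks first: FF FE 00 00 (UTF-32 LE) starts with FF FE (UTF-16 LE).
BOMS = (
    (codecs.BOM_UTF32_BE, "utf-32-be"),
    (codecs.BOM_UTF32_LE, "utf-32-le"),
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
)


def sniff_bom(data):
    """Return (encoding name, BOM length), or (None, 0) if no mark is found."""
    for bom, name in BOMS:
        if data.startswith(bom):
            return name, len(bom)
    return None, 0
```

The result is a guess rather than a proof: UTF-16 (LE) text that starts with a BOM followed by a NUL character is indistinguishable from UTF-32 (LE) by these four bytes alone.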

 







Date added: 2016-11-24



© 2015-2024 lektsii.org
