Parsing Broken XML




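The XML specification mandates that all conforming XML parsers employ draconian (strict) error handling. That is, they must halt as soon as they detect any well-formedness (wellformedness) error in the XML document. Well-formedness errors include mismatched start and end tags, undefined entities, and illegal Unicode characters. This is in stark contrast to other common formats such as HTML: your browser does not stop rendering a web page if you forget to close an HTML tag or to escape an ampersand in an attribute value. (It is a common misconception that HTML has no defined error handling. HTML error handling is actually quite well defined, but it is considerably more complicated than halting on the first error.)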
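As a rough illustration of draconian error handling, the sketch below uses the standard library's xml.etree.ElementTree rather than lxml, which the rest of this section relies on. The helper name well_formedness_error and the sample documents are made up for this example.

```python
import xml.etree.ElementTree as ET


def well_formedness_error(xml_text):
    """Return the parser's error message, or None if xml_text is well-formed."""
    try:
        ET.fromstring(xml_text)
    except ET.ParseError as err:
        # The parser stops at the first well-formedness error it meets.
        return str(err)
    return None


# Hypothetical sample documents, one per kind of error named above.
samples = {
    "well-formed": "<feed><title>ok</title></feed>",
    "mismatched tags": "<feed><title>ok</feed>",
    "undefined entity": "<feed><title>dive into &hellip;</title></feed>",
    "unescaped ampersand": "<feed><link href='/?a=1&b=2'/></feed>",
}

for label, text in samples.items():
    print(label, "->", well_formedness_error(text))
```

Every sample except the first should be rejected outright: a conforming parser does not hand back a partial tree.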

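Some people (myself included) believe that it was a mistake for the inventors of XML to mandate draconian error handling. Don't get me wrong: I can certainly see the appeal of simplifying the error handling rules. But in practice, the concept of well-formedness is trickier than it sounds, especially for XML documents that are published on the web and served over HTTP (for example, Atom feeds). Despite the maturity of XML, which standardized on draconian error handling in 1997, surveys continually show that a significant fraction of Atom feeds on the web contain well-formedness errors.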

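So I have both theoretical and practical reasons to parse XML documents at any cost, that is, not to halt at the first well-formedness error. If you find yourself wanting to do this too, lxml can help.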

Here is a fragment of a broken XML document.

<?xml version='1.0' encoding='utf-8'?>
<feed xmlns='http://www.w3.org/2005/Atom' xml:lang='en'>
  <title>dive into &hellip;</title>
...
</feed>

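That is an error, because the &hellip; entity is not defined in XML (it is defined in HTML). If you try to parse this broken feed with the default settings, lxml will choke on the undefined hellip entity.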

>>> import lxml.etree
>>> tree = lxml.etree.parse('examples/feed-broken.xml')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "lxml.etree.pyx", line 2693, in lxml.etree.parse (src/lxml/lxml.etree.c:52591)
  File "parser.pxi", line 1478, in lxml.etree._parseDocument (src/lxml/lxml.etree.c:75665)
  File "parser.pxi", line 1507, in lxml.etree._parseDocumentFromURL (src/lxml/lxml.etree.c:75993)
  File "parser.pxi", line 1407, in lxml.etree._parseDocFromFile (src/lxml/lxml.etree.c:75002)
  File "parser.pxi", line 965, in lxml.etree._BaseParser._parseDocFromFile (src/lxml/lxml.etree.c:72023)
  File "parser.pxi", line 539, in lxml.etree._ParserContext._handleParseResultDoc (src/lxml/lxml.etree.c:67830)
  File "parser.pxi", line 625, in lxml.etree._handleParseResult (src/lxml/lxml.etree.c:68877)
  File "parser.pxi", line 565, in lxml.etree._raiseParseError (src/lxml/lxml.etree.c:68125)
lxml.etree.XMLSyntaxError: Entity 'hellip' not defined, line 3, column 28

To parse this broken XML document despite its well-formedness error, you need to create a custom XML parser.

>>> parser = lxml.etree.XMLParser(recover=True)                  ①
>>> tree = lxml.etree.parse('examples/feed-broken.xml', parser)  ②
>>> parser.error_log                                             ③
examples/feed-broken.xml:3:28:FATAL:PARSER:ERR_UNDECLARED_ENTITY: Entity 'hellip' not defined
>>> tree.findall('{http://www.w3.org/2005/Atom}title')
[<Element {http://www.w3.org/2005/Atom}title at ead510>]
>>> title = tree.findall('{http://www.w3.org/2005/Atom}title')[0]
>>> title.text                                                   ④
'dive into '
>>> print(lxml.etree.tounicode(tree.getroot()))                  ⑤
<feed xmlns='http://www.w3.org/2005/Atom' xml:lang='en'>
  <title>dive into </title>
.
. [rest of serialization snipped for brevity]
.

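① To create a custom parser, instantiate the lxml.etree.XMLParser class. It accepts a number of named arguments; the one of interest here is recover. When it is set to True, lxml will do its best to recover from well-formedness errors.

② To parse an XML document with your custom parser, pass the parser object as the second argument to parse(). Note that lxml no longer raises an exception about the undefined &hellip; entity.

③ The parser keeps a log of the well-formedness errors it has encountered. (This is true regardless of the value of recover.)

④ Because the parser did not know what to do with the undefined &hellip; entity, it silently dropped it. The text content of the title element becomes 'dive into '.

⑤ As the serialization shows, the &hellip; entity was not moved elsewhere; lxml simply dropped it.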
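One way to combine the strict and the recovering behaviour is to try the default parser first and fall back to recover=True only when that fails, so the caller always knows whether the result came from a recovery. The sketch below is an illustration under stated assumptions, not the only way to do it: parse_with_fallback is a made-up helper name, and BROKEN_FEED is an in-memory stand-in for examples/feed-broken.xml.

```python
import lxml.etree

# Hypothetical in-memory stand-in for examples/feed-broken.xml.
BROKEN_FEED = (
    b"<?xml version='1.0' encoding='utf-8'?>\n"
    b"<feed xmlns='http://www.w3.org/2005/Atom' xml:lang='en'>\n"
    b"  <title>dive into &hellip;</title>\n"
    b"</feed>\n"
)


def parse_with_fallback(data):
    """Return (root, problems); problems is empty if data was well-formed."""
    try:
        # First attempt: the default, draconian parser.
        return lxml.etree.fromstring(data), []
    except lxml.etree.XMLSyntaxError:
        pass  # fall through to the recovering parser

    parser = lxml.etree.XMLParser(recover=True)
    root = lxml.etree.fromstring(data, parser)
    # Copy the interesting fields out of the parser's error log.
    problems = [
        (entry.line, entry.column, entry.type_name, entry.message)
        for entry in parser.error_log
    ]
    return root, problems


root, problems = parse_with_fallback(BROKEN_FEED)
for line, column, type_name, message in problems:
    print(f"{line}:{column}: {type_name}: {message}")
```

A non-empty problems list signals that the tree came from a recovery and may be missing content, such as the dropped entity in the title above.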

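It is important to reiterate that there is no guarantee of interoperability between "recovering" XML parsers. A different parser might recognize &hellip; as an HTML entity and replace it with something else instead of dropping it. Is that better? Maybe. Is it more correct? No, both behaviors are equally incorrect. The correct behavior (according to the XML specification) is to halt. If you have decided not to do that, you are on your own.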





Date added: 2016-11-18

