XPath: Extract Current Node Content Including All Child Nodes

February 28, 2024

I ran into a problem extracting the content of the current node together with all of its child nodes. Given markup like <pre>abcdefg<b>b1b2b3</b></pre> (the input used in the solutions below), I want to get the string abcdefgb1b2b3 from the pre tag.

Solution 1:

To get an XML node's content as markup (sometimes referred to as "innerXML"), start by selecting the node itself rather than its children or its text content:

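from lxml import html
import lxml.etree as le

# "markup" instead of "input", to avoid shadowing the built-in input()
markup = "<pre>abcdefg<b>b1b2b3</b></pre>"
tree = html.fromstring(markup)
# select the <pre> element itself, not its children or text nodes
node = tree.xpath("//pre")[0]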

Then combine the node's leading text with the markup of each child node:

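# encoding="unicode" makes tostring() return str instead of bytes (needed on Python 3)
result = (node.text or "") + "".join(le.tostring(e, encoding="unicode") for e in node)
print(result)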

Output:

abcdefg<b>b1b2b3</b>

Solution 2:

Try replacing your XPath expression with the following, which selects every text node under pre:

In [1]: from lxml import html

In [2]: markup = "<pre>abcdefg<b>b1b2b3</b></pre>"

In [3]: input_xpath = "//pre//text()"

In [4]: tree = html.fromstring(markup)

In [5]: result = tree.xpath(input_xpath)

In [6]: result
Out[6]: ['abcdefg', 'b1b2b3']
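The result above is a list of text nodes, while the question asks for one string, so the pieces still need to be joined. A minimal sketch of that last step follows. It uses itertext(), the ElementTree-style counterpart of //pre//text(), so that it also runs on the standard library when lxml is missing (that fallback is an assumption of this sketch, not part of the answer). With lxml, "".join(tree.xpath("//pre//text()")) should give the same string.

```python
try:
    from lxml import etree  # the library used in this answer
except ImportError:
    # Fallback (assumption): the standard library provides itertext() as well.
    import xml.etree.ElementTree as etree

pre = etree.fromstring("<pre>abcdefg<b>b1b2b3</b></pre>")

# itertext() yields every text fragment under the node, in document order.
pieces = list(pre.itertext())
print(pieces)

# Join the fragments to get the single string the question asks for.
text = "".join(pieces)
print(text)
```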
