I want to extract the image link only if the picture's file name does not contain the word "thumb".
<a title="" rev="http://insales.ru/images/large.jpeg" href="http://insales.ru/images/t001.jpeg" class="testclass"> <img src="http://insales.ru/images/thumb.jpeg" class="productimage"> </a>
So I want to extract "http://insales.ru/images/thumb.jpeg" from the <img> tag if the picture name does not contain the "thumb" keyword.
I tried this:
//a[@class='testclass']//img[not(contains(@src, 'thumb'))]
It doesn't work, because this way I lose the data in the <a> tag.
Sometimes I need to extract the link directly from the <a> tag (its rev or href attribute), and <img> is a child of <a>.
How can I write an XPath that extracts links from either the parent or the child, when the condition is on the child?
In detail:
I am parsing data from an online store; specifically, I am trying to get the product images. The code provided represents one image of a product. I need the big version of the picture, not the small thumbnail. The problem is that the link to the big picture is sometimes in the rev attribute of the <a> tag and sometimes in the src attribute of the <img> tag.
Case 1 (the link I need is in the <a> tag, rev attribute):
<li class='product-item'> <a title="" rev="http://insales.ru/images/large.jpeg" href="http://insales.ru/images/t001.jpeg" class="magicthumb-swap"> <img src="http://insales.ru/images/thumb_t001" class="productimage" title=" tissot"> </a> </li>
In this case I need to extract http://insales.ru/images/large.jpeg. I don't need http://insales.ru/images/thumb_t001 from the <img> tag.
Case 2 (the link I need is in the <img> tag, src attribute):
<div class='item'> <a title="" id="zoomer" class="magiczoomplus jqzoom modal" href="http://insales.ru/images/thumbi14.jpg" > <img src="http://insales.ru/images/large_i14.jpg" title="orient" class="productimage"> </a> </div>
In the second case I need to extract http://insales.ru/images/large_i14.jpg; I don't need http://insales.ru/images/thumbi14.jpg from the <a> tag.
I know how to extract the link in each of these two cases, but I don't know how to write a universal XPath that gets the big-picture links in both scenarios. That's why I'm trying to make the condition depend on the picture name in the link: if the link contains the 'thumb' keyword, I want to filter it out.
If I have understood you correctly now, the correct path expression is
//a/@rev[not(contains(.,'thumb'))] | //img/@src[not(contains(.,'thumb'))]
where | is the union operator, which combines sets of nodes.
Assuming an input document like
<html> <li class='product-item'> <a title="" rev="http://insales.ru/images/large.jpeg" href="http://insales.ru/images/t001.jpeg" class="magicthumb-swap"> <img src="http://insales.ru/images/thumb_t001" class="productimage" title=" tissot"/> </a> </li> <div class='item'> <a title="" id="zoomer" class="magiczoomplus jqzoom modal" href="http://insales.ru/images/thumbi14.jpg" > <img src="http://insales.ru/images/large_i14.jpg" title="orient" class="productimage"/> </a> </div> </html>
the result is (individual results separated by -----------):
rev="http://insales.ru/images/large.jpeg"
-----------------------
src="http://insales.ru/images/large_i14.jpg"
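To try the union expression outside a browser, here is a minimal Python sketch that uses only the standard library's xml.etree.ElementTree. It parses the sample document above (with the stray spaces inside the URLs removed) and reproduces the two results. ElementTree implements only a small XPath subset, with no contains() and no |, so the two halves of the union are evaluated by hand; the function name big_image_links is hypothetical. With a full XPath 1.0 engine such as lxml, the expression could instead be passed unchanged to the element's xpath() method.

```python
import xml.etree.ElementTree as ET

# Sample document from the answer; <img .../> is self-closed so it parses as XML.
SAMPLE = """<html>
<li class='product-item'>
  <a title="" rev="http://insales.ru/images/large.jpeg" href="http://insales.ru/images/t001.jpeg" class="magicthumb-swap">
    <img src="http://insales.ru/images/thumb_t001" class="productimage" title="tissot"/>
  </a>
</li>
<div class='item'>
  <a title="" id="zoomer" class="magiczoomplus jqzoom modal" href="http://insales.ru/images/thumbi14.jpg">
    <img src="http://insales.ru/images/large_i14.jpg" title="orient" class="productimage"/>
  </a>
</div>
</html>"""

# Which attribute to read on which element: one entry per half of the union.
SOURCES = {"a": "rev", "img": "src"}


def big_image_links(root):
    """Hypothetical helper mimicking
    //a/@rev[not(contains(.,'thumb'))] | //img/@src[not(contains(.,'thumb'))]
    """
    links = []
    # root.iter() visits elements in document order, like an XPath node set.
    for elem in root.iter():
        attr = SOURCES.get(elem.tag)
        if attr is None:
            continue
        value = elem.get(attr)
        # Elements lacking the attribute contribute nothing, as in the XPath.
        if value is not None and "thumb" not in value:
            links.append(value)
    return links


if __name__ == "__main__":
    for link in big_image_links(ET.fromstring(SAMPLE)):
        print(link)
```

Because root.iter() walks the tree in document order, the links come out in the same order in which the XPath union returns its nodes.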
And in case you need to include the class attribute of a:
//a[@class='testclass']/@rev[not(contains(.,'thumb'))] | //a[@class='testclass']/img/@src[not(contains(.,'thumb'))]
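The class-restricted variant can be sketched the same way, again assuming the standard-library ElementTree and a hypothetical helper name. ElementTree does support exact attribute predicates such as [@class='testclass'], so only the 'thumb' test has to be done in Python. The input is the snippet from the question, with the <img> self-closed so it parses as XML.

```python
import xml.etree.ElementTree as ET

# Snippet from the question; <img .../> is self-closed so it is well-formed XML.
SNIPPET = """<div>
  <a title="" rev="http://insales.ru/images/large.jpeg" href="http://insales.ru/images/t001.jpeg" class="testclass">
    <img src="http://insales.ru/images/thumb.jpeg" class="productimage"/>
  </a>
</div>"""


def big_image_links_for_class(root, css_class):
    """Hypothetical helper mimicking
    //a[@class=C]/@rev[not(contains(.,'thumb'))] | //a[@class=C]/img/@src[not(contains(.,'thumb'))]
    """
    links = []
    # The element selection uses ElementTree's supported [@attr='value'] predicate.
    for a in root.findall(f".//a[@class='{css_class}']"):
        # The <a>'s own rev comes first, then its direct <img> children,
        # matching XPath document order (attributes precede child elements).
        candidates = [a.get("rev")] + [img.get("src") for img in a.findall("img")]
        # contains() is not supported by ElementTree, so filter 'thumb' here.
        links.extend(v for v in candidates if v is not None and "thumb" not in v)
    return links


if __name__ == "__main__":
    print(big_image_links_for_class(ET.fromstring(SNIPPET), "testclass"))
```

Note that @class='testclass' is an exact string comparison in both XPath and ElementTree, so an <a> with class="testclass other" would not match.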
However, you did not mention this in your "detailed" description.