java - Fetching URL from HTML data with a List in Android Studio
I'm close to what I want, but I'm stuck. I have HTML data in a String, contentString, which I log with Log.i(TAG, "all url : " + contentString);
It contains:
<p><b>14th april</b></p> <p>the wind south west 4 5 foot of swell @ peak. streedagh best beach break.</p> <p><span id="more-113"></span></p> <p>high tide: 1250 3.1m <span style="color: #ff0000;"> <a href="http://www.bundoransurfco.com/webcam/"><strong>click here live peak webcam</strong></a></span></p> <p>low tide: 1854 1.4m</p> <p></p> <p></p> <style type='text/css'> #gallery-1 { margin: auto; } #gallery-1 .gallery-item { float: left; margin-top: 10px; text-align: center; width: 50%; } #gallery-1 img { border: 2px solid #cfcfcf; } #gallery-1 .gallery-caption { margin-left: 0; } /* see gallery_shortcode() in wp-includes/media.php */ </style> <div id='gallery-1' class='gallery galleryid-113 gallery-columns-2 gallery-size-thumbnail'><dl class='gallery-item'> <dt class='gallery-icon portrait'> <a rel="prettyphoto[gallery-113]" href='http://www.bundoransurfco.com/wp-content/uploads/2014/11/11149460_10152656389992000_7842452340110509403_n.jpg'><img width="67" height="68" src="http://www.bundoransurfco.com/wp-content/uploads/2014/11/11149460_10152656389992000_7842452340110509403_n-67x68.jpg" class="attachment-thumbnail colorbox-113 " alt="11149460_10152656389992000_7842452340110509403_n" /></a> </dt></dl><dl class='gallery-item'> <dt class='gallery-icon portrait'> <a rel="prettyphoto[gallery-113]" href='http://www.bundoransurfco.com/wp-content/uploads/2014/11/14th-april.jpg'><img width="67" height="68" src="http://www.bundoransurfco.com/wp-content/uploads/2014/11/14th-april-67x68.jpg" class="attachment-thumbnail colorbox-113 " alt="14th april" /></a> </dt></dl><br style="clear: both" /> </div> <p></p> <p><b>3 day forecast april 13th</b></p> <p>solid swell , onshore winds weekend. best spots rossnowlagh , streedagh. 
bundoran beaches , reefs blown out.</p> <h1> wind charts</h1> <p><a href="http://www.windguru.cz/int/index.php?sc=103244"><img class="size-thumbnail wp-image-747 alignleft" title="wind guru" src="http://www.bundoransurfco.com/wp-content/uploads/2010/12/wind-guru-67x68.jpg" alt="" width="67" height="68" /></a> <a href="http://www.xcweather.co.uk/"><img class="alignnone size-thumbnail wp-image-749" title="xcweathersmall" src="http://www.bundoransurfco.com/wp-content/uploads/2010/12/xcweathersmall2-67x68.jpg" alt="" width="67" height="68" /></a> <a href="http://www.buoyweather.com/wxnav6.jsp?region=uk&program=nww3bw1&grb=nww3&latitude=55.0&longitude=-8.75&zone=0&units=e"><img class="alignnone size-thumbnail wp-image-750" title="buoy weather" src="http://www.bundoransurfco.com/wp-content/uploads/2010/12/buoy-weather-67x68.jpg" alt="" width="67" height="68" /></a> <a href="http://www.windguru.cz/int/index.php?sc=103244">wind guru</a> <a href="http://www.xcweather.co.uk/">xc weather</a> <a href="http://www.buoyweather.com/wxnav6.jsp?region=uk&program=nww3bw1&grb=nww3&latitude=55.0&longitude=-8.75&zone=0&units=e">buoy weather</a></p>
I want to fetch the href URL of every <a rel="prettyphoto[gallery-113]" ...> tag (there are two in this example).
For that, I'm using this pattern:
    Pattern pattern = Pattern.compile("<a rel=\"prettyphoto\\[gallery-113\\]\"[^>]*>");
    Matcher matcher = pattern.matcher(contentString);
    List<String> urlWithRel = new ArrayList<String>();
    String lastString;
    List<String> imagesUrl = null;
    while (matcher.find()) {
        urlWithRel.add(matcher.group());
        lastString = urlWithRel.toString();
    }
    Log.i(TAG, "url rel : " + urlWithRel);
    Log.i(TAG, "final url : " + imagesUrl);
    Log.i(TAG, "list size : " + imagesUrl.size());
With this first regex I get the two tags I need:
<a rel="prettyphoto[gallery-113]" href='http://www.bundoransurfco.com/wp-content/uploads/2014/11/11149460_10152656389992000_7842452340110509403_n.jpg'>, <a rel="prettyphoto[gallery-113]" href='http://www.bundoransurfco.com/wp-content/uploads/2014/11/14th-april.jpg'>
Now I want to store only the href URL. I found a regex that works for getting the URL: (?<=href=).*(?=>)
My problem is that I can't run a regex on a List, and if I build a String from the list and run the regex on that, it only works on the first object.
Here is my final code (it doesn't work):
    Pattern pattern = Pattern.compile("<a rel=\"prettyphoto\\[gallery-113\\]\"[^>]*>");
    Matcher matcher = pattern.matcher(contentString);
    List<String> urlWithRel = new ArrayList<String>();
    String lastString;
    List<String> imagesUrl = null;
    while (matcher.find()) {
        urlWithRel.add(matcher.group());
        lastString = urlWithRel.toString();
        Pattern lastPattern = Pattern.compile("(?<=href=).*(?=>)");
        Matcher lastMatcher = lastPattern.matcher(lastString);
        imagesUrl = new ArrayList<String>();
        while (lastMatcher.find()) {
            imagesUrl.add(lastMatcher.group());
        }
    }
    Log.i(TAG, "url rel : " + urlWithRel);
    Log.i(TAG, "final url : " + imagesUrl);
    Log.i(TAG, "list size : " + imagesUrl.size());
It returns:
final url : ['http://www.bundoransurfco.com/wp-content/uploads/2014/11/11149460_10152656389992000_7842452340110509403_n.jpg'>, <a rel="prettyphoto[gallery-113]" href='http://www.bundoransurfco.com/wp-content/uploads/2014/11/14th-april.jpg']
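The single merged entry in that output follows from two things in the code above. The second regex runs on urlWithRel.toString(), which is the whole list joined into one string, and the greedy .* then stretches from the first href= to the last >. The imagesUrl list is also recreated on every pass of the outer loop, so only the last pass survives. If you want to stay with regex, one possible fix is to capture the href value with a group in a single pass. The following is only a sketch under that assumption: the class name GalleryLinkExtractor and the method extractImageUrls are made up for illustration, and it assumes the href value is quoted.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class GalleryLinkExtractor {

    // Matches the opening <a> tag of one gallery link and captures the href value in group 1.
    // [^>]* cannot cross the closing '>', so each match stays inside a single tag.
    private static final Pattern GALLERY_LINK = Pattern.compile(
            "<a rel=\"prettyphoto\\[gallery-113\\]\"[^>]*?href=['\"]([^'\"]*)['\"][^>]*>");

    // Hypothetical helper: returns the href of every gallery link found in the HTML string.
    static List<String> extractImageUrls(String html) {
        List<String> imageUrls = new ArrayList<>(); // created once, outside the loop
        Matcher matcher = GALLERY_LINK.matcher(html);
        while (matcher.find()) {
            imageUrls.add(matcher.group(1)); // group 1 is the URL without its quotes
        }
        return imageUrls;
    }

    public static void main(String[] args) {
        // Shortened sample built from the two gallery links in the question.
        String html =
                "<a rel=\"prettyphoto[gallery-113]\" href='http://www.bundoransurfco.com/wp-content/uploads/2014/11/11149460_10152656389992000_7842452340110509403_n.jpg'><img width=\"67\" /></a>"
                + "<a href=\"http://www.xcweather.co.uk/\">xc weather</a>"
                + "<a rel=\"prettyphoto[gallery-113]\" href='http://www.bundoransurfco.com/wp-content/uploads/2014/11/14th-april.jpg'><img width=\"67\" /></a>";
        for (String url : extractImageUrls(html)) {
            System.out.println(url);
        }
    }
}
```

On Android you would pass contentString to the helper and log the returned list instead of printing it. Regex on HTML stays fragile (attribute order, unquoted values), so treat this as a workaround rather than a general parser.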
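If you are willing to use the jsoup library, this is the snippet you should use:
    import java.net.URL;
    import java.util.ArrayList;
    import org.jsoup.Jsoup;
    import org.jsoup.nodes.Document;
    import org.jsoup.nodes.Element;
    import org.jsoup.select.Elements;

    ArrayList<URL> urls = new ArrayList<URL>();
    Document doc = Jsoup.parse(contentString);
    // Select every <a> that has an href, then keep only the gallery links.
    Elements els = doc.select("a[href]");
    for (Element el : els)
        if (el.attr("rel").equals("prettyphoto[gallery-113]"))
            urls.add(new URL(el.attr("href")));
And remember to handle the MalformedURLException that the URL constructor can throw.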
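As a rough illustration of that last point, the conversion from href strings to URL objects could be wrapped so that one bad link does not abort the whole loop. This is a sketch, not part of the answer's snippet: the class name UrlListBuilder and the method toUrls are hypothetical, and skipping malformed entries is an assumption about what you want to happen.

```java
import java.net.MalformedURLException;
import java.net.URL;
import java.util.ArrayList;
import java.util.List;

class UrlListBuilder {

    // Hypothetical helper: converts href strings to URL objects,
    // skipping any string the URL constructor rejects.
    static List<URL> toUrls(List<String> hrefs) {
        List<URL> urls = new ArrayList<>();
        for (String href : hrefs) {
            try {
                urls.add(new URL(href));
            } catch (MalformedURLException e) {
                // Assumption: skipping a bad link is acceptable; log or rethrow if it is not.
                System.err.println("Skipping malformed URL: " + href);
            }
        }
        return urls;
    }

    public static void main(String[] args) {
        List<String> hrefs = new ArrayList<>();
        hrefs.add("http://www.bundoransurfco.com/wp-content/uploads/2014/11/14th-april.jpg");
        hrefs.add("not a url"); // no protocol, so the constructor throws
        System.out.println(toUrls(hrefs));
    }
}
```

With jsoup you would call the helper with the el.attr("href") values collected in the loop. Relative hrefs also fail here because they carry no protocol.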