java - Fetching URL from HTMLData with List Android Studio


I'm close to what I want, but I'm stuck. I have HTML data in a String, contentstring, which I log with Log.i(tag, "all url : " + contentstring);:

<p><b>14th april</b></p> <p>the wind south west 4 5 foot of swell @ peak. streedagh best beach break.</p> <p><span id="more-113"></span></p> <p>high tide: 1250  3.1m    <span style="color: #ff0000;"> <a href="http://www.bundoransurfco.com/webcam/"><strong>click here live peak webcam</strong></a></span></p> <p>low tide: 1854 1.4m</p> <p></p> <p></p> <style type='text/css'> #gallery-1 { margin: auto; } #gallery-1 .gallery-item { float: left; margin-top: 10px; text-align: center; width: 50%; } #gallery-1 img { border: 2px solid #cfcfcf; } #gallery-1 .gallery-caption { margin-left: 0; } /* see gallery_shortcode() in wp-includes/media.php */ </style> <div id='gallery-1' class='gallery galleryid-113 gallery-columns-2 gallery-size-thumbnail'><dl class='gallery-item'> <dt class='gallery-icon portrait'> <a rel="prettyphoto[gallery-113]" href='http://www.bundoransurfco.com/wp-content/uploads/2014/11/11149460_10152656389992000_7842452340110509403_n.jpg'><img width="67" height="68" src="http://www.bundoransurfco.com/wp-content/uploads/2014/11/11149460_10152656389992000_7842452340110509403_n-67x68.jpg" class="attachment-thumbnail colorbox-113 " alt="11149460_10152656389992000_7842452340110509403_n" /></a> </dt></dl><dl class='gallery-item'> <dt class='gallery-icon portrait'> <a rel="prettyphoto[gallery-113]" href='http://www.bundoransurfco.com/wp-content/uploads/2014/11/14th-april.jpg'><img width="67" height="68" src="http://www.bundoransurfco.com/wp-content/uploads/2014/11/14th-april-67x68.jpg" class="attachment-thumbnail colorbox-113 " alt="14th april" /></a> </dt></dl><br style="clear: both" /> </div> <p></p> <p><b>3 day forecast april 13th</b></p> <p>solid swell , onshore winds weekend. best spots rossnowlagh , streedagh. 
bundoran beaches , reefs blown out.</p> <h1> wind charts</h1> <p><a href="http://www.windguru.cz/int/index.php?sc=103244"><img class="size-thumbnail wp-image-747 alignleft" title="wind guru" src="http://www.bundoransurfco.com/wp-content/uploads/2010/12/wind-guru-67x68.jpg" alt="" width="67" height="68" /></a> <a href="http://www.xcweather.co.uk/"><img class="alignnone size-thumbnail wp-image-749" title="xcweathersmall" src="http://www.bundoransurfco.com/wp-content/uploads/2010/12/xcweathersmall2-67x68.jpg" alt="" width="67" height="68" /></a>       <a href="http://www.buoyweather.com/wxnav6.jsp?region=uk&program=nww3bw1&grb=nww3&latitude=55.0&longitude=-8.75&zone=0&units=e"><img class="alignnone size-thumbnail wp-image-750" title="buoy weather" src="http://www.bundoransurfco.com/wp-content/uploads/2010/12/buoy-weather-67x68.jpg" alt="" width="67" height="68" /></a> <a href="http://www.windguru.cz/int/index.php?sc=103244">wind guru</a>       <a href="http://www.xcweather.co.uk/">xc weather</a>       <a href="http://www.buoyweather.com/wxnav6.jsp?region=uk&program=nww3bw1&grb=nww3&latitude=55.0&longitude=-8.75&zone=0&units=e">buoy weather</a></p> 

I want to fetch the href URL of every <a rel="prettyphoto[gallery-113]" ...> tag (there are two in this example).

For that, I'm using this pattern:

Pattern pattern = Pattern.compile("<a rel=\"prettyphoto\\[gallery-113\\]\"[^>]*>");
Matcher matcher = pattern.matcher(contentstring);
List<String> urlwithrel = new ArrayList<String>();
String laststring;
List<String> imagesurl = null;
while (matcher.find()) {
    urlwithrel.add(matcher.group());
    laststring = urlwithrel.toString();
}
Log.i(tag, "url rel : " + urlwithrel);
Log.i(tag, "final url : " + imagesurl);
Log.i(tag, "list size : " + imagesurl.size());

With this first regex I get the two tags I need:

<a rel="prettyphoto[gallery-113]" href='http://www.bundoransurfco.com/wp-content/uploads/2014/11/11149460_10152656389992000_7842452340110509403_n.jpg'>, <a rel="prettyphoto[gallery-113]" href='http://www.bundoransurfco.com/wp-content/uploads/2014/11/14th-april.jpg'>

Now I want to store only the href URL. I found a regex that works for getting the URL: (?<=href=).*(?=>)

The problem is that I can't apply a regex to a List, and if I build a String from the list and run the regex on it, the regex only works on the first object.

Here is my final code (it doesn't work):

Pattern pattern = Pattern.compile("<a rel=\"prettyphoto\\[gallery-113\\]\"[^>]*>");
Matcher matcher = pattern.matcher(contentstring);
List<String> urlwithrel = new ArrayList<String>();
String laststring;
List<String> imagesurl = null;
while (matcher.find()) {
    urlwithrel.add(matcher.group());
    laststring = urlwithrel.toString();
    Pattern lastpattern = Pattern.compile("(?<=href=).*(?=>)");
    Matcher lastmatcher = lastpattern.matcher(laststring);
    imagesurl = new ArrayList<String>();
    while (lastmatcher.find()) {
        imagesurl.add(lastmatcher.group());
    }
}
Log.i(tag, "url rel : " + urlwithrel);
Log.i(tag, "final url : " + imagesurl);
Log.i(tag, "list size : " + imagesurl.size());

It returns:

final url : ['http://www.bundoransurfco.com/wp-content/uploads/2014/11/11149460_10152656389992000_7842452340110509403_n.jpg'>, <a rel="prettyphoto[gallery-113]" href='http://www.bundoransurfco.com/wp-content/uploads/2014/11/14th-april.jpg'] 

If you are willing to use the jsoup library, this is the snippet you should use:

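// Needs java.net.URL, java.util.ArrayList, org.jsoup.Jsoup,
// org.jsoup.nodes.Document, org.jsoup.nodes.Element and org.jsoup.select.Elements.
ArrayList<URL> urls = new ArrayList<URL>();
Document doc = Jsoup.parse(contentstring);
// Select every <a> element that has an href attribute.
Elements els = doc.select("a[href]");
for (Element el : els) {
    // Keep only the links that belong to the gallery.
    if (el.attr("rel").equals("prettyphoto[gallery-113]")) {
        urls.add(new URL(el.attr("href")));
    }
}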

And remember to handle the MalformedURLException that the URL constructor can throw.
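
If you would rather stay with java.util.regex, a capturing group may be enough. In the question's code, urlwithrel.toString() turns the whole list into one string, and the greedy .* then runs from the first href= to the last >, which is why both tags end up in a single result. The sketch below captures the quoted href value directly in one pass. The class name HrefExtractor and the method extractHrefs are made up for illustration, and the pattern assumes that rel is written immediately before href, as in the sample HTML. It is not a general HTML parser.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class, not part of the original post.
final class HrefExtractor {

    // Returns the href value of every <a> tag whose rel attribute equals `rel`.
    // Assumption: rel comes directly before href, as in the sample HTML.
    static List<String> extractHrefs(String html, String rel) {
        // Group 1 captures the opening quote (' or "), group 2 the URL.
        // The lazy .*? stops at the first matching closing quote (\1).
        Pattern pattern = Pattern.compile(
                "<a\\s+rel=\"" + Pattern.quote(rel) + "\"\\s+href=(['\"])(.*?)\\1");
        Matcher matcher = pattern.matcher(html);
        List<String> urls = new ArrayList<>();
        while (matcher.find()) {
            urls.add(matcher.group(2));
        }
        return urls;
    }

    public static void main(String[] args) {
        // Placeholder HTML with example.com URLs, for demonstration only.
        String html = "<a rel=\"prettyphoto[gallery-113]\" href='http://example.com/a.jpg'><img/></a>"
                + "<a href=\"http://example.com/other\">other</a>";
        System.out.println(extractHrefs(html, "prettyphoto[gallery-113]"));
    }
}
```

In the Android code from the question this would be called as HrefExtractor.extractHrefs(contentstring, "prettyphoto[gallery-113]"). The list is created once, before the loop, so earlier matches are not thrown away on each iteration.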

