java - Converting relative paths into absolute ones not working in JSoup
I'm trying to access the relative links (`a[href]`) in a webpage, replace them with absolute ones, and print the modified HTML of the webpage to the console. But when I look at the links after running the program, no changes have been made. Here's my code:
```java
import java.io.IOException;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

Document doc = null;
try {
    doc = Jsoup.connect("http://jsoup.org/cookbook/extracting-data/dom-navigation").userAgent("Mozilla").get();
} catch (IOException e1) {
    e1.printStackTrace();
}
Elements imports = doc.select("a[href]");
String s = "";
for (Element link : imports) {
    //System.out.println("\n" + link.attr("href"));
    //System.out.println(link.attr("abs:href"));
    if (link.attr("href").equalsIgnoreCase("/")) {
        //do nothing
    } else {
        s = doc.toString().replaceAll(link.attr("href"), link.attr("abs:href"));
    }
}
System.out.println(s);
```

One strange thing: in this program I'm connecting to http://jsoup.org/cookbook/extracting-data/dom-navigation, but when I connect to http://csb.stanford.edu/class/public/pages/sykes_webdesign/05_simple.html instead, I do notice changes being made. What is the problem here: is my code wrong, or is it the webpage?
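One likely contributor (an observation about the code above, not something stated in the original thread): inside the loop, `s` is recomputed from `doc.toString()` on every iteration, so each pass discards the previous replacement and only the replacement for the last non-`/` link survives. Depending on which link comes last on a given page, that can look like "nothing changed". In addition, `replaceAll` treats the href as a regular expression, so an href containing characters such as `?` will not match itself literally. The stdlib-only sketch below reproduces the overwrite with plain strings; the HTML fragment, hrefs and `example.com` URLs are made up for illustration, and `String.replace` (literal matching) stands in for `replaceAll`.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class ReplaceLoopDemo {

    // Mirrors the question's loop: s is rebuilt from the untouched html on every
    // iteration, so only the last replacement ends up in the result.
    static String replaceFromOriginalEachTime(String html, Map<String, String> links) {
        String s = "";
        for (Map.Entry<String, String> e : links.entrySet()) {
            s = html.replace(e.getKey(), e.getValue());
        }
        return s;
    }

    // Applies each replacement to the output of the previous one.
    static String replaceAccumulating(String html, Map<String, String> links) {
        String s = html;
        for (Map.Entry<String, String> e : links.entrySet()) {
            s = s.replace(e.getKey(), e.getValue());
        }
        return s;
    }

    public static void main(String[] args) {
        // Hypothetical page fragment and href -> absolute URL pairs.
        String html = "<a href=\"/a.html\">A</a> <a href=\"/b.html\">B</a>";
        Map<String, String> links = new LinkedHashMap<>();
        links.put("/a.html", "http://example.com/a.html");
        links.put("/b.html", "http://example.com/b.html");

        // Only the second link is rewritten here.
        System.out.println(replaceFromOriginalEachTime(html, links));
        // Both links are rewritten here.
        System.out.println(replaceAccumulating(html, links));
    }
}
```

Even the accumulating version is fragile on real pages, because a short href such as `/` would also match inside every other URL. Editing each element's `href` attribute, as the answer below does, avoids text replacement on the serialized HTML altogether.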
Please try `<your element>.absUrl("href")` instead. Also, for testing, print the resulting element directly after you have changed it.
To replace the URLs, you can use this (not tested):
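```java
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

Elements urls = doc.select("a[href]");
for (Element urlElement : urls) {
    // overwrite the href attribute with its absolute form
    urlElement.attr("href", urlElement.absUrl("href"));
    System.out.println(urlElement); // print the result directly after the change has been made
}
System.out.println(doc.html()); // the document itself now contains the absolute URLs
```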
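`absUrl("href")` resolves the href against the document's base URI, which for a page loaded with `Jsoup.connect(...).get()` is typically the URL it was fetched from. As a rough, dependency-free illustration of that resolution step, the sketch below uses `java.net.URI.resolve`; it is not jsoup's implementation, the helper name `resolve` is hypothetical, and the relative hrefs `selector-syntax` and `../input/parse-body-fragment` are made-up examples. Unlike jsoup, `URI.create` throws `IllegalArgumentException` for hrefs that are not valid URIs (for example, ones with unescaped spaces).

```java
import java.net.URI;

public class ResolveHrefDemo {

    // Resolves an href against the page URL (RFC-style relative resolution,
    // including removal of "." and ".." segments). Absolute hrefs come back unchanged.
    static String resolve(String pageUrl, String href) {
        return URI.create(pageUrl).resolve(href).toString();
    }

    public static void main(String[] args) {
        String page = "http://jsoup.org/cookbook/extracting-data/dom-navigation";
        System.out.println(resolve(page, "/"));                            // http://jsoup.org/
        System.out.println(resolve(page, "selector-syntax"));              // http://jsoup.org/cookbook/extracting-data/selector-syntax
        System.out.println(resolve(page, "../input/parse-body-fragment")); // http://jsoup.org/cookbook/input/parse-body-fragment
        System.out.println(resolve(page, "http://example.com/x"));         // http://example.com/x
    }
}
```

This is why a relative link like `/` on the cookbook page becomes `http://jsoup.org/` once it is made absolute, while links that are already absolute are left as they are.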