java - Processing URLs found in an HTML page


I have an HTML page that I parse using jsoup. Here is part of the code:

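    import org.jsoup.Jsoup;
    import org.jsoup.nodes.Document;
    import org.jsoup.nodes.Element;
    import org.jsoup.select.Elements;

    Document doc = Jsoup.parse(content);
    Elements images = doc.select("[src]");   // every element with a src attribute, not only <img>
    for (Element img : images) {
        String src = img.attr("src");
        // here I need to determine the type of URL and convert it to an absolute URL
    }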

I need to change the URLs inside the HTML to absolute URLs. The problem is that the src attribute of <img> can take any of the following forms, if the host is www.example.com:

1. http://www.example.com/images/1.png
2. http://example.com/images/1.png
3. www.example.com/images/1.png
4. example.com/images/1.png
5. /example.com/images/1.png
6. //example.com/images/1.png
7. /images/1.png

I came up with this list while testing, and I should support all of these forms. I need a function that outputs the absolute URL (http://www.example.com/images/1.png) for each of the inputs listed above. The problem gets more complicated when the URL points to a resource on another host, for example haha.com/images/1.png.
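A minimal sketch of such a function, assuming the URL of the page itself is known (the page URL http://www.example.com/articles/page.html, the class UrlNormalizer and the method toAbsolute are all hypothetical) and using only java.net.URI. Standard resolution (RFC 3986) already handles forms 1, 2, 5, 6 and 7. Forms 3 and 4 have no scheme, so by the spec they are relative paths; the sketch assumes, as a heuristic, that a leading segment containing a dot and followed by "/" is a host name and prefixes "//" so it is parsed as one.

```java
import java.net.URI;

class UrlNormalizer {

    // Heuristic (assumption): a leading segment such as "www.example.com" or "haha.com"
    // is read as a host name if it contains a dot and does not start with one
    // (so "." and ".." path segments are excluded).
    static boolean looksLikeHost(String segment) {
        return segment.contains(".") && !segment.startsWith(".");
    }

    // Converts a src value into an absolute URL by resolving it against the page URL.
    // URI.create throws IllegalArgumentException for malformed input (e.g. unescaped spaces).
    static String toAbsolute(String pageUrl, String src) {
        URI base = URI.create(pageUrl);
        String s = src.trim();
        // Forms 1 and 2 already have a scheme; forms 5, 6 and 7 start with "/".
        // URI.resolve handles all of those as they are.
        if (!URI.create(s).isAbsolute() && !s.startsWith("/")) {
            int slash = s.indexOf('/');
            if (slash > 0 && looksLikeHost(s.substring(0, slash))) {
                // Forms 3 and 4: prefix "//" so the segment becomes the authority
                // and the page's scheme (http or https) is reused.
                s = "//" + s;
            }
        }
        return base.resolve(s).toString();
    }

    public static void main(String[] args) {
        // Hypothetical page URL; in practice this is the URL the HTML was fetched from.
        String page = "http://www.example.com/articles/page.html";
        String[] inputs = {
            "http://www.example.com/images/1.png",
            "http://example.com/images/1.png",
            "www.example.com/images/1.png",
            "example.com/images/1.png",
            "/example.com/images/1.png",
            "//example.com/images/1.png",
            "/images/1.png",
            "haha.com/images/1.png",
        };
        for (String in : inputs) {
            System.out.println(in + " -> " + toAbsolute(page, in));
        }
    }
}
```

Note that forms 2, 4 and 6 come out on example.com rather than www.example.com: for URL resolution these are different hosts, so mapping one onto the other would be a separate, site-specific rewrite. Form 5 resolves to the path /example.com/images/1.png on www.example.com, which is what a browser would request as well.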

So I need a way to determine the type of a URL, such as:

  • relative (/images/1.png);
  • absolute (http://example.com/images/1.png);
  • protocol relative (example.com/images/1.png).
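As a rough sketch of the classification step, here is a hypothetical classify method that sorts a src value into the reference types RFC 3986 (section 4.2) defines, plus one heuristic bucket. In RFC terms the "protocol-relative" form is //example.com/images/1.png (a network-path reference); a value like example.com/images/1.png has neither a scheme nor a leading //, so by the spec it is a relative path, and only a guess can read it as a host name. The guess assumed here: the first segment contains a dot, does not start with one, and is followed by "/".

```java
import java.net.URI;

class UrlClassifier {

    // Categories follow the reference types in RFC 3986, section 4.2,
    // plus one heuristic bucket for scheme-less host names.
    enum UrlType {
        ABSOLUTE,       // has a scheme: http://example.com/images/1.png
        NETWORK_PATH,   // "protocol-relative": //example.com/images/1.png
        ABSOLUTE_PATH,  // root-relative: /images/1.png
        BARE_HOST,      // heuristic guess: example.com/images/1.png
        RELATIVE_PATH   // relative to the page's directory: images/1.png, ../1.png
    }

    static UrlType classify(String src) {
        String s = src.trim();
        if (s.startsWith("//")) {
            return UrlType.NETWORK_PATH;
        }
        if (s.startsWith("/")) {
            return UrlType.ABSOLUTE_PATH;
        }
        // URI.create throws IllegalArgumentException for malformed input.
        if (URI.create(s).isAbsolute()) {
            return UrlType.ABSOLUTE;
        }
        int slash = s.indexOf('/');
        String first = slash > 0 ? s.substring(0, slash) : "";
        // Assumption: a first segment like "example.com" (contains a dot, does not
        // start with one, and is followed by "/") is a host name, not a directory.
        if (first.contains(".") && !first.startsWith(".")) {
            return UrlType.BARE_HOST;
        }
        return UrlType.RELATIVE_PATH;
    }

    public static void main(String[] args) {
        String[] inputs = {
            "http://example.com/images/1.png",
            "//example.com/images/1.png",
            "/images/1.png",
            "example.com/images/1.png",
            "images/1.png",
        };
        for (String in : inputs) {
            System.out.println(classify(in) + "  " + in);
        }
    }
}
```

The BARE_HOST guess will misfire on a relative directory whose name contains a dot (a hypothetical assets.v2/1.png, for instance), which is why the spec itself never treats scheme-less values as host names.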

What is the best approach to solve this problem in Java? Thank you.

Check out the methods available in the DOM, specifically document.URL:

http://www.w3schools.com/jsref/dom_obj_document.asp
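document.URL only exists in a browser, but the idea carries over to Java: keep the URL the page was downloaded from and resolve every src against it. A minimal sketch with java.net.URI (the class BaseUrlDemo, the method resolveAgainstPage and the URLs are illustrative), showing that the same src can resolve to different absolute URLs depending on the page URL. jsoup can also do this step itself when the page URL is passed as the base URI, as in Jsoup.parse(content, baseUri), after which img.absUrl("src") (or img.attr("abs:src")) returns the resolved URL.

```java
import java.net.URI;

class BaseUrlDemo {

    // Resolves a src value against the URL of the page it was found on (RFC 3986 rules).
    static String resolveAgainstPage(String pageUrl, String src) {
        return URI.create(pageUrl).resolve(src).toString();
    }

    public static void main(String[] args) {
        // The same relative src yields different results depending on the page's directory.
        System.out.println(resolveAgainstPage("http://www.example.com/articles/page.html", "images/1.png"));
        // prints http://www.example.com/articles/images/1.png
        System.out.println(resolveAgainstPage("http://www.example.com/index.html", "images/1.png"));
        // prints http://www.example.com/images/1.png
        // A protocol-relative src takes its scheme from the page (https here).
        System.out.println(resolveAgainstPage("https://www.example.com/index.html", "//example.com/images/1.png"));
        // prints https://example.com/images/1.png
    }
}
```

URI.resolve has a long-reported quirk with a base whose path is empty (such as http://www.example.com with no trailing slash), so it is safest to use a page URL whose path is at least "/".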

