java - Processing URLs found in HTML page
I have an HTML page that I parse using jsoup. Here is part of the code:

    Document doc = Jsoup.parse(content);
    org.jsoup.select.Elements images = doc.select("[src]");
    for (org.jsoup.nodes.Element img : images) {
        // here I need to determine the type of URL and convert it to an absolute URL
    }

I need to change all URLs inside the HTML to absolute URLs. The problem is that the src attribute of an <img> can take any of the following forms, if the host is www.example.com:
1. http://www.example.com/images/1.png
2. http://example.com/images/1.png
3. www.example.com/images/1.png
4. example.com/images/1.png
5. /example.com/images/1.png
6. //example.com/images/1.png
7. /images/1.png

I came up with this list while testing, and I should support them all. I need a function that outputs the absolute URL (http://www.example.com/images/1.png) for every input listed above. The problem gets more complicated when the URL points to a resource at another location, for example haha.com/images/1.png.
So I need a way to determine the type of URL, like:
- relative (/images/1.png);
- absolute (http://example.com/images/1.png);
- protocol-relative (//example.com/images/1.png).
What is the best approach to solve this problem in Java? Thank you.
Check out the methods available on the DOM. Specifically: document.URL
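To make the classification asked about in the question concrete, here is a minimal sketch that uses only java.net.URI. The names UrlNormalizer, Kind, HOST_LIKE, classify and toAbsolute are made up for illustration and are not part of jsoup or the JDK. Forms 3, 4 and 5 from the list are not valid host references under the URI rules (strictly, they are relative paths), so the sketch guesses: if the first path segment looks like a domain name, it is treated as a host. That guess can be wrong for directories with dotted names.

```java
import java.net.URI;
import java.util.regex.Pattern;

// Hypothetical helper, not part of jsoup or the JDK.
final class UrlNormalizer {

    enum Kind { ABSOLUTE, PROTOCOL_RELATIVE, HOST_WITHOUT_SCHEME, RELATIVE }

    // Assumption: a "host-like" segment is dot-separated labels ending in
    // an alphabetic TLD, e.g. "example.com" or "www.example.com".
    private static final Pattern HOST_LIKE =
            Pattern.compile("(?i)([a-z0-9-]+\\.)+[a-z]{2,}");

    // Drops one leading slash so "/example.com/x" and "example.com/x" look alike.
    private static String stripLeadingSlash(String url) {
        return url.startsWith("/") ? url.substring(1) : url;
    }

    static Kind classify(String url) {
        if (url.matches("(?i)[a-z][a-z0-9+.-]*://.*")) {
            return Kind.ABSOLUTE;              // scheme://...
        }
        if (url.startsWith("//")) {
            return Kind.PROTOCOL_RELATIVE;     // //host/path
        }
        String s = stripLeadingSlash(url);
        int slash = s.indexOf('/');
        // Heuristic only: first segment looks like a domain and a path follows.
        if (slash > 0 && HOST_LIKE.matcher(s.substring(0, slash)).matches()) {
            return Kind.HOST_WITHOUT_SCHEME;   // host/path or /host/path
        }
        return Kind.RELATIVE;                  // /path or path
    }

    // baseUrl is the address of the page the HTML came from,
    // e.g. "http://www.example.com/index.html".
    static String toAbsolute(String baseUrl, String url) {
        URI base = URI.create(baseUrl);
        Kind kind = classify(url);
        if (kind == Kind.ABSOLUTE) {
            return url;
        }
        if (kind == Kind.PROTOCOL_RELATIVE) {
            return base.getScheme() + ":" + url;   // reuse the page's scheme
        }
        if (kind == Kind.HOST_WITHOUT_SCHEME) {
            return base.getScheme() + "://" + stripLeadingSlash(url);
        }
        return base.resolve(url).toString();       // standard relative resolution
    }
}
```

Inside the loop from the question this could be called as img.attr("src", UrlNormalizer.toAbsolute(pageUrl, img.attr("src"))), where pageUrl is an assumed variable holding the page address. The sketch leaves the host as written, so example.com is not rewritten to www.example.com; that would need an explicit alias mapping. For the standard forms (1, 2, 6, 7), jsoup can also do the resolution itself if a base URI is passed to Jsoup.parse(content, baseUri) and the attribute is read with img.absUrl("src").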