xml - How can I find commands that will show all instances of a pattern in Unix?
What command can I use to find patterns in a block of text in Unix? I need to find whatever appears between <title> and </title> (which appears several times in the block of text). I tried using

    sed -n '/<title>/,/<\/title>/p'

but it seems to print everything between the first instance of <title> and the last instance of </title>.
This looks like it might be an XML question, or maybe HTML that is "not quite XML", in which case there are utilities that let you extract particular parts of a document according to an XPath expression. If you can install software, you might try:

    xgrep -x //title <your file>

There are dozens of little utilities of varying degrees of maturity and ability to handle quirks (like parsing HTML that is not well-formed XML).
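As one example of such a utility, here is a rough sketch that uses xmllint (from libxml2) rather than xgrep. It assumes xmllint is installed, that your build supports the --xpath option, and that the input is well-formed XML. The file name titles_demo.xml and the function name xpath_titles are made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper: print the text nodes of every <title> element.
# Assumes xmllint (libxml2) with --xpath support and well-formed input.
xpath_titles() {
    xmllint --xpath '//title/text()' "$1"
}

# Small well-formed sample, made up for this demonstration.
cat > titles_demo.xml <<'EOF'
<blah>
<title>one line title</title>
<p>foo</p>
</blah>
EOF

if command -v xmllint >/dev/null 2>&1; then
    xpath_titles titles_demo.xml
else
    echo 'xmllint not found; install the libxml2 utilities to try this' >&2
fi
```

How several matching nodes are separated in the output differs between libxml2 versions, so check what your version prints before relying on it in a script.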
If you really have to fall back on regular expressions, assume the file is called tagsoup.in and looks like this:

    <blah>
    <title>one line title</title>
    <p>foo</p>
    <p>bar</p>
    <title>multi
    line
    title
    </title>
    <p>foo</p>
    <p>bar</p>
    </blah>

Then the following line of sed will extract the one-line title, but not the multiline title (note that \+ is a GNU sed extension):

    sed -n 's/<title>\([^<]\+\)<\/title>/\1/p' tagsoup.in

The following sed script will extract both single-line and multiline content, but runs the risk of loading the whole file into memory if the end tag is not found:

    sed -n '
    /<title>\(.*\)/ {               # If the line matches the start tag:
        s//\1/                      #   Keep the stuff after the start tag
        /<\/title>/!{               #   If the end tag is *not* on this line
            h                       #     Save to hold space
            : loop
            n                       #     Go on to the next line
            /\(.*\)<\/title>/{      #     If we match the end tag
                s//\1/              #       Keep the stuff before the end tag
                H                   #       Append to hold space
                g                   #       Fetch hold space into pattern space
                s/\n/ /g            #       Replace newlines with spaces
                p                   #       Print out the pattern space
                b                   #       Done with this title; start the next cycle
            }
            /<\/title>/!{           #     If we do not match the end tag
                H                   #       Append this line to the hold space
                b loop              #       Go back and try the next line
            }
        }
        /\(.*\)<\/title>/{          #   If the end tag *is* on this line
            s//\1/                  #     Keep the stuff before the end tag
            p                       #     Print the one-line title
        }
    }' tagsoup.in
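If the sed script feels heavy, a different approach (not the one used above) is to flatten the input onto one line and let grep -o pull out each pair. This is only a sketch under a few assumptions: your grep supports -o (GNU and BSD grep do, but it is not in POSIX), the file is small enough to handle as a single line, and titles contain no nested tags. The function name extract_titles_multiline and the demo file name are made up, and the line breaks in the sample are a guess at the layout of tagsoup.in.

```shell
#!/bin/sh
# Hypothetical helper: print the text of every <title>...</title> pair,
# one per output line, whether or not the pair spans several input lines.
extract_titles_multiline() {
    # 1. Join all lines so a multiline title becomes a single-line match.
    # 2. -o prints each match on its own line, so every instance is listed.
    # 3. Strip the tags, then trim trailing whitespace left by the join.
    tr '\n' ' ' < "$1" |
        grep -o '<title>[^<]*</title>' |
        sed -e 's/<[^>]*>//g' -e 's/[[:space:]]*$//'
}

# Demo input modeled on tagsoup.in (line breaks are assumed).
cat > tagsoup_demo.in <<'EOF'
<blah>
<title>one line title</title>
<p>foo</p>
<p>bar</p>
<title>multi
line
title
</title>
<p>foo</p>
<p>bar</p>
</blah>
EOF

extract_titles_multiline tagsoup_demo.in
# prints:
#   one line title
#   multi line title
```

Like the sed versions, this is still regex matching on markup, so comments, CDATA sections, or attributes on the title tag will trip it up. An XPath-aware tool is the safer choice when you can use one.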