regex - How to match one word or another in Elisp regexp
I have a string that contains HTML code, as below:
... <a href="../link.png">image link</a> ... <img src="../image.png" /> ... <pre class="should_not_match">...</pre> ...
I want to extract the resource paths: ../link.png from the href of the <a> tag, and ../image.png from the src of the <img> tag. I have the following code:
(with-temp-buffer
  (insert html-content) ;; html-content holds the content mentioned above
  (beginning-of-buffer)
  (while (re-search-forward "<[a-zA-Z]+[^/>]+[src|href]=\"\\([^\"]+\\)\"[^>]*>" nil t)
    (message (match-string 1))
    ;; more code here
    ))
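The output is ../link.png, ../image.png and should_not_match; the last one is not wanted. I know this is because of the incorrect [src|href] in the regexp: it is a character class that matches a single character, whereas I want to match either the word src or the word href. So I used the following regexp:
"<[a-zA-Z]+[^/>]+(src|href)=\"\\([^\"]+\\)\"[^>]*>"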
But it returns nil now. I tried the following, without luck:
"<[a-zA-Z]+[^/>]+\\(src|href\\)=\"\\([^\"]+\\)\"[^>]*>"
"<[a-zA-Z]+[^/>]+((src)|(href))=\"\\([^\"]+\\)\"[^>]*>"
"<[a-zA-Z]+[^/>]+(\\(src\\)|\\(href\\))=\"\\([^\"]+\\)\"[^>]*>"
"<[a-zA-Z]+[^/>]+\\((src)|(href)\\)=\"\\([^\"]+\\)\"[^>]*>"
"<[a-zA-Z]+[^/>]+\\(\\(src\\)|\\(href\\)\\)=\"\\([^\"]+\\)\"[^>]*>"
So, what is the correct regexp that works?
Thanks in advance,
Kelvin
Edit
Inspired by @lawlist, I found that this is because the | needs to be escaped as \\| in an Elisp string; \\(src\\|href\\) works well.
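For reference, the fix from the edit can be sketched as a small function. The names my-extract-resource-paths and html are hypothetical and not from the original code. Note one side effect of the new group: \\(src\\|href\\) becomes group 1, so the path moves to group 2.

```elisp
;; A minimal sketch of the corrected search, assuming the HTML is passed
;; in as a string. `my-extract-resource-paths' is a hypothetical name.
(defun my-extract-resource-paths (html)
  "Return a list of the href/src attribute values found in HTML."
  (let ((paths '()))
    (with-temp-buffer
      (insert html)
      (goto-char (point-min))
      ;; Group 1 is the attribute name (src or href); group 2 is the path.
      (while (re-search-forward
              "<[a-zA-Z]+[^/>]+\\(src\\|href\\)=\"\\([^\"]+\\)\"[^>]*>"
              nil t)
        (push (match-string 2) paths)))
    ;; `push' builds the list in reverse, so restore document order.
    (nreverse paths)))
```

If the rest of the code should keep using (match-string 1), the alternation can be written as a non-capturing (shy) group instead, \\(?:src\\|href\\), which leaves the path in group 1.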
The regexp below covers the first two items in the original poster's example, i.e., <a href="../link.png">image link</a> and <img src="../image.png" />. I saw no need to explicitly exclude the third item in the example, because it is not matched by this regexp:
\\(<a href=\"\\|<img src=\"\\)\\(.*\\)\\(\">image link</a>\\|\" />\\)
The original poster's regexp does not cover the tail of the first example: the trailing image link</a> is not contemplated by that regexp, even once it is fixed to use \\(src\\|href\\). Thus, my recommendation is to devise a regexp that includes the entire HTML link.
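As a rough sketch of the whole-link approach, the function below runs a variant of the regexp above over a string. It assumes the two tags may sit on the same line, in which case the greedy .* would run from the first opening tag to the last closing delimiter on that line. The non-greedy .*? is therefore a substitution made here, not part of the regexp as posted, and my-extract-whole-link-paths is a hypothetical name.

```elisp
;; A minimal sketch, assuming both tags can appear on one line.
;; `my-extract-whole-link-paths' is a hypothetical name.
(defun my-extract-whole-link-paths (html)
  "Return the paths of the <a href> and <img src> links found in HTML."
  (let ((paths '()))
    (with-temp-buffer
      (insert html)
      (goto-char (point-min))
      ;; Group 1: opening part of the tag; group 2: the path;
      ;; group 3: the closing part.  `.*?' stops at the first closer.
      (while (re-search-forward
              "\\(<a href=\"\\|<img src=\"\\)\\(.*?\\)\\(\">image link</a>\\|\" />\\)"
              nil t)
        (push (match-string 2) paths)))
    (nreverse paths)))
```

Because the link text image link is hard-coded in the closing alternative, this only matches anchors with exactly that text; that follows the regexp above rather than being a general HTML parser.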