Sunday, May 14, 2017

Useful regular expressions

If you intend to use regex in a program, please consider using XPath instead (PHP, Java).
Depending on the situation, you shouldn't rely on these patterns to match tags precisely.
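
As a rough illustration of the XPath route, here is a minimal sketch in Python rather than PHP or Java. It assumes the input is well-formed XML and uses the standard library's xml.etree.ElementTree, which supports only a limited XPath subset; real-world HTML would need an HTML-aware parser. The sample document and variable names are made up for the example.

```python
import xml.etree.ElementTree as ET

# Hypothetical, well-formed sample; real HTML is often not valid XML.
doc = ET.fromstring("<root><div>outer <div>inner</div></div><p>skip</p></root>")

# ".//div" selects every div element at any depth, in document order.
divs = doc.findall(".//div")

# .text is the text that appears before an element's first child.
texts = [d.text for d in divs]
print(texts)
```

Nesting is handled by the parser here, so no pattern has to guess where a div ends.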

1.
<div>(?:(?!<div>|<\/div>)[\s\S\n])*?<\/div>
(starts from the nearest, most deeply nested div)
Or
<div>[\s\S\n]*?<\/div>
(starts from the leftmost div)

Reference:
https://stackoverflow.com/questions/27938851/regex-select-closest-match
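
To see how the two variants in 1. differ, here is a small sketch using Python's standard re module (an assumption; the post does not name an engine). The sample string and variable names are invented for the example.

```python
import re

# Hypothetical sample with one div nested inside another.
html = "<div>outer <div>inner</div> tail</div>"

# Variant 1: the negative lookahead forbids crossing another <div> or </div>,
# so only a div with no nested div inside it can match.
INNERMOST = re.compile(r"<div>(?:(?!<div>|<\/div>)[\s\S\n])*?<\/div>")

# Variant 2: lazy match from the leftmost <div> to the first </div> after it.
LEFTMOST = re.compile(r"<div>[\s\S\n]*?<\/div>")

innermost_match = INNERMOST.search(html).group(0)
leftmost_match = LEFTMOST.search(html).group(0)

print(innermost_match)  # <div>inner</div>
print(leftmost_match)   # <div>outer <div>inner</div>
```

Note that the second variant stops at the first closing tag, so with nested divs it returns an unbalanced fragment.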


2.
(?<=<div>)(?:(?!<div>|<\/div>)[\s\S\n])*?(?=<\/div>)
(starts from the nearest, most deeply nested div)
Or
(?<=<div>)[\s\S\n]*?(?=<\/div>)
(starts from the leftmost div)

This matches only the content between the <div> and </div> tags, not the tags themselves.
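
A quick sketch of 2. in Python's standard re module (again an assumption about the engine). The lookbehind here is the fixed-width (?<=<div>), which re accepts. The sample string and names are hypothetical.

```python
import re

# Hypothetical sample with one div nested inside another.
html = "<div>outer <div>inner</div> tail</div>"

# Content of a div that contains no nested div.
INNER_CONTENT = re.compile(r"(?<=<div>)(?:(?!<div>|<\/div>)[\s\S\n])*?(?=<\/div>)")

# Content from just after the leftmost <div> up to the first </div>.
LEFTMOST_CONTENT = re.compile(r"(?<=<div>)[\s\S\n]*?(?=<\/div>)")

inner_content = INNER_CONTENT.search(html).group(0)
leftmost_content = LEFTMOST_CONTENT.search(html).group(0)

print(inner_content)     # inner
print(leftmost_content)  # outer <div>inner
```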



3.
<div>(?=(?:(?!<div>|<\/div>)[\s\S\n])*?<\/div>)|(?<=<div>(?:(?!<div>|<\/div>)[\s\S\n])*?)<\/div>
(starts from the nearest, most deeply nested div)
Or
<div>(?=[\s\S\n]*?<\/div>)|(?<=<div>[\s\S\n]*?)<\/div>
(starts from the leftmost div)

This matches the tags themselves but not the content between them.

But please note that infinite (unbounded-width) lookbehind, which the patterns in 3. rely on, is supported only by .NET, Matthew Barnett's regex module for Python, and JGsoft. (reference)
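
The patterns in 3. put a variable-length subpattern inside (?<=...). As a hedged check of what that means in practice, the sketch below tries to compile one of them with Python's standard re module, which is expected to reject it. The variable names are made up for the example, and the third-party regex module is not exercised here.

```python
import re

# Second variant from 3., copied verbatim; the lookbehind contains a lazy *?.
UNBOUNDED_LOOKBEHIND = r"<div>(?=[\s\S\n]*?<\/div>)|(?<=<div>[\s\S\n]*?)<\/div>"

try:
    re.compile(UNBOUNDED_LOOKBEHIND)
    supported = True
    reason = ""
except re.error as exc:
    # The standard re module only allows fixed-width lookbehind.
    supported = False
    reason = str(exc)

print(supported, reason)
```

On engines that reject the pattern, the first and second forms (1. and 2.) remain usable because they need at most a fixed-width lookbehind.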

Or
<a\b[^>]*>|</a>
This matches only <a> tags. The \b after the tag name keeps it from also matching tags such as <abbr>.
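
A short sketch of the <a> pattern in Python's standard re module, written with a \b word boundary after the tag name so that tag names merely starting with "a" are left alone. The sample string and names are hypothetical.

```python
import re

# \b requires the tag name to end right after "a".
A_TAGS = re.compile(r"<a\b[^>]*>|</a>")

# Hypothetical sample mixing <abbr> and <a>.
sample = '<abbr>x</abbr> <a href="/home">Home</a>'

found = A_TAGS.findall(sample)
print(found)  # ['<a href="/home">', '</a>']
```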

4.
<("[^"]*"|'[^']*'|[^'">])*>

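This matches every tag, but not the content between them. Because quoted attribute values are matched as a unit, a > inside quotes does not end the tag early.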
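
For illustration, here is 4. run through Python's standard re module (an assumed engine). The pattern contains a capturing group, so the sketch uses finditer and group(0) to get whole tags rather than findall, which would return only the group. The sample string and names are made up.

```python
import re

# Pattern from 4., copied verbatim.
ANY_TAG = re.compile(r"""<("[^"]*"|'[^']*'|[^'">])*>""")

# Hypothetical sample with a ">" inside a quoted attribute value.
sample = '<a title="1 > 0">x</a><br/>'

# group(0) is the full tag; the text "x" between tags is skipped.
tags = [m.group(0) for m in ANY_TAG.finditer(sample)]
print(tags)  # ['<a title="1 > 0">', '</a>', '<br/>']
```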