Why ++ becomes -+-+-+- : string.gsub "strange" behavior


Why does ++ become -+-+-+- ?

I'd like to clean a string of doubled operator signs. How should I proceed?

string = "++"
print(string)  -- -> ++
string = string.gsub(string, "++", "+")
print(string)  -- -> + ok
string = string.gsub(string, "--", "+")
print(string)  -- -> +++ ?
string = string.gsub(string, "+-", "-")
print(string)  -- -> -+-+-+- ??
string = string.gsub(string, "-+", "-")
print(string)  -- -> -+-+-+- ??? ;-)

The core problem is that gsub operates on patterns (Lua's minimal regular expressions), and your string contains unescaped magic characters. However, even knowing that, I found myself surprised by your results.
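To illustrate what escaping looks like, here is a minimal sketch. The helper name escape_pattern is hypothetical (it is not part of the standard library); it relies on the documented rule that any punctuation character can be preceded by % to make it match literally.

```lua
-- Hypothetical helper: prefix every punctuation character with "%" so the
-- text is matched literally instead of being read as pattern syntax.
local function escape_pattern(text)
  -- Parentheses drop gsub's second return value (the match count).
  return (text:gsub("%p", "%%%0"))
end

local escaped = escape_pattern("++")        -- the pattern "%+%+"
local fixed = ("1++2"):gsub(escaped, "+")   -- keeps only the first return value
print(escaped)  -- %+%+
print(fixed)    -- 1+2
```

Writing the escaped pattern by hand ("%+%+") works just as well; the helper is only convenient when the text to match comes from elsewhere.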

It's easier to see what gsub is doing if you change the replacement string:

string.gsub('+',   '--', '|') => |+|
string.gsub('+++', '--', '|') => |+|+|+|

- means "0 or more occurrences of the preceding atom". Unlike + and *, it's non-greedy, matching the fewest characters possible.

I tested this, and apparently "fewest characters possible" means 0 characters. For instance, my intuition for this:

string.gsub('aaa','a-', '|') 

is that the expression a- would match each a and replace it with '|', resulting in '|||'. In fact, it matches the 0-length gaps before and after each character, resulting in '|a|a|a|'.
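The zero-length matches can be checked directly with string.find, which returns the start and end index of the first match. This is a small sketch for illustration; the variable names are arbitrary, and the greedy * quantifier is included only for contrast.

```lua
-- An end index smaller than the start index means the match is empty.
local s1, e1 = ("aaa"):find("a-")    -- lazy: stops as soon as it can
local s2, e2 = ("aaa"):find("a*")    -- greedy: takes as many as it can
local s3, e3 = ("aaab"):find("a-b")  -- lazy, but the "b" forces it to expand
print(s1, e1)  -- 1  0  (empty match at position 1)
print(s2, e2)  -- 1  3
print(s3, e3)  -- 1  4
```

So the lazy quantifier only consumes characters when something after it in the pattern requires it; with nothing following, the empty match always succeeds first.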

In fact, it doesn't matter what atom precedes the -; it always matches the smallest length, 0:

string.gsub('aaa','x-', '|') => |a|a|a|
string.gsub('aaa','a-', '|') => |a|a|a|
string.gsub('aaa','?-', '|') => |a|a|a|
string.gsub('aaa','--', '|') => |a|a|a|

You can see that the last one is your case, which explains your results. Your next result is the exact same thing:

string.gsub('+++','+-','|') => |+|+|+| 

Your final result is more straightforward:

string.gsub('-+-+-+-','-+','|') => |+|+|+| 

In this case, you're matching "1 or more occurrences of the atom -", so you're replacing the - characters, as you'd expect.
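As for how to proceed with cleaning doubled signs, one option is to escape each sign with % so it is matched literally. The following is a minimal sketch rather than a definitive solution: collapse_signs is a hypothetical name, and it assumes the four rewrite rules from the question should be reapplied until the string stops changing.

```lua
-- Hypothetical helper: collapse runs of + and - signs using the four
-- rules from the question, with every magic character escaped by "%".
local function collapse_signs(s)
  local n
  repeat
    local total = 0
    s, n = s:gsub("%+%+", "+"); total = total + n  -- "++" -> "+"
    s, n = s:gsub("%-%-", "+"); total = total + n  -- "--" -> "+"
    s, n = s:gsub("%+%-", "-"); total = total + n  -- "+-" -> "-"
    s, n = s:gsub("%-%+", "-"); total = total + n  -- "-+" -> "-"
  until total == 0  -- stop once a full pass makes no replacement
  return s
end

print(collapse_signs("++"))      -- +
print(collapse_signs("1--2"))    -- 1+2
print(collapse_signs("3---4"))   -- 3-4
```

Using a local variable name such as s also avoids assigning to string, which in the question's code replaces the global string library table with a plain string value.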

