I have a string which contains an Excel formula. How can I parse out each function name from within the string?
I can’t figure out how to write the regex for this. Basically, a function name is the run of characters immediately before a ( that isn’t inside single or double quotes.
For example:
- =VLOOKUP($A9,'Summary'!$A$10:$C$30,3,FALSE) should return VLOOKUP
- =IFERROR((C10/B10),"N/A") should return IFERROR
- ='New Chart Data (Date)'!L70 should return nothing, because there is no function
- =IFERROR((C10/B10),Len(E30)) should return IFERROR and LEN
- ='New Chart Data(Date)'!L70 + Len(5) should return Len. This is the tricky one: many patterns also return Data, which is wrong.
Any ideas?
Thanks in advance.
You can use something like this, I guess:
(?<=[=,])[A-Za-z2]+(?=\()
regex101 demo (with a description of each part of the regex)
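A quick way to check this pattern against the question's examples is sketched below in Python rather than C# (an assumption for illustration; .NET accepts the same fixed-width lookbehind syntax). The helper name find_functions is hypothetical.

```python
import re

# Fixed-width lookbehind: the name must start right after "=" or ","
# and be followed immediately by "(".
AFTER_DELIMITER = re.compile(r"(?<=[=,])[A-Za-z2]+(?=\()")


def find_functions(formula):
    """Return the candidate function names found in an Excel formula string."""
    return AFTER_DELIMITER.findall(formula)


examples = [
    "=VLOOKUP($A9,'Summary'!$A$10:$C$30,3,FALSE)",
    '=IFERROR((C10/B10),"N/A")',
    "='New Chart Data (Date)'!L70",
    "=IFERROR((C10/B10),Len(E30))",
    "='New Chart Data(Date)'!L70 + Len(5)",
]

for formula in examples:
    print(formula, "->", find_functions(formula))
```

The first four examples come out as the question expects. The last one returns nothing: Len follows "+ " rather than "=" or ",", which is the operator problem raised in EDIT2 below.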
Actually, there’s one catch: in a formula such as =IFERROR((C10/B10), Len(E30)), where a space follows the comma, the pattern won’t get Len. You can use this one instead and trim any leading spaces from the matches:
(?<=[=,])\s*[A-Za-z2]+(?=\()
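A minimal Python sketch of the "match, then trim" approach (Python and the helper name find_functions_spaced are illustrative assumptions). Because \s* sits outside the lookbehind, the spaces become part of each match and are stripped afterwards:

```python
import re

# "\s*" lets a match begin with the spaces that follow "=" or ",",
# so each raw match is stripped before it is returned.
SPACED = re.compile(r"(?<=[=,])\s*[A-Za-z2]+(?=\()")


def find_functions_spaced(formula):
    return [match.strip() for match in SPACED.findall(formula)]


print(SPACED.findall("=IFERROR((C10/B10), Len(E30))"))          # ['IFERROR', ' Len']
print(find_functions_spaced("=IFERROR((C10/B10), Len(E30))"))   # ['IFERROR', 'Len']
```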
Or, since C# accepts variable-length lookbehinds:
(?<=[=,]\s*)[A-Za-z2]+(?=\()
I think that one takes a bit more resources than the previous one, though.
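The variable-length lookbehind is engine-specific. Python's re module, for example, only allows fixed-width lookbehinds, so that pattern will not compile there. A common portable workaround, sketched here in Python (the helper name is hypothetical), is to consume the delimiter and spaces and capture just the name:

```python
import re

# Python's re requires fixed-width lookbehinds, so the .NET-style
# "(?<=[=,]\s*)" raises re.error at compile time.
try:
    re.compile(r"(?<=[=,]\s*)[A-Za-z2]+(?=\()")
except re.error as exc:
    print("re rejects the variable-length lookbehind:", exc)

# Portable alternative: consume "=" or "," plus any spaces, capture only the name.
# With exactly one capturing group, findall() returns just that group's text.
CAPTURED = re.compile(r"[=,]\s*([A-Za-z2]+)(?=\()")


def find_functions_captured(formula):
    return CAPTURED.findall(formula)


print(find_functions_captured("=IFERROR((C10/B10), Len(E30))"))  # ['IFERROR', 'Len']
```

Consuming the delimiter is safe here because each delimiter precedes its own name, so two matches never need to share one.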
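EDIT: I didn’t think of the fact that sheet names can look like function calls, e.g. a sheet named =Sheet(2), referenced as ='=Sheet(2)'!A1. A negative lookahead rejects any match that is followed by '! before another single quote, i.e. one that sits inside a quoted sheet name: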
(?<=[=,])\s*[A-Za-z2]+(?=\()(?![^']*'!)
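A short Python check of the lookahead (Python and the helper name are assumptions for illustration). In the sheet-name case, [^']* runs over "(2)" and then hits '!, so the match is dropped. In the VLOOKUP formula, the first ' is followed by S rather than !, so the match is kept:

```python
import re

# Trimmed pattern plus a negative lookahead: drop the match when the text
# after it reaches "'!" (the end of a quoted sheet name) before any other
# single quote.
NO_SHEET_NAMES = re.compile(r"(?<=[=,])\s*[A-Za-z2]+(?=\()(?![^']*'!)")


def find_functions_no_sheets(formula):
    return [match.strip() for match in NO_SHEET_NAMES.findall(formula)]


print(find_functions_no_sheets("='=Sheet(2)'!A1"))                              # []
print(find_functions_no_sheets("=VLOOKUP($A9,'Summary'!$A$10:$C$30,3,FALSE)"))  # ['VLOOKUP']
```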
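EDIT2: I forgot about operators as well, such as the + before Len in the last example. I guess I’ll use a word boundary like Andy’s, since the only issue with it is function-like text inside quoted sheet names, which the lookahead above already handles: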
\b[A-Za-z2]+(?=\()(?![^']*'!)
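Running the final pattern over all of the question's examples, plus the =Sheet(2) case, gives the expected results. This is a Python sketch; the helper name find_functions_final is hypothetical:

```python
import re

# \b replaces the "=" / "," lookbehind, so names after operators such as "+"
# are found too; the negative lookahead still skips "Name(" text inside a
# quoted sheet name that ends in "'!".
FUNCTION_NAME = re.compile(r"\b[A-Za-z2]+(?=\()(?![^']*'!)")


def find_functions_final(formula):
    return FUNCTION_NAME.findall(formula)


examples = [
    "=VLOOKUP($A9,'Summary'!$A$10:$C$30,3,FALSE)",  # ['VLOOKUP']
    '=IFERROR((C10/B10),"N/A")',                    # ['IFERROR']
    "='New Chart Data (Date)'!L70",                 # []
    "=IFERROR((C10/B10),Len(E30))",                 # ['IFERROR', 'Len']
    "='New Chart Data(Date)'!L70 + Len(5)",         # ['Len']
    "='=Sheet(2)'!A1",                              # []
]

for formula in examples:
    print(formula, "->", find_functions_final(formula))
```

The lookahead is still a heuristic. A string literal containing '! right after a call, as in =LEN("it'!"), hides LEN. Also, the class [A-Za-z2] admits only the digit 2, so a name such as LOG10 is not found.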
Answer:
I think it could be simplified by using a word boundary, \b, rather than a lookbehind:
\b([A-Za-z2]+)(?=\()
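This version handles spaces and operators before a name without trimming. However, on the question's tricky example it also returns Data from the quoted sheet name. That gap is what the extra (?![^']*'!) lookahead in the first answer's final pattern closes. A Python sketch follows; the helper name is hypothetical:

```python
import re

# Word-boundary version without the sheet-name lookahead.
# With one capturing group, findall() returns the group's text.
SIMPLE = re.compile(r"\b([A-Za-z2]+)(?=\()")


def find_functions_simple(formula):
    return SIMPLE.findall(formula)


print(find_functions_simple("=IFERROR((C10/B10), Len(E30))"))         # ['IFERROR', 'Len']
print(find_functions_simple("='New Chart Data(Date)'!L70 + Len(5)"))  # ['Data', 'Len']
```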