regex – Which regular expression is able to select excel column names in a formula in C#?

Posted by: admin, April 23, 2020

Questions:

I need to implement Excel formula autofill in C#.

Let’s suppose this formula is located at B100:

=SUM($B$99:B99)

When it is filled into C100, I want it to become:

=SUM($B$99:C99)

This formula is only an example. Some real examples are:

=(SUM($B${0}:B{0})/SUM({1}!$B${0}:{1}!B{0}) -1)
=SUM(B{0}:B{1})
=B{0} + B{1}
=C{0}+ B{1}
=$B${0}+ AC{1}

(Assume that {0} and {1} are, in fact, numbers.)

What I need to do, generically, is to pick these column names and “increment” them. Column names surrounded by $ in formulas should not be updated.

How can I identify these fields with a regex?

Answers:

Here’s a regex solution that solely deals with formulas. I’ll leave the Excel stuff to you. Provided you have a collection of strings representing your formulas, you can run them through this to increment your column names.

Some comments:

  • **Test this thoroughly!** Perhaps do a sheet manually and compare your efforts with the generated results.
  • This shouldn’t accidentally alter function names that fit the cell naming pattern. If you know your formulas have Excel function names that include numbers, keep an eye out for them and – again – **verify results**.
  • The regex does not verify that what you’re feeding it is a formula – I take it you’re only using formulas. In other words, I didn’t make it check that the string begins with an "=" sign. If you plan to feed non-formulas through this for other cell values, then add a check where IsMatch is used in the if branch using formula.StartsWith("="). To understand what I’m referring to, add an additional test string to my sample, such as "Check out T4 generation" – if no StartsWith("=") check is made, that will match and T4 will become U4.
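The two cautions above (non-formula strings, and function names that look like cells) can be sketched as a small guard. This is only an illustration under stated assumptions: the class FormulaGuard and the helper HasRelativeCell are hypothetical names, and the trailing negative look-ahead (?!\() is an addition that is not part of the pattern in this answer. It skips names that are immediately followed by an opening parenthesis, such as the Excel function LOG10.

```csharp
using System;
using System.Text.RegularExpressions;

public static class FormulaGuard
{
    // Cell pattern from this answer plus a negative look-ahead (an assumption,
    // not in the original) that rejects names followed by "(", e.g. LOG10(.
    static readonly Regex RxCell =
        new Regex(@"(?<![$])\b(?<col>[A-Z]+)(?<row>\d+)\b(?!\()");

    // Hypothetical helper: only strings starting with "=" count as formulas.
    public static bool HasRelativeCell(string text)
    {
        return text.StartsWith("=", StringComparison.Ordinal) && RxCell.IsMatch(text);
    }

    public static void Main()
    {
        Console.WriteLine(HasRelativeCell("Check out T4 generation"));  // False (not a formula)
        Console.WriteLine(HasRelativeCell("=SUM($B$99:B99)"));          // True  (B99 is relative)
        Console.WriteLine(HasRelativeCell("=LOG10($A$1)"));             // False (LOG10 is a function call)
    }
}
```

Without the look-ahead, LOG10 in the last formula would be treated as column LOG, row 10, so it is worth checking whether your formulas use functions whose names end in digits.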

The regex pattern was actually the easy part. It will just match any letter-number sequence, and ignores $A$1 and $A1 types of cells. The tricky part was the logic to increment the column. I’ve added comments to clarify that bit so grab some coffee and read it over 🙂

I’m sure this could be enhanced but this is what I had time for.

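using System;
using System.Text.RegularExpressions;

// both methods below belong inside a class of your choice (for example, class Program)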

static void Main(string[] args)
{
    string[] formulas = { "Z1", "ZZ1", "AZ1", "AZB1", "BZZ2",
                        "=SUM($B$99:B99)","=SUM($F99:F99)", "=(SUM($B$0:B0)/SUM(1!$B$11:22!B33) -1)",
                        "=SUM(X80:Z1)", "=A0 + B1 - C2 + Z5", "=C0+ B1",
                        "=$B$0+ AC1", "=AA12-ZZ34 + AZ1 - BZ2 - BX3 + BZX4",
                        "=SUMX2MY2(A2:A8,B2:B8)",   // ensure function SUMX2MY2 isn't mistakenly incremented
                        "=$B$40 + 50 - 20"          // no match
                        //,"Check out T4 generation!"  // not a formula but it'll still increment T4, use formula.StartsWith("=")
                        };

    // use this if you don't want to include regex comments
    //Regex rxCell = new Regex(@"(?<![$])\b(?<col>[A-Z]+)(?<row>\d+)\b");

    // regex comments in this style require RegexOptions.IgnorePatternWhitespace
    string rxCellPattern = @"(?<![$])       # match if prefix is absent: $ symbol (prevents matching $A1 type of cells)
                                            # (if all you have is $A$1 type of references, and not $A1 types, this negative look-behind isn't needed)
                            \b              # word boundary (prevents matching Excel functions with a similar pattern to a cell)
                            (?<col>[A-Z]+)  # named capture group, match uppercase letter at least once
                                            # (change to [A-Za-z] if you have lowercase cells)
                            (?<row>\d+)     # named capture group, match a number at least once
                            \b              # word boundary
                            ";
    Regex rxCell = new Regex(rxCellPattern, RegexOptions.IgnorePatternWhitespace);

    foreach (string formula in formulas)
    {
        if (rxCell.IsMatch(formula))
        {
            Console.WriteLine("Formula: {0}", formula);
            foreach (Match cell in rxCell.Matches(formula))
                Console.WriteLine("Cell: {0}, Col: {1}", cell.Value, cell.Groups["col"].Value);

            // the magic happens here
            string newFormula = rxCell.Replace(formula, IncrementColumn);
            Console.WriteLine("Modified: {0}", newFormula);
        }
        else
        {
            Console.WriteLine("Not a match: {0}", formula);
        }
        Console.WriteLine();
    }
}


private static string IncrementColumn(Match m)
{
    char[] letters = m.Groups["col"].Value.ToCharArray();

    // work backwards from the last letter, carrying for as long as we hit a Z
    int i = letters.Length - 1;
    while (i >= 0 && letters[i] == 'Z')
    {
        // roll over: Z becomes A and the carry moves one letter to the left
        letters[i] = 'A';
        i--;
    }

    string col;
    if (i >= 0)
    {
        // the first non-Z letter from the right absorbs the carry (i.e. AZ -> BA, BZZ -> CAA)
        // letters further to the left stay untouched (i.e. ABZ -> ACA)
        letters[i]++;
        col = new string(letters);
    }
    else
    {
        // every letter was a Z, so the column name grows by one letter (i.e. Z -> AA, ZZ -> AAA)
        col = "A" + new string(letters);
    }

    // updated column and original row (from regex match)
    return col + m.Groups["row"].Value;
}

The results should look like this (I removed the cell breakdown for brevity):

Formula: Z1
Modified: AA1

Formula: ZZ1
Modified: AAA1

Formula: AZ1
Modified: BA1

Formula: AZB1
Modified: AZC1

Formula: BZZ2
Modified: CAA2

Formula: =SUM($B$99:B99)
Modified: =SUM($B$99:C99)

Formula: =SUM($F99:F99)
Modified: =SUM($F99:G99)

Formula: =(SUM($B$0:B0)/SUM(1!$B$11:22!B33) -1)
Modified: =(SUM($B$0:C0)/SUM(1!$B$11:22!C33) -1)

Formula: =SUM(X80:Z1)
Modified: =SUM(Y80:AA1)

Formula: =A0 + B1 - C2 + Z5
Modified: =B0 + C1 - D2 + AA5

Formula: =C0+ B1
Modified: =D0+ C1

Formula: =$B$0+ AC1
Modified: =$B$0+ AD1

Formula: =AA12-ZZ34 + AZ1 - BZ2 - BX3 + BZX4
Modified: =AB12-AAA34 + BA1 - CA2 - BY3 + BZY4

Formula: =SUMX2MY2(A2:A8,B2:B8)
Modified: =SUMX2MY2(B2:B8,C2:C8)

Not a match: =$B$40 + 50 - 20
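
As an alternative to letter-by-letter carrying, the column "arithmetic" can be sketched by converting the column name to a number and back (Excel column names behave like a base-26 system without a zero digit). This is a minimal sketch, not part of the answer above; ColumnMath, ToNumber, ToName and Shift are hypothetical names, and uppercase column letters are assumed.

```csharp
using System;
using System.Text;

public static class ColumnMath
{
    // "A" -> 1, "Z" -> 26, "AA" -> 27 (assumes uppercase letters only)
    public static int ToNumber(string col)
    {
        int n = 0;
        foreach (char c in col)
            n = n * 26 + (c - 'A' + 1);
        return n;
    }

    // 1 -> "A", 26 -> "Z", 27 -> "AA"
    public static string ToName(int n)
    {
        var sb = new StringBuilder();
        while (n > 0)
        {
            n--;                                   // map to 0..25 so that Z does not carry
            sb.Insert(0, (char)('A' + n % 26));
            n /= 26;
        }
        return sb.ToString();
    }

    // Shift a column name by any number of columns to the right.
    public static string Shift(string col, int offset)
    {
        return ToName(ToNumber(col) + offset);
    }

    public static void Main()
    {
        Console.WriteLine(Shift("Z", 1));    // AA
        Console.WriteLine(Shift("BZZ", 1));  // CAA
        Console.WriteLine(Shift("B", 3));    // E
    }
}
```

A MatchEvaluator could call Shift(m.Groups["col"].Value, 1) and append the row, which also makes it easy to fill more than one column at a time.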

Answer:

You’re sure you’re not making this too complicated, right? This is something Excel does natively. Example: Highlight cell B100 in the above example. Notice in the cell’s outline that there’s a little black box in the bottom right corner of the cell. This lets you AutoFill. Click that black box and drag it right (to cell C100). You should have just AutoFilled over one column, and C100 should have =SUM($B$99:C99). And if you drag down instead, you’d get =SUM($B$99:B100).

If your goal is to repeat this behavior in C#, I would guess that the best way to do this is to figure out how to hook into Excel’s AutoFill functions. I don’t know exactly how you’d do this in C#, but certainly they’re available in VBA (and you can just record a macro, do the above steps, and then look at the generated code to see the AutoFill VBA code).

Hope that helps.

Answer:

+1 for automating Excel and doing the work there.

However, if you’re bent on doing it in C#, you might start here http://ewbi.blogs.com/develops/2004/12/excel_formula_p.html. Maybe once you digest all the rules for tokenizing a formula, you’ll be able to create a RE.
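
One tokenizing rule is easy to illustrate without a full parser: text inside double-quoted string literals is not a cell reference. The sketch below is not the tokenizer from the linked article. It only assumes that a regex alternation can consume string literals first, so that the relative-cell pattern from the first answer never sees them. LiteralAwareShift, ShiftRight and NextColumn are hypothetical names.

```csharp
using System;
using System.Text.RegularExpressions;

public static class LiteralAwareShift
{
    // First alternative: a double-quoted string literal (an escaped "" simply reads
    // as two adjacent literals). Second alternative: the relative-cell pattern.
    static readonly Regex RxToken = new Regex(
        @"""[^""]*""|(?<![$])\b(?<col>[A-Z]+)(?<row>\d+)\b");

    public static string ShiftRight(string formula)
    {
        return RxToken.Replace(formula, m =>
        {
            if (!m.Groups["col"].Success)
                return m.Value;                    // string literal: leave untouched
            return NextColumn(m.Groups["col"].Value) + m.Groups["row"].Value;
        });
    }

    // Increment a column name by one, carrying over trailing Z letters.
    static string NextColumn(string col)
    {
        char[] letters = col.ToCharArray();
        int i = letters.Length - 1;
        while (i >= 0 && letters[i] == 'Z')
        {
            letters[i] = 'A';
            i--;
        }
        if (i < 0)
            return "A" + new string(letters);      // all Z: the name grows by one letter
        letters[i]++;
        return new string(letters);
    }

    public static void Main()
    {
        // prints =IF(B1="B2",D3,"D4")
        Console.WriteLine(ShiftRight("=IF(A1=\"B2\",C3,\"D4\")"));
    }
}
```

Other rules, such as sheet names that happen to look like cell references, would still need handling of their own.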