
regex – Excel & Powershell: Bulk Find and replace URL's used in formulas

Posted by: admin May 14, 2020

Questions:

I’m writing a script that takes a list of SharePoint 2007 document libraries and does the following:

  1. In Excel, find formulas within cells that link to other workbooks + sheets (XLS/XLSX)
  2. If found, replace the formula link http://serverold/site/doclib/ with http://servernew/sites/sitecollection/doclib and save
  3. Else, close the workbook and move on to the next
  4. Log the full URL and filename of any changes

Using the code on this link as my starting point, I cannot get the following working:

  1. The regex that makes the script detect the URL in the formulas
  2. Modifying the script to replace the old path with the new path in the formulas within cells
  3. A branch inside the foreach loop that handles both cases: a match is found (save and close) or no match is found (just close)

I won’t go into detail on all the research I’ve done (information is very thin on the ground). Another thread mentions that you can enumerate these links centrally in Excel, but it gave no example or links. When I tried to enumerate the links collection in PowerShell (with Excel 2010 installed), it came back empty for my example workbook, which I know has “links” in that sense.

Example to enumerate link collections:

$File = "C:\temp\example.xls"
$Excel = New-Object -ComObject Excel.Application
$Excel.visible = $true
$Workbook = $Excel.workbooks.open($file)
$Workbook.LinkSources

So that raises the question: which method is right?

Example Excel formula

=+'http://server.old/site/site/Work in Progress Documents/Statements/[Hierarchy2011.xls]Reports'!$AD$37+'http://server.old/site/site/Work in Progress Documents/

Script to enumerate links (from the link I mentioned as my starting point) –

$path = "C:\temp"
$excelSheets = Get-Childitem -Path $path -Include *.xls,*.xlsx -Recurse
$excel = New-Object -comobject Excel.Application
$excel.visible = $false

foreach($excelSheet in $excelSheets)
{
 $workbook = $excel.Workbooks.Open($excelSheet)
 "There are $($workbook.Sheets.count) sheets in $excelSheet"

 For($i = 1 ; $i -le $workbook.Sheets.count ; $i++)
 {
  $worksheet = $workbook.sheets.item($i)
  "`tLooking for links on $($worksheet.name) worksheet"
  $rowMax = ($worksheet.usedRange.rows).count
  $columnMax = ($worksheet.usedRange.columns).count
  For($row = 1 ; $row -le $rowMax ; $row ++)
  {
   For($column = 1 ; $column -le $columnMax ; $column ++)
    { 
     [string]$formula = $workSheet.cells.item($row,$column).formula
     if($formula -match "\w?:\\w*\\[\w*\.xls\w?\]") {"`t`t$($formula)"}
    } #end for $column
   } #end for $row
  $worksheet = $rowmax = $columnMax = $row = $column = $formula = $null
 } #end for
 $workbook.saved = $true
 $workbook.close()
} #end foreach

 $excel.quit()
 $excel = $null
 [gc]::collect()
 [gc]::WaitForPendingFinalizers()

Thanks to anyone who can help and for your time.
Bests,
Ash

Answers:

Would you consider using simply:

$formula -replace 'http://server.old/','http://server.new/'
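
One caveat worth keeping in mind: -replace treats its first argument as a regular expression, so a dot in a host name matches any character. A minimal sketch, assuming the two URL prefixes from the question (with a trailing slash added to the new one, which is my assumption) and a made-up sample formula, escapes the old prefix first:

```powershell
# Assumed prefixes, based on the question; substitute your real paths.
$oldPrefix = 'http://serverold/site/doclib/'
$newPrefix = 'http://servernew/sites/sitecollection/doclib/'

# [regex]::Escape makes characters such as '.' and '[' match literally.
$pattern = [regex]::Escape($oldPrefix)

# Sample formula, invented for illustration (single quotes are doubled inside the string).
$formula = '=''http://serverold/site/doclib/[Book1.xls]Sheet1''!$A$1'

# -replace is case-insensitive by default; use -creplace for a case-sensitive match.
$changed = $formula -replace $pattern, $newPrefix
$changed
```

The result should differ from the input only in the URL prefix; a formula that does not contain the old prefix comes back unchanged.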

Update: I initially thought that you were having a problem with the regex (only), but reading through that script, I think you need a bit more help than that.

Let me take the part of the code that’s relevant:

For($i = 1 ; $i -le $workbook.Sheets.count ; $i++)
 {
  $worksheet = $workbook.sheets.item($i)
  "`tLooking for links on $($worksheet.name) worksheet"
  $rowMax = ($worksheet.usedRange.rows).count
  $columnMax = ($worksheet.usedRange.columns).count
  For($row = 1 ; $row -le $rowMax ; $row ++)
  {
   For($column = 1 ; $column -le $columnMax ; $column ++)
    { 
     [string]$formula = $workSheet.cells.item($row,$column).formula
     ## This is irrelevant here: it's trying to match a file path...
     ##     if($formula -match "\w?:\\w*\\[\w*\.xls\w?\]") {"`t`t$($formula)"}

     $changed = $formula -replace 'http://server.old/','http://server.new/'
     if ($formula -ne $changed) {
        $workSheet.cells.item($row,$column).formula = $changed
     }


    } #end for $column
   } #end for $row
  $worksheet = $rowmax = $columnMax = $row = $column = $formula = $null
 } #end for

 ## The line below actually prevents the file from being saved on close,
 ## because it tells Excel the workbook has already been saved.
 ## $workbook.saved = $true
 ## I would use this instead:
 if (!$Workbook.saved) { $workbook.save() }
 $workbook.close()
} #end foreach
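
The question also asked for a log of what changed. One possible way to get that, as a hedged sketch: a small helper returns a record only when the replacement actually altered the formula, so the inner loop can both write the cell back and collect a log entry. Get-FormulaChange, its parameter names, and the sample values are hypothetical and not part of the original script.

```powershell
# Hypothetical helper: returns $null when nothing changed, otherwise a log record.
# [pscustomobject] requires PowerShell 3.0 or later.
function Get-FormulaChange {
    param(
        [string]$File,
        [string]$Sheet,
        [string]$Cell,
        [string]$Formula,
        [string]$OldUrl = 'http://server.old/',
        [string]$NewUrl = 'http://server.new/'
    )
    $pattern = [regex]::Escape($OldUrl)          # match the old URL literally
    $changed = $Formula -replace $pattern, $NewUrl
    if ($changed -ceq $Formula) { return $null } # case-sensitive compare: no change
    [pscustomobject]@{
        File       = $File
        Sheet      = $Sheet
        Cell       = $Cell
        OldFormula = $Formula
        NewFormula = $changed
    }
}

# Stand-alone demonstration with made-up values (no Excel needed).
$log = @()
$entry = Get-FormulaChange -File 'C:\temp\a\a2.xls' -Sheet 'Sheet1' -Cell 'A1' `
    -Formula '=+''http://server.old/site/[Book.xls]Reports''!$AD$37'
if ($entry) { $log += $entry }
$log | Format-List
```

Inside the loop above, you would call the helper with the workbook path, $worksheet.name, and the cell position, assign NewFormula to the cell when a record comes back, and pipe $log to Export-Csv once all workbooks are done.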

Now, while I am on this, let me tell you how I would go about investigating this business before attempting to modify someone else’s script: investigate the objects in the console!

For this example, I created a simple spreadsheet (a2.xls) in c:\temp\a\

PS H:\> $excel = New-Object -com Excel.Application
PS H:\> $workbook = $excel.Workbooks.Open("C:\temp\a\a2.xls")
PS H:\> $worksheet=$workbook.Sheets.item(1)

Fun starts:

PS H:\> $worksheet |get-member -Type Properties


   TypeName: System.__ComObject#{000208d8-0000-0000-c000-000000000046}

Name                              MemberType Definition
----                              ---------- ----------
Application                       Property   Application Application () {get}
AutoFilter                        Property   AutoFilter AutoFilter () {get}
AutoFilterMode                    Property   bool AutoFilterMode () {get} {set}
Cells                             Property   Range Cells () {get}
CircularReference                 Property   Range CircularReference () {get}
CodeName                          Property   string CodeName () {get}
Columns                           Property   Range Columns () {get}
Comments                          Property   Comments Comments () {get}
ConsolidationFunction             Property   XlConsolidationFunction ConsolidationFunction () {get}
ConsolidationOptions              Property   Variant ConsolidationOptions () {get}
ConsolidationSources              Property   Variant ConsolidationSources () {get}
Creator                           Property   XlCreator Creator () {get}
CustomProperties                  Property   CustomProperties CustomProperties () {get}
DisplayAutomaticPageBreaks        Property   bool DisplayAutomaticPageBreaks () {get} {set}
DisplayPageBreaks                 Property   bool DisplayPageBreaks () {get} {set}
DisplayRightToLeft                Property   bool DisplayRightToLeft () {get} {set}
EnableAutoFilter                  Property   bool EnableAutoFilter () {get} {set}
EnableCalculation                 Property   bool EnableCalculation () {get} {set}
EnableFormatConditionsCalculation Property   bool EnableFormatConditionsCalculation () {get} {set}
EnableOutlining                   Property   bool EnableOutlining () {get} {set}
EnablePivotTable                  Property   bool EnablePivotTable () {get} {set}
EnableSelection                   Property   XlEnableSelection EnableSelection () {get} {set}
FilterMode                        Property   bool FilterMode () {get}
HPageBreaks                       Property   HPageBreaks HPageBreaks () {get}
Hyperlinks                        Property   Hyperlinks Hyperlinks () {get}
Index                             Property   int Index () {get}

(Output shortened.) See the Hyperlinks property there? Was that what you were referring to?
Take a look:

PS H:\> $worksheet.hyperlinks


Application   : Microsoft.Office.Interop.Excel.ApplicationClass
Creator       : 1480803660
Parent        : System.__ComObject
Name          : http://old.server/adil1/hellow
Range         : System.__ComObject
Shape         :
SubAddress    :
Address       : http://old.server/adil1/hellow
Type          : 0
EmailSubject  :
ScreenTip     :
TextToDisplay :

You can’t see my source workbook, but let me tell you this: it’s extracting URLs from the formulas. So what if I change the name and the address? Unfortunately, I cannot. I can in fact change the Address to point to the new server, but the Name property is read-only (note that it has no {set}):

PS H:\> $worksheet.hyperlinks |gm


   TypeName: System.__ComObject#{00024431-0000-0000-c000-000000000046}

Name              MemberType Definition
----              ---------- ----------
Address           Property   string Address () {get} {set}
Name              Property   string Name () {get}

This is why you’re probably better off doing the following:

  • Read the content of the cell as a string.
  • Replace just the server name.
  • If the cell does not contain what you are looking for, -replace won’t do anything and $changed equals $formula.
  • Otherwise, write back the new formula.

Of course, -replace is not the only way to do this, but you get the idea…
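
As one such alternative, and only as an untested sketch: the Excel object model also exposes workbook-level link management through Workbook.LinkSources and Workbook.ChangeLink. LinkSources is a method, so in PowerShell it has to be invoked with parentheses; passing 1 (xlExcelLinks) asks for links to other workbooks. The function name Update-WorkbookLinks and the URL prefixes are my assumptions, and ChangeLink may try to reach the new source, so it is worth trying this on a copy first.

```powershell
# Hypothetical helper; the default prefixes are placeholders.
function Update-WorkbookLinks {
    param(
        $Workbook,                                   # an open Excel Workbook COM object
        [string]$OldPrefix = 'http://server.old/',
        [string]$NewPrefix = 'http://server.new/'
    )
    $xlExcelLinks = 1                                # XlLink.xlExcelLinks
    $links = $Workbook.LinkSources($xlExcelLinks)    # empty when the workbook has no links
    $count = 0
    foreach ($link in @($links)) {
        if (-not $link) { continue }                 # skip the empty case
        $source = [string]$link
        if ($source.StartsWith($OldPrefix, [StringComparison]::OrdinalIgnoreCase)) {
            # Keep everything after the old prefix and attach it to the new one.
            $target = $NewPrefix + $source.Substring($OldPrefix.Length)
            $null = $Workbook.ChangeLink($source, $target, $xlExcelLinks)
            $count++
        }
    }
    $count                                           # number of links repointed
}
```

With a workbook opened as in the console session above, Update-WorkbookLinks -Workbook $workbook would return how many links were repointed; a non-zero result is the signal to save before closing.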

Hope this helps!