
c# – How to remove characters from an excel sheet?

Posted by: admin May 14, 2020


My overall problem is that I have a large Excel file (columns A-S, 85,000 rows) that I want to convert to XML. The data in the cells is all text.

The process I’m using now is to manually save the Excel file as CSV, then parse that in my own C# program to turn it into XML. If you have a better approach, please suggest it. I’ve searched Stack Overflow, and the only fast methods I found for converting straight to XML require my data to be all numeric.
(I tried reading cell by cell; that would have taken 3 days to process.)
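A minimal sketch of the CSV-to-XML step described above, assuming no field contains an embedded comma or line break (it uses a naive Split(',')). The class name CsvToXml and the element names Rows, Row and Field1..FieldN are hypothetical. XmlWriter escapes <, > and & in element text by itself, so those characters would not need to be stripped first; commas only matter because of the CSV split.

```csharp
using System.IO;
using System.Text;
using System.Xml;

// Hypothetical converter: one <Row> per CSV line, one <FieldN> per value.
// Assumes no field contains a comma or a line break (naive Split).
public static class CsvToXml
{
    public static string ToXml(TextReader csv)
    {
        var settings = new XmlWriterSettings { Indent = true, OmitXmlDeclaration = true };
        var sb = new StringBuilder();
        using (var writer = XmlWriter.Create(sb, settings))
        {
            writer.WriteStartElement("Rows");
            string line;
            while ((line = csv.ReadLine()) != null)
            {
                writer.WriteStartElement("Row");
                string[] fields = line.Split(',');
                for (int i = 0; i < fields.Length; i++)
                {
                    // WriteElementString escapes <, > and & in the value.
                    writer.WriteElementString("Field" + (i + 1), fields[i]);
                }
                writer.WriteEndElement(); // </Row>
            }
            writer.WriteEndElement(); // </Rows>
        }
        return sb.ToString();
    }
}
```

For 85,000 rows, passing a file path to XmlWriter.Create instead of a StringBuilder would avoid holding the whole document in memory.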

So, unless you can suggest a different way to approach the problem, I want to be able to programmatically remove all commas, <, >, ' and " characters from the Excel sheet.
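If stripping is still the goal, here is a minimal sketch of a helper that removes exactly the characters listed above from a single cell value. CellCleaner and RemoveUnwanted are hypothetical names. Curly quotes (‘ ’ “ ”) are not included and would need to be added to the array if the sheet contains them.

```csharp
using System;
using System.Linq;

// Hypothetical helper: drops every character in Unwanted from a cell value.
public static class CellCleaner
{
    // Commas, angle brackets, single and double quotes, as listed in the question.
    private static readonly char[] Unwanted = { ',', '<', '>', '\'', '"' };

    public static string RemoveUnwanted(string input)
    {
        if (input == null) return null;
        // Keep only the characters that are not in the Unwanted set.
        return new string(input.Where(c => Array.IndexOf(Unwanted, c) < 0).ToArray());
    }
}
```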

Answers:

I would use a combination of Microsoft.Office.Interop.Excel and XmlSerializer to get the job done.

This is because a) you’re using a console application, and b) the interop assemblies are easy to add to the solution (just References -> Add).

I’m assuming that you have a copy of Excel installed on the machine running the process (you mentioned that you currently open the workbook manually, hence the assumption).

The code would look something like this:

The serializable layer:

using System;
using System.Collections.Generic;

public class TestClass
{
    public List<TestLineItem> LineItems { get; set; }

    public TestClass()
    {
        LineItems = new List<TestLineItem>();
    }
}

public class TestLineItem
{
    private string SanitizeText(string input)
    {
        return (input ?? "").Replace(",", "")
            .Replace(".", "")
            .Replace("<", "")
            .Replace(">", "")
            .Replace("'", "")
            .Replace("\"", "");
    }

    private string m_field1;
    private string m_field2;

    public string Field1
    {
        get { return m_field1; }
        set { m_field1 = SanitizeText(value); }
    }

    public string Field2
    {
        get { return m_field2; }
        set { m_field2 = SanitizeText(value); }
    }

    public decimal Field3 { get; set; }

    // parameterless constructor required by XmlSerializer
    public TestLineItem() { }

    public TestLineItem(object field1, object field2, object field3)
    {
        // these assign the backing fields directly, so SanitizeText is not
        // applied here; XmlSerializer escapes <, > and & on output anyway
        m_field1 = (field1 ?? "").ToString();
        m_field2 = (field2 ?? "").ToString();

        if (field3 == null || field3.ToString() == "")
            Field3 = 0m;
        else
            Field3 = Convert.ToDecimal(field3.ToString());
    }
}

Then open the worksheet and load its data into a 2D array:

// using System;
// using System.IO;
// using System.Xml.Serialization;
// using OExcel = Microsoft.Office.Interop.Excel;
var app = new OExcel.Application();
var wbPath = Path.Combine(
        Environment.GetFolderPath(Environment.SpecialFolder.MyDocuments), "Book1.xls");

var wb = app.Workbooks.Open(wbPath);
var ws = (OExcel.Worksheet)wb.ActiveSheet;

// there are better ways to do this...
// this one's just off the top of my head
var rngTopLine = ws.get_Range("A1", "C1");
var rngEndLine = rngTopLine.get_End(OExcel.XlDirection.xlDown);
var rngData = ws.get_Range(rngTopLine, rngEndLine);
// Value2 returns a 1-based object[,] for a multi-cell range
var arrayData = (object[,])rngData.Value2;

var tc = new TestClass();

// since you're enumerating an array, the operation will run much faster
// than reading the worksheet line by line.
for (int i = arrayData.GetLowerBound(0); i <= arrayData.GetUpperBound(0); i++)
{
    tc.LineItems.Add(
        new TestLineItem(arrayData[i, 1], arrayData[i, 2], arrayData[i, 3]));
}

// done with Excel: close the workbook without saving and quit the Excel process
wb.Close(false);
app.Quit();

var xs = new XmlSerializer(typeof(TestClass));
using (var fs = File.Create(Path.Combine( /* output path elided in the original */ )))
{
    xs.Serialize(fs, tc);
}
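The array loop above can be tried without Excel. This sketch (hypothetical ArrayRows class) walks an object[,] with GetLowerBound/GetUpperBound, so it works with the 1-based arrays that Value2 returns; MakeOneBased is a hypothetical test helper that builds such an array with Array.CreateInstance.

```csharp
using System;
using System.Collections.Generic;

public static class ArrayRows
{
    // Converts a 2-D block (shaped like Range.Value2's result) into rows of strings.
    // Uses the array's own bounds, so 0-based and 1-based arrays both work.
    public static List<string[]> ToRows(object[,] data)
    {
        var rows = new List<string[]>();
        int firstCol = data.GetLowerBound(1);
        int lastCol = data.GetUpperBound(1);
        for (int r = data.GetLowerBound(0); r <= data.GetUpperBound(0); r++)
        {
            var row = new string[lastCol - firstCol + 1];
            for (int c = firstCol; c <= lastCol; c++)
            {
                // Empty cells are null in the array; treat them as "".
                row[c - firstCol] = (data[r, c] ?? "").ToString();
            }
            rows.Add(row);
        }
        return rows;
    }

    // Builds a 1-based object[,] so the loop can be exercised without Excel.
    public static object[,] MakeOneBased(int rows, int cols)
    {
        return (object[,])Array.CreateInstance(
            typeof(object), new[] { rows, cols }, new[] { 1, 1 });
    }
}
```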


The generated XML output will look something like this:

      <Field2>some&lt;encoded&gt; stuff here</Field2>
      <Field2>testing some commas, and periods.</Field2>
      <Field2>text in "quotes" and 'single quotes'</Field2>
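The escaping can be checked without Excel. This minimal sketch uses cut-down hypothetical stand-ins (DemoContainer, DemoLineItem) for TestClass and TestLineItem. XmlSerializer escapes < and & (and >) in element text by itself, which is why the raw characters do not break the XML.

```csharp
using System.Collections.Generic;
using System.IO;
using System.Xml.Serialization;

// Hypothetical stand-in for TestLineItem, trimmed to a single field.
public class DemoLineItem
{
    public string Field2 { get; set; }
}

// Hypothetical stand-in for TestClass.
public class DemoContainer
{
    public List<DemoLineItem> LineItems { get; set; } = new List<DemoLineItem>();
}

public static class SerializerDemo
{
    // Serializes to a string instead of a file so the output is easy to inspect.
    public static string Serialize(DemoContainer container)
    {
        var xs = new XmlSerializer(typeof(DemoContainer));
        using (var sw = new StringWriter())
        {
            xs.Serialize(sw, container);
            return sw.ToString();
        }
    }
}
```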


There are many options to read/edit/create Excel files:

Microsoft provides the free Open XML SDK 2.0 (XLSX only); see http://msdn.microsoft.com/en-us/library/bb448854%28office.14%29.aspx

This can read+write MS Office files (including Excel).

Another free option (XLSX only): see http://www.codeproject.com/KB/office/OpenXML.aspx

If you need more, such as handling older Excel versions (XLS as well as XLSX), rendering, creating PDFs, formulas, etc., there are various free and commercial libraries, such as ClosedXML (free, XLSX only), EPPlus (free, XLSX only), Aspose.Cells, SpreadsheetGear, LibXL and FlexCel.

Another option is Interop, which requires Excel to be installed locally, but Microsoft does not support Interop in server scenarios.

In my experience, any library-based approach that works with the Excel file directly is much faster than Interop.