c# – How to effectively buffer and flush stream in Open XML SDK

Posted by: admin, March 7, 2020

Questions:

I use Open XML SDK 2.0 to generate an Excel file with a large amount of data, approx. 1,000,000 rows, and I need to optimize memory usage because my machine slows down very quickly.

I want to solve this issue by flushing part of the generated DOM tree to the file at runtime. I do my own buffering of the data. E.g., I have 100,000 records to write and I want to flush the stream to the file each time I have added 1,000 rows to the Excel worksheet. I do this by calling worksheetPart.Worksheet.Save().
The documentation says that Save() “saves the data in the DOM tree back to the part. It could be called multiple times as well. Each time it is called, the stream will be flushed.”

    foreach (Record m in dataList)
    {
        Row contentRow = CreateContentRow(index, m);   // my own method to create row content

        // Append new row to sheet data.
        sheetData.AppendChild(contentRow);

        if (index % BufferSize == 0)
        {
            worksheetPart.Worksheet.Save();
        }

        index++;
    }

This method works in the sense that the memory usage chart has a sawtooth shape, but unfortunately the memory usage still grows over time.

Does anyone have any idea how to solve this issue?

Answers:

SpreadsheetGear for .NET can create an xlsx workbook with 1,000,000 rows by 40 columns of random numbers (that’s 40 million cells) in 74 seconds (that includes creating the workbook in memory from random numbers and saving to disk on an overclocked Intel QX 6850 and Windows Vista 32).

What kind of performance are you seeing with the Open XML SDK?

You can download a free trial of SpreadsheetGear here and try it yourself.

I will paste the code to generate the 40 million cell workbook below.

Disclaimer: I own SpreadsheetGear LLC

using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using SpreadsheetGear;

namespace ConsoleApplication10
{
    class Program
    {
        static void Main(string[] args)
        {
            try
            {
                // Run once with 100 rows and then run forever with 1,000,000 rows.
                for (int rows = 100; rows <= 1000000; rows = 1000000)
                {
                    Console.Write("rows={0}, ", rows);
                    var startMemory = System.GC.GetTotalMemory(true);
                    var timer = System.Diagnostics.Stopwatch.StartNew();
                    var workbook = BuildWorkbook(rows);
                    var usedMemory = System.GC.GetTotalMemory(true) - startMemory;
                    Console.WriteLine("usedMemory={0}, time={1} seconds, workbook.Name={2}", usedMemory, timer.Elapsed.TotalSeconds, workbook.Name);
                    workbook = null;
                }
            }
            catch (Exception e)
            {
                Console.WriteLine("got exception={0}", e.Message);
            }
        }

        static IWorkbook BuildWorkbook(int rows)
        {
            var workbook = Factory.GetWorkbook();
            var worksheet = workbook.Worksheets[0];
            var values = (SpreadsheetGear.Advanced.Cells.IValues)worksheet;
            Random rand = new Random();
            int cols = 40;
            for (int col = 0; col < cols; col++)
            {
                for (int row = 0; row < rows; row++)
                {
                    values.SetNumber(row, col, rand.NextDouble());
                }
            }
            workbook.SaveAs(string.Format(@"c:\tmp\Rows{0}.xlsx", rows), FileFormat.OpenXMLWorkbook);
            return workbook;
        }
    }
}

Answer:

There is an opposite approach to “buffer and flush” for writing large Excel files. It is based on the OpenXmlWriter class and writes elements sequentially instead of buffering and flushing a DOM tree. One typical solution also uses a replacement part and OpenXmlReader to copy unchanged content from a template. See “Writing Large Excel Files with the Open XML SDK” (with a few code examples) and “Write large OpenXML docs” (with a complete code example).
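
To give a rough idea of what sequential writing with OpenXmlWriter can look like, here is a minimal sketch. It is not the code from the linked articles: the class name StreamedSheetSketch, the sheet name "Sheet1" and the placeholder cell values are assumptions made for illustration, and it leaves out the template / OpenXmlReader variant. Treat it as a starting point only.

```csharp
using System.Globalization;
using DocumentFormat.OpenXml;
using DocumentFormat.OpenXml.Packaging;
using DocumentFormat.OpenXml.Spreadsheet;

// Hypothetical helper for illustration; the name and the cell values are assumptions.
static class StreamedSheetSketch
{
    public static void Write(string path, int rowCount, int colCount)
    {
        using (SpreadsheetDocument document =
            SpreadsheetDocument.Create(path, SpreadsheetDocumentType.Workbook))
        {
            WorkbookPart workbookPart = document.AddWorkbookPart();
            WorksheetPart worksheetPart = workbookPart.AddNewPart<WorksheetPart>();

            // Stream the worksheet XML element by element. Each Cell object is
            // written immediately and never attached to a SheetData DOM tree.
            using (OpenXmlWriter writer = OpenXmlWriter.Create(worksheetPart))
            {
                writer.WriteStartElement(new Worksheet());
                writer.WriteStartElement(new SheetData());

                for (int r = 0; r < rowCount; r++)
                {
                    writer.WriteStartElement(new Row());
                    for (int c = 0; c < colCount; c++)
                    {
                        // Placeholder numeric value; real code would take it from the record.
                        double value = (double)r * colCount + c;
                        writer.WriteElement(new Cell
                        {
                            DataType = CellValues.Number,
                            CellValue = new CellValue(value.ToString(CultureInfo.InvariantCulture))
                        });
                    }
                    writer.WriteEndElement(); // Row
                }

                writer.WriteEndElement(); // SheetData
                writer.WriteEndElement(); // Worksheet
            }

            // The workbook part only lists the sheets, so it is small;
            // it is written with OpenXmlWriter here just for consistency.
            using (OpenXmlWriter writer = OpenXmlWriter.Create(workbookPart))
            {
                writer.WriteStartElement(new Workbook());
                writer.WriteStartElement(new Sheets());
                writer.WriteElement(new Sheet
                {
                    Name = "Sheet1",
                    SheetId = 1,
                    Id = workbookPart.GetIdOfPart(worksheetPart)
                });
                writer.WriteEndElement(); // Sheets
                writer.WriteEndElement(); // Workbook
            }
        }
    }
}
```

A call such as StreamedSheetSketch.Write(path, 1000000, 40) would then produce a workbook comparable in size to the one in the question. Because no rows are appended to sheetData, there is nothing left in memory for Worksheet.Save() to flush, which is the point of this approach.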