html – How would you simplify this process?

Posted by: admin, April 23, 2020

Questions:

I have a batch of over 1000 HTML files that contain only simple text, laid out inside a <table>. They are internal documents, not meant for web production.

Our job is to convert them into JPEG files using Photoshop and the old copy-and-paste method. It’s tedious.

Is there a way to make this process simpler and more efficient?

I thought about converting the HTML into Excel and then mail merging it into Word to print as JPEG. But I can’t find (and rightly so) anything that converts HTML to XLSX.

Thoughts? Or is this just a manual job?

How to & Answers:

Here’s a little something I created to convert a single HTML file to JPEG. It’s not pretty (to say the least), but it works fine, even with a table larger than my screen. Put it inside a Windows Forms project. You can add more checks and call this program in a loop, or refactor it to work on multiple HTML files.

Ideas and techniques taken from –

Finding the needed size – http://social.msdn.microsoft.com/Forums/ie/en-US/f6f0c641-43bd-44cc-8be0-12b40fbc4c43/webbrowser-object-use-to-find-the-width-of-a-web-page

Creating the graphics – http://cplus.about.com/od/learnc/a/How-To-Save-Web-Page-Screen-Grab-csharp.htm

Example table used for testing – an enlarged, copy-pasted version of http://www.w3schools.com/html/html_tables.asp

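using System;
using System.Drawing;
using System.Drawing.Imaging;
using System.IO;
using System.Windows.Forms;

static class Program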
{

    static WebBrowser webBrowser = new WebBrowser();
    private static string m_fileName;

    [STAThread]
    static void Main(string[] args)
    {

        if (args.Length != 1)
        {
            MessageBox.Show("Usage: [fileName]");
            return;
        }

        // Resolve to an absolute path; new Uri() throws on a relative one
        m_fileName = Path.GetFullPath(args[0]);
        webBrowser.DocumentCompleted += (a, b) => webBrowser_DocumentCompleted();
        webBrowser.ScrollBarsEnabled = false; // Don't want them rendered
        webBrowser.Navigate(new Uri(m_fileName));


        Application.Run();
    }

    static void webBrowser_DocumentCompleted()
    {

        // Get the needed size of the control
        webBrowser.Width = webBrowser.Document.Body.ScrollRectangle.Width + webBrowser.Margin.Horizontal;
        webBrowser.Height = webBrowser.Document.Body.ScrollRectangle.Height + webBrowser.Margin.Vertical;

        // Create the graphics and save the image; both are disposed when done
        using (var graphics = webBrowser.CreateGraphics())
        using (var bitmap = new Bitmap(webBrowser.Size.Width, webBrowser.Size.Height, graphics))
        {
            webBrowser.DrawToBitmap(bitmap, webBrowser.ClientRectangle);

            string newFileName = Path.ChangeExtension(m_fileName, ".jpg");

            bitmap.Save(newFileName, ImageFormat.Jpeg);
        }

        // Shamefully exit the application
        Application.ExitThread();            
    }
}
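
The answer suggests refactoring the program to handle multiple HTML files. One way that might look is sketched below. It is an untested sketch, not the answerer's code: the class name BatchProgram, the queue, and the choice to take a folder as the only argument are assumptions made for illustration. It reuses the same sizing and DrawToBitmap steps, resets the control to its starting size before each page so a small table is not measured against the previous, larger one, and loads the next file from inside the DocumentCompleted handler.

```csharp
using System;
using System.Collections.Generic;
using System.Drawing;
using System.Drawing.Imaging;
using System.IO;
using System.Windows.Forms;

static class BatchProgram
{
    static readonly WebBrowser browser = new WebBrowser { ScrollBarsEnabled = false };
    static readonly Queue<string> pending = new Queue<string>();
    static Size initialSize;
    static string currentFile;

    [STAThread]
    static void Main(string[] args)
    {
        if (args.Length != 1 || !Directory.Exists(args[0]))
        {
            MessageBox.Show("Usage: [folder]");
            return;
        }

        // Queue every .html file in the folder as an absolute path,
        // because new Uri() rejects relative paths
        foreach (string file in Directory.GetFiles(Path.GetFullPath(args[0]), "*.html"))
            pending.Enqueue(file);

        initialSize = browser.Size;
        browser.DocumentCompleted += OnDocumentCompleted;

        if (!LoadNext())
            return; // nothing to convert

        Application.Run();
    }

    // Starts loading the next queued file; returns false when the queue is empty
    static bool LoadNext()
    {
        if (pending.Count == 0)
            return false;

        currentFile = pending.Dequeue();
        browser.Size = initialSize; // forget the previous page's size
        browser.Navigate(new Uri(currentFile));
        return true;
    }

    static void OnDocumentCompleted(object sender, WebBrowserDocumentCompletedEventArgs e)
    {
        // Grow the control to fit the whole document, as in the single-file version
        browser.Width = browser.Document.Body.ScrollRectangle.Width + browser.Margin.Horizontal;
        browser.Height = browser.Document.Body.ScrollRectangle.Height + browser.Margin.Vertical;

        using (var bitmap = new Bitmap(browser.Width, browser.Height))
        {
            browser.DrawToBitmap(bitmap, new Rectangle(0, 0, bitmap.Width, bitmap.Height));
            bitmap.Save(Path.ChangeExtension(currentFile, ".jpg"), ImageFormat.Jpeg);
        }

        // Move on to the next file, or stop the message loop when done
        if (!LoadNext())
            Application.ExitThread();
    }
}
```

Each JPEG is written next to its source file with the same base name. Pages that contain frames can raise DocumentCompleted more than once, so a real tool would probably want a guard for that case.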

Answer:

You can load all the files in one page and use the html2canvas library to convert them.

You can run it in the background using Node.js with node-canvas, or make it a desktop app with node-webkit.

Answer:

In case anyone is looking for an answer that works, I ended up using a program called Prince: https://www.princexml.com

It works amazingly well. You just have to target the HTML with CSS or JS to make it match the output you want!
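
For a batch of files, Prince could be driven from a small console program along these lines. This is only a sketch under several assumptions: that a prince executable is on the PATH, that the installed version supports the --raster-output and --raster-format options (check prince --help before relying on them), and that each document fits on one page. The class name PrinceBatch is made up for illustration.

```csharp
using System;
using System.Diagnostics;
using System.IO;

static class PrinceBatch
{
    static int Main(string[] args)
    {
        if (args.Length != 1 || !Directory.Exists(args[0]))
        {
            Console.Error.WriteLine("Usage: PrinceBatch [folder]");
            return 1;
        }

        foreach (string html in Directory.GetFiles(Path.GetFullPath(args[0]), "*.html"))
        {
            string jpg = Path.ChangeExtension(html, ".jpg");

            var info = new ProcessStartInfo
            {
                FileName = "prince", // assumed to be on the PATH
                // Assumed options: verify them against your Prince version
                Arguments = $"\"{html}\" --raster-format=jpeg --raster-output=\"{jpg}\"",
                UseShellExecute = false
            };

            using (Process process = Process.Start(info))
            {
                process.WaitForExit();
                if (process.ExitCode != 0)
                    Console.Error.WriteLine("Prince failed for " + html);
            }
        }

        return 0;
    }
}
```

Documents that span several pages would need a numbered output name per page, which this sketch does not handle.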