macos – Mac: How to convert Excel (and other) file types to PDF programmatically

Posted by: admin May 14, 2020

Questions:

I am interested in programmatically converting an Excel spreadsheet/workbook to a PDF file. Others have asked and answered this question (see How to convert Word and Excel documents to PDF programmatically? and convert excel to pdf in python), but neither of those does what I want:

  1. I want a command-line tool that doesn’t require the use of Microsoft Office.
  2. I’d like it to work on a Mac.
  3. I want it to do the full spreadsheet; the Perl solution only does constants.
  4. I’d like to use something that’s already debugged, rather than writing the code.

It seems to me that an easy way to do this should be with the “Quick Look” system that’s built into macOS. macOS can preview any document (select it in Finder and press space), and it can also convert things to PDF. There should be a way of using Apple’s built-in knowledge of how to render XLS and XLSX files. I just don’t know how to use Quick Look from the command line or how to get it to write PDF output to a file.

Apple does provide a program called qlmanage that can be used to programmatically produce HTML files and PNGs, but it makes a set of HTML files, not a PDF file.
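For reference, both qlmanage output modes can be driven from a script. The sketch below only builds the command lines and runs them on request. `qlmanage_command` and `run_qlmanage` are hypothetical helper names, and the flags used (`-p` for previews, `-t` for thumbnails, `-s` for thumbnail size, `-o` for the output directory) are the ones listed in the tool's help text as far as I know, so check `qlmanage -h` on your own system.

```python
import subprocess

def qlmanage_command(source, outdir, thumbnail=False, size=None):
    """Build a qlmanage command line (hypothetical helper, not part of macOS)."""
    # -p asks for the preview (HTML for workbooks), -t for a PNG thumbnail.
    cmd = ["qlmanage", "-t" if thumbnail else "-p"]
    if thumbnail and size is not None:
        cmd += ["-s", str(size)]      # thumbnail size in pixels
    cmd += ["-o", outdir, source]     # -o writes results into outdir
    return cmd

def run_qlmanage(source, outdir, **kwargs):
    """Run qlmanage and return its exit status; this only works on a Mac."""
    return subprocess.call(qlmanage_command(source, outdir, **kwargs))
```

Calling `run_qlmanage("book.xlsx", "out")` would request the HTML preview, while `run_qlmanage("book.xlsx", "out", thumbnail=True, size=1024)` would request a PNG thumbnail. Neither mode writes a PDF.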

Answers:

Well, I’ve come up with a solution that involves using qlmanage and wkhtmltopdf. Basically, I run qlmanage to make HTML, and use wkhtmltopdf to convert the HTML to PDF. Unfortunately the HTML pages are produced in more-or-less random order, so I need to look at the property list to figure out which page to put where. Fortunately on my workbooks the pages can be sorted.

#!/usr/bin/env python3
#
# Convert an Excel workbook to a PDF on a Mac,
# using qlmanage (Quick Look) and wkhtmltopdf.
#
from subprocess import call
import os, os.path, sys
import plistlib

if len(sys.argv) != 3:
    print("Usage: %s filename.xls output.pdf" % sys.argv[0])
    sys.exit(1)

# Resolve the output path now: we chdir below, and a relative
# path would otherwise end up inside xdir and be deleted.
outfile = os.path.abspath(sys.argv[2])

if os.path.exists("xdir"):
    raise RuntimeError("xdir must not exist")
os.mkdir("xdir")
call(['qlmanage','-o','xdir','-p',sys.argv[1]])

# Now we need to find the sheets and sort them.
# This is done by reading the property list
qldir = os.path.basename(sys.argv[1]) + ".qlpreview"
with open(os.path.join('xdir',qldir,'PreviewProperties.plist'), 'rb') as propfile:
    plist = plistlib.load(propfile)
attachments = plist['Attachments']
sheets = []
for k in attachments.keys():
    if k.endswith(".html"):
        basename = os.path.basename(k)
        fn = attachments[k]['DumpedAttachmentFileName']
        print("Found %s -> %s" % (basename,fn))
        sheets.append((basename,fn))
sheets.sort()

# Finally use wkhtmltopdf to generate the PDF output
os.chdir(os.path.join('xdir',qldir))
cmd = ['wkhtmltopdf']
for (basename,fn) in sheets:
    cmd.append(fn)
cmd.append(outfile)
call(cmd)
os.chdir("../..")
call(['/bin/rm','-rf','xdir'])
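
The sheet-ordering step in the script above can be tried on its own, without qlmanage. This is a minimal sketch: `sorted_sheets` is a hypothetical helper, and the sample property list is made-up data that only mimics the two keys the script reads (`Attachments` and `DumpedAttachmentFileName`). A real PreviewProperties.plist may well contain more than that.

```python
import os.path
import plistlib

def sorted_sheets(plist_bytes):
    """Return (basename, dumped_file) pairs for the HTML attachments,
    sorted by basename. Hypothetical helper mirroring the script's loop."""
    plist = plistlib.loads(plist_bytes)
    sheets = []
    for key, info in plist["Attachments"].items():
        if key.endswith(".html"):
            sheets.append((os.path.basename(key),
                           info["DumpedAttachmentFileName"]))
    return sorted(sheets)

# Made-up sample data in the layout the script expects; not real qlmanage output.
sample = plistlib.dumps({
    "Attachments": {
        "x/sheet002.html": {"DumpedAttachmentFileName": "b.html"},
        "x/sheet001.html": {"DumpedAttachmentFileName": "a.html"},
        "x/style.css": {"DumpedAttachmentFileName": "c.css"},
    }
})

print(sorted_sheets(sample))
```

Note that this is a plain string sort, so it only gives the right page order when the attachment names happen to sort that way (for example, zero-padded sheet numbers). Names like sheet2 and sheet10 would need a numeric sort key instead.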

Answer:

Without writing any code, I think I might check out the following:

First, install CUPS-PDF, a generic PostScript backend that ties directly into CUPS, the system OS X uses for printing. It lets you print directly to PDF and appears as just another printer in the print menu. Once it is installed, you can use Automator to print any Finder item: from the main Automator workflow palette, choose Utilities -> Print Finder Items, then pick the printer to use, in this case the CUPS-PDF option.

You can save your workflow as a droplet or an application and call it from the command line.

I'm not sure how to specify arguments such as input files, though, so that might take some research. Alternatively, you might have to settle for having Automator pick up all the Finder items in a specific folder, and move the files there with bash beforehand.
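
One possible way to pass input files is the `automator` command-line tool that ships with macOS, whose `-i` option sets the input of a saved workflow. The sketch below just builds that command and runs it on request. `automator_command` and `run_workflow` are hypothetical helpers, `PrintToPDF.workflow` is a made-up name for a workflow saved from Automator, and whether the Print Finder Items action accepts input passed this way is an assumption that would need testing.

```python
import subprocess

def automator_command(workflow, input_path):
    """Build an `automator` command line (hypothetical helper).

    -i sets the workflow's input; the workflow path comes last.
    """
    return ["automator", "-i", input_path, workflow]

def run_workflow(workflow, input_path):
    """Run the saved workflow on one file; this only works on a Mac."""
    return subprocess.call(automator_command(workflow, input_path))

# Show the command that would be run for a made-up workflow and file.
print(" ".join(automator_command("PrintToPDF.workflow", "/tmp/book.xlsx")))
```

If that works, a shell loop or a few lines of Python could call `run_workflow` once per spreadsheet instead of moving files into a watched folder.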