Ubuntu - Generate PDF file from any set of documents

Nowadays, PDF has become the de facto exchange format for most documents. Whenever you need to send a document to someone using a smartphone, a Linux PC, a Windows PC or a Mac, sending it as a PDF file lets you be confident that your recipient will be able to open and read it.

To date, under Linux, I haven't found a simple tool that can generate a PDF file in one click from a selection of miscellaneous documents such as:

  • a series of scanned pages
  • some office documents (doc, docx, odf, xls, …)
  • some text files
  • some photos (.jpg, .png, ...)

This article explains how to set up a desktop environment under Linux that lets you convert any set of documents into a single multi-page PDF document. All documents are merged in alphabetical order of their file names. Conversion is accessible straight from the comfort of your Linux file manager (Nautilus, …).

If you don't need any technical explanation and just want to generate PDF documents straight from the Nautilus file manager, you can jump to section 5, Complete installation procedure, which provides a complete and simple installation script.

1. Main principle

To generate a multi-page PDF file from multiple documents, you need to follow a few steps:

  1. Sort all input documents in alphabetical order
  2. Convert them to temporary PDF files
  3. Assemble all temporary PDF documents into a final multi-page PDF document.
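The ordering in step 1 is a plain alphabetical sort of the file names, so it is worth checking how numbered files will come out before merging. The minimal sketch below, assuming bash and GNU coreutils, previews that order. The function names `merge_order` and `natural_order` are hypothetical, and `sort -V` (version sort) is shown only as an alternative that the main script does not use.

```shell
#!/usr/bin/env bash
# Print file names in plain byte-wise alphabetical order (LC_ALL=C).
# The main script calls `sort` without forcing a locale, so its exact
# order may also depend on your locale settings.
merge_order() {
  printf '%s\n' "$@" | LC_ALL=C sort
}

# Alternative NOT used by the main script: GNU version sort,
# which orders embedded numbers naturally (page2 before page10).
natural_order() {
  printf '%s\n' "$@" | sort -V
}

merge_order page2.jpg page10.jpg page1.jpg
# page1.jpg
# page10.jpg
# page2.jpg
```

In other words, with a plain alphabetical sort, page10.jpg lands before page2.jpg. Zero-padding page numbers (page01.jpg, page02.jpg, …) makes alphabetical order and page order agree in practice.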

Input documents should be converted to temporary PDF files according to their MIME type:

  • Image files (jpg, png, tiff, …)
  • Plain text files (txt)
  • LibreOffice & OpenOffice documents (odt, ods, …)
  • Microsoft Office files (doc, docx, xls, xlsx, ppt, pptx, …)
  • HTML pages (html)

PDF files given as input don't need any conversion and are merged as they are.
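The MIME-type check can be expressed as a single `case` statement. The sketch below is a condensed approximation of the checks done by the main script (which combines `grep` tests with a `case`), assuming bash. The function name `mime_category` is hypothetical; its output uses the same category names as the script: image, libreoffice, ms-office, pdf, text and html.

```shell
#!/usr/bin/env bash
# Map a MIME type (as printed by `mimetype -b FILE`) to a document
# category; print nothing when the type is not supported.
mime_category() {
  case "$1" in
    image/*)                                   echo "image" ;;
    application/vnd.oasis.opendocument.*)      echo "libreoffice" ;;
    application/vnd.openxmlformats-officedocument.*|\
    application/msword|application/vnd.ms-word|\
    application/vnd.ms-excel|application/vnd.ms-powerpoint)
                                               echo "ms-office" ;;
    application/pdf|application/x-pdf|application/x-bzpdf|application/x-gzpdf)
                                               echo "pdf" ;;
    text/plain|application/x-shellscript)      echo "text" ;;
    text/html)                                 echo "html" ;;
  esac
}
```

For example, `mime_category "$(mimetype -b report.docx)"` should print ms-office, provided `mimetype` reports the standard OOXML type for that file.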

Depending on the input document type, conversion to PDF is done with a specific tool:

  • convert (from ImageMagick) for image files
  • unoconv for LibreOffice, Microsoft Office and plain text files
  • wkhtmltopdf for HTML files
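To see exactly which command handles a given category, the hypothetical helper below prints, without running it, the conversion command line built with the same options as the main script: ImageMagick JPEG compression and A4 resizing for images, `unoconv -f pdf` for office and text files, and `wkhtmltopdf`, which the main script uses for HTML files. It is a dry-run sketch assuming bash; the default density (200) and quality (95) mirror the configuration file defaults.

```shell
#!/usr/bin/env bash
# Print (do not run) the command converting one file to PDF.
# Usage: pdf_command CATEGORY INPUT OUTPUT [DENSITY] [QUALITY]
pdf_command() {
  local category="$1" input="$2" output="$3"
  local density="${4:-200}" quality="${5:-95}"
  # A4 page size in pixels: 8.27 x 11.70 inches times the density
  local width=$(( density * 827 / 100 ))
  local height=$(( density * 1170 / 100 ))
  local -a cmd
  case "$category" in
    image)
      cmd=(convert "$input" -compress jpeg -quality "$quality"
           -resize "${width}x${height}" -extent "${width}x${height}"
           -units PixelsPerInch -density "${density}x${density}" "$output") ;;
    libreoffice|ms-office|text)
      cmd=(unoconv -f pdf -o "$output" "$input") ;;
    html)
      cmd=(wkhtmltopdf "$input" "$output") ;;
    *)
      # pdf needs no conversion; anything else is unsupported
      return 1 ;;
  esac
  # %q quotes each word so the line can be pasted back into a shell
  printf '%q ' "${cmd[@]}"
  printf '\n'
}
```

Printing the command first is handy to troubleshoot a single file: copy the printed line into a terminal to run that conversion by hand.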

Once all input documents have been converted to temporary PDF files, the final document is assembled using Ghostscript.
This powerful tool can:

  • merge multiple PDF documents
  • optimize final file size
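The merge itself is a single Ghostscript call. The sketch below wraps the exact options used by the main script (A4 paper, pages fitted with `-dPDFFitPage`, PDF 1.4 output through the `pdfwrite` device) in a hypothetical `merge_pdfs` function, assuming bash and `gs` in your PATH.

```shell
#!/usr/bin/env bash
# Merge several PDF files into one with Ghostscript.
# Usage: merge_pdfs OUTPUT.pdf INPUT1.pdf [INPUT2.pdf ...]
merge_pdfs() {
  # Need an output file and at least one input file
  if [ "$#" -lt 2 ]; then
    echo "usage: merge_pdfs OUTPUT.pdf INPUT.pdf..." >&2
    return 2
  fi
  local output="$1"
  shift
  # Same options as the main script's final assembly step
  gs -q -dNOPAUSE -dBATCH -dSAFER -sPAPERSIZE=a4 -dPDFFitPage \
     -dCompatibilityLevel=1.4 -sDEVICE=pdfwrite \
     -sOutputFile="$output" "$@"
}
```

For example, `merge_pdfs book.pdf chapter1.pdf chapter2.pdf`. Input files are appended in the order given, which is why the main script sorts them first.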

2. Install tools

All tools involved in the PDF generation process can be installed under Ubuntu with this command:

Terminal
# sudo apt-get install imagemagick unoconv wkhtmltopdf ghostscript zenity libfile-mimeinfo-perl
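Package names don't always match command names (`convert` comes from imagemagick, `gs` from ghostscript, `mimetype` from libfile-mimeinfo-perl), so a quick check after installation can save some head-scratching. Note that the main script also checks for `wkhtmltopdf` (used for HTML files) and refuses to run without it. This is a minimal sketch assuming bash; `missing_tools` is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Print each given command that is not available in $PATH,
# one per line; print nothing if all of them are present.
missing_tools() {
  local tool
  for tool in "$@"; do
    command -v "$tool" >/dev/null 2>&1 || echo "$tool"
  done
}

# Commands used by the PDF generation process
missing_tools mimetype convert unoconv wkhtmltopdf gs zenity
```

An empty output means every listed command is available.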

If you are using the Nautilus file manager, one extra step is needed to get complete desktop integration, with a right-click menu entry on compatible files.

Nautilus implements the DES-EMA specification through an extra nautilus-actions package, which needs to be installed. As Nautilus does not display menu icons by default, you also need to enable this feature.

Terminal
# sudo apt-get install nautilus-actions
# gsettings set org.gnome.desktop.interface menus-have-icons true

3. Create main Script

We are now ready to install the main script, which handles the complete conversion and merge job.

The main script performs these actions:

  1. Retrieve the files given as parameters (files inside a directory given as parameter are included too)
  2. If no parameter is given, display a file selection dialog box
  3. Read conversion parameters from the configuration file
  4. Check each file's MIME type to determine whether it is a candidate for conversion
  5. Sort all candidate files in alphabetical order
  6. Convert them to PDF
  7. Merge all PDF files into a final PDF document

The final PDF document is named after the first input document (in alphabetical order), with its extension replaced by a -merged.pdf suffix.
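The same naming rule can be written with bash parameter expansion instead of `sed`. This is a sketch assuming bash and coreutils; the helper name `merged_name` is hypothetical. Unlike the `sed` expression in the main script, it only strips an extension found in the file name itself, never a dot that belongs to a directory name.

```shell
#!/usr/bin/env bash
# Build the final PDF name from the first input file:
# <dir>/<name without last extension>-merged.pdf
merged_name() {
  local file="$1" dir base
  dir=$(dirname -- "$file")
  base=$(basename -- "$file")
  # Strip the last extension only if the file name has one
  # (?*.* also keeps hidden files such as .notes intact)
  if [[ "$base" == ?*.* ]]; then
    base="${base%.*}"
  fi
  printf '%s/%s-merged.pdf\n' "$dir" "$base"
}

merged_name /tmp/scans/page-01.jpg   # → /tmp/scans/page-01-merged.pdf
```

For a bare file name, the result is prefixed with ./, which designates the same location.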

Default parameters are defined in the ~/.config/pdf-generate.conf configuration file.

~/.config/pdf-generate.conf
[general]
compression=95
density=200

Parameters are self-explanatory:

  • compression: image compression quality (JPEG quality of converted images)
  • density: document density in DPI

The default density is 200 DPI.

So if you want to modify any default parameter, just edit this file before running the tool.
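The density value also sets the page size used for images: the main script treats A4 as 8.27 x 11.70 inches and multiplies both dimensions by the density with integer arithmetic, so the default 200 DPI gives 1654 x 2340 pixels (300 DPI would give 2481 x 3510). The sketch below reproduces that calculation and reads a value from the configuration file, assuming bash. `conf_value` and `a4_pixels` are hypothetical helper names, and anchoring the key at the start of the line (`^key=`) is a small hardening compared with the script's plain `grep`.

```shell
#!/usr/bin/env bash
# Read the first "key=value" entry for KEY from an INI-style file
# such as ~/.config/pdf-generate.conf (section headers are ignored).
conf_value() {
  local file="$1" key="$2"
  grep -m 1 "^${key}=" "$file" | cut -d'=' -f2
}

# A4 page size in pixels for a given density (8.27 x 11.70 inches),
# using the same integer arithmetic as the main script.
a4_pixels() {
  local density="$1"
  echo "$(( density * 827 / 100 ))x$(( density * 1170 / 100 ))"
}

a4_pixels 200   # → 1654x2340
```

With the default configuration, `a4_pixels "$(conf_value ~/.config/pdf-generate.conf density)"` prints the page size used for image conversion.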

The main script in charge of the PDF generation job should be placed under /usr/local/bin/pdf-generate.

/usr/local/bin/pdf-generate
#!/bin/bash
# --------------------------------------------
#  Generate a PDF document from a given list of documents
#  Documents are added in final document following
#  alphabetical order
#
#  Setup procedure : http://bernaerts.dyndns.org/linux/74-ubuntu/338-ubuntu-generate-pdf-from-documents
#
#  Depends on :
#   * mimetype
#   * convert [imagemagick]
#   * unoconv
#   * wkhtmltopdf
#   * gs [ghostscript]
#
#  Revision history :
#   05/07/2015, V1.0 - Creation by N. Bernaerts
#   15/08/2015, V1.1 - Force jpeg quality to 95
#   25/09/2015, V1.2 - Add configuration file
#   02/10/2015, V2.0 - Code rewrite to handle progress and notification
#   01/05/2016, V2.1 - Add HTML conversion
#   12/10/2017, V2.2 - Correct PNG conversion bug giving white page
# ---------------------------------------------------

# -------------------------------------------------------
# check tools availability
# -------------------------------------------------------

command -v mimetype >/dev/null 2>&1 || { zenity --error --text="Please install mimetype"; exit 1; }
command -v convert >/dev/null 2>&1 || { zenity --error --text="Please install convert [imagemagick]"; exit 1; }
command -v unoconv >/dev/null 2>&1 || { zenity --error --text="Please install unoconv"; exit 1; }
command -v wkhtmltopdf >/dev/null 2>&1 || { zenity --error --text="Please install wkhtmltopdf"; exit 1; }
command -v gs >/dev/null 2>&1 || { zenity --error --text="Please install gs utility [ghostscript]"; exit 1; }

# ---------------------------------------------------------
#   Read and calculate parameters from configuration file
# ---------------------------------------------------------

# Configuration file : ~/.config/pdf-generate.conf
FILE_CONF="$HOME/.config/pdf-generate.conf"

# check configuration file
[ -f "$FILE_CONF" ] || { zenity --error --text="Please create and configure ${FILE_CONF}"; exit 1; }

# Load configuration file
COMPRESSION=$(cat "${FILE_CONF}" | grep "compression" | cut -d'=' -f2)
DENSITY=$(cat "${FILE_CONF}" | grep "density" | cut -d'=' -f2)

# calculate page size
PAGE_WIDTH=$((${DENSITY} * 827 / 100))
PAGE_HEIGHT=$((${DENSITY} * 1170 / 100))

# -------------------------------------------------------
#          Retrieve or select input files
# -------------------------------------------------------

# set field separator to newline only, so that file names containing spaces stay whole
IFS=$'\n'

# loop thru arguments to load candidate files
for ARGUMENT
do
  [ -f "${ARGUMENT}" ] && ARR_FILE=("${ARR_FILE[@]}" "${ARGUMENT}")
  [ -d "${ARGUMENT}" ] && ARR_FILE=("${ARR_FILE[@]}" $(find "${ARGUMENT}" -maxdepth 1 -type f) )
done

# if there is no candidate files, open selection dialog
if [ ${#ARR_FILE[@]} -eq 0 ]
then
  # open multiple files selection dialog box
  LST_FILE=$(zenity --file-selection --multiple --title="Select files to merge as PDF")

  # generate files array (zenity separates selected files with |)
  ARR_FILE=($(echo "${LST_FILE}" | tr "|" "\n"))
fi 

# -------------------------------------------------------
# loop thru selected files to check convertibility
# -------------------------------------------------------

for FILE in "${ARR_FILE[@]}"
do
  # document type undefined
  DOCTYPE=""

  # get the file mime type (application/msword, ...)
  MIMETYPE=$(mimetype -b "${FILE}")

  # check if file is a image file (.jpg, .png, .tiff, ...)
  CHECKTYPE=$(echo "${MIMETYPE}" | grep "image/")
  [ "${CHECKTYPE}" != "" ] && DOCTYPE="image"

  # check if file is a libreoffice file (.odt, .ods, ...)
  CHECKTYPE=$(echo "${MIMETYPE}" | grep ".opendocument.")
  [ "${CHECKTYPE}" != "" ] && DOCTYPE="libreoffice"

  # check if file is a microsoft file 2007+ file (.docx, .xlsx, .pptx, ...)
  CHECKTYPE=$(echo "${MIMETYPE}" | grep "vnd.openxmlformats-officedocument.")
  [ "${CHECKTYPE}" != "" ] && DOCTYPE="ms-office"

  # check some specific document types
  case $MIMETYPE in 
    # ms-office document (.doc, .xls, .ppt, ...)
    "application/msword" | "application/vnd.ms-word" | "application/vnd.oasis.opendocument.text" | \
    "application/vnd.ms-excel" | "application/vnd.ms-powerpoint" )
      DOCTYPE="ms-office"
      ;;

    # PDF document (.pdf)
    "application/pdf" | "application/x-pdf" | "application/x-bzpdf" | "application/x-gzpdf" )
      DOCTYPE="pdf"
      ;;

    # plain text file (.txt)
    "text/plain" | "application/x-shellscript" )
      DOCTYPE="text"
      ;;

    # HTML file (.html, .htm)
    "text/html" )
      DOCTYPE="html"
      ;;

    * )
      ;;
  esac

  # if document type is compatible, add current file as candidate
  [ "${DOCTYPE}" != "" ] && ARR_CANDIDATE=("${ARR_CANDIDATE[@]}" "${FILE}|${DOCTYPE}")

done

# -------------------------------------------------------
#       Confirmation dialog box
# -------------------------------------------------------

# calculate number of files to convert
NBR_FILE=${#ARR_FILE[@]}
NBR_CANDIDATE=${#ARR_CANDIDATE[@]}

# if some candidate file exist, order them and display confirmation dialog
if [ ${NBR_CANDIDATE} -gt 0 ]
then
  # order generated PDF files in alphabetical order
  ARR_CANDIDATE=($(sort <<<"${ARR_CANDIDATE[*]}"))

  # generate final file name
  FILE=$(echo "${ARR_CANDIDATE[0]}" | cut -d'|' -f1)
  FILE_FINAL="$(echo "${FILE}" | sed 's/^\(.*\)\..*$/\1/')-merged.pdf"

  # display confirmation dialog box
  RESULT=$(zenity --question --title="Merge to PDF" --text="${NBR_CANDIDATE} file(s) out of ${NBR_FILE} will be merged to a single PDF file (${DENSITY} DPI)\n\nDo you want to generate ${FILE_FINAL} ?" )
  ACTION=$?

# else display error dialog
else
  # display error dialog box
  RESULT=$(zenity --error --title="Merge to PDF" --text="There is no file compatible with PDF format." )
  ACTION=""

fi

# if action canceled or error, exit
[ "${ACTION}" != "0" ] && exit 1

(

# -------------------------------------------------------
# create a private temporary directory for converted files
# (an existing PDF next to an input file is never overwritten)
# -------------------------------------------------------

TMP_DIR=$(mktemp -d)
INDEX=0

# -------------------------------------------------------
# loop thru candidate files to convert them to PDF
# -------------------------------------------------------

for CANDIDATE in "${ARR_CANDIDATE[@]}"
do
  # retrieve document type and filename
  FILE=$(echo "${CANDIDATE}" | cut -d'|' -f1)
  DOCTYPE=$(echo "${CANDIDATE}" | cut -d'|' -f2)

  # progress display
  echo "# Conversion of ${FILE}"

  # generate a unique temporary PDF file name (numbered to keep merge order)
  INDEX=$((INDEX + 1))
  FILE_PDF="${TMP_DIR}/$(printf '%04d' "${INDEX}").pdf"

  # convert file according to its type
  # (a converted file is only merged if the conversion actually produced it)
  case $DOCTYPE in
    # PDF files are merged as they are
    "pdf" )
      ARR_PDF=("${ARR_PDF[@]}" "${FILE}")
      ;;

    # image files
    "image" )
      convert "${FILE}" -compress jpeg -quality ${COMPRESSION} -resize ${PAGE_WIDTH}x${PAGE_HEIGHT} -extent ${PAGE_WIDTH}x${PAGE_HEIGHT} -units PixelsPerInch -density ${DENSITY}x${DENSITY} "${FILE_PDF}"
      [ -f "${FILE_PDF}" ] && ARR_PDF=("${ARR_PDF[@]}" "${FILE_PDF}")
      ;;

    # office and plain text files
    "libreoffice" | "ms-office" | "text" )
      unoconv -f pdf -o "${FILE_PDF}" "${FILE}"
      [ -f "${FILE_PDF}" ] && ARR_PDF=("${ARR_PDF[@]}" "${FILE_PDF}")
      ;;

    # html files
    "html" )
      wkhtmltopdf "${FILE}" "${FILE_PDF}"
      [ -f "${FILE_PDF}" ] && ARR_PDF=("${ARR_PDF[@]}" "${FILE_PDF}")
      ;;

    # other formats, not handled
    * )
      ;;
  esac

done

# -------------------------------------------------------
#   Final merge
# -------------------------------------------------------

if [ ${#ARR_PDF[@]} -gt 0 ]
then
  # progress display
  echo "# Final assembly of ${FILE_FINAL}"

  # generate resulting PDF
  gs -q -dNOPAUSE -dBATCH -dSAFER -sPAPERSIZE=a4 -dPDFFitPage -dCompatibilityLevel=1.4 -sDEVICE=pdfwrite -sOutputFile="${FILE_FINAL}" "${ARR_PDF[@]}"
fi

# -------------------------------------------------------
#   Temporary files clean-up
# -------------------------------------------------------

# remove temporary directory and all converted files
rm -rf "${TMP_DIR}"

) | zenity --width=500 --height=25 --progress --pulsate --auto-close --title "Merge to PDF"

# -------------------------------------------------------
#   End of job notification
# -------------------------------------------------------

[ ${#ARR_CANDIDATE[@]} -gt 0 ] && zenity --notification --window-icon="evince" --text="${FILE_FINAL} generated." \
                               || zenity --notification --window-icon="error" --text="Document type was not compatible for PDF generation."

You can install the main script and its configuration file from the command line:

Terminal
# mkdir --parents $HOME/.config
# wget --header='Accept-Encoding:none' -O $HOME/.config/pdf-generate.conf https://raw.githubusercontent.com/NicolasBernaerts/ubuntu-scripts/master/pdf/pdf-generate.conf
# sudo wget --header='Accept-Encoding:none' -O /usr/local/bin/pdf-generate https://raw.githubusercontent.com/NicolasBernaerts/ubuntu-scripts/master/pdf/pdf-generate
# sudo chmod +x /usr/local/bin/pdf-generate

4. Desktop Integration

It is now time to fully integrate this PDF generation tool into your desktop environment.

4.1. Menu declaration

To make the tool available from your desktop applications menu, you just need to declare it in /usr/share/applications/pdf-generate.desktop.

/usr/share/applications/pdf-generate.desktop
[Desktop Entry]
Type=Application
Exec=pdf-generate %F
Hidden=false
NoDisplay=false
Icon=evince
Keywords=pdf;generate;image;office;document;merge
X-GNOME-Autostart-enabled=true
Name[en_US]=Generate PDF document
Name[en]=Generate PDF document
Name[C]=Generate PDF document
Name[fr_FR]=Génération d'un PDF
Comment=Tool to merge a set of documents to a PDF file.
Comment[en_US]=Tool to merge a set of documents to a PDF file.
Comment[fr_FR]=Outil de concaténation de documents en un fichier PDF.
MimeType=image/bmp;image/gif;image/jpeg;image/jpg;image/png;image/tiff;application/pdf;application/x-pdf;application/x-bzpdf;application/x-gzpdf;application/msword;application/vnd.ms-word;application/vnd.oasis.opendocument.text;application/vnd.openxmlformats-officedocument.wordprocessingml.document;application/vnd.oasis.opendocument.spreadsheet;application/vnd.ms-excel;application/vnd.openxmlformats-officedocument.spreadsheetml.sheet;application/vnd.openxmlformats-officedocument.spreadsheetml.template;application/vnd.ms-powerpoint;application/vnd.openxmlformats-officedocument.presentationml.presentation;application/vnd.openxmlformats-officedocument.presentationml.template;application/vnd.openxmlformats-officedocument.presentationml.slideshow;text/plain;
Categories=GNOME;GTK;Graphics;Conversion;Utility;

This file can be downloaded and installed from my GitHub repository.

Terminal
# sudo wget --header='Accept-Encoding:none' -O /usr/share/applications/pdf-generate.desktop https://raw.githubusercontent.com/NicolasBernaerts/ubuntu-scripts/master/pdf/pdf-generate.desktop

After a reboot or a new login session, you should see a new Generate PDF document menu entry.

4.2. Custom Action

To get full desktop integration, this PDF generation tool should be available as a custom action in your file manager's context menu.

This context menu entry should be displayed for any selection of one or more files with a compatible MIME type.

With the latest Extension for Menus and Actions of the freedesktop.org Desktop Entry Specification (DES-EMA), this integration has become quite easy.

You just need to declare the new custom action in a .desktop file placed under $HOME/.local/share/file-manager/actions.
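If you prefer to script this step, the directory can be computed as below. This sketch assumes bash and that the file manager honours the XDG Base Directory convention, i.e. uses `$XDG_DATA_HOME` when it is set and `$HOME/.local/share` otherwise; `actions_dir` is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Print the per-user directory holding DES-EMA action files.
# Assumption: $XDG_DATA_HOME is honoured, with ~/.local/share as fallback.
actions_dir() {
  printf '%s/file-manager/actions\n' "${XDG_DATA_HOME:-$HOME/.local/share}"
}

# Example (hypothetical local copy of the action file):
# mkdir -p "$(actions_dir)"
# cp pdf-generate-action.desktop "$(actions_dir)/"
```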

$HOME/.local/share/file-manager/actions/pdf-generate-action.desktop
[Desktop Entry]
Type=Action
Icon=evince
Name[C]=Generate PDF document
Name[en]=Generate PDF document
Name[en_US]=Generate PDF document
Name[fr_FR]=Génération d'un PDF
Tooltip[C]=Tool to merge a set of documents to a PDF file
Tooltip[en]=Tool to merge a set of documents to a PDF file
Tooltip[en_US]=Tool to merge a set of documents to a PDF file
Tooltip[fr_FR]=Outil de concaténation de documents en un fichier PDF
Profiles=pdf_generate;

[X-Action-Profile pdf_generate]
Exec=pdf-generate %F
MimeTypes=image/bmp;image/gif;image/jpeg;image/jpg;image/png;image/tiff;application/pdf;application/x-pdf;application/x-bzpdf;application/x-gzpdf;application/msword;application/vnd.ms-word;application/vnd.oasis.opendocument.text;application/vnd.openxmlformats-officedocument.wordprocessingml.document;application/vnd.ms-excel;application/vnd.openxmlformats-officedocument.spreadsheetml.sheet;application/vnd.openxmlformats-officedocument.spreadsheetml.template;application/vnd.ms-powerpoint;application/vnd.openxmlformats-officedocument.presentationml.presentation;application/vnd.openxmlformats-officedocument.presentationml.template;application/vnd.openxmlformats-officedocument.presentationml.slideshow;text/plain;inode/directory;
Name[C]=Default profile
Name[en]=Default profile
Name[en_US]=Default profile
Name[fr_FR]=Profil par défaut

This file can be installed with these commands:

Terminal
# mkdir --parents $HOME/.local/share/file-manager/actions
# wget --header='Accept-Encoding:none' -O $HOME/.local/share/file-manager/actions/pdf-generate-action.desktop https://raw.githubusercontent.com/NicolasBernaerts/ubuntu-scripts/master/pdf/pdf-generate-action.desktop

After your next login, right-clicking a selection of compatible files should display the new Generate PDF document menu entry.

Before the documents are processed, a confirmation dialog box is displayed.

5. Complete installation procedure

If you want to install all the needed tools and scripts in one go, you can run an all-in-one installation script available from my GitHub repository.

This script has been written and tested on Ubuntu 14.04 LTS. It handles all the installation and configuration steps described earlier in this article.

Terminal
# wget --header='Accept-Encoding:none' https://raw.githubusercontent.com/NicolasBernaerts/ubuntu-scripts/master/pdf/pdf-generate-install.sh
# chmod +x pdf-generate-install.sh
# ./pdf-generate-install.sh

After your next login, you should be able to generate a PDF document from multiple files with a simple right click.

If you find a bug or have improvement ideas that could benefit everybody, don't hesitate to contact me by email or to fork the project on GitHub.

 

Hope it helps.

This article is published "as is", without any warranty that it will work for your specific need.
If you think this article needs some complement, or simply if you think it saved you lots of time & trouble,
just let me know by email. Cheers!
