Automating PDF processing

To automate PDF processing, first install the required packages from your distribution's repositories:

apt-get install git qpdf exiftool pdftk poppler-utils tesseract-ocr imagemagick-6.q16
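If you want to sanity-check the install, a small script like the following reports anything missing from your PATH. Just a sketch: the binary names are my assumptions about what those packages ship (e.g. pdftoppm comes from poppler-utils, convert from ImageMagick).

```shell
#!/bin/sh
# Sketch: report which of the required command-line tools are not on PATH.
# Tool names are assumptions based on the apt-get line above.
check_tools() {
    missing=""
    for tool in "$@"; do
        # command -v exits non-zero when the tool is not found
        command -v "$tool" >/dev/null 2>&1 || missing="$missing $tool"
    done
    if [ -n "$missing" ]; then
        echo "missing:$missing"
    else
        echo "ok"
    fi
}

check_tools git qpdf exiftool pdftk pdftoppm tesseract convert
```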

Then clone PDF Ingest into a folder of your choice:

git clone https://github.com/tezcatlipoca/pdf_ingest

Enter the cloned folder and create the working directories:

cd pdf_ingest
mkdir DST
mkdir SRC

Put the PDFs you wish to convert into SRC and run:

./bulk

Wait until it finishes and collect the results from DST. 🙂
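I have not dug into what ./bulk actually does internally, but a minimal sketch of this kind of pipeline, using only the tools installed above (rasterize each PDF with pdftoppm, then OCR each page with tesseract), might look like this. Filenames and resolution here are my own assumptions, not the repo's behavior:

```shell
#!/bin/sh
# Sketch only: rasterize every PDF in SRC to 300 dpi PNGs, then OCR each
# page with tesseract. Illustrates the general technique, not the actual
# pdf_ingest bulk script.
for pdf in SRC/*.pdf; do
    [ -e "$pdf" ] || continue                 # glob matched nothing
    base=$(basename "$pdf" .pdf)
    pdftoppm -r 300 -png "$pdf" "DST/$base"   # DST/base-1.png, DST/base-2.png, ...
    for page in "DST/$base"-*.png; do
        tesseract "$page" "${page%.png}"      # writes DST/base-N.txt
    done
done
```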


Doing a large MySQL dump

After discovering that phpMyAdmin is not suitable for dumping or importing large databases, I did a quick search on how it's done from the command line.

This worked well and my database finally came through uncorrupted. Host 1 is a cheap shared hosting provider with a few limitations regarding internal data transfer, CPU, and memory. Host 2 is a VPS with 4 cores, 4 GB of memory, and overall decent transfer speeds.

Keep the passwords ready for both hosts.

HOST 1

mysqldump -u [USERNAME] -p [DBNAME] | gzip > [/path_to_file/DBNAME].sql.gz
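For really large dumps it can be worth adding two standard mysqldump flags (my suggestion, not part of the original recipe): --single-transaction takes a consistent snapshot of InnoDB tables without locking them, and --quick streams rows to the output instead of buffering whole tables in memory, which matters on a constrained shared host.

```shell
mysqldump -u [USERNAME] -p --single-transaction --quick [DBNAME] | gzip > [/path_to_file/DBNAME].sql.gz
```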

Copy the file over to Host 2

HOST 2

gzip -d [/path_to_file/DBNAME].sql.gz
[/path_to_mysql/]mysql -u [USERNAME] -p

Be very aware of what you are doing in the next steps, since they involve a DROP. And when you drop the wrong database, all that's left is the cold sweat on your forehead.

SHOW DATABASES;
DROP DATABASE [DBNAME];
CREATE DATABASE [DBNAME];
USE [DBNAME];
SOURCE [/path_to_file/DBNAME].sql;

Done.
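As an aside, the decompress and import steps can also be collapsed into one, which avoids keeping the uncompressed .sql file around (same placeholders as above; the target database must already exist):

```shell
gunzip -c [/path_to_file/DBNAME].sql.gz | mysql -u [USERNAME] -p [DBNAME]
```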

Saucysauce and more about conditional dumping: https://www.lullabot.com/articles/importexport-large-mysql-databases
