====== FuzzyOCR for SpamAssassin on Debian ======
Install the ocrad OCR engine first:
aptitude install ocrad
Two years later, a new release supporting SpamAssassin 3.2 has still not been tagged, so it is probably easiest to use the package from Debian unstable. Because FuzzyOCR is written in Perl, the package does not seem to have any unreasonable version dependencies.
You will want to check for the latest version here:
wget -c http://ftp.us.debian.org/debian/pool/main/f/fuzzyocr/fuzzyocr_3.5.1+svn135-1_all.deb
dpkg -i fuzzyocr_3.5.1+svn135-1_all.deb
apt-get -f install
This version has a bug discussed here:
Apply the modifications described there once the package is installed:
vi /usr/share/perl5/FuzzyOcr/Preprocessor.pm
====== Create a FuzzyOCR home ======
I wanted to keep the FuzzyOCR log files and image hash databases in one place, so I created a directory for them:
mkdir /var/lib/spamassassin/fuzzyocr
touch /var/lib/spamassassin/fuzzyocr/FuzzyOcr.log
chown -R spamd: /var/lib/spamassassin/fuzzyocr
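Before going further, it can be worth confirming that the directory and log file exist and are writable. The following is a minimal sketch, not part of FuzzyOCR: the function name `check_focr_home` is made up for illustration, and it only tests write access for whichever user runs it, so it is most useful when run as the spamd user (assuming that is the user the plugin runs as, as the chown above suggests).

```shell
#!/bin/sh
# Hypothetical helper: check that a FuzzyOCR home directory and its log file
# exist and are writable by the user running this function.
check_focr_home() {
    dir=$1
    log="$dir/FuzzyOcr.log"
    [ -d "$dir" ] || { echo "missing directory: $dir"; return 1; }
    [ -f "$log" ] || { echo "missing log file: $log"; return 1; }
    # Both the directory (for the hash databases) and the log must be writable.
    [ -w "$dir" ] && [ -w "$log" ] || { echo "not writable: $dir"; return 1; }
    echo "ok: $dir"
}
```

Calling `check_focr_home /var/lib/spamassassin/fuzzyocr` prints an `ok:` line when everything is in place, and otherwise names the first thing that is missing. Note that root can write almost anywhere, so running it as root says little about spamd's access.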
Then make a few changes to the FuzzyOCR configuration:
@@ -34,7 +34,7 @@
# Level 2 - Errors, Warnings and Info Messages
# Level 3 - Full debug output
# Default value: 1
-#focr_verbose 3
+focr_verbose 2
# Log Message-Id, From, To
# Default: 1
@@ -42,6 +42,6 @@
# Send logging output to stderr.
# Default value: 1
-#focr_log_stderr 0
+focr_log_stderr 1
# Logfile (make sure it is writable by the plugin)
# Default value: none
@@ -179,7 +179,7 @@
# Timeout for the plugin, in seconds. (Maximum runtime of the plugin)
# Default value: 10
-#focr_timeout 15
+focr_timeout 15
# Use a global timeout value instead of per helper application.
# Default value: 0
@@ -299,7 +299,7 @@
# skip the scans when the image is found in the database, using the score
# from the previous scans.
#--
-#focr_enable_image_hashing 3
+focr_enable_image_hashing 2
# Set this to skip updating the hashing database at startup
# Default value: 0 (update at startup)
@@ -323,16 +323,16 @@
# If the image hash db feature is enabled (Type 2 Hashing),
# specify the file to use as the SPAM database
# Default value: /etc/spamassassin/FuzzyOcr.db
-#focr_db_hash /etc/spamassassin/FuzzyOcr.db
+focr_db_hash /var/lib/spamassassin/fuzzyocr/FuzzyOcr.db
# If the image hash db feature is enabled (Type 2 Hashing),
# specify the file to use as the HAM database
# Default value: /etc/spamassassin/FuzzyOcr.safe.db
-#focr_db_safe /etc/spamassassin/FuzzyOcr.safe.db
+focr_db_safe /var/lib/spamassassin/fuzzyocr/FuzzyOcr.safe.db
# Auto-prune: Expire records from hasing databases after these many days
# Default value: 35
-#focr_db_max_days 15
+focr_db_max_days 15
###
### MySQL options (Type 3 Hashing)
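To confirm which options ended up active after editing, it can help to list only the uncommented `focr_` lines. This is a small sketch under assumptions: the function name is hypothetical, and the path in the usage comment is a guess based on the default paths shown in the comments of the diff above, so pass whatever path your system actually uses.

```shell
#!/bin/sh
# Hypothetical helper: print only the active (uncommented) focr_ settings
# from the configuration file given as the first argument.
active_focr_options() {
    grep -E '^[[:space:]]*focr_' "$1"
}

# Usage (the path is an assumption; adjust it to your system):
#   active_focr_options /etc/spamassassin/FuzzyOcr.cf
```

Lines that are still commented out with `#` are skipped, so anything missing from the output is still at its default value.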
Restart SpamAssassin and watch the log to test:
/etc/init.d/spamassassin restart
tail -f /var/lib/spamassassin/fuzzyocr/FuzzyOcr.log
====== Maintenance ======
Create a logrotate file /etc/logrotate.d/fuzzyocr:
/var/lib/spamassassin/fuzzyocr/FuzzyOcr.log {
    daily
    missingok
    rotate 10
    compress
    delaycompress
    notifempty
    create 640 spamd spamd
}
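The new logrotate file can be checked without waiting for the nightly run: logrotate's `-d` option turns on debug mode, which reports what would be done without rotating anything. The wrapper below is only a sketch; the function name is hypothetical and the default path simply matches the file created above.

```shell
#!/bin/sh
# Hypothetical helper: dry-run a logrotate config file.
# "logrotate -d" prints what would happen and makes no changes to the logs.
dry_run_logrotate() {
    conf=${1:-/etc/logrotate.d/fuzzyocr}
    command -v logrotate >/dev/null 2>&1 || {
        echo "logrotate not found in PATH" >&2
        return 127
    }
    [ -r "$conf" ] || { echo "cannot read $conf" >&2; return 1; }
    logrotate -d "$conf"
}
```

Run `dry_run_logrotate` as root, since logrotate is often installed outside a regular user's PATH.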
Schedule a daily cleanup in cron to remove temporary images:
crontab -e -u spamd
@daily perl /usr/share/doc/fuzzyocr/Utils/fuzzy-clean