
FuzzyOcr for SpamAssassin on Debian

Image spam has become more common lately, and the best way to get SpamAssassin to recognize keywords in attached images is OCR (Optical Character Recognition), which the FuzzyOcr plugin provides.

Installation

OCRAD is the easiest OCR engine to use on Debian 4.0 because the packaged version is reasonably current.

aptitude install ocrad

Two years on, a new FuzzyOcr release supporting SpamAssassin 3.2 has still not been tagged, so it is probably easiest to use the package from Debian unstable. Because FuzzyOcr is written in Perl, the package does not seem to have any unreasonable version dependencies.

Check for the latest version here, and adjust the file name in the commands that follow if it has changed: http://packages.debian.org/fuzzyocr

wget -c http://ftp.us.debian.org/debian/pool/main/f/fuzzyocr/fuzzyocr_3.5.1+svn135-1_all.deb
dpkg -i fuzzyocr_3.5.1+svn135-1_all.deb
apt-get -f install

This version has a bug, discussed in Debian bug report #522285:

http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=522285

Apply the modifications described in that report after installation:

vi /usr/share/perl5/FuzzyOcr/Preprocessor.pm
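
Since this edits a file owned by the package, it may be worth keeping a pristine copy first so the change can be reviewed with diff afterwards. The following is only a sketch; the helper name backup_once is hypothetical and not part of FuzzyOcr or Debian.

```shell
# Hypothetical helper: copy a file to <file>.orig once, preserving mode and
# timestamps, so later edits can be compared against the packaged version.
backup_once() {
    src=$1
    if [ ! -f "$src" ]; then
        echo "no such file: $src" >&2
        return 1
    fi
    # Never overwrite an existing backup with an already-modified file.
    if [ ! -e "$src.orig" ]; then
        cp -p "$src" "$src.orig"
    fi
}

# Example, using the path from the text above:
#   backup_once /usr/share/perl5/FuzzyOcr/Preprocessor.pm
#   vi /usr/share/perl5/FuzzyOcr/Preprocessor.pm
#   diff -u /usr/share/perl5/FuzzyOcr/Preprocessor.pm.orig \
#           /usr/share/perl5/FuzzyOcr/Preprocessor.pm
```

Calling the helper a second time leaves the first backup untouched, so the .orig file keeps reflecting what the package shipped.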

Create a FuzzyOcr home

I wanted to keep the FuzzyOcr log files and image hash databases in one place, so I created a directory for them.

mkdir /var/lib/spamassassin/fuzzyocr
touch /var/lib/spamassassin/fuzzyocr/FuzzyOcr.log
chown -R spamd: /var/lib/spamassassin/fuzzyocr
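
The plugin can only log and update its databases if the daemon user can write to this directory. A minimal sketch of a check follows; check_focr_home is a hypothetical helper, and it tests writability for whichever user runs it, so it is meant to be run from a shell running as spamd.

```shell
# Hypothetical helper: report whether a FuzzyOcr home directory and the log
# file inside it are writable by the user running this function.
check_focr_home() {
    dir=$1
    rc=0
    for target in "$dir" "$dir/FuzzyOcr.log"; do
        if [ -w "$target" ]; then
            echo "ok: $target"
        else
            # Also reached when the path does not exist at all.
            echo "not writable: $target"
            rc=1
        fi
    done
    return $rc
}

# Example, using the directory created above:
#   check_focr_home /var/lib/spamassassin/fuzzyocr
```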

Then make a few changes to the FuzzyOcr configuration file, shown here as a unified diff:

@@ -34,7 +34,7 @@                                                     
 # Level 2 - Errors, Warnings and Info Messages                       
 # Level 3 - Full debug output                                        
 # Default value: 1                                                   
-#focr_verbose 3                                                      
+focr_verbose 2                                                       
                                                                      
 # Log Message-Id, From, To                                           
 # Default: 1                                                         
@@ -42,11 +42,11 @@                                                   
                                                                      
 # Send logging output to stderr.                                     
 # Default value: 1                                                   
-#focr_log_stderr 0                                                   
+focr_log_stderr 0                                                    

 # Logfile (make sure it is writable by the plugin)
 # Default value: none
-#focr_logfile /tmp/FuzzyOcr.log
+focr_logfile /var/lib/spamassassin/fuzzyocr/FuzzyOcr.log

 ###
 ### Wordlists
@@ -179,7 +179,7 @@

 # Timeout for the plugin, in seconds. (Maximum runtime of the plugin)
 # Default value: 10
-#focr_timeout 15
+focr_timeout 15

 # Use a global timeout value instead of per helper application.
 # Default value: 0
@@ -299,7 +299,7 @@
 # skip the scans when the image is found in the database, using the score
 # from the previous scans.
 #--
-#focr_enable_image_hashing 3
+focr_enable_image_hashing 2

 # Set this to skip updating the hashing database at startup
 # Default value: 0 (update at startup)
@@ -323,16 +323,16 @@
 # If the image hash db feature is enabled (Type 2 Hashing),
 # specify the file to use as the SPAM database
 # Default value: /etc/spamassassin/FuzzyOcr.db
-#focr_db_hash /etc/spamassassin/FuzzyOcr.db
+focr_db_hash /var/lib/spamassassin/fuzzyocr/FuzzyOcr.db

 # If the image hash db feature is enabled (Type 2 Hashing),
 # specify the file to use as the HAM database
 # Default value: /etc/spamassassin/FuzzyOcr.safe.db
-#focr_db_safe /etc/spamassassin/FuzzyOcr.safe.db
+focr_db_safe /var/lib/spamassassin/fuzzyocr/FuzzyOcr.safe.db

 # Auto-prune: Expire records from hasing databases after these many days
 # Default value: 35
-#focr_db_max_days 15
+focr_db_max_days 15

 ###
 ### MySQL options (Type 3 Hashing)
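
After editing, it can be handy to list only the settings that are actually active, since most of the file is commented-out defaults. This is a sketch; active_focr_settings is a hypothetical helper, and the configuration path in the example is an assumption that may differ on your system.

```shell
# Hypothetical helper: print the uncommented focr_ settings in a config file.
# Lines starting with '#' (the shipped defaults) are not matched.
active_focr_settings() {
    grep -E '^[[:space:]]*focr_' "$1"
}

# Example; the path is an assumption, adjust it to where your FuzzyOcr
# configuration file lives:
#   active_focr_settings /etc/spamassassin/FuzzyOcr.cf
```

With the changes from the diff applied, the output should include the new focr_logfile, focr_db_hash and focr_db_safe paths.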

Restart SpamAssassin and test

/etc/init.d/spamassassin restart
tail -f /var/lib/spamassassin/fuzzyocr/FuzzyOcr.log
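
If nothing shows up in the log, a configuration error may be preventing the plugin from loading. SpamAssassin's own --lint option checks the configuration for syntax errors. The wrapper below is only a sketch, and the function name lint_spamassassin is hypothetical.

```shell
# Hypothetical wrapper: run SpamAssassin's configuration check if the
# spamassassin command is available, otherwise fail with status 127.
lint_spamassassin() {
    if ! command -v spamassassin >/dev/null 2>&1; then
        echo "spamassassin not found in PATH" >&2
        return 127
    fi
    # --lint parses the configuration and reports syntax errors.
    spamassassin --lint
}

# Example:
#   lint_spamassassin && /etc/init.d/spamassassin restart
```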

Maintenance

Create a logrotate file /etc/logrotate.d/fuzzyocr:

/var/lib/spamassassin/fuzzyocr/FuzzyOcr.log {
        daily
        missingok
        rotate 10
        compress
        delaycompress
        notifempty
        create 640 spamd spamd
}
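
A new logrotate file can be checked without rotating anything by running logrotate in debug mode (-d), which only prints what it would do. The wrapper below is a sketch, and the name logrotate_dry_run is hypothetical.

```shell
# Hypothetical wrapper: dry-run a single logrotate configuration file.
logrotate_dry_run() {
    conf=$1
    if [ ! -r "$conf" ]; then
        echo "cannot read $conf" >&2
        return 1
    fi
    # -d is debug mode: logrotate reports its plan and changes nothing.
    logrotate -d "$conf"
}

# Example, using the file created above:
#   logrotate_dry_run /etc/logrotate.d/fuzzyocr
```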

Schedule a daily cleanup in cron to remove temporary images. Edit the spamd user's crontab and add the @daily line:

crontab -e -u spamd
@daily perl /usr/share/doc/fuzzyocr/Utils/fuzzy-clean
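
The same entry can also be added without an interactive editor, which is useful when scripting the setup. The sketch below reads an existing crontab on standard input and appends the entry only if it is not already present; add_cron_entry is a hypothetical helper name.

```shell
# Hypothetical helper: read a crontab on stdin, write it to stdout, and
# append the given entry unless an identical line already exists.
add_cron_entry() {
    entry=$1
    existing=$(cat)
    if [ -n "$existing" ]; then
        printf '%s\n' "$existing"
    fi
    # -x matches whole lines, -F treats the entry as a fixed string.
    if ! printf '%s\n' "$existing" | grep -qxF -- "$entry"; then
        printf '%s\n' "$entry"
    fi
}

# Example, run as root:
#   crontab -l -u spamd 2>/dev/null \
#     | add_cron_entry '@daily perl /usr/share/doc/fuzzyocr/Utils/fuzzy-clean' \
#     | crontab -u spamd -
```

Because the helper skips an entry that already exists, the example can be re-run without creating duplicate jobs.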


Reference http://braindump.mrzesty.net/Main/FuzzyOcrForSpamAssassinOnDebian
