POPFile Reference Manual and User's Guide

Introduction

POPFile is a POP3-based proxy written in Perl that sits between a mail client (such as Microsoft Outlook) and a mail server and intercepts email as it is downloaded. Your mail client is reconfigured to talk directly to a copy of POPFile running on your computer, and POPFile talks to the mail server on your behalf. In that way POPFile gets to see all incoming mail.

POPFile classifies email into one of a collection of buckets. Buckets are user-defined and usually correspond to folders within an email program or to categories of mail that the user receives. For example, a user might choose to split their email into buckets for work, family, friends, a flower arranging mailing list, unwanted spam and other.

POPFile automatically reads, splits open and analyzes each email as it is downloaded to determine which bucket the mail should be placed in. POPFile signals the chosen bucket in two ways: it adds text containing the bucket name to the Subject: line of the email, and it adds an extra email header containing the bucket name. With the bucket name embedded in the email, any email client can filter the mail into an appropriate folder automatically. The combination of automatic bucket identification and simple filters in an email client means that POPFile can provide spam filtering, place messages from family in one folder, and treat work messages as high priority.

POPFile uses a powerful mathematical technique known as Bayes' Theorem and an engine that quickly splits email into words, opens and parses attachments and filters out HTML to automatically identify which bucket an email should go in. All POPFile needs is to be shown some examples of emails and which buckets they belong in. Based on the examples, POPFile learns the characteristics associated with each bucket and can sort incoming email as required. Additional examples and even additional buckets can be added at any time.
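To make the idea of learning from examples concrete, here is a minimal Python sketch of a naive Bayes word classifier. It is not POPFile's own code (POPFile is written in Perl), and the tokenizer, the equal bucket priors, the add-one smoothing and the function names train and classify are all assumptions made for illustration.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Split text into lowercase words (a simplification of real mail parsing)."""
    return re.findall(r"[a-z0-9']+", text.lower())


def train(examples):
    """Build per-bucket word counts from (bucket, text) example pairs."""
    counts = {}
    for bucket, text in examples:
        counts.setdefault(bucket, Counter()).update(tokenize(text))
    return counts


def classify(counts, text):
    """Return the bucket with the highest log-probability score for the text."""
    vocabulary = set()
    for words in counts.values():
        vocabulary.update(words)
    best_bucket, best_score = None, -math.inf
    for bucket, words in counts.items():
        total = sum(words.values())
        # Equal priors are assumed, so only word likelihoods contribute.
        score = 0.0
        for word in tokenize(text):
            # Add-one smoothing keeps unseen words from zeroing the score.
            score += math.log((words[word] + 1) / (total + len(vocabulary)))
        if score > best_score:
            best_bucket, best_score = bucket, score
    return best_bucket


# Tiny made-up training set, one example per bucket.
examples = [
    ("spam", "free download free cds"),
    ("work", "meeting agenda for the project review"),
    ("personal", "dinner with the family this evening"),
]
model = train(examples)
print(classify(model, "free cds available for download"))
```

With these three toy examples the message is scored highest for the spam bucket, because free, cds and download were only ever seen there. A real corpus would hold many more examples per bucket.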

Installing POPFile

POPFile is so small and so simple that it doesn't have an installer. To install POPFile you need to perform the following steps:

POPFile consists of two programs:

Using POPFile

Running POPFile

To use POPFile you must decide on the set of buckets you want your mail sorted into. In this manual we'll use the example buckets work, personal and spam. You can choose as many buckets as you want (make sure that you have at least two, otherwise POPFile is a little pointless).

Once you've decided on the set of buckets you are going to use, separate your mail into those buckets by exporting the mail from your mail client.

Mail is inserted into the POPFile system by running the insert.pl script, specifying the name of the bucket you wish to insert mail into and the name of a file containing a mail message or messages. For example, if you saved all your spam into a file called spam.txt then you could insert it into a spam bucket with the command

perl insert.pl spam spam.txt

If you placed many different files containing family mail in a folder called familymail then you could insert them into the family bucket with the command

perl insert.pl family familymail/*

Once your mail is inserted you need to configure your mail client to talk to POPFile and then run POPFile with the command

perl popfile.pl

General instructions

A simple strategy for handling mail insertion is to create a folder in your POPFile folder called mail and place subfolders for each bucket under the mail folder with the same name as the bucket. If you had three buckets called work, personal and spam then you'd create the folders mail, mail/work, mail/personal and mail/spam and copy messages into the appropriate folder.

Once the messages are in place, insert them with the commands

perl insert.pl spam mail/spam/*
perl insert.pl work mail/work/*
perl insert.pl personal mail/personal/*

In the following specific instructions we'll assume you are going to use this mail folder strategy.
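The folder strategy above can be scripted. The following Python sketch creates the mail folder with one subfolder per bucket and prints the matching insert.pl command for each. The helper names make_mail_folders and insert_commands are hypothetical, and a temporary directory stands in for your POPFile folder.

```python
import os
import tempfile


def make_mail_folders(root, buckets):
    """Create root/mail and one subfolder per bucket; return the folder paths."""
    paths = []
    for bucket in buckets:
        path = os.path.join(root, "mail", bucket)
        os.makedirs(path, exist_ok=True)
        paths.append(path)
    return paths


def insert_commands(buckets):
    """Build the insert.pl command line for each bucket, as shown in the manual."""
    return ["perl insert.pl {0} mail/{0}/*".format(b) for b in buckets]


buckets = ["spam", "work", "personal"]
# A temporary directory stands in for the POPFile folder in this sketch.
root = tempfile.mkdtemp()
make_mail_folders(root, buckets)
for command in insert_commands(buckets):
    print(command)
```

The script only prints the commands rather than running them, so you can check them before copying your exported messages into the folders and running insert.pl yourself.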

Getting mail from Outlook Express

Outlook Express supports drag and drop of mail messages into a folder on your hard drive. To set up a folder containing mail messages for a personal bucket and then insert them, do the following. Outlook Express will save the mail messages with their attachments in the folder, and POPFile will open the attachments using its own decoder, looking for significant words even in attachments.

Getting mail from Outlook

Outlook supports drag and drop of mail messages into a folder on your hard drive. To set up a folder containing mail messages for a personal bucket and then insert them, do the following. Although Outlook creates a binary file instead of a text file containing the mail message, POPFile knows how to handle the binary file automatically.

Getting mail from Eudora

Eudora allows you to save messages to a file. To set up a folder containing mail messages for a personal bucket and then insert them, do the following. Eudora does not include attachments.

Configuring your mail client

General instructions

To use POPFile you must configure your email program to talk directly to the POPFile program. All incoming email then passes through POPFile so that it can be classified before it reaches your mail client. You will need to know three important facts about your email service: the POP3 server name (for example pop.company.com), your POP3 user name (for example joe) and your POP3 password.

To tell POPFile to operate you:

Configuring POPFile in Outlook Express

Configuring POPFile in Outlook

Configuring POPFile in Eudora

Filtering based on POPFile classifications

General instructions

POPFile provides two clues to the bucket in which an email should be placed: a modification to the Subject: line and the insertion of the X-Text-Classification: header. Depending on the capabilities of your mail client you can choose either of these.

Since modifying the Subject: line actually changes the text of an email, it is preferable to use the X-Text-Classification: header and turn off Subject: line modification (see the POPFile command line reference for details).

The most common form of filtering is moving mail in a specific bucket to a specific folder.
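As an illustration of what such a filter does, here is a small Python sketch that reads the X-Text-Classification: header from a raw message and picks a destination folder. It assumes the header value is just the bucket name. The FOLDERS mapping, the folder names and the folder_for function are hypothetical, and a real mail client would express this as a filter rule rather than code.

```python
from email import message_from_string

# Hypothetical mapping from bucket name to mail client folder.
FOLDERS = {"spam": "Junk", "work": "Work", "personal": "Personal"}


def folder_for(raw_message, default="Inbox"):
    """Pick a folder from the X-Text-Classification header of a raw message."""
    message = message_from_string(raw_message)
    bucket = (message.get("X-Text-Classification") or "").strip().lower()
    # Unclassified mail, or an unknown bucket, stays in the default folder.
    return FOLDERS.get(bucket, default)


# A made-up message as it might look after passing through the proxy.
raw = (
    "From: someone@example.com\n"
    "Subject: Free CDs\n"
    "X-Text-Classification: spam\n"
    "\n"
    "Download now.\n"
)
print(folder_for(raw))
```

Matching on the header leaves the Subject: line untouched, which is the reason the header is the preferred clue.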

Filtering in Outlook Express

Filtering in Outlook

Filtering in Eudora

Eudora allows filtering based on any mail header, so the best bet is to filter on the X-Text-Classification: header that POPFile adds and turn off Subject: line modification (run POPFile with the command line option -subject 0).

POPFile and Firewalls

The POPFile POP3 proxy runs inside perl, so your firewall must allow perl to accept connections from clients on your local machine and to make outbound connections to the Internet so that it can download your mail.

In ZoneAlarm configure perl.exe as follows:

POPFile and Secure Password Authentication

If your email client requires Secure Password Authentication (sometimes called AUTH), you need to tell POPFile the server name and port number of your email server when you start the program. Suppose that the server is pop.secureserver.com running on port 123; you would type

perl popfile.pl -port 110 -sserver pop.secureserver.com -sport 123

to start POPFile listening on port 110 and ready to connect to pop.secureserver.com on port 123 when it sees a secure authentication.

Then modify your mail client to talk to the proxy: the server becomes 127.0.0.1 and the port 110. The user name does not need to be changed.
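Before changing the mail client settings it can help to confirm that something is actually listening on the proxy address. The Python sketch below makes a plain TCP connection to 127.0.0.1 on port 110, the values used in the example above. The function name proxy_is_listening is hypothetical. The check only shows that the port accepts connections, not that POPFile is the program answering.

```python
import socket


def proxy_is_listening(host="127.0.0.1", port=110, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Connection refused, timed out, or blocked (for example by a firewall).
        return False


if __name__ == "__main__":
    print(proxy_is_listening())
```

If this prints False while POPFile is running, a firewall rule blocking local connections to perl is one likely cause.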

POPFile command line reference

The popfile.pl script has the following command line parameters.

Getting Help and Reporting Problems

To start understanding POPFile make sure that you've read this manual and the latest README file in the Docs section of the POPFile home page.

If you are still having trouble take a look at the POPFile Forums where POPFile is discussed and ask your question there.

If you find a bug, please report it in the Bug Database.

Appendix

The format of the corpus files

POPFile keeps track of the words related to each bucket in a folder called corpus. Inside the corpus folder you'll find a folder for each of the buckets you have configured, and inside each of those folders a file called table.

The table is a simple text file containing a list of words and the frequency for each word. Any standard text editor can read the table file. Here's a section from a real corpus/spam/table.

free 79
availability 1
evening 5
cds 1
running 2
pertaining 2
leave 2
magical 1
download 5

The word free appears 79 times and the word magical once. To investigate the table files further use the viewer.pl script that comes with POPFile.
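Because the table is plain text, it is also easy to read with a few lines of code. The Python sketch below parses the excerpt shown above into a dictionary of word frequencies. It assumes every line is a word followed by a count, separated by whitespace, and it skips anything else. The function name parse_table is hypothetical and this is not how viewer.pl works internally.

```python
def parse_table(text):
    """Parse 'word count' lines into a dict, skipping lines that do not fit."""
    frequencies = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 2 and parts[1].isdigit():
            frequencies[parts[0]] = int(parts[1])
    return frequencies


# The excerpt from corpus/spam/table quoted in this appendix.
sample = """free 79
availability 1
evening 5
cds 1
running 2
pertaining 2
leave 2
magical 1
download 5
"""
table = parse_table(sample)
# Print the three most frequent words, highest count first.
for word, count in sorted(table.items(), key=lambda item: -item[1])[:3]:
    print(word, count)
```

To read a real table file, pass the file's contents to parse_table instead of the sample string.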

Other questions answered
