POPFile Reference Manual and User's Guide
Introduction
POPFile is a POP3 proxy written in Perl that sits between a mail client
(such as Microsoft Outlook) and a mail server and intercepts email as it is downloaded.
Your mail client is reconfigured to talk directly to a copy of POPFile running on your
computer, and POPFile talks to the mail server on your behalf. In that way POPFile gets
to see all incoming mail.
POPFile is used to classify email into one of a collection of buckets. Buckets are
user defined and usually correspond to folders within an email program or categories of
mail that the user receives. For example, a user might choose to split their email into
buckets for work, family, friends, a flower arranging mailing list, unwanted spam and
other.
POPFile automatically reads, splits open and analyzes each email as it is downloaded to
determine which bucket the mail should be placed in. POPFile signals the chosen bucket
in two ways: it adds the bucket name to the Subject: line of the email, and it adds an
extra email header containing the bucket name. With the bucket name embedded in
the email, any email client can filter the mail into an
appropriate folder automatically. The combination of automatic bucket identification
and simple filters in an email client means that POPFile can provide spam filtering,
automatically place messages from family in one folder and move work messages to a high
priority folder.
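The hypothetical helper below sketches the two marks described above using Python's standard email module. It is an illustration, not POPFile's own code (POPFile is written in Perl), and the single space after the [bucket] prefix is an assumption.

```python
import email


def tag_message(raw, bucket, modify_subject=True):
    """Return `raw` with POPFile-style classification marks added.

    Hypothetical helper: adds an X-Text-Classification: header and, unless
    modify_subject is False, a "[bucket]" prefix on the Subject: line.
    """
    msg = email.message_from_string(raw)
    if modify_subject:
        if msg["Subject"] is None:
            msg["Subject"] = f"[{bucket}]"
        else:
            # replace_header keeps the Subject: line in its original position
            msg.replace_header("Subject", f"[{bucket}] {msg['Subject']}")
    # The new header is appended after the existing headers
    msg["X-Text-Classification"] = bucket
    return msg.as_string()
```

For example, tagging a message whose subject is "Cheap pills" as spam yields the subject "[spam] Cheap pills" plus the header "X-Text-Classification: spam".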
POPFile uses a powerful mathematical technique known as Bayes' theorem, together with an engine that
quickly splits email into words, opens and parses attachments and filters out HTML, to automatically
identify which bucket an email should go in. All POPFile needs is to be shown some
examples of emails and the buckets they belong in. Based on the examples, POPFile learns
the characteristics associated with each bucket and can sort incoming email as required. Additional
examples and even additional buckets can be added at any time.
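This manual does not spell out POPFile's exact scoring formula. The following is a generic naive Bayes sketch, not POPFile's own code: it assumes per-bucket word counts like those stored in the corpus table files (see the Appendix), an equal prior probability for every bucket, and add-one (Laplace) smoothing so that unseen words do not zero out a bucket's score.

```python
import math


def classify(words, tables):
    """Pick the most likely bucket for a list of words.

    `tables` maps bucket name -> {word: count}. Scores are summed log
    probabilities, log P(word | bucket), with add-one smoothing and equal
    bucket priors (both assumptions of this sketch).
    """
    vocab = set()
    for table in tables.values():
        vocab.update(table)
    best_bucket, best_score = None, -math.inf
    for bucket, table in tables.items():
        total = sum(table.values())
        score = 0.0
        for word in words:
            # Smoothed estimate of how often this bucket uses the word
            score += math.log((table.get(word, 0) + 1) / (total + len(vocab)))
        if score > best_score:
            best_bucket, best_score = bucket, score
    return best_bucket
```

Working in log space avoids floating-point underflow when a message contains many words, since multiplying many small probabilities quickly reaches zero.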
Installing POPFile
POPFile is so small and so simple that it doesn't have an installer. To install POPFile, perform the
following steps:
- Make sure that you have Perl installed on your system. To get Perl, visit Perl.com and get the right
version for your OS.
- Download POPFile from the POPFile home page.
- Unzip the contents of the POPFile zip file into a folder.
- That's it... POPFile is installed.
POPFile consists of two programs:
- insert.pl - The program used to add new buckets and new example emails to teach
POPFile about your mail preferences.
- popfile.pl - The POP3 proxy program that runs whenever you want to download mail
and performs email classification.
Using POPFile
To use POPFile you must decide on the set of buckets you want your mail sorted into. In
this manual we'll use the example buckets work, personal and spam. You can choose as many buckets
as you want (make sure that you have at least two, otherwise POPFile is a little pointless).
Once you've decided on the set of buckets you are going to use, separate your mail into those
buckets by exporting it from your mail client.
Mail is inserted into the POPFile system by running insert.pl, specifying the name of the bucket
you wish to insert mail into and the name of a file containing one or more mail messages. For example, if you
saved all your spam into a file called spam.txt, you could insert it into a spam bucket with the command
perl insert.pl spam spam.txt
If you placed many different files containing family mail in a folder called familymail, you could insert them
into the family bucket with the command
perl insert.pl family familymail/*
Once your mail is inserted, you need to configure your mail client to talk to POPFile and then run POPFile with the
command
perl popfile.pl
A simple strategy for handling mail insertion is to create a folder called mail in your POPFile folder and, under it, a
subfolder for each bucket with the same name as the bucket. If you had three buckets called work, personal and
spam, you'd create the folders mail, mail/work, mail/personal and mail/spam and copy messages
into the appropriate folder.
Once the messages are in place, mail insertion is achieved with the commands
perl insert.pl spam mail/spam/*
perl insert.pl work mail/work/*
perl insert.pl personal mail/personal/*
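To avoid typing one command per bucket, the commands above can be generated from the folder layout. The sketch below is a hypothetical helper, not part of POPFile: it assumes it is run from the POPFile folder, that perl is on the PATH, and that insert.pl accepts several file names after the bucket name (as the familymail/* example implies). Each subfolder of mail is treated as a bucket of the same name.

```python
import subprocess
from pathlib import Path


def build_insert_commands(mail_dir="mail"):
    """Return one insert.pl command per bucket subfolder of mail_dir.

    The subfolder name is used as the bucket name; files directly inside
    mail_dir and empty subfolders are skipped.
    """
    commands = []
    for folder in sorted(Path(mail_dir).iterdir()):
        if not folder.is_dir():
            continue
        files = sorted(str(p) for p in folder.iterdir() if p.is_file())
        if files:
            commands.append(["perl", "insert.pl", folder.name, *files])
    return commands


def run_inserts(mail_dir="mail"):
    """Run every generated command, stopping at the first failure."""
    for cmd in build_insert_commands(mail_dir):
        subprocess.run(cmd, check=True)
```

Calling run_inserts() from the POPFile folder performs the same insertions as the three commands above, one bucket at a time.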
In the following specific instructions we'll assume you are going to use this mail folder strategy.
Outlook Express supports drag and drop of mail messages into a folder on your hard drive. To set up a folder containing
mail messages for a personal bucket and then insert them do the following.
- In the POPFile folder create a folder called mail.
- In the mail folder create a folder called personal.
- In Outlook Express select the message or messages you want to extract (ctrl-click to select multiple messages).
- Drag the messages onto the personal folder.
- Run the command perl insert.pl personal mail/personal/* to insert the newly extracted messages.
Outlook Express will save the mail messages with their attachments in the folder and POPFile will open the attachments
using its own decoder looking for significant words even in attachments.
Outlook supports drag and drop of mail messages into a folder on your hard drive. To set up a folder containing
mail messages for a personal bucket and then insert them do the following.
- In the POPFile folder create a folder called mail.
- In the mail folder create a folder called personal.
- In Outlook select the message or messages you want to extract (ctrl-click to select multiple messages).
- Drag the messages onto the personal folder.
- Run the command perl insert.pl personal mail/personal/* to insert the newly extracted messages.
Although Outlook creates a binary file instead of a text file containing the mail message, POPFile knows how to handle
the binary file automatically.
Eudora allows you to save messages to a file. To set up a folder containing mail messages for a personal bucket and then
insert them do the following.
- In the POPFile folder create a folder called mail.
- In the mail folder create a folder called personal.
- In Eudora select the message or messages you want to extract.
- Click File->Save As... and navigate to the mail/personal folder, enter a file name such as messages.txt and make sure that Include Headers
is checked. Hit Save.
- Run the command perl insert.pl personal mail/personal/* to insert the newly extracted messages.
Eudora does not include attachments in the saved file.
Configuring your mail client
General instructions
To use POPFile you must configure your email program to talk directly to the POPFile program. All incoming
email then passes through POPFile so that it can be classified. You will need to know three
important facts about your email service: the POP3 server name (for example pop.company.com), your POP3 user
name (for example joe) and your POP3 password.
To route your mail through POPFile:
- Change the POP3 server name in your email client to 127.0.0.1 (if it was originally pop.company.com, make a note
of that name before entering 127.0.0.1).
- Change the POP3 user name to the POP3 server name followed by a colon (:) and followed by your original
POP3 username (if your POP3 server name was originally pop.company.com and your POP3 user name was joe you would
change the POP3 user name to pop.company.com:joe).
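As a sketch of the rule above, the hypothetical function below computes the two values to enter in the mail client from the original POP3 settings. The password is not part of the result because it stays unchanged.

```python
def popfile_settings(server, user):
    """Return the mail client settings to use with POPFile.

    `server` and `user` are the original POP3 server name and user name.
    """
    return {
        "server": "127.0.0.1",       # POPFile runs on your own computer
        "user": f"{server}:{user}",  # original server, a colon, original user
    }
```

For the example account, popfile_settings("pop.company.com", "joe") gives the server 127.0.0.1 and the user name pop.company.com:joe.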
Configuring POPFile in Outlook Express
- In Outlook Express select the Tools->Accounts... menu option. The Internet Accounts dialog will appear.

- Select the account you wish to modify to use POPFile and click on Properties.

- Choose the Servers tab. Make a note of the Incoming Mail (POP3) server name and the Incoming Mail Server Account Name.

- Change the Incoming Mail (POP3) server name to 127.0.0.1 and change the Incoming Mail Server Account Name to the combination of the original Incoming Mail (POP3) server name and the original Incoming Mail Server Account Name separated by a colon.

- Hit OK and then Close.
- Ensure that POPFile is running and all mail will be delivered through POPFile.
Configuring POPFile in Outlook
- In Outlook select the Tools->Email Accounts... menu option. The E-mail Accounts dialog will appear.

- Click View or change existing e-mail accounts and click Next. Choose the account you want to use POPFile with and click Change...
Make a note of the Incoming mail server (POP3) server name and the User Name.

- Change the Incoming mail server (POP3) server name to 127.0.0.1 and change the User Name to the combination of the original
Incoming mail server (POP3) server name and the original User Name separated by a colon.

- Hit Next and then Finish.
- Ensure that POPFile is running and all mail will be delivered through POPFile.
Configuring POPFile in Eudora
- In Eudora select the Tools->Options... menu option. The Options dialog will appear.

- Make a note of the Mail Server (incoming) server name and the Login Name.

- Change the Mail Server (incoming) server name to 127.0.0.1 and change the Login Name to the combination of the original
Mail Server (incoming) server name and the original Login Name separated by a colon.

- Hit OK.
- Ensure that POPFile is running and all mail will be delivered through POPFile.
Filtering based on POPFile classifications
General instructions
POPFile provides two clues to the bucket in which an email should be placed: a modification to the
Subject: line and the insertion of the X-Text-Classification: header. Depending on the capabilities
of your mail client you can choose either of these.
Since modifying the Subject: line actually changes the text of an email, it is preferable to
use the X-Text-Classification header and turn off Subject: line modification (see the POPFile
command line reference for details).
The most common form of filtering is moving mail in a specific bucket to a specific folder.
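To make the two clues concrete, the hypothetical function below does in Python what a mail client rule does: it prefers the X-Text-Classification header and falls back to a [bucket] prefix on the Subject: line. The bucket-to-folder mapping is only an illustration, and the default folder name Inbox is an assumption.

```python
import email


def target_folder(raw, folders, default="Inbox"):
    """Choose a destination folder from POPFile's classification.

    `folders` maps bucket name -> folder name. Messages with no
    classification, or an unmapped bucket, go to `default`.
    """
    msg = email.message_from_string(raw)
    bucket = msg.get("X-Text-Classification")
    if bucket is None:
        # Fall back to a "[bucket]" prefix at the start of the subject
        subject = msg.get("Subject", "")
        if subject.startswith("[") and "]" in subject:
            bucket = subject[1:subject.index("]")]
    if bucket is None:
        return default
    return folders.get(bucket.strip(), default)
```

A mapping such as {"spam": "Deleted Items"} reproduces the rules built in the client-specific instructions that follow.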
Filtering in Outlook Express
- In Outlook Express select the Tools->Message Rules->Mail... menu option.

- The New Mail Rule window appears.

- Outlook Express cannot use the X-Text-Classification header, so click Where the Subject line contains specific words and also click
Move it to the specified folder.

- Now click the blue highlighted contains specific words link and the Type Specific Words dialog appears.

- In this case we plan to put all spam directly in the Deleted Items folder, so enter [spam] and hit Add, then hit OK. Now click the blue
highlighted specified word and the Move dialog appears. Click Deleted Items and hit OK.

- Hit OK and then OK.
- All mail with [spam] in the subject will now be placed in Deleted Items automatically.
Filtering in Outlook
- In Outlook select the Tools->Rules Wizard...

- The Rules Wizard appears. Click New... and select Start from a blank rule.

- Outlook cannot use the X-Text-Classification header, so click Next and select with specific words in the subject.

- Now click the grey highlighted specific words link and enter [spam] in the box that appears and hit Add, then OK.

- Click Next and select move it to the specified folder.

- Click the grey highlighted word specified and a list of folders appears. Select Deleted Items and hit OK.

- Hit OK, then Finish, then OK.
- All mail with [spam] in the subject will now be placed in Deleted Items automatically.
Filtering in Eudora
Eudora allows filtering based on any mail header, so the best bet is to filter on the X-Text-Classification header that POPFile adds and turn
off Subject: line modification. (Run POPFile with the command line option -subject 0).
- In Eudora select the Tools->Filters

- The Filters window appears. Click New to create a new filter.

- In the Header box type X-Text-Classification and in the box next to contains type spam.

- Then select Transfer To in the first Action box and select Trash from the drop down list.

- Close the Filters window and say Yes when asked to save changes.
- All mail classified as spam will be automatically moved to Trash.
POPFile and Firewalls
Perl runs the POPFile POP3 proxy, so perl needs to be able to accept connections
from clients on your local machine and to make outbound connections
to the Internet so that it can download your mail.
In ZoneAlarm, configure perl.exe to allow both kinds of connection.
POPFile and Secure Password Authentication
If your email client requires Secure Password Authentication (sometimes called AUTH),
you need to tell POPFile the server name and port number that your email
server is running on when you start the program. Suppose that the server
was pop.secureserver.com running on port 123; you would type
perl popfile.pl -port 110 -sserver pop.secureserver.com -sport 123
to start POPFile listening on port 110 and ready to connect to
pop.secureserver.com on port 123 when it sees a secure authentication request.
Then modify your mail client to talk to the proxy: the server becomes
127.0.0.1 and the port 110. The user name does not need to be changed.
POPFile command line reference
popfile.pl accepts the following command line parameters.
- -port <port> Specifies the port number that POPFile should listen on for a connection
from your email client. By default POPFile listens on port 110.
Example: perl popfile.pl -port 123
- -debug <where> Specifies whether to output debug information. A value of 0 (the default)
means no debug information, 1 means write debug information to a log file, 2 means
write debug information to the screen and 3 means both file and screen.
Example: perl popfile.pl -debug 2
- -subject <on> Specifies whether POPFile should modify the Subject: of an email or not. If
set to 1 (the default) then the Subject: will have the bucket added in the form [bucket] at the start of the
subject line. 0 turns this behaviour off.
Example: perl popfile.pl -subject 0
- -sserver <host> Specifies the host that POPFile should connect to when secure password
authentication (or AUTH) is used.
Example: perl popfile.pl -sserver spop.foo.bar.com
- -sport <port> Specifies the port number that POPFile should connect to when secure password
authentication (or AUTH) is used.
Example: perl popfile.pl -sport 991
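If you start POPFile from a script, the options above can be assembled programmatically. The hypothetical helper below uses only the five documented options and omits any option you leave unset, so POPFile's own defaults apply to the rest.

```python
def popfile_command(port=None, debug=None, subject=None, sserver=None, sport=None):
    """Build a popfile.pl command line from the documented options.

    Options left as None are not passed at all.
    """
    cmd = ["perl", "popfile.pl"]
    for flag, value in (
        ("-port", port),
        ("-debug", debug),
        ("-subject", subject),
        ("-sserver", sserver),
        ("-sport", sport),
    ):
        if value is not None:
            cmd += [flag, str(value)]
    return cmd
```

For example, popfile_command(port=110, sserver="pop.secureserver.com", sport=123) produces the same command as the Secure Password Authentication example, and passing the list to subprocess.run would launch it.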
Getting Help and Reporting Problems
To start understanding POPFile, make sure that you've read this manual and the latest README file in the
Docs section of the POPFile home page.
If you are still having trouble, take a look at the POPFile Forums where POPFile is discussed
and ask your question there.
If you find a bug, please report it in the Bug Database.
Appendix
The format of the corpus files
POPFile keeps track of the words related to each bucket in a folder called
corpus. Inside the corpus folder you'll find a folder for each of
the buckets you have configured, and inside each of those folders a file called
table.
The table is a simple text file containing a list of words and the frequency
for each word. Any standard text editor can read the table file. Here's a section from
a real corpus/spam/table.
free 79
availability 1
evening 5
cds 1
running 2
pertaining 2
leave 2
magical 1
download 5
The word free appears 79 times and the word magical once. To investigate the
table files further use the viewer.pl script that comes with POPFile.
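For a quick look without viewer.pl, a table file can also be read directly. The sketch below is a hypothetical reader, not part of POPFile; it assumes one "word count" pair per line, as in the excerpt above, and skips blank or malformed lines.

```python
from pathlib import Path


def read_table(path):
    """Read a corpus table file into a {word: count} dict."""
    counts = {}
    text = Path(path).read_text(encoding="utf-8", errors="replace")
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 2 and parts[1].isdigit():
            counts[parts[0]] = int(parts[1])
    return counts


def top_words(counts, n=5):
    """Return the n most frequent (word, count) pairs, highest first."""
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))[:n]
```

For example, top_words(read_table("corpus/spam/table")) lists the words that appear most often in the spam bucket.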
Other questions answered
- How do I chain multiple POP3 proxies together?
You can use the -port command line option to specify the port POPFile
listens on; POPFile automatically picks up the server and port to connect to from the
user name specified in the client.
Suppose you have another proxy running on the local machine on port 110 and
you have configured the user name in your client to be real-server-name/real-user-name,
connecting to a server on 127.0.0.1 using port 110.
To use POPFile you would change the user name to 127.0.0.1:110:real-server-name/real-user-name.
This tells POPFile to connect to 127.0.0.1 on port 110
and pass in the user name real-server-name/real-user-name that the proxy
expects.
Then run POPFile on some port other than 110:
perl popfile.pl -port 111
And tell the email client to contact the server on 127.0.0.1 and port 111.
- Can I use multiple email accounts with POPFile?
If you are *not* using secure authentication then the answer is yes: point
each email configuration at 127.0.0.1:110 and POPFile will automatically
distinguish between the email accounts you are using, although it will use
only one email corpus for classification.
If you have multiple regular accounts (not using secure authentication) and
one that needs to be secure, you can still use POPFile. Run POPFile
with the command line shown in POPFile and Secure Password Authentication and point all your accounts at
127.0.0.1:110. POPFile will connect to the secure server when it sees secure authentication
and distinguish the other accounts automatically.
POPFile cannot currently be used with multiple accounts requiring secure
authentication.
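The chaining answer above relies on POPFile reading the server, and optionally a port, from the front of the user name. The sketch below shows one way to split such a value; it is an illustration, not POPFile's own parsing code, and it assumes the standard POP3 port 110 when no port is given.

```python
def parse_popfile_user(value, default_port=110):
    """Split a POPFile user name into (server, port, user).

    Accepts "server:user" or "server:port:user". Everything after the
    separators is returned unchanged as the user name. Raises ValueError
    if the value contains no colon.
    """
    parts = value.split(":", 2)
    if len(parts) == 3 and parts[1].isdigit():
        # "server:port:user", e.g. 127.0.0.1:110:real-server-name/real-user-name
        return parts[0], int(parts[1]), parts[2]
    if len(parts) < 2:
        raise ValueError(f"expected server:user, got {value!r}")
    # "server:user"; any further colons stay in the user name
    server, user = value.split(":", 1)
    return server, default_port, user
```

For example, 127.0.0.1:110:real-server-name/real-user-name splits into the server 127.0.0.1, port 110 and the user name real-server-name/real-user-name that the other proxy expects.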