Copyright (C) 2006 Masahiko Ito
These programs are free software; you can redistribute them and/or modify them under the terms of the GNU General Public License as published by the Free Software Foundation; either version 2 of the License, or (at your option) any later version.
These programs are distributed in the hope that they will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.
You should have received a copy of the GNU General Public License along with these programs; if not, write to the Free Software Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA
Mail suggestions and bug reports for these programs to
"Masahiko Ito" <m-ito@myh.no-ip.org>
Nowadays, the Bayesian method is often used to filter spam, but it's a little hard for stupid me to understand :P. I thought there must be a simpler way that is still effective enough to filter spam, and I have tried to make it real :)
    create table t_white (
        term text primary key,
        count long long int
    );
    create table t_black (
        term text primary key,
        count long long int
    );

An incoming message is divided into terms (term1 to termN), and the occurrences of each term are counted (count1 to countN).
Look up term1 in t_white and calculate white_score.

                  (count in t_white) x (count1)
    white_score = -----------------------------
                  (sum of all count in t_white)

Look up term1 in t_black and calculate black_score.

                  (count in t_black) x (count1)
    black_score = -----------------------------
                  (sum of all count in t_black)
Calculate score from white_score and black_score.

                   white_score
    score = ------------------------- - 0.5
            white_score + black_score

The range of score is -0.5 to +0.5. Negative means spam, positive means non-spam.
Calculate the scores of all terms (term1 to termN) in the same way.
Finally, if the sum of all scores is negative, the message is judged to be spam.
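
The scoring steps above can be sketched in a few lines of Python. This is only an illustration, not the code of sf-0.5 itself: the function names, the whitespace tokenization, and the in-memory dictionaries standing in for t_white and t_black are all assumptions. A term found in neither table is treated as neutral (score 0.0), which is also an assumption, chosen because it avoids dividing by zero.

```python
from collections import Counter

def term_score(term, count, white, black):
    """Score one term: -0.5 (spam) to +0.5 (non-spam)."""
    white_total = sum(white.values())
    black_total = sum(black.values())
    # (count in table) x (count in message) / (sum of all counts in table)
    white_score = white.get(term, 0) * count / white_total if white_total else 0.0
    black_score = black.get(term, 0) * count / black_total if black_total else 0.0
    if white_score + black_score == 0:
        return 0.0  # assumption: a term unknown to both tables is neutral
    return white_score / (white_score + black_score) - 0.5

def message_score(text, white, black):
    """Sum the scores of all terms; a negative sum means spam."""
    counts = Counter(text.split())  # assumption: terms are whitespace-separated
    return sum(term_score(t, c, white, black) for t, c in counts.items())

# Hypothetical learned data standing in for t_white and t_black.
white = {"meeting": 3, "free": 1}
black = {"free": 3, "viagra": 1}

print(message_score("free viagra free", white, black))  # -0.75, judged spam
print(message_score("meeting", white, black))           # 0.5, judged non-spam
```

In the first message, "free" scores 0.5 / (0.5 + 1.5) - 0.5 = -0.25 and "viagra" scores -0.5, so the sum is -0.75.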
    $ sf_init.sh -h
    Usage : sf_init.sh
    Initialize database.

It initializes the database for spam data. It must be invoked only once, before you start using sf-0.5.
    $ sf_add.sh -h
    Usage : sf_add.sh [-w|--white|-b|--black] [-v|--vacuum] [file ...]
    Add data to database.
      -w, --white   add data to white database.
      -b, --black   add data to black database.
      -v, --vacuum  vacuum after add.

It makes sf-0.5 learn spam data by adding it to the database.
    $ sf_del.sh -h
    Usage : sf_del.sh [-w|--white|-b|--black] [-v|--vacuum] [file ...]
    Del data from database.
      -w, --white   del data from white database.
      -b, --black   del data from black database.
      -v, --vacuum  vacuum after del.

It makes sf-0.5 forget spam data by deleting it from the database.
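
How sf_add.sh and sf_del.sh update the tables is not described here. The following Python sketch shows one plausible way to do it, assuming that learning adds each term's occurrence count to the table and deleting subtracts it again. The functions init_db and learn, the whitespace tokenization, and the removal of rows whose count drops to zero are all hypothetical; only the table layout is taken from the schema shown earlier.

```python
import sqlite3
from collections import Counter

TABLES = ("t_white", "t_black")

def init_db(conn):
    """Create both tables with the layout shown earlier."""
    for table in TABLES:
        conn.execute(
            f"create table {table} (term text primary key, count long long int)"
        )

def learn(conn, table, text, sign=1):
    """Add (sign=1) or subtract (sign=-1) the term counts of one message."""
    if table not in TABLES:
        raise ValueError(table)
    for term, n in Counter(text.split()).items():
        # Make sure the row exists, then adjust its count.
        conn.execute(
            f"insert or ignore into {table} (term, count) values (?, 0)", (term,)
        )
        conn.execute(
            f"update {table} set count = count + ? where term = ?", (sign * n, term)
        )
    # Assumption: terms whose count falls to zero or below are dropped.
    conn.execute(f"delete from {table} where count <= 0")
    conn.commit()

conn = sqlite3.connect(":memory:")
init_db(conn)
learn(conn, "t_black", "free viagra free")      # like sf_add.sh -b
learn(conn, "t_black", "free viagra", sign=-1)  # like sf_del.sh -b
print(dict(conn.execute("select term, count from t_black")))
```

Keeping add and delete symmetric means that a message learned by mistake can be removed again without rebuilding the database.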
    $ sf_check.sh -h
    Usage : sf_check.sh [-w|--white|-b|--black] [file ...]
    Check file.
      -w, --white   check white?
      -b, --black   check black?
    return 0 when check is true.
    return 1 when check is false.

It judges whether a file (or stdin) is spam or non-spam. It shows the score (negative means spam) and returns an exit status (0 means TRUE, 1 means FALSE). If the database has not learned anything at all, it always shows 0.0.
    poll pop.anywhere.org
        proto pop3
        user POP_ACCOUNT_NAME
        password POP_PASSWORD
        is LOCAL_USERNAME
        no keep
        flush
        no fetchall
        mda "/usr/bin/procmail -f %F"

If the `mda "/usr/bin/procmail -f %F"' line is not specified, fetchmail passes messages from the POP server to the local sendmail. sendmail can accept messages in parallel, so if there are too many messages on the POP server, the local load average increases rapidly.
If the `mda "/usr/bin/procmail -f %F"' line is specified, fetchmail calls procmail to deliver the messages, so the local load average may not increase as rapidly.
    :0 HB
    * ? sf_check.sh -b
    /home/foo/Mail/spam/.