anti-spam-stuff.txt

   1 AUTHOR: Declan Moriarty <junk _ mail AT iol.ie>
   2
   3 DATE: 2005-11-12
   4
   5 LICENSE: GNU Free Documentation License Version 1.2
   6
   7 SYNOPSIS: Setting up an Open Source Anti-Spam kit on an lfs box
   8
   9 DESCRIPTION: With an emphasis on configuration, this provides
  10 Installation & Configuration Instructions for Mail-SpamAssassin-3.1.0
  11 and its helper tools.
  12
  13 ATTACHMENTS:
  14
  15 spamstuff.tar.bz2       A config file and init script.
  16
  17 PREREQUISITES: A Basic understanding of unix, and a hatred of spam. This
  18 hint does _not_ apply to earlier versions of SpamAssassin, but you
  19 should be OK with most recent (or future) versions of other programs.
  20 Perl5 is required. A configurable mail server also helps. I would
  21 suggest postfix instead of qmail, but whatever you know well will
  22 probably do. If your mail is relayed to you, get procmail also, or some
  23 other mda, otherwise calling all these will be difficult. I also give
  24 instructions for formail (part of the procmail package), although any
  25 similar mail handling utility can do.
  26
  27 HINT:
  28
  29 SECTION 1: INTRODUCTION.
  30
  31 This is long. The only consolation is that it's about all the reading
  32 you have to do. Some jargon first:
  33
  34         Spam = Unsolicited Bulk email, that is mail that the user did
  35 not subscribe for. People who subscribe to a mailing list agree to
  36 receive bulk mail. That is solicited. Spam is not. The word is from
  37 the "Spam" sketch in "Monty Python's Flying Circus", where a chorus of
  38 Vikings drowns out all conversation by repeating the word spam.
  39
  40         Ham = good mail
  41         A 'hit' is a test matching something in a mail.
  42         False hits are tests that hit ham.
  43         False Positive  = Good mail wrongly marked as spam
  44         False Negatives  = Spam wrongly let through
  45         Lint = Test validity of setup
  46
  47         Set your goals. Set your spam policy. I don't want bulk mail, I
  48 don't want any spam in my mail, and I will accept false positives.
  49 Relying on an isp for relaying mail, I cannot reject at smtp level, so I
  50 silently delete spam, after checking the subjects and sender. Others
  51 will be different, and your policy will differ accordingly.
  52
  53 In fighting spam, you have many tools. Collect your first one.
  54
  55 1. From this moment on, start keeping your spam. You need every bit of
  56 it you can hold onto, for testing. Don't read it, just store it in a
  57 mailbox somewhere. About a Meg or two is enough. Collect a few
  58 mailboxes with 50 or so, and at least one with a hundred.
  59
  60 http://razor.sourceforge.net/
  61
  62 2. Razor-agents. This operates by sending checksums of mail to a central
  63 server. If they have been reported as spam, the mail is markable as
  64 spam. If not, the checksums are discarded and you are told the mail is
  65 OK.  It's very good, but relies on reporting. For commercial use, send
  66 an email (explaining your linux installation) to partners@cloudmark.com
  67
  68 http://www.rhyolite.com/anti-spam/dcc
  69
  70 3. DCC, The Distributed Checksum Clearinghouse. This operates as above,
  71 sending checksums, but the dcc counts how many times it has received
  72 that checksum. That is what it reports. The dcc also keeps all
  73 checksums, so the server database is bigger. It goes back about six
  74 months. The DCC is an effective check for bulk mail. I believe
  75 commtouch offer a commercial service.
  76
  77 http://spamassassin.apache.org/downloads.cgi
  78
  79 4. SpamAssassin-3.1.0 is a major revision on previous versions. It
  80 offers heuristic or rule-based vetting of email and employs blocklists,
  81 and several novel and unusual features. Very configurable - the
  82 workhorse, and the PITA. Unlike most Perl applications, this one is
  83 inclined to land 'jam side down' or in a mess, and sorting is necessary.
  84
  85 5. Others exist. Notably, Amavisd-new and clamav. This is a sensible
  86 balance for a home user. You may want clamav if you are processing mail
  87 for windoze clients. Amavisd-new is a sort of sweeper process. The
  88 trouble is, all run on perl, and there's a limit to any box's workload.
  89 I may include them later.
  90
  91 Ownerships:
  92
  93 Preferred practice is not to run anything as root, and most of the mail
  94 programs will become user 'nobody' if they find themselves running with
  95 uid 0. Also, you do not want to make a 'super-luser' who has everything
  96 set up for him, as then if any process is breached, they have access to
  97 the whole box. So mail is handled by restricted users with few
  98 privileges until the delivery, which is done as the user to whom mail is
  99 delivered. The ultimate in this is qmail, which has a mexican wave of
 100 processes owned by users with shells like /bin/true, appearing and
 101 disappearing, playing pass-the-parcel while your mail goes through.
 102
 103 Installation instructions specify a recommended user. Make your choice.
 104
 105                 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 106
 107 SECTION 2. INSTALLING:
 108
 109 Spam:
 110
 111 1. The spam seems to land naturally. If it doesn't, I can probably send
 112 you some. But if you really want pain, register a domain. You instantly
 113 go on every spammer's list. Then you get email from spammers offering
 114 you a mailing list to spam with _every_ address from registered domains
 115 :-/. If spam doesn't land, what are you doing here?
 116
 117 Razor-agent:
 118
 119 2. Razor agents. You need razor-agents, and razor-agents-sdk.  You also
 120 need to know that this service is marketed to windoze users at profit,
 121 and the open source community receive it free, or cheap. Free for
 122 individuals, cheap for business use under linux.
 123
 124 This is a perl program. To avoid messing I recommend a symlink
 125 between /usr/lib/perl5 and /usr/local/lib/perl5. Presuming you
 126 are following LFS instructions, only one of those directories
 127 should exist now. If perl libs are on /usr/local, it will never
 128 check /usr/lib, and vice versa. This makes sure that what you
 129 install will be found.
 130
 131 Install razor-agents-sdk first with
 132         perl Makefile.PL &&
 133         make &&
 134         make test
 135 These should pass, then install with
 136         make install
 137
 138 Repeat for razor-agents
 139
 140 You get 4 tools, razor-check, razor-report, razor-revoke, and
 141 razor-admin, each with its own man page. The default log I have in
 142 /var/log/razor-agent.log instead of a homedir, but it should be owned
 143 and writable by the configured user.
 144
 145 After install, change to the razor user, and run 'razor-admin
 146 -create'.  You should now have a ~/.razor subdir.
 147
 148 Razor-admin -register registers an identity with cloudmark, which you
 149 need for reporting & revoking. Follow the prompts.  Razor attaches a
 150 seriousness level to your reports. If you report spam that nobody else
 151 ever does, you're an idiot. If you report what others subsequently do,
 152 that's good. Your revokes are also examined; If you revoke what isn't
 153 spam, that's good.  If you revoke the wrong stuff, you're a twit. That's
 154 all in their software, and don't worry. As good netizens receiving a
 155 free service, however, we want to provide feedback.
 156
 157 Tart up ~/.razor/razor-agent.conf to suit your site, copy the entire ~/.razor
 158 subdir to /etc/razor for a sitewide config. To allow other users to
 159 report, let them copy /etc/razor to ~/.razor and the same identity is
 160 used.
 161
 162 With the config done up above, you should be able to save off a spam
 163 email as its own mailbox (save to a mailbox called 'test' or
 164 something). In a terminal, type
 165
 166         'cat test | razor-check -d'
 167
 168 type 'cat test | razor-report' to report it.
 169
 170 If this doesn't happen, check the firewall. Open Outgoing TCP port 2703
 171 (Razor2) and TCP port 7 (Echo), then try again. Presuming trouble,
 172
 173 cat test | razor-report -d > somefile.txt gives you verbose output of
 174 actions and you can spot problems that way.
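
If you want that check inside a script, here is a minimal sketch. It
assumes razor-check exits 0 when the mail is listed as spam and non-zero
otherwise, which is what the procmail recipe in 'man razor-check' relies
on. The function name 'verdict' is my own invention, not part of razor,
and a network failure will also read as "not listed", so treat the
answer as a hint only.

```shell
# verdict CHECKER FILE
# Feed one saved message to a checker command and print the result.
# Assumption: the checker exits 0 for "listed as spam", as razor-check does.
verdict() {
    checker=$1
    msg=$2
    if "$checker" < "$msg" >/dev/null 2>&1; then
        echo "spam"
    else
        echo "not listed"
    fi
}

# Usage, with razor-agents installed and the box online:
#   verdict razor-check test
```

Passing the checker in as an argument keeps the function usable with any
other tool that follows the same exit code convention.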
 175
 176 Vipul does not want any automatic reporting set up. One exception is if
 177 you have mail addresses which you know are going to be 100% spam, as
 178 seeded spamtraps, and you may indeed forward them. We will want to
 179 report manually, being good netizens. Be aware that the checksums are on
 180 the body, as the headers will differ anyhow. Further if you report spam
 181 sent to a mailing list, you're a twit, because they usually add a
 182 footer, making the mailing list copy different from the original. The
 183 list owner can report it, as he gets an unmodified copy.
 184
 185
 186 DCC:
 187
 188 3. This is a bit trickier to play with.
 189
 190 tar -zxvf dcc.tar.Z opens the archive.
 191
 192 There is also dccm, a 'milter' for sendmail. If you use sendmail, and
 193 figure this out, please send me an appropriate chunk of hint on it, and
 194 I'll include it.
 195
 196 This is a small, < 1000 messages per day setup using anonymous
 197 settings. Over that, contact somebody for a service (e.g.
 198 Commtouch). Over 100k messages, you start to save bandwidth by
 199 running your own servers.
 200
 201 Select a user:group for this to live as and insert
 202 them in lines 2422 & 2423 of the configure script instead of
 203 'bin:bin'. No matter what options you provide, manpages will not
 204 install without this mod. Find that user's uid (in /etc/passwd)
 205 and put it in for UID in this line
 206
 207 ./configure --disable-server --disable-dccm --with-uid=UID \
 208 --with-rundir=/tmp  &&
 209 make
 210 Then 'make install' as root.
 211
 212 --disable-server does just that; --disable-dccm disables building the
 213 sendmail milter; --with-rundir=/tmp puts the dccifd.pid in /tmp.
 214 Otherwise it wants a user-writable /var/run/dcc/ for the pid, and some
 215 shutdown script clears out /var/run anyhow, removing /var/run/dcc/. This
 216 is all a pain in LFS.
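
Rather than reading /etc/passwd by eye, 'id -u' will print the uid for
you. The sketch below only prints the configure line for a given user,
so you can look at it before you run it. The function name is
hypothetical; the configure flags are the ones described in this hint.

```shell
# dcc_configure_cmd USER
# Print the DCC configure line with USER's numeric uid filled in.
# Nothing is executed here; copy the output or pipe it to sh yourself.
dcc_configure_cmd() {
    uid=$(id -u "$1") || return 1
    printf './configure --disable-server --disable-dccm --with-uid=%s --with-rundir=/tmp\n' "$uid"
}

# Usage (the user 'dcc' is only an example name):
#   dcc_configure_cmd dcc
```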
 217
 218 cd to /var/dcc and edit dcc_conf. You need to change
 219         DCC_RUNDIR = /tmp
 220         GREY_ENABLE = 'off' (blank) unless you know what you're up to.
 221         DCCIFD = on
 222         DCCIFD_ARGS = -m /var/dcc/map -t cmn,20 -S mail_host -x
 223
 224 The syslog facility in LFS is not mail.err, but mail.log. Fix that also,
 225 and anything else to suit your site. Check the final lines. Razor finds its
 226 own servers - dcc wants you to specify yours. Presuming you have a small
 227 private installation within their license, connect to the internet,
 228 back up /var/dcc/map and enter the config shell by typing (as root)
 229
 230 cd /var/dcc
 231 mv map map.orig
 232 cdcc  # This gives a cdcc shell. Enter the following:
 233
 234 cdcc>   load map.txt    # Takes in their map.txt of default servers
 235 cdcc>   trace default   # this delays, and returns information.
 236 cdcc>   info            # This should show resolved dcc servers. If it
 237                         # doesn't, check your internet connection. If
 238                         # 127.0.0.1 is your server, it's no use to you.
 239 cdcc>   new map         # should write /var/dcc/map, a map of servers
 240 cdcc>   quit
 241
 242 You have built
 243
 244         1. cdcc - a setup program
 245         2. dccproc - executable checker - mainly for you
 246         3. dccifd - The daemon used by spamassassin's spamd/spamc.
 247
 248 start the daemon with
 249         /var/libexec/dccifd -I user:group
 250
 251 It returns one line about changing uids and then retires into the
 252 background. 'pgrep dccifd' shows me 2 pids. There should be a (newly
 253 created) socket in /tmp, or maybe /var/dcc. 'pkill dccifd' should remove
 254 socket and pids. The user chosen should be able to write to (ie touch
 255 should succeed) the socket.
 256
 257 Other Configuration:
 258
 259         1. There is a whitelist /var/dcc/whiteclnt. Whitelist everyone
 260 you can think of - linuxfromscratch.org, ebay, paypal, and any other
 261 list server you may be on. The '-S mail_host' option told dccifd to
 262 check the mail_host header. This allows you to add mail_hosts
 263 to /var/dcc/whiteclnt in the appropriate section. Putting in IPs is no
 264 use. You can specify any header, but it only passes one, so don't
 265 specify mail_host if you want to use some other header.
 266
 267         2. There is a blacklist file, which isn't a lot of use as the
 268 spammers have to keep hopping from one place to another anyhow.  If
 269 certain weirdos stay stuck in the same place, they belong in a
 270 blacklist.
 271
 272         3. Greylisting is also an option. You may theoretically lose a
 273 small percentage of mail with this. It works as follows. In every mail
 274 transaction where this is done, your mail server says "Not right now -
 275 I'm busy. Send it in half an hour" Proper mail servers will send it
 276 later. Poorly set up mail servers may lose mail, either by not sending,
 277 or resending immediately and then giving up. Spammers will not resend in
 278 99% of cases, seeing as they can't hold messages back while relaying
 279 illegally through other servers with ease. So you don't get spammed, and
 280 your name comes off their list. That's the theory.
 281         Forget this if you have pop or imap. You'll reject nothing -
 282 just leave them on your server. This is for directly connected boxes
 283 receiving their mail by smtp only.
 284
 285 Some words on querying: dccproc is like razor-check, except it reports
 286 as well by default. If you check & report ham repeatedly with dcc, the
 287 count keeps going up. Use the -Q option for repeat tests to avoid
 288 reporting again.  Each user is supposed only to report each mail once.
 289 For your tests, cat message | dccproc -QC checks and computes checksums
 290 without reporting.
 291 I would suggest a startup script for dcc and spamd (The server end of
 292 spamassassin). Mine is available.
 293
 294 The threshold figure is set by -t. The three checksums are body, fuz1
 295 and fuz2. All are covered by the 'cmn' setting. DCC say to set them at
 296 'many'. I found results disappointing, and set it to 20, where things
 297 worked better. My dccifd options are
 298
 299 -I luser:group  # Who it runs as. A real person, please.
 300                 # You need this or it runs as root!
 301 -p /tmp/dccifd  # Location of socket.
 302 -m /var/dcc/map # Location of map [Default /var/dcc/map]
 303 -d -B set:debug # Debug (both options)
 304 -x              # Try extra hard to connect to a server (I needed that)
 305 -t cmn,20       # Set all thresholds to 20
 306
 307 Make sure to finish the 'stop' section with rm -f /tmp/dccifd to
 308 remove a stray socket if it exists. An old socket or pid will prevent dccifd
 309 from restarting.
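
Here is a minimal sketch of that cleanup for the 'stop' section. It is
not my actual init script. The function name is hypothetical, and it
assumes the pid file holds a single pid and that the script runs as root
(otherwise 'kill -0' cannot see other users' processes and will wrongly
call them dead).

```shell
# clean_stale SOCKET PIDFILE
# Remove a leftover socket and pid file, unless the recorded pid is alive.
clean_stale() {
    sock=$1
    pidfile=$2
    if [ -f "$pidfile" ]; then
        pid=$(cat "$pidfile")
        # kill -0 sends no signal, it only tests that the process exists
        if [ -n "$pid" ] && kill -0 "$pid" 2>/dev/null; then
            echo "still running as pid $pid, leaving files alone" >&2
            return 1
        fi
    fi
    rm -f "$sock" "$pidfile"
}

# Usage, with the paths used in this hint:
#   clean_stale /tmp/dccifd /tmp/dccifd.pid
```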
 310
 311 To test, cat test |dccproc -QC
 312
 313 It should return something like this
 314
 315 X-DCC-CollegeOfNewCaledonia-Metrics: genius 1189; Body=47 Fuz1=84
 316 Fuz2=84
 317                             reported: 0               checksum  server
 318                  env_From: 5469b142 22af2632 54c4c668 28e32b2e
 319                      From: 55e30375 f82be1b7 c4cd63f1 1a942cc3
 320                Message-ID: 70489480 1a6e3c39 561ad9e9 5d9d6b1d
 321                  Received: d6b6cd69 a686160f 3a6cbc4b 0680596e
 322                      Body: 213f0668 14a13b4f de8a25e1 3ebf5548      47
 323                      Fuz1: 965e5582 e856e858 e775658e 00321ffd      84
 324                      Fuz2: 4f6dc268 7b2844ec 6444c79a e3508371      84
 325
 326
 327 You should not see 127.0.0.1. If you don't see the count, drop the -Q
 328 once. Lastly, run your startup command for dccifd. Stdout should see
 329
 330 getpwnam(genius:users): Success.  The socket should be created, thusly
 331
 332 srw-rw-rw-    1 root     root     0 2005-11-21 06:49 /tmp/dccifd=
 333
 334 A favourite failure mode is to start & exit, leaving the socket, & maybe
 335 even the pid file, thus preventing future startups. Permissions!
 336
 337 Once dccifd is running, you need to use spamassassin to check that it is
 338 working, but results from dccproc are a very good indicator.
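
To compare the count against your own threshold without squinting at the
output, something like the following may help. It is a sketch only: both
function names are made up, and it assumes the Body= figure sits on the
same line as the X-DCC header name, as in the sample output above. DCC
can also report the word 'many' instead of a number, which is treated
here as always over the limit.

```shell
# dcc_body_count: read dccproc output on stdin, print the Body= count.
dcc_body_count() {
    sed -n 's/^X-DCC-.*[; ]Body=\([0-9a-z]*\).*/\1/p' | head -n 1
}

# over_threshold COUNT LIMIT: succeed if COUNT reaches LIMIT.
over_threshold() {
    case $1 in
        many)        return 0 ;;          # DCC's word for a very large count
        ''|*[!0-9]*) return 1 ;;          # empty or not a number
        *)           [ "$1" -ge "$2" ] ;;
    esac
}

# Usage:
#   count=$(cat test | dccproc -QC | dcc_body_count)
#   over_threshold "$count" 20 && echo "bulk"
```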
 339
 340
 341                 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 342
 343 SPAMASSASSIN:
 344
 345 4. Cancel the day's appointments and buy yourself in some alcoholic
 346 tranquilizer. You may need it. Open the archive.  Become root.
 347 If you had a previous version of Spamassassin, read the UPGRADE
 348 file.  Heavy going.
 349
 350
 351         REQUIREMENTS:
 352
 353 IPV6 in kernel (Some module 'requires' ipv6, which needs kernel support)
 354 OpenSSL-0.9.7   For the SSL modules and fancy encrypted stuff
 355 DB-4.3.27       for the database stuff. Perhaps mysql would do..tell me.
 356 Perl + Modules as outlined below.  I had version 5.8.5, and gcc-3.3.1
 357
 358         OPTIONAL:
 359
 360 pcre            Check mail with Perl Compatible Regular Expressions
 361 Formail         For playing with mailboxes
 362 Mysql           depending on your database preferences.
 363 mboxsplit       The spamassassin substitute for formail. A real puzzle.
 364
 365
 366         GOTCHA:
 367
 368 Installs will decide for you whether your perl libs are in
 369 /usr/local/lib/perl5 or /usr/lib/perl5. Only one of those should exist.
 370 If both exist, modules previously installed  have created the lib in the
 371 wrong place, and you have a problem there. Prevent it happening by
 372 symlinking.
 373
 374 ln -s /usr/<existing>/lib/perl5 /usr/<non-existing>/lib
 375
 376 That way, all files end up in one location. Some will reference it as
 377 /usr/local/lib/perl5, and some (Inc spamassassin) as /usr/lib/perl5
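
The same idea as a small shell function, for anyone who prefers not to
fill in the <existing> bits by hand. This is a sketch under the
assumption that exactly one of the two directories exists. The function
name is made up, and the two paths can be overridden as arguments so you
can try it on a scratch directory first.

```shell
# link_perl_libs [A] [B]
# If only one of the two perl lib dirs exists, symlink the other to it.
link_perl_libs() {
    a=${1:-/usr/lib/perl5}
    b=${2:-/usr/local/lib/perl5}
    if [ -L "$a" ] || [ -L "$b" ]; then
        echo "already symlinked, nothing to do"
        return 0
    fi
    if [ -d "$a" ] && [ ! -e "$b" ]; then
        mkdir -p "$(dirname "$b")" && ln -s "$a" "$b"
    elif [ -d "$b" ] && [ ! -e "$a" ]; then
        mkdir -p "$(dirname "$a")" && ln -s "$b" "$a"
    else
        echo "need exactly one of $a and $b to exist; sort it out by hand" >&2
        return 1
    fi
}

# Usage (as root, with the default paths):
#   link_perl_libs
```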
 378
 379         INSTALLATION:
 380
 381 Open the Mail-SpamAssassin archive, log in as a luser and open
 382 the INSTALL in one console(1), while you raid CPAN as root in the
 383 second (2). I would recommend another root console (3), to sort things
 384 out. The commands you need in (2) are
 385
 386         perl -MCPAN -e shell    # open a perl shell
 387         o conf prerequisites_policy ask # get prerequisites
 388
 389 That sets you up. Then
 390
 391         i <Module::Name>  # What's the story with <Module::Name>
 392
 393         install <Module::Name> # guess!
 394
 395 In the spamassassin install file (1) find the section "Modules".
 396 Optional modules are not really optional. Below is my list from
 397 /usr/local/lib/perl5/5.8.5/i686-linux/perllocal.pod. The order is left
 398 to right, top to bottom. That will minimize the hitches.
 399
 400 Module::Info            Digest::SHA1
 401 HTML::Tagset            HTML::Parser
 402 Digest::HMAC            Net::IP
 403 Net::DNS                Net::CIDR::Lite
 404 Sys::Hostname::Long     Mail::SPF::Query
 405 IP::Country             Time::HiRes
 406 Business::ISBN::Data    Business::ISBN
 407 Compress::Zlib          MIME::Base64
 408 Archive::Tar            Algorithm::Diff
 409 Text::Diff              Net::SSLeay
 410 IO::Socket::SSL         Crypt::OpenSSL::Random
 411 Crypt::OpenSSL::RSA     Mail::DomainKeys
 412
 413 Razor-agents-sdk also installs some of these modules, and some other
 414 ones. Above is the Spamassassin list.
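
Before raiding CPAN, you can ask perl which of these it can already
load. A minimal sketch: 'perl -MModule::Name -e 1' exits non-zero when
the module is missing. The two function names are hypothetical.

```shell
# have_module NAME: succeed if perl can load the module.
have_module() {
    perl -M"$1" -e 1 >/dev/null 2>&1
}

# missing_modules NAME...: print the names perl cannot load, one per line.
missing_modules() {
    for m in "$@"; do
        have_module "$m" || echo "$m"
    done
}

# Usage, with a few names from the list above:
#   missing_modules Digest::SHA1 HTML::Parser Net::DNS Mail::SPF::Query
```

Whatever it prints is your shopping list for the CPAN shell.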
 415
 416 If you have anything of value in /usr/share/spamassassin or
 417 /usr/local/share/spamassassin, _back_it_up_! It will get overwritten or
 418 wiped. Any bizarre rulesets can go in /etc/mail/spamassassin.
 419
 420 Finally, install Spamassassin with
 421
 422         perl Makefile.PL &&
 423         make &&
 424         make test &&    # Bless your patience :)
 425         make install
 426
 427 If you install it before updating perl, it barfs over some modules.
 428 Now, you probably will have /usr/share/spamassassin full of the latest
 429 rules.
 430
 431
 432         CONFIG:
 433
 434 Here's where I hope you have pcregrep and formail. By now things are
 435 usually basically operable, but in a mess. I would suggest surfing to
 436
 437 http://www.rulesemporium.com/rules.htm
 438
 439 and download whatever rule sets you choose. Pop them in
 440 /etc/mail/spamassassin. As root, mv the original local.cf (if it exists)
 441 aside and download mine. Pop it likewise in /etc/mail/spamassassin.
 442 Download 70_sare_sc_top200.cf also. Don't install it, just keep it handy.
 443
 444 Enable all plugins. The plan apparently is to keep adding .pre files for
 445 plugins. I suggest leaving init.pre untouched and enabling all plugins
 446 in v310.pre. The lines are
 447
 448 in init.pre:
 449
 450 loadplugin Mail::SpamAssassin::Plugin::URIDNSBL
 451 loadplugin Mail::SpamAssassin::Plugin::Hashcash
 452 loadplugin Mail::SpamAssassin::Plugin::SPF
 453
 454 in v310.pre:
 455
 456 loadplugin Mail::SpamAssassin::Plugin::RelayCountry
 457 loadplugin Mail::SpamAssassin::Plugin::Razor2
 458 loadplugin Mail::SpamAssassin::Plugin::TextCat
 459 loadplugin Mail::SpamAssassin::Plugin::AntiVirus
 460 loadplugin Mail::SpamAssassin::Plugin::Pyzor
 461 loadplugin Mail::SpamAssassin::Plugin::DCC
 462 loadplugin Mail::SpamAssassin::Plugin::SpamCop
 463 loadplugin Mail::SpamAssassin::Plugin::AutoLearnThreshold
 464 loadplugin Mail::SpamAssassin::Plugin::AccessDB
 465 loadplugin Mail::SpamAssassin::Plugin::WhiteListSubject
 466 loadplugin Mail::SpamAssassin::Plugin::DomainKeys
 467 loadplugin Mail::SpamAssassin::Plugin::MIMEHeader
 468 loadplugin Mail::SpamAssassin::Plugin::ReplaceTags
 469
 470
 471 Download my init script or write your own. You need to start dccifd
 472 (because spamc/spamd use that) and spamd. Spamassassin wants to be a
 473 user, but not a real one. I added the user spamc in the group postfix.
 474 I have a pause (5 seconds) in the restart option so spamd will let go
 475 of ports before they try to take hold again. My spamd options are:
 476
 477 -d              # Daemonize = get lost in the background
 478 -l              # allow learning thus facilitating bayes
 479 -m 10           # Max processes. These are seriously memory hungry.
 480                 # I only have 10 to facilitate mass tests. 5 is plenty.
 481 -u spamc        # run as user spamc. Otherwise it's nobody, and
 482                 # things fall over, because nobody can't write.
 483
 484         Now I presume you will copy in my available config file and edit
 485 that, rather than your own. I describe a sitewide config, but user
 486 configs can be created, and maintained by different users. The same process
 487 applies. spamassassin -c creates a user config. You can test your setup with
 488 (as anybody:)
 489
 490         cat test | spamc -R     # you should get a report, and an extract.
 491
 492 root is a positive disadvantage for all mail tests, as these programs
 493 refuse to hold onto root privileges, and drop to a specified user, or to
 494 nobody. They are all called by the user _receiving_ the mail, so they
 495 can write in his maildir, which typically has 0600 permissions. Root
 496 will never receive mail this way, as user nobody certainly can't write
 497 to root's directory! Alias root to a user. You need root for starting these
 498 tools, however.
 499
 500         Sorting out the bugs in things (There will be many) is achieved
 501 by these commands.
 502
 503         1. Run 'spamassassin -D --lint > debug.txt 2>&1' and examine this
 504 file for negatives.
 505         2. Change the -d to -D for spamd and restart from a root
 506 terminal. It will hold the terminal, and spew information.
 507
 508         3. Poring over the entrails of /var/log/mail.log. All mail
 509 programs write to mail.log. If someone knows how to set up a separate
 510 syslog facility, let me know and I'll stuff one in for spam. I did have
 511 a go myself, but things fell over so I reverted.
 512
 513 Look for the things that didn't happen, and config lines not parsed.
 514 Your rulesets, I presume, will be different from mine. Here's mine:
 515
 516 [root@genius ~]# ls /etc/mail/spamassassin
 517 20_dec.cf           70_sare_html1.cf    72_sare_bml_post25x.cf
 518 99_DEC_Tripwire.cf  70_sare_adult.cf    70_sare_obfu.cf      82_antidrug.cf
 519 99_FVGT_meta.cf     init.pre            70_sare_genlsubj0.cf 70_sare_oem.cf
 520 88_FVGT_body.cf     local.cf            70_sare_genlsubj1.cf 70_sare_spoof.cf
 521 88_FVGT_headers.cf  local.orig          70_sare_header0.cf   70_sare_uri0.cf
 522 88_FVGT_rawbody.cf  nohits/             70_sare_header1.cf   70_sare_uri1.cf
 523 88_FVGT_subject.cf  spam@               70_sare_html0.cf     70_sare_uri_eng.cf
 524 88_FVGT_uri.cf      v310.pre
 525
 526 20_dec.cf are my own rules, nohits/ sidelines dud rulesets, and spam@ is a
 527 symlink to /usr/share/spamassassin.
 528
 529 ln -s /usr/share/spamassassin    /etc/mail/spamassassin/spam
 530
 531 Spamassassin ignores subdirs, so you can have an archive. The bigger
 532 your throughput, the fewer rules you want to avoid loading the system.
 533 The best ones of the above lot are the sare header, html, uri, drug &
 534 adult. The FVGT rules are very efficient by comparison with some sare
 535 rules. The higher the number, the later it is read, and the more priority
 536 it has. Presuming you sort your bugs, you now have an integrated
 537 sitewide anti-spam setup.
 538
 539         You now need one other item of information. Are your mails being
 540 checked against blacklists (like spamcop, sorbs.net) upstream? To find
 541 out, use 70_sare_sc_top200.cf. View it in one console and cd to your
 542 subdir with the spam mailboxes (I am presuming they are named spam1,
 543 spam2, etc). The first entry in 70_sare_sc_top200.cf today is
 544
 545 Received =~ /\b12\.(?:210\.176\.205|211\.4\.79|217\.81\.151)\b/
 546
 547 Now you can check for that with pcregrep. You cannot restrict your
 548 search to the Received line too handily, but you can do this
 549
 550 pcregrep '\b12\.(?:210\.176\.205|211\.4\.79|217\.81\.151)\b' spam?
 551
 552 any instances will show. You will notice I removed the /regex/
 553 delimiters and replaced them with 'regex'. Just one other word of
 554 warning: pcregrep appears not to like the /i at the end of most regexes
 555 in the rules. Use pcregrep -i and remove the /i. You can also use -c to
 556 check the number of times. I do not get any instances of the top200
 557 spammers, so I presume the top 200 are not getting through directly to
 558 me. The ruleset is therefore unnecessary for me. I can get hits from
 559 the more obtuse dns blocklists, so not all are being checked.
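
The delimiter and /i juggling can be scripted. What follows is a sketch
only: the three function names are mine, it assumes the rule regex is
written as /regex/ or /regex/flags with lower case flag letters, and
only the 'i' flag is translated. Anything fancier you must convert by
hand.

```shell
# sa_regex_body '/regex/flags': print the regex without delimiters or flags.
sa_regex_body() {
    printf '%s\n' "$1" | sed -e 's,^/,,' -e 's,/[a-z]*$,,'
}

# sa_regex_flags '/regex/flags': print only the trailing flag letters.
sa_regex_flags() {
    printf '%s\n' "$1" | sed -n 's,^.*/\([a-z]*\)$,\1,p'
}

# sa_grep '/regex/flags' FILE...: run pcregrep, adding -i when /i was given.
sa_grep() {
    re=$1
    shift
    case $(sa_regex_flags "$re") in
        *i*) pcregrep -i "$(sa_regex_body "$re")" "$@" ;;
        *)   pcregrep "$(sa_regex_body "$re")" "$@" ;;
    esac
}

# Usage:
#   sa_grep '/\b12\.(?:210\.176\.205|211\.4\.79|217\.81\.151)\b/' spam?
```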
 560
 561 If you haven't got pcre, egrep -e will apply posix rules which are
 562 close, but different. The main weakness is in unusual character types
 563 like \d which do not behave in egrep.
 564
 565
 566 INTEGRATION:
 567
 568 Penultimately, Integration. If your mail is relayed to you, use
 569 procmail. If you are online 24/7 and serious.spammer.co.tw can reach
 570 your box directly, set up a reject configuration in your mail server.
 571 The amavisd-new package includes many configuration options for weird and
 572 wonderful mail servers with a better understanding of them than you
 573 will usually find in the documentation.
 574
 575 Think this course through. Mailing lists will get spam, and will forward
 576 it. If you bounce repeatedly to a mailing list, you will be
 577 unsubscribed, sometimes automatically.
 578
 579 Procmail's recipe looks like this (in ~/.procmailrc)
 580
 581 :0fw
 582 | /usr/bin/spamc
 583 :0:
 584 * ^X-Spam-Level: \*\*\*\*\*
 585 $HOME/Mail/spam
 586
 587 That pipes through spamd (which calls razor & dcc) and dumps it in a
 588 spam mailbox on 5 stars. man procmail or man procmailex help here.
 589 Those exact procmail lines put spam in ~/Mail/spam. Make sure it exists.
 590 If you are content to reject on razor's say so, you can take the recipe
 591 from 'man razor-check', not load the spamassassin razor2 plugin, and preline
 592 it in procmail. This imposes a memory load (The 'c' in ':0Wc' means 2 message
 593 copies, 2 procmail instances) but avoids spamassassin. I ran for some
 594 months with this setup, it plucked 70% of spam and had one false
 595 positive (From the LFS list :-/.) In this case, reduce your spamd
 596 instances.
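
To see what the 5 star test would make of a mail already on disk, you
can count the stars yourself. A minimal sketch, assuming the mail has
been through spamc and so carries an X-Spam-Level header. The function
name is made up. Only the header block is read, so stars quoted in the
body do not count.

```shell
# spam_stars: read one message on stdin, print the number of stars in
# its X-Spam-Level header (0 if there is no such header).
spam_stars() {
    sed -n -e '/^$/q' -e 's/^X-Spam-Level: *\(\**\).*/\1/p' |
        head -n 1 | tr -d '\n' | wc -c | tr -d ' '
}

# Usage:
#   [ "$(spamc < test | spam_stars)" -ge 5 ] && echo "would go to ~/Mail/spam"
```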
 597
 598
 599
 600                 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 601
 602 SECTION 4: TUNING:
 603
 604 The standard spamassassin config is very soft, and lets some spam
 605 through. Mine is short on negative rules, and hard on porn particularly.
 606 Even if you don't want to use mine, download it and lint with it once,
 607 as it will show you errors on other places. Your friends are
 608
 609 man Mail::SpamAssassin::Conf
 610 man Mail::SpamAssassin::Plugin::Name  (e.g. URIDNSBL)
 611
 612 Beware of the latter manpages, as they drift between config options and
 613 rules pretty seamlessly without telling you. Next tune up! As root,
 614
 615 vim /etc/mail/spamassassin/local.cf
 616
 617 Looking at my local.cf, the first things are basic setup. Leave the first
 618 line there unless you are using nfs, in which case it must come out. The
 619 host 216.171.238.83 is linuxfromscratch.org.
 620
 621 PYZOR config options are there, but commented out. I tried it, and found
 622 it very little use. You can run a local server in a large outfit and
 623 allow your users to blacklist dynamically this way. It also runs in
 624 python, which is another interpreter and libs to load. They recommend
 625 readyexec, which takes care of that some clever way. Suit yourself.
 626 The install is a doddle, but not worth it, imho.
 627
 628 DCC options are clear enough - paths to everything, and much of the
 629 stuff on the dccifd command line. The very last option is for dccproc.
 630 Spamc/spamd use dccifd, the daemon, and if not found, dccproc. Dccproc
 631 is more resource hungry (starting a new process every time). If dccifd
 632 is there but not running, you barf.
 633
 634 The -B  option sets a check on spamhaus.org, which returns 127.0.0.2 as a
 635 positive result. Multiple -B options are allowed. It's there really as
 636 an example because the docs are _soo_bad.
 637
 638 RAZOR options are simple. It's neat code.
 639
 640 BAYES options allow learning from ham/spam. Also there are uridnsbl
 641 (blocklist stuff). If you don't need the blocklist, comment these out
 642 and comment out URIDNSBL in /etc/mail/spamassassin/init.pre
 643
 644 SPF is Sender Policy Framework. ISPs should have a policy, and the mail
 645 is checked against that. Weak, but it catches the occasional thing.
 646
 647 Next come whitelist_from entries. Include family, friends, business contacts,
 648 paypal (If you're registered). The bayes_ignore entries should be all
 649 mailing lists, as some get spam, and their spam score will rise
 650 otherwise.
 651
 652 Finally we get rules, listed under groups as one progresses through an
 653 email, and scored. The general policy is to assign a weight to each rule,
 654 so that spam arrives at a total score of 5 or above, and other mail
 655 keeps a total below 5. To check any rule (This is where the 'spam'
 656 symlink comes in handy) cd to /etc/mail/spamassassin and type
 657
 658 grep -r RULE_NAME *
 659
 660 Here's an example
 661 lfs:/etc/mail/spamassassin$ grep -r FORGED_RCVD_HELO *
 662
 663 local.cf:score FORGED_RCVD_HELO 1.22
 664 spam/20_head_tests.cf:header FORGED_RCVD_HELO eval:check_for_forged_received_hel
 665 o()
 666 spam/20_head_tests.cf:describe FORGED_RCVD_HELO Received: contains a forged HELO
 667 spam/50_scores.cf:score FORGED_RCVD_HELO 0 0 0 0.135
 668
 669 20_head_tests is an original spamassassin ruleset. spam/50_scores.cf is
 670 the default score: 0 in the first three score sets, 0.135 in the fourth
 671 (the sets are: no bayes or network tests, network only, bayes only, both).
 672 So by default it scores basically
 673 nothing, but I have lifted it to 1.22. It is an excellent indicator of
 674 spam or the linuxfromscratch lists where half cocked mail setups abound.
 675 If your mailer gives out a domain that a dns check can't resolve, you're
 676 in trouble here. If you have a legit A and MX record where people would
 677 expect to find them, you're ok. All broadband modems have addresses in the
 678 range of the isp, so if your private network goes out, something smells.
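
The four numbers on a score line are easy to misread, so here is a small
sketch of how one gets picked. As the SpamAssassin configuration docs
describe it, the columns are score sets: set 0 is Bayes off and network
tests off, set 1 is network tests only, set 2 is Bayes only, set 3 is
both. A line with a single number applies to every set. The function
name pick_score is made up for this example.

```shell
#!/bin/sh
# pick_score BAYES NET SCORE...
# BAYES and NET are 1 (enabled) or 0 (disabled); SCORE... is either one
# value or the four values from a "score RULE a b c d" line.
pick_score() {
    bayes=$1
    net=$2
    shift 2
    # A single listed score is used for every score set.
    if [ "$#" -eq 1 ]; then
        echo "$1"
        return
    fi
    # set 0: neither, 1: network only, 2: Bayes only, 3: both
    set_no=$(( bayes * 2 + net ))
    shift "$set_no"
    echo "$1"
}

# Default line from 50_scores.cf, Bayes and network tests both on:
pick_score 1 1 0 0 0 0.135
# The local.cf override has one value, so it always applies:
pick_score 0 0 1.22
```

So with spamd running Bayes and network tests, the default for
FORGED_RCVD_HELO is 0.135, and the local.cf line replaces it with 1.22.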
 679
 680 Mime and html rules are very good. Mind you, I have trained most people
 681 to send text. If you use html a lot, back some of these off. Some are still
 682 excellent spam indicators, even if you want to allow for half-assed mail
 683 from m$ outlook etc. These ones are always good:
 684
 685 HTML_EMBEDS 3           HTML_FONT_BIG 3
 686 HTML_FONT_LOW_CONTRAST  HTML_FONT_INVISIBLE     HTML_IMAGE_ONLY_04
 687 HTML_IMAGE_ONLY_08      HTML_IMAGE_ONLY_12      HTML_IMAGE_RATIO_(all)
 688
 689 The high ratios are also useful. Even outlook sends text as well.
 690 The MIME tests are excellent also.
 691
 692 The default spamassassin is ambivalent to porn (some want this stuff?). I
 693 don't, so porn words are heavily punished in my config.
 694
 695 Tests that throw false positives are:
 696
 697 FORGED_<SOMEWHERE>_RCVD
 698
 699 anything. Example: a (top-posted) reply comes from hotmail.com to a
 700 question from yahoo.com, and you get FORGED_YAHOO_RCVD.
 701
 702 Clever tests like backhair trip over linux program versions. Posted
 703 kernel configs are all CAPS. Spam signs are detected in directory
 704 names.  A subject line like VIA GRATIS (The way of thanks in latin) also
 705 has VIAGRA in there. You can't make a rule against 'love' because
 706 'glover' is a surname. Tune accordingly. Try this:
 707
 708 cat spam1 |formail -n 2 -ds spamc -R >> spam1_reports   # presuming ~50 messages
 709
 710 and repeat for all the others. DO NOT try that on a big mailbox, as
 711 spamc processes detach from formail, and it starts another before you
 712 finish. In 400 emails, I had 200 spamc processes looking for 10 spamd
 713 processes in one test. Then the modem backed up, and I lost all dns
 714 tests. If you don't have spare memory, drop the '-n 2' option and wait.
 715 The '-ds' splits the mailbox and pipes to the following command.
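
One way to avoid the pile-up is to count the messages first and only use
'-n 2' on a small mailbox. This is a sketch, not something from my own
setup: the function names and the limit of 50 are made up, and it counts
messages by their mbox "From " separator lines.

```shell
#!/bin/sh
# count_msgs MBOX: number of messages, counted by "From " separator lines
count_msgs() {
    grep -c '^From ' "$1"
}

# pick_formail_opts MBOX [LIMIT]: print the formail options to use.
# Small mailboxes get parallel splitting, big ones are done one at a time.
pick_formail_opts() {
    limit=${2:-50}
    if [ "$(count_msgs "$1")" -le "$limit" ]; then
        echo "-n 2 -ds"
    else
        echo "-ds"
    fi
}

# Usage (left commented out, as it needs formail and a running spamd):
#   formail $(pick_formail_opts spam1) spamc -R < spam1 >> spam1_reports
```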
 716
 717 Then try it on your ham, your saved messages, to show false positives.
 718 Also,
 719
 720 cat <mailbox> |formail -n 2 -ds spamc -c, which simply outputs a line per
 721 message with the score/threshold.
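
To turn that 'spamc -c' output into a quick tally, something like the
following sketch can help. It assumes each line reads score/threshold
and that 0/0 means spamc hit an error. The function name count_flagged
and the sample numbers are made up.

```shell
#!/bin/sh
# count_flagged: read "score/threshold" lines on stdin and print how many
# messages reached their threshold. Lines reading 0/0 are skipped.
count_flagged() {
    awk -F/ '$2 > 0 && $1 >= $2 { n++ } END { print n + 0 }'
}

# Made-up sample output, one line per message.
printf '7.2/5.0\n1.1/5.0\n5.0/5.0\n0/0\n' | count_flagged
```

Against a real mailbox it would sit on the end of the pipe:
cat <mailbox> |formail -ds spamc -c | count_flagged
Run it on a ham box and anything above zero is a false positive to chase.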
 722
 723 Another option is ' cat yourmail | procmail -d $USER ' and then it pops
 724 into ham or spam boxes appropriately. If you want to retest mail that
 725 already carries spamassassin headers, try this line
 726
 727 cat <mailbox> |formail -ds spamassassin -d >> file
 728
 729 This removes the markups, but it is not 100% reliable, so this sed
 730
 731 sed -e '/X-Spam/d' -e '/>From/d' < input_file  > output_file
 732
 733 clears the remains. A sure sign that something has tripped over an old
 734 markup  is a NO_RELAYS hit in the retests.
 735
 736 Once you get spamd running and working, the above process is necessary
 737 before repeat checks. Killing dccifd before repeats is also clever. You
 738 can razor-check all you like. Remember to remove the socket if you kill
 739 dccifd. Or restart it with the "Query-only" option.
 740
 741 cat ham2 |formail -ds spamc -R |less gives you the reports and an
 742 extract on successive lines. Open consoles as you need them. On another console,
 743 get any ham marked as spam onscreen and presuming gpm is working, you
 744 can find the problem this way.
 745
 746 Get the rule onscreen   grep -r SOME_RULE_NAME /etc/mail/spamassassin/*
 747 and locate the regex
 748
 749 Set up the test         pcregrep -i 'whatever_regex' yourmail
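
Here is a worked illustration of testing a regex against text, using the
VIA GRATIS and glover cases. It uses plain grep -E rather than pcregrep
so it runs anywhere. The loose pattern is my guess at the kind of regex
that catches obfuscated spellings, not a rule copied from any ruleset,
and the function names are made up.

```shell
#!/bin/sh
# hits PATTERN TEXT: succeed if the extended regex matches TEXT,
# ignoring case
hits() {
    printf '%s\n' "$2" | grep -Eiq -- "$1"
}

# hits_word WORD TEXT: same, but WORD must stand alone as a whole word
hits_word() {
    printf '%s\n' "$2" | grep -iqw -- "$1"
}

# Optional filler between letters is what lets the space slip through.
loose='v.?i.?a.?g.?r.?a'

hits "$loose" 'Subject: VIA GRATIS' &&
    echo "loose pattern hits VIA GRATIS"
hits 'love' 'From: J. Glover' &&
    echo "substring 'love' hits Glover"
hits_word 'love' 'From: J. Glover' ||
    echo "whole-word 'love' leaves Glover alone"
```

In a spamassassin rule the whole-word version would be written with \b
on either side of the word. See man perlre.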
 750
 751 In the general run of play, you can probably lower my html scores, and
 752 adjust for your own situation. If you are a doctor, you will obviously
 753 have to adjust or whitelist any mail sources that send mail about drugs.
 754
 755 Try to find negative rules that apply to your situation. To add a rule,
 756 find a similar rule. Don't fiddle with the 'eval do something' type
 757 rules as they are spamassassin builtins. The various header lines are
 758 specified by this sort of thing: "Received =~ /regex/", which checks just
 759 those lines. Invent your own rules as appropriate. These headers (Received,
 760 From, Subject, etc.) are all in ram as variables when a message is
 761 checked. Invent your own regex, and don't forget to run
 762
 763 spamassassin -D --lint afterwards to check it out. Never mind what the
 764 errors are (some mistakes show up somewhere else); undo what you did last
 765 and lint again. man perlre helps. Unrecognized options are a sign of missing
 766 plugins. I, for instance, do not use HashCash or RelayCountry plugins.
 767 If you decide to use them, enter the options off the man page.
 768 "Score set for nonexistent rule" in the lint means you are not using the
 769 same rules as me. Just remove the relevant line from local.cf.
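
For illustration, a pair of home-made header rules in local.cf might
look like this. The LOCAL_* names, the regexes and the scores are all
invented for the example, and the second rule assumes your list mail
carries a List-Id header naming the list's domain. Check a real message
before relying on that.

```
# local.cf: LOCAL_* rule names are invented for this example
header    LOCAL_SUBJ_REPLICA  Subject =~ /\breplica\b/i
describe  LOCAL_SUBJ_REPLICA  Subject mentions a replica
score     LOCAL_SUBJ_REPLICA  0.8

# a negative rule: mail that came through a list you read
header    LOCAL_LFS_LIST      List-Id =~ /linuxfromscratch\.org/i
describe  LOCAL_LFS_LIST      Posted through a linuxfromscratch list
score     LOCAL_LFS_LIST      -1.0
```

Lint after each addition, one rule at a time, so a mistake is easy to
back out.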
 770
 771 Keep your spam for a month at least after you set the system running.
 772 You ideally need reports back of false positives and false negatives. Never
 773 get cocky, as there will be both. Tune up periodically. Spam changes.
 774
 775 My current ratio is
 776         ~ 99% of all spam successfully caught.
 777         ~ 3% of ham marked as spam (entirely from the lfs lists). This
 778 is a high figure, but I'm lazy. The real problem is that if the query
 779 goes to spam, the answers do also. I retuned recently, and removed the
 780 Tripwire ruleset so I expect things will be better.
 781
 782 What gets through is mail that mimics your own mail, and genuinely sent
 783 spam from webmail, short stuff, that doesn't trigger enough to top the
 784 spam score. What gets wrongly caught usually is misinterpreted signs of spam.
 785 Regexes are a non-thinking tool. This sort of email
 786
 787 "Do you require a timepiece? http://spamsite.com/"
 788
 789 is brief enough to be difficult to hit. Save off false positives and false
 790 negatives individually, and get them to land correctly by readjusting scores,
 791 linting, and restarting your spamd daemons.
 792
 793 To correct the bayes learning, you can use
 794
 795 sa-learn --ham --mbox <filename>        OR
 796 sa-learn --ham <filename> for a single email
 797 sa-learn --forget does just that, and the database can be rebuilt.
 798 Likewise sa-learn --spam learns the other way. See man sa-learn.
 799
 800
 801 ACKNOWLEDGEMENTS:
 802
 803 Authors of all software, and the regex Maestros of the anti-spam
 804 community.
 805
 806
 807 CHANGELOG:
 808 Nov. 21st 2005 Major Edit of inaccuracies, spellings, self congratulation &
 809 waffle. Tweak config files.
 810
 811 Nov. 15th 2005: Finished this 1st draft.