/******************************************************************************
 * $Id: spamdbm.cpp 30630 2009-05-05 01:31:01Z bga $
 *
 * This is a BeOS program for classifying e-mail messages as spam (unwanted
 * junk mail) or as genuine mail using a Bayesian statistical approach. There
 * is also a Mail Daemon Replacement add-on to filter mail using the
 * classification statistics collected earlier.
 *
 * See also http://www.paulgraham.com/spam.html for a good writeup and
 * http://www.tuxedo.org/~esr/bogofilter/ for another implementation.
 * And more recently, Gary Robinson's write up of his improved algorithm
 * at http://radio.weblogs.com/0101454/stories/2002/09/16/spamDetection.html
 * which gives a better spread in spam ratios and slightly fewer
 *
 * Note that this uses the AGMS vacation coding style, not the OpenTracker one.
 * That means no tabs, indents are two spaces, m_ is the prefix for member
 * variables, g_ is the prefix for global names, C style comments, constants
 * are in all capital letters and most other things are mixed case, it's word
 * wrapped to fit in 79 characters per line to make proofreading on paper
 * easier, and functions are listed in reverse dependency order so that forward
 * declarations (function prototypes with no code) aren't needed.
 *
 * The Original Design:
 * There is a spam database (just a file listing words and number of times they
 * were used in spam and non-spam messages) that a BeMailDaemon input filter
 * will use when scanning email. It will mark the mail with the spam
 * probability (an attribute, optionally a mail header field) and optionally do
 * something if the probability exceeds a user defined level (delete message,
 * change subject, file in a different folder). Or should that be a different
 * filter? Outside the mail system, the probability can be used in queries to
 *
 * A second user application will be used to update the database. Besides
 * showing you the current list of words, you can drag and drop files to mark
 * them as spam or non-spam (a balanced binary tree is used internally to make
 * word storage fast). It will add a second attribute to the files to show how
 * they have been classified by the user (and won't update the database if you
 * accidentally try to classify a file again). Besides drag and drop, there
 * will be a command line interface and a message passing interface. BeMail
 * (or other programs) will then communicate via messages to tell it when the
 * user marks a message as spam or not (via having separate delete spam /
 * delete genuine mail buttons and a menu item or two).
 *
 * Plus lots of details, like the rename swap method to update the database
 * file (so programs with the old file open aren't affected). A nice tab text
 * format so you can open the database in a spreadsheet. Startup and shutdown
 * control of the updater from BeMail. Automatic creation of the indices
 * needed by the filter. MIME types for the database file. Icons for the app.
 * System settings to enable tracker to display the new attributes when viewing
 * e-mail (and maybe news articles if someone ever gets around to an NNTP as
 * files reader). Documentation. Recursive directory traversal for the
 * command line or directory drag and drop. Options for the updater to warn or
 * ignore non-email files. Etc.
 *
 * The Actual Implementation:
 * The spam database updates and the test for spam have been combined into one
 * program which runs as a server. That way there won't be as long a delay
 * when the e-mail system wants to check for spam, because the database is
 * already loaded by the server and in memory. The MDR mail filter add-on
 * simply sends scripting commands to the server (and starts it up if it isn't
 * already running). The filter takes care of marking the messages when it
 * gets the rating back from the server, and then the rest of the mail system
 * rule chain can delete the message or otherwise manipulate it.
 *
 * Revision History (now manually updated due to SVN's philosophy)
 * $Log: spamdbm.cpp,v $
 * ------------------------------------------------------------------------
 * r15195 | agmsmith | 2005-11-27 21:07:55 -0500 (Sun, 27 Nov 2005) | 4 lines
 * Just a few minutes after checking in, I mentioned it to Japanese expert Koki
 * and he suggested also including the Japanese comma. So before I forget to
 * ------------------------------------------------------------------------
 * r15194 | agmsmith | 2005-11-27 20:37:13 -0500 (Sun, 27 Nov 2005) | 5 lines
 * Truncate overly long URLs to the maximum word length. Convert Japanese
 * periods to spaces so that more "words" are found. Fix UTF-8 comparison
 * problems with tolower() incorrectly converting characters with the high bit
 *
 * r15098 | agmsmith | 2005-11-23 23:17:00 -0500 (Wed, 23 Nov 2005) | 5 lines
 * Added better tokenization so that HTML is parsed and things like tags
 * between letters of a word no longer hide that word. After testing, the
 * result seems to be a tighter spread of ratings when done in full text plus
 *
 * Revision 1.10 2005/11/24 02:08:39 agmsmith
 * Fixed up prefix codes, Z for things that are inside other things.
 *
 * Revision 1.9 2005/11/21 03:28:03 agmsmith
 * Added a function for extracting URLs.
 *
 * Revision 1.8 2005/11/09 03:36:18 agmsmith
 * Removed noframes detection (doesn't show up in e-mails). Now use
 * just H for headers and Z for HTML tag junk.
 *
 * Revision 1.7 2005/10/24 00:00:08 agmsmith
 * Adding HTML tag removal, which also affected the search function so it
 * could search for single part things like .
 *
 * Revision 1.6 2005/10/17 01:55:08 agmsmith
 * Remove HTML comments and a few other similar things.
 *
 * Revision 1.5 2005/10/16 18:35:36 agmsmith
 * Under construction - looking into HTML not being in UTF-8.
 *
 * Revision 1.4 2005/10/11 01:51:21 agmsmith
 * Starting on the tokenising passes. Still need to test Asian truncation.
 *
 * Revision 1.3 2005/10/06 11:54:07 agmsmith
 *
 * Revision 1.2 2005/09/12 01:49:37 agmsmith
 * Enable case folding for the whole file tokenizer.
 *
 * r13961 | agmsmith | 2005-08-13 22:25:28 -0400 (Sat, 13 Aug 2005) | 2 lines
 * Source code changes so that mboxtobemail now compiles and is in the build
 *
 * r13959 | agmsmith | 2005-08-13 22:05:27 -0400 (Sat, 13 Aug 2005) | 2 lines
 * Rename the directory before doing anything else, otherwise svn dies badly.
 *
 * r13952 | agmsmith | 2005-08-13 15:31:42 -0400 (Sat, 13 Aug 2005) | 3 lines
 * Added the resources and file type associations, changed the application
 * signature and otherwise made the spam detection system work properly again.
 *
 * r13951 | agmsmith | 2005-08-13 11:40:01 -0400 (Sat, 13 Aug 2005) | 2 lines
 * Had to do the file rename as a separate operation due to SVN limitations.
 *
 * r13950 | agmsmith | 2005-08-13 11:38:44 -0400 (Sat, 13 Aug 2005) | 3 lines
 * Oops, "spamdb" is already used for a Unix package. And spamdatabase is
 * already reserved by a domain name squatter. Use "spamdbm" instead.
 *
 * r13949 | agmsmith | 2005-08-13 11:17:52 -0400 (Sat, 13 Aug 2005) | 3 lines
 * Renamed spamfilter to be the more meaningful spamdb (spam database) and
 * moved it into its own source directory in preparation for adding resources.
 *
 * r13628 | agmsmith | 2005-07-10 20:11:29 -0400 (Sun, 10 Jul 2005) | 3 lines
 * Updated keyword expansion to use SVN keywords. Also seeing if svn is
 * working well enough for me to update files from BeOS R5.
 *
 * r11909 | axeld | 2005-03-18 19:09:19 -0500 (Fri, 18 Mar 2005) | 2 lines
 * Moved bin/ directory out of apps/.
 *
 * r11769 | bonefish | 2005-03-17 03:30:54 -0500 (Thu, 17 Mar 2005) | 1 line
 * Move trunk into respective module.
 *
 * r10362 | nwhitehorn | 2004-12-06 20:14:05 -0500 (Mon, 06 Dec 2004) | 2 lines
 * Fixed the spam filter so it works correctly now.
 *
 * r9934 | nwhitehorn | 2004-11-11 21:55:05 -0500 (Thu, 11 Nov 2004) | 2 lines
 * Added AGMS's excellent spam detection software. Still some weirdness with
 * the configuration interface from E-mail prefs.
 *
 * Revision 1.2 2004/12/07 01:14:05 nwhitehorn
 * Fixed the spam filter so it works correctly now.
 *
 * Revision 1.87 2004/09/20 15:57:26 nwhitehorn
 * Mostly updated the tree to Be/Haiku style identifier naming conventions. I
 * have a few more things to work out, mostly in mail_util.h, and then I'm
 * proceeding to jamify the build system. Then we go into Haiku CVS.
 *
 * Revision 1.86 2003/07/26 16:47:46 agmsmith
 * Bug - wasn't allowing double classification if the user had turned on
 * the option to ignore the previous classification.
 *
 * Revision 1.85 2003/07/08 14:52:57 agmsmith
 * Fix bug with classification choices dialog box coming up with weird
 * sizes due to RefsReceived message coming in before ReadyToRun had
 * finished setting up the default sizes of the controls.
 *
 * Revision 1.84 2003/07/04 19:59:29 agmsmith
 * Now with a GUI option to let you declassify messages (set them back
 * to uncertain, rather than spam or genuine). Required a BAlert
 * replacement since BAlerts can't do four buttons.
 *
 * Revision 1.83 2003/07/03 20:40:36 agmsmith
 * Added Uncertain option for declassifying messages.
 *
 * Revision 1.82 2003/06/16 14:57:13 agmsmith
 * Detect spam which uses mislabeled text attachments, going by the file name
 *
 * Revision 1.81 2003/04/08 20:27:04 agmsmith
 * AGMSBayesianSpamServer now shuts down immediately and returns true if
 * it is asked to quit by the registrar.
 *
 * Revision 1.80 2003/04/07 19:20:27 agmsmith
 * Ooops, int64 doesn't exist, use long long instead.
 *
 * Revision 1.79 2003/04/07 19:05:22 agmsmith
 * Now with Allen Brunson's atoll for PPC (you need the %Ld, but that
 * becomes %lld on other systems).
 *
 * Revision 1.78 2003/04/04 22:43:53 agmsmith
 * Fixed up atoll PPC processor hack so it would actually work, was just
 * returning zero which meant that it wouldn't load in the database file
 * (read the size as zero).
 *
 * Revision 1.77 2003/01/22 03:19:48 agmsmith
 * Don't convert words to lower case, the case is important for spam.
 * Particularly sentences which start with exciting words, which you
 * normally won't use at the start of a sentence (and thus capitalize).
 *
 * Revision 1.76 2002/12/18 02:29:22 agmsmith
 * Add space for the Uncertain display in Tracker.
 *
 * Revision 1.75 2002/12/18 01:54:37 agmsmith
 * Added uncertain sound effect.
 *
 * Revision 1.74 2002/12/13 23:53:12 agmsmith
 * Minimize the window before opening it so that it doesn't flash on the
 * screen in server mode. Also load the database when the window is
 * displayed so that the user can see the words.
 *
 * Revision 1.73 2002/12/13 20:55:57 agmsmith
 *
 * Revision 1.72 2002/12/13 20:26:11 agmsmith
 * Fixed bug with adding messages in strings to database (was limited to
 * messages at most 1K long). Also changed default server mode to true
 * since that's what people use most.
 *
 * Revision 1.71 2002/12/11 22:37:30 agmsmith
 * Added commands to train on spam and genuine e-mail messages passed
 * in string arguments rather than via external files.
 *
 * Revision 1.70 2002/12/10 22:12:41 agmsmith
 * Adding a message to the database now uses a BPositionIO rather than a
 * file and file name (for future string rather than file additions). Also
 * now re-evaluate a file after reclassifying it so that the user can see
 * the new ratio. Also remove the [Spam 99.9%] subject prefix when doing
 * a re-evaluation or classification (the number would be wrong).
 *
 * Revision 1.69 2002/12/10 01:46:04 agmsmith
 * Added the Chi-Squared scoring method.
 *
 * Revision 1.68 2002/11/29 22:08:25 agmsmith
 * Change default purge age to 2000 so that hitting the purge button
 * doesn't erase stuff from the new sample database.
 *
 * Revision 1.67 2002/11/25 20:39:39 agmsmith
 * Don't need to massage the MIME type since the mail library now does
 * the lower case conversion and converts TEXT to text/plain too.
 *
 * Revision 1.66 2002/11/20 22:57:12 nwhitehorn
 * PPC Compatibility Fixes
 *
 * Revision 1.65 2002/11/10 18:43:55 agmsmith
 * Added a time delay to some quitting operations so that scripting commands
 * from a second client (like a second e-mail account) will make the program
 * abort the quit operation.
 *
 * Revision 1.64 2002/11/05 18:05:16 agmsmith
 * Looked at Nathan's PPC changes (thanks!), modified style a bit.
 *
 * Revision 1.63 2002/11/04 03:30:22 nwhitehorn
 * Now works (or compiles at least) on PowerPC. I'll get around to testing it
 *
 * Revision 1.62 2002/11/04 01:03:33 agmsmith
 * Fixed warnings so it compiles under the bemaildaemon system.
 *
 * Revision 1.61 2002/11/03 23:00:37 agmsmith
 * Added to the bemaildaemon project on SourceForge. Hmmmm, seems to switch to
 * a new version if I commit and specify a message, but doesn't accept the
 * message and puts up the text editor. Must be a bug where cvs eats the first
 * option after "commit".
 *
 * Revision 1.60.1.1 2002/10/22 14:29:27 agmsmith
 * Needed to recompile with the original Libmail.so from Beta/1 since
 * the current library uses a different constructor, and thus wouldn't
 * run when used with the old library.
 *
 * Revision 1.60 2002/10/21 16:41:27 agmsmith
 * Return a special error code when no words are found in a message,
 * so that messages without text/plain parts can be recognized as
 * spam by the mail filter.
 *
 * Revision 1.59 2002/10/20 21:29:47 agmsmith
 * Watch out for MIME types of "text", treat as text/plain.
 *
 * Revision 1.58 2002/10/20 18:29:07 agmsmith
 * *** empty log message ***
 *
 * Revision 1.57 2002/10/20 18:25:02 agmsmith
 * Fix case sensitivity in MIME type tests, and fix text/any test.
 *
 * Revision 1.56 2002/10/19 17:00:10 agmsmith
 * Added the pop-up menu for the tokenize modes.
 *
 * Revision 1.55 2002/10/19 14:54:06 agmsmith
 * Fudge MIME type of body text components so that they get
 *
 * Revision 1.54 2002/10/19 00:56:37 agmsmith
 * The parsing of e-mail messages seems to be working now, just need
 * to add some user interface stuff for the tokenizing mode.
 *
 * Revision 1.53 2002/10/18 23:37:56 agmsmith
 * More mail kit usage, can now decode headers, but more to do.
 *
 * Revision 1.52 2002/10/16 23:52:33 agmsmith
 * Getting ready to add more tokenizing modes, exploring Mail Kit to break
 * apart messages into components (and decode BASE64 and other encodings).
 *
 * Revision 1.51 2002/10/11 20:05:31 agmsmith
 * Added installation of sound effect names, which the filter will use.
 *
 * Revision 1.50 2002/10/02 16:50:02 agmsmith
 * Forgot to add credits to the algorithm inventors.
 *
 * Revision 1.49 2002/10/01 00:39:29 agmsmith
 * Added drag and drop to evaluate files or to add them to the list.
 *
 * Revision 1.48 2002/09/30 19:44:17 agmsmith
 * Switched to Gary Robinson's method, removed max spam/genuine word.
 *
 * Revision 1.47 2002/09/23 17:08:55 agmsmith
 * Add an attribute with the spam ratio to files which have been evaluated.
 *
 * Revision 1.46 2002/09/23 02:50:32 agmsmith
 * Fiddling with display width of e-mail attributes.
 *
 * Revision 1.45 2002/09/23 01:13:56 agmsmith
 * Oops, bug in string evaluation scripting.
 *
 * Revision 1.44 2002/09/22 21:00:55 agmsmith
 * Added EvaluateString so that the BeMail add-on can pass the info without
 * having to create a temporary file.
 *
 * Revision 1.43 2002/09/20 19:56:02 agmsmith
 * Added about box and button for estimating the spam ratio of a file.
 *
 * Revision 1.42 2002/09/20 01:22:26 agmsmith
 * More testing, decide that an extreme ratio bias point of 0.5 is good.
 *
 * Revision 1.41 2002/09/19 21:17:12 agmsmith
 * Changed a few names and proofread the program.
 *
 * Revision 1.40 2002/09/19 14:27:17 agmsmith
 * Rearranged execution of commands, moving them to a separate looper
 * rather than the BApplication, so that thousands of files could be
 * processed without worrying about the message queue filling up.
 *
 * Revision 1.39 2002/09/18 18:47:16 agmsmith
 * Stop flickering when the view is partially obscured, update cached
 * values in all situations except when app is busy.
 *
 * Revision 1.38 2002/09/18 18:08:11 agmsmith
 * Add a function for evaluating the spam ratio of a message.
 *
 * Revision 1.37 2002/09/16 01:30:16 agmsmith
 * Added Get Oldest command.
 *
 * Revision 1.36 2002/09/16 00:47:52 agmsmith
 * Change the display to counter-weigh the spam ratio by the number of
 *
 * Revision 1.35 2002/09/15 20:49:35 agmsmith
 * Scrolling improved, buttons, keys and mouse wheel added.
 *
 * Revision 1.34 2002/09/15 03:46:10 agmsmith
 * Up and down buttons under construction.
 *
 * Revision 1.33 2002/09/15 02:09:21 agmsmith
 * Took out scroll bar.
 *
 * Revision 1.32 2002/09/15 02:05:30 agmsmith
 * Trying to add a scroll bar, but it isn't very useful.
 *
 * Revision 1.31 2002/09/14 23:06:28 agmsmith
 * Now has live updates of the list of words.
 *
 * Revision 1.30 2002/09/14 19:53:11 agmsmith
 * Now with a better display of the words.
 *
 * Revision 1.29 2002/09/13 21:33:54 agmsmith
 * Now draws the words in the word display view, but still primitive.
 *
 * Revision 1.28 2002/09/13 19:28:02 agmsmith
 * Added display of most genuine and most spamiest, fixed up cursor.
 *
 * Revision 1.27 2002/09/13 03:08:42 agmsmith
 * Show current word and message counts, and a busy cursor.
 *
 * Revision 1.26 2002/09/13 00:00:08 agmsmith
 * Fixed up some deadlock problems, now using asynchronous message replies.
 *
 * Revision 1.25 2002/09/12 17:56:58 agmsmith
 * Keep track of words which are spamiest and genuinest.
 *
 * Revision 1.24 2002/09/12 01:57:10 agmsmith
 *
 * Revision 1.23 2002/09/11 23:30:45 agmsmith
 * Added Purge button and ignore classification checkbox.
 *
 * Revision 1.22 2002/09/11 21:23:13 agmsmith
 * Added bulk update choice, purge button, moved to a BView container
 * for all the controls (so background colour could be set, and Pulse
 * works normally for it too).
 *
 * Revision 1.21 2002/09/10 22:52:49 agmsmith
 * You can now change the database name in the GUI.
 *
 * Revision 1.20 2002/09/09 14:20:42 agmsmith
 * Now can have multiple backups, and implemented refs received.
 *
 * Revision 1.19 2002/09/07 19:14:56 agmsmith
 * Added standard GUI measurement code.
 *
 * Revision 1.18 2002/09/06 21:03:03 agmsmith
 * Rearranging code to avoid forward references when adding a window class.
 *
 * Revision 1.17 2002/09/06 02:54:00 agmsmith
 * Added the ability to purge old words from the database.
 *
 * Revision 1.16 2002/09/05 00:46:03 agmsmith
 * Now adds spam to the database!
 *
 * Revision 1.15 2002/09/04 20:32:15 agmsmith
 * Read ahead a couple of letters to decode quoted-printable better.
 *
 * Revision 1.14 2002/09/04 03:10:03 agmsmith
 * Can now tokenize (break into words) a text file.
 *
 * Revision 1.13 2002/09/03 21:50:54 agmsmith
 * Count database command, set up MIME type for the database file.
 *
 * Revision 1.12 2002/09/03 19:55:54 agmsmith
 * Added loading and saving the database.
 *
 * Revision 1.11 2002/09/02 03:35:33 agmsmith
 * Create indices and set up attribute associations with the e-mail MIME type.
 *
 * Revision 1.10 2002/09/01 15:52:49 agmsmith
 * Can now delete the database.
 *
 * Revision 1.9 2002/08/31 21:55:32 agmsmith
 * Yet more scripting.
 *
 * Revision 1.8 2002/08/31 21:41:37 agmsmith
 * Under construction, with example code to decode a B_REPLY.
 *
 * Revision 1.7 2002/08/30 19:29:06 agmsmith
 * Combined loading and saving settings into one function.
 *
 * Revision 1.6 2002/08/30 02:01:10 agmsmith
 * Working on loading and saving settings.
 *
 * Revision 1.5 2002/08/29 23:17:42 agmsmith
 *
 * Revision 1.4 2002/08/28 00:40:52 agmsmith
 * Scripting now seems to work, at least the messages flow properly.
 *
 * Revision 1.3 2002/08/25 21:51:44 agmsmith
 * Getting the about text formatting right.
 *
 * Revision 1.2 2002/08/25 21:28:20 agmsmith
 * Trying out the BeOS scripting system as a way of implementing the program.
 *
 * Revision 1.1 2002/08/24 02:27:51 agmsmith
/* Standard C Library. */

/* Standard C++ library. */

/* STL (Standard Template Library) headers. */

/* BeOS (Be Operating System) headers. */

#include <Application.h>
#include <CheckBox.h>
#include <Directory.h>
#include <FilePanel.h>
#include <FindDirectory.h>
#include <fs_index.h>
#include <MenuItem.h>
#include <MessageQueue.h>
#include <MessageRunner.h>
#include <NodeInfo.h>
#include <PictureButton.h>
#include <PopUpMenu.h>
#include <PropertyInfo.h>
#include <RadioButton.h>
#include <Resources.h>
#include <ScrollBar.h>
#include <StringView.h>
#include <TextControl.h>

/* Included from the Mail Daemon Replacement project (MDR) include/public
directory, available from http://sourceforge.net/projects/bemaildaemon/ */

#include <MailMessage.h>
#include <MailAttachment.h>

/******************************************************************************
 * Global variables, and not-so-variable things too. Grouped by functionality.
 */

static float g_MarginBetweenControls; /* Space of a letter "M" between them. */
static float g_LineOfTextHeight;      /* Height of text in the current font. */
static float g_StringViewHeight;      /* Height of a string view text box. */
static float g_ButtonHeight;          /* How many pixels tall buttons are. */
static float g_CheckBoxHeight;        /* Same for check boxes. */
static float g_RadioButtonHeight;     /* Also for radio buttons. */
static float g_PopUpMenuHeight;       /* Again for pop-up menus. */
static float g_TextBoxHeight;         /* Ditto for editable text controls. */

static const char *g_ABSAppSignature =
  "application/x-vnd.agmsmith.spamdbm";

static const char *g_ABSDatabaseFileMIMEType =
  "text/x-vnd.agmsmith.spam_probability_database";

static const char *g_DefaultDatabaseFileName =

static const char *g_DatabaseRecognitionString =
  "Spam Database File";

static const char *g_AttributeNameClassification = "MAIL:classification";
static const char *g_AttributeNameSpamRatio = "MAIL:ratio_spam";
static const char *g_BeepGenuine = "SpamFilter-Genuine";
static const char *g_BeepSpam = "SpamFilter-Spam";
static const char *g_BeepUncertain = "SpamFilter-Uncertain";
static const char *g_ClassifiedSpam = "Spam";
static const char *g_ClassifiedGenuine = "Genuine";
static const char *g_DataName = "data";
static const char *g_ResultName = "result";

static const char *g_SettingsDirectoryName = "Mail";
static const char *g_SettingsFileName = "SpamDBM Settings";
static const uint32 g_SettingsWhatCode = 'SDBM';
static const char *g_BackupSuffix = ".backup %d";
static const int g_MaxBackups = 10; /* Numbered from 0 to g_MaxBackups - 1. */
static const size_t g_MaxWordLength = 50; /* Words longer than this aren't. */
static const int g_MaxInterestingWords = 150; /* Top N words are examined. */
static const double g_RobinsonS = 0.45; /* Default weight for no data. */
static const double g_RobinsonX = 0.5; /* Halfway point for no data. */

static bool g_CommandLineMode;
/* TRUE if the program was started from the command line (and thus should
exit after processing the command), FALSE if it is running with a graphical

static bool g_ServerMode;
/* When TRUE the program runs in server mode - error messages don't result in
pop-up dialog boxes, but you can still see them in stderr. Also the window
is minimized, if it exists. */

static int g_QuitCountdown = -1;
/* Set to the number of pulse timing events (about one every half second) to
count down before the program quits. Negative means stop counting. Zero
means quit at the next pulse event. This is used to keep the program alive
for a short while after someone requests that it quit, in case more scripting
commands come in, which will stop the countdown. Needed to handle the case
where there are multiple e-mail accounts all requesting spam identification,
and one finishes first and tells the server to quit. It also checks to see
that there is no more work to do before trying to quit. */
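
/* Illustrative sketch only, not part of the original program. It models the
countdown rules described above, assuming a pulse handler that is told
whether work is still pending. QuitCountdown and its member names are
hypothetical stand-ins, not the program's actual classes.

```cpp
// Hypothetical stand-in for the g_QuitCountdown logic.
struct QuitCountdown
{
  int m_PulsesLeft; // Negative means not counting.

  QuitCountdown () : m_PulsesLeft (-1) {}

  // Someone asked the program to quit; wait this many pulses first.
  void RequestQuit (int pulses) { m_PulsesLeft = pulses; }

  // A scripting command arrived, so cancel any pending quit.
  void CommandArrived () { m_PulsesLeft = -1; }

  // Called on every pulse event; returns true when it is time to quit.
  bool Pulse (bool workPending)
  {
    if (m_PulsesLeft < 0)
      return false; // Not counting down.
    if (workPending)
      return false; // Still busy, look again at the next pulse.
    if (m_PulsesLeft == 0)
      return true; // Zero means quit at this pulse.
    m_PulsesLeft--;
    return false;
  }
};
```

A late command from a second e-mail account calls CommandArrived, which
sets the counter negative again and so keeps the server alive. */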

static volatile bool g_AppReadyToRunCompleted = false;
/* The BApplication starts processing messages before ReadyToRun finishes,
which can lead to initialisation problems (button heights not determined).
So wait for this to turn TRUE in code that might run early, like

static class CommanderLooper *g_CommanderLooperPntr = NULL;
static BMessenger *g_CommanderMessenger = NULL;
/* Some globals for use with the looper which processes external commands
(arguments received, file references received), needed for avoiding deadlocks
which would happen if the BApplication sent a scripting message to itself. */

static BCursor *g_BusyCursor = NULL;
/* The busy cursor, will be loaded from the resource file during application

typedef enum PropertyNumbersEnum
  PN_DATABASE_FILE = 0,
  PN_IGNORE_PREVIOUS_CLASSIFICATION,
  PN_RESET_TO_DEFAULTS,

static const char * g_PropertyNames[PN_MAX] =
  "IgnorePreviousClassification",

/* This array lists the scripting commands we can handle, in a format that the
scripting system can understand too. */

static struct property_info g_ScriptingPropertyList[] =
  /* *name; commands[10]; specifiers[10]; *usage; extra_data; ... */
  {g_PropertyNames[PN_DATABASE_FILE], {B_GET_PROPERTY, 0},
    {B_DIRECT_SPECIFIER, 0}, "Get the pathname of the current database file. "
    "The default name is something like B_USER_SETTINGS_DIRECTORY / "
    "Mail / SpamDBM Database", PN_DATABASE_FILE,
  {g_PropertyNames[PN_DATABASE_FILE], {B_SET_PROPERTY, 0},
    {B_DIRECT_SPECIFIER, 0}, "Change the pathname of the database file to "
    "use. It will automatically be converted to an absolute path name, "
    "so make sure the parent directories exist before setting it. If it "
    "doesn't exist, you'll have to use the create command next.",
    PN_DATABASE_FILE, {}, {}, {}},
  {g_PropertyNames[PN_DATABASE_FILE], {B_CREATE_PROPERTY, 0},
    {B_DIRECT_SPECIFIER, 0}, "Creates a new empty database, will replace "
    "the existing database file too.", PN_DATABASE_FILE, {}, {}, {}},
  {g_PropertyNames[PN_DATABASE_FILE], {B_DELETE_PROPERTY, 0},
    {B_DIRECT_SPECIFIER, 0}, "Deletes the database file and all backup copies "
    "of that file too. Really only of use for uninstallers.",
    PN_DATABASE_FILE, {}, {}, {}},
  {g_PropertyNames[PN_DATABASE_FILE], {B_COUNT_PROPERTIES, 0},
    {B_DIRECT_SPECIFIER, 0}, "Returns the number of words in the database.",
    PN_DATABASE_FILE, {}, {}, {}},
  {g_PropertyNames[PN_SPAM], {B_SET_PROPERTY, 0}, {B_DIRECT_SPECIFIER, 0},
    "Adds the spam in the given file (specify full pathname to be safe) to "
    "the database. The words in the files will be added to the list of words "
    "in the database that identify spam messages. The files processed will "
    "also have the attribute MAIL:classification added with a value of "
    "\"Spam\" or \"Genuine\" as specified. They also have their spam ratio "
    "attribute updated, as if you had also used the Evaluate command on "
    "them. If they already have the MAIL:classification "
    "attribute and it matches the new classification then they won't get "
    "processed (and if it is different, they will get removed from the "
    "statistics for the old class and added to the statistics for the new "
    "one). You can turn off that behaviour with the "
    "IgnorePreviousClassification property. The command line version lets "
    "you specify more than one pathname.", PN_SPAM, {}, {}, {}},
  {g_PropertyNames[PN_SPAM], {B_COUNT_PROPERTIES, 0}, {B_DIRECT_SPECIFIER, 0},
    "Returns the number of spam messages in the database.", PN_SPAM,
  {g_PropertyNames[PN_SPAM_STRING], {B_SET_PROPERTY, 0},
    {B_DIRECT_SPECIFIER, 0}, "Adds the spam in the given string (assumed to "
    "be the text of a whole e-mail message, not just a file name) to the "
    "database.", PN_SPAM_STRING, {}, {}, {}},
  {g_PropertyNames[PN_GENUINE], {B_SET_PROPERTY, 0}, {B_DIRECT_SPECIFIER, 0},
    "Similar to adding spam except that the message file is added to the "
    "genuine statistics.", PN_GENUINE, {}, {}, {}},
  {g_PropertyNames[PN_GENUINE], {B_COUNT_PROPERTIES, 0},
    {B_DIRECT_SPECIFIER, 0}, "Returns the number of genuine messages in the "
    "database.", PN_GENUINE, {}, {}, {}},
  {g_PropertyNames[PN_GENUINE_STRING], {B_SET_PROPERTY, 0},
    {B_DIRECT_SPECIFIER, 0}, "Adds the genuine message in the given string "
    "(assumed to be the text of a whole e-mail message, not just a file name) "
    "to the database.", PN_GENUINE_STRING, {}, {}, {}},
  {g_PropertyNames[PN_UNCERTAIN], {B_SET_PROPERTY, 0}, {B_DIRECT_SPECIFIER, 0},
    "Similar to adding spam except that the message file is removed from the "
    "database, undoing the previous classification. Obviously, it needs to "
    "have been classified previously (using the file attributes) so it can "
    "tell if it is removing spam or genuine words.", PN_UNCERTAIN, {}, {}, {}},
  {g_PropertyNames[PN_IGNORE_PREVIOUS_CLASSIFICATION], {B_SET_PROPERTY, 0},
    {B_DIRECT_SPECIFIER, 0}, "If set to true then the previous classification "
    "(which was saved as an attribute of the e-mail message file) will be "
    "ignored, so that you can add the message to the database again. If set "
    "to false (the normal case), the attribute will be examined, and if the "
    "message has already been classified as what you claim it is, nothing "
    "will be done. If it was misclassified, then the message will be removed "
    "from the statistics for the old class and added to the stats for the "
    "new classification you have requested.",
    PN_IGNORE_PREVIOUS_CLASSIFICATION, {}, {}, {}},
735 {g_PropertyNames
[PN_IGNORE_PREVIOUS_CLASSIFICATION
], {B_GET_PROPERTY
, 0},
736 {B_DIRECT_SPECIFIER
, 0}, "Find out the current setting of the flag for "
737 "ignoring the previously recorded classification.",
738 PN_IGNORE_PREVIOUS_CLASSIFICATION
, {}, {}, {}},
739 {g_PropertyNames
[PN_SERVER_MODE
], {B_SET_PROPERTY
, 0},
740 {B_DIRECT_SPECIFIER
, 0}, "If set to true then error messages get printed "
741 "to the standard error stream rather than showing up in an alert box. "
742 "It also starts up with the window minimized.", PN_SERVER_MODE
,
744 {g_PropertyNames
[PN_SERVER_MODE
], {B_GET_PROPERTY
, 0},
745 {B_DIRECT_SPECIFIER
, 0}, "Find out the setting of the server mode flag.",
746 PN_SERVER_MODE
, {}, {}, {}},
  {g_PropertyNames[PN_FLUSH], {B_EXECUTE_PROPERTY, 0},
    {B_DIRECT_SPECIFIER, 0}, "Writes out the database file to disk, if it has "
    "been updated in memory but hasn't been saved to disk. It will "
    "automatically get written when the program exits, so this command is "
    "mostly useful for server mode.", PN_FLUSH, {}, {}, {}},
  {g_PropertyNames[PN_PURGE_AGE], {B_SET_PROPERTY, 0},
    {B_DIRECT_SPECIFIER, 0}, "Sets the old age limit. Words which haven't "
    "been updated since this many message additions to the database may be "
    "deleted when you do a purge. A good value is 1000, meaning that if a "
    "word hasn't appeared in the last 1000 spam/genuine messages, it will "
    "be forgotten. Zero will purge all words, 1 will purge words not in "
    "the last message added to the database, 2 will purge words not in the "
    "last two messages added, and so on. This is mostly useful for "
    "removing those one time words which are often hunks of binary garbage, "
    "not real words. This acts in combination with the popularity limit; "
    "both conditions have to be valid before the word gets deleted.",
    PN_PURGE_AGE, {}, {}, {}},
  {g_PropertyNames[PN_PURGE_AGE], {B_GET_PROPERTY, 0},
    {B_DIRECT_SPECIFIER, 0}, "Gets the old age limit.", PN_PURGE_AGE,
    {}, {}, {}},
  {g_PropertyNames[PN_PURGE_POPULARITY], {B_SET_PROPERTY, 0},
    {B_DIRECT_SPECIFIER, 0}, "Sets the popularity limit. Words which aren't "
    "this popular may be deleted when you do a purge. A good value is 5, "
    "which means that the word is safe from purging if it has been seen in 6 "
    "or more e-mail messages. If it's only in 5 or fewer, then it may get "
    "purged. The extreme is zero, where only words that haven't been seen "
    "in any message are deleted (usually means no words). This acts in "
    "combination with the old age limit; both conditions have to be valid "
    "before the word gets deleted.", PN_PURGE_POPULARITY, {}, {}, {}},
  {g_PropertyNames[PN_PURGE_POPULARITY], {B_GET_PROPERTY, 0},
    {B_DIRECT_SPECIFIER, 0}, "Gets the purge popularity limit.",
    PN_PURGE_POPULARITY, {}, {}, {}},
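/* Illustrative sketch, not part of SpamDBM itself: one way to express the
purge rule described by the age and popularity limits above, where a word is
deleted only if it is both old enough and unpopular enough. The WordStats
struct, its field names and the PurgeWords function are hypothetical, and the
age test assumes that ages are message sequence numbers counted from zero.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <string>

// Hypothetical per-word record; the field names are illustrative only.
struct WordStats {
  uint32_t age;           // Sequence number of the last message with the word.
  uint32_t genuineCount;  // Genuine messages containing the word.
  uint32_t spamCount;     // Spam messages containing the word.
};

// Removes every word that fails both tests and returns how many were removed.
static size_t PurgeWords(std::map<std::string, WordStats> &words,
  uint32_t totalMessages, uint32_t purgeAge, uint32_t purgePopularity)
{
  size_t removed = 0;
  for (auto it = words.begin(); it != words.end(); ) {
    const WordStats &stats = it->second;
    // Messages added since the word was last seen; 0 for the newest message.
    uint32_t messagesAgo = 0;
    if (stats.age + 1 <= totalMessages)
      messagesAgo = totalMessages - stats.age - 1;
    const bool oldEnough = messagesAgo >= purgeAge;
    const bool unpopular =
      stats.genuineCount + stats.spamCount <= purgePopularity;
    if (oldEnough && unpopular) {
      it = words.erase(it);  // erase returns the next valid iterator.
      removed++;
    } else {
      ++it;
    }
  }
  return removed;
}
```

With an age limit of 0 every word is old enough, and with a popularity limit
of 0 only words with no recorded messages are unpopular, which matches the
extremes described in the property text. */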
  {g_PropertyNames[PN_PURGE], {B_EXECUTE_PROPERTY, 0},
    {B_DIRECT_SPECIFIER, 0}, "Purges the old obsolete words from the "
    "database, if they are old enough according to the age limit and also "
    "unpopular enough according to the popularity limit.", PN_PURGE,
    {}, {}, {}},
  {g_PropertyNames[PN_OLDEST], {B_GET_PROPERTY, 0},
    {B_DIRECT_SPECIFIER, 0}, "Gets the age of the oldest message in the "
    "database. It's relative to the beginning of time, so you need to do "
    "(total messages - age - 1) to see how many messages ago it was added.",
    PN_OLDEST, {}, {}, {}},
  {g_PropertyNames[PN_EVALUATE], {B_SET_PROPERTY, 0},
    {B_DIRECT_SPECIFIER, 0}, "Evaluates a given file (by path name) to see "
    "if it is spam or not. Returns the ratio of spam probability vs genuine "
    "probability, 0.0 meaning completely genuine, 1.0 for completely spam. "
    "Normally you should safely be able to consider it as spam if it is over "
    "0.56 for the Robinson scoring method. For the ChiSquared method, the "
    "numbers are near 0 for genuine, near 1 for spam, and anywhere in the "
    "middle means it can't decide. The program attaches a MAIL:ratio_spam "
    "attribute with the ratio as its "
    "float32 value to the file. Also returns the top few interesting words "
    "in \"words\" and the associated per-word probability ratios in "
    "\"ratios\".", PN_EVALUATE, {}, {}, {}},
  {g_PropertyNames[PN_EVALUATE_STRING], {B_SET_PROPERTY, 0},
    {B_DIRECT_SPECIFIER, 0}, "Like Evaluate, but rather than a file name, "
    "the string argument contains the entire text of the message to be "
    "evaluated.", PN_EVALUATE_STRING, {}, {}, {}},
  {g_PropertyNames[PN_RESET_TO_DEFAULTS], {B_EXECUTE_PROPERTY, 0},
    {B_DIRECT_SPECIFIER, 0}, "Resets all the configuration options to the "
    "default values, including the database name.", PN_RESET_TO_DEFAULTS,
    {}, {}, {}},
  {g_PropertyNames[PN_INSTALL_THINGS], {B_EXECUTE_PROPERTY, 0},
    {B_DIRECT_SPECIFIER, 0}, "Creates indices for the MAIL:classification and "
    "MAIL:ratio_spam attributes on all volumes which support BeOS queries, "
    "identifies them to the system as e-mail related attributes (modifies "
    "the text/x-email MIME type), and sets up the new MIME type "
    "(text/x-vnd.agmsmith.spam_probability_database) for the database file. "
    "Also registers names for the sound effects used by the separate filter "
    "program (use the installsound BeOS program or the Sounds preferences "
    "program to associate sound files with the names).", PN_INSTALL_THINGS,
    {}, {}, {}},
  {g_PropertyNames[PN_TOKENIZE_MODE], {B_SET_PROPERTY, 0},
    {B_DIRECT_SPECIFIER, 0}, "Sets the method used for breaking up the "
    "message into words. Use \"Whole\" for the whole file (also use it for "
    "non-email files). The file isn't broken into parts; the whole thing is "
    "converted into words, headers and attachments are just more raw data. "
    "Well, not quite raw data since it converts quoted-printable codes "
    "(equals sign followed by hex digits or end of line) to the equivalent "
    "single characters. \"PlainText\" breaks the file into MIME components "
    "and only looks at the ones which are of MIME type text/plain. "
    "\"AnyText\" will look for words in all text/* things, including "
    "text/html attachments. \"AllParts\" will decode all message components "
    "and look for words in them, including binary attachments. "
    "\"JustHeader\" will only look for words in the message header. "
    "\"AllPartsAndHeader\", \"PlainTextAndHeader\" and \"AnyTextAndHeader\" "
    "will also include the words from the message headers.", PN_TOKENIZE_MODE,
    {}, {}, {}},
  {g_PropertyNames[PN_TOKENIZE_MODE], {B_GET_PROPERTY, 0},
    {B_DIRECT_SPECIFIER, 0}, "Gets the method used for breaking up the "
    "message into words.", PN_TOKENIZE_MODE, {}, {}, {}},
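/* Illustrative sketch, not the program's own tokenizer: the "Whole" mode text
above mentions converting quoted-printable codes (an equals sign followed by
two hex digits, or by an end of line) into single characters. A minimal
decoder for that rule might look like the following. The names HexValue and
DecodeQuotedPrintable are hypothetical, and passing malformed codes through
unchanged is an assumption.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Returns the value of one hexadecimal digit, or -1 if it isn't one.
static int HexValue(char c)
{
  if (c >= '0' && c <= '9') return c - '0';
  if (c >= 'A' && c <= 'F') return c - 'A' + 10;
  if (c >= 'a' && c <= 'f') return c - 'a' + 10;
  return -1;
}

// Decodes "=XY" into one byte and drops "=" followed by a line break.
static std::string DecodeQuotedPrintable(const std::string &in)
{
  std::string out;
  for (size_t i = 0; i < in.size(); i++) {
    if (in[i] != '=') {
      out += in[i];  // Ordinary character, copied as is.
    } else if (i + 2 < in.size() + 0 &&
      HexValue(in[i + 1]) >= 0 && HexValue(in[i + 2]) >= 0) {
      // Two hex digits after the equals sign become a single byte.
      out += static_cast<char>(HexValue(in[i + 1]) * 16 + HexValue(in[i + 2]));
      i += 2;
    } else if (i + 1 < in.size() && in[i + 1] == '\n') {
      i += 1;  // Soft line break with a bare newline.
    } else if (i + 2 < in.size() && in[i + 1] == '\r' && in[i + 2] == '\n') {
      i += 2;  // Soft line break with a carriage return and newline.
    } else {
      out += '=';  // Not a valid code, keep the equals sign.
    }
  }
  return out;
}
```

For example, DecodeQuotedPrintable ("a=3Db") gives "a=b", and a line ending
in "=" is joined to the line after it. */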
  {g_PropertyNames[PN_SCORING_MODE], {B_SET_PROPERTY, 0},
    {B_DIRECT_SPECIFIER, 0}, "Sets the method used for combining the "
    "probabilities of individual words into an overall score. "
    "\"Robinson\" mode will use Gary Robinson's nth root of the product "
    "method. It gives a nice range of values between 0 and 1 so you can "
    "see shades of spamminess. The cutoff point between spam and genuine "
    "varies depending on your database of words (0.56 was one point in "
    "some experiments). \"ChiSquared\" mode will use chi-squared "
    "statistics to evaluate the difference in probabilities that the lists "
    "of word ratios are random. The result is very close to 0 for genuine "
    "and very close to 1 for spam, and near the middle if it is uncertain.",
    PN_SCORING_MODE, {}, {}, {}},
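/* Illustrative sketch of the two combining rules named above, written from
the commonly published forms of Robinson's geometric mean method and the
SpamBayes chi-squared method. It is an assumption that SpamDBM follows these
exact formulas; the function names are hypothetical, and every per-word
probability is assumed to lie strictly between 0 and 1.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <vector>

// Robinson: compare the nth roots of the products of p and of (1 - p).
static double RobinsonScore(const std::vector<double> &p)
{
  if (p.empty()) return 0.5;  // No evidence either way.
  double logSpam = 0.0, logGenuine = 0.0;
  for (double x : p) {
    logSpam += std::log(x);           // ln of the product of p.
    logGenuine += std::log(1.0 - x);  // ln of the product of (1 - p).
  }
  const double n = static_cast<double>(p.size());
  const double P = 1.0 - std::exp(logGenuine / n);  // Spamminess.
  const double Q = 1.0 - std::exp(logSpam / n);     // Non-spamminess.
  return (1.0 + (P - Q) / (P + Q)) / 2.0;
}

// Upper tail of the chi-squared distribution, even degrees of freedom only.
static double ChiSquaredTail(double chiSquared, int degreesOfFreedom)
{
  const double m = chiSquared / 2.0;
  double term = std::exp(-m);
  double sum = term;
  for (int i = 1; i < degreesOfFreedom / 2; i++) {
    term *= m / i;
    sum += term;
  }
  return std::min(sum, 1.0);
}

// Chi-squared: near 0 for genuine, near 1 for spam, near 0.5 if unsure.
static double ChiSquaredScore(const std::vector<double> &p)
{
  if (p.empty()) return 0.5;
  double logSpam = 0.0, logGenuine = 0.0;
  for (double x : p) {
    logSpam += std::log(x);
    logGenuine += std::log(1.0 - x);
  }
  const int dof = 2 * static_cast<int>(p.size());
  const double S = 1.0 - ChiSquaredTail(-2.0 * logGenuine, dof);
  const double H = 1.0 - ChiSquaredTail(-2.0 * logSpam, dof);
  return (S - H + 1.0) / 2.0;
}
```

Feeding both functions the same list shows the difference in spread: three
words at 0.9 give a Robinson score of about 0.9, while ten words at 0.99 push
the chi-squared score to within a hair of 1. */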
  {g_PropertyNames[PN_SCORING_MODE], {B_GET_PROPERTY, 0},
    {B_DIRECT_SPECIFIER, 0}, "Gets the method used for combining the "
    "individual word ratios into an overall score.", PN_SCORING_MODE,
    {}, {}, {}},
  {0, {0}, {0}, 0, 0, {}, {}, {}} /* End of list of property commands. */
};

/* The various scoring modes as text and enums. See PN_SCORING_MODE. */

typedef enum ScoringModeEnum

static const char * g_ScoringModeNames [SM_MAX] =

/* The various tokenizing modes as text and enums. See PN_TOKENIZE_MODE. */

typedef enum TokenizeModeEnum
  TM_PLAIN_TEXT_HEADER,

static const char * g_TokenizeModeNames [TM_MAX] =
  "Plain text and header",
  "Any text and header",
  "All parts and header",

/* Possible message classifications. */

typedef enum ClassificationTypesEnum
} ClassificationTypes;

static const char * g_ClassificationTypeNames [CL_MAX] =

/* Some polygon graphics for the scroll arrows. */

static BPoint g_UpLinePoints [] =
  BPoint (14, 2 * (6)),
  BPoint (10, 2 * (6)),
  BPoint (10, 2 * (13)),
  BPoint (6, 2 * (13)),

static BPoint g_DownLinePoints [] =
  BPoint (8, 2 * (14-1)),
  BPoint (14, 2 * (14-6)),
  BPoint (10, 2 * (14-6)),
  BPoint (10, 2 * (14-13)),
  BPoint (6, 2 * (14-13)),
  BPoint (6, 2 * (14-6)),
  BPoint (2, 2 * (14-6))

static BPoint g_UpPagePoints [] =
  BPoint (13, 2 * (6)),
  BPoint (10, 2 * (6)),
  BPoint (14, 2 * (10)),
  BPoint (10, 2 * (10)),
  BPoint (10, 2 * (13)),
  BPoint (6, 2 * (13)),
  BPoint (6, 2 * (10)),
  BPoint (2, 2 * (10)),

static BPoint g_DownPagePoints [] =
  BPoint (8, 2 * (14-1)),
  BPoint (13, 2 * (14-6)),
  BPoint (10, 2 * (14-6)),
  BPoint (14, 2 * (14-10)),
  BPoint (10, 2 * (14-10)),
  BPoint (10, 2 * (14-13)),
  BPoint (6, 2 * (14-13)),
  BPoint (6, 2 * (14-10)),
  BPoint (2, 2 * (14-10)),
  BPoint (6, 2 * (14-6)),
  BPoint (3, 2 * (14-6))

/* An array of flags to identify characters which are considered to be spaces.
If character code X has g_SpaceCharacters[X] set to true then it is a
space-like character. Character codes 128 and above are always non-space since
they are UTF-8 characters. Initialised in the ABSApp constructor. */

static bool g_SpaceCharacters [128];
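
/* Illustrative sketch of how a 128 entry lookup table like the one above can
be filled in and queried. The table and function names are hypothetical
stand-ins, and the particular set of characters treated as spaces here is an
assumption, not the list used by the ABSApp constructor.

```cpp
#include <cassert>
#include <cstring>

// Hypothetical stand-in for the space flag table.
static bool SpaceTable[128];

// Marks the chosen ASCII characters as space-like, everything else as not.
static void InitSpaceTable()
{
  std::memset(SpaceTable, 0, sizeof(SpaceTable));
  const char *spaces = " \t\r\n\f\v";
  for (const char *p = spaces; *p != 0; p++)
    SpaceTable[static_cast<unsigned char>(*p)] = true;
}

// Byte values 128 and above belong to UTF-8 sequences and are never spaces.
static bool IsSpaceByte(unsigned char c)
{
  return c < 128 && SpaceTable[c];
}
```

Taking the byte as unsigned char keeps UTF-8 bytes from turning into negative
array indices on platforms where plain char is signed. */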

/******************************************************************************
 * Each word in the spam database gets one of these structures. The database
 * has a string (the word) as the key and this structure as the value
 * (statistics for that word).
 */

typedef struct StatisticsStruct
  /* Sequence number for the time when this word was last updated in the
  database, so that we can remove old words (haven't been seen in recent
  spam). It's zero for the first file ever added (spam or genuine) to the
  database, 1 for all words added or updated by the second file, etc. If a
  later file updates an existing word, it gets the age of the later file. */

  /* Number of genuine messages that have this word. */

  /* A count of the number of spam e-mail messages which contain the word. */

} StatisticsRecord, *StatisticsPointer;
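
/* Illustrative sketch of the bookkeeping described in the comments above:
each added message stamps its words with the message's sequence number and
bumps either the spam or the genuine count. WordRecord, WordDatabase and
AddMessage are hypothetical names, and deriving the sequence number from the
two message totals is an assumption.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <set>
#include <string>

// Hypothetical per-word statistics; field names are illustrative only.
struct WordRecord {
  uint32_t age;           // Sequence number of the last message with the word.
  uint32_t genuineCount;  // Genuine messages containing the word.
  uint32_t spamCount;     // Spam messages containing the word.
};

// Hypothetical database: the word map plus the two message totals.
struct WordDatabase {
  std::map<std::string, WordRecord> words;
  uint32_t totalGenuine = 0;
  uint32_t totalSpam = 0;
};

// Adds one message, given the set of distinct words it contains.
static void AddMessage(WordDatabase &db,
  const std::set<std::string> &messageWords, bool isSpam)
{
  // Zero for the first message ever added, one for the second, and so on.
  const uint32_t sequence = db.totalGenuine + db.totalSpam;
  for (const std::string &word : messageWords) {
    WordRecord &record = db.words[word];  // New words start with zero counts.
    record.age = sequence;  // A later message overwrites the earlier age.
    if (isSpam)
      record.spamCount++;
    else
      record.genuineCount++;
  }
  if (isSpam)
    db.totalSpam++;
  else
    db.totalGenuine++;
}
```

Because the words arrive as a set, a word repeated many times within one
message still counts only once for that message. */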

typedef map<string, StatisticsRecord> StatisticsMap;
  /* Define this type which will be used for our main data storage facility, so
  we can more conveniently specify things that are derived from it, like

/******************************************************************************
 * An alert box asking how the user wants to mark messages. There are buttons
 * for each classification category, and a checkbox to mark all remaining N
 * messages the same way. And a cancel button. To use it, first create the
 * ClassificationChoicesWindow, specifying the input arguments. Then call the
 * Go method which will show the window, stuff the user's answer into your
 * output arguments (class set to CL_MAX if the user cancels), and destroy the
 * window. Implemented because BAlert only allows 3 buttons, max!
 */

class ClassificationChoicesWindow : public BWindow
  /* Constructor and destructor. */
  ClassificationChoicesWindow (BRect FrameRect,
    const char *FileName, int NumberOfFiles);

  /* BeOS virtual functions. */
  virtual void MessageReceived (BMessage *MessagePntr);

  void Go (bool *BulkModeSelectedPntr,
    ClassificationTypes *ChoosenClassificationPntr);

  /* Various message codes for various buttons etc. */
  static const uint32 MSG_CLASS_BUTTONS = 'ClB0';
  static const uint32 MSG_CANCEL_BUTTON = 'Cncl';
  static const uint32 MSG_BULK_CHECKBOX = 'BlkK';

  /* Member variables. */
  bool *m_BulkModeSelectedPntr;
  ClassificationTypes *m_ChoosenClassificationPntr;

class ClassificationChoicesView : public BView
  /* Constructor and destructor. */
  ClassificationChoicesView (BRect FrameRect,
    const char *FileName, int NumberOfFiles);

  /* BeOS virtual functions. */
  virtual void AttachedToWindow ();
  virtual void GetPreferredSize (float *width, float *height);

  /* Member variables. */
  const char *m_FileName;
  int m_NumberOfFiles;
  float m_PreferredBottomY;


/******************************************************************************
 * Due to deadlock problems with the BApplication posting scripting messages to
 * itself, we need to add a second Looper. Its job is just to convert
 * command line arguments and arguments from the Tracker (refs received) into a
 * series of scripting commands sent to the main BApplication. It also prints
 * out the replies received (to stdout for command line replies). An instance
 * of this class will be created and run by the main() function, and shut down

class CommanderLooper : public BLooper
  ~CommanderLooper ();
  virtual void MessageReceived (BMessage *MessagePntr);

  void CommandArguments (int argc, char **argv);
  void CommandReferences (BMessage *MessagePntr,
    bool BulkMode = false,
    ClassificationTypes BulkClassification = CL_GENUINE);

  void ProcessArgs (BMessage *MessagePntr);
  void ProcessRefs (BMessage *MessagePntr);

  static const uint32 MSG_COMMAND_ARGUMENTS = 'CArg';
  static const uint32 MSG_COMMAND_FILE_REFS = 'CRef';


/******************************************************************************
 * This view contains the various buttons and other controls for setting
 * configuration options and displaying the state of the database (but not the
 * actual list of words). It will appear in the top half of the

class ControlsView : public BView
  /* Constructor and destructor. */
  ControlsView (BRect NewBounds);

  /* BeOS virtual functions. */
  virtual void AttachedToWindow ();
  virtual void FrameResized (float Width, float Height);
  virtual void MessageReceived (BMessage *MessagePntr);
  virtual void Pulse ();

  /* Various message codes for various buttons etc. */
  static const uint32 MSG_BROWSE_BUTTON = 'Brws';
  static const uint32 MSG_DATABASE_NAME = 'DbNm';
  static const uint32 MSG_ESTIMATE_BUTTON = 'Estm';
  static const uint32 MSG_ESTIMATE_FILE_REFS = 'ERef';
  static const uint32 MSG_IGNORE_CLASSIFICATION = 'IPCl';
  static const uint32 MSG_PURGE_AGE = 'PuAg';
  static const uint32 MSG_PURGE_BUTTON = 'Purg';
  static const uint32 MSG_PURGE_POPULARITY = 'PuPo';
  static const uint32 MSG_SERVER_MODE = 'SrvM';

  /* Our member functions. */
  void BrowseForDatabaseFile ();
  void BrowseForFileToEstimate ();
  void PollServerForChanges ();

  /* Member variables. */
  BButton *m_AboutButtonPntr;
  BButton *m_AddExampleButtonPntr;
  BButton *m_BrowseButtonPntr;
  BFilePanel *m_BrowseFilePanelPntr;
  BButton *m_CreateDatabaseButtonPntr;
  char m_DatabaseFileNameCachedValue [PATH_MAX];
  BTextControl *m_DatabaseFileNameTextboxPntr;
  bool m_DatabaseLoadDone;
  BButton *m_EstimateSpamButtonPntr;
  BFilePanel *m_EstimateSpamFilePanelPntr;
  uint32 m_GenuineCountCachedValue;
  BTextControl *m_GenuineCountTextboxPntr;
  bool m_IgnorePreviousClassCachedValue;
  BCheckBox *m_IgnorePreviousClassCheckboxPntr;
  BButton *m_InstallThingsButtonPntr;
  uint32 m_PurgeAgeCachedValue;
  BTextControl *m_PurgeAgeTextboxPntr;
  BButton *m_PurgeButtonPntr;
  uint32 m_PurgePopularityCachedValue;
  BTextControl *m_PurgePopularityTextboxPntr;
  BButton *m_ResetToDefaultsButtonPntr;
  ScoringModes m_ScoringModeCachedValue;
  BMenuBar *m_ScoringModeMenuBarPntr;
  BPopUpMenu *m_ScoringModePopUpMenuPntr;
  bool m_ServerModeCachedValue;
  BCheckBox *m_ServerModeCheckboxPntr;
  uint32 m_SpamCountCachedValue;
  BTextControl *m_SpamCountTextboxPntr;
  bigtime_t m_TimeOfLastPoll;
  TokenizeModes m_TokenizeModeCachedValue;
  BMenuBar *m_TokenizeModeMenuBarPntr;
  BPopUpMenu *m_TokenizeModePopUpMenuPntr;
  uint32 m_WordCountCachedValue;
  BTextControl *m_WordCountTextboxPntr;


/* Various message codes for various buttons etc. */
static const uint32 MSG_LINE_DOWN = 'LnDn';
static const uint32 MSG_LINE_UP = 'LnUp';
static const uint32 MSG_PAGE_DOWN = 'PgDn';
static const uint32 MSG_PAGE_UP = 'PgUp';

/******************************************************************************
 * This view contains the list of words. It displays as many as can fit in the
 * view rectangle, starting at a specified word (so it can simulate scrolling).
 * Usually it will appear in the bottom half of the DatabaseWindow.
 */

class WordsView : public BView
  /* Constructor and destructor. */
  WordsView (BRect NewBounds);

  /* BeOS virtual functions. */
  virtual void AttachedToWindow ();
  virtual void Draw (BRect UpdateRect);
  virtual void KeyDown (const char *BufferPntr, int32 NumBytes);
  virtual void MakeFocus (bool Focused);
  virtual void MessageReceived (BMessage *MessagePntr);
  virtual void MouseDown (BPoint point);
  virtual void Pulse ();

  /* Our member functions. */
  void MoveTextUpOrDown (uint32 MovementType);
  void RefsDroppedHere (BMessage *MessagePntr);

  /* Member variables. */
  BPictureButton *m_ArrowLineDownPntr;
  BPictureButton *m_ArrowLineUpPntr;
  BPictureButton *m_ArrowPageDownPntr;
  BPictureButton *m_ArrowPageUpPntr;
  /* Various buttons for controlling scrolling, since we can't use a scroll
  bar. To make them less obvious, their background view colour needs to be
  changed whenever the main view's colour changes. */

  float m_AscentHeight;
  /* The ascent height for the font used to draw words. Height from the top
  of the highest letter to the base line (which is near the middle bottom of
  the letters, the line where you would align your writing of the text by
  hand, all letters have part above, some also have descenders below this

  rgb_color m_BackgroundColour;
  /* The current background colour. Changes when the focus changes. */

  uint32 m_CachedTotalGenuineMessages;
  uint32 m_CachedTotalSpamMessages;
  uint32 m_CachedWordCount;
  /* These are cached copies of the similar values in the BApplication. They
  reflect what's currently displayed. If they are different than the values
  from the BApplication then the polling loop will try to redraw the display.
  They get set to the values actually used during drawing when drawing is

  char m_FirstDisplayedWord [g_MaxWordLength + 1];
  /* The scrolling display starts at this word. Since we can't use index
  numbers (word[12345] for example), we use the word itself. The scroll
  buttons set this to the next or previous word in the database. Typing by
  the user when the view has the focus will also change this starting word.

  rgb_color m_FocusedColour;
  /* The colour to use for focused mode (typing by the user is received by

  bigtime_t m_LastTimeAKeyWasPressed;
  /* Records the time when a key was last pressed. Used for determining when
  the user has stopped typing a batch of letters. */

  /* Height of a line of text in the font used for the word display.
  Includes the height of the letters plus a bit of extra space for between
  the lines (called leading). */

  /* The font used to draw the text in the window. */

  /* Maximum total height of the letters in the text, includes the part above
  the baseline and the part below. Doesn't include the sliver of space

  rgb_color m_UnfocusedColour;
  /* The colour to use for unfocused mode, when user typing isn't active. */


/******************************************************************************
 * The BWindow class for this program. It displays the database in real time,
 * and has various buttons and gadgets in the top half for changing settings
 * (live changes, no OK button, and they reflect changes done by other programs
 * using the server too). The bottom half is a scrolling view listing all the
 * words in the database. A simple graphic blotch behind each word shows
 * whether the word is strongly or weakly related to spam or genuine messages.
 * Most operations go through the scripting message system, but it also peeks
 * at the BApplication data for examining simple things and when redrawing the

class DatabaseWindow : public BWindow
  /* Constructor and destructor. */

  /* BeOS virtual functions. */
  virtual void MessageReceived (BMessage *MessagePntr);
  virtual bool QuitRequested ();

  /* Member variables. */
  ControlsView *m_ControlsViewPntr;
  WordsView *m_WordsViewPntr;


/******************************************************************************
 * ABSApp is the BApplication class for this program. This handles messages
 * from the outside world (requests to load a database, or to add files to the
 * collection). It responds to command line arguments (if you start up the
 * program a second time, the system will just send the arguments to the
 * existing running program). It responds to scripting messages. And it
 * responds to messages from the window. Its thread does the main work of
 * updating the database and reading / writing files.
 */

class ABSApp : public BApplication
  /* Constructor and destructor. */

  /* BeOS virtual functions. */
  virtual void AboutRequested ();
  virtual void ArgvReceived (int32 argc, char **argv);
  virtual status_t GetSupportedSuites (BMessage *MessagePntr);
  virtual void MessageReceived (BMessage *MessagePntr);
  virtual void Pulse ();
  virtual bool QuitRequested ();
  virtual void ReadyToRun ();
  virtual void RefsReceived (BMessage *MessagePntr);
  virtual BHandler *ResolveSpecifier (BMessage *MessagePntr, int32 Index,
    BMessage *SpecifierMsgPntr, int32 SpecificationKind, const char *Property);

  /* Our member functions. */
  status_t AddFileToDatabase (ClassificationTypes IsSpamOrWhat,
    const char *FileName, char *ErrorMessage);
  status_t AddPositionIOToDatabase (ClassificationTypes IsSpamOrWhat,
    BPositionIO *MessageIOPntr, const char *OptionalFileName,
    char *ErrorMessage);
  status_t AddStringToDatabase (ClassificationTypes IsSpamOrWhat,
    const char *String, char *ErrorMessage);
  void AddWordsToSet (const char *InputString, size_t NumberOfBytes,
    char PrefixCharacter, set<string> &WordSet);
  status_t CreateDatabaseFile (char *ErrorMessage);
  void DefaultSettings ();
  status_t DeleteDatabaseFile (char *ErrorMessage);
  status_t EvaluateFile (const char *PathName, BMessage *ReplyMessagePntr,
    char *ErrorMessage);
  status_t EvaluatePositionIO (BPositionIO *PositionIOPntr,
    const char *OptionalFileName, BMessage *ReplyMessagePntr,
    char *ErrorMessage);
  status_t EvaluateString (const char *BufferPntr, ssize_t BufferSize,
    BMessage *ReplyMessagePntr, char *ErrorMessage);
  status_t GetWordsFromPositionIO (BPositionIO *PositionIOPntr,
    const char *OptionalFileName, set<string> &WordSet, char *ErrorMessage);
  status_t InstallThings (char *ErrorMessage);
  status_t LoadDatabaseIfNeeded (char *ErrorMessage);
  status_t LoadSaveDatabase (bool DoLoad, char *ErrorMessage);

  status_t LoadSaveSettings (bool DoLoad);

  status_t MakeBackup (char *ErrorMessage);
  void MakeDatabaseEmpty ();
  void ProcessScriptingMessage (BMessage *MessagePntr,
    struct property_info *PropInfoPntr);
  status_t PurgeOldWords (char *ErrorMessage);
  status_t RecursivelyTokenizeMailComponent (
    BMailComponent *ComponentPntr, const char *OptionalFileName,
    set<string> &WordSet, char *ErrorMessage,
    int RecursionLevel, int MaxRecursionLevel);
  status_t SaveDatabaseIfNeeded (char *ErrorMessage);
  status_t TokenizeParts (BPositionIO *PositionIOPntr,
    const char *OptionalFileName, set<string> &WordSet, char *ErrorMessage);
  status_t TokenizeWhole (BPositionIO *PositionIOPntr,
    const char *OptionalFileName, set<string> &WordSet, char *ErrorMessage);

  /* Member variables. Many are read by the window thread to see if it needs
  updating, and to draw the words. However, the other threads will lock the
  BApplication or use scripting commands if they want to make changes. */

  bool m_DatabaseHasChanged;
  /* Set to TRUE when the in-memory database (stored in m_WordMap) has
  changed and is different from the on-disk database file. When the
  application exits, the database will be written out if it has changed. */

  BString m_DatabaseFileName;
  /* The absolute path name to use for the database file on disk. */

  bool m_IgnorePreviousClassification;
  /* If TRUE then the previous classification of a message (stored in an
  attribute on the message file) will be ignored, and the message will be
  added to the requested spam/genuine list. If this is FALSE then the message
  won't be added to the list if it has already been classified as specified,
  but if it was mis-classified, it will be removed from the old list and
  added to the new list. */

  /* The age of the oldest word. This will be the smallest age number in the
  database. Mostly useful for scaling graphics representing age in the word
  display. If the oldest word is no longer the oldest, this variable won't
  get immediately updated since it would take a lot of effort to find the
  next older age. Since it's only used for display, we'll let it be slightly
  incorrect. The next database load or purge will fix it. */

  /* When purging old words, they have to be at least this old to be eligible
  for deletion. Age is measured as the number of e-mails added to the
  database since the word was last updated in the database. Zero means all

  uint32 m_PurgePopularity;
  /* When purging old words, they have to be less than or equal to this
  popularity limit to be eligible for deletion. Popularity is measured as
  the number of messages (spam and genuine) which have the word. Zero means

  ScoringModes m_ScoringMode;
  /* Controls how to combine the word probabilities into an overall score.
  See the PN_SCORING_MODE comments for details. */

  BPath m_SettingsDirectoryPath;
  /* The constructor initialises this to the settings directory path. It
  never changes after that. */

  bool m_SettingsHaveChanged;
  /* Set to TRUE when the settings are changed (different than the ones which
  were loaded). When the application exits, the settings will be written out
  if they have changed. */

  double m_SmallestUseableDouble;
  /* When multiplying fractional numbers together, avoid using numbers
  smaller than this because the double exponent range is close to being
  exhausted. The IEEE STANDARD 754 floating-point arithmetic (used on the
  Intel i8087 and later math processors) has 64 bit numbers with 53 bits of
  mantissa, giving it an underflow starting at 0.5**1022 = 2.2e-308 where it
  rounds off to the nearest multiple of 0.5**1074 = 4.9e-324. */
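
  /* Illustrative sketch of the underflow limits quoted above, together with
  one common way to multiply many small probabilities without running out of
  exponent range: keep the binary exponent in a separate integer using frexp.
  This is not necessarily how SpamDBM uses the limit. The function name is
  hypothetical, and each input is assumed to be an ordinary probability, well
  above the underflow threshold.

```cpp
#include <cassert>
#include <cmath>
#include <limits>
#include <vector>

// Returns the natural logarithm of the product of the values, even when the
// product itself would underflow a double.
static double LogOfProduct(const std::vector<double> &values)
{
  double mantissa = 1.0;  // Stays in [0.5, 1) after the first frexp call.
  long exponent = 0;      // Running power of two taken out of the mantissa.
  for (double v : values) {
    int e = 0;
    mantissa = std::frexp(mantissa * v, &e);
    exponent += e;
  }
  return std::log(mantissa) + exponent * std::log(2.0);
}
```

  On IEEE 754 hardware std::numeric_limits<double>::min() is 0.5**1022 and
  std::numeric_limits<double>::denorm_min() is 0.5**1074, the two values
  named in the comment above. */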

  TokenizeModes m_TokenizeMode;
  /* Controls how to convert the raw message text into words. See the
  PN_TOKENIZE_MODE comments for details. */

  uint32 m_TotalGenuineMessages;
  /* Number of genuine messages which are in the database. */

  uint32 m_TotalSpamMessages;
  /* Number of spam messages which are in the database. */

  /* The number of words currently in the database. Stored separately as a
  member variable to avoid having to call m_WordMap.size() all the time,
  which other threads can't do while the database is being updated (but they
  can look at the word count variable). */

  StatisticsMap m_WordMap;
  /* The in-memory data structure holding the set of words and their
  associated statistics. When the database isn't in use, it is an empty
  collection. You should lock the BApplication if you are using the word
  collection (reading or writing) from another thread. */


/******************************************************************************
 * Global utility function to display an error message and return. The message
 * part describes the error, and if ErrorNumber is non-zero, gets the string
 * ", error code $X/N (standard description) has occurred." appended to it,
 * where X is the error number in hexadecimal and N is the same number in
 * decimal. If the message is NULL then it gets defaulted to "Something went
 * wrong". The title part doesn't get displayed (no title bar in the dialog
 * box, but you can see it in the debugger as the window thread name), and
 * defaults to "SpamDBM Error Message" if you didn't specify one. If running
 * in command line or server mode, the error gets printed to stderr rather
 * than showing up in a dialog box.
 */

static void
DisplayErrorMessage (
  const char *MessageString = NULL,
  int ErrorNumber = 0,
  const char *TitleString = NULL)
{
  BAlert *AlertPntr;
  char ErrorBuffer [PATH_MAX + 1500];

  if (TitleString == NULL)
    TitleString = "SpamDBM Error Message";

  if (MessageString == NULL)
  {
    if (ErrorNumber == 0)
      MessageString = "No error, no message, why bother?";
    else
      MessageString = "Something went wrong";
  }

  if (ErrorNumber != 0)
  {
    /* Bounded formatting so that a long message can't overrun the buffer. */
    snprintf (ErrorBuffer, sizeof (ErrorBuffer),
      "%s, error code $%X/%d (%s) has occurred.",
      MessageString, ErrorNumber, ErrorNumber, strerror (ErrorNumber));
    MessageString = ErrorBuffer;
  }

  if (g_CommandLineMode || g_ServerMode)
    cerr << TitleString << ": " << MessageString << endl;
  else
  {
    AlertPntr = new BAlert (TitleString, MessageString,
      "Acknowledge", NULL, NULL, B_WIDTH_AS_USUAL, B_STOP_ALERT);
    if (AlertPntr != NULL) {
      AlertPntr->SetFlags(AlertPntr->Flags() | B_CLOSE_ON_ESCAPE);
      AlertPntr->Go ();
    }
  }
}


/******************************************************************************
 * Word wrap a long line of text into shorter 79 column lines and print the
 * result on the given output stream.
 */

static void
WrapTextToStream (ostream& OutputStream, const char *TextPntr)
{
  const int LineLength = 79;
  char *StringPntr;
  char TempString [LineLength+1];

  TempString[LineLength] = 0; /* Only needs to be done once. */

  while (*TextPntr != 0)
  {
    while (isspace (*TextPntr))
      TextPntr++; /* Skip leading spaces. */
    if (*TextPntr == 0)
      break; /* It was all spaces, don't print any more. */

    strncpy (TempString, TextPntr, LineLength);

    /* Advance StringPntr to the end of the temp string, partly to see how long
    it is (rather than doing strlen). */

    StringPntr = TempString;
    while (*StringPntr != 0)
      StringPntr++;

    if (StringPntr - TempString < LineLength)
    {
      /* This line fits completely. */
      OutputStream << TempString << endl;
      TextPntr += StringPntr - TempString;
      continue;
    }

    /* Move StringPntr back to the last space in the temp string. */

    while (StringPntr > TempString)
    {
      if (isspace (*StringPntr))
        break; /* Found the trailing space. */
      else /* Go backwards, looking for the trailing space. */
        StringPntr--;
    }

    /* Remove more trailing spaces at the end of the line, in case there were
    several spaces in a row. */

    while (StringPntr > TempString && isspace (StringPntr[-1]))
      StringPntr--;

    /* Print the line of text and advance the text pointer too. */

    if (StringPntr == TempString)
    {
      /* This line has no spaces, don't wrap it, just split off a chunk. */
      OutputStream << TempString << endl;
      TextPntr += strlen (TempString);
    }
    else
    {
      *StringPntr = 0; /* Cut off after the first trailing space. */
      OutputStream << TempString << endl;
      TextPntr += StringPntr - TempString;
    }
  }
}
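
/* Illustrative sketch of the same wrapping idea using std::string, returning
the lines instead of printing them so the result is easy to check. The
WrapText name is hypothetical, and unlike the function above this version
treats a remainder of exactly lineLength characters as fitting on one line.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>
#include <vector>

// True if the byte is whitespace; the cast avoids negative char values.
static bool IsSpace(char c)
{
  return std::isspace(static_cast<unsigned char>(c)) != 0;
}

// Splits text into lines of at most lineLength characters, at spaces if any.
static std::vector<std::string> WrapText(const std::string &text,
  size_t lineLength)
{
  std::vector<std::string> lines;
  size_t pos = 0;
  while (pos < text.size()) {
    while (pos < text.size() && IsSpace(text[pos]))
      pos++;  // Skip leading spaces.
    if (pos >= text.size())
      break;  // It was all spaces.
    if (text.size() - pos <= lineLength) {
      lines.push_back(text.substr(pos));  // The rest fits on one line.
      break;
    }
    // Look backwards from just past the line limit for a space to break at.
    const size_t end = pos + lineLength;
    size_t cut = end;
    while (cut > pos && !IsSpace(text[cut]))
      cut--;
    if (cut == pos) {
      // No spaces at all, so just split off a full width chunk.
      lines.push_back(text.substr(pos, lineLength));
      pos = end;
    } else {
      // Drop any run of spaces sitting just before the break point.
      size_t stop = cut;
      while (stop > pos && IsSpace(text[stop - 1]))
        stop--;
      lines.push_back(text.substr(pos, stop - pos));
      pos = cut;
    }
  }
  return lines;
}
```

For instance, wrapping "aaa bbb ccc" at 7 columns yields "aaa bbb" and
"ccc", while a ten letter word wrapped at 4 columns is cut into pieces of 4,
4 and 2 letters. */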
1588 /******************************************************************************
1589 * Print the usage info to the stream. Includes a list of all commands.
1591 ostream
& PrintUsage (ostream
& OutputStream
);
1593 ostream
& PrintUsage (ostream
& OutputStream
)
1595 struct property_info
*PropInfoPntr
;

  OutputStream << "\nSpamDBM - A Spam Database Manager\n";
  OutputStream << "Copyright © 2002 by Alexander G. M. Smith. ";
  OutputStream << "Released to the public domain.\n\n";
  WrapTextToStream (OutputStream, "Compiled on " __DATE__ " at " __TIME__
". $Id: spamdbm.cpp 30630 2009-05-05 01:31:01Z bga $ $HeadURL: http://svn.haiku-os.org/haiku/haiku/trunk/src/bin/mail_utils/spamdbm.cpp $");
  OutputStream << "\n"
"This is a program for classifying e-mail messages as spam (junk mail which\n"
"you don't want to read) and regular genuine messages. It can learn what's\n"
"spam and what's genuine. You just give it a bunch of spam messages and a\n"
"bunch of non-spam ones. It uses them to make a list of the words from the\n"
"messages with the probability that each word is from a spam message or from\n"
"a genuine message. Later on, it can use those probabilities to classify\n"
"new messages as spam or not spam. If the classifier stops working well\n"
"(because the spammers have changed their writing style and vocabulary, or\n"
"your regular correspondents are writing like spammers), you can use this\n"
"program to update the list of words to identify the new messages\n"
"The original idea was from Paul Graham's algorithm, which has an excellent\n"
"writeup at: http://www.paulgraham.com/spam.html\n"
"Gary Robinson came up with the improved algorithm, which you can read about at:\n"
"http://radio.weblogs.com/0101454/stories/2002/09/16/spamDetection.html\n"
"Then he, Tim Peters and the SpamBayes mailing list developed the Chi-Squared\n"
"test, see http://mail.python.org/pipermail/spambayes/2002-October/001036.html\n"
"for one of the earlier messages leading from the central limit theorem to\n"
"the current chi-squared scoring method.\n"
"Thanks go to Isaac Yonemoto for providing a better icon, which we can\n"
"unfortunately no longer use, since the Hormel company wants people to\n"
"avoid associating their meat product with junk e-mail.\n"
"Tokenising code updated in 2005 to use some of the tricks that SpamBayes\n"
"uses to extract words from messages. In particular, HTML is now handled.\n"
"Usage: Specify the operation as the first argument followed by more\n"
"information as appropriate. The program's configuration will affect the\n"
"actual operation (things like the name of the database file to use, or\n"
"whether it should allow non-email messages to be added). In command line\n"
"mode it will do the operation and exit. In GUI/server mode a command line\n"
"invocation will just send the command to the running server. You can also\n"
"use BeOS scripting (see the \"Hey\" command which you can get from\n"
"http://www.bebits.com/app/2042 ) to control the Spam server. And finally,\n"
"there's also a GUI interface which shows up if you start it without any\n"
"command line arguments.\n"
"Stop the program. Useful if it's running as a server.\n"

  /* Go through all our scripting commands and add a description of each one to
  for (PropInfoPntr = g_ScriptingPropertyList + 0;
  PropInfoPntr->name != 0;
    switch (PropInfoPntr->commands[0])
      case B_GET_PROPERTY:
        OutputStream << "Get " << PropInfoPntr->name << endl;
      case B_SET_PROPERTY:
        OutputStream << "Set " << PropInfoPntr->name << " NewValue" << endl;
      case B_COUNT_PROPERTIES:
        OutputStream << "Count " << PropInfoPntr->name << endl;
      case B_CREATE_PROPERTY:
        OutputStream << "Create " << PropInfoPntr->name << endl;
      case B_DELETE_PROPERTY:
        OutputStream << "Delete " << PropInfoPntr->name << endl;
      case B_EXECUTE_PROPERTY:
        OutputStream << PropInfoPntr->name << endl;
        OutputStream << "Buggy Command: " << PropInfoPntr->name << endl;
    WrapTextToStream (OutputStream, (char *)PropInfoPntr->usage);
    OutputStream << endl;
  return OutputStream;

/******************************************************************************
 * A utility function to send a command to the application, will return after a
 * short delay if the application is busy (doesn't wait for it to be executed).
 * The reply from the application is also thrown away. It used to be an
 * overloaded function, but the system couldn't distinguish between bool and
 * int, so now it has slightly different names depending on the arguments.

SubmitCommand (BMessage& CommandMessage)
  ErrorCode = be_app_messenger.SendMessage (&CommandMessage,
    be_app_messenger /* reply messenger, throw away the reply */,
    1000000 /* delivery timeout */);

  if (ErrorCode != B_OK)
    cerr << "SubmitCommand failed to send a command, code " <<
    ErrorCode << " (" << strerror (ErrorCode) << ")." << endl;

SubmitCommandString (
  PropertyNumbers Property,
  const char *StringArgument = NULL)
  BMessage CommandMessage (CommandCode);

  if (Property < 0 || Property >= PN_MAX)
    DisplayErrorMessage ("SubmitCommandString bug.");
  CommandMessage.AddSpecifier (g_PropertyNames[Property]);
  if (StringArgument != NULL)
    CommandMessage.AddString (g_DataName, StringArgument);
  SubmitCommand (CommandMessage);

SubmitCommandInt32 (
  PropertyNumbers Property,
  int32 Int32Argument)
  BMessage CommandMessage (CommandCode);

  if (Property < 0 || Property >= PN_MAX)
    DisplayErrorMessage ("SubmitCommandInt32 bug.");
  CommandMessage.AddSpecifier (g_PropertyNames[Property]);
  CommandMessage.AddInt32 (g_DataName, Int32Argument);
  SubmitCommand (CommandMessage);

  PropertyNumbers Property,
  BMessage CommandMessage (CommandCode);

  if (Property < 0 || Property >= PN_MAX)
    DisplayErrorMessage ("SubmitCommandBool bug.");
  CommandMessage.AddSpecifier (g_PropertyNames[Property]);
  CommandMessage.AddBool (g_DataName, BoolArgument);
  SubmitCommand (CommandMessage);

/******************************************************************************
 * A utility function which will estimate the spaminess of file(s), not
 * callable from the application thread since it sends a scripting command to
 * the application and waits for results. For each file there will be an entry
 * reference in the message. For each of those, run it through the spam
 * estimator and display a box with the results. This function is used both by
 * the file requestor and by dragging and dropping into the middle of the words

EstimateRefFilesAndDisplay (BMessage *MessagePntr)
  BMessage ReplyMessage;
  BMessage ScriptingMessage;
  const char *StringPntr;
  char TempString [PATH_MAX + 1024 +
    g_MaxInterestingWords * (g_MaxWordLength + 16)];

  for (i = 0; MessagePntr->FindRef ("refs", i, &EntryRef) == B_OK; i++)
    /* See if the entry is a valid file or directory or other thing. */

    ErrorCode = Entry.SetTo (&EntryRef, true /* traverse symbolic links */);
    if (ErrorCode != B_OK || !Entry.Exists () || Entry.GetPath (&Path) != B_OK)

    /* Evaluate the spaminess of the file. */

    ScriptingMessage.MakeEmpty ();
    ScriptingMessage.what = B_SET_PROPERTY;
    ScriptingMessage.AddSpecifier (g_PropertyNames[PN_EVALUATE]);
    ScriptingMessage.AddString (g_DataName, Path.Path ());

    if (be_app_messenger.SendMessage (&ScriptingMessage,&ReplyMessage) != B_OK)
      break; /* App has died or something is wrong. */

    if (ReplyMessage.FindInt32 ("error", &TempInt32) != B_OK ||
      break; /* Error messages will be displayed elsewhere. */

    ReplyMessage.FindFloat (g_ResultName, &TempFloat);
    sprintf (TempString, "%f spam ratio for \"%s\".\nThe top words are:",
      (double) TempFloat, Path.Path ());

    for (j = 0; j < 20 /* Don't print too many! */; j++)
      if (ReplyMessage.FindString ("words", j, &StringPntr) != B_OK ||
      ReplyMessage.FindFloat ("ratios", j, &TempFloat) != B_OK)
      sprintf (TempString + strlen (TempString), "\n%s / %f",
        StringPntr, TempFloat);
    if (j >= 20 && j < g_MaxInterestingWords)
      sprintf (TempString + strlen (TempString), "\nAnd up to %d more words.",
        g_MaxInterestingWords - j);

    AlertPntr = new BAlert ("Estimate", TempString, "OK");
    if (AlertPntr != NULL) {
      AlertPntr->SetFlags(AlertPntr->Flags() | B_CLOSE_ON_ESCAPE);

/******************************************************************************
 * A utility function from the http://sourceforge.net/projects/spambayes
 * SpamBayes project. Return prob(chisq >= x2, with v degrees of freedom). It
 * computes the probability that the chi-squared value (a kind of normalized
 * error measurement), with v degrees of freedom, would be larger than a given
 * number (x2; chi is the Greek letter X thus x2). So you can tell if the
 * error is really unusual (the returned probability is near zero meaning that
 * your measured error number is kind of large - actual chi-squared is rarely
 * above that number merely due to random effects), or if it happens often
 * (usually if the probability is over 5% then it's within 3 standard
 * deviations - meaning that chi-squared goes over your number fairly often due
 * merely to random effects). v must be even for this calculation to work.

static double ChiSquaredProbability (double x2, int v)
    return -1.0; /* Out of range return value as a hint v is odd. */

  /* If x2 is very large, exp(-m) will underflow to 0. */

  sum = term = exp (-m);
  for (i = 1; i < halfV; i++)

  /* With small x2 and large v, accumulated roundoff error, plus error in the
  platform exp(), can cause this to spill a few ULP above 1.0. For example,
  ChiSquaredProbability(100, 300) on my box has sum == 1.0 + 2.0**-52 at this
  point. Returning a value even a teensy bit over 1.0 is no good. */
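
/* Illustrative sketch only, not part of the original SpamDBM code. It shows
one way the even-degrees-of-freedom calculation described above can be
written out in full, assuming the usual closed form for even v:
  Q(x2, v) = exp(-m) * (1 + m/1! + m^2/2! + ... + m^(v/2-1)/(v/2-1)!)
with m = x2 / 2. The name SketchChiSquaredProbability is hypothetical, and
the final clamp follows the roundoff remark in the comment above. */

#include <cmath>

static double SketchChiSquaredProbability (double x2, int v)
{
  if (v & 1)
    return -1.0; /* Out of range value as a hint that v is odd. */

  double m = x2 / 2.0;
  double term = std::exp (-m); /* The i = 0 term of the series. */
  double sum = term;
  for (int i = 1; i < v / 2; i++)
  {
    term *= m / i; /* Builds m^i / i! one factor at a time. */
    sum += term;
  }
  return (sum < 1.0) ? sum : 1.0; /* Clamp any roundoff spill above 1.0. */
}

/* With v = 2 the series has a single term, so the sketch should return
exp (-x2 / 2), which makes a handy sanity check. */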

/******************************************************************************
 * A utility function to remove the "[Spam 99.9%] " from in front of the
 * MAIL:subject attribute of a file.

static status_t RemoveSpamPrefixFromSubjectAttribute (BNode *BNodePntr)
  const char *MailSubjectName = "MAIL:subject";
  char SubjectString [2000];

  ErrorCode = BNodePntr->ReadAttr (MailSubjectName,
    B_STRING_TYPE, 0 /* offset */, SubjectString,
    sizeof (SubjectString) - 1);
    return 0; /* The attribute isn't there so we don't care. */
  if (ErrorCode >= (int) sizeof (SubjectString) - 1)
    return 0; /* Can't handle subjects which are too long. */

  SubjectString [ErrorCode] = 0;
  ErrorCode = 0; /* So do-nothing exit returns zero. */
  if (strncmp (SubjectString, "[Spam ", 6) == 0)
    for (StringPntr = SubjectString;
    *StringPntr != 0 && *StringPntr != ']'; StringPntr++)
      ; /* No body in this for loop. */
    if (StringPntr[0] == ']' && StringPntr[1] == ' ')
      ErrorCode = BNodePntr->RemoveAttr (MailSubjectName);
      ErrorCode = BNodePntr->WriteAttr (MailSubjectName,
        B_STRING_TYPE, 0 /* offset */,
        StringPntr + 2, strlen (StringPntr + 2) + 1);
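
/* Illustrative sketch only, not part of the original code. It applies the
same prefix test as the function above ("[Spam " at the start, then the
first ']' followed by a space) to a plain std::string instead of a BeOS file
attribute, so the string handling can be tried without a BNode. The name
SketchRemoveSpamPrefix is hypothetical. */

#include <string>

static std::string SketchRemoveSpamPrefix (const std::string &Subject)
{
  if (Subject.compare (0, 6, "[Spam ") != 0)
    return Subject; /* No prefix, nothing to do. */

  std::string::size_type ClosePos = Subject.find (']');
  if (ClosePos == std::string::npos ||
  ClosePos + 1 >= Subject.size () || Subject[ClosePos + 1] != ' ')
    return Subject; /* Malformed prefix, leave the subject alone. */

  return Subject.substr (ClosePos + 2); /* Skip the "] " as well. */
}

/* For example, "[Spam 99.9%] Hello" should come back as "Hello", while a
subject without the prefix is returned unchanged. */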

/******************************************************************************
 * The tokenizing functions. To make tokenization of the text easier to
 * understand, it is broken up into several passes. Each pass goes over the
 * text (can include NUL bytes) and extracts all the words it can recognise
 * (can be none). The extracted words are added to the WordSet, with the
 * PrefixCharacter prepended (zero if none) so we can distinguish between words
 * found in headers and in the text body. It also modifies the input text
 * buffer in-place to change the text that the next pass will see (blanking out
 * words that it wants to delete, but not inserting much new text since the
 * buffer can't be enlarged). They all return the number of bytes remaining in
 * InputString after it has been modified to be input for the next pass.
 * Returns zero if it has exhausted the possibility of getting more words, or
 * if something goes wrong.

static size_t TokenizerPassLowerCase (
  size_t NumberOfBytes)
  char *EndOfStringPntr;

  EndOfStringPntr = BufferPntr + NumberOfBytes;

  while (BufferPntr < EndOfStringPntr)
    /* Do our own lower case conversion; tolower () has problems with UTF-8
    characters that have the high bit set. */

    if (*BufferPntr >= 'A' && *BufferPntr <= 'Z')
      *BufferPntr = *BufferPntr + ('a' - 'A');
  return NumberOfBytes;
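
/* Illustrative sketch only, not part of the original code. It shows the
ASCII-only lower casing idea from the pass above as a complete function:
only the bytes 'A' to 'Z' are changed, so NUL bytes and UTF-8 bytes with the
high bit set pass through untouched. The name SketchLowerCaseASCII and its
parameter list are assumptions made for this example. */

#include <cstddef>

static std::size_t SketchLowerCaseASCII (char *BufferPntr,
  std::size_t NumberOfBytes)
{
  char *EndOfStringPntr = BufferPntr + NumberOfBytes;

  for (; BufferPntr < EndOfStringPntr; BufferPntr++)
  {
    if (*BufferPntr >= 'A' && *BufferPntr <= 'Z')
      *BufferPntr = *BufferPntr + ('a' - 'A');
  }
  return NumberOfBytes; /* The length never changes in this pass. */
}

/* Because the length is passed in explicitly, the buffer can contain NUL
bytes, as the tokenizer description above requires. */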

/* A utility function for some commonly repeated code. If this was Modula-2,
we could use a nested procedure. But it's not. Adds the given word to the set
of words, checking for maximum word length and prepending the prefix to the
word, which gets modified by this function to reflect the word actually added

AddWordAndPrefixToSet (
  const char *PrefixString,
  set<string> &WordSet)
  if (Word.size () > g_MaxWordLength)
    Word.resize (g_MaxWordLength);
  Word.insert (0, PrefixString);
  WordSet.insert (Word);

/* Hunt through the text for various URLs and extract the components as
separate words. Doesn't affect the text in the buffer. Looks for
protocol://user:password@computer:port/path?query=key#anchor strings. Also
www.blah strings are detected and broken down. Doesn't do HREF="" strings
where the string has a relative path (no host computer name). Assumes the
input buffer is already in lower case. */

static size_t TokenizerPassExtractURLs (
  size_t NumberOfBytes,
  char PrefixCharacter,
  set<string> &WordSet)
  char *AtSignStringPntr;
  char *HostStringPntr;
  char *InputStringEndPntr;
  char *InputStringPntr;
  char *OptionsStringPntr;
  char *PathStringPntr;
  char PrefixString [2];
  char *ProtocolStringPntr;

  InputStringPntr = BufferPntr;
  InputStringEndPntr = BufferPntr + NumberOfBytes;
  PrefixString [0] = PrefixCharacter;
  PrefixString [1] = 0;

  while (InputStringPntr < InputStringEndPntr - 4)
    HostStringPntr = NULL;
    if (memcmp (InputStringPntr, "www.", 4) == 0)
      HostStringPntr = InputStringPntr;
    else if (memcmp (InputStringPntr, "://", 3) == 0)
      /* Find the protocol name, and add it as a word such as "ftp:" "http:" */
      ProtocolStringPntr = InputStringPntr;
      while (ProtocolStringPntr > BufferPntr &&
      isalpha (ProtocolStringPntr[-1]))
        ProtocolStringPntr--;
      Word.assign (ProtocolStringPntr,
        (InputStringPntr - ProtocolStringPntr) + 1 /* for the colon */);
      AddWordAndPrefixToSet (Word, PrefixString, WordSet);
      HostStringPntr = InputStringPntr + 3; /* Skip past the "://" */

    if (HostStringPntr == NULL)

    /* Got a host name string starting at HostStringPntr. It's everything
    until the next slash or space, like "user:password@computer:port". */

    InputStringPntr = HostStringPntr;
    AtSignStringPntr = NULL;
    while (InputStringPntr < InputStringEndPntr &&
    (*InputStringPntr != '/' && !isspace (*InputStringPntr)))
      if (*InputStringPntr == '@')
        AtSignStringPntr = InputStringPntr;

    if (AtSignStringPntr != NULL)
      /* Add a word with the user and password, unseparated. */
      Word.assign (HostStringPntr,
        AtSignStringPntr - HostStringPntr + 1 /* for the @ sign */);
      AddWordAndPrefixToSet (Word, PrefixString, WordSet);
      HostStringPntr = AtSignStringPntr + 1;

    /* Add a word with the computer and port, unseparated. */

    Word.assign (HostStringPntr, InputStringPntr - HostStringPntr);
    AddWordAndPrefixToSet (Word, PrefixString, WordSet);

    /* Now get the path name, not including the extra junk after ? and #
    separators (they're stored as separate options). Stops at white space or a
    double quote mark. */

    PathStringPntr = InputStringPntr;
    OptionsStringPntr = NULL;
    while (InputStringPntr < InputStringEndPntr &&
    (*InputStringPntr != '"' && !isspace (*InputStringPntr)))
      if (OptionsStringPntr == NULL &&
      (*InputStringPntr == '?' || *InputStringPntr == '#'))
        OptionsStringPntr = InputStringPntr;

    if (OptionsStringPntr == NULL)
      /* No options, all path. */
      Word.assign (PathStringPntr, InputStringPntr - PathStringPntr);
      AddWordAndPrefixToSet (Word, PrefixString, WordSet);
      /* Insert the path before the options. */
      Word.assign (PathStringPntr, OptionsStringPntr - PathStringPntr);
      AddWordAndPrefixToSet (Word, PrefixString, WordSet);

      /* Insert all the options as a word. */
      Word.assign (OptionsStringPntr, InputStringPntr - OptionsStringPntr);
      AddWordAndPrefixToSet (Word, PrefixString, WordSet);
  return NumberOfBytes;

/* Replace long Asian words (likely to actually be sentences) with the first
character in the word. */

static size_t TokenizerPassTruncateLongAsianWords (
  size_t NumberOfBytes)
  char *EndOfStringPntr;
  char *InputStringPntr;
  char *OutputStringPntr;
  char *StartOfInputLongUnicodeWord;
  char *StartOfOutputLongUnicodeWord;

  InputStringPntr = BufferPntr;
  EndOfStringPntr = InputStringPntr + NumberOfBytes;
  OutputStringPntr = InputStringPntr;
  StartOfInputLongUnicodeWord = NULL; /* Non-NULL flags it as started. */
  StartOfOutputLongUnicodeWord = NULL;

  /* Copy the text from the input to the output (same buffer), but when we find
  a sequence of UTF-8 characters that is too long then truncate it down to one
  character and reset the output pointer to be after that character, thus
  deleting the word. Replacing the deleted characters after it with spaces
  won't work since we need to preserve the lack of space to handle those sneaky
  HTML artificial word breakers. So that Thelongword<blah>ing becomes
  "T<blah>ing" rather than "T <blah>ing", so the next step joins them up into
  "Ting" rather than "T" and "ing". The first code in a UTF-8 character is
  11xxxxxx and subsequent ones are 10xxxxxx. */

  while (InputStringPntr < EndOfStringPntr)
    Letter = (unsigned char) *InputStringPntr;
    if (Letter < 128) // Got a regular ASCII letter?
      if (StartOfInputLongUnicodeWord != NULL)
        if (InputStringPntr - StartOfInputLongUnicodeWord >
        (int) g_MaxWordLength * 2)
          /* Need to truncate the long word (100 bytes or about 50 characters)
          back down to the first UTF-8 character, so find out where the first
          character ends (skip past the 10xxxxxx bytes), and rewind the output
          pointer to be just after that (ignoring the rest of the long word in
          OutputStringPntr = StartOfOutputLongUnicodeWord + 1;
          while (OutputStringPntr < InputStringPntr)
            Letter = (unsigned char) *OutputStringPntr;
            if (Letter < 128 || Letter >= 192)
            ++OutputStringPntr; // Still a UTF-8 middle of the character code.
        StartOfInputLongUnicodeWord = NULL;
    else if (Letter >= 192 && StartOfInputLongUnicodeWord == NULL)
      /* Got the start of a UTF-8 character. Remember the spot so we can see
      if this is a too long UTF-8 word, which is often a whole sentence in
      Asian languages, since they sort of use a single character per word. */

      StartOfInputLongUnicodeWord = InputStringPntr;
      StartOfOutputLongUnicodeWord = OutputStringPntr;
    *OutputStringPntr++ = *InputStringPntr++;
  return OutputStringPntr - BufferPntr;

/* Find all the words in the string and add them to our local set of words.
The characters considered white space are defined by g_SpaceCharacters. This
function is also used as a subroutine by other tokenizer functions when they
have a bunch of presumably plain text they want broken into words and added. */

static size_t TokenizerPassGetPlainWords (
  size_t NumberOfBytes,
  char PrefixCharacter,
  set<string> &WordSet)
  string AccumulatedWord;
  char *EndOfStringPntr;

  if (NumberOfBytes <= 0)
    return 0; /* Nothing to process. */

  if (PrefixCharacter != 0)
    AccumulatedWord = PrefixCharacter;
  EndOfStringPntr = BufferPntr + NumberOfBytes;
    if (BufferPntr >= EndOfStringPntr)
      Letter = EOF; // Usually a negative number.
      Letter = (unsigned char) *BufferPntr++;

    /* See if it is a letter we treat as white space. Some word separators
    like dashes and periods aren't considered as space. Note that codes above
    127 are UTF-8 characters, which we consider non-space. */

    if (Letter < 0 /* EOF is -1 */ ||
    (Letter < 128 && g_SpaceCharacters[Letter]))
      /* That space finished off a word. Remove trailing periods... */

      while ((Length = AccumulatedWord.size()) > 0 &&
      AccumulatedWord[Length-1] == '.')
        AccumulatedWord.resize (Length - 1);

      /* If there's anything left in the word, add it to the set. Also ignore
      words which are too big (it's probably some binary encoded data). But
      leave room for supercalifragilisticexpialidocious. According to one web
      site, pneumonoultramicroscopicsilicovolcanoconiosis is the longest word
      currently in English. Note that some uuencoded data was seen with a 60
      character line length. */

      if (PrefixCharacter != 0)
        Length--; // Don't count prefix when judging size or emptiness.
      if (Length > 0 && Length <= g_MaxWordLength)
        WordSet.insert (AccumulatedWord);

      /* Empty out the string to get ready for the next word. Not quite empty,
      start it off with the prefix character if any. */

      if (PrefixCharacter != 0)
        AccumulatedWord = PrefixCharacter;
        AccumulatedWord.resize (0);
    else /* Not a space-like character, add it to the word. */
      AccumulatedWord.append (1 /* one copy of the char */, (char) Letter);
      break; /* End of data. Exit here so that last word got processed. */
  return NumberOfBytes;
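
/* Illustrative sketch only, not part of the original code. It restates the
word splitting described above as a small self-contained function: bytes
below 128 that are not letters, digits, apostrophe, dash, dollar sign or
period act as separators, trailing periods are trimmed, and overly long
words are dropped. The prefix character is left out for brevity. The names
SketchIsWordCharacter and SketchGetPlainWords, and the MaxWordLength
parameter (standing in for g_MaxWordLength), are assumptions. */

#include <cstddef>
#include <set>
#include <string>

static bool SketchIsWordCharacter (int Letter)
{
  return (Letter >= '0' && Letter <= '9') ||
    (Letter >= 'A' && Letter <= 'Z') ||
    (Letter >= 'a' && Letter <= 'z') ||
    Letter == '\'' || Letter == '-' || Letter == '$' || Letter == '.';
}

static void SketchGetPlainWords (const char *BufferPntr,
  std::size_t NumberOfBytes, std::size_t MaxWordLength,
  std::set<std::string> &WordSet)
{
  std::string Word;

  for (std::size_t i = 0; i <= NumberOfBytes; i++)
  {
    /* Treat the end of the buffer as one final separator (-1). */
    int Letter = (i < NumberOfBytes) ? (unsigned char) BufferPntr[i] : -1;
    bool IsSeparator = Letter < 0 ||
      (Letter < 128 && !SketchIsWordCharacter (Letter));

    if (!IsSeparator)
    {
      Word.append (1, (char) Letter); /* UTF-8 bytes count as word bytes. */
      continue;
    }

    while (!Word.empty () && Word[Word.size () - 1] == '.')
      Word.resize (Word.size () - 1); /* Remove trailing periods. */
    if (!Word.empty () && Word.size () <= MaxWordLength)
      WordSet.insert (Word);
    Word.clear ();
  }
}

/* Periods inside a word survive, so an IP address such as 10.0.0.1 stays in
one piece while the period that ends a sentence is trimmed off. */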

/* Delete Things from the text. The Thing is marked by a start string and an
end string, such as "<!--" and "-->" for HTML comment things. All the text
between the markers will be added to the word list before it gets deleted from
the buffer. The markers must be prepared in lower case and the buffer is
assumed to have already been converted to lower case. You can specify an empty
string for the end marker if you're just matching a string constant like
"&nbsp;", which you would put in the starting marker. This is a utility
function used by other tokenizer functions. */

static size_t TokenizerUtilRemoveStartEndThing (
  size_t NumberOfBytes,
  char PrefixCharacter,
  set<string> &WordSet,
  const char *ThingStartCode,
  const char *ThingEndCode,
  bool ReplaceWithSpace)
  char *EndOfStringPntr;
  bool FoundAndDeletedThing;
  char *InputStringPntr;
  char *OutputStringPntr;
  int ThingStartLength;

  InputStringPntr = BufferPntr;
  EndOfStringPntr = InputStringPntr + NumberOfBytes;
  OutputStringPntr = InputStringPntr;
  ThingStartLength = strlen (ThingStartCode);
  ThingEndLength = strlen (ThingEndCode);

  if (ThingStartLength <= 0)
    return NumberOfBytes; /* Need some things to look for first! */

  while (InputStringPntr < EndOfStringPntr)
    /* Search for the starting marker. */

    FoundAndDeletedThing = false;
    if (EndOfStringPntr - InputStringPntr >=
    ThingStartLength + ThingEndLength /* space remains for start + end */ &&
    *InputStringPntr == *ThingStartCode &&
    memcmp (InputStringPntr, ThingStartCode, ThingStartLength) == 0)
      /* Found the start marker. Look for the terminating string. If it is an
      empty string, then we've found it right now! */

      ThingEndPntr = InputStringPntr + ThingStartLength;
      while (EndOfStringPntr - ThingEndPntr >= ThingEndLength)
        if (ThingEndLength == 0 ||
        (*ThingEndPntr == *ThingEndCode &&
        memcmp (ThingEndPntr, ThingEndCode, ThingEndLength) == 0))
          /* Got the end of the Thing. First dump the text in between the
          start and end markers into the words list. */

          TokenizerPassGetPlainWords (InputStringPntr + ThingStartLength,
            ThingEndPntr - (InputStringPntr + ThingStartLength),
            PrefixCharacter, WordSet);

          /* Delete by not updating the output pointer while moving the input
          pointer to just after the ending tag. */

          InputStringPntr = ThingEndPntr + ThingEndLength;
          if (ReplaceWithSpace)
            *OutputStringPntr++ = ' ';
          FoundAndDeletedThing = true;
      } /* End while ThingEndPntr */
    if (!FoundAndDeletedThing)
      *OutputStringPntr++ = *InputStringPntr++;
  } /* End while InputStringPntr */

  return OutputStringPntr - BufferPntr;
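
/* Illustrative sketch only, not part of the original code. It shows the
in-place deletion technique used above (copy within one buffer, and delete a
marked span by advancing the input pointer without advancing the output
pointer) as a complete function. Unlike the real utility it does not add the
deleted text to a word set. The name SketchRemoveStartEndThing and its
parameter list are assumptions made for this example. */

#include <cstddef>
#include <cstring>

static std::size_t SketchRemoveStartEndThing (char *BufferPntr,
  std::size_t NumberOfBytes, const char *StartCode, const char *EndCode,
  bool ReplaceWithSpace)
{
  std::size_t StartLength = std::strlen (StartCode);
  std::size_t EndLength = std::strlen (EndCode);
  char *InputPntr = BufferPntr;
  char *OutputPntr = BufferPntr;
  char *EndOfBuffer = BufferPntr + NumberOfBytes;

  if (StartLength == 0)
    return NumberOfBytes; /* Need something to look for first. */

  while (InputPntr < EndOfBuffer)
  {
    bool Deleted = false;
    if ((std::size_t) (EndOfBuffer - InputPntr) >= StartLength + EndLength &&
    std::memcmp (InputPntr, StartCode, StartLength) == 0)
    {
      /* Found a start marker, scan forward for the end marker. An empty
      end marker matches immediately. */
      char *ScanPntr = InputPntr + StartLength;
      while ((std::size_t) (EndOfBuffer - ScanPntr) >= EndLength)
      {
        if (std::memcmp (ScanPntr, EndCode, EndLength) == 0)
        {
          InputPntr = ScanPntr + EndLength; /* Skip the whole Thing. */
          if (ReplaceWithSpace)
            *OutputPntr++ = ' ';
          Deleted = true;
          break;
        }
        ScanPntr++;
      }
    }
    if (!Deleted)
      *OutputPntr++ = *InputPntr++; /* Ordinary byte, copy it down. */
  }
  return OutputPntr - BufferPntr; /* New, possibly shorter, length. */
}

/* The output pointer never overtakes the input pointer, which is why the
copy can safely happen inside a single buffer. A start marker with no
matching end marker is left in the text unchanged. */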

static size_t TokenizerPassRemoveHTMLComments (
  size_t NumberOfBytes,
  char PrefixCharacter,
  set<string> &WordSet)
  return TokenizerUtilRemoveStartEndThing (BufferPntr, NumberOfBytes,
    PrefixCharacter, WordSet, "<!--", "-->", false);

static size_t TokenizerPassRemoveHTMLStyle (
  size_t NumberOfBytes,
  char PrefixCharacter,
  set<string> &WordSet)
  return TokenizerUtilRemoveStartEndThing (BufferPntr, NumberOfBytes,
    PrefixCharacter, WordSet,
    "<style", "/style>", false /* replace with space if true */);

/* Convert Japanese periods (a round hollow dot symbol) to spaces so that the
start of the next sentence is recognised at least as the start of a very long
word. The Japanese comma also does the same job. */

static size_t TokenizerPassJapanesePeriodsToSpaces (
  size_t NumberOfBytes,
  char PrefixCharacter,
  set<string> &WordSet)
  size_t BytesRemaining = NumberOfBytes;

  BytesRemaining = TokenizerUtilRemoveStartEndThing (BufferPntr,
    BytesRemaining, PrefixCharacter, WordSet, "。" /* period */, "", true);
  BytesRemaining = TokenizerUtilRemoveStartEndThing (BufferPntr,
    BytesRemaining, PrefixCharacter, WordSet, "、" /* comma */, "", true);
  return BytesRemaining;

/* Delete HTML tags from the text. The contents of the tag are added as words
before being deleted. <P>, <BR> and &nbsp; are replaced by spaces at this
stage while other HTML things get replaced by nothing. */

static size_t TokenizerPassRemoveHTMLTags (
  size_t NumberOfBytes,
  char PrefixCharacter,
  set<string> &WordSet)
  size_t BytesRemaining = NumberOfBytes;

  BytesRemaining = TokenizerUtilRemoveStartEndThing (BufferPntr,
    BytesRemaining, PrefixCharacter, WordSet, "&nbsp;", "", true);
  BytesRemaining = TokenizerUtilRemoveStartEndThing (BufferPntr,
    BytesRemaining, PrefixCharacter, WordSet, "<p", ">", true);
  BytesRemaining = TokenizerUtilRemoveStartEndThing (BufferPntr,
    BytesRemaining, PrefixCharacter, WordSet, "<br", ">", true);
  BytesRemaining = TokenizerUtilRemoveStartEndThing (BufferPntr,
    BytesRemaining, PrefixCharacter, WordSet, "<", ">", false);
  return BytesRemaining;

/******************************************************************************
 * Implementation of the ABSApp class, constructor, destructor and the rest of
 * the member functions in mostly alphabetical order.

: BApplication (g_ABSAppSignature),
  m_DatabaseHasChanged (false),
  m_SettingsHaveChanged (false)
  const void *ResourceData;
  size_t ResourceSize;
  BResources *ResourcesPntr;

  MakeDatabaseEmpty ();

  /* Set up the pathname which identifies our settings directory. Note that
  the actual settings are loaded later on (or set to defaults) by the main()
  function, before this BApplication starts running. So we don't bother
  initialising the other setting related variables here. */

    find_directory (B_USER_SETTINGS_DIRECTORY, &m_SettingsDirectoryPath);
  if (ErrorCode == B_OK)
    ErrorCode = m_SettingsDirectoryPath.Append (g_SettingsDirectoryName);
  if (ErrorCode != B_OK)
    m_SettingsDirectoryPath.SetTo (".");

  /* Set up the table which identifies which characters are spaces and which
  are not. Spaces are all control characters and all punctuation except for:
  apostrophe (so "it's" and possessive versions of words get stored), dash (for
  hyphenated words), dollar sign (for cash amounts), period (for IP addresses,
  we later remove trailing periods). */

  memset (g_SpaceCharacters, 1, sizeof (g_SpaceCharacters));
  g_SpaceCharacters['\''] = false;
  g_SpaceCharacters['-'] = false;
  g_SpaceCharacters['$'] = false;
  g_SpaceCharacters['.'] = false;
  for (i = '0'; i <= '9'; i++)
    g_SpaceCharacters[i] = false;
  for (i = 'A'; i <= 'Z'; i++)
    g_SpaceCharacters[i] = false;
  for (i = 'a'; i <= 'z'; i++)
    g_SpaceCharacters[i] = false;

  /* Initialise the busy cursor from data in the application's resources. */

  if ((ResourcesPntr = AppResources ()) != NULL && (ResourceData =
  ResourcesPntr->LoadResource ('CURS', "Busy Cursor", &ResourceSize)) != NULL
  && ResourceSize >= 68 /* Size of a raw 2x16x16/8+4 cursor is 68 bytes */)
    g_BusyCursor = new BCursor (ResourceData);

  /* Find out the smallest usable double by seeing how small we can make it. */

  m_SmallestUseableDouble = 1.0;
  while (HalvingCount < 10000 && m_SmallestUseableDouble > 0.0)
    m_SmallestUseableDouble /= 2;

  /* Recreate the number. But don't make it quite as small, we want to allow
  some precision bits and a bit of extra margin for intermediate results in
  future

  HalvingCount -= 50 + sizeof (double) * 8;

  m_SmallestUseableDouble = 1.0;
  while (HalvingCount > 0)
    m_SmallestUseableDouble /= 2;

  char ErrorMessage [PATH_MAX + 1024];

  if (m_SettingsHaveChanged)
    LoadSaveSettings (false /* DoLoad */);
  if ((ErrorCode = SaveDatabaseIfNeeded (ErrorMessage)) != B_OK)
    DisplayErrorMessage (ErrorMessage, ErrorCode, "Exiting Error");
  delete g_BusyCursor;
  g_BusyCursor = NULL;

/* Display a box showing information about this program. */

ABSApp::AboutRequested ()
  BAlert *AboutAlertPntr;

  AboutAlertPntr = new BAlert ("About",
"SpamDBM - Spam Database Manager\n\n"

"This is a BeOS program for classifying e-mail messages as spam (unwanted \
junk mail) or as genuine mail using a Bayesian statistical approach. There \
is also a Mail Daemon Replacement add-on to filter mail using the \
classification statistics collected earlier.\n\n"

"Written by Alexander G. M. Smith, fall 2002.\n\n"

"The original idea was from Paul Graham's algorithm, which has an excellent \
writeup at: http://www.paulgraham.com/spam.html\n\n"

"Gary Robinson came up with the improved algorithm, which you can read about \
at: http://radio.weblogs.com/0101454/stories/2002/09/16/spamDetection.html\n\n"

"Mr. Robinson, Tim Peters and the SpamBayes mailing list people then \
developed the even better chi-squared scoring method.\n\n"

"Icon courtesy of Isaac Yonemoto, though it is no longer used since Hormel \
doesn't want their meat product associated with junk e-mail.\n\n"

"Tokenising code updated in 2005 to use some of the tricks that SpamBayes \
uses to extract words from messages. In particular, HTML is now handled.\n\n"

"Released to the public domain, with no warranty.\n"
"$Revision: 30630 $\n"
"Compiled on " __DATE__ " at " __TIME__ ".", "Done");
  if (AboutAlertPntr != NULL)
    AboutAlertPntr->SetFlags(AboutAlertPntr->Flags() | B_CLOSE_ON_ESCAPE);
    AboutAlertPntr->Go ();

/* Add the text in the given file to the database as an example of a spam or
genuine message, or removes it from the database if you claim it is
CL_UNCERTAIN. Also resets the spam ratio attribute to show the effect of the

status_t ABSApp::AddFileToDatabase (
  ClassificationTypes IsSpamOrWhat,
  const char *FileName,
  BMessage TempBMessage;

  ErrorCode = MessageFile.SetTo (FileName, B_READ_ONLY);
  if (ErrorCode != B_OK)
    sprintf (ErrorMessage, "Unable to open file \"%s\" for reading", FileName);

  ErrorCode = AddPositionIOToDatabase (IsSpamOrWhat,
    &MessageFile, FileName, ErrorMessage);
  MessageFile.Unset ();
  if (ErrorCode != B_OK)

  /* Re-evaluate the file so that the user sees the new ratio attribute. */
  return EvaluateFile (FileName, &TempBMessage, ErrorMessage);
2579 /* Add the given text to the database. The unique words found in MessageIOPntr
2580 will be added to the database (incrementing the count for the number of
2581 messages using each word, either the spam or genuine count depending on
2582 IsSpamOrWhat). It will remove the message (decrement the word counts) if you
2583 specify CL_UNCERTAIN as the new classification. And if it switches from spam
2584 to genuine or vice versa, it will do both - decrement the counts for the old
2585 class and increment the counts for the new one. An attribute will be added to
2586 MessageIOPntr (if it is a file) to record that it has been marked as Spam or
2587 Genuine (so that it doesn't get added to the database a second time). If it is
2588 being removed from the database, the classification attribute gets removed too.
2589 If things go wrong, a non-zero error code will be returned and an explanation
2590 written to ErrorMessage (assumed to be at least PATH_MAX + 1024 bytes long).
2591 OptionalFileName is just used in the error message to identify the file to the
2594 status_t
ABSApp::AddPositionIOToDatabase (
2595 ClassificationTypes IsSpamOrWhat
,
2596 BPositionIO
*MessageIOPntr
,
2597 const char *OptionalFileName
,
  char ClassificationString [NAME_MAX];
  StatisticsMap::iterator DataIter;
  status_t ErrorCode = 0;
  pair<StatisticsMap::iterator,bool> InsertResult;
  StatisticsRecord NewStatistics;
  ClassificationTypes PreviousClassification;
  StatisticsPointer StatisticsPntr;
  set<string>::iterator WordEndIter;
  set<string>::iterator WordIter;
  set<string> WordSet;
2613 NewAge
= m_TotalGenuineMessages
+ m_TotalSpamMessages
;
2614 if (NewAge
>= 0xFFFFFFF0UL
)
2616 sprintf (ErrorMessage
, "The database is full! There are %lu messages in "
2617 "it and we can't add any more without overflowing the maximum integer "
2618 "representation in 32 bits", NewAge
);
2622 /* Check that this file hasn't already been added to the database. */
2624 PreviousClassification
= CL_UNCERTAIN
;
2625 BNodePntr
= dynamic_cast<BNode
*> (MessageIOPntr
);
2626 if (BNodePntr
!= NULL
) /* If this thing might have attributes. */
2628 ErrorCode
= BNodePntr
->ReadAttr (g_AttributeNameClassification
,
2629 B_STRING_TYPE
, 0 /* offset */, ClassificationString
,
2630 sizeof (ClassificationString
) - 1);
2631 if (ErrorCode
<= 0) /* Positive values for the number of bytes read */
2632 strcpy (ClassificationString
, "none");
2633 else /* Just in case it needs a NUL at the end. */
2634 ClassificationString
[ErrorCode
] = 0;
2636 if (strcasecmp (ClassificationString
, g_ClassifiedSpam
) == 0)
2637 PreviousClassification
= CL_SPAM
;
2638 else if (strcasecmp (ClassificationString
, g_ClassifiedGenuine
) == 0)
2639 PreviousClassification
= CL_GENUINE
;
2642 if (!m_IgnorePreviousClassification
&&
2643 PreviousClassification
!= CL_UNCERTAIN
)
2645 if (IsSpamOrWhat
== PreviousClassification
)
2647 sprintf (ErrorMessage
, "Ignoring file \"%s\" since it seems to have "
2648 "already been classified as %s.", OptionalFileName
,
2649 g_ClassificationTypeNames
[IsSpamOrWhat
]);
2653 sprintf (ErrorMessage
, "Changing existing classification of file \"%s\" "
2654 "from %s to %s.", OptionalFileName
,
2655 g_ClassificationTypeNames
[PreviousClassification
],
2656 g_ClassificationTypeNames
[IsSpamOrWhat
]);
2658 DisplayErrorMessage (ErrorMessage
, 0, "Note");
2661 if (!m_IgnorePreviousClassification
&&
2662 IsSpamOrWhat
== PreviousClassification
)
2663 /* Nothing to do if it is already classified correctly and the user doesn't
2664 want double classification. */
2667 /* Get the list of unique words in the file. */
2669 ErrorCode
= GetWordsFromPositionIO (MessageIOPntr
, OptionalFileName
,
2670 WordSet
, ErrorMessage
);
2671 if (ErrorCode
!= B_OK
)
  /* Update the count of the number of messages processed, with corrections if
  reclassifying a message. */

  m_DatabaseHasChanged = true;

  if (!m_IgnorePreviousClassification &&
  PreviousClassification == CL_SPAM && m_TotalSpamMessages > 0)
    m_TotalSpamMessages--;

  if (IsSpamOrWhat == CL_SPAM)
    m_TotalSpamMessages++;

  if (!m_IgnorePreviousClassification &&
  PreviousClassification == CL_GENUINE && m_TotalGenuineMessages > 0)
    m_TotalGenuineMessages--;

  if (IsSpamOrWhat == CL_GENUINE)
    m_TotalGenuineMessages++;
2693 /* Mark the file's attributes with the new classification. Don't care if it
2696 if (BNodePntr
!= NULL
) /* If this thing might have attributes. */
2698 ErrorCode
= BNodePntr
->RemoveAttr (g_AttributeNameClassification
);
2699 if (IsSpamOrWhat
!= CL_UNCERTAIN
)
2701 strcpy (ClassificationString
, g_ClassificationTypeNames
[IsSpamOrWhat
]);
2702 ErrorCode
= BNodePntr
->WriteAttr (g_AttributeNameClassification
,
2703 B_STRING_TYPE
, 0 /* offset */,
2704 ClassificationString
, strlen (ClassificationString
) + 1);
2708 /* Add the words to the database by incrementing or decrementing the counts
2709 for each word as appropriate. */
2711 WordEndIter
= WordSet
.end ();
2712 for (WordIter
= WordSet
.begin (); WordIter
!= WordEndIter
; WordIter
++)
2714 if ((DataIter
= m_WordMap
.find (*WordIter
)) == m_WordMap
.end ())
2716 /* No record in the database for the word. */
2718 if (IsSpamOrWhat
== CL_UNCERTAIN
)
2719 continue; /* Not adding words, don't have to subtract from nothing. */
      /* Create a new record in the database for the new word. */
2723 memset (&NewStatistics
, 0, sizeof (NewStatistics
));
2724 InsertResult
= m_WordMap
.insert (
2725 StatisticsMap::value_type (*WordIter
, NewStatistics
));
2726 if (!InsertResult
.second
)
2728 sprintf (ErrorMessage
, "Failed to insert new database entry for "
2729 "word \"%s\", while processing file \"%s\"",
2730 WordIter
->c_str (), OptionalFileName
);
2733 DataIter
= InsertResult
.first
;
    /* Got the database record for the word, update the statistics. */

    StatisticsPntr = &DataIter->second;

    StatisticsPntr->age = NewAge;

    /* Can't update m_OldestAge here, since it would take a lot of effort to
    find the next older age. Since it's only used for display, we'll let it be
    slightly incorrect. The next database load or purge will fix it. */

    if (IsSpamOrWhat == CL_SPAM)
      StatisticsPntr->spamCount++;

    if (IsSpamOrWhat == CL_GENUINE)
      StatisticsPntr->genuineCount++;

    if (!m_IgnorePreviousClassification &&
    PreviousClassification == CL_SPAM && StatisticsPntr->spamCount > 0)
      StatisticsPntr->spamCount--;

    if (!m_IgnorePreviousClassification &&
    PreviousClassification == CL_GENUINE && StatisticsPntr->genuineCount > 0)
      StatisticsPntr->genuineCount--;
2766 /* Add the text in the string to the database as an example of a spam or
2769 status_t
ABSApp::AddStringToDatabase (
2770 ClassificationTypes IsSpamOrWhat
,
2774 BMemoryIO
MemoryIO (String
, strlen (String
));
2776 return AddPositionIOToDatabase (IsSpamOrWhat
, &MemoryIO
,
2777 "Memory Buffer" /* OptionalFileName */, ErrorMessage
);
2781 /* Given a bunch of text, find the words within it (doing special tricks to
2782 extract words from HTML), and add them to the set. Allow NULs in the text. If
2783 the PrefixCharacter isn't zero then it is prepended to all words found (so you
2784 can distinguish words as being from a header or from the body text). See also
2785 TokenizeWhole which does something similar. */
2788 ABSApp::AddWordsToSet (
2789 const char *InputString
,
2790 size_t NumberOfBytes
,
2791 char PrefixCharacter
,
2792 set
<string
> &WordSet
)
2798 /* Copy the input buffer. The code will be modifying it in-place as HTML
2799 fragments and other junk are deleted. */
2801 BufferPntr
= new char [NumberOfBytes
];
2802 if (BufferPntr
== NULL
)
2804 memcpy (BufferPntr
, InputString
, NumberOfBytes
);
2806 /* Do the tokenization. Each pass does something to the text in the buffer,
2807 and may add words to the word set. */
2809 CurrentSize
= NumberOfBytes
;
2810 for (PassNumber
= 1; PassNumber
<= 8 && CurrentSize
> 0 ; PassNumber
++)
2814 case 1: /* Lowercase first, rest of them assume lower case inputs. */
2815 CurrentSize
= TokenizerPassLowerCase (BufferPntr
, CurrentSize
);
2817 case 2: CurrentSize
= TokenizerPassJapanesePeriodsToSpaces (
2818 BufferPntr
, CurrentSize
, PrefixCharacter
, WordSet
); break;
2819 case 3: CurrentSize
= TokenizerPassTruncateLongAsianWords (
2820 BufferPntr
, CurrentSize
); break;
2821 case 4: CurrentSize
= TokenizerPassRemoveHTMLComments (
2822 BufferPntr
, CurrentSize
, 'Z', WordSet
); break;
2823 case 5: CurrentSize
= TokenizerPassRemoveHTMLStyle (
2824 BufferPntr
, CurrentSize
, 'Z', WordSet
); break;
2825 case 6: CurrentSize
= TokenizerPassExtractURLs (
2826 BufferPntr
, CurrentSize
, 'Z', WordSet
); break;
2827 case 7: CurrentSize
= TokenizerPassRemoveHTMLTags (
2828 BufferPntr
, CurrentSize
, 'Z', WordSet
); break;
2829 case 8: CurrentSize
= TokenizerPassGetPlainWords (
2830 BufferPntr
, CurrentSize
, PrefixCharacter
, WordSet
); break;
2835 delete [] BufferPntr
;
/* The user has provided a command line. This could actually be from a
separate attempt to invoke the program (this application's resource/attributes
have the launch flags set to "single launch", so the shell doesn't start the
program but instead sends the arguments to the already running instance). In
either case, the command is sent to an intermediary thread where it is
asynchronously converted into one or more scripting messages that are sent
back to this BApplication. The intermediary is needed since we can't
recursively execute scripting messages while processing a message (this
ArgvReceived one). */
2849 ABSApp::ArgvReceived (int32 argc
, char **argv
)
2851 if (g_CommanderLooperPntr
!= NULL
)
2852 g_CommanderLooperPntr
->CommandArguments (argc
, argv
);
2856 /* Create a new empty database. Note that we have to write out the new file
2857 immediately, otherwise other operations will see the empty database and then
2858 try to load the file, and complain that it doesn't exist. Now they will see
2859 the empty database and redundantly load the empty file. */
2861 status_t
ABSApp::CreateDatabaseFile (char *ErrorMessage
)
2863 MakeDatabaseEmpty ();
2864 m_DatabaseHasChanged
= true;
2865 return SaveDatabaseIfNeeded (ErrorMessage
); /* Make it now. */
2869 /* Set the settings to the defaults. Needed in case there isn't a settings
2870 file or it is obsolete. */
2873 ABSApp::DefaultSettings ()
2876 BPath
DatabasePath (m_SettingsDirectoryPath
);
2877 char TempString
[PATH_MAX
];
2879 /* The default database file is in the settings directory. */
2881 ErrorCode
= DatabasePath
.Append (g_DefaultDatabaseFileName
);
2882 if (ErrorCode
!= B_OK
)
2883 strcpy (TempString
, g_DefaultDatabaseFileName
); /* Unlikely to happen. */
2885 strcpy (TempString
, DatabasePath
.Path ());
2886 m_DatabaseFileName
.SetTo (TempString
);
2888 // Users need to be allowed to undo their mistakes...
2889 m_IgnorePreviousClassification
= true;
2890 g_ServerMode
= true;
2892 m_PurgePopularity
= 2;
2893 m_ScoringMode
= SM_CHISQUARED
;
2894 m_TokenizeMode
= TM_ANY_TEXT_HEADER
;
2896 m_SettingsHaveChanged
= true;
2900 /* Deletes the database file, and the backup file, and clears the database but
2901 marks it as not changed so that it doesn't get written out when the program
2904 status_t
ABSApp::DeleteDatabaseFile (char *ErrorMessage
)
2909 char TempString
[PATH_MAX
+20];
2911 /* Clear the in-memory database. */
2913 MakeDatabaseEmpty ();
2914 m_DatabaseHasChanged
= false;
2916 /* Delete the backup files first. Don't care if it fails. */
2918 for (i
= 0; i
< g_MaxBackups
; i
++)
2920 strcpy (TempString
, m_DatabaseFileName
.String ());
2921 sprintf (TempString
+ strlen (TempString
), g_BackupSuffix
, i
);
2922 ErrorCode
= FileEntry
.SetTo (TempString
);
2923 if (ErrorCode
== B_OK
)
2924 FileEntry
.Remove ();
2927 /* Delete the main database file. */
2929 strcpy (TempString
, m_DatabaseFileName
.String ());
2930 ErrorCode
= FileEntry
.SetTo (TempString
);
2931 if (ErrorCode
!= B_OK
)
2933 sprintf (ErrorMessage
, "While deleting, failed to make BEntry for "
2934 "\"%s\" (does the directory exist?)", TempString
);
2938 ErrorCode
= FileEntry
.Remove ();
2939 if (ErrorCode
!= B_OK
)
2940 sprintf (ErrorMessage
, "While deleting, failed to remove file "
2941 "\"%s\"", TempString
);
2947 /* Evaluate the given file as being a spam message, and tag it with the
2948 resulting spam probability ratio. If it also has an e-mail subject attribute,
2949 remove the [Spam 99.9%] prefix since the number usually changes. */
2951 status_t
ABSApp::EvaluateFile (
2952 const char *PathName
,
2953 BMessage
*ReplyMessagePntr
,
2960 /* Open the specified file. */
2962 ErrorCode
= TextFile
.SetTo (PathName
, B_READ_ONLY
);
2963 if (ErrorCode
!= B_OK
)
2965 sprintf (ErrorMessage
, "Problems opening file \"%s\" for evaluating",
2971 EvaluatePositionIO (&TextFile
, PathName
, ReplyMessagePntr
, ErrorMessage
);
2973 if (ErrorCode
== B_OK
&&
2974 ReplyMessagePntr
->FindFloat (g_ResultName
, &TempFloat
) == B_OK
)
2976 TextFile
.WriteAttr (g_AttributeNameSpamRatio
, B_FLOAT_TYPE
,
2977 0 /* offset */, &TempFloat
, sizeof (TempFloat
));
2978 /* Don't know the spam cutoff ratio, that's in the e-mail filter, so just
2979 blindly remove the prefix, which would have the wrong percentage. */
2980 RemoveSpamPrefixFromSubjectAttribute (&TextFile
);
/* Evaluate a given file or memory buffer (a BPositionIO handles both cases)
for spaminess. The output is added to the ReplyMessagePntr message, with the
probability ratio stored in "result" (0.0 means genuine and 1.0 means spam).
It also adds the most significant words (used in the ratio calculation) to the
array "words" and the associated per-word probability ratios in "ratios". If
it fails, an error code is returned and an error message written to the
ErrorMessage string (which is at least PATH_MAX + 1024 bytes long).
OptionalFileName is only used in the error message.

The math used for combining the individual word probabilities in my method is
based on Gary Robinson's method (formerly it was a variation of Paul Graham's
method) or the Chi-Squared method. Its input is the database of words that
has a count of the number of spam and number of genuine messages each word
3000 appears in (doesn't matter if it appears more than once in a message, it still
For a particular word, the spam word count is divided by the total number of
spam e-mail messages in the database to get the probability of spam, and the
probability of genuineness is computed similarly. The spam probability is
divided by the sum of the spam and genuine probabilities to get the Raw Spam
Ratio for the word. It's nearer to 0.0 for genuine and nearer to 1.0 for spam,
and can be exactly zero or one too.

To avoid multiplying later results by zero, and to compensate for a lack of
data points, the Raw Spam Ratio is adjusted towards the 0.5 halfway point. The
0.5 is combined with the raw spam ratio as a weighted average: the halfway
point gets a weight of 0.45 messages (determined to be a good value by the
"spambayes" mailing list tests) and the raw spam ratio gets a weight equal to
the number of spam + genuine messages the word appears in. This gives you the
compensated spam ratio for the word.

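As a rough illustration of the raw and compensated ratios described above,
here is a minimal sketch. The names kRobinsonS, kRobinsonX, RawSpamRatio and
CompensatedSpamRatio are made up for this example (the program itself uses
g_RobinsonS and g_RobinsonX), and the counts are passed in as plain numbers
rather than being read from the word database.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical stand-ins for the g_RobinsonS and g_RobinsonX globals.
static const double kRobinsonS = 0.45; // Weight given to the halfway point.
static const double kRobinsonX = 0.5;  // The halfway point itself.

// Raw spam ratio for one word: P(spam) / (P(spam) + P(genuine)).
double RawSpamRatio (unsigned spamCount, unsigned genuineCount,
  double totalSpam, double totalGenuine)
{
  assert (totalSpam > 0.0 && totalGenuine > 0.0);
  double spamProbability = spamCount / totalSpam;
  double genuineProbability = genuineCount / totalGenuine;
  if (spamProbability + genuineProbability <= 0.0)
    return kRobinsonX; // Word with zero statistics, so no opinion.
  return spamProbability / (spamProbability + genuineProbability);
}

// Pull the raw ratio towards 0.5, less so as the word gets more data points.
double CompensatedSpamRatio (unsigned spamCount, unsigned genuineCount,
  double totalSpam, double totalGenuine)
{
  double dataPoints = static_cast<double> (spamCount) + genuineCount;
  double raw =
    RawSpamRatio (spamCount, genuineCount, totalSpam, totalGenuine);
  return (kRobinsonS * kRobinsonX + dataPoints * raw) /
    (kRobinsonS + dataPoints);
}
```

For instance, a word seen in 1 of 267 spam messages and in none of 448 genuine
ones has a raw ratio of 1.0, but CompensatedSpamRatio (1, 0, 267.0, 448.0)
gives about 0.84, since a single data point isn't trusted much.
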
3017 The top N (150 was good in the spambayes tests) extreme words are selected by
3018 the distance of each word's compensated spam ratio from 0.5. Then the ratios
3019 of the words are combined.
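The selection of the most extreme words can be sketched with a priority queue
whose comparison function ranks ratios by their distance from 0.5. This is
only an illustration on bare doubles; LessExtreme and MostExtremeRatios are
hypothetical names, and the real code queues a word pointer along with each
ratio.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <queue>
#include <vector>

// Less-than comparison: a ratio closer to 0.5 counts as "smaller", so the
// ratio furthest from 0.5 ends up on top of the priority queue.
struct LessExtreme
{
  bool operator() (double a, double b) const
  {
    return std::fabs (a - 0.5) < std::fabs (b - 0.5);
  }
};

// Return up to maxWords ratios, most extreme first.
std::vector<double> MostExtremeRatios (const std::vector<double> &ratios,
  std::size_t maxWords)
{
  std::priority_queue<double, std::vector<double>, LessExtreme> queue (
    ratios.begin (), ratios.end ());
  std::vector<double> result;
  for (; result.size () < maxWords && !queue.empty (); queue.pop ())
    result.push_back (queue.top ());
  assert (result.size () <= maxWords);
  return result;
}
```

Given the ratios 0.5, 0.99, 0.1 and 0.6 and a limit of two words, this picks
0.99 and then 0.1, in that order.
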
3021 The Gary Robinson combining (scoring) method gets one value from the Nth root
3022 of the product of all the word ratios. The other is the Nth root of the
3023 product of (1 - ratio) for all the words. The final result is the first value
3024 divided by the sum of the two values. The Nth root helps spread the resulting
3025 range of values more evenly between 0.0 and 1.0, otherwise the values all clump
3026 together at 0 or 1. Also you can think of the Nth root as a kind of average
3027 for products; it's like a generic word probability which when multiplied by
3028 itself N times gives you the same result as the N separate actual word
3029 probabilities multiplied together.
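A minimal sketch of the Robinson combining step, done in logarithm form so
that the Nth root becomes a division by N. RobinsonScore is a hypothetical
name, and the sketch assumes every ratio is a compensated one, strictly
between 0 and 1.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Combine the compensated word ratios into one spam score from 0.0 to 1.0.
double RobinsonScore (const std::vector<double> &ratios)
{
  if (ratios.empty ())
    return 0.5; // No words, so no opinion either way.

  double logSpam = 0.0;
  double logGenuine = 0.0;
  for (std::size_t i = 0; i < ratios.size (); i++)
  {
    assert (ratios[i] > 0.0 && ratios[i] < 1.0);
    logSpam += std::log (ratios[i]);          // Log of product of ratios.
    logGenuine += std::log (1.0 - ratios[i]); // Log of product of (1-ratio).
  }

  // Dividing the log by N is the same as taking the Nth root of the product.
  double wordCount = static_cast<double> (ratios.size ());
  double spam = std::exp (logSpam / wordCount);
  double genuine = std::exp (logGenuine / wordCount);
  return spam / (spam + genuine);
}
```

Two words with a ratio of 0.9 each score about 0.9, while one word at 0.2
and one at 0.8 cancel out to about 0.5.
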
The Chi-Squared combining (scoring) method assumes that the spam word
probabilities are uniformly distributed and computes an error measurement
(called chi squared - see http://bmj.com/collections/statsbk/8.shtml for a good
tutorial) and then sees how likely that error value would be observed in
practice. If it's rare to observe, then the words are likely not just randomly
occurring and it's spammy. The same is done for genuine words. The two
resulting measures of unlikeliness are compared to see which is more unlikely;
if neither is, then the method says it can't decide. The SpamBayes notes (see
the classifier.py file in CVS in http://sourceforge.net/projects/spambayes)
say:

3041 "Across vectors of length n, containing random uniformly-distributed
3042 probabilities, -2*sum(ln(p_i)) follows the chi-squared distribution with 2*n
3043 degrees of freedom. This has been proven (in some appropriate sense) to be the
3044 most sensitive possible test for rejecting the hypothesis that a vector of
3045 probabilities is uniformly distributed. Gary Robinson's original scheme was
3046 monotonic *with* this test, but skipped the details. Turns out that getting
3047 closer to the theoretical roots gives a much sharper classification, with a
3048 very small (in # of msgs), but also very broad (in range of scores), "middle
3049 ground", where most of the mistakes live. In particular, this scheme seems
3050 immune to all forms of "cancellation disease": if there are many strong ham
3051 *and* spam clues, this reliably scores close to 0.5. Most other schemes are
3052 extremely certain then -- and often wrong."
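A rough sketch of the chi-squared scoring follows. ChiSquaredQ is modelled on
the chi2Q() function in SpamBayes and only handles an even number of degrees
of freedom; it is an assumption about what this program's own
ChiSquaredProbability does, not a copy of it. ChiSquaredScore is a
hypothetical name, and the ratios are assumed to be strictly between 0 and 1.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Probability of seeing a chi-squared value of at least x2 by chance, for an
// even number of degrees of freedom.
double ChiSquaredQ (double x2, int degreesOfFreedom)
{
  assert (degreesOfFreedom > 0 && degreesOfFreedom % 2 == 0);
  double m = x2 / 2.0;
  double term = std::exp (-m);
  double sum = term;
  for (int i = 1; i < degreesOfFreedom / 2; i++)
  {
    term *= m / i;
    sum += term;
  }
  return (sum < 1.0) ? sum : 1.0; // Guard against rounding past 1.0.
}

// Combine the word ratios into a score: 0.0 genuine, 0.5 unsure, 1.0 spam.
double ChiSquaredScore (const std::vector<double> &ratios)
{
  if (ratios.empty ())
    return 0.5; // No words to analyse.

  double logRatios = 0.0;  // Sum of ln (p), used for the genuine measure.
  double logInverse = 0.0; // Sum of ln (1 - p), used for the spam measure.
  for (std::size_t i = 0; i < ratios.size (); i++)
  {
    assert (ratios[i] > 0.0 && ratios[i] < 1.0);
    logRatios += std::log (ratios[i]);
    logInverse += std::log (1.0 - ratios[i]);
  }

  int freedom = 2 * static_cast<int> (ratios.size ());
  double spam = 1.0 - ChiSquaredQ (-2.0 * logInverse, freedom);
  double genuine = 1.0 - ChiSquaredQ (-2.0 * logRatios, freedom);
  return (spam - genuine + 1.0) / 2.0; // S-H scaled into 0.0 to 1.0.
}
```

Three words at 0.99 score very close to 1.0, while one strong spam clue
(0.99) paired with one strong genuine clue (0.01) lands at about 0.5, which
is the "can't decide" case.
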
3054 I did a test with 448 example genuine messages including personal mail (some
3055 with HTML attachments) and mailing lists, and 267 spam messages for 27471 words
3056 total. Test messages were more recent messages in the same groups. Out of 100
3057 test genuine messages, with Gary Robinson (0.56 cutoff limit), 1 (1%) was
3058 falsely identified as spam and 8 of 73 (11%) spam messages were incorrectly
3059 classified as genuine. With my variation of Paul Graham's scheme (0.90 cutoff)
3060 I got 6 of 100 (6%) genuine messages incorrectly marked as spam and 2 of 73
3061 (3%) spam messages were incorrectly classified as genuine. Pretty close, but
3062 Robinson's values are more evenly spread out so you can tell just how spammy it
3063 is by looking at the number. */
3065 struct WordAndRatioStruct
3067 double probabilityRatio
; /* Actually the compensated ratio. */
3068 const string
*wordPntr
;
3070 bool operator() ( /* Our less-than comparison function for sorting. */
3071 const WordAndRatioStruct
&ItemA
,
3072 const WordAndRatioStruct
&ItemB
) const
3075 (fabs (ItemA
.probabilityRatio
- 0.5) <
3076 fabs (ItemB
.probabilityRatio
- 0.5));
3080 status_t
ABSApp::EvaluatePositionIO (
3081 BPositionIO
*PositionIOPntr
,
3082 const char *OptionalFileName
,
3083 BMessage
*ReplyMessagePntr
,
3086 StatisticsMap::iterator DataEndIter
;
3087 StatisticsMap::iterator DataIter
;
3089 double GenuineProbability
;
3090 uint32 GenuineSpamSum
;
3093 WordAndRatioStruct
/* Data type stored in the queue */,
3094 vector
<WordAndRatioStruct
> /* Underlying container */,
3095 WordAndRatioStruct
/* Function for comparing elements */>
3097 double ProductGenuine
;
3098 double ProductLogGenuine
;
3099 double ProductLogSpam
;
3101 double RawProbabilityRatio
;
3103 double SpamProbability
;
3104 StatisticsPointer StatisticsPntr
;
3106 double TotalGenuine
;
3108 WordAndRatioStruct WordAndRatio
;
3109 set
<string
>::iterator WordEndIter
;
3110 set
<string
>::iterator WordIter
;
3111 const WordAndRatioStruct
*WordRatioPntr
;
3112 set
<string
> WordSet
;
3114 /* Get the list of unique words in the file / memory buffer. */
3116 ErrorCode
= GetWordsFromPositionIO (PositionIOPntr
, OptionalFileName
,
3117 WordSet
, ErrorMessage
);
3118 if (ErrorCode
!= B_OK
)
3121 /* Prepare a few variables. Mostly these are stored double values of some of
3122 the numbers involved (to avoid the overhead of multiple conversions from
3123 integer to double), with extra precautions to avoid divide by zero. */
3125 if (m_TotalGenuineMessages
<= 0)
3128 TotalGenuine
= m_TotalGenuineMessages
;
3130 if (m_TotalSpamMessages
<= 0)
3133 TotalSpam
= m_TotalSpamMessages
;
3135 /* Look up the words in the database and calculate their compensated spam
3136 ratio. The results are stored in a priority queue so that we can later find
3137 the top g_MaxInterestingWords for doing the actual determination. */
3139 WordEndIter
= WordSet
.end ();
3140 DataEndIter
= m_WordMap
.end ();
3141 for (WordIter
= WordSet
.begin (); WordIter
!= WordEndIter
; WordIter
++)
3143 WordAndRatio
.wordPntr
= &(*WordIter
);
3145 if ((DataIter
= m_WordMap
.find (*WordIter
)) != DataEndIter
)
3147 StatisticsPntr
= &DataIter
->second
;
3149 /* Calculate the probability the word is spam and the probability it is
3150 genuine. Then the raw probability ratio. */
3152 SpamProbability
= StatisticsPntr
->spamCount
/ TotalSpam
;
3153 GenuineProbability
= StatisticsPntr
->genuineCount
/ TotalGenuine
;
3155 if (SpamProbability
+ GenuineProbability
> 0)
3156 RawProbabilityRatio
=
3157 SpamProbability
/ (SpamProbability
+ GenuineProbability
);
3158 else /* Word with zero statistics, perhaps due to reclassification. */
3159 RawProbabilityRatio
= 0.5;
3161 /* The compensated ratio leans towards 0.5 (g_RobinsonX) more for fewer
3162 data points, with a weight of 0.45 (g_RobinsonS). */
3165 StatisticsPntr
->spamCount
+ StatisticsPntr
->genuineCount
;
3167 WordAndRatio
.probabilityRatio
=
3168 (g_RobinsonS
* g_RobinsonX
+ GenuineSpamSum
* RawProbabilityRatio
) /
3169 (g_RobinsonS
+ GenuineSpamSum
);
3171 else /* Unknown word. With N=0, compensated ratio equation is RobinsonX. */
3172 WordAndRatio
.probabilityRatio
= g_RobinsonX
;
3174 PriorityQueue
.push (WordAndRatio
);
3177 /* Compute the combined probability (multiply them together) of the top few
3178 words. To avoid numeric underflow (doubles can only get as small as 1E-300),
3179 logarithms are also used. But avoid the logarithms (sum of logs of numbers
3180 is the same as the product of numbers) as much as possible due to reduced
3181 accuracy and slowness. */
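/* The mixed product and logarithm trick described above can be sketched in
isolation like this. LogOfProduct and kSmallestUsable are hypothetical names
(the program keeps its own threshold in m_SmallestUseableDouble), and the
sketch assumes that no single factor is itself small enough to underflow.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical threshold below which the running product is banked as a log.
static const double kSmallestUsable = 1e-200;

// Natural logarithm of the product of all the factors (each one > 0),
// multiplying normally and only calling log () when the product gets tiny.
double LogOfProduct (const std::vector<double> &factors)
{
  double product = 1.0;
  double logTotal = 0.0;
  for (std::size_t i = 0; i < factors.size (); i++)
  {
    assert (factors[i] > 0.0);
    product *= factors[i];
    if (product < kSmallestUsable)
    {
      logTotal += std::log (product); // Move the tiny value into log storage.
      product = 1.0;
    }
  }
  return logTotal + std::log (product);
}
```

With a thousand factors of 0.001 a plain product would underflow to zero,
while LogOfProduct still returns a finite value of roughly -6907.8. */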
3183 ProductGenuine
= 1.0;
3184 ProductLogGenuine
= 0.0;
3186 ProductLogSpam
= 0.0;
3188 i
< g_MaxInterestingWords
&& !PriorityQueue
.empty();
3189 i
++, PriorityQueue
.pop())
3191 WordRatioPntr
= &PriorityQueue
.top();
3192 ProductSpam
*= WordRatioPntr
->probabilityRatio
;
3193 ProductGenuine
*= 1.0 - WordRatioPntr
->probabilityRatio
;
3195 /* Check for the numbers getting dangerously small, close to underflowing.
3196 If they are, move the value into the logarithm storage part. */
3198 if (ProductSpam
< m_SmallestUseableDouble
)
3200 ProductLogSpam
+= log (ProductSpam
);
3204 if (ProductGenuine
< m_SmallestUseableDouble
)
3206 ProductLogGenuine
+= log (ProductGenuine
);
3207 ProductGenuine
= 1.0;
3210 ReplyMessagePntr
->AddString ("words", WordRatioPntr
->wordPntr
->c_str ());
3211 ReplyMessagePntr
->AddFloat ("ratios", WordRatioPntr
->probabilityRatio
);
3214 /* Get the resulting log of the complete products. */
3218 ProductLogSpam
+= log (ProductSpam
);
3219 ProductLogGenuine
+= log (ProductGenuine
);
3222 if (m_ScoringMode
== SM_ROBINSON
)
3224 /* Apply Gary Robinson's scoring method where we take the Nth root of the
3225 products. This is easiest in logarithm form. */
3229 ProductSpam
= exp (ProductLogSpam
/ i
);
3230 ProductGenuine
= exp (ProductLogGenuine
/ i
);
3231 ResultRatio
= ProductSpam
/ (ProductGenuine
+ ProductSpam
);
3233 else /* Somehow got no words! */
3234 ResultRatio
= g_RobinsonX
;
3236 else if (m_ScoringMode
== SM_CHISQUARED
)
3238 /* From the SpamBayes notes: "We compute two chi-squared statistics, one
3239 for ham and one for spam. The sum-of-the-logs business is more sensitive
3240 to probs near 0 than to probs near 1, so the spam measure uses 1-p (so that
3241 high-spamprob words have greatest effect), and the ham measure uses p
3242 directly (so that lo-spamprob words have greatest effect)." That means we
3243 just reversed the meaning of the previously calculated spam and genuine
3244 products! Oh well. */
    TempDouble = ProductLogSpam;
    ProductLogSpam = ProductLogGenuine;
    ProductLogGenuine = TempDouble;
3253 1.0 - ChiSquaredProbability (-2.0 * ProductLogSpam
, 2 * i
);
3255 1.0 - ChiSquaredProbability (-2.0 * ProductLogGenuine
, 2 * i
);
3257 /* The SpamBayes notes say: "How to combine these into a single spam
3258 score? We originally used (S-H)/(S+H) scaled into [0., 1.], which equals
3259 S/(S+H). A systematic problem is that we could end up being near-certain
3260 a thing was (for example) spam, even if S was small, provided that H was
3261 much smaller. Rob Hooft stared at these problems and invented the
3262 measure we use now, the simpler S-H, scaled into [0., 1.]." */
3264 ResultRatio
= (ProductSpam
- ProductGenuine
+ 1.0) / 2.0;
3266 else /* No words to analyse. */
3269 else /* Unknown scoring mode. */
3271 strcpy (ErrorMessage
, "Unknown scoring mode specified in settings");
3275 ReplyMessagePntr
->AddFloat (g_ResultName
, ResultRatio
);
3280 /* Just evaluate the given string as being spam text. */
3282 status_t
ABSApp::EvaluateString (
3283 const char *BufferPntr
,
3285 BMessage
*ReplyMessagePntr
,
3288 BMemoryIO
MemoryIO (BufferPntr
, BufferSize
);
3290 return EvaluatePositionIO (&MemoryIO
, "Memory Buffer",
3291 ReplyMessagePntr
, ErrorMessage
);
3295 /* Tell other programs about the scripting commands we support. Try this
3296 command: "hey application/x-vnd.agmsmith.spamdbm getsuites" to
3297 see it in action (this program has to be already running for it to work). */
3299 status_t
ABSApp::GetSupportedSuites (BMessage
*MessagePntr
)
3301 BPropertyInfo
TempPropInfo (g_ScriptingPropertyList
);
3303 MessagePntr
->AddString ("suites", "suite/x-vnd.agmsmith.spamdbm");
3304 MessagePntr
->AddFlat ("messages", &TempPropInfo
);
3305 return BApplication::GetSupportedSuites (MessagePntr
);
3309 /* Add all the words in the given file or memory buffer to the supplied set.
3310 The file name is only there for error messages, it assumes you have already
3311 opened the PositionIO to the right file. If things go wrong, a non-zero error
3312 code will be returned and an explanation written to ErrorMessage (assumed to be
3313 at least PATH_MAX + 1024 bytes long). */
3315 status_t
ABSApp::GetWordsFromPositionIO (
3316 BPositionIO
*PositionIOPntr
,
3317 const char *OptionalFileName
,
3318 set
<string
> &WordSet
,
3323 if (m_TokenizeMode
== TM_WHOLE
)
3324 ErrorCode
= TokenizeWhole (PositionIOPntr
, OptionalFileName
,
3325 WordSet
, ErrorMessage
);
3327 ErrorCode
= TokenizeParts (PositionIOPntr
, OptionalFileName
,
3328 WordSet
, ErrorMessage
);
3330 if (ErrorCode
== B_OK
&& WordSet
.empty ())
3332 /* ENOMSG usually means no message found in queue, but I'm using it to show
3333 no words, a good indicator of spam which is pure HTML. */
3335 sprintf (ErrorMessage
, "No words were found in \"%s\"", OptionalFileName
);
3343 /* Set up indices for attributes MAIL:classification (string) and
3344 MAIL:ratio_spam (float) on all mounted disk volumes that support queries. Also
3345 tell the system to make those attributes visible to the user (so they can see
3346 them in Tracker) and associate them with e-mail messages. Also set up the
3347 database file MIME type (provide a description and associate it with this
3348 program so that it picks up the right icon). And register the names for our
3351 status_t
ABSApp::InstallThings (char *ErrorMessage
)
3355 status_t ErrorCode
= B_OK
;
3358 int32 iClassification
;
3361 index_info IndexInfo
;
3363 BMessage Parameters
;
3364 const char *StringPntr
;
3368 /* Iterate through all mounted devices and try to make the indices on each
3369 one. Don't bother if the index exists or the device doesn't support indices
3370 (actually queries). */
3373 while ((DeviceID
= next_dev (&Cookie
)) >= 0)
3375 if (!fs_stat_dev (DeviceID
, &FSInfo
) && (FSInfo
.flags
& B_FS_HAS_QUERY
))
3377 if (fs_stat_index (DeviceID
, g_AttributeNameClassification
, &IndexInfo
)
3378 && errno
== B_ENTRY_NOT_FOUND
)
3380 if (fs_create_index (DeviceID
, g_AttributeNameClassification
,
3381 B_STRING_TYPE
, 0 /* flags */))
3384 sprintf (ErrorMessage
, "Unable to make string index %s on "
3385 "volume #%d, volume name \"%s\", file system type \"%s\", "
3386 "on device \"%s\"", g_AttributeNameClassification
,
3387 (int) DeviceID
, FSInfo
.volume_name
, FSInfo
.fsh_name
,
3388 FSInfo
.device_name
);
3392 if (fs_stat_index (DeviceID
, g_AttributeNameSpamRatio
,
3393 &IndexInfo
) && errno
== B_ENTRY_NOT_FOUND
)
3395 if (fs_create_index (DeviceID
, g_AttributeNameSpamRatio
,
3396 B_FLOAT_TYPE
, 0 /* flags */))
3399 sprintf (ErrorMessage
, "Unable to make float index %s on "
3400 "volume #%d, volume name \"%s\", file system type \"%s\", "
3401 "on device \"%s\"", g_AttributeNameSpamRatio
,
3402 (int) DeviceID
, FSInfo
.volume_name
, FSInfo
.fsh_name
,
3403 FSInfo
.device_name
);
3408 if (ErrorCode
!= B_OK
)
3411 /* Set up the MIME types for the classification attributes, associate them
3412 with e-mail and make them visible to the user (but not editable). First need
3413 to get the existing MIME settings, then add ours to them (otherwise the
3414 existing ones get wiped out). */
3416 ErrorCode
= MimeType
.SetTo ("text/x-email");
3417 if (ErrorCode
!= B_OK
|| !MimeType
.IsInstalled ())
3419 sprintf (ErrorMessage
, "No e-mail MIME type (%s) in the system, can't "
3420 "update it to add our special attributes, and without e-mail this "
3421 "program is useless!", MimeType
.Type ());
3422 if (ErrorCode
== B_OK
)
3427 ErrorCode
= MimeType
.GetAttrInfo (&Parameters
);
3428 if (ErrorCode
!= B_OK
)
3430 sprintf (ErrorMessage
, "Unable to retrieve list of attributes "
3431 "associated with e-mail messages in the MIME database");
3435 for (i
= 0, iClassification
= -1, iProbability
= -1;
3436 i
< 1000 && (iClassification
< 0 || iProbability
< 0);
3439 ErrorCode
= Parameters
.FindString ("attr:name", i
, &StringPntr
);
3440 if (ErrorCode
!= B_OK
)
3441 break; /* Reached the end of the attributes. */
3442 if (strcmp (StringPntr
, g_AttributeNameClassification
) == 0)
3443 iClassification
= i
;
3444 else if (strcmp (StringPntr
, g_AttributeNameSpamRatio
) == 0)
3448 /* Add extra default settings for those programs which previously didn't
3449 update the MIME database with all the attributes that exist (so our new
3450 additions don't show up at the wrong index). */
3452 i
--; /* Set i to index of last valid attribute. */
3454 for (j
= 0; j
<= i
; j
++)
3456 if (Parameters
.FindString ("attr:public_name", j
, &StringPntr
) ==
3459 if (Parameters
.FindString ("attr:name", j
, &StringPntr
) != B_OK
)
3460 StringPntr
= "None!";
3461 Parameters
.AddString ("attr:public_name", StringPntr
);
  while (Parameters.FindInt32 ("attr:type", i, &TempInt32) == B_BAD_INDEX)
    Parameters.AddInt32 ("attr:type", B_STRING_TYPE);

  while (Parameters.FindBool ("attr:viewable", i, &TempBool) == B_BAD_INDEX)
    Parameters.AddBool ("attr:viewable", true);

  while (Parameters.FindBool ("attr:editable", i, &TempBool) == B_BAD_INDEX)
    Parameters.AddBool ("attr:editable", false);

  while (Parameters.FindInt32 ("attr:width", i, &TempInt32) == B_BAD_INDEX)
    Parameters.AddInt32 ("attr:width", 60);

  while (Parameters.FindInt32 ("attr:alignment", i, &TempInt32) == B_BAD_INDEX)
    Parameters.AddInt32 ("attr:alignment", B_ALIGN_LEFT);

  while (Parameters.FindBool ("attr:extra", i, &TempBool) == B_BAD_INDEX)
    Parameters.AddBool ("attr:extra", false);
  /* Add our new attributes to e-mail related things, if not already there. */

  if (iClassification < 0)
  {
    Parameters.AddString ("attr:name", g_AttributeNameClassification);
    Parameters.AddString ("attr:public_name", "Classification Group");
    Parameters.AddInt32 ("attr:type", B_STRING_TYPE);
    Parameters.AddBool ("attr:viewable", true);
    Parameters.AddBool ("attr:editable", false);
    Parameters.AddInt32 ("attr:width", 45);
    Parameters.AddInt32 ("attr:alignment", B_ALIGN_LEFT);
    Parameters.AddBool ("attr:extra", false);
  }

  if (iProbability < 0)
  {
    Parameters.AddString ("attr:name", g_AttributeNameSpamRatio);
    Parameters.AddString ("attr:public_name", "Spam/Genuine Estimate");
    Parameters.AddInt32 ("attr:type", B_FLOAT_TYPE);
    Parameters.AddBool ("attr:viewable", true);
    Parameters.AddBool ("attr:editable", false);
    Parameters.AddInt32 ("attr:width", 50);
    Parameters.AddInt32 ("attr:alignment", B_ALIGN_LEFT);
    Parameters.AddBool ("attr:extra", false);
  }
  if (iClassification < 0 || iProbability < 0)
  {
    ErrorCode = MimeType.SetAttrInfo (&Parameters);
    if (ErrorCode != B_OK)
      sprintf (ErrorMessage, "Unable to associate the classification "
        "attributes with e-mail messages in the MIME database");
  }
  /* Set up the MIME type for the database file. */

  sprintf (ErrorMessage, "Problems with setting up MIME type (%s) for "
    "the database files", g_ABSDatabaseFileMIMEType); /* A generic message. */

  ErrorCode = MimeType.SetTo (g_ABSDatabaseFileMIMEType);
3526 if (ErrorCode
!= B_OK
)
3530 ErrorCode
= MimeType
.Install ();
3531 if (ErrorCode
!= B_OK
)
3533 sprintf (ErrorMessage
, "Failed to install MIME type (%s) in the system",
  MimeType.SetShortDescription ("Spam Database");
  MimeType.SetLongDescription ("Bayesian Statistical Database for "
    "Classifying Junk E-Mail");
  sprintf (ErrorMessage, "1.0 ('%s')", g_DatabaseRecognitionString);
  MimeType.SetSnifferRule (ErrorMessage);
  MimeType.SetPreferredApp (g_ABSAppSignature);
  /* Set up the names of the sound effects. Later on the user can associate
  sound files with the names by using the Sounds preferences panel or the
  installsound command. The MDR add-on filter will trigger these sounds. */

  add_system_beep_event (g_BeepGenuine);
  add_system_beep_event (g_BeepSpam);
  add_system_beep_event (g_BeepUncertain);
/* Load the database if it hasn't been loaded yet. Otherwise do nothing. */

status_t ABSApp::LoadDatabaseIfNeeded (char *ErrorMessage)
{
  if (m_WordMap.empty ())
    return LoadSaveDatabase (true /* DoLoad */, ErrorMessage);

  return B_OK;
}
/* Either load the database of spam words (DoLoad is TRUE) from the file
specified in the settings, or write (DoLoad is FALSE) the database to it. If
it doesn't exist (and its parent directories do exist) then it will be created
when saving. If it doesn't exist when loading, the in-memory database will be
set to an empty one and an error will be returned with an explanation put into
ErrorMessage (should be big enough for a path name and a couple of lines of
text).

The database file format is a UTF-8 text file (well, there could be some
Latin-1 characters and other junk in there - it just copies the bytes from the
e-mail messages directly), with tab characters to separate fields (so that you
can also load it into a spreadsheet). The first line identifies the overall
file type. The second lists pairs of classifications plus the number of
messages in each class. Currently it is just Genuine and Spam, but for future
compatibility, that could be followed by more classification pairs. The
remaining lines each contain a word, the date it was last updated (actually
it's the number of messages in the database when the word was added, smaller
numbers mean it was updated longer ago), the genuine count and the spam count.
*/
status_t ABSApp::LoadSaveDatabase (bool DoLoad, char *ErrorMessage)

  FILE *DatabaseFile = NULL;
  BNodeInfo DatabaseNodeInfo;
  StatisticsMap::iterator DataIter;
  StatisticsMap::iterator EndIter;
  pair<StatisticsMap::iterator,bool> InsertResult;
  char LineString [10240];
  StatisticsRecord Statistics;
  const char *StringPntr;
  const char *WordPntr;
    MakeDatabaseEmpty ();
    m_DatabaseHasChanged = false; /* In case of early error exit. */

  else /* Saving the database, backup the old version on disk. */

    ErrorCode = MakeBackup (ErrorMessage);
    if (ErrorCode != B_OK) /* Usually because the directory isn't there. */

  DatabaseFile = fopen (m_DatabaseFileName.String (), DoLoad ? "rb" : "wb");
  if (DatabaseFile == NULL)
    sprintf (ErrorMessage, "Can't open database file \"%s\" for %s",
      m_DatabaseFileName.String (), DoLoad ? "reading" : "writing");
  /* Process the first line, which identifies the file. */

    sprintf (ErrorMessage, "Can't read first line of database file \"%s\", "
      "expected it to start with \"%s\"",
      m_DatabaseFileName.String (), g_DatabaseRecognitionString);

    if (fgets (LineString, sizeof (LineString), DatabaseFile) == NULL)

    if (strncmp (LineString, g_DatabaseRecognitionString,
    strlen (g_DatabaseRecognitionString)) != 0)

    CurrentTime = time (NULL);
    if (fprintf (DatabaseFile, "%s V1 (word, age, genuine count, spam count)\t"
    "Written by SpamDBM $Revision: 30630 $\t"
    "Compiled on " __DATE__ " at " __TIME__ "\tThis file saved on %s",
    g_DatabaseRecognitionString, ctime (&CurrentTime)) <= 0)
      sprintf (ErrorMessage, "Problems when writing to database file \"%s\"",
        m_DatabaseFileName.String ());
  /* The second line lists the different classifications. We just check to see
  that the first two are Genuine and Spam. If there are others, they'll be
  ignored and lost when the database is saved. */

    sprintf (ErrorMessage, "Can't read second line of database file \"%s\", "
      "expected it to list classifications %s and %s along with their totals",
      m_DatabaseFileName.String (), g_ClassifiedGenuine, g_ClassifiedSpam);
    ErrorCode = B_BAD_VALUE;
    if (fgets (LineString, sizeof (LineString), DatabaseFile) == NULL)

    i = strlen (LineString);
    if (i > 0 && LineString[i-1] == '\n')
      LineString[i-1] = 0; /* Remove trailing line feed character. */

    /* Look for the title word at the start of the line. */

    TabPntr = LineString;
    for (StringPntr = TabPntr; *TabPntr != 0 && *TabPntr != '\t'; TabPntr++)
      ; if (*TabPntr == '\t') *TabPntr++ = 0; /* Stringify up to next tab. */

    if (strncmp (StringPntr, "Classifications", 15) != 0)

    /* Look for the Genuine class and count. */

    for (StringPntr = TabPntr; *TabPntr != 0 && *TabPntr != '\t'; TabPntr++)
      ; if (*TabPntr == '\t') *TabPntr++ = 0; /* Stringify up to next tab. */

    if (strcmp (StringPntr, g_ClassifiedGenuine) != 0)

    for (StringPntr = TabPntr; *TabPntr != 0 && *TabPntr != '\t'; TabPntr++)
      ; if (*TabPntr == '\t') *TabPntr++ = 0; /* Stringify up to next tab. */

    m_TotalGenuineMessages = atoll (StringPntr);

    /* Look for the Spam class and count. */

    for (StringPntr = TabPntr; *TabPntr != 0 && *TabPntr != '\t'; TabPntr++)
      ; if (*TabPntr == '\t') *TabPntr++ = 0; /* Stringify up to next tab. */

    if (strcmp (StringPntr, g_ClassifiedSpam) != 0)

    for (StringPntr = TabPntr; *TabPntr != 0 && *TabPntr != '\t'; TabPntr++)
      ; if (*TabPntr == '\t') *TabPntr++ = 0; /* Stringify up to next tab. */

    m_TotalSpamMessages = atoll (StringPntr);
    fprintf (DatabaseFile,
      "Classifications and total messages:\t%s\t%lu\t%s\t%lu\n",
      g_ClassifiedGenuine, m_TotalGenuineMessages,
      g_ClassifiedSpam, m_TotalSpamMessages);

  /* The remainder of the file is the list of words and statistics. Each line
  has a word, a tab, the time when the word was last changed in the database
  (sequence number of message addition, starts at 0 and goes up by one for each
  message added to the database), a tab then the number of messages in the
  first class (genuine) that had that word, then a tab, then the number of
  messages in the second class (spam) with that word, and so on. */

    while (!feof (DatabaseFile))

      if (fgets (LineString, sizeof (LineString), DatabaseFile) == NULL)

        if (feof (DatabaseFile))

        if (ErrorCode == B_OK)

        sprintf (ErrorMessage, "Error while reading words and statistics "
          "from database file \"%s\"", m_DatabaseFileName.String ());

      i = strlen (LineString);
      if (i > 0 && LineString[i-1] == '\n')
        LineString[i-1] = 0; /* Remove trailing line feed character. */

      /* Get the word at the start of the line, save in WordPntr. */

      TabPntr = LineString;
      for (WordPntr = TabPntr; *TabPntr != 0 && *TabPntr != '\t'; TabPntr++)
        ; if (*TabPntr == '\t') *TabPntr++ = 0; /* Stringify up to next tab. */

      /* Get the date stamp. Actually a sequence number, not a date. */

      for (StringPntr = TabPntr; *TabPntr != 0 && *TabPntr != '\t'; TabPntr++)
        ; if (*TabPntr == '\t') *TabPntr++ = 0; /* Stringify up to next tab. */

      Statistics.age = atoll (StringPntr);

      /* Get the Genuine count. */

      for (StringPntr = TabPntr; *TabPntr != 0 && *TabPntr != '\t'; TabPntr++)
        ; if (*TabPntr == '\t') *TabPntr++ = 0; /* Stringify up to next tab. */

      Statistics.genuineCount = atoll (StringPntr);

      /* Get the Spam count. */

      for (StringPntr = TabPntr; *TabPntr != 0 && *TabPntr != '\t'; TabPntr++)
        ; if (*TabPntr == '\t') *TabPntr++ = 0; /* Stringify up to next tab. */

      Statistics.spamCount = atoll (StringPntr);

      /* Ignore empty words, totally unused words and ones which are too long
      (avoids lots of length checking everywhere). */

      if (WordPntr[0] == 0 || strlen (WordPntr) > g_MaxWordLength ||
      (Statistics.genuineCount <= 0 && Statistics.spamCount <= 0))
        continue; /* Ignore this line of text, start on next one. */

      /* Add the combination to the database. */

      InsertResult = m_WordMap.insert (
        StatisticsMap::value_type (WordPntr, Statistics));
      if (InsertResult.second == false)
      {
        ErrorCode = B_BAD_VALUE;
        sprintf (ErrorMessage, "Error while inserting word \"%s\" from "
          "database \"%s\", perhaps it is a duplicate",
          WordPntr, m_DatabaseFileName.String ());
      }

      /* And the hunt for the oldest word. */

      if (Statistics.age < m_OldestAge)
        m_OldestAge = Statistics.age;

  else /* Saving, dump all words and statistics to the file. */

    EndIter = m_WordMap.end ();
    for (DataIter = m_WordMap.begin (); DataIter != EndIter; DataIter++)
      if (fprintf (DatabaseFile, "%s\t%lu\t%lu\t%lu\n",
      DataIter->first.c_str (), DataIter->second.age,
      DataIter->second.genuineCount, DataIter->second.spamCount) <= 0)
        sprintf (ErrorMessage, "Error while writing word \"%s\" to "

          DataIter->first.c_str(), m_DatabaseFileName.String ());

  /* Set the file type so that the new file gets associated with this program,
  and picks up the right icon. */

    sprintf (ErrorMessage, "Unable to set attributes (file type) of database "
      "file \"%s\"", m_DatabaseFileName.String ());
    ErrorCode = DatabaseNode.SetTo (m_DatabaseFileName.String ());
    if (ErrorCode != B_OK)

    DatabaseNodeInfo.SetTo (&DatabaseNode);
    ErrorCode = DatabaseNodeInfo.SetType (g_ABSDatabaseFileMIMEType);
    if (ErrorCode != B_OK)

  m_DatabaseHasChanged = false;

  if (DatabaseFile != NULL)
    fclose (DatabaseFile);

/* Either load the settings (DoLoad is TRUE) from the configuration file or
write them (DoLoad is FALSE) to it. The configuration file is a flattened
BMessage containing the various program settings. If it doesn't exist (and its
parent directories don't exist) then it will be created when saving. If it
doesn't exist when loading, the settings will be set to default values. */

status_t ABSApp::LoadSaveSettings (bool DoLoad)

  const char *NamePntr;
  BDirectory SettingsDirectory;
  const char *StringPntr;
  char TempString [PATH_MAX + 100];

  /* Preset things to default values if loading, in case of an error or it's an
  older version of the settings file which doesn't have every field defined. */

  /* Look for our settings directory. When saving we can try to create it. */

  ErrorCode = SettingsDirectory.SetTo (m_SettingsDirectoryPath.Path ());
  if (ErrorCode != B_OK)
  {
    if (DoLoad || ErrorCode != B_ENTRY_NOT_FOUND)
    {
      sprintf (TempString, "Can't find settings directory \"%s\"",
        m_SettingsDirectoryPath.Path ());
      goto ErrorExit;
    }
    ErrorCode = create_directory (m_SettingsDirectoryPath.Path (), 0755);
    if (ErrorCode == B_OK)
      ErrorCode = SettingsDirectory.SetTo (m_SettingsDirectoryPath.Path ());
    if (ErrorCode != B_OK)
    {
      sprintf (TempString, "Can't create settings directory \"%s\"",
        m_SettingsDirectoryPath.Path ());
      goto ErrorExit;
    }
  }

  ErrorCode = SettingsFile.SetTo (&SettingsDirectory, g_SettingsFileName,
    DoLoad ? B_READ_ONLY : B_READ_WRITE | B_CREATE_FILE | B_ERASE_FILE);
  if (ErrorCode != B_OK)
  {
    sprintf (TempString, "Can't open settings file \"%s\" in directory \"%s\" "
      "for %s", g_SettingsFileName, m_SettingsDirectoryPath.Path(),
      DoLoad ? "reading" : "writing");
    goto ErrorExit;
  }

  if (DoLoad)
  {
    ErrorCode = Settings.Unflatten (&SettingsFile);
    if (ErrorCode != 0 || Settings.what != g_SettingsWhatCode)
    {
      sprintf (TempString, "Corrupt data detected while reading settings "
        "file \"%s\" in directory \"%s\", will revert to defaults",
        g_SettingsFileName, m_SettingsDirectoryPath.Path());
      goto ErrorExit;
    }
  }

  /* Transfer the settings between the BMessage and our various global
  variables. For loading, if the setting isn't present, leave it at the
  default value. Note that loading and saving are intermingled here to make
  code maintenance easier (less chance of forgetting to update it if load and
  save were separate functions). */

  ErrorCode = B_OK; /* So that saving settings can record an error. */

  NamePntr = "DatabaseFileName";
  if (DoLoad)
  {
    if (Settings.FindString (NamePntr, &StringPntr) == B_OK)
      m_DatabaseFileName.SetTo (StringPntr);
  }
  else if (ErrorCode == B_OK)
    ErrorCode = Settings.AddString (NamePntr, m_DatabaseFileName);

  NamePntr = "ServerMode";
  if (DoLoad)
  {
    if (Settings.FindBool (NamePntr, &TempBool) == B_OK)
      g_ServerMode = TempBool;
  }
  else if (ErrorCode == B_OK)
    ErrorCode = Settings.AddBool (NamePntr, g_ServerMode);

  NamePntr = "IgnorePreviousClassification";
  if (DoLoad)
  {
    if (Settings.FindBool (NamePntr, &TempBool) == B_OK)
      m_IgnorePreviousClassification = TempBool;
  }
  else if (ErrorCode == B_OK)
    ErrorCode = Settings.AddBool (NamePntr, m_IgnorePreviousClassification);

  NamePntr = "PurgeAge";
  if (DoLoad)
  {
    if (Settings.FindInt32 (NamePntr, &TempInt32) == B_OK)
      m_PurgeAge = TempInt32;
  }
  else if (ErrorCode == B_OK)
    ErrorCode = Settings.AddInt32 (NamePntr, m_PurgeAge);

  NamePntr = "PurgePopularity";
  if (DoLoad)
  {
    if (Settings.FindInt32 (NamePntr, &TempInt32) == B_OK)
      m_PurgePopularity = TempInt32;
  }
  else if (ErrorCode == B_OK)
    ErrorCode = Settings.AddInt32 (NamePntr, m_PurgePopularity);

  NamePntr = "ScoringMode";
  if (DoLoad)
  {
    if (Settings.FindInt32 (NamePntr, &TempInt32) == B_OK)
      m_ScoringMode = (ScoringModes) TempInt32;
    if (m_ScoringMode < 0 || m_ScoringMode >= SM_MAX)
      m_ScoringMode = (ScoringModes) 0;
  }
  else if (ErrorCode == B_OK)
    ErrorCode = Settings.AddInt32 (NamePntr, m_ScoringMode);

  NamePntr = "TokenizeMode";
  if (DoLoad)
  {
    if (Settings.FindInt32 (NamePntr, &TempInt32) == B_OK)
      m_TokenizeMode = (TokenizeModes) TempInt32;
    if (m_TokenizeMode < 0 || m_TokenizeMode >= TM_MAX)
      m_TokenizeMode = (TokenizeModes) 0;
  }
  else if (ErrorCode == B_OK)
    ErrorCode = Settings.AddInt32 (NamePntr, m_TokenizeMode);

  if (ErrorCode != B_OK)
    strcpy (TempString, "Unable to stuff the program settings into a "
      "temporary BMessage, settings not saved");

  /* Save the settings BMessage to the settings file. */

    Settings.what = g_SettingsWhatCode;
    ErrorCode = Settings.Flatten (&SettingsFile);

      sprintf (TempString, "Problems while writing settings file \"%s\" in "
        "directory \"%s\"", g_SettingsFileName,
        m_SettingsDirectoryPath.Path ());

  m_SettingsHaveChanged = false;

ErrorExit: /* Error message in TempString, code in ErrorCode. */
  DisplayErrorMessage (TempString, ErrorCode, DoLoad ?
    "Loading Settings Error" : "Saving Settings Error");

void ABSApp::MessageReceived (BMessage *MessagePntr)
{
  const char *PropertyName;
  struct property_info *PropInfoPntr;
  int32 SpecifierIndex;
  int32 SpecifierKind;
  BMessage SpecifierMessage;

  /* See if it is a scripting message that applies to the database or one of
  the other operations this program supports. Pass on other scripting messages
  to the inherited parent MessageReceived function (they're usually scripting
  messages for the BApplication). */

  switch (MessagePntr->what)
  {
    case B_GET_PROPERTY:
    case B_SET_PROPERTY:
    case B_COUNT_PROPERTIES:
    case B_CREATE_PROPERTY:
    case B_DELETE_PROPERTY:
    case B_EXECUTE_PROPERTY:
      if (MessagePntr->GetCurrentSpecifier (&SpecifierIndex, &SpecifierMessage,
      &SpecifierKind, &PropertyName) == B_OK &&
      SpecifierKind == B_DIRECT_SPECIFIER)
      {
        for (PropInfoPntr = g_ScriptingPropertyList + 0; true; PropInfoPntr++)
        {
          if (PropInfoPntr->name == 0)
            break; /* Ran out of commands. */

          if (PropInfoPntr->commands[0] == MessagePntr->what &&
          strcasecmp (PropInfoPntr->name, PropertyName) == 0)
          {
            ProcessScriptingMessage (MessagePntr, PropInfoPntr);
            return;
          }
        }
      }
      break;
  }

  /* Pass the unprocessed message to the inherited function, maybe it knows
  what to do. This includes replies to messages we sent ourselves. */

  BApplication::MessageReceived (MessagePntr);
}

/* Rename the existing database file to a backup file name, potentially
replacing an older backup. If something goes wrong, returns an error code and
puts an explanation in ErrorMessage. */

status_t ABSApp::MakeBackup (char *ErrorMessage)

  char LeafName [NAME_MAX];
  char NewName [PATH_MAX+20];
  char OldName [PATH_MAX+20];

  ErrorCode = Entry.SetTo (m_DatabaseFileName.String ());
  if (ErrorCode != B_OK)
    sprintf (ErrorMessage, "While making backup, failed to make a BEntry for "
      "\"%s\" (maybe the directory doesn't exist?)",
      m_DatabaseFileName.String ());

  if (!Entry.Exists ())
    return B_OK; /* No existing file to worry about overwriting. */
  Entry.GetName (LeafName);

  /* Find the first hole (no file) where we will stop the renaming chain. */

  for (i = 0; i < g_MaxBackups - 1; i++)
  {
    strcpy (OldName, m_DatabaseFileName.String ());
    sprintf (OldName + strlen (OldName), g_BackupSuffix, i);
    Entry.SetTo (OldName);
    if (!Entry.Exists ())
      break;
  }

  /* Move the files down by one to fill in the hole in the name series. */

  for (i--; i >= 0; i--)
  {
    strcpy (OldName, m_DatabaseFileName.String ());
    sprintf (OldName + strlen (OldName), g_BackupSuffix, i);
    Entry.SetTo (OldName);
    strcpy (NewName, LeafName);
    sprintf (NewName + strlen (NewName), g_BackupSuffix, i + 1);
    ErrorCode = Entry.Rename (NewName, true /* clobber */);
  }

  Entry.SetTo (m_DatabaseFileName.String ());
  strcpy (NewName, LeafName);
  sprintf (NewName + strlen (NewName), g_BackupSuffix, 0);
  ErrorCode = Entry.Rename (NewName, true /* clobber */);
  if (ErrorCode != B_OK)
    sprintf (ErrorMessage, "While making backup, failed to rename "
      "\"%s\" to \"%s\"", m_DatabaseFileName.String (), NewName);

ABSApp::MakeDatabaseEmpty ()
{
  m_WordMap.clear (); /* Sets the map to empty, deallocating any old data. */

  m_TotalGenuineMessages = 0;
  m_TotalSpamMessages = 0;
  m_OldestAge = (uint32) -1 /* makes largest number possible */;
}

/* Do what the scripting command says. A reply message will be sent back with
several fields: "error" containing the numerical error code (0 for success),
"CommandText" with a text representation of the command, "result" with the
resulting data for a get or count command. If it isn't understood, then rather
than a B_REPLY kind of message, it will be a B_MESSAGE_NOT_UNDERSTOOD message
with an "error" number and a "message" string with a description. */

ABSApp::ProcessScriptingMessage (
  BMessage *MessagePntr,
  struct property_info *PropInfoPntr)

  bool ArgumentBool = false;
  bool ArgumentGotBool = false;
  bool ArgumentGotInt32 = false;
  bool ArgumentGotString = false;
  int32 ArgumentInt32 = 0;
  const char *ArgumentString = NULL;
  BString CommandText;
  BMessage ReplyMessage (B_MESSAGE_NOT_UNDERSTOOD);
  ssize_t StringBufferSize;
  BMessage TempBMessage;
  char TempString [PATH_MAX + 1024];

  if (g_QuitCountdown >= 0 && !g_CommandLineMode)
  {
    g_QuitCountdown = -1;
    cerr << "Quit countdown aborted due to a scripting command arriving.\n";
  }

  if (g_BusyCursor != NULL)
    SetCursor (g_BusyCursor);

  ErrorCode = MessagePntr->FindData (g_DataName, B_STRING_TYPE,
    (const void **) &ArgumentString, &StringBufferSize);
  if (ErrorCode == B_OK)
  {
    if (PropInfoPntr->extra_data != PN_EVALUATE_STRING &&
    PropInfoPntr->extra_data != PN_SPAM_STRING &&
    PropInfoPntr->extra_data != PN_GENUINE_STRING &&
    strlen (ArgumentString) >= PATH_MAX)
    {
      sprintf (TempString, "\"data\" string of a scripting message is too "
        "long, for SET %s action", PropInfoPntr->name);
      ErrorCode = B_NAME_TOO_LONG;
      goto ErrorExit;
    }
    ArgumentGotString = true;
  }
  else if (MessagePntr->FindBool (g_DataName, &ArgumentBool) == B_OK)
    ArgumentGotBool = true;
  else if (MessagePntr->FindInt32 (g_DataName, &ArgumentInt32) == B_OK)
    ArgumentGotInt32 = true;

  /* Prepare a human readable description of the scripting command. */

  switch (PropInfoPntr->commands[0])
  {
    case B_SET_PROPERTY:
      CommandText.SetTo ("Set ");
      break;

    case B_GET_PROPERTY:
      CommandText.SetTo ("Get ");
      break;

    case B_COUNT_PROPERTIES:
      CommandText.SetTo ("Count ");
      break;

    case B_CREATE_PROPERTY:
      CommandText.SetTo ("Create ");
      break;

    case B_DELETE_PROPERTY:
      CommandText.SetTo ("Delete ");
      break;

    case B_EXECUTE_PROPERTY:
      CommandText.SetTo ("Execute ");
      break;

    default:
      sprintf (TempString, "Bug: scripting command for \"%s\" has an unknown "
        "action code %d", PropInfoPntr->name,
        (int) PropInfoPntr->commands[0]);
      goto ErrorExit;
  }

  CommandText.Append (PropInfoPntr->name);

  /* Add on the argument value to our readable command, if there is one. */

  if (ArgumentGotString)
  {
    CommandText.Append (" \"");
    CommandText.Append (ArgumentString);
    CommandText.Append ("\"");
  }
  if (ArgumentGotBool)
    CommandText.Append (ArgumentBool ? " true" : " false");
  if (ArgumentGotInt32)
  {
    sprintf (TempString, " %ld", ArgumentInt32);
    CommandText.Append (TempString);
  }

  /* From now on the scripting command has been recognized and is in the
  correct format, so it always returns a B_REPLY message. A readable version
  of the command is also added to make debugging easier. */

  ReplyMessage.what = B_REPLY;
  ReplyMessage.AddString ("CommandText", CommandText);

  /* Now actually do the command. First prepare a default error message. */

  sprintf (TempString, "Operation code %d (get, set, count, etc) "
    "unsupported for property %s",
    (int) PropInfoPntr->commands[0], PropInfoPntr->name);
  ErrorCode = B_BAD_INDEX;

  switch (PropInfoPntr->extra_data)
    case PN_DATABASE_FILE:
      switch (PropInfoPntr->commands[0])
      {
        case B_GET_PROPERTY: /* Get the database file name. */
          ReplyMessage.AddString (g_ResultName, m_DatabaseFileName);
          break;

        case B_SET_PROPERTY: /* Set the database file name to a new one. */
          if (!ArgumentGotString)
          {
            ErrorCode = B_BAD_TYPE;
            sprintf (TempString, "You need to specify a string for the "
              "SET %s command", PropInfoPntr->name);
            goto ErrorExit;
          }
          ErrorCode = TempPath.SetTo (ArgumentString, NULL /* leaf */,
            true /* normalize - verifies parent directories exist */);
          if (ErrorCode != B_OK)
          {
            sprintf (TempString, "New database path name of \"%s\" is invalid "
              "(parent directories must exist)", ArgumentString);
            goto ErrorExit;
          }
          if ((ErrorCode = SaveDatabaseIfNeeded (TempString)) != B_OK)
            goto ErrorExit;
          MakeDatabaseEmpty (); /* So that the new one gets loaded if used. */

          if (strlen (TempPath.Leaf ()) > NAME_MAX-strlen(g_BackupSuffix)-1)
          {
            /* Truncate the name so that there is enough space for the backup
            extension. Approximately. */
            strcpy (TempString, TempPath.Leaf ());
            TempString [NAME_MAX - strlen (g_BackupSuffix) - 1] = 0;
            TempPath.GetParent (&TempPath);
            TempPath.Append (TempString);
          }
          m_DatabaseFileName.SetTo (TempPath.Path ());
          m_SettingsHaveChanged = true;
          break;

        case B_CREATE_PROPERTY: /* Make a new database file plus more. */
          if ((ErrorCode = CreateDatabaseFile (TempString)) != B_OK)
            goto ErrorExit;
          break;

        case B_DELETE_PROPERTY: /* Delete the file and its backups too. */
          if ((ErrorCode = DeleteDatabaseFile (TempString)) != B_OK)
            goto ErrorExit;
          break;

        case B_COUNT_PROPERTIES:
          if ((ErrorCode = LoadDatabaseIfNeeded (TempString)) != B_OK)
            goto ErrorExit;
          ReplyMessage.AddInt32 (g_ResultName, m_WordCount);
          break;

        default: /* Unknown operation code, error message already set. */
          goto ErrorExit;
      }
      break;

    case PN_SPAM_STRING:

    case PN_GENUINE_STRING:

      switch (PropInfoPntr->commands[0])

        case B_COUNT_PROPERTIES: /* Get the number of spam/genuine messages. */
          if ((ErrorCode = LoadDatabaseIfNeeded (TempString)) != B_OK)

          if (PropInfoPntr->extra_data == PN_SPAM ||
          PropInfoPntr->extra_data == PN_SPAM_STRING)
            ReplyMessage.AddInt32 (g_ResultName, m_TotalSpamMessages);

            ReplyMessage.AddInt32 (g_ResultName, m_TotalGenuineMessages);

        case B_SET_PROPERTY: /* Add spam/genuine/uncertain to database. */
          if (!ArgumentGotString)
          {
            ErrorCode = B_BAD_TYPE;
            sprintf (TempString, "You need to specify a string (%s) "
              "for the SET %s command",
              (PropInfoPntr->extra_data == PN_GENUINE_STRING ||
              PropInfoPntr->extra_data == PN_SPAM_STRING)
              ? "text of the message to be added"
              : "pathname of the file containing the text to be added",
              PropInfoPntr->name);
          }
          if ((ErrorCode = LoadDatabaseIfNeeded (TempString)) != B_OK)

          if (PropInfoPntr->extra_data == PN_GENUINE ||
          PropInfoPntr->extra_data == PN_SPAM ||
          PropInfoPntr->extra_data == PN_UNCERTAIN)
            ErrorCode = AddFileToDatabase (
              (PropInfoPntr->extra_data == PN_SPAM) ? CL_SPAM :
              ((PropInfoPntr->extra_data == PN_GENUINE) ? CL_GENUINE :

              ArgumentString, TempString /* ErrorMessage */);

            ErrorCode = AddStringToDatabase (
              (PropInfoPntr->extra_data == PN_SPAM_STRING) ?
              CL_SPAM : CL_GENUINE,
              ArgumentString, TempString /* ErrorMessage */);
          if (ErrorCode != B_OK)

        default: /* Unknown operation code, error message already set. */

    case PN_IGNORE_PREVIOUS_CLASSIFICATION:
      switch (PropInfoPntr->commands[0])
      {
        case B_GET_PROPERTY:
          ReplyMessage.AddBool (g_ResultName, m_IgnorePreviousClassification);
          break;

        case B_SET_PROPERTY:
          if (!ArgumentGotBool)
          {
            ErrorCode = B_BAD_TYPE;
            sprintf (TempString, "You need to specify a boolean (true/yes, "
              "false/no) for the SET %s command", PropInfoPntr->name);
            goto ErrorExit;
          }
          m_IgnorePreviousClassification = ArgumentBool;
          m_SettingsHaveChanged = true;
          break;

        default: /* Unknown operation code, error message already set. */
          goto ErrorExit;
      }
      break;

    case PN_SERVER_MODE:
      switch (PropInfoPntr->commands[0])
      {
        case B_GET_PROPERTY:
          ReplyMessage.AddBool (g_ResultName, g_ServerMode);
          break;

        case B_SET_PROPERTY:
          if (!ArgumentGotBool)
          {
            ErrorCode = B_BAD_TYPE;
            sprintf (TempString, "You need to specify a boolean (true/yes, "
              "false/no) for the SET %s command", PropInfoPntr->name);
            goto ErrorExit;
          }
          g_ServerMode = ArgumentBool;
          m_SettingsHaveChanged = true;
          break;

        default: /* Unknown operation code, error message already set. */
          goto ErrorExit;
      }
      break;

      if (PropInfoPntr->commands[0] == B_EXECUTE_PROPERTY &&
      (ErrorCode = SaveDatabaseIfNeeded (TempString)) == B_OK)

      switch (PropInfoPntr->commands[0])

        case B_GET_PROPERTY:
          ReplyMessage.AddInt32 (g_ResultName, m_PurgeAge);

        case B_SET_PROPERTY:
          if (!ArgumentGotInt32)
          {
            ErrorCode = B_BAD_TYPE;
            sprintf (TempString, "You need to specify a 32 bit integer "
              "for the SET %s command", PropInfoPntr->name);
          }
          m_PurgeAge = ArgumentInt32;
          m_SettingsHaveChanged = true;

        default: /* Unknown operation code, error message already set. */

    case PN_PURGE_POPULARITY:
      switch (PropInfoPntr->commands[0])

        case B_GET_PROPERTY:
          ReplyMessage.AddInt32 (g_ResultName, m_PurgePopularity);

        case B_SET_PROPERTY:
          if (!ArgumentGotInt32)
          {
            ErrorCode = B_BAD_TYPE;
            sprintf (TempString, "You need to specify a 32 bit integer "
              "for the SET %s command", PropInfoPntr->name);
          }
          m_PurgePopularity = ArgumentInt32;
          m_SettingsHaveChanged = true;

        default: /* Unknown operation code, error message already set. */

      if (PropInfoPntr->commands[0] == B_EXECUTE_PROPERTY &&
      (ErrorCode = LoadDatabaseIfNeeded (TempString)) == B_OK &&
      (ErrorCode = PurgeOldWords (TempString)) == B_OK)

      if (PropInfoPntr->commands[0] == B_GET_PROPERTY &&
      (ErrorCode = LoadDatabaseIfNeeded (TempString)) == B_OK)
        ReplyMessage.AddInt32 (g_ResultName, m_OldestAge);

    case PN_EVALUATE_STRING:
      if (PropInfoPntr->commands[0] == B_SET_PROPERTY)

        if (!ArgumentGotString)
        {
          ErrorCode = B_BAD_TYPE;
          sprintf (TempString, "You need to specify a string for the "
            "SET %s command", PropInfoPntr->name);
        }
        if ((ErrorCode = LoadDatabaseIfNeeded (TempString)) == B_OK)

          if (PropInfoPntr->extra_data == PN_EVALUATE)

            if ((ErrorCode = EvaluateFile (ArgumentString, &ReplyMessage,
            TempString)) == B_OK)

          else /* PN_EVALUATE_STRING */

            if ((ErrorCode = EvaluateString (ArgumentString, StringBufferSize,
            &ReplyMessage, TempString)) == B_OK)

    case PN_RESET_TO_DEFAULTS:
      if (PropInfoPntr->commands[0] == B_EXECUTE_PROPERTY)

    case PN_INSTALL_THINGS:
      if (PropInfoPntr->commands[0] == B_EXECUTE_PROPERTY &&
      (ErrorCode = InstallThings (TempString)) == B_OK)

    case PN_SCORING_MODE:
      switch (PropInfoPntr->commands[0])

        case B_GET_PROPERTY:
          ReplyMessage.AddString (g_ResultName,
            g_ScoringModeNames[m_ScoringMode]);

        case B_SET_PROPERTY:

          if (ArgumentGotString)
            for (i = 0; i < SM_MAX; i++)
              if (strcasecmp (ArgumentString, g_ScoringModeNames[i]) == 0)
              {
                m_ScoringMode = (ScoringModes) i;
                m_SettingsHaveChanged = true;
              }
          if (i >= SM_MAX) /* Didn't find a valid scoring mode word. */
          {
            ErrorCode = B_BAD_TYPE;
            sprintf (TempString, "You used the unrecognized \"%s\" as "
              "a scoring mode for the SET %s command. Should be one of: ",
              ArgumentGotString ? ArgumentString : "not specified",
              PropInfoPntr->name);
            for (i = 0; i < SM_MAX; i++)
            {
              strcat (TempString, g_ScoringModeNames[i]);

                strcat (TempString, ", ");
            }
          }

        default: /* Unknown operation code, error message already set. */

    case PN_TOKENIZE_MODE:
      switch (PropInfoPntr->commands[0])
        case B_GET_PROPERTY:
          ReplyMessage.AddString (g_ResultName,
            g_TokenizeModeNames[m_TokenizeMode]);

        case B_SET_PROPERTY:
          if (ArgumentGotString)
            for (i = 0; i < TM_MAX; i++)
              if (strcasecmp (ArgumentString, g_TokenizeModeNames [i]) == 0)
                m_TokenizeMode = (TokenizeModes) i;
                m_SettingsHaveChanged = true;
          if (i >= TM_MAX) /* Didn't find a valid tokenize mode word. */
            ErrorCode = B_BAD_TYPE;
            sprintf (TempString, "You used the unrecognized \"%s\" as "
              "a tokenize mode for the SET %s command. Should be one of: ",
              ArgumentGotString ? ArgumentString : "not specified",
              PropInfoPntr->name);
            for (i = 0; i < TM_MAX; i++)
              strcat (TempString, g_TokenizeModeNames [i]);
              strcat (TempString, ", ");

        default: /* Unknown operation code, error message already set. */

      sprintf (TempString, "Bug! Unrecognized property identification "
        "number %d (should be between 0 and %d). Fix the entry in "
        "the g_ScriptingPropertyList array!",
        (int) PropInfoPntr->extra_data, PN_MAX - 1);

  ReplyMessage.AddInt32 ("error", B_OK);
  ErrorCode = MessagePntr->SendReply (&ReplyMessage,
    this /* Reply's reply handler */, 500000 /* send timeout */);
  if (ErrorCode != B_OK)
    cerr << "ProcessScriptingMessage failed to send a reply message, code " <<
      ErrorCode << " (" << strerror (ErrorCode) << ")" << " for " <<
      CommandText.String () << endl;
  SetCursor (B_CURSOR_SYSTEM_DEFAULT);

ErrorExit: /* Error message in TempString, return code in ErrorCode. */
  ReplyMessage.AddInt32 ("error", ErrorCode);
  ReplyMessage.AddString ("message", TempString);
  DisplayErrorMessage (TempString, ErrorCode);
  ErrorCode = MessagePntr->SendReply (&ReplyMessage,
    this /* Reply's reply handler */, 500000 /* send timeout */);
  if (ErrorCode != B_OK)
    cerr << "ProcessScriptingMessage failed to send an error message, code " <<
      ErrorCode << " (" << strerror (ErrorCode) << ")" << " for " <<
      CommandText.String () << endl;
  SetCursor (B_CURSOR_SYSTEM_DEFAULT);
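
/* Illustrative sketch only, not part of SpamDBM.  The SET handlers for the
scoring and tokenize modes both do a case-insensitive search of a name table
and, on failure, build an error message listing the valid names.  The code
below shows that pattern in isolation using std::string instead of a fixed
buffer.  SketchScoringMode, kSketchModeNames and FindSketchMode are
hypothetical names, and the two mode names are assumed example values, not
taken from g_ScoringModeNames. */

```cpp
#include <cassert>
#include <string>
#include <strings.h>  // strcasecmp (POSIX)

// Hypothetical stand-ins for the real enum and name table.
enum SketchScoringMode { SKETCH_ROBINSON = 0, SKETCH_CHI_SQUARED, SKETCH_MAX };
static const char *const kSketchModeNames[SKETCH_MAX] = {
  "Robinson", "ChiSquared"};

// Returns the index of the matching mode, or SKETCH_MAX when nothing matches.
// On failure ErrorText lists the valid names, separated by commas.
int FindSketchMode(const char *Wanted, std::string &ErrorText) {
  assert(Wanted != nullptr);
  for (int i = 0; i < SKETCH_MAX; i++) {
    if (strcasecmp(Wanted, kSketchModeNames[i]) == 0) {
      ErrorText.clear();
      return i;
    }
  }
  ErrorText = std::string("You used the unrecognized \"") + Wanted +
    "\" as a scoring mode. Should be one of: ";
  for (int i = 0; i < SKETCH_MAX; i++) {
    ErrorText += kSketchModeNames[i];
    if (i < SKETCH_MAX - 1)
      ErrorText += ", ";  // Separator only between names.
  }
  return SKETCH_MAX;
}
```

/* Building the message in a std::string avoids the risk of overflowing a
fixed-size buffer when the unrecognized argument is long. */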


/* Since quitting stops the program before the results of a script command are
received, we use a time delay to do the quit and make sure there are no pending
commands being processed by the auxiliary looper which is sending us commands.
Also, we have a countdown which can be interrupted by an incoming scripting
message in case one client tells us to quit while another one is still using us
(happens when you have two or more e-mail accounts). But if the system is
shutting down, quit immediately! */

  if (g_QuitCountdown == 0)
    if (g_CommanderLooperPntr == NULL ||
    !g_CommanderLooperPntr->IsBusy ())
      PostMessage (B_QUIT_REQUESTED);
  else if (g_QuitCountdown > 0)
    cerr << "SpamDBM quitting in " << g_QuitCountdown << ".\n";


/* A quit request message has come in. If the quit countdown has reached zero,
allow the request, otherwise reject it (and start the countdown if it hasn't
been started). */

ABSApp::QuitRequested ()
  BMessage *QuitMessage;
  team_info RemoteInfo;
  BMessenger RemoteMessenger;

  /* See if the quit is from the system shutdown command (which goes through
  the registrar server), if so, quit immediately. */

  QuitMessage = CurrentMessage ();
  if (QuitMessage != NULL && QuitMessage->IsSourceRemote ())
    RemoteMessenger = QuitMessage->ReturnAddress ();
    RemoteTeam = RemoteMessenger.Team ();
    if (get_team_info (RemoteTeam, &RemoteInfo) == B_OK &&
    strstr (RemoteInfo.args, "registrar") != NULL)
      g_QuitCountdown = 0;

  if (g_QuitCountdown == 0)
    return BApplication::QuitRequested ();

  if (g_QuitCountdown < 0)
//    g_QuitCountdown = 10; /* Start the countdown. */
    g_QuitCountdown = 5; /* Quit more quickly */
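
/* Illustrative sketch only, not part of SpamDBM.  It models the delayed-quit
countdown described above as a small state machine that can be exercised
without the BeOS message loop.  QuitCountdownSketch and its members are
hypothetical names.  Two details are assumptions, because the corresponding
lines are not visible in this excerpt: that the periodic check decrements the
countdown once per pulse, and that an incoming scripting message resets it to
-1 (no quit pending). */

```cpp
#include <cassert>

struct QuitCountdownSketch {
  int Countdown = -1;  // Negative: no quit pending.  Zero: quit now.
  bool Busy = false;   // Stands in for the commander looper's IsBusy ().

  // A quit request is allowed only once the countdown has reached zero,
  // otherwise it starts the countdown (if needed) and is rejected.
  bool QuitRequested() {
    if (Countdown == 0)
      return true;
    if (Countdown < 0)
      Countdown = 5;
    return false;
  }

  // Periodic tick.  Returns true when a quit message should be posted.
  bool Pulse() {
    assert(Countdown >= -1);  // Invariant of this sketch.
    if (Countdown == 0)
      return !Busy;  // Wait for pending commands to finish first.
    if (Countdown > 0)
      --Countdown;   // Assumed: one step per pulse.
    return false;
  }

  // Assumed: another client is still using us, so cancel the pending quit.
  void ScriptingMessageArrived() { Countdown = -1; }
};
```

/* With the 500000 microsecond pulse rate set in ReadyToRun, a countdown of 5
would correspond to roughly two and a half seconds of grace time under these
assumptions. */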


/* Go through the current database and delete words which are too old (time is
equivalent to the number of messages added to the database) and too unpopular
(words not used by many messages). Hopefully this will get rid of words which
are just hunks of binary or other garbage. The database has been loaded
[...] */

ABSApp::PurgeOldWords (char *ErrorMessage)
  StatisticsMap::iterator CurrentIter;
  StatisticsMap::iterator EndIter;
  StatisticsMap::iterator NextIter;
  char TempString [80];

  strcpy (ErrorMessage, "Purge can't fail"); /* So argument gets used. */
  CurrentTime = m_TotalGenuineMessages + m_TotalSpamMessages - 1;
  m_OldestAge = (uint32) -1 /* makes largest number possible */;

  EndIter = m_WordMap.end ();
  NextIter = m_WordMap.begin ();
  while (NextIter != EndIter) {
    CurrentIter = NextIter++;

    if (CurrentTime - CurrentIter->second.age >= m_PurgeAge &&
    CurrentIter->second.genuineCount + CurrentIter->second.spamCount <=
    m_PurgePopularity) {
      /* Delete this word, it is unpopular and old. Sob. */

      m_WordMap.erase (CurrentIter);
      if (m_WordCount > 0)
      m_DatabaseHasChanged = true;
    else /* This word is still in the database. Update oldest age. */
      if (CurrentIter->second.age < m_OldestAge)
        m_OldestAge = CurrentIter->second.age;

  /* Just a little bug check here. Just in case. */

  if (m_WordCount != m_WordMap.size ()) {
    sprintf (TempString, "Our word count of %lu doesn't match the "
      "size of the database, %lu", m_WordCount, m_WordMap.size());
    DisplayErrorMessage (TempString, -1, "Bug!");
    m_WordCount = m_WordMap.size ();
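
/* Illustrative sketch only, not part of SpamDBM.  The purge loop above
advances NextIter before erasing CurrentIter, which is the safe way to delete
from a std::map while iterating because erase only invalidates the erased
element's iterator.  The stand-alone version below shows the same technique.
WordStats, WordMap and PurgeSketch are hypothetical names, and the field types
are assumptions. */

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <string>

// Hypothetical per-word statistics, modelled on the fields used above.
struct WordStats {
  uint32_t age;           // Message number when the word was last seen.
  uint32_t genuineCount;
  uint32_t spamCount;
};
typedef std::map<std::string, WordStats> WordMap;

// Erases words that are both old and unpopular.  Returns the oldest age
// among the surviving words, or UINT32_MAX when none survive.
uint32_t PurgeSketch(WordMap &Words, uint32_t CurrentTime,
                     uint32_t PurgeAge, uint32_t PurgePopularity) {
  uint32_t OldestAge = UINT32_MAX;
  WordMap::iterator Next = Words.begin();
  while (Next != Words.end()) {
    WordMap::iterator Current = Next++;  // Step first, then maybe erase.
    const WordStats &Stats = Current->second;
    assert(Stats.age <= CurrentTime);    // Assumed: ages are never in the future.
    if (CurrentTime - Stats.age >= PurgeAge &&
        Stats.genuineCount + Stats.spamCount <= PurgePopularity)
      Words.erase(Current);              // Next stays valid.
    else if (Stats.age < OldestAge)
      OldestAge = Stats.age;
  }
  return OldestAge;
}
```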


ABSApp::ReadyToRun ()
  DatabaseWindow *DatabaseWindowPntr;
  BButton *TempButtonPntr;
  BCheckBox *TempCheckBoxPntr;
  font_height TempFontHeight;
  BMenuBar *TempMenuBarPntr;
  BMenuItem *TempMenuItemPntr;
  BPopUpMenu *TempPopUpMenuPntr;
  BRadioButton *TempRadioButtonPntr;
  const char *TempString = "Testing My Things";
  BStringView *TempStringViewPntr;
  BTextControl *TempTextPntr;
  BWindow *TempWindowPntr;

  /* This batch of code gets some measurements which will be used for laying
  out controls and other GUI elements. Set the spacing between buttons and
  other controls to the width of the letter "M" in the user's desired font. */

  g_MarginBetweenControls = (int) be_plain_font->StringWidth ("M");

  /* Also find out how much space a line of text uses. */

  be_plain_font->GetHeight (&TempFontHeight);
  g_LineOfTextHeight = ceilf (
    TempFontHeight.ascent + TempFontHeight.descent + TempFontHeight.leading);

  /* Start finding out the height of various user interface gadgets, which can
  vary based on the current font size. Make a temporary gadget, which is
  attached to our window, then resize it to its preferred size so that it
  accommodates the font size and other frills it needs. */

  TempWindowPntr = new (std::nothrow) BWindow (BRect (10, 20, 200, 200),
    "Temporary Window", B_DOCUMENT_WINDOW,
    B_NO_WORKSPACE_ACTIVATION | B_ASYNCHRONOUS_CONTROLS);
  if (TempWindowPntr == NULL) {
    DisplayErrorMessage ("Unable to create temporary window for finding "
      "sizes of controls.");
    g_QuitCountdown = 0;

  TempRect = TempWindowPntr->Bounds ();

  /* Find the height of a single line of text in a BStringView. */

  TempStringViewPntr = new (std::nothrow) BStringView (TempRect, TempString, TempString);
  if (TempStringViewPntr != NULL) {
    TempWindowPntr->Lock();
    TempWindowPntr->AddChild (TempStringViewPntr);
    TempStringViewPntr->GetPreferredSize (&JunkFloat, &g_StringViewHeight);
    TempWindowPntr->RemoveChild (TempStringViewPntr);
    TempWindowPntr->Unlock();
    delete TempStringViewPntr;

  /* Find the height of a button, which seems to be larger than a text
  control and can make life difficult. Make a temporary button, which
  is attached to our window so that it resizes to accommodate the font size. */

  TempButtonPntr = new (std::nothrow) BButton (TempRect, TempString, TempString, NULL);
  if (TempButtonPntr != NULL) {
    TempWindowPntr->Lock();
    TempWindowPntr->AddChild (TempButtonPntr);
    TempButtonPntr->GetPreferredSize (&JunkFloat, &g_ButtonHeight);
    TempWindowPntr->RemoveChild (TempButtonPntr);
    TempWindowPntr->Unlock();
    delete TempButtonPntr;

  /* Find the height of a text box. */

  TempTextPntr = new (std::nothrow) BTextControl (TempRect, TempString, NULL /* label */,
  if (TempTextPntr != NULL) {
    TempWindowPntr->Lock ();
    TempWindowPntr->AddChild (TempTextPntr);
    TempTextPntr->GetPreferredSize (&JunkFloat, &g_TextBoxHeight);
    TempWindowPntr->RemoveChild (TempTextPntr);
    TempWindowPntr->Unlock ();
    delete TempTextPntr;

  /* Find the height of a checkbox control. */

  TempCheckBoxPntr = new (std::nothrow) BCheckBox (TempRect, TempString, TempString, NULL);
  if (TempCheckBoxPntr != NULL) {
    TempWindowPntr->Lock ();
    TempWindowPntr->AddChild (TempCheckBoxPntr);
    TempCheckBoxPntr->GetPreferredSize (&JunkFloat, &g_CheckBoxHeight);
    TempWindowPntr->RemoveChild (TempCheckBoxPntr);
    TempWindowPntr->Unlock ();
    delete TempCheckBoxPntr;

  /* Find the height of a radio button control. */

  TempRadioButtonPntr =
    new (std::nothrow) BRadioButton (TempRect, TempString, TempString, NULL);
  if (TempRadioButtonPntr != NULL) {
    TempWindowPntr->Lock ();
    TempWindowPntr->AddChild (TempRadioButtonPntr);
    TempRadioButtonPntr->GetPreferredSize (&JunkFloat, &g_RadioButtonHeight);
    TempWindowPntr->RemoveChild (TempRadioButtonPntr);
    TempWindowPntr->Unlock ();
    delete TempRadioButtonPntr;

  /* Find the height of a pop-up menu. */

  TempMenuBarPntr = new (std::nothrow) BMenuBar (TempRect, TempString,
    B_FOLLOW_LEFT | B_FOLLOW_TOP, B_ITEMS_IN_COLUMN,
    true /* resize to fit items */);
  TempPopUpMenuPntr = new (std::nothrow) BPopUpMenu (TempString);
  TempMenuItemPntr = new (std::nothrow) BMenuItem (TempString,
    new BMessage (12345), 'g');

  if (TempMenuBarPntr != NULL && TempPopUpMenuPntr != NULL &&
  TempMenuItemPntr != NULL)
    TempPopUpMenuPntr->AddItem (TempMenuItemPntr);
    TempMenuBarPntr->AddItem (TempPopUpMenuPntr);

    TempWindowPntr->Lock ();
    TempWindowPntr->AddChild (TempMenuBarPntr);
    TempMenuBarPntr->GetPreferredSize (&JunkFloat, &g_PopUpMenuHeight);
    TempWindowPntr->RemoveChild (TempMenuBarPntr);
    TempWindowPntr->Unlock ();
    delete TempMenuBarPntr; // It will delete contents too.

  TempWindowPntr->Lock ();
  TempWindowPntr->Quit ();

  SetPulseRate (500000);

  if (g_CommandLineMode)
    g_QuitCountdown = 0; /* Quit as soon as queued up commands done. */
  else /* GUI mode, make a window. */
    DatabaseWindowPntr = new (std::nothrow) DatabaseWindow ();
    if (DatabaseWindowPntr == NULL) {
      DisplayErrorMessage ("Unable to create window.");
      g_QuitCountdown = 0;
    DatabaseWindowPntr->Show (); /* Starts the window's message loop. */

  g_AppReadyToRunCompleted = true;


/* Given a mail component (body text, attachment, whatever), look for words in
it. If the tokenize mode specifies that it isn't one of the ones we are
looking for, just skip it. For container type components, recursively examine
their contents, up to the maximum depth specified. */

ABSApp::RecursivelyTokenizeMailComponent (
  BMailComponent *ComponentPntr,
  const char *OptionalFileName,
  set<string> &WordSet,
  int MaxRecursionLevel)
  char AttachmentName [B_FILE_NAME_LENGTH];
  BMailAttachment *AttachmentPntr;
  BMimeType ComponentMIMEType;
  BMailContainer *ContainerPntr;
  BMallocIO ContentsIO;
  const char *ContentsBufferPntr;
  size_t ContentsBufferSize;
  bool ExamineComponent;
  const char *HeaderKeyPntr;
  const char *HeaderValuePntr;
  const char *NameExtension;
  BMimeType TextAnyMIMEType ("text");
  BMimeType TextPlainMIMEType ("text/plain");

  if (ComponentPntr == NULL)

  /* Add things in the sub-headers that might be useful. Things like the file
  name of attachments, the encoding type, etc. */

  if (m_TokenizeMode == TM_PLAIN_TEXT_HEADER ||
  m_TokenizeMode == TM_ANY_TEXT_HEADER ||
  m_TokenizeMode == TM_ALL_PARTS_HEADER ||
  m_TokenizeMode == TM_JUST_HEADER)
    for (i = 0; i < 1000; i++)
      HeaderKeyPntr = ComponentPntr->HeaderAt (i);
      if (HeaderKeyPntr == NULL)
      AddWordsToSet (HeaderKeyPntr, strlen (HeaderKeyPntr),
        'H' /* Prefix for Headers, uppercase unlike normal words. */, WordSet);
      for (j = 0; j < 1000; j++)
        HeaderValuePntr = ComponentPntr->HeaderField (HeaderKeyPntr, j);
        if (HeaderValuePntr == NULL)
        AddWordsToSet (HeaderValuePntr, strlen (HeaderValuePntr),

  /* Check the MIME type of the thing. It's used to decide if the contents are
  worth examining for words. */

  ErrorCode = ComponentPntr->MIMEType (&ComponentMIMEType);
  if (ErrorCode != B_OK)
    sprintf (ErrorMessage, "ABSApp::RecursivelyTokenizeMailComponent: "
      "Unable to get MIME type at level %d in \"%s\"",
      RecursionLevel, OptionalFileName);
  if (ComponentMIMEType.Type() == NULL)
    /* Have to make up a MIME type for things which don't have them, such as
    the main body text, otherwise it would get ignored. */

    if (NULL != dynamic_cast<BTextMailComponent *>(ComponentPntr))
      ComponentMIMEType.SetType ("text/plain");
  if (!TextAnyMIMEType.Contains (&ComponentMIMEType) &&
  NULL != (AttachmentPntr = dynamic_cast<BMailAttachment *>(ComponentPntr)))
    /* Sometimes spam doesn't give a text MIME type for text when they do an
    attachment (which is often base64 encoded). Use the file name extension to
    see if it really is text. */
    NameExtension = NULL;
    if (AttachmentPntr->FileName (AttachmentName) >= 0)
      NameExtension = strrchr (AttachmentName, '.');
    if (NameExtension != NULL)
      if (strcasecmp (NameExtension, ".txt") == 0)
        ComponentMIMEType.SetType ("text/plain");
      else if (strcasecmp (NameExtension, ".htm") == 0 ||
      strcasecmp (NameExtension, ".html") == 0)
        ComponentMIMEType.SetType ("text/html");

  switch (m_TokenizeMode)
    case TM_PLAIN_TEXT_HEADER:
      ExamineComponent = TextPlainMIMEType.Contains (&ComponentMIMEType);

    case TM_ANY_TEXT_HEADER:
      ExamineComponent = TextAnyMIMEType.Contains (&ComponentMIMEType);

    case TM_ALL_PARTS_HEADER:
      ExamineComponent = true;

      ExamineComponent = false;

  if (ExamineComponent)
    /* Get the contents of the component. This will be UTF-8 text (converted
    from whatever encoding was used) for text attachments. For other ones,
    it's just the raw data, or perhaps decoded from base64 encoding. */

    ContentsIO.SetBlockSize (16 * 1024);
    ErrorCode = ComponentPntr->GetDecodedData (&ContentsIO);
    if (ErrorCode == B_OK) /* Can fail for container components: no data. */
      /* Look for words in the decoded data. */

      ContentsBufferPntr = (const char *) ContentsIO.Buffer ();
      ContentsBufferSize = ContentsIO.BufferLength ();
      if (ContentsBufferPntr != NULL /* can be empty */)
        AddWordsToSet (ContentsBufferPntr, ContentsBufferSize,
          0 /* no prefix character, this is body text */, WordSet);

  /* Examine any sub-components in the message. */

  if (RecursionLevel + 1 <= MaxRecursionLevel &&
  NULL != (ContainerPntr = dynamic_cast<BMailContainer *>(ComponentPntr)))
    NumComponents = ContainerPntr->CountComponents ();

    for (i = 0; i < NumComponents; i++)
      ComponentPntr = ContainerPntr->GetComponent (i);

      ErrorCode = RecursivelyTokenizeMailComponent (ComponentPntr,
        OptionalFileName, WordSet, ErrorMessage, RecursionLevel + 1,
      if (ErrorCode != B_OK)
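
/* Illustrative sketch only, not part of SpamDBM.  It isolates the file name
extension test used above for attachments that lack a text MIME type, using
the same strrchr and strcasecmp calls.  GuessTextMimeType is a hypothetical
name.  It returns NULL where the original code leaves the component's MIME
type unchanged. */

```cpp
#include <cassert>
#include <cstring>
#include <strings.h>  // strcasecmp (POSIX)

// Returns "text/plain" or "text/html" when the extension suggests text,
// otherwise NULL.  Only the last extension in the name is considered.
const char *GuessTextMimeType(const char *FileName) {
  assert(FileName != NULL);
  const char *Extension = strrchr(FileName, '.');
  if (Extension == NULL)
    return NULL;  // No extension at all.
  if (strcasecmp(Extension, ".txt") == 0)
    return "text/plain";
  if (strcasecmp(Extension, ".htm") == 0 ||
      strcasecmp(Extension, ".html") == 0)
    return "text/html";
  return NULL;
}
```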


/* The user has tried to open a file or several files with this application,
via Tracker's open-with menu item. If it is a database type file, then change
the database file name to it. Otherwise, ask the user whether they want to
classify it as spam or non-spam. There will be at most around 100 files; BeOS
R5.0.3's Tracker crashes if it tries to pass on more than that many using Open
With... etc. The command is sent to an intermediary thread where it is
asynchronously converted into one or more scripting messages that are sent
back to this BApplication. The intermediary is needed since we can't
recursively execute scripting messages while processing a message (this
RefsReceived one). */

ABSApp::RefsReceived (BMessage *MessagePntr)
  if (g_CommanderLooperPntr != NULL)
    g_CommanderLooperPntr->CommandReferences (MessagePntr);


/* A scripting command is looking for something to execute it. See if it is
targeted at our database. */

BHandler * ABSApp::ResolveSpecifier (
  BMessage *MessagePntr,
  BMessage *SpecifierMsgPntr,
  int32 SpecificationKind,
  const char *PropertyPntr)

  /* See if it is one of our commands. */

  if (SpecificationKind == B_DIRECT_SPECIFIER)
    for (i = PN_MAX - 1; i >= 0; i--)
      if (strcasecmp (PropertyPntr, g_PropertyNames [i]) == 0)
        return this; /* Found it! Return the Handler (which is us). */

  /* Handle an unrecognized scripting command, let the parent figure it out. */

  return BApplication::ResolveSpecifier (
    MessagePntr, Index, SpecifierMsgPntr, SpecificationKind, PropertyPntr);


/* Save the database if it hasn't been saved yet. Otherwise do nothing. */

status_t ABSApp::SaveDatabaseIfNeeded (char *ErrorMessage)
  if (m_DatabaseHasChanged)
    return LoadSaveDatabase (false /* DoLoad */, ErrorMessage);


/* Presumably the file is an e-mail message (or at least the header portion of
one). Break it into parts: header, body and MIME components. Then add the
words in the portions that match the current tokenization settings to the set
[...] */

status_t ABSApp::TokenizeParts (
  BPositionIO *PositionIOPntr,
  const char *OptionalFileName,
  set<string> &WordSet,
  status_t ErrorCode = B_OK;
  BEmailMessage WholeEMail;

  sprintf (ErrorMessage, "ABSApp::TokenizeParts: While getting e-mail "
    "headers, had problems with \"%s\"", OptionalFileName);

  ErrorCode = WholeEMail.SetToRFC822 (
    PositionIOPntr /* it does its own seeking to the start */,
    -1 /* length */, true /* parse_now */);
  if (ErrorCode < 0) goto ErrorExit;

  ErrorCode = RecursivelyTokenizeMailComponent (&WholeEMail,
    OptionalFileName, WordSet, ErrorMessage, 0 /* Initial recursion level */,
    (m_TokenizeMode == TM_JUST_HEADER) ? 0 : 500 /* Max recursion level */);


/* Add all the words in the whole file or memory buffer to the supplied set.
The file doesn't have to be an e-mail message since it isn't parsed for e-mail
headers or MIME headers or anything. It blindly adds everything that looks
like a word, though it does convert quoted printable codes to the characters
they represent. See also AddWordsToSet which does something more advanced. */

status_t ABSApp::TokenizeWhole (
  BPositionIO *PositionIOPntr,
  const char *OptionalFileName,
  set<string> &WordSet,
  string AccumulatedWord;
  uint8 Buffer [16 * 1024];
  uint8 *BufferCurrentPntr = Buffer + 0;
  uint8 *BufferEndPntr = Buffer + 0;
  const char *IOErrorString =
    "TokenizeWhole: Error %ld while reading \"%s\"";
  int NextLetter = ' ';
  int NextNextLetter = ' ';

  /* Use a buffer since reading single characters from a BFile is so slow.
  BufferCurrentPntr is the position of the next character to be read. When it
  reaches BufferEndPntr, it is time to fill the buffer again. */

#define ReadChar(CharVar) \
    if (BufferCurrentPntr < BufferEndPntr) \
      CharVar = *BufferCurrentPntr++; \
    else /* Try to fill the buffer. */ \
      ssize_t AmountRead; \
      AmountRead = PositionIOPntr->Read (Buffer, sizeof (Buffer)); \
      if (AmountRead < 0) \
        sprintf (ErrorMessage, IOErrorString, AmountRead, OptionalFileName); \
        return AmountRead; \
      else if (AmountRead == 0) \
        BufferEndPntr = Buffer + AmountRead; \
        BufferCurrentPntr = Buffer + 0; \
        CharVar = *BufferCurrentPntr++; \

  /* Read all the words in the file and add them to our local set of words. A
  set is used since we don't care how many times a word occurs. */

    /* We read two letters ahead so that we can decode quoted printable
    characters (an equals sign followed by two hex digits or a new line). Note
    that Letter can become EOF (-1) when end of file is reached. */

    Letter = NextLetter;
    NextLetter = NextNextLetter;
    ReadChar (NextNextLetter);

    /* Decode quoted printable codes first, so that the rest of the code just
    sees an ordinary character. Or even nothing, if it is the hidden line
    break combination. This may falsely corrupt stuff following an equals
    sign, but usually won't. */

      if ((NextLetter == '\r' && NextNextLetter == '\n') ||
      (NextLetter == '\n' && NextNextLetter == '\r'))
        /* Make the "=\r\n" pair disappear. It's not even white space. */
        ReadChar (NextLetter);
        ReadChar (NextNextLetter);
      if (NextLetter == '\n' || NextLetter == '\r')
        /* Make the "=\n" pair disappear. It's not even white space. */
        NextLetter = NextNextLetter;
        ReadChar (NextNextLetter);
      if (NextNextLetter != EOF &&
      isxdigit (NextLetter) && isxdigit (NextNextLetter))
        /* Convert the hex code to a letter. */
        HexString[0] = NextLetter;
        HexString[1] = NextNextLetter;
        Letter = strtoul (HexString, NULL, 16 /* number system base */);
        ReadChar (NextLetter);
        ReadChar (NextNextLetter);

    /* Convert to lower case to improve word matches. Of course this loses a
    bit of information, such as MONEY vs Money, an indicator of spam. Well,
    apparently that isn't all that useful a distinction, so do it. */

    if (Letter >= 'A' && Letter <= 'Z')
      Letter = Letter + ('a' - 'A');

    /* See if it is a letter we treat as white space - all control characters
    and all punctuation except for: apostrophe (so "it's" and possessive
    versions of words get stored), dash (for hyphenated words), dollar sign
    (for cash amounts), period (for IP addresses; we later remove trailing
    periods). Note that codes above 127 are bytes of multi-byte UTF-8
    characters, which we consider non-space. */

    if (Letter < 0 /* EOF */ || (Letter < 128 && g_SpaceCharacters[Letter]))
      /* That space finished off a word. Remove trailing periods... */

      while ((Length = AccumulatedWord.size()) > 0 &&
      AccumulatedWord [Length-1] == '.')
        AccumulatedWord.resize (Length - 1);

      /* If there's anything left in the word, add it to the set. Also ignore
      words which are too big (it's probably some binary encoded data). But
      leave room for supercalifragilisticexpialidocious. According to one web
      site, pneumonoultramicroscopicsilicovolcanoconiosis is the longest word
      currently in English. Note that some uuencoded data was seen with a 60
      character line length. */

      if (Length > 0 && Length <= g_MaxWordLength)
        WordSet.insert (AccumulatedWord);

      /* Empty out the string to get ready for the next word. */

      AccumulatedWord.resize (0);
    else /* Not a space-like character, add it to the word. */
      AccumulatedWord.append (1 /* one copy of the char */, (char) Letter);

    /* Stop at end of file or error. Don't care which. Exit here so that last
    word got processed. */
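
/* Illustrative sketch only, not part of SpamDBM.  It reproduces the two steps
described above, quoted printable decoding followed by splitting into lower
case words, on an in-memory string instead of a buffered file.  Unlike
TokenizeWhole it decodes in a separate pass rather than with two characters
of look-ahead.  All names are hypothetical.  The space test (everything below
128 that is not alphanumeric, apostrophe, dash, dollar sign or period) and the
maximum word length of 50 are assumptions standing in for g_SpaceCharacters
and g_MaxWordLength, whose contents are not shown here. */

```cpp
#include <cassert>
#include <cctype>
#include <cstdlib>
#include <set>
#include <string>

static const size_t kSketchMaxWordLength = 50;  // Assumed value.

// Removes "=\r\n", "=\n" style soft line breaks and converts "=XX" hex codes.
std::string DecodeQuotedPrintableSketch(const std::string &In) {
  std::string Out;
  size_t i = 0;
  while (i < In.size()) {
    unsigned char c = In[i];
    if (c == '=' && i + 1 < In.size()) {
      unsigned char a = In[i + 1];
      unsigned char b = (i + 2 < In.size()) ? In[i + 2] : 0;
      if ((a == '\r' && b == '\n') || (a == '\n' && b == '\r')) {
        i += 3;  // Two character line break after the equals sign.
        continue;
      }
      if (a == '\n' || a == '\r') {
        i += 2;  // One character line break after the equals sign.
        continue;
      }
      if (std::isxdigit(a) && std::isxdigit(b)) {
        const char Hex[3] = {(char) a, (char) b, 0};
        Out += (char) std::strtoul(Hex, NULL, 16);
        i += 3;
        continue;
      }
    }
    Out += (char) c;  // Ordinary character, or a stray equals sign.
    i++;
  }
  assert(Out.size() <= In.size());  // Decoding never makes the text longer.
  return Out;
}

std::set<std::string> TokenizeSketch(const std::string &Raw) {
  const std::string Text = DecodeQuotedPrintableSketch(Raw);
  std::set<std::string> Words;
  std::string Word;
  for (size_t i = 0; i <= Text.size(); i++) {
    // One extra pass with -1 (end of text) flushes the last word.
    int Letter = (i < Text.size()) ? (unsigned char) Text[i] : -1;
    if (Letter >= 'A' && Letter <= 'Z')
      Letter += 'a' - 'A';
    bool IsSpace = Letter < 0 || (Letter < 128 && !std::isalnum(Letter) &&
      Letter != '\'' && Letter != '-' && Letter != '$' && Letter != '.');
    if (IsSpace) {
      while (!Word.empty() && Word[Word.size() - 1] == '.')
        Word.resize(Word.size() - 1);  // Remove trailing periods.
      if (!Word.empty() && Word.size() <= kSketchMaxWordLength)
        Words.insert(Word);
      Word.clear();
    } else
      Word += (char) Letter;
  }
  return Words;
}
```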


/******************************************************************************
 * Implementation of the ClassificationChoicesWindow class, constructor,
 * destructor and the rest of the member functions in mostly alphabetical
 * order.
 */

ClassificationChoicesWindow::ClassificationChoicesWindow (
  const char *FileName,
: BWindow (FrameRect, "Classification Choices", B_TITLED_WINDOW,
    B_NOT_ZOOMABLE | B_NOT_RESIZABLE | B_ASYNCHRONOUS_CONTROLS),
  m_BulkModeSelectedPntr (NULL),
  m_ChoosenClassificationPntr (NULL)
  ClassificationChoicesView *SubViewPntr;

  SubViewPntr = new ClassificationChoicesView (Bounds(),
    FileName, NumberOfFiles);
  AddChild (SubViewPntr);
  SubViewPntr->ResizeToPreferred ();
  ResizeTo (SubViewPntr->Frame().Width(), SubViewPntr->Frame().Height());


ClassificationChoicesWindow::MessageReceived (BMessage *MessagePntr)
  BControl *ControlPntr;

  if (MessagePntr->what >= MSG_CLASS_BUTTONS &&
  MessagePntr->what < MSG_CLASS_BUTTONS + CL_MAX)
    if (m_ChoosenClassificationPntr != NULL)
      *m_ChoosenClassificationPntr =
        (ClassificationTypes) (MessagePntr->what - MSG_CLASS_BUTTONS);
    PostMessage (B_QUIT_REQUESTED); // Close and destroy the window.

  if (MessagePntr->what == MSG_BULK_CHECKBOX)
    if (m_BulkModeSelectedPntr != NULL &&
    MessagePntr->FindPointer ("source", (void **) &ControlPntr) == B_OK)
      *m_BulkModeSelectedPntr = (ControlPntr->Value() == B_CONTROL_ON);

  if (MessagePntr->what == MSG_CANCEL_BUTTON)
    PostMessage (B_QUIT_REQUESTED); // Close and destroy the window.

  BWindow::MessageReceived (MessagePntr);


ClassificationChoicesWindow::Go (
  bool *BulkModeSelectedPntr,
  ClassificationTypes *ChoosenClassificationPntr)
  status_t ErrorCode = 0;
  BView *MainViewPntr;
  thread_id WindowThreadID;

  m_BulkModeSelectedPntr = BulkModeSelectedPntr;
  m_ChoosenClassificationPntr = ChoosenClassificationPntr;
  if (m_ChoosenClassificationPntr != NULL)
    *m_ChoosenClassificationPntr = CL_MAX;

  Show (); // Starts the window thread running.

  /* Move the window to the center of the screen it is now being displayed on
  (have to wait for it to be showing). */

  MainViewPntr = FindView ("ClassificationChoicesView");
  if (MainViewPntr != NULL)
    BScreen TempScreen (this);

    TempRect = TempScreen.Frame ();
    X = TempRect.Width() / 2;
    Y = TempRect.Height() / 2;
    TempRect = MainViewPntr->Frame();
    X -= TempRect.Width() / 2;
    Y -= TempRect.Height() / 2;
    MoveTo (ceilf (X), ceilf (Y));

  /* Wait for the window to go away. */

  WindowThreadID = Thread ();
  if (WindowThreadID >= 0)
    // Delay until the window thread has died, presumably window deleted now.
    wait_for_thread (WindowThreadID, &ErrorCode);
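
/* Illustrative sketch only, not part of SpamDBM.  It shows the centering
arithmetic used in Go above: half the screen size minus half the view size,
rounded up to a whole pixel.  PointSketch and CenteredTopLeft are hypothetical
names, and plain floats replace the BRect values. */

```cpp
#include <cassert>
#include <cmath>

struct PointSketch {
  float x;
  float y;
};

// Returns the top left corner that centers a view of the given size on a
// screen of the given size.
PointSketch CenteredTopLeft(float ScreenWidth, float ScreenHeight,
                            float ViewWidth, float ViewHeight) {
  assert(ScreenWidth >= 0 && ScreenHeight >= 0);
  PointSketch Corner;
  Corner.x = std::ceil(ScreenWidth / 2 - ViewWidth / 2);
  Corner.y = std::ceil(ScreenHeight / 2 - ViewHeight / 2);
  return Corner;
}
```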


/******************************************************************************
 * Implementation of the ClassificationChoicesView class, constructor,
 * destructor and the rest of the member functions in mostly alphabetical
 * order.
 */

ClassificationChoicesView::ClassificationChoicesView (
  const char *FileName,
: BView (FrameRect, "ClassificationChoicesView",
    B_FOLLOW_TOP | B_FOLLOW_LEFT, B_WILL_DRAW | B_NAVIGABLE_JUMP),
  m_FileName (FileName),
  m_NumberOfFiles (NumberOfFiles),
  m_PreferredBottomY (ceilf (g_ButtonHeight * 10))


ClassificationChoicesView::AttachedToWindow ()
  BButton *ButtonPntr;
  BCheckBox *CheckBoxPntr;
  ClassificationTypes Classification;
  BTextView *TextViewPntr;
  char TempString [2048];

  SetViewColor (ui_color (B_PANEL_BACKGROUND_COLOR));

  RowHeight = g_ButtonHeight;
  if (g_CheckBoxHeight > RowHeight)
    RowHeight = g_CheckBoxHeight;
  RowHeight = ceilf (RowHeight * 1.1);

  TempRect = Bounds ();
  RowTop = TempRect.top;

  /* Show the file name text. */

  Margin = ceilf ((RowHeight - g_StringViewHeight) / 2);
  TempRect = Bounds ();
  TempRect.top = RowTop + Margin;
  TextRect = TempRect;
  TextRect.OffsetTo (0, 0);
  TextRect.InsetBy (g_MarginBetweenControls, 2);
  sprintf (TempString, "How do you want to classify the file named \"%s\"?",
  TextViewPntr = new BTextView (TempRect, "FileText", TextRect,
    B_FOLLOW_TOP | B_FOLLOW_LEFT, B_WILL_DRAW | B_FULL_UPDATE_ON_RESIZE);
  AddChild (TextViewPntr);
  TextViewPntr->SetText (TempString);
  TextViewPntr->MakeEditable (false);
  TextViewPntr->SetViewColor (ui_color (B_PANEL_BACKGROUND_COLOR));
  TextViewPntr->ResizeTo (TempRect.Width (),
    3 + TextViewPntr->TextHeight (0, sizeof (TempString)));
  RowTop = TextViewPntr->Frame().bottom + Margin;

  /* Make the classification buttons. */

  Margin = ceilf ((RowHeight - g_ButtonHeight) / 2);
  TempRect = Bounds ();
  TempRect.top = RowTop + Margin;
  X = Bounds().left + g_MarginBetweenControls;
  for (Classification = (ClassificationTypes) 0; Classification < CL_MAX;
  Classification = (ClassificationTypes) ((int) Classification + 1))
    TempRect = Bounds ();
    TempRect.top = RowTop + Margin;
    sprintf (TempString, "%s Button",
      g_ClassificationTypeNames [Classification]);
    ButtonPntr = new BButton (TempRect, TempString,
      g_ClassificationTypeNames [Classification], new BMessage (
      ClassificationChoicesWindow::MSG_CLASS_BUTTONS + Classification));
    AddChild (ButtonPntr);
    ButtonPntr->ResizeToPreferred ();
    X = ButtonPntr->Frame().right + 3 * g_MarginBetweenControls;
  RowTop += ceilf (RowHeight * 1.2);

  /* Make the Cancel button. */

  Margin = ceilf ((RowHeight - g_ButtonHeight) / 2);
  TempRect = Bounds ();
  TempRect.top = RowTop + Margin;
  TempRect.left += g_MarginBetweenControls;
  ButtonPntr = new BButton (TempRect, "Cancel Button",
    "Cancel", new BMessage (ClassificationChoicesWindow::MSG_CANCEL_BUTTON));
  AddChild (ButtonPntr);
  ButtonPntr->ResizeToPreferred ();
  X = ButtonPntr->Frame().right + g_MarginBetweenControls;

  /* Make the checkbox for bulk operations. */

  if (m_NumberOfFiles > 1)
    Margin = ceilf ((RowHeight - g_CheckBoxHeight) / 2);
    TempRect = Bounds ();
    TempRect.top = RowTop + Margin;
    sprintf (TempString, "Mark all %d remaining messages the same way.",
      m_NumberOfFiles - 1);
    CheckBoxPntr = new BCheckBox (TempRect, "BulkBox", TempString,
      new BMessage (ClassificationChoicesWindow::MSG_BULK_CHECKBOX));
    AddChild (CheckBoxPntr);
    CheckBoxPntr->ResizeToPreferred ();
  RowTop += RowHeight;

  m_PreferredBottomY = RowTop;


ClassificationChoicesView::GetPreferredSize (float *width, float *height)
    *width = Bounds().Width();
    *height = m_PreferredBottomY;


/******************************************************************************
 * Implementation of the CommanderLooper class, constructor, destructor and the
 * rest of the member functions in mostly alphabetical order.
 */

CommanderLooper::CommanderLooper ()
: BLooper ("CommanderLooper", B_NORMAL_PRIORITY),


CommanderLooper::~CommanderLooper ()
  g_CommanderLooperPntr = NULL;
  delete g_CommanderMessenger;
  g_CommanderMessenger = NULL;


/* Process some command line arguments. Basically just send a message to this
looper itself to do the work later. That way the caller can continue doing
whatever they're doing, particularly if it's the BApplication. */

CommanderLooper::CommandArguments (int argc, char **argv)
  BMessage InternalMessage;

  InternalMessage.what = MSG_COMMAND_ARGUMENTS;
  for (i = 0; i < argc; i++)
    InternalMessage.AddString ("arg", argv[i]);

  PostMessage (&InternalMessage);
/* Copy the refs out of the given message and stuff them into an internal
message to ourself (so that the original message can be returned to the caller,
and if it is Tracker, it can close the file handles it has open). Optionally
allow preset classification rather than asking the user (set BulkMode to TRUE
and specify the class with BulkClassification). */

CommanderLooper::CommandReferences (
  BMessage *MessagePntr,
  ClassificationTypes BulkClassification)
  BMessage InternalMessage;

  InternalMessage.what = MSG_COMMAND_FILE_REFS;
  for (i = 0; MessagePntr->FindRef ("refs", i, &EntryRef) == B_OK; i++)
    InternalMessage.AddRef ("refs", &EntryRef);
  InternalMessage.AddBool ("BulkMode", BulkMode);
  InternalMessage.AddInt32 ("BulkClassification", BulkClassification);

  PostMessage (&InternalMessage);
/* This function is called by other threads to see if the CommanderLooper is
busy working on something. */

CommanderLooper::IsBusy ()
  if (IsLocked () || !MessageQueue()->IsEmpty ())

CommanderLooper::MessageReceived (BMessage *MessagePntr)
  if (MessagePntr->what == MSG_COMMAND_ARGUMENTS)
    ProcessArgs (MessagePntr);
  else if (MessagePntr->what == MSG_COMMAND_FILE_REFS)
    ProcessRefs (MessagePntr);
    BLooper::MessageReceived (MessagePntr);
/* Process the command line by converting it into a series of scripting
messages (possibly thousands) and send them to the BApplication synchronously
(so we can print the result). */
CommanderLooper::ProcessArgs (BMessage *MessagePntr)
  const char **argv = NULL;
  const char *CommandWord;
  const char *ErrorTitle = "ProcessArgs";
  BMessage ReplyMessage;
  BMessage ScriptMessage;
  struct property_info *PropInfoPntr;
  const char *PropertyName;
  const char *TempStringPntr;
  const char *ValuePntr;
  /* Get the argument count and pointers to arguments out of the message and
  into our argc and argv. */

  ErrorCode = MessagePntr->GetInfo ("arg", &TypeCode, &argc);
  if (ErrorCode != B_OK || TypeCode != B_STRING_TYPE)
    DisplayErrorMessage ("Unable to find argument strings in message",
      ErrorCode, ErrorTitle);

    DisplayErrorMessage ("You need to specify a command word, like GET, SET "
      "and so on followed by a property, like DatabaseFile, and maybe "
      "followed by a value of some sort", -1, ErrorTitle);

  argv = (const char **) malloc (sizeof (char *) * argc);
    DisplayErrorMessage ("Out of memory when allocating argv array",
      ENOMEM, ErrorTitle);

  for (i = 0; i < argc; i++)
    if ((ErrorCode = MessagePntr->FindString ("arg", i, &argv[i])) != B_OK)
      DisplayErrorMessage ("Unable to find argument in the BMessage",
        ErrorCode, ErrorTitle);

  CommandWord = argv[1];
  /* Special case for the Quit command since it isn't a scripting command. */

  if (strcasecmp (CommandWord, "quit") == 0)
    g_QuitCountdown = 10;

  /* Find the corresponding scripting command. */

  if (strcasecmp (CommandWord, "set") == 0)
    CommandCode = B_SET_PROPERTY;
  else if (strcasecmp (CommandWord, "get") == 0)
    CommandCode = B_GET_PROPERTY;
  else if (strcasecmp (CommandWord, "count") == 0)
    CommandCode = B_COUNT_PROPERTIES;
  else if (strcasecmp (CommandWord, "create") == 0)
    CommandCode = B_CREATE_PROPERTY;
  else if (strcasecmp (CommandWord, "delete") == 0)
    CommandCode = B_DELETE_PROPERTY;
    CommandCode = B_EXECUTE_PROPERTY;

  if (CommandCode == B_EXECUTE_PROPERTY)
    PropertyName = CommandWord;
    ArgumentIndex = 2; /* Arguments to the command start at this index. */
    if (CommandCode == B_SET_PROPERTY)
      /* SET commands require at least one argument value. */
        DisplayErrorMessage ("SET commands require at least one "
          "argument value after the property name", -1, ErrorTitle);
      DisplayErrorMessage ("You need to specify a property to act on",
    PropertyName = argv[2];
  /* See if it is one of our commands. */

  for (PropInfoPntr = g_ScriptingPropertyList + 0; true; PropInfoPntr++)
    if (PropInfoPntr->name == 0)
      DisplayErrorMessage ("The property specified isn't known or "
        "doesn't support the requested action (usually means it is an "
        "unknown command)", -1, ErrorTitle);
      goto ErrorExit; /* Unrecognized command. */

    if (PropInfoPntr->commands[0] == CommandCode &&
    strcasecmp (PropertyName, PropInfoPntr->name) == 0)
  /* Make the equivalent command message. For commands with multiple
  arguments, repeat the message for each single argument and just change the
  data portion for each extra argument. Send the command and wait for a reply,
  which we'll print out. */

  ScriptMessage.MakeEmpty ();
  ScriptMessage.what = CommandCode;
  ScriptMessage.AddSpecifier (PropertyName);
    if (ArgumentIndex < argc) /* If there are arguments to be added. */
      ValuePntr = argv[ArgumentIndex];

      /* Convert the value into the likely kind of data. */

      if (strcasecmp (ValuePntr, "yes") == 0 ||
      strcasecmp (ValuePntr, "true") == 0)
        ScriptMessage.AddBool (g_DataName, true);
      else if (strcasecmp (ValuePntr, "no") == 0 ||
      strcasecmp (ValuePntr, "false") == 0)
        ScriptMessage.AddBool (g_DataName, false);
        /* See if it is a number. */
        i = strtol (ValuePntr, &EndPntr, 0);
          ScriptMessage.AddInt32 (g_DataName, i);
        else /* Nope, it's just a string. */
          ScriptMessage.AddString (g_DataName, ValuePntr);

    ErrorCode = be_app_messenger.SendMessage (&ScriptMessage, &ReplyMessage);
    if (ErrorCode != B_OK)
      DisplayErrorMessage ("Unable to send scripting command",
        ErrorCode, ErrorTitle);
    /* Print the reply to the scripting command. Even in server mode. To

    if (ReplyMessage.FindString ("CommandText", &TempStringPntr) == B_OK)
      if (ReplyMessage.FindInt32 ("error", &TempInt32) == B_OK &&
        /* It's a successful reply to one of our scripting messages. Print out
        the returned values code for command line users to see. */

        cout << "Result of command to " << TempStringPntr << " is:\t";
        if (ReplyMessage.FindString (g_ResultName, &TempStringPntr) == B_OK)
          cout << "\"" << TempStringPntr << "\"";
        else if (ReplyMessage.FindInt32 (g_ResultName, &TempInt32) == B_OK)
        else if (ReplyMessage.FindFloat (g_ResultName, &TempFloat) == B_OK)
        else if (ReplyMessage.FindBool (g_ResultName, &TempBool) == B_OK)
          cout << (TempBool ? "true" : "false");
          cout << "just plain success";
        if (ReplyMessage.FindInt32 ("count", &TempInt32) == B_OK)
          cout << "\t(count " << TempInt32 << ")";
        for (i = 0; (i < 50) &&
        ReplyMessage.FindString ("words", i, &TempStringPntr) == B_OK &&
        ReplyMessage.FindFloat ("ratios", i, &TempFloat) == B_OK;
            cout << "\twith top words:\t";
          cout << TempStringPntr << "/" << TempFloat;
      else /* An error reply, print out the error, even in server mode. */
        cout << "Failure of command " << TempStringPntr << ", error ";
        cout << TempInt32 << " (" << strerror (TempInt32) << ")";
        if (ReplyMessage.FindString ("message", &TempStringPntr) == B_OK)
          cout << ", message: " << TempStringPntr;
        cout << "." << endl;

    /* Advance to the next argument and its scripting message. */

    ScriptMessage.RemoveName (g_DataName);
    if (++ArgumentIndex >= argc)
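/* Illustration only, not part of the original program.  ProcessArgs guesses
the data type of each command line value: yes/true and no/false become
booleans, anything strtol can parse becomes an integer, and the rest stays a
string.  A minimal sketch of that guess in standard C++ follows.  The names
SketchValue, SketchEqualsIgnoreCase and SketchClassifyValue are hypothetical,
and the rule that the whole text must be consumed by strtol is an assumption,
since the exact test used by the original is not visible in this listing. */

```cpp
#include <cassert>
#include <cctype>
#include <cstdlib>
#include <string>

enum SketchValueKind { SVK_BOOL, SVK_INT, SVK_STRING };

struct SketchValue
{
  SketchValueKind kind;
  bool boolValue;
  long intValue;
  std::string stringValue;
};

/* Portable stand-in for strcasecmp (...) == 0. */
static bool SketchEqualsIgnoreCase (const char *a, const char *b)
{
  while (*a != 0 && *b != 0)
  {
    if (tolower ((unsigned char) *a) != tolower ((unsigned char) *b))
      return false;
    a++;
    b++;
  }
  return *a == *b; /* Equal only if both strings ended together. */
}

static SketchValue SketchClassifyValue (const char *Text)
{
  SketchValue Result;
  Result.kind = SVK_STRING; /* Default: it's just a string. */
  Result.boolValue = false;
  Result.intValue = 0;
  Result.stringValue = Text;

  if (SketchEqualsIgnoreCase (Text, "yes") ||
  SketchEqualsIgnoreCase (Text, "true"))
  {
    Result.kind = SVK_BOOL;
    Result.boolValue = true;
    return Result;
  }
  if (SketchEqualsIgnoreCase (Text, "no") ||
  SketchEqualsIgnoreCase (Text, "false"))
  {
    Result.kind = SVK_BOOL;
    return Result;
  }

  /* See if it is a number.  Base 0 lets strtol accept decimal, 0x hex and
  leading-zero octal.  Assumption: trailing junk means it isn't a number. */
  char *EndPntr = NULL;
  long Number = strtol (Text, &EndPntr, 0);
  if (EndPntr != Text && *EndPntr == 0)
  {
    Result.kind = SVK_INT;
    Result.intValue = Number;
  }
  return Result;
}
```

/* With these rules "Yes" is a boolean, "0x10" is the integer 16 and "12abc"
remains a string. */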
/* Given a bunch of references to files, open the files. If it's a database
file, switch to using it as a database. Otherwise, treat them as text files
and add them to the database. Prompt the user for the spam or genuine or
uncertain (declassification) choice, with the option to bulk mark many files at

CommanderLooper::ProcessRefs (BMessage *MessagePntr)
  bool BulkMode = false;
  ClassificationTypes BulkClassification = CL_GENUINE;
  ClassificationChoicesWindow *ChoiceWindowPntr;
  const char *ErrorTitle = "CommanderLooper::ProcessRefs";
  int32 NumberOfRefs = 0;
  BMessage ReplyMessage;
  BMessage ScriptingMessage;
  char TempString [PATH_MAX + 1024];
  // Wait for ReadyToRun to finish initializing the globals with the sizes of
  // the controls, since they are needed when we show the custom alert box for
  // choosing the message type.

  while (!g_AppReadyToRunCompleted && TempInt32++ < 10)

  ErrorCode = MessagePntr->GetInfo ("refs", &TypeCode, &NumberOfRefs);
  if (ErrorCode != B_OK || TypeCode != B_REF_TYPE || NumberOfRefs <= 0)
    DisplayErrorMessage ("Unable to get refs from the message",
      ErrorCode, ErrorTitle);

  if (MessagePntr->FindBool ("BulkMode", &TempBool) == B_OK)
    BulkMode = TempBool;
  if (MessagePntr->FindInt32 ("BulkClassification", &TempInt32) == B_OK &&
  TempInt32 >= 0 && TempInt32 < CL_MAX)
    BulkClassification = (ClassificationTypes) TempInt32;

  MessagePntr->FindRef ("refs", RefIndex, &EntryRef) == B_OK;
    ScriptingMessage.MakeEmpty ();
    ScriptingMessage.what = 0; /* Haven't figured out what to do yet. */

    /* See if the entry is a valid file or directory or other thing. */

    ErrorCode = Entry.SetTo (&EntryRef, true /* traverse symbolic links */);
    if (ErrorCode != B_OK ||
    ((ErrorCode = /* assignment */ B_ENTRY_NOT_FOUND) != 0 /* this pacifies
    mwcc -nwhitehorn */ && !Entry.Exists ()) ||
    ((ErrorCode = Entry.GetPath (&Path)) != B_OK))
      DisplayErrorMessage ("Bad entry reference encountered, will skip it",
        ErrorCode, ErrorTitle);
      continue; /* Bad file reference, try the next one. */
    /* If it's a file, check if it is a spam database file. Go by the magic
    text at the start of the file, in case someone has edited the file with a
    spreadsheet or other tool and lost the MIME type. */

    if (Entry.IsFile ())
      ErrorCode = TempFile.SetTo (&Entry, B_READ_ONLY);
      if (ErrorCode != B_OK)
        sprintf (TempString, "Unable to open file \"%s\" for reading, will "
          "skip it", Path.Path ());
        DisplayErrorMessage (TempString, ErrorCode, ErrorTitle);
      if (TempFile.Read (TempString, strlen (g_DatabaseRecognitionString)) ==
      (int) strlen (g_DatabaseRecognitionString) && strncmp (TempString,
      g_DatabaseRecognitionString, strlen (g_DatabaseRecognitionString)) == 0)
        ScriptingMessage.what = B_SET_PROPERTY;
        ScriptingMessage.AddSpecifier (g_PropertyNames[PN_DATABASE_FILE]);
        ScriptingMessage.AddString (g_DataName, Path.Path ());
    /* Not a database file. Could be a directory or a file. Submit it as
    something to be marked spam or genuine. */

    if (ScriptingMessage.what == 0)
      if (!Entry.IsFile ())
        sprintf (TempString, "\"%s\" is not a file, can't do anything with it",
        DisplayErrorMessage (TempString, -1, ErrorTitle);

      if (!BulkMode) /* Have to ask the user. */
        ChoiceWindowPntr = new ClassificationChoicesWindow (
          BRect (40, 40, 40 + 50 * g_MarginBetweenControls,
          40 + g_ButtonHeight * 5), Path.Path (), NumberOfRefs - RefIndex);
        ChoiceWindowPntr->Go (&BulkMode, &BulkClassification);
        if (BulkClassification == CL_MAX)
          break; /* Cancel was picked. */

      /* Format the command for classifying the file. */

      ScriptingMessage.what = B_SET_PROPERTY;

      if (BulkClassification == CL_GENUINE)
        ScriptingMessage.AddSpecifier (g_PropertyNames[PN_GENUINE]);
      else if (BulkClassification == CL_SPAM)
        ScriptingMessage.AddSpecifier (g_PropertyNames[PN_SPAM]);
      else if (BulkClassification == CL_UNCERTAIN)
        ScriptingMessage.AddSpecifier (g_PropertyNames[PN_UNCERTAIN]);
      else /* Broken code */
      ScriptingMessage.AddString (g_DataName, Path.Path ());
    /* Tell the BApplication to do the work, and wait for it to finish. The
    BApplication will display any error messages for us. */

    be_app_messenger.SendMessage (&ScriptingMessage, &ReplyMessage);
    if (ErrorCode != B_OK)
      DisplayErrorMessage ("Unable to send scripting command",
        ErrorCode, ErrorTitle);

    /* If there was an error, allow the user to stop by switching off bulk
    mode. The message will already have been displayed in an alert box, if
    server mode is off. */

    if (ReplyMessage.FindInt32 ("error", &TempInt32) != B_OK ||
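/* Illustration only, not part of the original program.  ProcessRefs
recognises a database file by the magic text at its start rather than by MIME
type.  A minimal sketch of such a prefix check using standard C++ streams
instead of BFile follows.  The names SketchStartsWithMagic and
SketchTextStartsWithMagic are hypothetical, and the magic text is passed in by
the caller because the real recognition string is not shown in this listing. */

```cpp
#include <cassert>
#include <cstring>
#include <istream>
#include <sstream>
#include <string>

/* Returns true if the stream begins with exactly the bytes of Magic.  A
short read (fewer bytes available than the magic text) counts as no match. */
static bool SketchStartsWithMagic (std::istream &Input, const char *Magic)
{
  size_t MagicLength = strlen (Magic);
  std::string Buffer (MagicLength, '\0');

  Input.read (&Buffer[0], (std::streamsize) MagicLength);
  if ((size_t) Input.gcount () != MagicLength)
    return false; /* File is too short to contain the magic text. */

  return memcmp (Buffer.data (), Magic, MagicLength) == 0;
}

/* Convenience wrapper for checking text that is already in memory. */
static bool SketchTextStartsWithMagic (const std::string &Text,
  const char *Magic)
{
  std::istringstream Stream (Text);
  return SketchStartsWithMagic (Stream, Magic);
}
```

/* Comparing the byte count first mirrors the original's check that Read
returned the full length before calling strncmp. */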
/******************************************************************************
 * Implementation of the ControlsView class, constructor, destructor and the
 * rest of the member functions in mostly alphabetical order.

ControlsView::ControlsView (BRect NewBounds)
: BView (NewBounds, "ControlsView", B_FOLLOW_TOP | B_FOLLOW_LEFT_RIGHT,
    B_WILL_DRAW | B_PULSE_NEEDED | B_NAVIGABLE_JUMP | B_FRAME_EVENTS),
  m_AboutButtonPntr (NULL),
  m_AddExampleButtonPntr (NULL),
  m_BrowseButtonPntr (NULL),
  m_BrowseFilePanelPntr (NULL),
  m_CreateDatabaseButtonPntr (NULL),
  m_DatabaseFileNameTextboxPntr (NULL),
  m_DatabaseLoadDone (false),
  m_EstimateSpamButtonPntr (NULL),
  m_EstimateSpamFilePanelPntr (NULL),
  m_GenuineCountTextboxPntr (NULL),
  m_IgnorePreviousClassCheckboxPntr (NULL),
  m_InstallThingsButtonPntr (NULL),
  m_PurgeAgeTextboxPntr (NULL),
  m_PurgeButtonPntr (NULL),
  m_PurgePopularityTextboxPntr (NULL),
  m_ResetToDefaultsButtonPntr (NULL),
  m_ScoringModeMenuBarPntr (NULL),
  m_ScoringModePopUpMenuPntr (NULL),
  m_ServerModeCheckboxPntr (NULL),
  m_SpamCountTextboxPntr (NULL),
  m_TimeOfLastPoll (0),
  m_TokenizeModeMenuBarPntr (NULL),
  m_TokenizeModePopUpMenuPntr (NULL),
  m_WordCountTextboxPntr (NULL)
ControlsView::~ControlsView ()
  if (m_BrowseFilePanelPntr != NULL)
    delete m_BrowseFilePanelPntr;
    m_BrowseFilePanelPntr = NULL;

  if (m_EstimateSpamFilePanelPntr != NULL)
    delete m_EstimateSpamFilePanelPntr;
    m_EstimateSpamFilePanelPntr = NULL;
ControlsView::AttachedToWindow ()
  float BigPurgeButtonTop;
  BMessage CommandMessage;
  const char *EightDigitsString = " 12345678 ";
  ScoringModes ScoringMode;
  const char *StringPntr;
  BMenuItem *TempMenuItemPntr;
  char TempString [PATH_MAX];
  TokenizeModes TokenizeMode;

  SetViewColor (ui_color (B_PANEL_BACKGROUND_COLOR));

  TempRect = Bounds ();
  RowTop = TempRect.top;
  RowHeight = g_ButtonHeight;
  if (g_TextBoxHeight > RowHeight)
    RowHeight = g_TextBoxHeight;
  RowHeight = ceilf (RowHeight * 1.1);
  /* Make the Create button at the far right of the first row of controls,
  which are all database file related. */

  Margin = ceilf ((RowHeight - g_ButtonHeight) / 2);
  TempRect = Bounds ();
  TempRect.top = RowTop + Margin;
  TempRect.bottom = TempRect.top + g_ButtonHeight;

  CommandMessage.MakeEmpty ();
  CommandMessage.what = B_CREATE_PROPERTY;
  CommandMessage.AddSpecifier (g_PropertyNames[PN_DATABASE_FILE]);
  m_CreateDatabaseButtonPntr = new BButton (TempRect, "Create Button",
    "Create", new BMessage (CommandMessage), B_FOLLOW_RIGHT | B_FOLLOW_TOP);
  if (m_CreateDatabaseButtonPntr == NULL) goto ErrorExit;
  AddChild (m_CreateDatabaseButtonPntr);
  m_CreateDatabaseButtonPntr->SetTarget (be_app);
  m_CreateDatabaseButtonPntr->ResizeToPreferred ();
  m_CreateDatabaseButtonPntr->GetPreferredSize (&Width, &Height);
  m_CreateDatabaseButtonPntr->MoveTo (X - Width, TempRect.top);
  X -= Width + g_MarginBetweenControls;
  /* Make the Browse button, middle of the first row. */

  Margin = ceilf ((RowHeight - g_ButtonHeight) / 2);
  TempRect = Bounds ();
  TempRect.top = RowTop + Margin;
  TempRect.bottom = TempRect.top + g_ButtonHeight;

  m_BrowseButtonPntr = new BButton (TempRect, "Browse Button",
    "Browse…", new BMessage (MSG_BROWSE_BUTTON), B_FOLLOW_RIGHT | B_FOLLOW_TOP);
  if (m_BrowseButtonPntr == NULL) goto ErrorExit;
  AddChild (m_BrowseButtonPntr);
  m_BrowseButtonPntr->SetTarget (this);
  m_BrowseButtonPntr->ResizeToPreferred ();
  m_BrowseButtonPntr->GetPreferredSize (&Width, &Height);
  m_BrowseButtonPntr->MoveTo (X - Width, TempRect.top);
  X -= Width + g_MarginBetweenControls;
  /* Fill the rest of the space on the first row with the file name box. */

  Margin = ceilf ((RowHeight - g_TextBoxHeight) / 2);
  TempRect = Bounds ();
  TempRect.top = RowTop + Margin;
  TempRect.bottom = TempRect.top + g_TextBoxHeight;

  StringPntr = "Word Database:";
  strcpy (m_DatabaseFileNameCachedValue, "Unknown...");
  m_DatabaseFileNameTextboxPntr = new BTextControl (TempRect,
    StringPntr /* label */,
    m_DatabaseFileNameCachedValue /* text */,
    new BMessage (MSG_DATABASE_NAME),
    B_FOLLOW_LEFT_RIGHT | B_FOLLOW_TOP,
    B_WILL_DRAW | B_NAVIGABLE | B_NAVIGABLE_JUMP);
  AddChild (m_DatabaseFileNameTextboxPntr);
  m_DatabaseFileNameTextboxPntr->SetTarget (this);
  m_DatabaseFileNameTextboxPntr->SetDivider (
    be_plain_font->StringWidth (StringPntr) + g_MarginBetweenControls);
  /* Second row contains the purge age, and a long line explaining it. There
  is space to the right where the top half of the big purge button will go. */

  RowTop += RowHeight /* previous row's RowHeight */;
  BigPurgeButtonTop = RowTop;
  TempRect = Bounds ();
  RowHeight = g_TextBoxHeight;
  RowHeight = ceilf (RowHeight * 1.1);

  StringPntr = "Number of occurrences needed to store a word:";
  m_PurgeAgeCachedValue = 12345678;

  Margin = ceilf ((RowHeight - g_TextBoxHeight) / 2);
  TempRect.top = RowTop + Margin;
  TempRect.bottom = TempRect.top + g_TextBoxHeight;
  TempRect.right = TempRect.left +
    be_plain_font->StringWidth (StringPntr) +
    be_plain_font->StringWidth (EightDigitsString) +
    3 * g_MarginBetweenControls;

  sprintf (TempString, "%d", (int) m_PurgeAgeCachedValue);
  m_PurgeAgeTextboxPntr = new BTextControl (TempRect,
    StringPntr /* label */,
    TempString /* text */,
    new BMessage (MSG_PURGE_AGE),
    B_FOLLOW_LEFT | B_FOLLOW_TOP,
    B_WILL_DRAW | B_NAVIGABLE);
  AddChild (m_PurgeAgeTextboxPntr);
  m_PurgeAgeTextboxPntr->SetTarget (this);
  m_PurgeAgeTextboxPntr->SetDivider (
    be_plain_font->StringWidth (StringPntr) + g_MarginBetweenControls);
  /* Third row contains the purge popularity and bottom half of the purge

  RowTop += RowHeight /* previous row's RowHeight */;
  TempRect = Bounds ();
  RowHeight = g_TextBoxHeight;
  RowHeight = ceilf (RowHeight * 1.1);

  StringPntr = "Number of messages to store words from:";
  m_PurgePopularityCachedValue = 87654321;
  Margin = ceilf ((RowHeight - g_TextBoxHeight) / 2);
  TempRect.top = RowTop + Margin;
  TempRect.bottom = TempRect.top + g_TextBoxHeight;
  TempRect.right = TempRect.left +
    be_plain_font->StringWidth (StringPntr) +
    be_plain_font->StringWidth (EightDigitsString) +
    3 * g_MarginBetweenControls;
  X = TempRect.right + g_MarginBetweenControls;

  sprintf (TempString, "%d", (int) m_PurgePopularityCachedValue);
  m_PurgePopularityTextboxPntr = new BTextControl (TempRect,
    StringPntr /* label */,
    TempString /* text */,
    new BMessage (MSG_PURGE_POPULARITY),
    B_FOLLOW_LEFT | B_FOLLOW_TOP,
    B_WILL_DRAW | B_NAVIGABLE);
  AddChild (m_PurgePopularityTextboxPntr);
  m_PurgePopularityTextboxPntr->SetTarget (this);
  m_PurgePopularityTextboxPntr->SetDivider (
    be_plain_font->StringWidth (StringPntr) + g_MarginBetweenControls);
  /* Make the purge button, which will take up space in the 2nd and 3rd rows,
  on the right side. Twice as tall as a regular button too. */

  StringPntr = "Remove Old Words";
  Margin = ceilf ((((RowTop + RowHeight) - BigPurgeButtonTop) -
    2 * g_TextBoxHeight) / 2);
  TempRect.top = BigPurgeButtonTop + Margin;
  TempRect.bottom = TempRect.top + 2 * g_TextBoxHeight;
  TempRect.right = X + ceilf (2 * be_plain_font->StringWidth (StringPntr));

  CommandMessage.MakeEmpty ();
  CommandMessage.what = B_EXECUTE_PROPERTY;
  CommandMessage.AddSpecifier (g_PropertyNames[PN_PURGE]);
  m_PurgeButtonPntr = new BButton (TempRect, "Purge Button",
    StringPntr, new BMessage (CommandMessage), B_FOLLOW_LEFT | B_FOLLOW_TOP);
  if (m_PurgeButtonPntr == NULL) goto ErrorExit;
  m_PurgeButtonPntr->ResizeToPreferred();
  AddChild (m_PurgeButtonPntr);
  m_PurgeButtonPntr->SetTarget (be_app);
  /* The fourth row contains the ignore previous classification checkbox. */

  RowTop += RowHeight /* previous row's RowHeight */;
  TempRect = Bounds ();
  RowHeight = g_CheckBoxHeight;
  RowHeight = ceilf (RowHeight * 1.1);

  StringPntr = "Allow Retraining on a Message";
  m_IgnorePreviousClassCachedValue = false;

  Margin = ceilf ((RowHeight - g_CheckBoxHeight) / 2);
  TempRect.top = RowTop + Margin;
  TempRect.bottom = TempRect.top + g_CheckBoxHeight;
  m_IgnorePreviousClassCheckboxPntr = new BCheckBox (TempRect,
    new BMessage (MSG_IGNORE_CLASSIFICATION),
    B_FOLLOW_TOP | B_FOLLOW_LEFT);
  if (m_IgnorePreviousClassCheckboxPntr == NULL) goto ErrorExit;
  AddChild (m_IgnorePreviousClassCheckboxPntr);
  m_IgnorePreviousClassCheckboxPntr->SetTarget (this);
  m_IgnorePreviousClassCheckboxPntr->ResizeToPreferred ();
  m_IgnorePreviousClassCheckboxPntr->GetPreferredSize (&Width, &Height);
  X += Width + g_MarginBetweenControls;
  /* The fifth row contains the server mode checkbox. */

  RowTop += RowHeight /* previous row's RowHeight */;
  TempRect = Bounds ();
  RowHeight = g_CheckBoxHeight;
  RowHeight = ceilf (RowHeight * 1.1);

  StringPntr = "Print errors to Terminal";
  m_ServerModeCachedValue = false;

  Margin = ceilf ((RowHeight - g_CheckBoxHeight) / 2);
  TempRect.top = RowTop + Margin;
  TempRect.bottom = TempRect.top + g_CheckBoxHeight;
  m_ServerModeCheckboxPntr = new BCheckBox (TempRect,
    new BMessage (MSG_SERVER_MODE),
    B_FOLLOW_TOP | B_FOLLOW_LEFT);
  if (m_ServerModeCheckboxPntr == NULL) goto ErrorExit;
  AddChild (m_ServerModeCheckboxPntr);
  m_ServerModeCheckboxPntr->SetTarget (this);
  m_ServerModeCheckboxPntr->ResizeToPreferred ();
  m_ServerModeCheckboxPntr->GetPreferredSize (&Width, &Height);
  /* This row just contains a huge pop-up menu which shows the tokenize mode
  and an explanation of what each mode does. */

  RowTop += RowHeight /* previous row's RowHeight */;
  TempRect = Bounds ();
  RowHeight = g_PopUpMenuHeight;
  RowHeight = ceilf (RowHeight * 1.1);

  Margin = ceilf ((RowHeight - g_PopUpMenuHeight) / 2);
  TempRect.top = RowTop + Margin;
  TempRect.bottom = TempRect.top + g_PopUpMenuHeight;

  m_TokenizeModeCachedValue = TM_MAX; /* Illegal value will force redraw. */
  m_TokenizeModeMenuBarPntr = new BMenuBar (TempRect, "TokenizeModeMenuBar",
    B_FOLLOW_LEFT_RIGHT | B_FOLLOW_TOP, B_ITEMS_IN_COLUMN,
    false /* resize to fit items */);
  if (m_TokenizeModeMenuBarPntr == NULL) goto ErrorExit;
  m_TokenizeModePopUpMenuPntr = new BPopUpMenu ("TokenizeModePopUpMenu");
  if (m_TokenizeModePopUpMenuPntr == NULL) goto ErrorExit;
  for (TokenizeMode = (TokenizeModes) 0;
  TokenizeMode < TM_MAX;
  TokenizeMode = (TokenizeModes) ((int) TokenizeMode + 1))
    /* Each different tokenize mode gets its own menu item. Selecting the item
    will send a canned command to the application to switch to the appropriate
    tokenize mode. An optional explanation of each mode is added to the mode

    CommandMessage.MakeEmpty ();
    CommandMessage.what = B_SET_PROPERTY;
    CommandMessage.AddSpecifier (g_PropertyNames[PN_TOKENIZE_MODE]);
    CommandMessage.AddString (g_DataName, g_TokenizeModeNames[TokenizeMode]);
    strcpy (TempString, g_TokenizeModeNames[TokenizeMode]);
    switch (TokenizeMode)
        strcat (TempString, " - Scan everything");
        strcat (TempString, " - Scan e-mail body text except rich text");
      case TM_PLAIN_TEXT_HEADER:
        strcat (TempString, " - Scan entire e-mail text except rich text");
        strcat (TempString, " - Scan e-mail body text and text attachments");
      case TM_ANY_TEXT_HEADER:
        strcat (TempString, " - Scan entire e-mail text and text attachments (recommended)");
        strcat (TempString, " - Scan e-mail body and all attachments");
      case TM_ALL_PARTS_HEADER:
        strcat (TempString, " - Scan all parts of the e-mail");
      case TM_JUST_HEADER:
        strcat (TempString, " - Scan just the header (mail routing information)");
      new BMenuItem (TempString, new BMessage (CommandMessage));
    if (TempMenuItemPntr == NULL) goto ErrorExit;
    TempMenuItemPntr->SetTarget (be_app);
    m_TokenizeModePopUpMenuPntr->AddItem (TempMenuItemPntr);
  m_TokenizeModeMenuBarPntr->AddItem (m_TokenizeModePopUpMenuPntr);
  AddChild (m_TokenizeModeMenuBarPntr);
  /* This row just contains a huge pop-up menu which shows the scoring mode
  and an explanation of what each mode does. */

  RowTop += RowHeight /* previous row's RowHeight */;
  TempRect = Bounds ();
  RowHeight = g_PopUpMenuHeight;
  RowHeight = ceilf (RowHeight * 1.1);

  Margin = ceilf ((RowHeight - g_PopUpMenuHeight) / 2);
  TempRect.top = RowTop + Margin;
  TempRect.bottom = TempRect.top + g_PopUpMenuHeight;

  m_ScoringModeCachedValue = SM_MAX; /* Illegal value will force redraw. */
  m_ScoringModeMenuBarPntr = new BMenuBar (TempRect, "ScoringModeMenuBar",
    B_FOLLOW_LEFT_RIGHT | B_FOLLOW_TOP, B_ITEMS_IN_COLUMN,
    false /* resize to fit items */);
  if (m_ScoringModeMenuBarPntr == NULL) goto ErrorExit;
  m_ScoringModePopUpMenuPntr = new BPopUpMenu ("ScoringModePopUpMenu");
  if (m_ScoringModePopUpMenuPntr == NULL) goto ErrorExit;
  for (ScoringMode = (ScoringModes) 0;
  ScoringMode < SM_MAX;
  ScoringMode = (ScoringModes) ((int) ScoringMode + 1))
    /* Each different scoring mode gets its own menu item. Selecting the item
    will send a canned command to the application to switch to the appropriate
    scoring mode. An optional explanation of each mode is added to the mode

    CommandMessage.MakeEmpty ();
    CommandMessage.what = B_SET_PROPERTY;
    CommandMessage.AddSpecifier (g_PropertyNames[PN_SCORING_MODE]);
    CommandMessage.AddString (g_DataName, g_ScoringModeNames[ScoringMode]);
    strcpy (TempString, g_ScoringModeNames[ScoringMode]);
    switch (ScoringMode)
        strcat (TempString, " - Learning Method 1: Naive Bayesian");
        strcat (TempString, " - Learning Method 2: Chi-Squared");
    switch (ScoringMode)
        strcpy (TempString, "Learning method 1: Naive Bayesian");
        strcpy (TempString, "Learning method 2: Chi-Squared");
      new BMenuItem (TempString, new BMessage (CommandMessage));
    if (TempMenuItemPntr == NULL) goto ErrorExit;
    TempMenuItemPntr->SetTarget (be_app);
    m_ScoringModePopUpMenuPntr->AddItem (TempMenuItemPntr);
  m_ScoringModeMenuBarPntr->AddItem (m_ScoringModePopUpMenuPntr);
  AddChild (m_ScoringModeMenuBarPntr);
  /* The next row has the install MIME types button and the reset to defaults
  button, one on the left and the other on the right. */

  RowTop += RowHeight /* previous row's RowHeight */;
  TempRect = Bounds ();
  RowHeight = g_ButtonHeight;
  RowHeight = ceilf (RowHeight * 1.1);

  Margin = ceilf ((RowHeight - g_ButtonHeight) / 2);
  TempRect.top = RowTop + Margin;
  TempRect.bottom = TempRect.top + g_ButtonHeight;

  CommandMessage.MakeEmpty ();
  CommandMessage.what = B_EXECUTE_PROPERTY;
  CommandMessage.AddSpecifier (g_PropertyNames[PN_INSTALL_THINGS]);
  m_InstallThingsButtonPntr = new BButton (TempRect, "Install Button",
    "Install spam types",
    new BMessage (CommandMessage),
    B_FOLLOW_LEFT | B_FOLLOW_TOP);
  if (m_InstallThingsButtonPntr == NULL) goto ErrorExit;
  AddChild (m_InstallThingsButtonPntr);
  m_InstallThingsButtonPntr->SetTarget (be_app);
  m_InstallThingsButtonPntr->ResizeToPreferred ();
  /* The Reset to Defaults button. On the right side of the row. */

  Margin = ceilf ((RowHeight - g_ButtonHeight) / 2);
  TempRect = Bounds ();
  TempRect.top = RowTop + Margin;
  TempRect.bottom = TempRect.top + g_ButtonHeight;

  CommandMessage.MakeEmpty ();
  CommandMessage.what = B_EXECUTE_PROPERTY;
  CommandMessage.AddSpecifier (g_PropertyNames[PN_RESET_TO_DEFAULTS]);
  m_ResetToDefaultsButtonPntr = new BButton (TempRect, "Reset Button",
    "Default settings", new BMessage (CommandMessage),
    B_FOLLOW_RIGHT | B_FOLLOW_TOP);
  if (m_ResetToDefaultsButtonPntr == NULL) goto ErrorExit;
  AddChild (m_ResetToDefaultsButtonPntr);
  m_ResetToDefaultsButtonPntr->SetTarget (be_app);
  m_ResetToDefaultsButtonPntr->ResizeToPreferred ();
  m_ResetToDefaultsButtonPntr->GetPreferredSize (&Width, &Height);
  m_ResetToDefaultsButtonPntr->MoveTo (TempRect.right - Width, TempRect.top);
  /* The next row contains the Estimate, Add Examples and About buttons. */

  RowTop += RowHeight /* previous row's RowHeight */;
  TempRect = Bounds ();
  RowHeight = g_ButtonHeight;
  RowHeight = ceilf (RowHeight * 1.1);

  Margin = ceilf ((RowHeight - g_ButtonHeight) / 2);
  TempRect.top = RowTop + Margin;
  TempRect.bottom = TempRect.top + g_ButtonHeight;

  m_EstimateSpamButtonPntr = new BButton (TempRect, "Estimate Button",
    new BMessage (MSG_ESTIMATE_BUTTON),
    B_FOLLOW_LEFT | B_FOLLOW_TOP);
  if (m_EstimateSpamButtonPntr == NULL) goto ErrorExit;
  AddChild (m_EstimateSpamButtonPntr);
  m_EstimateSpamButtonPntr->SetTarget (this);
  m_EstimateSpamButtonPntr->ResizeToPreferred ();
  X = m_EstimateSpamButtonPntr->Frame().right + g_MarginBetweenControls;
  /* The Add Example button in the middle. Does the same as the browse button,
  but don't tell anyone that! */

  Margin = ceilf ((RowHeight - g_ButtonHeight) / 2);
  TempRect.top = RowTop + Margin;
  TempRect.bottom = TempRect.top + g_ButtonHeight;

  m_AddExampleButtonPntr = new BButton (TempRect, "Example Button",
    "Train spam filter on a message",
    new BMessage (MSG_BROWSE_BUTTON),
    B_FOLLOW_LEFT_RIGHT | B_FOLLOW_TOP,
    B_WILL_DRAW | B_NAVIGABLE | B_FULL_UPDATE_ON_RESIZE);
  if (m_AddExampleButtonPntr == NULL) goto ErrorExit;
  AddChild (m_AddExampleButtonPntr);
  m_AddExampleButtonPntr->SetTarget (this);
  m_AddExampleButtonPntr->ResizeToPreferred ();
  X = m_AddExampleButtonPntr->Frame().right + g_MarginBetweenControls;
  /* Add the About button on the right. */

  Margin = ceilf ((RowHeight - g_ButtonHeight) / 2);
  TempRect = Bounds ();
  TempRect.top = RowTop + Margin;
  TempRect.bottom = TempRect.top + g_ButtonHeight;

  m_AboutButtonPntr = new BButton (TempRect, "About Button",
    new BMessage (B_ABOUT_REQUESTED),
    B_FOLLOW_RIGHT | B_FOLLOW_TOP);
  if (m_AboutButtonPntr == NULL) goto ErrorExit;
  AddChild (m_AboutButtonPntr);
  m_AboutButtonPntr->SetTarget (be_app);
  /* This row displays various counters. Starting with the genuine messages
  count on the left. */

  RowTop += RowHeight /* previous row's RowHeight */;
  TempRect = Bounds ();
  RowHeight = g_TextBoxHeight;
  RowHeight = ceilf (RowHeight * 1.1);

  StringPntr = "Genuine messages:";
  m_GenuineCountCachedValue = 87654321;
  sprintf (TempString, "%d", (int) m_GenuineCountCachedValue);

  Margin = ceilf ((RowHeight - g_TextBoxHeight) / 2);
  TempRect = Bounds ();
  TempRect.top = RowTop + Margin;
  TempRect.bottom = TempRect.top + g_TextBoxHeight;
  TempRect.right = TempRect.left +
    be_plain_font->StringWidth (StringPntr) +
    be_plain_font->StringWidth (TempString) +
    3 * g_MarginBetweenControls;

  m_GenuineCountTextboxPntr = new BTextControl (TempRect,
    StringPntr /* label */,
    TempString /* text */,
    NULL /* no message */,
    B_FOLLOW_LEFT | B_FOLLOW_TOP,
    B_WILL_DRAW /* not B_NAVIGABLE */);
  AddChild (m_GenuineCountTextboxPntr);
  m_GenuineCountTextboxPntr->SetTarget (this); /* Not that it matters. */
  m_GenuineCountTextboxPntr->SetDivider (
    be_plain_font->StringWidth (StringPntr) + g_MarginBetweenControls);
  m_GenuineCountTextboxPntr->SetEnabled (false); /* For display only. */
6634 /* The word count in the center. */
6636 StringPntr
= "Word count:";
6637 m_WordCountCachedValue
= 87654321;
6638 sprintf (TempString
, "%d", (int) m_WordCountCachedValue
);
6640 Margin
= ceilf ((RowHeight
- g_TextBoxHeight
) / 2);
6641 TempRect
= Bounds ();
6642 TempRect
.top
= RowTop
+ Margin
;
6643 TempRect
.bottom
= TempRect
.top
+ g_TextBoxHeight
;
6644 Width
= be_plain_font
->StringWidth (StringPntr
) +
6645 be_plain_font
->StringWidth (TempString
) +
6646 3 * g_MarginBetweenControls
;
6647 TempRect
.left
= ceilf ((TempRect
.right
- TempRect
.left
) / 2 - Width
/ 2);
6648 TempRect
.right
= TempRect
.left
+ Width
;
6650 m_WordCountTextboxPntr
= new BTextControl (TempRect
,
6652 StringPntr
/* label */,
6653 TempString
/* text */,
6654 NULL
/* no message */,
6655 B_FOLLOW_H_CENTER
| B_FOLLOW_TOP
,
6656 B_WILL_DRAW
/* not B_NAVIGABLE */);
6657 AddChild (m_WordCountTextboxPntr
);
6658 m_WordCountTextboxPntr
->SetTarget (this); /* Not that it matters. */
6659 m_WordCountTextboxPntr
->SetDivider (
6660 be_plain_font
->StringWidth (StringPntr
) + g_MarginBetweenControls
);
6661 m_WordCountTextboxPntr
->SetEnabled (false); /* For display only. */
6663 /* The spam count on the far right. */
6665 StringPntr
= "Spam messages:";
6666 m_SpamCountCachedValue
= 87654321;
6667 sprintf (TempString
, "%d", (int) m_SpamCountCachedValue
);
6669 Margin
= ceilf ((RowHeight
- g_TextBoxHeight
) / 2);
6670 TempRect
= Bounds ();
6671 TempRect
.top
= RowTop
+ Margin
;
6672 TempRect
.bottom
= TempRect
.top
+ g_TextBoxHeight
;
6673 TempRect
.left
= TempRect
.right
-
6674 be_plain_font
->StringWidth (StringPntr
) -
6675 be_plain_font
->StringWidth (TempString
) -
6676 3 * g_MarginBetweenControls
;
6678 m_SpamCountTextboxPntr
= new BTextControl (TempRect
,
6680 StringPntr
/* label */,
6681 TempString
/* text */,
6682 NULL
/* no message */,
6683 B_FOLLOW_RIGHT
| B_FOLLOW_TOP
,
6684 B_WILL_DRAW
/* not B_NAVIGABLE */);
6685 AddChild (m_SpamCountTextboxPntr
);
6686 m_SpamCountTextboxPntr
->SetTarget (this); /* Not that it matters. */
6687 m_SpamCountTextboxPntr
->SetDivider (
6688 be_plain_font
->StringWidth (StringPntr
) + g_MarginBetweenControls
);
6689 m_SpamCountTextboxPntr
->SetEnabled (false); /* For display only. */
6691 /* Change the size of our view so it only takes up the space needed by the
6694 RowTop
+= RowHeight
/* previous row's RowHeight */;
6695 ResizeTo (Bounds().Width(), RowTop
- Bounds().top
+ 1);
6697 return; /* Successful. */
6700 DisplayErrorMessage ("Unable to initialise the controls view.");
6705 ControlsView::BrowseForDatabaseFile ()
6707 if (m_BrowseFilePanelPntr
== NULL
)
6709 BEntry DirectoryEntry
;
6710 entry_ref DirectoryEntryRef
;
6711 BMessage GetDatabasePathCommand
;
6712 BMessage GetDatabasePathResult
;
6713 const char *StringPntr
= NULL
;
6715 /* Create a new file panel. First set up the entry ref stuff so that the
6716 file panel can open to show the initial directory (the one where the
6717 database file currently is). Note that we have to create it after the
6718 window and view are up and running, otherwise the BMessenger won't point to
6719 a valid looper/handler. First find out the current database file name to
6720 use as a starting point. */
6722 GetDatabasePathCommand
.what
= B_GET_PROPERTY
;
6723 GetDatabasePathCommand
.AddSpecifier (g_PropertyNames
[PN_DATABASE_FILE
]);
6724 be_app_messenger
.SendMessage (&GetDatabasePathCommand
,
6725 &GetDatabasePathResult
, 5000000 /* delivery timeout */,
6726 5000000 /* reply timeout */);
6727 if (GetDatabasePathResult
.FindString (g_ResultName
, &StringPntr
) != B_OK
||
6728 DirectoryEntry
.SetTo (StringPntr
) != B_OK
||
6729 DirectoryEntry
.GetParent (&DirectoryEntry
) != B_OK
)
6730 DirectoryEntry
.SetTo ("."); /* Default directory if we can't find it. */
6731 if (DirectoryEntry
.GetRef (&DirectoryEntryRef
) != B_OK
)
6733 DisplayErrorMessage (
6734 "Unable to set up the file requestor starting directory. Sorry.");
6738 m_BrowseFilePanelPntr
= new BFilePanel (
6739 B_OPEN_PANEL
/* mode */,
6740 &be_app_messenger
/* target for event messages */,
6741 &DirectoryEntryRef
/* starting directory */,
6743 true /* true for multiple selections */,
6744 NULL
/* canned message */,
6745 NULL
/* ref filter */,
6746 false /* true for modal */,
6747 true /* true to hide when done */);
6750 if (m_BrowseFilePanelPntr
!= NULL
)
6751 m_BrowseFilePanelPntr
->Show (); /* Answer returned later in RefsReceived. */
6756 ControlsView::BrowseForFileToEstimate ()
6758 if (m_EstimateSpamFilePanelPntr
== NULL
)
6760 BEntry DirectoryEntry
;
6761 entry_ref DirectoryEntryRef
;
6763 BMessenger
MessengerToSelf (this);
6764 BPath PathToMailDirectory
;
6766 /* Create a new file panel. First set up the entry ref stuff so that the
6767 file panel can open to show the initial directory (the user's mail
6768 directory). Note that we have to create the panel after the window and
6769 view are up and running, otherwise the BMessenger won't point to a valid
6772 ErrorCode
= find_directory (B_USER_DIRECTORY
, &PathToMailDirectory
);
6773 if (ErrorCode
== B_OK
)
6775 PathToMailDirectory
.Append ("mail");
6776 ErrorCode
= DirectoryEntry
.SetTo (PathToMailDirectory
.Path(),
6777 true /* traverse symbolic links*/);
6778 if (ErrorCode
!= B_OK
|| !DirectoryEntry
.Exists ())
6780 /* If no mail directory, try home directory. */
6781 find_directory (B_USER_DIRECTORY
, &PathToMailDirectory
);
6782 ErrorCode
= DirectoryEntry
.SetTo (PathToMailDirectory
.Path(), true);
6785 if (ErrorCode
!= B_OK
)
6786 PathToMailDirectory
.SetTo (".");
6788 DirectoryEntry
.SetTo (PathToMailDirectory
.Path(), true);
6789 if (DirectoryEntry
.GetRef (&DirectoryEntryRef
) != B_OK
)
6791 DisplayErrorMessage (
6792 "Unable to set up the file requestor starting directory. Sorry.");
6796 m_EstimateSpamFilePanelPntr
= new BFilePanel (
6797 B_OPEN_PANEL
/* mode */,
6798 &MessengerToSelf
/* target for event messages */,
6799 &DirectoryEntryRef
/* starting directory */,
6801 true /* true for multiple selections */,
6802 new BMessage (MSG_ESTIMATE_FILE_REFS
) /* canned message */,
6803 NULL
/* ref filter */,
6804 false /* true for modal */,
6805 true /* true to hide when done */);
6808 if (m_EstimateSpamFilePanelPntr
!= NULL
)
6809 m_EstimateSpamFilePanelPntr
->Show (); /* Answer sent via a message. */
6813 /* The display has been resized. Have to manually adjust the popup menu bar to
6814 show the new size (the sub-items need to be resized too). Then make it redraw.
6815 Well, actually just resetting the mark on the current item will resize it
6819 ControlsView::FrameResized (float, float)
6821 m_ScoringModeCachedValue
= SM_MAX
; /* Force it to reset the mark. */
6822 m_TokenizeModeCachedValue
= TM_MAX
; /* Force it to reset the mark. */
6827 ControlsView::MessageReceived (BMessage
*MessagePntr
)
6829 BMessage CommandMessage
;
6833 switch (MessagePntr
->what
)
6835 case MSG_BROWSE_BUTTON
:
6836 BrowseForDatabaseFile ();
6839 case MSG_DATABASE_NAME
:
6840 if (strcmp (m_DatabaseFileNameCachedValue
,
6841 m_DatabaseFileNameTextboxPntr
->Text ()) != 0)
6842 SubmitCommandString (PN_DATABASE_FILE
, B_SET_PROPERTY
,
6843 m_DatabaseFileNameTextboxPntr
->Text ());
6846 case MSG_ESTIMATE_BUTTON
:
6847 BrowseForFileToEstimate ();
6850 case MSG_ESTIMATE_FILE_REFS
:
6851 EstimateRefFilesAndDisplay (MessagePntr
);
6854 case MSG_IGNORE_CLASSIFICATION
:
6855 TempBool
= (m_IgnorePreviousClassCheckboxPntr
->Value() == B_CONTROL_ON
);
6856 if (m_IgnorePreviousClassCachedValue
!= TempBool
)
6857 SubmitCommandBool (PN_IGNORE_PREVIOUS_CLASSIFICATION
,
6858 B_SET_PROPERTY
, TempBool
);
6862 TempUint32
= strtoul (m_PurgeAgeTextboxPntr
->Text (), NULL
, 10);
6863 if (m_PurgeAgeCachedValue
!= TempUint32
)
6864 SubmitCommandInt32 (PN_PURGE_AGE
, B_SET_PROPERTY
, TempUint32
);
6867 case MSG_PURGE_POPULARITY
:
6868 TempUint32
= strtoul (m_PurgePopularityTextboxPntr
->Text (), NULL
, 10);
6869 if (m_PurgePopularityCachedValue
!= TempUint32
)
6870 SubmitCommandInt32 (PN_PURGE_POPULARITY
, B_SET_PROPERTY
, TempUint32
);
6873 case MSG_SERVER_MODE
:
6874 TempBool
= (m_ServerModeCheckboxPntr
->Value() == B_CONTROL_ON
);
6875 if (m_ServerModeCachedValue
!= TempBool
)
6876 SubmitCommandBool (PN_SERVER_MODE
, B_SET_PROPERTY
, TempBool
);
6880 BView::MessageReceived (MessagePntr
);
6885 /* Check the server for changes in the state of the database, and if there are
6886 any changes, update the displayed values. Since this is a read only
6887 examination of the server, we go directly to the application rather than
6888 sending it messages. Also, when sending messages, we can't find out what it is
6889 doing while it is busy with a batch of spam additions (all the spam add
6890 commands will be in the queue ahead of our requests for info). Instead, we
6891 lock the BApplication (so it isn't changing things while we're looking) and
6892 retrieve our values. */
6895 ControlsView::PollServerForChanges ()
6898 BMenuItem
*TempMenuItemPntr
;
6899 char TempString
[PATH_MAX
];
6900 BWindow
*WindowPntr
;
6902 /* We need a pointer to our window, for changing the title etc. */
6904 WindowPntr
= Window ();
6905 if (WindowPntr
== NULL
)
6906 return; /* No window, no point in updating the display! */
  /* Check the server mode flag. If the mode is on, then the window has to be
  minimized. Similarly, if it gets turned off, restore the window. Note that
  the user can restore the window manually, even while still in server mode.
  */

  if (g_ServerMode != m_ServerModeCachedValue &&
  m_ServerModeCheckboxPntr != NULL)
  {
    m_ServerModeCachedValue = g_ServerMode;
    m_ServerModeCheckboxPntr->SetValue (
      m_ServerModeCachedValue ? B_CONTROL_ON : B_CONTROL_OFF);
    WindowPntr->Minimize (m_ServerModeCachedValue);
  }
6922 if (WindowPntr
->IsMinimized ())
6923 return; /* Window isn't visible, don't waste time updating it. */
6925 /* So that people don't stare at a blank screen, request a database load if
6926 nothing is there. But only do it once, so the user doesn't get a lot of
6927 invalid database messages if one doesn't exist yet. In server mode, we never
6928 get this far so it is only loaded when the user wants to see something. */
6930 if (!m_DatabaseLoadDone
)
6932 m_DatabaseLoadDone
= true;
6933 /* Counting the number of words will load the database. */
6934 SubmitCommandString (PN_DATABASE_FILE
, B_COUNT_PROPERTIES
, "");
6937 /* Check various read only values, which can be read from the BApplication
6938 without having to lock it. This is useful for displaying the number of words
6939 as it is changing. First up is the purge age setting. */
6941 MyAppPntr
= dynamic_cast<ABSApp
*> (be_app
);
6942 if (MyAppPntr
== NULL
)
6943 return; /* Doesn't exist or is the wrong class. Not likely! */
6945 if (MyAppPntr
->m_PurgeAge
!= m_PurgeAgeCachedValue
&&
6946 m_PurgeAgeTextboxPntr
!= NULL
)
6948 m_PurgeAgeCachedValue
= MyAppPntr
->m_PurgeAge
;
6949 sprintf (TempString
, "%lu", m_PurgeAgeCachedValue
);
6950 m_PurgeAgeTextboxPntr
->SetText (TempString
);
6953 /* Check the purge popularity. */
6955 if (MyAppPntr
->m_PurgePopularity
!= m_PurgePopularityCachedValue
&&
6956 m_PurgePopularityTextboxPntr
!= NULL
)
6958 m_PurgePopularityCachedValue
= MyAppPntr
->m_PurgePopularity
;
6959 sprintf (TempString
, "%lu", m_PurgePopularityCachedValue
);
6960 m_PurgePopularityTextboxPntr
->SetText (TempString
);
6963 /* Check the Ignore Previous Classification flag. */
6965 if (MyAppPntr
->m_IgnorePreviousClassification
!=
6966 m_IgnorePreviousClassCachedValue
&&
6967 m_IgnorePreviousClassCheckboxPntr
!= NULL
)
6969 m_IgnorePreviousClassCachedValue
=
6970 MyAppPntr
->m_IgnorePreviousClassification
;
6971 m_IgnorePreviousClassCheckboxPntr
->SetValue (
6972 m_IgnorePreviousClassCachedValue
? B_CONTROL_ON
: B_CONTROL_OFF
);
  /* Update the genuine count. */

  if (MyAppPntr->m_TotalGenuineMessages != m_GenuineCountCachedValue &&
  m_GenuineCountTextboxPntr != NULL)
  {
    m_GenuineCountCachedValue = MyAppPntr->m_TotalGenuineMessages;
    sprintf (TempString, "%lu", m_GenuineCountCachedValue);
    m_GenuineCountTextboxPntr->SetText (TempString);
  }

  /* Update the spam count. */

  if (MyAppPntr->m_TotalSpamMessages != m_SpamCountCachedValue &&
  m_SpamCountTextboxPntr != NULL)
  {
    m_SpamCountCachedValue = MyAppPntr->m_TotalSpamMessages;
    sprintf (TempString, "%lu", m_SpamCountCachedValue);
    m_SpamCountTextboxPntr->SetText (TempString);
  }

  /* Update the word count. */

  if (MyAppPntr->m_WordCount != m_WordCountCachedValue &&
  m_WordCountTextboxPntr != NULL)
  {
    m_WordCountCachedValue = MyAppPntr->m_WordCount;
    sprintf (TempString, "%lu", m_WordCountCachedValue);
    m_WordCountTextboxPntr->SetText (TempString);
  }
7005 /* Update the tokenize mode pop-up menu. */
7007 if (MyAppPntr
->m_TokenizeMode
!= m_TokenizeModeCachedValue
&&
7008 m_TokenizeModePopUpMenuPntr
!= NULL
)
7010 m_TokenizeModeCachedValue
= MyAppPntr
->m_TokenizeMode
;
7012 m_TokenizeModePopUpMenuPntr
->ItemAt ((int) m_TokenizeModeCachedValue
);
7013 if (TempMenuItemPntr
!= NULL
)
7014 TempMenuItemPntr
->SetMarked (true);
7017 /* Update the scoring mode pop-up menu. */
7019 if (MyAppPntr
->m_ScoringMode
!= m_ScoringModeCachedValue
&&
7020 m_ScoringModePopUpMenuPntr
!= NULL
)
7022 m_ScoringModeCachedValue
= MyAppPntr
->m_ScoringMode
;
7024 m_ScoringModePopUpMenuPntr
->ItemAt ((int) m_ScoringModeCachedValue
);
7025 if (TempMenuItemPntr
!= NULL
)
7026 TempMenuItemPntr
->SetMarked (true);
  /* Lock the application. This will stop it from processing any further
  messages until we are done. Or if it is busy, the lock will fail. */

  if (MyAppPntr->LockWithTimeout (100000) != B_OK)
    return; /* It's probably busy doing something. */

  /* See if the database file name has changed. */

  if (strcmp (MyAppPntr->m_DatabaseFileName.String (),
  m_DatabaseFileNameCachedValue) != 0 &&
  m_DatabaseFileNameTextboxPntr != NULL)
  {
    strcpy (m_DatabaseFileNameCachedValue,
      MyAppPntr->m_DatabaseFileName.String ());
    m_DatabaseFileNameTextboxPntr->SetText (m_DatabaseFileNameCachedValue);
    WindowPntr->SetTitle (m_DatabaseFileNameCachedValue);
  }

  /* Done. Let the BApplication continue processing messages. */

  MyAppPntr->Unlock ();
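/* Illustrative sketch only, not part of the program. PollServerForChanges
compares each server value with a cached copy and touches the control only
when they differ, so an unchanged display costs nothing. The helper below
shows that pattern in isolation. UpdateIfChanged and its Display callback are
hypothetical names made up for this example; the real code spells out each
comparison by hand and calls SetText or SetValue directly. */

```cpp
#include <cassert>

/* Hypothetical helper: copy NewValue into CachedValue and call Display only
when the two differ. Returns true if the display was updated. */
template <typename T, typename DisplayFunction>
bool UpdateIfChanged (T &CachedValue, const T &NewValue,
  DisplayFunction Display)
{
  if (CachedValue == NewValue)
    return false; /* Nothing changed, leave the control alone. */
  CachedValue = NewValue;
  Display (CachedValue);
  return true;
}
```

/* The controls view seeds its counter caches with 87654321, presumably so
that the first poll sees a mismatch and fills in the real numbers. */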
void
ControlsView::Pulse ()
{
  if (system_time () > m_TimeOfLastPoll + 200000)
  {
    PollServerForChanges ();
    m_TimeOfLastPoll = system_time ();
  }
}
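/* Illustrative sketch only. Pulse rate-limits the polling: the work is done
only when more than 200000 microseconds have passed since the last poll. The
struct below restates that check with the clock passed in as a parameter, so
it can be tried without the BeOS system_time() call. PollThrottle and
ShouldPoll are hypothetical names. Unlike the real code, which records the
time after the poll has finished, this sketch records the time of the check
itself. */

```cpp
#include <cassert>
#include <cstdint>

struct PollThrottle
{
  int64_t m_TimeOfLastPoll;  /* Microseconds, like bigtime_t. */
  int64_t m_MinimumInterval; /* Microseconds between polls. */

  explicit PollThrottle (int64_t MinimumInterval = 200000)
    : m_TimeOfLastPoll (0), m_MinimumInterval (MinimumInterval)
  {
  }

  /* Returns true, and remembers the time, if enough time has gone by. */
  bool ShouldPoll (int64_t CurrentTime)
  {
    if (CurrentTime > m_TimeOfLastPoll + m_MinimumInterval)
    {
      m_TimeOfLastPoll = CurrentTime;
      return true;
    }
    return false;
  }
};
```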
/******************************************************************************
 * Implementation of the DatabaseWindow class, constructor, destructor and the
 * rest of the member functions in mostly alphabetical order.
 */
7070 DatabaseWindow::DatabaseWindow ()
7071 : BWindow (BRect (30, 30, 620, 400),
7072 "Haiku spam filter server",
7073 B_DOCUMENT_WINDOW
, B_ASYNCHRONOUS_CONTROLS
)
7077 /* Add the controls view. */
7079 m_ControlsViewPntr
= new ControlsView (Bounds ());
7080 if (m_ControlsViewPntr
== NULL
)
7082 AddChild (m_ControlsViewPntr
);
7084 /* Add the word view in the remaining space under the controls view. */
7087 TempRect
= Bounds ();
7088 TempRect
.top
= m_ControlsViewPntr
->Frame().bottom
+ 1;
7089 m_WordsViewPntr
= new WordsView (TempRect
);
7090 if (m_WordsViewPntr
== NULL
)
7092 AddChild (m_WordsViewPntr
);
7094 /* Minimize the window if we are starting up in server mode. This is done
7095 before the window is open so it doesn't flash onto the screen, and possibly
7096 steal a keystroke or two. The ControlsView will further update the minimize
7097 mode when it detects changes in the server mode. */
7098 Minimize (g_ServerMode
);
7103 DisplayErrorMessage ("Unable to initialise the window contents.");
7108 DatabaseWindow::MessageReceived (BMessage
*MessagePntr
)
7110 if (MessagePntr
->what
== B_MOUSE_WHEEL_CHANGED
)
7112 /* Pass the mouse wheel stuff down to the words view, since that's the only
7113 one which does scrolling so we don't need to worry about whether it has
7116 if (m_WordsViewPntr
!= NULL
)
7117 m_WordsViewPntr
->MessageReceived (MessagePntr
);
7120 BWindow::MessageReceived (MessagePntr
);
7125 DatabaseWindow::QuitRequested ()
7127 be_app
->PostMessage (B_QUIT_REQUESTED
);
/******************************************************************************
 * Implementation of the word display view.
 */
7137 WordsView::WordsView (BRect NewBounds
)
7138 : BView (NewBounds
, "WordsView", B_FOLLOW_ALL_SIDES
,
7139 B_WILL_DRAW
| B_FULL_UPDATE_ON_RESIZE
| B_NAVIGABLE
| B_PULSE_NEEDED
),
7140 m_ArrowLineDownPntr (NULL
),
7141 m_ArrowLineUpPntr (NULL
),
7142 m_ArrowPageDownPntr (NULL
),
7143 m_ArrowPageUpPntr (NULL
),
7144 m_LastTimeAKeyWasPressed (0)
7146 font_height TempFontHeight
;
7148 GetFont (&m_TextFont
); /* Modify the default font to be our own. */
7149 m_TextFont
.SetSize (ceilf (m_TextFont
.Size() * 1.1));
7150 m_TextFont
.GetHeight (&TempFontHeight
);
7151 SetFont (&m_TextFont
);
7153 m_LineHeight
= ceilf (TempFontHeight
.ascent
+
7154 TempFontHeight
.descent
+ TempFontHeight
.leading
);
7155 m_AscentHeight
= ceilf (TempFontHeight
.ascent
);
7156 m_TextHeight
= ceilf (TempFontHeight
.ascent
+
7157 TempFontHeight
.descent
);
7159 m_FocusedColour
.red
= 255;
7160 m_FocusedColour
.green
= 255;
7161 m_FocusedColour
.blue
= 255;
7162 m_FocusedColour
.alpha
= 255;
7164 m_UnfocusedColour
.red
= 245;
7165 m_UnfocusedColour
.green
= 245;
7166 m_UnfocusedColour
.blue
= 255;
7167 m_UnfocusedColour
.alpha
= 255;
7169 m_BackgroundColour
= m_UnfocusedColour
;
7170 SetViewColor (m_BackgroundColour
);
7171 SetLowColor (m_BackgroundColour
);
7172 SetHighColor (0, 0, 0);
7174 strcpy (m_FirstDisplayedWord
, "a");
7179 WordsView::AttachedToWindow ()
7181 BPolygon
DownLinePolygon (g_DownLinePoints
,
7182 sizeof (g_DownLinePoints
) /
7183 sizeof (g_DownLinePoints
[0]));
7185 BPolygon
DownPagePolygon (g_DownPagePoints
,
7186 sizeof (g_DownPagePoints
) /
7187 sizeof (g_DownPagePoints
[0]));
7189 BPolygon
UpLinePolygon (g_UpLinePoints
,
7190 sizeof (g_UpLinePoints
) /
7191 sizeof (g_UpLinePoints
[0]));
7193 BPolygon
UpPagePolygon (g_UpPagePoints
,
7194 sizeof (g_UpPagePoints
) /
7195 sizeof (g_UpPagePoints
[0]));
7197 BPicture TempOffPicture
;
7198 BPicture TempOnPicture
;
7201 /* Make the buttons and associated polygon images for the forward and
7202 backwards a word or a page of words buttons. They're the width of the scroll
7203 bar area on the right, but twice as tall as usual, since there is no scroll
7204 bar and that will make it easier to use them. First the up a line button. */
7206 SetHighColor (0, 0, 0);
7207 BeginPicture (&TempOffPicture
);
7208 FillPolygon (&UpLinePolygon
);
7209 SetHighColor (180, 180, 180);
7210 StrokePolygon (&UpLinePolygon
);
7213 SetHighColor (128, 128, 128);
7214 BeginPicture (&TempOnPicture
);
7215 FillPolygon (&UpLinePolygon
);
7218 TempRect
= Bounds ();
7219 TempRect
.bottom
= TempRect
.top
+ 2 * B_H_SCROLL_BAR_HEIGHT
;
7220 TempRect
.left
= TempRect
.right
- B_V_SCROLL_BAR_WIDTH
;
7221 m_ArrowLineUpPntr
= new BPictureButton (TempRect
, "Up Line",
7222 &TempOffPicture
, &TempOnPicture
,
7223 new BMessage (MSG_LINE_UP
), B_ONE_STATE_BUTTON
,
7224 B_FOLLOW_RIGHT
| B_FOLLOW_TOP
, B_WILL_DRAW
| B_NAVIGABLE
);
7225 if (m_ArrowLineUpPntr
== NULL
) goto ErrorExit
;
7226 AddChild (m_ArrowLineUpPntr
);
7227 m_ArrowLineUpPntr
->SetTarget (this);
7229 /* Up a page button. */
7231 SetHighColor (0, 0, 0);
7232 BeginPicture (&TempOffPicture
);
7233 FillPolygon (&UpPagePolygon
);
7234 SetHighColor (180, 180, 180);
7235 StrokePolygon (&UpPagePolygon
);
7238 SetHighColor (128, 128, 128);
7239 BeginPicture (&TempOnPicture
);
7240 FillPolygon (&UpPagePolygon
);
7243 TempRect
= Bounds ();
7244 TempRect
.top
+= 2 * B_H_SCROLL_BAR_HEIGHT
+ 1;
7245 TempRect
.bottom
= TempRect
.top
+ 2 * B_H_SCROLL_BAR_HEIGHT
;
7246 TempRect
.left
= TempRect
.right
- B_V_SCROLL_BAR_WIDTH
;
7247 m_ArrowPageUpPntr
= new BPictureButton (TempRect
, "Up Page",
7248 &TempOffPicture
, &TempOnPicture
,
7249 new BMessage (MSG_PAGE_UP
), B_ONE_STATE_BUTTON
,
7250 B_FOLLOW_RIGHT
| B_FOLLOW_TOP
, B_WILL_DRAW
| B_NAVIGABLE
);
7251 if (m_ArrowPageUpPntr
== NULL
) goto ErrorExit
;
7252 AddChild (m_ArrowPageUpPntr
);
7253 m_ArrowPageUpPntr
->SetTarget (this);
7255 /* Down a page button. */
7257 SetHighColor (0, 0, 0);
7258 BeginPicture (&TempOffPicture
);
7259 FillPolygon (&DownPagePolygon
);
7260 SetHighColor (180, 180, 180);
7261 StrokePolygon (&DownPagePolygon
);
7264 SetHighColor (128, 128, 128);
7265 BeginPicture (&TempOnPicture
);
7266 FillPolygon (&DownPagePolygon
);
7269 TempRect
= Bounds ();
7270 TempRect
.bottom
-= 3 * B_H_SCROLL_BAR_HEIGHT
+ 1;
7271 TempRect
.top
= TempRect
.bottom
- 2 * B_H_SCROLL_BAR_HEIGHT
;
7272 TempRect
.left
= TempRect
.right
- B_V_SCROLL_BAR_WIDTH
;
7273 m_ArrowPageDownPntr
= new BPictureButton (TempRect
, "Down Page",
7274 &TempOffPicture
, &TempOnPicture
,
7275 new BMessage (MSG_PAGE_DOWN
), B_ONE_STATE_BUTTON
,
7276 B_FOLLOW_RIGHT
| B_FOLLOW_BOTTOM
, B_WILL_DRAW
| B_NAVIGABLE
);
7277 if (m_ArrowPageDownPntr
== NULL
) goto ErrorExit
;
7278 AddChild (m_ArrowPageDownPntr
);
7279 m_ArrowPageDownPntr
->SetTarget (this);
7281 /* Down a line button. */
7283 SetHighColor (0, 0, 0);
7284 BeginPicture (&TempOffPicture
);
7285 FillPolygon (&DownLinePolygon
);
7286 SetHighColor (180, 180, 180);
7287 StrokePolygon (&DownLinePolygon
);
7290 SetHighColor (128, 128, 128);
7291 BeginPicture (&TempOnPicture
);
7292 FillPolygon (&DownLinePolygon
);
7295 TempRect
= Bounds ();
7296 TempRect
.bottom
-= B_H_SCROLL_BAR_HEIGHT
;
7297 TempRect
.top
= TempRect
.bottom
- 2 * B_H_SCROLL_BAR_HEIGHT
;
7298 TempRect
.left
= TempRect
.right
- B_V_SCROLL_BAR_WIDTH
;
7299 m_ArrowLineDownPntr
= new BPictureButton (TempRect
, "Down Line",
7300 &TempOffPicture
, &TempOnPicture
,
7301 new BMessage (MSG_LINE_DOWN
), B_ONE_STATE_BUTTON
,
7302 B_FOLLOW_RIGHT
| B_FOLLOW_BOTTOM
, B_WILL_DRAW
| B_NAVIGABLE
);
7303 if (m_ArrowLineDownPntr
== NULL
) goto ErrorExit
;
7304 AddChild (m_ArrowLineDownPntr
);
7305 m_ArrowLineDownPntr
->SetTarget (this);
7310 DisplayErrorMessage ("Problems while making view displaying the words.");
7314 /* Draw the words starting with the one at or after m_FirstDisplayedWord. This
7315 requires looking at the database in the BApplication, which may or may not be
7316 available (if it isn't, don't draw, a redraw will usually be requested by the
7317 Pulse member function when it keeps on noticing that the stuff on the display
7318 doesn't match the database). */
7321 WordsView::Draw (BRect UpdateRect
)
7323 float AgeDifference
;
7324 float AgeProportion
;
7326 float ColumnLeftCenterX
;
7327 float ColumnMiddleCenterX
;
7328 float ColumnRightCenterX
;
7329 float CompensatedRatio
;
7330 StatisticsMap::iterator DataIter
;
7331 StatisticsMap::iterator EndIter
;
7332 rgb_color FillColour
;
7333 float GenuineProportion
;
7334 uint32 GenuineSpamSum
;
7336 float HeightProportion
;
7341 float OneFifthTotalGenuine
;
7342 float OneFifthTotalSpam
;
7343 double RawProbabilityRatio
;
7345 float SpamProportion
;
7346 StatisticsPointer StatisticsPntr
;
7348 char TempString
[PATH_MAX
];
7349 float TotalGenuineMessages
= 1.0; /* Avoid divide by 0. */
7350 float TotalSpamMessages
= 1.0;
7354 /* Lock the application. This will stop it from processing any further
7355 messages until we are done. Or if it is busy, the lock will fail. */
7357 MyAppPntr
= dynamic_cast<ABSApp
*> (be_app
);
7358 if (MyAppPntr
== NULL
|| MyAppPntr
->LockWithTimeout (100000) != B_OK
)
7359 return; /* It's probably busy doing something. */
7361 /* Set up various loop invariant variables. */
7363 if (MyAppPntr
->m_TotalGenuineMessages
> 0)
7364 TotalGenuineMessages
= MyAppPntr
->m_TotalGenuineMessages
;
7365 OneFifthTotalGenuine
= TotalGenuineMessages
/ 5;
7367 if (MyAppPntr
->m_TotalSpamMessages
> 0)
7368 TotalSpamMessages
= MyAppPntr
->m_TotalSpamMessages
;
7369 OneFifthTotalSpam
= TotalSpamMessages
/ 5;
7371 EndIter
= MyAppPntr
->m_WordMap
.end ();
7373 OldestAge
= MyAppPntr
->m_OldestAge
;
7374 NewestAge
= /* actually newest age plus one */
7375 MyAppPntr
->m_TotalGenuineMessages
+ MyAppPntr
->m_TotalSpamMessages
;
7378 goto NormalExit
; /* No words to display, or something is badly wrong. */
7380 NewestAge
--; /* The newest message has age NewestAge. */
7381 AgeDifference
= NewestAge
- OldestAge
; /* Can be zero if just one message. */
7383 LeftBounds
= Bounds().left
;
7384 RightBounds
= Bounds().right
- B_V_SCROLL_BAR_WIDTH
;
7385 Width
= RightBounds
- LeftBounds
;
7386 FillColour
.alpha
= 255;
7388 CenterX
= ceilf (LeftBounds
+ Width
* 0.5);
7389 ColumnLeftCenterX
= ceilf (LeftBounds
+ Width
* 0.05);
7390 ColumnMiddleCenterX
= CenterX
;
7391 ColumnRightCenterX
= ceilf (LeftBounds
+ Width
* 0.95);
7393 for (DataIter
= MyAppPntr
->m_WordMap
.lower_bound (m_FirstDisplayedWord
),
7395 DataIter
!= EndIter
&& Y
< UpdateRect
.bottom
;
7396 DataIter
++, Y
+= m_LineHeight
)
7398 if (Y
+ m_LineHeight
< UpdateRect
.top
)
7399 continue; /* Not in the visible area yet, don't actually draw. */
    /* Draw the colour bar behind the word. It reflects the spamness or
    genuineness of that particular word, plus the importance of the word and
    the age of the word.

    First calculate the compensated spam ratio (described elsewhere). It is
    close to 0.0 for genuine words and close to 1.0 for pure spam. It is drawn
    as a blue bar to the left of center if it is less than 0.5, and a red bar
    on the right of center if it is greater than 0.5. At exactly 0.5 nothing
    is drawn; the word is worthless as an indicator.

    The height of the bar corresponds to the number of messages the word was
    found in. Make the height proportional to the word's genuine and spam
    counts, each taken relative to one fifth of the corresponding message
    total and then averaged, capped at full height.

    The saturation of the colour corresponds to the age of the word, with old
    words being almost white rather than solid blue or red. */
    StatisticsPntr = &DataIter->second;

    SpamProportion = StatisticsPntr->spamCount / TotalSpamMessages;
    GenuineProportion = StatisticsPntr->genuineCount / TotalGenuineMessages;
    if (SpamProportion + GenuineProportion > 0.0f)
      RawProbabilityRatio =
        SpamProportion / (SpamProportion + GenuineProportion);
    else
      RawProbabilityRatio = g_RobinsonX;

    /* The compensated ratio leans towards 0.5 (RobinsonX) more for fewer
    data points, with a weight of 0.45 (RobinsonS). */

    GenuineSpamSum =
      StatisticsPntr->spamCount + StatisticsPntr->genuineCount;
    CompensatedRatio =
      (g_RobinsonS * g_RobinsonX + GenuineSpamSum * RawProbabilityRatio) /
      (g_RobinsonS + GenuineSpamSum);
    /* Used to use the height based on the most frequent word, but some words,
    like "From", show up in all messages which made most other words just
    appear as a thin line. I did a histogram plot of the sizes in my test
    database, and figured that you get better coverage of 90% of the messages
    if you use 1/5 of the total number as the count which gives you 100%
    height. The other 10% get a full height bar, but most people wouldn't care
    that they're super frequently used. */

    HeightProportion = 0.5f * (StatisticsPntr->genuineCount /
      OneFifthTotalGenuine + StatisticsPntr->spamCount / OneFifthTotalSpam);

    if (HeightProportion > 1.0f)
      HeightProportion = 1.0f;
    HeightPixels = ceilf (HeightProportion * m_TextHeight);

    if (AgeDifference <= 0.0f)
      AgeProportion = 1.0; /* New is 1.0, old is 0.0 */
    else
      AgeProportion = (StatisticsPntr->age - OldestAge) / AgeDifference;

    TempRect.top = ceilf (Y + m_TextHeight / 2 - HeightPixels / 2);
    TempRect.bottom = TempRect.top + HeightPixels;

    if (CompensatedRatio < 0.5f)
    {
      TempRect.left = ceilf (
        CenterX - 1.6f * (0.5f - CompensatedRatio) * (CenterX - LeftBounds));
      TempRect.right = CenterX;
      FillColour.red = 230 - (int) (AgeProportion * 230.0f);
      FillColour.green = FillColour.red;
      FillColour.blue = 255;
    }
    else /* Ratio >= 0.5, red spam block. */
    {
      TempRect.left = CenterX;
      TempRect.right = ceilf (
        CenterX + 1.6f * (CompensatedRatio - 0.5f) * (RightBounds - CenterX));
      FillColour.blue = 230 - (int) (AgeProportion * 230.0f);
      FillColour.green = FillColour.blue;
      FillColour.red = 255;
    }
    SetHighColor (FillColour);
    SetDrawingMode (B_OP_COPY);
    FillRect (TempRect);
    /* Print the text centered in columns of various widths. The number of
    genuine messages in the left 10% of the width, the word in the middle 80%,
    and the number of spam messages using the word in the right 10%. */

    SetHighColor (0, 0, 0);
    SetDrawingMode (B_OP_OVER); /* So that antialiased text mixes better. */

    sprintf (TempString, "%lu", StatisticsPntr->genuineCount);
    Width = m_TextFont.StringWidth (TempString);
    MovePenTo (ceilf (ColumnLeftCenterX - Width / 2), Y + m_AscentHeight);
    DrawString (TempString);

    strcpy (TempString, DataIter->first.c_str ());
    Width = m_TextFont.StringWidth (TempString);
    MovePenTo (ceilf (ColumnMiddleCenterX - Width / 2), Y + m_AscentHeight);
    DrawString (TempString);

    sprintf (TempString, "%lu", StatisticsPntr->spamCount);
    Width = m_TextFont.StringWidth (TempString);
    MovePenTo (ceilf (ColumnRightCenterX - Width / 2), Y + m_AscentHeight);
    DrawString (TempString);
7506 /* Draw the first word (the one which the user types in to select the first
7507 displayed word) on the right, in the scroll bar margin, rotated 90 degrees to
7508 fit between the page up and page down buttons. */
7510 Width
= m_TextFont
.StringWidth (m_FirstDisplayedWord
);
7513 TempRect
= Bounds ();
7514 TempRect
.top
+= 4 * B_H_SCROLL_BAR_HEIGHT
+ 1;
7515 TempRect
.bottom
-= 5 * B_H_SCROLL_BAR_HEIGHT
+ 1;
7517 MovePenTo (TempRect
.right
- m_TextHeight
+ m_AscentHeight
- 1,
7518 ceilf ((TempRect
.bottom
+ TempRect
.top
) / 2 + Width
/ 2));
7519 m_TextFont
.SetRotation (90);
7520 SetFont (&m_TextFont
, B_FONT_ROTATION
);
7521 DrawString (m_FirstDisplayedWord
);
7522 m_TextFont
.SetRotation (0);
7523 SetFont (&m_TextFont
, B_FONT_ROTATION
);
  /* Successfully finished drawing. Update the cached values to match what we
  just drew. */

  m_CachedTotalGenuineMessages = MyAppPntr->m_TotalGenuineMessages;
  m_CachedTotalSpamMessages = MyAppPntr->m_TotalSpamMessages;
  m_CachedWordCount = MyAppPntr->m_WordCount;
7534 /* Done. Let the BApplication continue processing messages. */
7535 MyAppPntr
->Unlock ();
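/* Illustrative sketch only, not part of the program. The Draw code above
turns a word's spam and genuine counts into a compensated ratio (which side
of center the bar goes on, and how far) and a height proportion (how tall the
bar is). The two free functions below restate that arithmetic so it can be
tried with plain numbers. The function names are hypothetical, and the
constants 0.45 and 0.5 are the RobinsonS and RobinsonX values quoted in the
comments above; the real globals are defined elsewhere in the program. */

```cpp
#include <cassert>
#include <cmath>

static const double kRobinsonS = 0.45; /* Weight given to the prior. */
static const double kRobinsonX = 0.5;  /* Ratio assumed when no data. */

/* Near 0.0 for genuine words, near 1.0 for spam words, and pulled towards
kRobinsonX when the word has only been seen a few times. The totals must be
non-zero; Draw substitutes 1.0 for an empty total to avoid dividing by 0. */
double CompensatedSpamRatio (unsigned SpamCount, unsigned GenuineCount,
  double TotalSpamMessages, double TotalGenuineMessages)
{
  double SpamProportion = SpamCount / TotalSpamMessages;
  double GenuineProportion = GenuineCount / TotalGenuineMessages;
  double RawProbabilityRatio = kRobinsonX;

  if (SpamProportion + GenuineProportion > 0.0)
    RawProbabilityRatio =
      SpamProportion / (SpamProportion + GenuineProportion);

  double GenuineSpamSum = (double) SpamCount + (double) GenuineCount;
  return (kRobinsonS * kRobinsonX + GenuineSpamSum * RawProbabilityRatio) /
    (kRobinsonS + GenuineSpamSum);
}

/* Fraction of the text height used by the bar. A word found in one fifth of
both kinds of messages already gets the full height. */
double BarHeightProportion (unsigned SpamCount, unsigned GenuineCount,
  double TotalSpamMessages, double TotalGenuineMessages)
{
  double HeightProportion = 0.5 *
    (GenuineCount / (TotalGenuineMessages / 5) +
    SpamCount / (TotalSpamMessages / 5));
  return (HeightProportion > 1.0) ? 1.0 : HeightProportion;
}
```

/* With 10 spam and 10 genuine messages on file, a word seen in all 10 spam
messages and no genuine ones scores about 0.98, while a word seen in just one
spam message scores about 0.84, since less evidence means a pull towards
0.5. */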
/* When the user presses keys, they select the first word to be displayed in
the view (it's the word at or lexicographically after the word typed in). The
keys are appended to the starting word, until the user stops typing for a
while, then the next key will be the first letter of a new starting word. */

void
WordsView::KeyDown (const char *BufferPntr, int32 NumBytes)
{
  int32 CharLength;
  bigtime_t CurrentTime;
  char TempString [40];

  CurrentTime = system_time ();

  if (NumBytes < (int32) sizeof (TempString))
  {
    memcpy (TempString, BufferPntr, NumBytes);
    TempString [NumBytes] = 0;
    CharLength = strlen (TempString); /* So NUL bytes don't get through. */

    /* Check for arrow keys, which move the view up and down. */

    if (CharLength == 1 &&
    (TempString [0] == B_UP_ARROW ||
    TempString [0] == B_DOWN_ARROW ||
    TempString [0] == B_PAGE_UP ||
    TempString [0] == B_PAGE_DOWN))
    {
      MoveTextUpOrDown ((TempString [0] == B_UP_ARROW) ? MSG_LINE_UP :
        ((TempString [0] == B_DOWN_ARROW) ? MSG_LINE_DOWN :
        ((TempString [0] == B_PAGE_UP) ? MSG_PAGE_UP : MSG_PAGE_DOWN)));
    }
    else if (CharLength > 1 ||
    (CharLength == 1 && 32 <= (uint8) TempString [0]))
    {
      /* Have a non-control character, or some sort of multibyte char. Add it
      to the word and mark things for redisplay starting at the resulting word.
      */

      if (CurrentTime - m_LastTimeAKeyWasPressed >= 1000000 /* microseconds */)
        strcpy (m_FirstDisplayedWord, TempString); /* Starting a new word. */
      else if (strlen (m_FirstDisplayedWord) + CharLength <= g_MaxWordLength)
        strcat (m_FirstDisplayedWord, TempString); /* Append to existing. */

      Invalidate ();
    }
  }

  m_LastTimeAKeyWasPressed = CurrentTime;
  BView::KeyDown (BufferPntr, NumBytes);
}

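/* A minimal sketch of the type-ahead rule used by KeyDown, written in plain
C++ with no BeOS calls. TypeAhead, Key and maxLength are hypothetical names
used only for illustration; they stand in for m_FirstDisplayedWord,
m_LastTimeAKeyWasPressed and g_MaxWordLength. Times are in microseconds, as
with system_time ().

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>

// Hypothetical stand-in for the view's type-ahead state.
struct TypeAhead
{
  std::string word;     // Plays the role of m_FirstDisplayedWord.
  int64_t lastKeyTime;  // Plays the role of m_LastTimeAKeyWasPressed.
  std::size_t maxLength; // Plays the role of g_MaxWordLength.

  // Feed one printable key (possibly a multibyte character) typed at "now".
  void Key (const std::string &chars, int64_t now)
  {
    if (now - lastKeyTime >= 1000000)
      word = chars; // A pause of a second or more starts a new word.
    else if (word.size () + chars.size () <= maxLength)
      word += chars; // Otherwise append, provided the result still fits.
    lastKeyTime = now;
  }
};
```

With this sketch, two keys typed half a second apart are joined into one
starting word, while a key typed a second or more after the previous one
replaces it. */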
/* Change the background colour to show that we have the focus. When we have
it, keystrokes will select the word to be displayed at the top of the list. */

void
WordsView::MakeFocus (bool Focused)
{
  if (Focused)
    m_BackgroundColour = m_FocusedColour;
  else
    m_BackgroundColour = m_UnfocusedColour;
  SetViewColor (m_BackgroundColour);
  SetLowColor (m_BackgroundColour);

  /* Also need to set the background colour for the scroll buttons, since they
  can't be made transparent. */

  if (m_ArrowLineDownPntr != NULL)
  {
    m_ArrowLineDownPntr->SetViewColor (m_BackgroundColour);
    m_ArrowLineDownPntr->Invalidate ();
  }

  if (m_ArrowLineUpPntr != NULL)
  {
    m_ArrowLineUpPntr->SetViewColor (m_BackgroundColour);
    m_ArrowLineUpPntr->Invalidate ();
  }

  if (m_ArrowPageDownPntr != NULL)
  {
    m_ArrowPageDownPntr->SetViewColor (m_BackgroundColour);
    m_ArrowPageDownPntr->Invalidate ();
  }

  if (m_ArrowPageUpPntr != NULL)
  {
    m_ArrowPageUpPntr->SetViewColor (m_BackgroundColour);
    m_ArrowPageUpPntr->Invalidate ();
  }

  BView::MakeFocus (Focused);
}


void
WordsView::MessageReceived (BMessage *MessagePntr)
{
  int32 CountFound;
  float DeltaY; /* Usually -1.0, 0.0 or +1.0. */
  type_code TypeFound;

  switch (MessagePntr->what)
  {
    case B_MOUSE_WHEEL_CHANGED:
      if (MessagePntr->FindFloat ("be:wheel_delta_y", &DeltaY) != 0) break;
      if (DeltaY < 0)
        MoveTextUpOrDown (MSG_LINE_UP);
      else if (DeltaY > 0)
        MoveTextUpOrDown (MSG_LINE_DOWN);
      break;

    case MSG_LINE_DOWN:
    case MSG_LINE_UP:
    case MSG_PAGE_DOWN:
    case MSG_PAGE_UP:
      MoveTextUpOrDown (MessagePntr->what);
      break;

    case B_SIMPLE_DATA: /* Something has been dropped in our view. */
      if (MessagePntr->GetInfo ("refs", &TypeFound, &CountFound) == B_OK &&
      CountFound > 0 && TypeFound == B_REF_TYPE)
      {
        RefsDroppedHere (MessagePntr);
        break;
      }
      /* Else fall through to the default case, in case it is something else
      dropped that the system knows about. */

    default:
      BView::MessageReceived (MessagePntr);
  }
}


/* If the user clicks on our view, take over the focus. */

void
WordsView::MouseDown (BPoint)
{
  MakeFocus (true);
}


void
WordsView::MoveTextUpOrDown (uint32 MovementType)
{
  StatisticsMap::iterator DataIter;
  int i;
  ABSApp *MyAppPntr;
  int PageSize;

  /* Lock the application. This will stop it from processing any further
  messages until we are done (we need to look at the word list directly). Or
  if it is busy, the lock will fail. */

  MyAppPntr = dynamic_cast<ABSApp *> (be_app);
  if (MyAppPntr == NULL || MyAppPntr->LockWithTimeout (2000000) != B_OK)
    return; /* It's probably busy doing something. */

  PageSize = (int) (Bounds().Height() / m_LineHeight - 1);

  DataIter = MyAppPntr->m_WordMap.lower_bound (m_FirstDisplayedWord);

  switch (MovementType)
  {
    case MSG_LINE_UP:
      if (DataIter != MyAppPntr->m_WordMap.begin ())
        DataIter--;
      break;

    case MSG_LINE_DOWN:
      if (DataIter != MyAppPntr->m_WordMap.end ())
        DataIter++;
      break;

    case MSG_PAGE_UP:
      for (i = 0; i < PageSize; i++)
      {
        if (DataIter == MyAppPntr->m_WordMap.begin ())
          break;
        DataIter--;
      }
      break;

    case MSG_PAGE_DOWN:
      for (i = 0; i < PageSize; i++)
      {
        if (DataIter == MyAppPntr->m_WordMap.end ())
          break;
        DataIter++;
      }
      break;
  }

  if (DataIter != MyAppPntr->m_WordMap.end ())
    strcpy (m_FirstDisplayedWord, DataIter->first.c_str ());

  MyAppPntr->Unlock ();
}

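/* A small sketch of the scrolling idea in MoveTextUpOrDown, assuming an
ordinary std::map keyed by word. WordMap and MoveFirstWord are hypothetical
names for illustration, and the int value merely stands in for the per-word
statistics. A negative delta moves towards the start of the list, a positive
one towards the end, and the old word is kept if the move runs off the end.

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical stand-in for the word list; the value type is a placeholder.
typedef std::map<std::string, int> WordMap;

// Returns the word that should be displayed first after moving by "delta"
// rows from the word at or lexicographically after "first".
std::string MoveFirstWord (const WordMap &words, const std::string &first,
  int delta)
{
  WordMap::const_iterator iter = words.lower_bound (first);

  for (; delta < 0 && iter != words.begin (); delta++)
    --iter; // Step up, stopping at the first word.
  for (; delta > 0 && iter != words.end (); delta--)
    ++iter; // Step down, stopping just past the last word.

  // Past the end there is no word to show, so keep the previous one.
  return (iter != words.end ()) ? iter->first : first;
}
```

Because lower_bound finds the first key not less than the typed text, a
partial word such as "b" lands on the first stored word starting at or after
it. */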

/* This function periodically polls the BApplication to see if anything has
changed. If the word list is different or the display has changed in some
other way, it will then try to refresh the display, repeating the attempt until
it gets successfully drawn. */

void
WordsView::Pulse ()
{
  ABSApp *MyAppPntr;

  /* Probe the BApplication to see if it has changed. */

  MyAppPntr = dynamic_cast<ABSApp *> (be_app);
  if (MyAppPntr == NULL)
    return; /* Something is wrong, give up. */

  if (MyAppPntr->m_TotalGenuineMessages != m_CachedTotalGenuineMessages ||
  MyAppPntr->m_TotalSpamMessages != m_CachedTotalSpamMessages ||
  MyAppPntr->m_WordCount != m_CachedWordCount)
    Invalidate ();
}


/* The user has dragged and dropped some file references on the words view. If
it is in the left third, add the file(s) as examples of genuine messages, right
third for spam messages and if it is in the middle third then evaluate the
file(s) for spaminess. */

void
WordsView::RefsDroppedHere (BMessage *MessagePntr)
{
  float Left;
  bool SpamExample = true; /* TRUE if example is of spam, FALSE genuine. */
  float Third;
  BPoint WhereDropped;

  /* Find out which third of the view it was dropped into. */

  if (MessagePntr->FindPoint ("_drop_point_", &WhereDropped) != B_OK)
    return; /* Need to know where it was dropped. */
  ConvertFromScreen (&WhereDropped);
  Third = Bounds().Width() / 3;
  Left = Bounds().left;
  if (WhereDropped.x < Left + Third)
    SpamExample = false;
  else if (WhereDropped.x < Left + 2 * Third)
  {
    /* In the middle third, evaluate all files for spaminess. */
    EstimateRefFilesAndDisplay (MessagePntr);
    return;
  }

  if (g_CommanderLooperPntr != NULL)
    g_CommanderLooperPntr->CommandReferences (
      MessagePntr, true /* BulkMode */, SpamExample ? CL_SPAM : CL_GENUINE);
}

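/* The three-way split used by RefsDroppedHere can be sketched on its own,
without any BeOS types. DropAction, its values and ClassifyDrop are
hypothetical names chosen for this illustration; x, left and width correspond
to the drop point and the view bounds in view coordinates.

```cpp
#include <cassert>

// Hypothetical result codes for the three drop zones.
enum DropAction { DROP_GENUINE, DROP_EVALUATE, DROP_SPAM };

// Decide what a drop at horizontal position "x" means for a view whose left
// edge is at "left" and which is "width" units wide.
DropAction ClassifyDrop (float x, float left, float width)
{
  float third = width / 3;

  if (x < left + third)
    return DROP_GENUINE; // Left third: example of genuine mail.
  if (x < left + 2 * third)
    return DROP_EVALUATE; // Middle third: just estimate spaminess.
  return DROP_SPAM; // Right third: example of spam.
}
```

A drop exactly on a boundary falls into the zone to its right, since the
comparisons are strict. */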

/******************************************************************************
 * Finally, the main program which drives it all.
 */

int main (int argc, char**)
{
  g_CommandLineMode = (argc > 1);
  if (!g_CommandLineMode)
    cout << PrintUsage; /* In case no arguments specified. */

  g_CommanderLooperPntr = new CommanderLooper;
  if (g_CommanderLooperPntr != NULL)
  {
    g_CommanderMessenger = new BMessenger (NULL, g_CommanderLooperPntr);
    g_CommanderLooperPntr->Run ();
  }

  ABSApp MyApp;

  if (MyApp.InitCheck () == 0)
  {
    MyApp.LoadSaveSettings (true /* DoLoad */);
    MyApp.Run ();
  }

  if (g_CommanderLooperPntr != NULL)
  {
    g_CommanderLooperPntr->PostMessage (B_QUIT_REQUESTED);
    snooze (100000); /* Let the CommanderLooper thread run so it quits. */
  }

  cerr << "SpamDBM shutting down..." << endl;
  return 0; /* And implicitly destroys MyApp, which writes out the database. */
}