 $Id: NEWS,v 1.33 2004/04/21 03:37:18 popovich Exp $

CRM114 NEWS - user visible changes (and some other changes also.)

Refer to ChangeLog for detailed per-file info.

Version crm114-2004018-BlameEasterBunny.src, 2004-04-20

* updated for BlameEasterBunny (minor files added, removed) (PEP)

Version crm114-20040409-BlameMarys.src, 2004-04-12

* updates for BlameMarys (a couple new expr*.c files)
* added tests/testscript.sh to run all tests

Version crm114-20040328-BlameStPatrick-auto.1, 2004-04-08

* major surgery on sledgehammer and prebootstrap
* cookfunc.sh (and splitfunc.awk) no longer needed
* patching sysincludes no longer seems to be needed
* first autoconfiscated release of Blame St. P.
* (surgery by Peter Popovich, some other changes by Raul Miller)

Version 20040221-BlameYokohama-auto.2, 2004-02-25

* Fine-tuning of crm114 manpage.  Thanks Seth Hanford. (JvB)
* No longer ship QUICKREF.txt since it is in crm114(1).

Version 20040221-BlameYokohama-auto.1, 2004-02-25

* Build from new public release
  crm114-20040221-BlameYokohama.src.tar.gz. (JvB)

Version 20040212-BlameJetlag, some time around 2004-02-15

(notes from Raul Miller, about autoconfiscated version)

* Build from BlameJetlag

(other notes)

* README.1st updated. Link to list archive added.

version 20040207-BlamePaoloMore-auto.1, 2004-02-07

(notes from Joost van Baal, about autoconfiscated version)

* Build from new hidden release
  crm114-20040207-BlamePaoloMore-auto.1.src.tar.gz.
* Make sure manpages are retypeset when shipping new tarball:
  MAINTAINERCLEANFILES in man/Makefile.am added.
* Added Bill's original NEWS, as shipped in README file, to this
  file.  Trimmed README accordingly.

version 20040206-BlamePaolo-auto.1, 2004-02-07

(notes from Joost van Baal, about autoconfiscated version)

* Build from new hidden release crm114-20040206-BlamePaolo.src.tar.gz.

* misc/crm114.spec and misc/rename-gz are gone: RPM will be maintained,
  built and published by Peter Popovich.

version February 2, 2004

   This is a release of CRM114 and Mailfilter.  The last few known
   bugs have been stomped (including a moderately good infinite loop detector
   for string rewrites, and a "you-didn't-set-your-password" safety check),
   the classifier algorithms have been tuned (default is full Markovian),
   and it's been moderately well tested.
   Accuracies over 99.95% are documented on real-time mail streams,
   and the overall speed is 3 to 4x faster than SpamAssassin.

   My thanks to all of you whose contributions of brain-cycles made this
   code as good as it is.

                -Bill Yerazunis

version 20040118-BlameEric-auto.1, 2004-01-22

(notes from Joost van Baal, about autoconfiscated version)

* Build from new hidden release crm114-20040118-BlameEric.src.tar.gz: New
  in this release from Bill:

   This one needed a major change to the way configuration worked in order
   to let the CAMRAM people do what they need.  There's nothing really
   scary in it, so it's a "virtual sanity check" version.

   However, it DOES introduce some new code and I'm not comfortable
   with putting it out there till it's better tested.

   There is ONE MAJOR CHANGE.  BE WARNED:

      mailfilterconfig.crm IS GONE!
      mailfilter.cf IS NEW AND HERE!

   (this was forced in order to support --userdir)

      WHAT's NEW IN THE NEW RELEASE:

   0) --userdir is now supported.

    Usage is

       mailfilter.crm --userdir=/home/my/user/dir/
                                                 ^
                                                 |
   DON'T FORGET THE CLOSING SLASH!  This is a genuine prefix and
   if you leave off the slash you will get _nothing_.

   This doesn't do a "cd wherever" like -u does; instead this
   changes the prefix of all files referenced by mailfilter.crm
   (and it does it the hard way, in the script.  Not with a "cd")

   1) mailfilterconfig.crm is No More.  It's totally deprecated.
   1a) the new configuration file is "mailfilter.cf"

   2) likewise, camrammer.crm is totally deprecated.  Use --stats_only
   in mailfilter instead.

   3) mailfilter.cf is NOT an insert file.  Instead, there's a little
   mini-parser that runs in mailfilter.crm that reads mailfilter.cf and
   sets up variables.

   The syntax is extremely obvious when you read the "mailfilter.cf" file.  :)

   4) learnspam.crm and learnnonspam.crm are likewise out of date and
   totally deprecated.  Use mailfilter --learnspam and --learnnonspam
   respectively instead.

   5) some file permissions ought to be better than they were before.
   It seems right to me, but let me know.

                                         -Bill Yerazunis

* Changed prebootstrap to add a leading ``#include config.h'' to
  src/crmregex_tre.c.  This fixes a bug in 20040111a-v1.0-SanityCheck-auto.1:
  that release failed to build on systems with libtre installed.

version 20040118

(Notes from Bill Yerazunis, on non-autoconfiscated version)

   It turns out that CAMRAM needs (as in is a virtual showstopper) the
   ability to specify which user directory all of the files are to be 
   found in.  Since #insert _cannot_ do this (it's compile time, not
   run time), mailfilter.crm (and classifymail.crm) now have a new
   --fileprefix=/somewhere/  option.

   To use it, put all of the files (the .css's, the .mfp's etc) that are
   on a per-user basis in one directory, then specify

     mailfilter.crm --fileprefix=/where/the/files/are/

   Note that this is a true prefix- you must put a trailing slash
   on to specify a directory by that name.  On the other hand, you can
   specify a particular prefix on a per-user basis, e.g.:

     mailfilter.crm --fileprefix=/var/spool/mail/crm.conf/joe-

   so that user "joe" will use mailfilter.crm with these files:

        /var/spool/mail/crm.conf/joe-mailfilter.cf
        /var/spool/mail/crm.conf/joe-rewrites.mfp
        /var/spool/mail/crm.conf/joe-spam.css
        /var/spool/mail/crm.conf/joe-nonspam.css

   and so on.  Note that this does NOT override --spamcss and --nonspamcss 
   options; rather, the actual .css filenames are the concatenation of 
   the fileprefix and spamcss (or nonspamcss) names.
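
   As a rough illustration of why the trailing slash matters, here is a
   minimal Python sketch of the "true prefix" behaviour described above.
   The helper name prefixed() is made up for this note; it is not part of
   mailfilter.crm, which does this work inside the CRM114 script itself.

```python
# Hypothetical helper: models --fileprefix as plain string concatenation.
def prefixed(fileprefix: str, name: str) -> str:
    # No path separator is inserted; the prefix is used exactly as given.
    return fileprefix + name

# Per-user prefix ending in "joe-"
print(prefixed("/var/spool/mail/crm.conf/joe-", "spam.css"))
# Directory prefix with the trailing slash
print(prefixed("/where/the/files/are/", "nonspam.css"))
# Directory prefix WITHOUT the trailing slash: the name is glued on
print(prefixed("/where/the/files/are", "nonspam.css"))
```

   The last call shows the failure mode: without the slash the result is
   a file name in the parent directory, not a file inside the directory.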

version 20040111a-v1.0-SanityCheck-auto.1, 2004-01-14

(notes from Joost van Baal, about autoconfiscated version)

* Build from new hidden release crm114-20040111a-v1.0-SanityCheck.src.tar.gz
  (missed crm114-20040107-1.0-SanityCheck.src.tar.gz)
* We now support both GNU regexp library as shipped with libc, as well as Ville
  Laurikari's libtre.  On systems lacking a libtre, we fall back to using the GNU
  regexp library.
  Since GNU regex needs ``#include <regex.h>'', we patched
  src/crm114_sysincludes.h, to take ./configure's output into account.
  We now ship Bill's crmregex_gnu.c source file, in order to facilitate this.
* README.1st updated: note on status of this sideproject, not on relevance of
  original build instructions.
* megatest.log is renamed to megatest_knowngood.log by Bill

Version 20040105 (recheck)

(notes from Bill)

   The only fixes here are to make the Makefile
   a little more bulletproof and to let you know how to fix a messed-up
   /etc/ld.so.conf, and of course this document has been updated.

   Otherwise this version should be the same as
   the December 27 2003 (SanityCheck) version, which has no reported
   reproducible bugs higher than a P4 (documentation and feature request).

   For the last two weeks, I had _one_ outright error and two that I
   myself found borderline out of about 5000 messages.  That's 2x
   better than a human at the same task.

   My thanks to all of you whose contributions of brain-cycles made this
   code as good as it is.

        -Bill Yerazunis

version 20040102-1.0-SanityCheck-auto.1, 2004-01-06

* Build from new official release crm114-20040102-1.0-SanityCheck.src.tar.gz

version 20031229-1.0-SanityCheck-auto.2, 2004-01-01

(notes from Joost van Baal, about autoconfiscated version)

* cssdiff(1), cssmerge(1), cssutil(1) manpages added.
* HACKING gets distributed and installed now.
* All .crm scripts have a flexible #!-line: path/to/crm gets adapted, according
  to ./configure's --prefix.
* A symlink bindir/crm to crm114 gets created.

version 20031229-1.0-SanityCheck-auto.1, 2003-12-30

(notes from Joost van Baal, about autoconfiscated version)

* Build from new official release crm114-20031229-1.0-SanityCheck.src.tar.gz
* New C sourcefile splitting: changed the prefix on the split files from
  "crm114_" to just "crm_".  The only ones that stay as "crm114_" are the .h
  files.
* Install libexec/crm114/* and doc/crm114/examples/tests/* as executables.
* Install pad.crm, pad.dat, shroud.crm (these got lost in the file reshuffling)
* Start building support for ./configure-time expanding of shebangs in crm
  scripts.
* Added explicit run-time check to configure.ac, in order to catch host with
  half-setup libtre early.

Version 20031227 (SanityCheck)

(notes from Bill)

   This  is (hopefully) the last test version before V1.0, and bug
   fixes are minimal.  This is really a sanity check release for V1.0 .

   It is now time to triage what needs to be fixed
   versus what doesn't, and very few things NEED to be fixed.
   Things that changed (or not) are:

      1) BUGS ACTUALLY FIXED:

         removed the arglist feature from mailfilter.crm; there's a
         poorly understood bug in NetBSD versus Linux that breaks things.

         allmail.txt flag control wasn't being done correctly.  That's
         fixed.

         a couple of misleading comments in the code are fixed.


     2) THINGS THAT ARE NOT CHANGED IN THIS VERSION BUT ARE V1.1 CANDIDATES:

         the install location fix is NOT in V1.0.  This will move
         the location of the actual binary (/usr/bin/crm versus
         /usr/local/bin/crm-<version> and then add a symlink
         /usr/bin/crm --> /usr/local/bin/crm-<favored version> )

         the --mydir feature of mailfilter.crm is not yet implemented
         and won't be in V1.0 .  Expect it in V1.1

   Other than that and a few documentation fixes, this version is identical
   to 20031217.  It's just the final sanity check before we do V1.0

version 20031219-RC12.6, 2003-12-22

(notes from Joost van Baal, about autoconfiscated version)

* Oops, configure.ac was looking for c++: this bug is fixed.  Furthermore,
  ./configure now exits in absence of lib TRE.  Cleaned up configure.ac:
  removed some bogus AC_CHECK_FUNCS checks, a.o.
  *.mfp files now get installed in doc/crm114/examples/crmfilter/ .

version 20031219-RC12.5, 2003-12-21

(notes from Joost van Baal, about autoconfiscated version)

* Split crm114.c into multiple source files, using Paolo P's
  scripts.

version 20031219-RC12.4, 2003-12-20

(notes from Joost van Baal, about autoconfiscated version)

* Restructured tarball layout according to Paolo P's ideas.  (Splitting
  sources still to do.)

version 20031219-RC12.3, 2003-12-19

(notes from Joost van Baal, about autoconfiscated version)

* Ships and installs example .crm's too, as well as documentation.  New layout
  of tarball: split stuff among directories.

version 20031219-RC12.2, 2003-12-19

(notes from Joost van Baal, about autoconfiscated version)

* Now installs some docs.

version 20031219-RC12.1, 2003-12-19

(notes from Joost van Baal, about autoconfiscated version)

* Autoconfiscated test release.

version 20031219-RC12, 2003-12-19

* Release by Bill Yerazunis.

Version 20031215-RC11

(notes from Bill)

 Minor bugs smashed.  Math evaluation now works decently (but be nice
 to it).  Mailfilter accuracy is up past 99.9% (less than 1 error per
 thousand, usually when a spammer joins a well-credentialed list and
 spams the list, or a seldom-heard-from friend sends a one-line message
 with a URL wrapped in HTML).  Command line features for CAMRAM added
 ("--spamcss" and "--nonspamcss"; these will probably become unified to
 a --mydir).  Lots of documentation updates; if it says something in 
 the documentation, there's actually a good chance it works as described.

version 20031129-RC11.1, 2003-12-18

(notes from Joost van Baal, about autoconfiscated version)

* First test release of autoconfiscated branch.

version 20031129-RC11, 2003-11-29

* Release by Bill Yerazunis.

Version 20031111-RC7

(notes from Bill)

 More bugs smashed- there are still a few outstanding bugs here and
 there, but you aren't likely to find them unless you're really pushing
 the limits.  Improvements are everywhere; You can now embed the
 classical C escape chars in a var-expanded string (e.g. \n for a
 newline) as well as hex and octal characters like \xFF and \o132.
 EVAL now can do string length and some RPN arithmetic/comparisons;
 approximate regexing is now available by default, and the command line
 input is improved.

Version 20031101-RC4  (November 1, 2003)

(notes from Bill)

 The only changes this release are some edge-condition bugfixes (thanks
 to Paolo and JSkud, among others) and the inclusion of Ville
 Laurikari's new TRE 0.6.0-PRE3 regex module.  This regex module is
 tres-cool because it actually has a useful approximate matcher built
 right in, dovetailed into the REGEX syntax for #-of-matches.

 Consider the regex /aaa(foo){1,3}zzz/ .  This matches "aaa", then "foo",
 "foofoo", or "foofoofoo", then "zzz".  Cognitively anything in a regex's
 {} doesn't say what to match, just how to match it.
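
 The ordinary (exact) counting form can be tried out in any regex
 engine.  The sketch below uses Python's re module rather than TRE,
 on the assumption that the plain {1,3} repetition behaves the same
 way in both; the helper name matches() is made up for this note.
 The approximate {~} form is TRE-specific and is not shown.

```python
import re

# {1,3} says HOW MANY times (foo) may repeat, not what to match.
pattern = re.compile(r"aaa(foo){1,3}zzz")

def matches(s: str) -> bool:
    # fullmatch: the whole string must be aaa + 1..3 foo + zzz
    return pattern.fullmatch(s) is not None

for candidate in ["aaafoozzz", "aaafoofoozzz", "aaafoofoofoozzz",
                  "aaazzz", "aaafoofoofoofoozzz"]:
    print(candidate, matches(candidate))
```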

 The cognitive jump you have to take here is /foo{bar}/ can have a {bar}
 that says _how accurately_ to match foo.  For instance:

        foo{~} 

 finds the _closest_ match to "foo" (and it always succeeds).

 The full details of approximate matching are in the quickref.  

 Read and Enjoy.

 (for your convenience, we also include the well-proven 0.5.3 TRE library,
 so you should install ONE and ONLY one of these.  Realize that
 0.6.0-PRE3 is still a fairly moderately tested library; install
 whichever one meets your need to bleed.  :-)  )

version Oct 23, 2003  ( version 20031023-RC3 )

(notes from Bill)

  Yes, we're now at RC3.  Changes are that EVAL now works right, lots
  of bugfixes, and the latent code for RFC-compliant inoculation is 
  now in the shipped mailfilter.crm (but turned off in mailfilter.cf)
  All big changes are being deferred to V1.1 now; this is bugfix city.  

  Make it bleed, folks, make it _bleed_.

     -Bill Yerazunis

version October 15, 2003

(notes from Bill)

 It's been a long road, but here it is - RC1, as in Release Candidate 1.
 WINDOW and HASH have been made symmetrical, the polynomials have been
 optimized, and it's ready.  Accuracy is steady at around 3 nines.
 Because of all the bugfixes, upgrading to this version (compatible with the
 BETA series) is recommended.

        -Bill Yerazunis

Version This is the September 25th 2003 BETA-2

(notes from Bill)

 What's new: a few dozen bugs stomped, and new functionality
 everywhere.  Command line args can now be restricted to acceptable
 sets; <microgroom> will keep your .css files nicely trimmed; ISOLATE
 will copy preexisting captures, --learnspam and --learnnonspam in
 mailfilter.crm will perform exactly the same configured mucking as
 filtering would, and then learn; --stats_only will generate ONLY the
 'pR' value (this is mostly for CAMRAM users), positional args will be
 assigned :_posN: variables, the kit has been split so you don't have
 to download 8 megs of .css if you are building your .css locally, and
 it's working well enough that this is a full BETA release.

Version August 07, 2003 bugfix release.

(notes from Bill)

 Changes: lots and lots of bugfixes.  Really.  The only new code is
 experimental code in mailfilter (to add 'append verbosity as
 attachment') and getting WINDOW to work on any variable, everything
 else is bugstomping or enhanced testing (megatest.sh runs a lot of tests
 automatically now).

 There's still a bug or dozen out there, so keep sending me bug reports!

 (and has anyone else done the cssutil --> cssmerge to build small .css files
 for fast running?)

Version This is the July 23, 2003 alpha release.

(notes from Bill)

 This release is a bugfix release for the July 20 secret release.

 Fixes include: configuration toggles for allmail.txt and rejected_mail.txt,
 execution time profiling works, (-p generates an execution time profile, 
 -P now limits number of statements in program).

 Good news: the new .css file format seems to be working very well;
 although we spend a little more time in .css evaluation, the accuracy
 increase is well worth it (I've had _one_ error since 07-20, a false
 accept to a mailing list that came back as "marginally nonspam" because
 the mailing list is usually squeaky clean).   

 Merging works well; you can now make your .css files as big (or small)
 as you dare (within reason; you'll need to throw away features if 
 you want to compress the heck out of it and you'll use lots of memory or
 page like crazy if you make them too big).  If experiment shows that
 this memory usage is excessive, let me know and I'll see if I can do
 a less-space-for-more-time tradeoff.

 Profiling indicates that we spend more time in blacklist processing
 than in the whole SBPH/BCR evaluator, (which isn't that surprising,
 when you get down to it), so maybe trimming the blacklist to people
 who spam _you_ would be a good performance improvement.

 Anyway, here you go; this is a _recommended_ release.  Grab it and
 have fun. :)

 As usual, prior news and updates are at the end of this file.

Version 2003 July 19

(notes from Bill)

 This is the July 19, 2003 SECRET alpha release.  It won't be linked on
 the webpage- the only people who will know about it are the ones who
 get this email.  Y'all are special, you know that?  :-)  

 Since this is a SECRET release, you all have a "need to know".  That
 need is simple: I'd like to get a little more intense testing on this
 new setup before I put it out for general release.

 Enough has changed that you _need_ to read ALL the news before
 you go off and install this version.  Be AFRAID.  :)

 LOTS of changes have occurred - the biggest being that the new,
 totally incompatible but far better .css format has been implemented.
 The new version has everything you all wanted- both for people who
 want huge .css files, and for people who want _smaller_ .css files.

 This new stuff has necessitated scouring cssutil and cssdiff 
 so don't use the old versions for the new format files.  

 Lastly, because the old bucket max was 255 and the new is 4 gigs, the
 renormalization math changed a little.  Expect pRs to be closer to 0
 until you train some more.  Accuracy should be better, even _before_
 training, so overall it's a net win.

 There's also string rewriting in the pre-classification stage (who
 wanted that?  Somebody did....) and since term rewriting is so darn
 useful, I'm releasing an expurgated version of the string rewriter I
 use to scrub my spam and nonspam text of words that should not be
 learned.  This scrubber automatically gets used if you "make cssfiles".

 Here's the details:

  1) The format of the .css files has changed drastically.  What used to
  be a collisionful (and error-accepting) hash is now a 64-bit hash
  that is (probably) nearly error free, as it's also tagged with the
  full 64-bit feature value; if two values clash as to what bucket
  they would like to use, proper overflow techniques keep them from
  both using the same bucket.  Bucket values were maxxed at 255 (they
  were bytes) now they're 32-bit longs, so you are _highly_ _unlikely_ to max
  out a bucket.  These two changes make things significantly more
  robust.  

  These changes also make it possible (in fact, trivial) to
  resize (both upward and downward!), compress, optimize, and
  do other very useful things to .css files.  Right now, the only
  supported operation is to _merge_ one .css file onto another... but
  the good news is that now these files can be of different sizes!
  So, the VERY good news is that you can look at your .css files with
  cssutil, decide if (or where) you want to zero out less significant
  data, and then use dd to create a blank, new outfile.css file that will
  be about half to 2/3 full, then use cssmerge outfile.css infile.css to 
  merge your infile.css into the outfile.css.  

  This will be a real help for people who have (or need) very large OR very 
  small .css files.  :)

  You can create the blank .css file with the command 'dd' as in:

 dd bs=12 count=<number of feature buckets desired> if=/dev/zero of=mynew.css

  (the bs=12 is because the new feature buckets are 12 bytes long)
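
  If dd is not at hand, the same blank file can be produced by any
  program that writes 12 zero bytes per bucket.  A minimal Python
  sketch follows; the function name make_blank_css() and the chunk
  size are illustrative choices, not part of the CRM114 kit.

```python
import os
import tempfile

BUCKET_BYTES = 12  # per the note above: new feature buckets are 12 bytes

def make_blank_css(path: str, n_buckets: int, chunk_buckets: int = 65536) -> int:
    """Write n_buckets all-zero buckets to path; return the file size."""
    zero_chunk = bytes(BUCKET_BYTES * chunk_buckets)
    remaining = BUCKET_BYTES * n_buckets
    with open(path, "wb") as f:
        # Write in chunks so a large table does not need one huge buffer.
        while remaining > 0:
            n = min(remaining, len(zero_chunk))
            f.write(zero_chunk[:n])
            remaining -= n
    return os.path.getsize(path)

# Demo in a throwaway directory: 1000 buckets -> 12000 bytes of zeros.
with tempfile.TemporaryDirectory() as tmp:
    size = make_blank_css(os.path.join(tmp, "mynew.css"), 1000)
    print(size)
```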

  Because chain overflowing is done "in table, in sequence" you can't 
  have more features than your table has feature buckets.  You'll get a
  trappable error if you try to exceed it. 
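
  To picture "in table, in sequence" overflow, here is a toy Python
  sketch of a fixed-size table where a clashing feature moves on to the
  next free bucket.  This is only a model under assumptions: the class
  and exception names are invented, Python's hash() stands in for the
  real hash, and the reserved bucket 0 is ignored.  It is not the
  actual .css code.

```python
class TableFull(Exception):
    """Raised when every bucket is already taken by another feature."""

class FeatureTable:
    def __init__(self, n_buckets: int):
        self.keys = [None] * n_buckets   # which feature owns each bucket
        self.counts = [0] * n_buckets    # how often it has been seen

    def add(self, feature: int) -> int:
        n = len(self.keys)
        start = hash(feature) % n
        # Probe buckets in sequence, wrapping around the table once.
        for step in range(n):
            i = (start + step) % n
            if self.keys[i] is None:      # free bucket: claim it
                self.keys[i] = feature
                self.counts[i] = 1
                return i
            if self.keys[i] == feature:   # same feature: bump its count
                self.counts[i] += 1
                return i
        raise TableFull("more features than buckets")

table = FeatureTable(4)
for feature in (1, 5, 9, 13):   # all four clash on the same start bucket
    print(feature, "->", table.add(feature))
```

  Adding a fifth distinct feature to this four-bucket table raises
  TableFull, which mirrors the rule that you cannot hold more features
  than the table has buckets.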

  Minor nit- right now, feature bucket 0 is reserved for version 
  info- but it's never used (left as all 0's).  That's no major 
  hassle, but just-so-you-know... :)

  2) A major error in error trapping has been corrected.  TRAPs can now
  nest at least vaguely correctly; a nonfatal trap that is bounced does
  not turn into a fatal.  Also, the :_fault: variable is gone, each
  TRAP now specifies its own fault code.

  This isn't to say that error trapping is now perfect, but it's a 
  darn sight better than it was before.

  3) term rewriting on the matched or learned text is now supported; this
  will mean significant gains in out-of-the-box accuracy as well as keeping
  your mail gateway name from becoming a spam word.  :)  Far more fancy
  rewritings can be implemented, if you should choose.

  The rewriting rules are in rewrites.mfp - YOU must edit this to match
  your local and network mailer service configuration, so that your
  email address, email name, local email router, and local mail router
  IP all get mapped to the same strings as the ones I built the
  distribution .css files with.

  4) Minor bugs - a minor bug (inaccurate edge on matching) for 
  the polynomial; annoying segfault on insert files that ended with
  '#' that were immediately followed by a { in the main program was fixed;

  5) a new utility is provided - rewriteutil.crm.  This utility can do
  string rewriting for whatever purpose you need.  I personally use it 
  to "scrub" the spam and nonspam text files; the file
  scrub_mailfile_rewrites.mfp contains an (expurgated) set of rewrite
  rules that I use.  You will need to edit scrub_mailfile_rewrites.mfp
  to put your account name and password in, otherwise you'll be using
  mine (and losing accuracy)

  For examples on the term rewriting, both in the mailfilter and in 
  the standalone utility rewriteutil.crm, just look at the example/test
  code in rewritetest.crm (which uses the rewrite rules in test_rewrites.mfp)

Version July 1, 2003 alpha release.

 This is a further major bugstomping release.  The .css files are
 expanded to 8 megabytes to decrease the massive hash-clashing that has
 occurred.  UNION and INTERSECTION now work as described in the
 (updated) quickref.txt, with the (:out:) var in parens and the [:in1:
 :in2: ...] vars in boxes.  A major bug in LEARN and CLASSIFY has been
 stomped; however this is a "sorta incompatible" change and you are
 encouraged to rebuild your .css files with a hundred Kbytes or so of
 prime-grade spam and nonspam (which has been stored for you in
 spamtext.txt and nonspamtext.txt).  The included spam.css and
 nonspam.css files are already rebuilt for the corrected bug in LEARN
 and CLASSIFY.  These .css files are also completely fresh and new;
 I restarted learning about a week ago and they're well into the 99.5% 
 accuracy range.  

Version June 23, 2003 alpha release.

 This is a major bugstomping release.  <fromstart> <fromcurrent> and
 <backwards> now seem to work more like they are described to work.
 The backslash escapes now are cleaner; you may find your programs work
 "differently" but it _should_ be backward-compatible.  The
 preprocessor no longer inserts random carriage returns.  A '\' at the
 end of a line is a continuation onto the next line.  Mailfilter now
 can be configured for separate exit codes on "nonspam", "spam" and
 "problem with the program".  Exit codes on CRM114 itself have been
 made more appropriate; compiler errors and untrapped fatal faults now
 give an error exit code.  Additionally, FAULT and TRAP are scrubbed,
 and the documentation made more accurate.

June 10, 2003 news:

        This new version implements the new FAULT / TRAP semantics, 
        so user programs can now do their own error catching and
        hopefully error fixups.  Incomplete statements are now flagged
        (a little bit) better.

        Texts are now Base64-expanded and decommented before being learned

        There's a bunch of other bugfixes as well.

        Default window size is dropped to 8 megs, for compatibility
        with HPUX (change this in crm114_config.h).

June 01, 2003 news:

        the ALIUS statement - provides if/then/else and switch/case
        capabilities to CRM114 programmers.  See the example code in
        aliustest.crm to get some understanding of the ALIUS statement.

        the ISOLATE statement - now takes a /:*:initial: value / for the
        freshly isolated variable.  

        Mailfilter.crm is now MUCH more configurable, including inserting
        X-CRM114-Status: headers and passthru modes for Procmail, configurable
        verbosity on statistics and expansions, inserting trigger 'ADV:' tags
        into the subject line, and other good integration stuff.

        Overall speed has improved significantly - mailfilter is now about
        four times FASTER than SpamAssassin with no loss of accuracy.

        bugfix - we now include Ville Laurikari's TRE regexlib version 0.5.3
        with CRM114; using it is still optional ("make experimental") 
        but it's the recommended system if your inputs include NULL bytes.

        bugfix - OUTPUT to non-local files now goes where it claims, 
        it should no longer be necessary to pad with a bunch of spaces.

        yet more additions to the .css files


April 7th, 2003 version:

 0) We're now up to "beta test quality"... no more "alpha"
 quality level.   This is good.  :-)

 1) As always, lots of bugfixes.  And LOTS of thanks from all of you
 poor victims out there.  We've reached critical mass to the point now
 where I'm even getting bug _fix_ suggestions; this is great!  

 If you do make a bug report or a bugfix suggestion, please include not
 only the version of CRM114 you're running, but also the OS and version
 of that OS you're running.  I've seen people porting CRM114 to Debian,
 to BSD, to Solaris, and even to VMS... so please let me know what
 you're running when you make a bug report.  PLEASE PUT AT LEAST THE
 CRM114 VERSION IN THE SUBJECT LINE.

 2) We now have an even better 'mailfilter.crm' .  Even with the highly
 evolved spam in the last couple of, we're still solidly above 99%
 (averaging around 99.5%).  (it's clear that the evolution is due
 to the pressures brought by Bayesian filters like CRM114)... some
 of these new spams are very, VERY good.  But we chomp 'em anyway.  :-)

 3) The new metaflag "--" in a CRM114 command line flags the
 demarcation between "flags for CRM114" and "flags for the user program
 to see as :_argN:". Command line arguments before the "--" are seen
 only by CRM114; arguments after the "--" are seen only by the user
 program.
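
 The split can be pictured with a few lines of Python.  This is only a
 sketch of the rule as stated: split_args() is an invented name, and
 what happens when no "--" is given is not described here, so the
 sketch simply puts everything in the first list in that case.

```python
def split_args(argv):
    """Return (args for CRM114, args for the user program)."""
    if "--" in argv:
        i = argv.index("--")          # first "--" is the demarcation
        return argv[:i], argv[i + 1:]
    # Assumption: with no "--", treat everything as engine arguments.
    return list(argv), []

engine_args, program_args = split_args(["-p", "--", "--learnspam"])
print(engine_args)    # arguments before "--"
print(program_args)   # arguments after "--"
```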

 4) EXPERIMENTAL DEPARTMENT: We now have better support for the
 8-bit-clean, approximate-capable TRE regex engine.  It's still
 experimental, but we now include TRE 0.5.1 directory in this kit; you
 can just go into that subdirectory, do a ./configure, a make, and a
 make install there, and you'll have the TRE regex engine installed
 onto your machine (you need to be root to do this).  Then go back up
 to the main install directory, and do a "make experimental" to compile
 and install the experimental version as /usr/bin/crma (the 'a' is for
 'approximate regex support').

 Using the experimental version 'crma' WILL NOT AFFECT the 
 main-line version 'crm'; both can coexist without any problems.

 To use the approximate regex support (only in version 'crma') just add a
 second slashed string to the MATCH command.  This string should contain 
 four numbers, in the order SIMD (which every computer hacker should
 remember easily enough).  The four integers are the:

        Substitution cost,
        Insertion cost
        Maximum cost
        Deletion cost

 in an approximate regex match.  If you don't add the second slash-delimited
 string, you get ordinary matching.

 Example:

        match  /foobar/    /1 1 1 1/

 means match for the string "foobar" with at most one substitution,
 insertion, or deletion.
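
 To get a feel for what the four SIMD numbers mean, here is a small
 Python sketch that scores a candidate string against a pattern with
 per-edit costs and a maximum total cost.  It is a simplification under
 assumptions: it compares two whole literal strings with a weighted
 edit distance, whereas TRE does approximate matching of regexes inside
 a larger text.  The function names are invented for this note.

```python
def parse_simd(spec: str):
    """Parse 'S I M D': substitution, insertion, maximum, deletion cost."""
    sub, ins, max_cost, dele = (int(x) for x in spec.split())
    return sub, ins, max_cost, dele

def edit_cost(pattern: str, text: str, sub: int, ins: int, dele: int) -> int:
    """Cheapest cost of turning pattern into text with the given costs."""
    prev = [j * ins for j in range(len(text) + 1)]  # empty pattern: insert all
    for i in range(1, len(pattern) + 1):
        cur = [i * dele]                            # empty text: delete all
        for j in range(1, len(text) + 1):
            same = pattern[i - 1] == text[j - 1]
            cur.append(min(prev[j - 1] + (0 if same else sub),  # keep/substitute
                           cur[j - 1] + ins,                    # extra char in text
                           prev[j] + dele))                     # pattern char missing
        prev = cur
    return prev[-1]

def approx_match(pattern: str, text: str, spec: str) -> bool:
    sub, ins, max_cost, dele = parse_simd(spec)
    return edit_cost(pattern, text, sub, ins, dele) <= max_cost

for candidate in ["foobar", "fobar", "foobaz", "fooobar", "fbar"]:
    print(candidate, approx_match("foobar", candidate, "1 1 1 1"))
```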

 This syntax will eventually improve- like the makefile says, this
 is an experimental option.  DO NOT ASSUME that this syntax will not
 change TOTALLY in the near future.  

 DO NOT USE THIS for production code.

 4) Yet further improvements to the debugger.

 5) Further improvements to the classifier and the shipped .css files.

 6) The "stats" variable in a CLASSIFY statement now gives you an extra
 value- the pR number.  It's pR for the same reason pH is pH - it gives an
 easy way to express very large numeric ratios conveniently.

 The pR number is the log base 10 of the .css matchfile signal strength
 ratios; it typically ranges from +350 or so to -350 or so.  If you're
 writing a system that uses CRM114 as a classifier, you should use pR
 as your decision criterion ( as used by mailfilter.crm and
 classifymail.crm, pR values > 0 indicate nonspam, <0 indicates spam )

 If you want to add a third classification, say "SPAM/UNSURE/NONSPAM",
 use something like pR > 100 for nonspam, between +100 and -100 for
 unsure, and < -100 for spam.  CAMRAM users, take note.  :)
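
 A system built around pR might apply those thresholds roughly as in
 the Python sketch below.  The names pr() and verdict() are invented,
 and taking the ratio as nonspam strength over spam strength is an
 assumption chosen so that positive pR means nonspam, as stated above.
 The real values come from the CLASSIFY "stats" variable.

```python
import math

def pr(nonspam_strength: float, spam_strength: float) -> float:
    # Assumed orientation: nonspam over spam, so pR > 0 means nonspam.
    return math.log10(nonspam_strength / spam_strength)

def verdict(p: float, margin: float = 100.0) -> str:
    """Three-way decision: SPAM / UNSURE / NONSPAM around +/- margin."""
    if p > margin:
        return "NONSPAM"
    if p < -margin:
        return "SPAM"
    return "UNSURE"

for value in (250.0, 40.0, -3.5, -180.0):
    print(value, verdict(value))
```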

 6) The functionality of 'procmailfilter.crm' has been merged back into
 mailfilter.crm, classifymail.crm, learnspam.crm and learnnonspam.crm.
 Do NOT use the old "procmailfilter.crm" any more - it's buggy,
 booger-filled, and unsupported from now on.  PLEASE PLEASE
 PLEASE don't use it, and if you have been using it, please stop now!


Jan 28th release news

 Many thanks to all of you who sent in fixes, and taught me some nice
 programming tricks on the side.

 0) INCOMPATIBLE CHANGES: 

   a) INCOMPATIBLE (but regularizing) change: Input took from the file
      [this-file.txt] but output went to (that-file.txt); this was a
      wart and is now fixed; INPUT and OUTPUT both now use the form of

        INPUT  [the-file-in-boxes.txt]

      and

        OUTPUT [the-file-in-boxes.txt]

   b) INCOMPATIBLE (but often-requested) change: You don't need to say
      "#insert" any more.  Now it's just ' insert ', with no '#' .  Too many
      people were saying that #insert was bogus, and it was too easy to
      get it wrong.  Now, insert looks like all other statements;
    
            insert yourfilenamehere.crm

   c) The gzip file no longer unpacks into "installdir", but into
      a directory named crm114-<versionnumber> .   

 1) BUGFIXES: bugs stomped all over the place - debugger bugs (now the
 debugger doesn't go into lalaland if an error occurs in a batch file),
 infinite loop on bogus statements fixed, debugger "n" not doing the
 right thing), window statement cleaned and now works better, '\' now
 works correctly even in /match patterns/, default buffer length is now 16 
 megabytes (!), the program source file is now opened readonly.

 2) 8-BIT-CLEAN: code cleanups and reorganizations to make CRM114
 8-bit-cleaner; There may be bugs in this (may?  MAY?) but it's a
 start.  (note- you won't get much use of this unless you also turn on
 the TRE engine, see next item.)

 3) REGEX ENGINES: the default regex engine is still GNU REGEX (which
 is not 8-bit-clean) but we include the TRE regex engine as well (which
 is not only 8-bit-clean, but also does approximate regexes).  TRE is
 still experimental, you will need to edit crm114_config.h to turn it
 on and then rebuild from sources.  Do searches of www.freshmeat.net to
 see when the next rev of TRE comes out. 

 4) SUBPROCESSES: Spawned minion buffers are now sized as a fraction of
 the data window size, so programs don't die on overlength buffers if
 they copy a full minion output buffer into a non-empty main data window.
 The default size is scaled to the size of the main data buffers,
 currently 1/8th of a data buffer.  With the new default of a 16-meg
 allocate-on-the-fly data buffer, that means your subprocesses can
 spout up to 2 megs of data before you need to think about using
 asynchronous processes.
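
 The sizing arithmetic in item 4 can be checked with a short sketch.
 This is Python used purely for illustration, not CRM114 code; the
 names are made up for this example, and "meg" is assumed to mean
 2^20 bytes.

```python
# Illustrative only: these names are hypothetical, not CRM114 identifiers.
MEG = 1024 * 1024                 # assumption: "meg" means 2**20 bytes
DATA_WINDOW_BYTES = 16 * MEG      # new default data buffer size
MINION_DIVISOR = 8                # minion buffer is 1/8th of a data buffer


def minion_buffer_bytes(data_window_bytes, divisor=MINION_DIVISOR):
    """Return the minion output buffer size for a given data window size."""
    return data_window_bytes // divisor


if __name__ == "__main__":
    print(minion_buffer_bytes(DATA_WINDOW_BYTES) // MEG, "megs")
```

 If the data window is made larger, the minion buffer grows with it,
 since the 1/8th fraction stays fixed in this sketch.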

 5) The debugger now talks to your tty even if you've redirected stdin
 to come from a data file.  EOF on the controlling tty exits the
 program, so -d nnnn sets an upper limit on the number of cycles an
 unattended batch process will run before it exits.  (This was added
 because I totally hosed my mailserver with an infinite loop.  Quite
 the "learning experience", but I advise against it.)

 6) An improved tokenizer for mail filtering.  You can pick any of 

 7) Option for exit codes for easy ProcMail integration, so the old
 "procmailfilter.crm" file goes away; it's no longer necessary to
 have that code fork.

 8) For those of you who want easier integration with your local mail
 delivery software, without all the hassle of configuring
 mailfilter.crm, there are three new very bare-bones programs, meant to
 be called from Procmail.  These do NOT use the blacklist or whitelist
 files, nor can they be remotely commanded like the full
 mailfilter.crm:

        learnspam.crm
        learnnonspam.crm
        classifymail.crm

  * learnspam.crm < some-spam.txt 

       will learn that spam into your current spam.css database.  Old
        spam stays there, so this is an "incremental" learn.

  * learnnonspam.crm < some-non-spam.txt 

       will learn that nonspam into your current nonspam.css database.  Old
        nonspam stays there, so this is an "incremental" learn.

  * classifymail.crm < mail-message.txt 

       will do basic classification of text.  This code doesn't
        do all the advanced things like base-64 armor-piercing or
        html comment removal that mailfilter.crm does, and so it
        isn't as accurate, but it's easier to understand how to set
        it up and use it.  Classifymail.crm returns a 0 exit code on
        nonspam, and a 1 exit code on spam.  Simple, eh?  Classifymail
        does NOT return the full text of the message; you need to
        get that another way (or modify classifymail.crm to
        output it - just put an "accept" statement right before
        the two "output ..." statements and you'll get the full
        incoming text, unaltered).
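
 For callers other than Procmail, the exit-code convention described
 for classifymail.crm could be consumed roughly as follows.  This is a
 hedged Python sketch, not part of the CRM114 distribution: the
 function names and the default command tuple are assumptions, and
 only the 0 = nonspam / 1 = spam mapping comes from the notes above.

```python
import subprocess

# Exit codes as described above: 0 means nonspam, 1 means spam.
EXIT_CODE_LABELS = {0: "nonspam", 1: "spam"}


def label_for_exit_code(code):
    """Map a classifymail.crm exit code to a label; anything else is 'error'."""
    return EXIT_CODE_LABELS.get(code, "error")


def classify_message(message_bytes, command=("classifymail.crm",)):
    """Feed one message on stdin and return (label, original message).

    `command` is an assumption; adjust it to however classifymail.crm is
    invoked on your system.  The message is handed back unchanged because
    classifymail.crm itself does not return the full text.
    """
    result = subprocess.run(command, input=message_bytes,
                            stdout=subprocess.DEVNULL)
    return label_for_exit_code(result.returncode), message_bytes
```

 Keeping the original bytes in the caller is one way to "get that
 another way" without modifying classifymail.crm.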

November 26, 2002:

 NEW Built-in Debugger - the "-d" flag at the end of the command line
 puts you into a line-oriented high-level debugger for CRM114 programs.

 Improved Classifier - the new classifier math is giving me > 99.92%
 accuracy (N+1 scaling).  In other words, once the classifier is
 trained to your errors, you should see less than one spam per
 thousand sneak through.
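
 The "less than one per thousand" figure follows from the quoted
 accuracy if that accuracy is read as an overall error rate, which is
 an assumption of this sketch.  The Python below is illustrative
 arithmetic only, and the function name is made up.

```python
def misses_per_thousand(accuracy_percent):
    """Convert an accuracy percentage into expected errors per 1000 messages."""
    return (100.0 - accuracy_percent) * 10.0


if __name__ == "__main__":
    # 99.92% accuracy leaves 0.08% errors, i.e. about 0.8 per thousand.
    print(round(misses_per_thousand(99.92), 2))
```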

 Bug fixes - the code base now should compile more cleanly on newer
 systems that have IEEE float.h defined.

 Security fix - a non-exploitable buffer overflow was fixed.

 Documentation fixes - serious doc errors were fixed.

Nov 8th 2002 version

        *) Procmail users: a version of mailfilter.crm specifically
           set up for calling from inside procmail is included-
           see the file "procmailfilter.crm" for the filter, and
           "procmailrc.recipe" for an example recipe of how to call it.
           (courtesy Craig Hagan) 

        *) Bayesian Chain Rule implemented - scoring is now done
           in a much more mathematically well-founded way.  
           Because of this, you may see some retraining required,
           but it shouldn't be a lot.  Users that couldn't 
           use my pre-supplied .css files should delete the supplied
           .css files and retrain from their own spamtext.txt and 
           nonspamtext.txt files.

        *) classifier polynomial calculation has been improved but is
           compatible with previous .css files. 

        *) -s will let you change the default size for creating new 
           .css files (needed only if you have HUGE training sets.)
           Rule of thumb: the .css files should be at least 4x the
           size of the training set.

        *) Multiple .css files will now combine correctly - that is,
           if you have categorized your mail into more than "spam" and
           "nonspam", it now works correctly.  Ex: You might create categories
           "beer", "flames", "rants", "kernel", "parties",
           and "spam", and all of these categories will plug-and-play 
           together in a reasonable way.

        *) speed and correctness improvements - some previously fatal
           errors can now be corrected automagically.
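
 The 4x rule of thumb for .css sizing can be written down as a small
 sketch.  This is illustrative Python, not CRM114 code; the function
 name is hypothetical, and the units that -s expects are not stated
 here, so the sketch works in plain bytes only.

```python
def min_css_bytes(training_set_bytes, factor=4):
    """Suggested lower bound for a .css file: `factor` times the training set."""
    return factor * training_set_bytes


if __name__ == "__main__":
    # A 5,000,000-byte training set suggests a .css file of at least 20,000,000 bytes.
    print(min_css_bytes(5_000_000))
```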

Oct 31 2002

 Bayesian Chain Rule implemented - scoring is now done in a much more
 mathematically well-founded way.  Because of this, you may see some
 retraining required, but it shouldn't be a lot.  Users that couldn't
 use my pre-supplied .css files should delete the supplied .css files
 and retrain from their own spamtext.txt and nonspamtext.txt files.
 Classifier polynomial calculation has been improved but is compatible
 with previous .css files.  -s will let you change the default size for
 creating new .css files (needed only if you have HUGE training sets.)
 Rule of thumb: the .css files should be at least 4x the size of the
 training set.  Multiple .css files will now combine correctly - that
 is, if you have categorized your mail into more than "spam" and
 "nonspam", it now works correctly.  Ex: You might create categories
 "beer", "flames", "rants", "kernel", "parties", and "spam", and all of
 these categories will plug-and-play together in a reasonable way, e.g.

  classify (flames.css rants.css spam.css | beer.css parties.css kernel.css)

 will split out flames, rants, and spam from beer, parties, and
 linux-kernel email.  (I don't supply .css files for anything but spam
 and nonspam, though.)  Lastly, there are some new speed and correctness
 improvements - some previously fatal errors can now be corrected
 automagically.

Oct 21 2002

    Improvements everywhere - a new symmetric declensional parser,
    a much more powerful and accurate sparse binary polynomial hash
    system (sadly, incompatible - if you LEARNed new data into the
    .css files, you must use learntest.crm to LEARN the new data into
    the new .css files, as the old files used a less effective polynomial).
    Also, many bugfixes including buffer overflows fixed, -u to change
    user, -e to ignore environment variables, optional [:domain:]
    restrictions allowed on LEARN and CLASSIFY, status output on
    CLASSIFY, and exit return codes.  Grotty code has
    been removed, the Remote LEARN invocation is now cleaned up, and
    CSSUTIL has been scrubbed up.

Oct 5 2002

    Craig Rowland pointed out a possible buffer exploit - it's
    been fixed.  In the process, the -w flag now boosts all intermediate
    calculation text buffers as well, so you can do some big big things
    without blowing the gaskets.  :)


