Reference: File Scores.hst

Purpose:

The rules in the file "Scores.hst" determine which articles are loaded immediately, which are logged in the Killfile-Log, and which are ignored completely when pulling new news.

This filtering uses a technique called "scoring". In Hamster, each article starts with a score-value of zero and then gains or loses points for each score-rule it matches.

The final score-value then determines whether the article is loaded. If the value is greater than or equal to zero (>=0), the article is loaded immediately. Otherwise (<0), it is not loaded but recorded in the Killfile-Log. To keep this logfile small and clear, you can additionally set a score-limit, which suppresses log entries for articles with a "very low" score-value.
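
The three possible outcomes can be sketched in Python as follows. This is only an illustration, not Hamster's code: the function name "classify" and the parameter "log_limit" (standing in for the configurable score-limit) are hypothetical, and whether a score exactly at the limit is still logged is an assumption.

```python
def classify(score, log_limit):
    """Return what happens to an article with the given final score.

    log_limit is a hypothetical stand-in for the configurable score-limit.
    Assumption: a score exactly at the limit is still logged.
    """
    if score >= 0:
        return "load"    # loaded immediately
    if score >= log_limit:
        return "log"     # not loaded, but recorded in the Killfile-Log
    return "ignore"      # "very low" score: neither loaded nor logged
```

With a limit of -9999, for example, a score of -10 leads to a log entry, while a score of -10000 is ignored without one.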

Syntax Overview:

	ScoreFile       = *( ScoreBlock / cEOL )

	ScoreBlock      = ScoreScope *( ScoreRule / cEOL )

	ScoreScope      = "[" ScopePattern *( 1*WSP ScopePattern ) "]" cEOL
	ScopePattern    = [ "+" / "-" ] Pattern

	ScoreRule       = ["?"] ["="] ScoreValue 1*WSP ScoreSelection cEOL
	ScoreValue      = ( "+" / "-" ) <Number>

	ScoreSelection  = [ScoreSpecial] ScoreDefField 1*( 1*WSP ScorePattern )
	ScoreSpecial    = "unless" 1*WSP

	ScoreDefField   = [ "~" ] ScoreField
	ScorePattern    = ["+"/"-"] [ "@" ScoreField ":" ] Pattern
	ScoreField      = ( "Number" / "Subject" / "From" / "Date" /
	                    "Message-ID" / "References" / "Bytes" /
	                    "Lines" / "Xref" / "Xpost" / "Age" ) [":"]

	Pattern         = ( PatRegExp / PatSimple )
	PatRegExp       = "{" <PCRE-style regex-pattern> "}"
	PatSimple       = ( PatSimpleAll / PatSimpleText / PatSimpleNumber )
	PatSimpleAll    = "*"
	PatSimpleText   = """ <Text> """
	PatSimpleNumber = "%" ( "<" / "=" / ">" ) <Number>

	cEOL            = [ "#" <Comment> ] CRLF

Score Scope:

Each section starts with a "[...]"-header describing the newsgroup names for which the following score-lines should be tested:

	[*]
	# score-lines for all groups

	[* -".announce"]
	# score-lines for all groups except those containing ".announce"

	["news" "usenet"]
	# score-lines for all groups containing "news" or "usenet".

	[{^news\.} {^alt\.usenet\.}]
	# score-lines for all groups starting with "news." or "alt.usenet."

Two special keywords are also available: $POST$ applies to messages posted to a newsgroup with a newsreader, and $FEED$ applies to messages injected by other servers with feed commands:

	[$POST$]
	# score-lines for posted messages

	[$FEED$]
	# score-lines for fed messages

The patterns within "[...]" follow the same rules as the "Score-Patterns" described below.

Score Rules (tested BEFORE loading an article):

The score-value for a tested article is raised with "+"- and lowered with "-"-values:

	+100 subject "hamster"
	-100 subject "make money fast"

If a matching score-line is preceded by "=", the score-value is set to the given value and no further tests are made for this article:

	=+9999 from "my.mail@address"
	=-9999 from "spam.mail@address"
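
The interplay of "+"/"-" rules and "=" rules can be sketched in Python as follows. The function name "final_score" and the tuple layout are hypothetical, and the sketch assumes that rules are tested in the order in which they appear in the file.

```python
def final_score(rules):
    """Compute the final score from an ordered list of rules.

    Each rule is a tuple (is_final, value, matched):
      is_final -- True for rules written with a leading "="
      value    -- the signed score-value of the rule
      matched  -- whether the rule matched the article
    Assumption: rules are tested in file order.
    """
    score = 0
    for is_final, value, matched in rules:
        if not matched:
            continue
        if is_final:
            return value   # "=" rule: set the score and stop testing
        score += value     # ordinary rule: raise or lower the score
    return score
```

A matching "=" rule therefore overrides everything accumulated before it and hides every rule after it.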

The scoreable fields depend on the overview-information returned by the newsserver ("XOVER"). In most cases, the following fields are available for scoring: Subject, From, Date, Age, Message-ID, References, Bytes, Lines, Xref, Xpost:

	+100 subject "hamster"
	-100 from {no.*spam}
	+500 message-id "my.unique.fqdn"
	+100 references "my.unique.fqdn"
	-100 bytes %>10000
	-100 lines %>250
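
Numeric patterns such as "%>10000" compare the field value with the given number. A minimal Python sketch of this comparison, following the PatSimpleNumber rule of the syntax overview (the function name "number_matches" is hypothetical):

```python
import re


def number_matches(pattern, value):
    """Test an integer field value against a pattern like "%>10000"."""
    m = re.fullmatch(r"%([<=>])(\d+)", pattern)
    if m is None:
        raise ValueError(f"not a numeric pattern: {pattern!r}")
    op, limit = m.group(1), int(m.group(2))
    if op == "<":
        return value < limit
    if op == "=":
        return value == limit
    return value > limit
```

Note that "%>250" is a strict comparison in this sketch: an article with exactly 250 lines does not match.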

The fictitious header-field Xpost is based on Xref and gives the number of groups the article was crossposted to:

	-10 xpost %>2     # posted to more than 2 groups
	=-9999 xpost %>5  # posted to more than 5 groups
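
How such a count can be derived from Xref is sketched below in Python. An Xref value lists the server name followed by one "group:number" entry per newsgroup. The function name "xpost_count" is hypothetical, and Hamster's own calculation may differ in detail.

```python
def xpost_count(xref):
    """Count the newsgroups listed in an Xref value.

    The value has the form "server group:number group:number ...".
    A leading "Xref:" field name, if present, is skipped.
    """
    if xref.lower().startswith("xref:"):
        xref = xref[len("xref:"):]
    entries = xref.split()[1:]   # the first word is the server name
    return sum(1 for entry in entries if ":" in entry)
```

An article with the Xref value "news.example.com alt.test:1234 misc.test:567 de.test:9" would have an Xpost value of 3 in this sketch.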

The fictitious header-field Age is based on Date and gives the age of the article in days:

	=-9999 age %>14   # ignore all articles older than 14 days
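
A possible way to derive such an age from a Date header is sketched below in Python. The function name "age_in_days" is hypothetical; counting only complete days and treating dates without a time zone as UTC are assumptions, not documented Hamster behaviour.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime


def age_in_days(date_header, now):
    """Return the age of an article in complete days.

    date_header -- value of the Date header, e.g.
                   "Mon, 01 Jan 2024 12:00:00 +0000"
    now         -- timezone-aware datetime to compare against
    """
    posted = parsedate_to_datetime(date_header)
    if posted.tzinfo is None:
        # Assumption: a date without a usable time zone is taken as UTC.
        posted = posted.replace(tzinfo=timezone.utc)
    return (now - posted).days
```

An article dated 1 January, tested on 20 January at the same time of day, is 19 days old here and would match "age %>14".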

If a fieldname is preceded by "~", the value of the field is MIME-decoded before testing. The decoding then also applies to any additional "@"-fields on the same line:

	+100 ~subject "hämstêr"
	-100 ~from "jürgen" +@subject:"hämstêr"
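
Why the decoding matters: a header containing non-ASCII characters is usually transmitted in MIME-encoded form, so a pattern like "hämstêr" cannot match the raw value. The following Python sketch illustrates the decoding step with the standard library; the function name "mime_decode" is hypothetical and this is not Hamster's implementation.

```python
from email.header import decode_header, make_header


def mime_decode(raw):
    """Decode MIME encoded-words in a raw header value to plain text."""
    return str(make_header(decode_header(raw)))


raw_subject = "=?ISO-8859-1?Q?h=E4mst=EAr?="
print("hämstêr" in raw_subject)               # False: raw value is encoded
print("hämstêr" in mime_decode(raw_subject))  # True: decoded value matches
```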

If a rule is preceded by the keyword "unless", its result is inverted, i.e. the line matches if the given rule does NOT match:

	-42 unless Subject: "Hamster"

Score Rules (tested AFTER loading an article):

Before loading an article, only a reduced set of header lines can be tested (see previous section). Once the article has been loaded, additional rules can be applied without this restriction: all headers, and even the body of the message, can be tested.

Rules that should be tested after the article has been loaded must be marked with a leading question mark:

	?+42 User-Agent "hamster"
	?=-9999 Path "!known.spamm.er!not-for-mail"

Of course, such rules can no longer decide whether the article should be loaded, since it has already been loaded. Their purpose is to decide what to do with the loaded article.

Again, the score value starts at 0 and can be raised or lowered by such "?" rules. If the final score value is 0 or greater, the article is stored in the database. If it is below 0, the article is not stored but simply dropped. The main purpose of these special "?" rules is therefore to get rid of really unwanted articles that could not be detected by the normal rules described in the previous section.

Besides the names of any header lines, you can also use some special names in "?" rules:

	?+42 Header "hamster"
	?+42 Body "hamster"
	?+42 Article "hamster"

Header checks whether any header line matches the given patterns, Body checks all body lines, and Article checks all header and body lines.
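
Which lines each name selects can be sketched in Python as follows. The function name "lines_for" is hypothetical, folded (multi-line) headers are not handled, and testing complete header lines for "Header" but only the value for a named header is an assumption.

```python
def lines_for(name, article):
    """Return the lines a "?" rule with the given field name would test.

    article is the raw article text: header lines, an empty line, body.
    """
    text = article.replace("\r\n", "\n")
    head, _, body = text.partition("\n\n")
    header_lines = head.splitlines()
    body_lines = body.splitlines()

    key = name.rstrip(":").lower()
    if key == "header":
        return header_lines
    if key == "body":
        return body_lines
    if key == "article":
        return header_lines + body_lines
    # Any other name: the values of all header lines with that name.
    prefix = key + ":"
    return [line[len(prefix):].strip()
            for line in header_lines
            if line.lower().startswith(prefix)]
```

A rule such as "?+42 Body ..." would then be tested against the lines returned by lines_for("Body", article).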

Score Patterns:

Patterns without a leading "+"- or "-"-sign mean that at least one of them must match:

	# "hamster" or "newsserver" or "mailserver"
	+1 subject "hamster" "newsserver" "mailserver"

Patterns with a leading "+"-sign mean that the field must contain this value:

	# "hamster" in combination with "newsserver" or "mailserver"
	+1 subject +"hamster" "newsserver" "mailserver"

Patterns with a leading "-"-sign mean that the field must not contain this value:

	# "newsserver" or "mailserver", but not "unix", "linux" or "inn"
	+1 subject "newsserver" "mailserver" -"unix" -"linux" -"inn"
	# From-headers not containing "@"
	=-9999 from -"@"
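
The combination of unsigned, "+"- and "-"-patterns can be sketched in Python as follows. Only simple text patterns are modelled; the function name "patterns_match" is hypothetical, and case-insensitive substring matching is an assumption.

```python
def patterns_match(text, patterns):
    """Decide whether a field value satisfies a list of text patterns.

    patterns is a list of (sign, value) pairs, where sign is "" for an
    unsigned pattern, "+" for a required and "-" for a forbidden one.
    Assumption: values are compared as case-insensitive substrings.
    """
    text = text.lower()

    def hit(value):
        return value.lower() in text

    unsigned = [v for s, v in patterns if s == ""]
    required = [v for s, v in patterns if s == "+"]
    forbidden = [v for s, v in patterns if s == "-"]

    if any(hit(v) for v in forbidden):
        return False   # a "-" pattern was found
    if not all(hit(v) for v in required):
        return False   # a "+" pattern is missing
    # Unsigned patterns: at least one must match, if there are any.
    return not unsigned or any(hit(v) for v in unsigned)
```

For the rule with +"hamster" "newsserver" "mailserver", the subject "Hamster as newsserver" matches, while "Hamster rocks" does not, because none of the unsigned patterns is found.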

To combine different header-fields in one score-line, you can qualify a pattern with its field name ("@field:"):

	-1 subject "help" "urgent" "!!!" -@from:"my@address" -"SCNR"

If a score-pattern is placed within "{...}", it is treated as a PCRE-style regular expression[*]:

	# Ignore those who want to be ignored:
	-1 from {no.?spam} {(remove|delete|cut).*this}

[*] Perl-documentation for regular expressions can be found at:

http://www.perl.com/CPAN-local/doc/manual/html/pod/perlre.html
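
The two expressions from the example above can be tried out with the following Python sketch. Python's "re" module is not PCRE, but it behaves the same for these simple expressions. The function name "regex_matches" is hypothetical, and searching anywhere in the field without regard to case is an assumption.

```python
import re


def regex_matches(field_value, expressions):
    """Return True if at least one expression is found in the field value.

    Assumption: expressions are searched anywhere in the value and
    compared case-insensitively.
    """
    return any(re.search(expr, field_value, re.IGNORECASE)
               for expr in expressions)


expressions = [r"no.?spam", r"(remove|delete|cut).*this"]
print(regex_matches("user@no-spam.example", expressions))
print(regex_matches("joe.remove-this@example.org", expressions))
print(regex_matches("joe@example.org", expressions))
```

The first two addresses match one of the expressions, the third matches neither.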

Example:

# A section starting with "[*]" contains global score-entries, which will be
# used for all groups:

[*]

# Load "my" articles immediately:
=+9999 From "Your Name"
=+9999 Message-ID your.unique.fqdn

# Load articles referencing one of "my" articles:
=+5000 References your.unique.fqdn

# Of course, we are very interested in articles about these funny little
# animals with small antennas on their heads:
=+1000 Subject hamster "HELP! THERE'S A BIG FAT RAT ON MY SCREEN!" "SCNR ;-)"

# And of course, we ignore really silly suggestions such as:
=-1000 Subject "MAKE HAMSTER FAST!!!!"
# (note that this entry would never match, because subjects containing
# "hamster" already match the "="-entry above)

# The examples below use group-specific score-entries by starting a new
# section in the scorefile with a "[...]"-line.
# As Hamster builds an "individual" scorelist for each group before loading
# articles for it, it is more efficient to define "individual" filters if
# score-entries are only needed for some of the groups.


# Filter out "big" articles that do not have "FAQ" in the subject and are not
# posted in an announce-group:

[* -announce]
-10 Lines %>200
-10 Bytes %>10000
+20 Subject FAQ

# Ignore articles posted to more than three groups:
-10 Xpost %>3


# Ignore articles with subjects containing "!!!" in all groups except the
# newusers-groups:

[* -newusers -neubenutzer]
-1 Subject "!!!"


# Some groups may be more readable if you filter out all articles and
# only load specific ones immediately, e.g.:

[group.name.one group.name.two group.name.three]
-1 Message-ID *
+1 Subject "interest1" "interest2" "interest3" "interest4"
+1 From "user1" "user2" "user3" "user4"

[www.elbiah.de Hamster Playground Documentation]