Message tagger
Mailman has a topics system which works like this: a mailing list
administrator sets up one or more topics, each of which is essentially a named
regular expression. The topic name can be any arbitrary string, and the name
serves double duty as the topic tag. Each message that flows through the
mailing list has its Subject: and Keywords: headers compared against these
regular expressions. The message then gets tagged with the topic name of each
hit.
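As a rough illustration of the matching idea described above, the following
standalone sketch uses only Python's re module. It is not Mailman's
implementation: the TOPICS mapping and the match_topics helper are hypothetical
names, and the choice of re.search (rather than some other matching mode) is an
assumption made for the example.

```python
import re

# Hypothetical topic table for illustration: topic name -> regular expression.
TOPICS = {'bar fight': '.*bar.*'}


def match_topics(topics, header_values):
    """Return the names of topics whose pattern matches any header value."""
    hits = []
    for name, pattern in topics.items():
        compiled = re.compile(pattern)
        # Assumption: a topic "hits" if its pattern is found in any value.
        if any(compiled.search(value) for value in header_values):
            hits.append(name)
    return hits


# Values as they might come from the Subject: and Keywords: headers.
print(match_topics(TOPICS, ['foobar', 'barbaz']))
print(match_topics(TOPICS, ['nothing', 'at all']))
```

The first call reports the 'bar fight' topic because both values contain
"bar"; the second call reports no topics.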
>>> from mailman.app.lifecycle import create_list
>>> mlist = create_list('_xtest@example.com')
Topics must be enabled for Mailman to do any topic matching, even if topics are defined.
>>> mlist.topics = [('bar fight', '.*bar.*', 'catch any bars', False)]
>>> mlist.topics_enabled = False
>>> mlist.topics_bodylines_limit = 0
>>> from mailman.testing.helpers import (specialized_message_from_string
... as message_from_string)
>>> msg = message_from_string("""\
... Subject: foobar
... Keywords: barbaz
...
... """)
>>> msgdata = {}
>>> from mailman.handlers.tagger import process
>>> process(mlist, msg, msgdata)
>>> print(msg.as_string())
Subject: foobar
Keywords: barbaz
>>> msgdata
{}
However, once topics are enabled, messages will be tagged. There are two
artifacts of tagging: an X-Topics: header is added with the topic name, and
the message metadata gets a key with a list of matching topic names.
>>> mlist.topics_enabled = True
>>> msg = message_from_string("""\
... Subject: foobar
... Keywords: barbaz
...
... """)
>>> msgdata = {}
>>> process(mlist, msg, msgdata)
>>> print(msg.as_string())
Subject: foobar
Keywords: barbaz
X-Topics: bar fight
>>> msgdata['topichits']
['bar fight']
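The two artifacts of tagging can be mimicked with the standard library alone.
The sketch below is only an approximation under stated assumptions: tag_message
is a hypothetical helper, the topic tuples copy the four-field shape used above
(only the name and pattern are used here), and joining multiple topic names
with a comma is an assumption rather than documented Mailman behavior.

```python
import re
from email import message_from_string

# Same four-field shape as mlist.topics above; only name and pattern are used.
TOPICS = [('bar fight', '.*bar.*', 'catch any bars', False)]


def tag_message(msg, msgdata, topics, enabled=True):
    """Add an X-Topics: header and a 'topichits' key when topics match."""
    if not enabled:
        # Mirrors the rule that nothing happens while topics are disabled.
        return
    values = [msg.get('subject', ''), msg.get('keywords', '')]
    hits = [name for name, pattern, _description, _flag in topics
            if any(re.search(pattern, value) for value in values)]
    if hits:
        # Assumption: multiple topic names are joined with a comma.
        msg['X-Topics'] = ', '.join(hits)
        msgdata['topichits'] = hits


msg = message_from_string('Subject: foobar\nKeywords: barbaz\n\n')
msgdata = {}
tag_message(msg, msgdata, TOPICS, enabled=False)
print(msg['x-topics'], msgdata)
tag_message(msg, msgdata, TOPICS, enabled=True)
print(msg['x-topics'], msgdata)
```

With tagging disabled the message and metadata are left untouched; once
enabled, both the header and the metadata key appear.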
Scanning body lines
The tagger can also look at a certain number of body lines, but only for
Subject: and Keywords: header-like lines. When topics_bodylines_limit is set
to zero, no body lines are scanned.
>>> msg = message_from_string("""\
... From: aperson@example.com
... Subject: nothing
... Keywords: at all
...
... X-Ignore: something else
... Subject: foobar
... Keywords: barbaz
... """)
>>> msgdata = {}
>>> process(mlist, msg, msgdata)
>>> print(msg.as_string())
From: aperson@example.com
Subject: nothing
Keywords: at all
X-Ignore: something else
Subject: foobar
Keywords: barbaz
>>> msgdata
{}
But let the tagger scan a few body lines and the matching headers will be found.
>>> mlist.topics_bodylines_limit = 5
>>> msg = message_from_string("""\
... From: aperson@example.com
... Subject: nothing
... Keywords: at all
...
... X-Ignore: something else
... Subject: foobar
... Keywords: barbaz
... """)
>>> msgdata = {}
>>> process(mlist, msg, msgdata)
>>> print(msg.as_string())
From: aperson@example.com
Subject: nothing
Keywords: at all
X-Topics: bar fight
X-Ignore: something else
Subject: foobar
Keywords: barbaz
>>> msgdata['topichits']
['bar fight']
However, scanning stops at the first body line that doesn’t look like a header.
>>> msg = message_from_string("""\
... From: aperson@example.com
... Subject: nothing
... Keywords: at all
...
... This is not a header
... Subject: foobar
... Keywords: barbaz
... """)
>>> msgdata = {}
>>> process(mlist, msg, msgdata)
>>> print(msg.as_string())
From: aperson@example.com
Subject: nothing
Keywords: at all
This is not a header
Subject: foobar
Keywords: barbaz
>>> msgdata
{}
When topics_bodylines_limit is set to a negative number, all body lines will
be scanned.
>>> mlist.topics_bodylines_limit = -1
>>> lots_of_headers = '\n'.join(['X-Ignore: zip'] * 100)
>>> msg = message_from_string("""\
... From: aperson@example.com
... Subject: nothing
... Keywords: at all
...
... %s
... Subject: foobar
... Keywords: barbaz
... """ % lots_of_headers)
>>> msgdata = {}
>>> process(mlist, msg, msgdata)
>>> # Rather than print out 100 X-Ignore: headers, let's just prove that
>>> # the X-Topics: header exists, meaning that the tagger did its job.
>>> print(msg['x-topics'])
bar fight
>>> msgdata['topichits']
['bar fight']
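The body-scanning rules shown in this section (a limit of zero scans nothing, a
positive limit scans that many lines, a negative limit scans everything, and
the first line that does not look like a header ends the scan) can be
summarized in a small standalone sketch. It is not taken from Mailman:
scan_body_lines and HEADER_LIKE are hypothetical names, the definition of a
"header-like" line is an assumption, and so is the choice to skip blank lines
without counting them.

```python
import re

# Assumption: a header-like line is "Name: value" with a simple token name.
HEADER_LIKE = re.compile(r'^([A-Za-z0-9-]+):\s*(.*)$')


def scan_body_lines(body, limit):
    """Collect Subject:/Keywords: values from leading header-like body lines.

    limit == 0 scans nothing, limit < 0 scans every line, and a positive
    limit scans at most that many non-blank lines.
    """
    found = []
    if limit == 0:
        return found
    scanned = 0
    for line in body.splitlines():
        if not line.strip():
            continue  # Assumption: blank lines are skipped and not counted.
        if limit > 0 and scanned >= limit:
            break
        scanned += 1
        match = HEADER_LIKE.match(line)
        if match is None:
            break  # The first non-header-like line ends the scan.
        name, value = match.groups()
        if name.lower() in ('subject', 'keywords'):
            found.append(value)
    return found


body = 'X-Ignore: something else\nSubject: foobar\nKeywords: barbaz\n'
print(scan_body_lines(body, 0))
print(scan_body_lines(body, 5))
print(scan_body_lines('This is not a header\nSubject: foobar\n', 5))
```

The values returned by such a scan would then be matched against the topic
patterns in the same way as the real Subject: and Keywords: headers.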
Scanning sub-parts
The tagger will also scan the body lines of text subparts in a multipart message, using the same rules as if all those body lines lived in a single text payload.
>>> msg = message_from_string("""\
... Subject: Was
... Keywords: Raw
... Content-Type: multipart/alternative; boundary="BOUNDARY"
...
... --BOUNDARY
... From: sabo
... To: obas
...
... Subject: farbaw
... Keywords: barbaz
...
... --BOUNDARY--
... """)
>>> msgdata = {}
>>> process(mlist, msg, msgdata)
>>> print(msg.as_string())
Subject: Was
Keywords: Raw
Content-Type: multipart/alternative; boundary="BOUNDARY"
X-Topics: bar fight
--BOUNDARY
From: sabo
To: obas
Subject: farbaw
Keywords: barbaz
--BOUNDARY--
>>> msgdata['topichits']
['bar fight']
But the tagger will not descend into non-text parts.
>>> msg = message_from_string("""\
... Subject: Was
... Keywords: Raw
... Content-Type: multipart/alternative; boundary=BOUNDARY
...
... --BOUNDARY
... From: sabo
... To: obas
... Content-Type: message/rfc822
...
... Subject: farbaw
... Keywords: barbaz
...
... --BOUNDARY
... From: sabo
... To: obas
... Content-Type: message/rfc822
...
... Subject: farbaw
... Keywords: barbaz
...
... --BOUNDARY--
... """)
>>> msgdata = {}
>>> process(mlist, msg, msgdata)
>>> print(msg['x-topics'])
None
>>> msgdata
{}
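To make the sub-part rule concrete, the sketch below gathers candidate body
lines from the text parts of a message using only the standard email package.
It is an illustration under assumptions, not Mailman's code: text_body_lines is
a hypothetical helper, a "text part" is taken to mean any part whose main
content type is text, and only the top-level sub-parts are inspected.

```python
from email import message_from_string


def text_body_lines(msg):
    """Return non-blank body lines from the text parts of a message."""
    # Look only at direct sub-parts so nested messages are not entered.
    parts = msg.get_payload() if msg.is_multipart() else [msg]
    lines = []
    for part in parts:
        if part.is_multipart():
            continue  # Containers such as message/rfc822 are skipped.
        if part.get_content_maintype() != 'text':
            continue  # Non-text leaf parts are skipped as well.
        lines.extend(line for line in part.get_payload().splitlines()
                     if line.strip())
    return lines


text_msg = message_from_string(
    'Subject: Was\n'
    'Content-Type: multipart/alternative; boundary="BOUNDARY"\n'
    '\n'
    '--BOUNDARY\n'
    'From: sabo\n'
    '\n'
    'Subject: farbaw\n'
    'Keywords: barbaz\n'
    '\n'
    '--BOUNDARY--\n')
print(text_body_lines(text_msg))

nested_msg = message_from_string(
    'Subject: Was\n'
    'Content-Type: multipart/alternative; boundary=BOUNDARY\n'
    '\n'
    '--BOUNDARY\n'
    'Content-Type: message/rfc822\n'
    '\n'
    'Subject: farbaw\n'
    'Keywords: barbaz\n'
    '\n'
    '--BOUNDARY--\n')
print(text_body_lines(nested_msg))
```

The first message yields its two header-like body lines, while the second
yields nothing because its only sub-part is a message/rfc822 container.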