uce: Reporting Spam

by Stanislav Shalunov, August 2001

Why do spammers do it?

Let's say that there are at least 100M electronic mail users. Let's assume that you can reach 1M of them by sending email to all addresses on a spammer's kit CDROM (these kits are advertised all the time by spammers). The cost of spamming would be, roughly: $20 for the CDROM plus $20 for a throw-away large ISP account for a month plus several hours of kit user's time; upper bound for expenses is something like $100. This means that you can reach a large untargeted audience for 0.01c per person. With paper junk mail, the cost is more like $1 per person. Spam is 10,000 times cheaper than paper junk mail. It must be profitable to send paper junk mail, or the direct marketing companies would go out of business. Something that's 10,000 times cheaper to deliver could be a gold mine, then.
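The arithmetic behind these figures can be checked with a short Python sketch. The variable names are mine; the numbers are the ones assumed in the paragraph above.

```python
# Figures taken from the text; the names are illustrative only.
kit_cost_usd = 100            # upper bound on a spammer's expenses
recipients = 1_000_000        # addresses reachable with one kit
paper_cost_usd = 1            # rough cost of one piece of paper junk mail

# Cost per recipient in cents: 100 dollars * 100 cents / 1M people.
spam_cents_each = kit_cost_usd * 100 / recipients

# How many times cheaper spam is than paper junk mail, per recipient.
times_cheaper = paper_cost_usd * recipients // kit_cost_usd
```

Under these assumptions `spam_cents_each` comes to 0.01 cents and `times_cheaper` to 10,000.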

Where does this lead us?

Well, $100 to reach a large global audience isn't a lot. That amount of money is within anybody's reach. Let's assume that 1% of electronic mail users have a message to deliver to the masses (these people could be entrepreneurs, political activists, people with any sort of a cause, wackos, religious proselytizers, etc.). Let's assume that they would send one message per month on average (making spamming a few-dollars-a-month investment for them). This comes to over 300 spam messages reaching each mailbox per day, on average. Naturally, with the cost of spamming so low, you don't have to wait for an important message to arise to start spamming if you already have the kit and are set up with spamming. This could mean more spam. Further, the standard deviation around that average of 300 messages/day would be high: people whose email addresses are on multiple web pages or who post to Usenet would get thousands and thousands of messages per day. All junk. The real mail would be lost in the noise.
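The "over 300 per day" figure follows from the assumptions already stated. Here is a small Python sketch of the calculation; the 30-day month and the variable names are my own assumptions.

```python
# Assumptions from the text; names and the 30-day month are illustrative.
users = 100_000_000                  # electronic mail users
senders = users // 100               # 1% have a message for the masses
reach_per_mailing = 1_000_000        # addresses on one kit CDROM
mailings_per_sender_per_month = 1
days_per_month = 30

# Total messages delivered per month across all mailboxes.
deliveries_per_month = senders * mailings_per_sender_per_month * reach_per_mailing

# Average load on a single mailbox per day.
per_mailbox_per_day = deliveries_per_month / users / days_per_month
```

This gives roughly 333 messages per mailbox per day, assuming the mailings are spread evenly over all users.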

Let's look at it another way: Currently, I get several pieces of junk paper mail per day. With the same budgets, the advertisers would be able to send me tens of thousands of email messages per day. And that doesn't even begin to count wackos, people with a political message, etc.

Either way, it's chilling. Email usefulness would be undermined for everybody (including spammers, but would they care?).

Why doesn't it happen?

At the same time, this simply isn't happening. My email addresses that I use to post to Usenet, publicly archived mailing lists, etc., have remained valid for many years, and yet I only get a few spam messages per day.

Why don't all the direct marketing companies spend a few percent of their budgets delivering junk email? Why don't all the wackos send their ramblings to the entire world? Why is so little spam sent to email users (and why is what's sent usually sent by rogue idiots)?

I see the following factors that limit spam proliferation:

Why do ISPs take action against their paying customers? Why do direct marketing folks know better than to spam en masse? Why do legislators adopt laws that limit spam? Because people complain about spam.

Reporting spam is the deterrent factor that keeps email useful. If reporting were to stop, we'd be buried in a sea of junk mail.

Where to complain?

One can complain to the following parties:

Complaining to policy makers about individual instances of spam is silly. It might make sense to write a snail mail letter to your representatives that tells them how you feel about spam and urges them to support good measures aimed at stopping it without hurting civil liberties. But whining about individual instances will only make the anti-spam cause look bad.

Hitting the spammer where it hurts and disconnecting their email address that they use to take orders may seem like the best idea. And it's effective. However, it only works for some fraction of spam. Firstly, there may be no means of contact (they may not be trying to sell anything using the message). Secondly, the means of contact may be hard to disconnect (try convincing a snail mail postmaster to not deliver mail to a valid address).

Complaining to the spammer directly can be effective if done, e.g., to their home telephone number or to their boss at work, but it's useless to try to send email to a spammer telling them to stop.

So, we're left with the delivering and originating parties as suitable complaint targets.

Determining the responsible parties

I propose a simple algorithm for determining the IP numbers of the responsible parties.

Sendmail dominance is over. A variety of MTAs are used, each using its own format for Received: lines. The only reliably parsable things left are IP numbers. Luckily, the rest is just fluff anyway. So, we just look for IPs.

We start with a set of "trusted" IP numbers (it's unique for each user and is relatively immutable) and a set of bogus IP numbers (it's a hardcoded set of local, multicast, internal use, etc., numbers).

Any address that's not trusted and not bogus is interesting.
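As a rough illustration of this classification, here is a Python sketch using the standard `ipaddress` module. The `TRUSTED` network is a placeholder from a documentation range, the `BOGUS` list is only one plausible set of local, private, and multicast ranges, and the name `classify` is my own; none of these come from the original code.

```python
import ipaddress

# Hypothetical per-user trusted set: replace with your own mailserver networks.
TRUSTED = [ipaddress.ip_network("192.0.2.0/24")]

# An assumed hardcoded set of local, private, multicast, and reserved ranges.
BOGUS = [ipaddress.ip_network(n) for n in (
    "0.0.0.0/8", "10.0.0.0/8", "127.0.0.0/8", "169.254.0.0/16",
    "172.16.0.0/12", "192.168.0.0/16", "224.0.0.0/4", "240.0.0.0/4",
)]

def classify(ip_text, trusted=TRUSTED, bogus=BOGUS):
    """Return 'bogus', 'trusted', or 'interesting' for an IPv4 address string."""
    ip = ipaddress.ip_address(ip_text)
    if any(ip in net for net in bogus):
        return "bogus"
    if any(ip in net for net in trusted):
        return "trusted"
    return "interesting"
```

With these placeholder sets, `classify("10.1.2.3")` is bogus, `classify("192.0.2.10")` is trusted, and any other routable address is interesting.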

Read the Received: lines starting from the top. Look for things that parse as an IP number. While there are no interesting IP numbers in a line, skip the line.

The last interesting IP number in the line we're looking at, if we haven't skipped all lines, is the delivering party IP number.

Repeat reading until we find a line with interesting IP numbers. The last of them is the potential originating party number. (If it doesn't exist, the originating party is the same as the delivering party.)

Do RBL, DUL, and RSS lookups of the delivering IP address (peer). If peer is in RBL or DUL, undefine origin, even if it's defined (if a known spammer or a dial-up user claims they received the message from somebody else, we don't trust them). If peer is not in RSS, and origin is defined, report peer to RSS (and maybe other open relay lists).
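The header walk described above might look like the following Python sketch. It is one reading of the steps, not the spamtrak code: it assumes each Received: header has already been unfolded into a single string, handles IPv4 only, and takes the "interesting" test as a caller-supplied function. The names `IP_RE`, `interesting_ips`, and `responsible_parties` are made up for this example.

```python
import ipaddress
import re

# Anything shaped like a dotted quad that is not part of a longer dotted number.
IP_RE = re.compile(r"(?<![\d.])(\d{1,3}(?:\.\d{1,3}){3})(?![\d.])")

def interesting_ips(line, is_interesting):
    """Return the interesting IP numbers in one Received: line, in order."""
    found = []
    for text in IP_RE.findall(line):
        try:
            ipaddress.ip_address(text)
        except ValueError:
            continue  # looks like an IP (e.g. 999.1.1.1) but does not parse
        if is_interesting(text):
            found.append(text)
    return found

def responsible_parties(received_lines, is_interesting):
    """Return (peer, origin); origin is None when no second line qualifies."""
    peer = origin = None
    for line in received_lines:          # topmost Received: line first
        ips = interesting_ips(line, is_interesting)
        if not ips:
            continue                     # no interesting IPs: skip the line
        if peer is None:
            peer = ips[-1]               # delivering party
        else:
            origin = ips[-1]             # potential originating party
            break
    return peer, origin
```

A caller would treat an `origin` of `None` as "same as the delivering party", and a `peer` of `None` as "every line was skipped".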
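These lookups and the decision rule can be sketched in Python as below. DNS blacklists are conventionally queried by reversing the octets of the address and appending the list's zone, and an address counts as listed when that name resolves. The zone names in the usage note are placeholders, not the real RBL, DUL, or RSS zones. The function names are my own, and treating any resolver error as "not listed" is a simplifying assumption.

```python
import ipaddress
import socket

def dnsbl_query_name(ip_text, zone):
    """Build the DNS name to query for an IPv4 address in a blacklist zone."""
    octets = str(ipaddress.IPv4Address(ip_text)).split(".")
    return ".".join(reversed(octets)) + "." + zone

def is_listed(ip_text, zone):
    """True if the query name resolves. Needs network access; any resolver
    failure is treated as 'not listed' (a simplification)."""
    try:
        socket.gethostbyname(dnsbl_query_name(ip_text, zone))
        return True
    except OSError:
        return False

def decide(origin, in_rbl, in_dul, in_rss):
    """Apply the rules from the text; returns (origin, report_peer_to_rss)."""
    if in_rbl or in_dul:
        origin = None        # don't trust what a listed peer claims
    report_peer = (not in_rss) and origin is not None
    return origin, report_peer
```

For example, with a hypothetical zone `rbl.example`, the query name for 198.51.100.7 is `7.100.51.198.rbl.example`. The results of three `is_listed` calls on the peer would then be passed to `decide`.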

Let's now get back to the set of trusted IP numbers. This is a set of IP networks that host mailservers that you trust and that do not originate spam delivered to you (such as your mailserver, mailserver for another address that's forwarded to you, mailserver of a bona fide email list you're subscribed to, etc.). These machines generate headers that you can trust; and they must always include the IP number of their peer.

Determining the email addresses to send complaints to is a more complicated business. Currently, a technique that I find reliable (but that only discovers some addresses for roughly 25% of cases) is to first do an SOA DNS lookup of the class C zone in in-addr.arpa and then run the obtained domain through whois.abuse.net.
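The first step, naming the class C reverse zone for an address, is mechanical and can be sketched in Python. The function name is hypothetical, and the SOA query and the whois.abuse.net lookup themselves are left out of this sketch.

```python
import ipaddress

def class_c_reverse_zone(ip_text):
    """Return the in-addr.arpa zone name covering the /24 of an IPv4 address."""
    octets = str(ipaddress.IPv4Address(ip_text)).split(".")
    # Drop the last octet and reverse the remaining three.
    return ".".join(reversed(octets[:3])) + ".in-addr.arpa"
```

For 198.51.100.7 this yields `100.51.198.in-addr.arpa`, which is the name one would hand to an SOA query.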

The following proof-of-concept Perl code demonstrates the use of this algorithm: spamtrak.