Archive for the 'Linux' Category

Spam filtering with SpamBayes

Alright, so I’ve been getting more and more spam in recent weeks, and it’s been getting harder and harder to build basic filter rules for it.

My mail works in a pretty round-about way:
I have multiple POP accounts all over the place, which have sort of accumulated over the years. It becomes a bit of a mission to always set up and check all these accounts, so what I have now is a small Python script that connects to each of the servers, grabs the mail, sorts them based on some simple filters (like, containing a [mailinglist] type subject), and places them within a Maildir structure based on that sorting. In addition, it does the same thing for deciding if it should delete a message - extremely basic spam filtering rules can be set up to check out certain headers for possible spam flags, etc.
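For anyone wanting to try something similar, here is a rough sketch of just the sorting step. It is not the actual script: the rule table, the folder name and the function names are all made up for illustration, and the POP3 fetching is left out. It assumes Python 3 and only uses the standard library's email and mailbox modules.

```python
import mailbox
from email import message_from_bytes

# Hypothetical rules: text to look for in the subject -> Maildir folder name.
FOLDER_RULES = {
    "[mailinglist]": "mailinglist",
}


def choose_folder(msg):
    """Return the folder name for a parsed message, or None for the inbox."""
    subject = str(msg.get("Subject", ""))
    for needle, folder in FOLDER_RULES.items():
        if needle.lower() in subject.lower():
            return folder
    return None


def deliver(raw_bytes, maildir_path):
    """Parse one raw message and store it in the matching Maildir folder."""
    msg = message_from_bytes(raw_bytes)
    box = mailbox.Maildir(maildir_path, create=True)
    folder = choose_folder(msg)
    if folder is not None:
        # add_folder() creates the sub-folder if needed and returns it.
        box = box.add_folder(folder)
    box.add(raw_bytes)
    return folder
```

Since the result is a plain Maildir tree, an IMAP server pointed at the same directory should pick the folders up without further work.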

The downloaded mail is then served via IMAP, using the Dovecot mail server. The great thing about that, is then every time I re-install any of the machines I use for mail access, or install a new one, I instantly have all my neatly sorted folders, all my mail from all my accounts, and only one IMAP account I need to set up.

Anyway, basically, the spam filtering of the above system was rather lame, so I went on the hunt for something a little more useful. Enter SpamBayes - a mail proxy application written in Python.

It’s already “in Debian”, so installing was as fun as always (aptitude install spambayes), after which I only needed to start the service, and then it’s off to a browser to configure it. Actually there was one step before that - since I’m running this on my server, and SpamBayes is meant for use by a single user on their own PC, it doesn’t allow connections to its browser-based configuration from other hosts. Which is a bit of a problem when running a server which I have no interaction with beyond a command shell. Thanks to Lynx I was able to configure it to allow connections from my local network.

For starters, you need to tell it which POP3 servers you want to connect to, and assign local ports to each one, which will be stand-ins for port 110 when connecting to servers. The interface for this is a bit troublesome however, requiring you to enter each server into a single input field, separated by commas. The associated ports for each server are then entered into another input field in the same manner. It took me a while to get both the fields synced due to the number of servers I intended using.
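One way to avoid the two fields drifting out of sync is to keep a single server-to-port mapping and generate both comma-separated strings from it. The sketch below does that. The host names and port numbers are placeholders, and the helper names are made up; nothing here talks to SpamBayes itself.

```python
# Hypothetical mapping: remote POP3 server -> local stand-in port.
PROXIES = {
    "pop.example.com": 8110,
    "mail.example.org": 8111,
}


def to_fields(proxies):
    """Build the two comma-separated strings, in matching order."""
    servers = ",".join(proxies)
    ports = ",".join(str(port) for port in proxies.values())
    return servers, ports


def from_fields(servers, ports):
    """Pair the two fields back up, refusing lists of different lengths."""
    names = [s.strip() for s in servers.split(",") if s.strip()]
    numbers = [int(p) for p in ports.split(",") if p.strip()]
    if len(names) != len(numbers):
        raise ValueError("%d servers but %d ports" % (len(names), len(numbers)))
    return dict(zip(names, numbers))
```

The same mapping can then be reused by the fetching script to work out which localhost port stands in for which server.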

Next up, I fed it a few emails for training (saved emails out of Thunderbird as EML files, and these can be uploaded to the server for training via the browser interface) both ‘ham’ and spam.

Once it knew the basics, I simply updated the list of servers in my Python script to “localhost”, and whichever port each one was set to. Shortly thereafter, mail started passing through the system. Most of it was identified as “unsure”, as it hadn’t seen enough examples of ham or spam yet. Quite smartly, it keeps a record of each message that’s passed through, and you can easily train ham or spam from these.

Around 50 mails later, it was identifying almost every message perfectly. I’m going to leave it running for a day or two more, training everything that arrives, then I’ll just add a single filter to my mail fetching script, looking at the “X-Spambayes-Classification” header for “spam” (delete), or “ham” (take no action).
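The planned filter could look something like the sketch below. The function name and the "delete"/"keep" return values are made up for illustration. It assumes the header value starts with the classification word, and it treats "unsure" or a missing header the same as ham so nothing is lost by accident.

```python
from email import message_from_bytes


def action_for(raw_bytes):
    """Return "delete" for mail tagged as spam, "keep" for everything else."""
    msg = message_from_bytes(raw_bytes)
    verdict = str(msg.get("X-Spambayes-Classification", "")).strip().lower()
    if verdict.startswith("spam"):
        return "delete"
    # "ham", "unsure" and untagged mail all fall through to here.
    return "keep"
```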

I’m quite happy with this setup, looks like it’ll work quite well :D.


Debian Powered Notebook

Yay, on Friday, I decided to take the plunge, and install Debian on my laptop. I’ve always wanted to try working in a Linux desktop environment, considering I do practically no Delphi development any more, everything’s either PHP or Python. Since Debian has plenty of support for both of these, it seemed quite ideal.

I’m not dumping the existing Windows install though, so I had to partition my drive. Now, partitioning drives is about the most nerve-racking thing I’ve ever done, no matter what software I’m using, no matter how little data I stand to lose, I’m always scared as hell something will go wrong. Doubly so on this laptop since I use it constantly for work, and it’s the default installation HP put on, which I’m not really eager to bugger up. Luckily, that all went smoothly though :D.

When it came to installing Debian though, I had a problem with the netinst installer I was using, in that a package or two would not install due to missing some authentication files or something. So, since that installer didn’t work, it was suggested that I try an older Sarge installer and upgrade to the Unstable branch from there.

As expected, that installation went without issue. Unfortunately I just had to try my best to keep the machine cool while waiting for the installation to complete and the ACPI applications to kick in so the fans would do their thing. Thanks Korpse for warning me about that in advance ;).

So anyway, after the base installation was done, it was trivial (as I have come to expect from Debian,) to get Gnome, X.org and GDM installed and running. Apache, PHP and the rest were even easier. I was very pleased to see that the Firebird SQL server (open source’d Interbase fork thingey) was available in Debian right away, so I plonked that in as well. All that was left was to install Subversion, and check out my work stuff. Once done, I was 100% ready for work :D.

There are a few things not working yet - I haven’t had much luck with the wireless LAN. Gnome’s wireless LAN configuration applet thing is missing WPA-PSK authentication options (which I’ve configured my wireless AP to use), it only supports WEP. I also can’t seem to get the WPA tools (wpa_supplicant) to work correctly. Based on the output I’m getting though, it *looks* like it’s connecting to the access point - it manages to find the AP’s MAC address and everything just fine, and it reports the authentication was successful, but beyond that, I can’t actually ping anything, and all traffic still seems to be trying to go through eth0 (wired LAN) rather than eth1 (wireless). Guess I need to learn a bit more about Linux networking… heh.

Another thing that’s not quite working is Bluetooth. Again, it *seems* to be working, the hardware is detected, and is working fine. My phone can pair with the laptop fine, using the PIN I’ve defined, but I can’t seem to transfer files or anything in either direction. I’ll admit that I haven’t played with this much yet, but I’m not really sure where to go next. I haven’t even tried looking into infra-red yet :).

And yes, I do use all these things, that’s why I spent so much on this laptop :P.

I also haven’t installed the ATI drivers yet, it looks like it’s going to be a bit of a mission on its own though.

Along the way I’ve also discovered some interesting new applications I haven’t seen or heard of before. gDesklets seem like a nice way to waste some CPU time and memory if you like to keep your desktop busy. Beep Media Player seems like a very nice alternative to XMMS, seems a lot more stable, and the general feel integrates better with Gnome. RapidSVN looks like it’s trying to be a nice enough SVN front-end, however I find the good old command-line a lot more friendly and efficient (and it crashes less). Sylpheed is a rather nice little mail client, and works well as an alternative to Thunderbird - assuming you’re unhappy with Thunderbird though. ibWebAdmin is a neat web-based tool for managing your Interbase databases, not as feature-packed or good looking as phpMyAdmin, but it does what it needs to do pretty well.

All that’s left now is to give it a shot at work tomorrow, and see how it all goes :).


SmoothWall

I must say, I’m rather disappointed with my “SmoothWall experience” so far. I’ve been tasked with setting up a SmoothWall firewall/proxy machine at work, and from what I’ve read, it’s like the best thing since sliced bread.

Unfortunately I cannot agree.

The installation tends to go fine, it partitions the hard disk by itself, installs fairly fast, then steps through a simple setup ‘wizard’. Here we are prompted if we want to enable or disable ADSL. Now, I want SmoothWall to connect via our ADSL line. BUT, it seems the developer’s idea of “ADSL” is in fact “USB ADSL Modem”.

Anyway, after figuring that one out, and after much shuffling of subnets and IPs between the router, SmoothWall, and my PC, I finally get it to use the router as a gateway. I try visiting some sites - DNS lookups fail. I take a look in all the log options on SmoothWall, and find the firewall is blocking DNS traffic, and is trying to route everything through the same (“Green”) NIC, rather than the second (“Red”) one.

Sooo, turns out I can fix this by running the “setup” tool again, and ‘pretending’ to change the IPs, so it resets everything (re-writes the firewall rules maybe?). Cool, everything’s working again. Not quite.

Seems after that, the proxy magically stops working altogether, so from the web interface, I just disable it, and re-enable it. Cool, everything’s working now. Riiiiight.

A few hours later, suddenly the internet is dead. Hmm, seems the firewall is blocking all traffic again and routing through the same NIC. Sooo, I repeat the whole IP change/reset, proxy reset, etc, and everything’s cool.

A few hours later I find myself repeating the whole procedure again.

This is seriously lame, having to practically reboot the entire machine every few hours. So I think maybe I’ll try to set up a PPPoE connection. So I go and configure the router correctly, test ‘dialing up’ with my machine in XP, all’s cool. Now to set up SmoothWall. Running the setup tool again lets me set the “Red” interface to “PPPoE”, and that seems done. Now where do I put my username and password to dial up?

Apparently the “ppp settings” page of the web GUI is where it’s done. Now excuse my ignorance, but this looks like a modem dial-up page, asking for phone numbers, which COM port my modem is on, etc, etc. A bit of searching around the rather unhelpful support forums reveals that this is indeed where you need to configure PPPoE usernames and passwords. Just leave all settings alone except for login details.

I give it a shot, tell it to connect, nothing happens. Check the logs, and not surprisingly, it’s trying to connect via ttyS0 (COM1).

Now, apparently there’s supposed to be an option to select the correct interface in the drop-list where you select which port your modem is on, on the “PPP Settings” page, but for some magical reason this does not exist for me.

Unfortunately their forums are also not very helpful it seems, and even after composing a very descriptive help request, I get a rather sarcastic “RTFM” response for a subject not covered in the manual.

Basically the manuals are not up to scratch, the support forums are full of leetbois, the options in both the setup tool and web UI are obscure, and the whole thing is bloody useless, needing a darn reboot every few hours. WTF.

I’d love to send the whole thing to hell, but unfortunately I have to get it to work. *sigh*

Check your email through Telnet

Ok so this is a little trick I picked up a few years ago when I developed the first version of ECheck and I started learning the POP3 protocol. It’s come in very handy when I’m away from my email client and don’t want to receive email anywhere and fragment my mailbox by spreading it across a few machines.

Firstly, this’ll work on both Linux and Windows systems, with no extra software needed (assuming most Linux distros come with a Telnet client by default).

It’s a pretty useful thing everyone with an email account should know ;-).

Firstly, open a command prompt, and execute the following:

telnet <your.mail.server> 110

<your.mail.server> would obviously be replaced by the address (IP or hostname) of your POP3 server.

If you connect, you should be presented with a welcome message and a “+OK” message. You then enter the following commands to log in, replacing the contents of the “<>” with your details:

user <your@username>
pass <password>

After which, you should be greeted by another “+OK” assuming you managed to log in. If you make a typo, just send the line with the typo - you usually cannot backspace and correct mistakes. Then issue the correct command again.

Now that you’re in, let’s see your messages. To see how many messages and how big each of your messages is, send the following:

list

Once again a “+OK” line should be shown, followed by a very simple list of message IDs and file sizes (in bytes). Let’s preview a message, shall we?

top <id> <lines>

The headers for message <id>, followed by up to <lines> lines from the message body, will be spammed to your console. You can find both the “Subject:” and “From:” header lines to decipher who the message is from and what it’s about. Of course you can also read the body…

Hmm? This message is junk mail or spam? Want to delete it before it hits your inbox?

dele <id>

… will delete the message with ID <id>. It’s important to note that the message IDs are maintained - so if you delete message 1, message 2 will not fall into 1’s place. It’ll remain 2 for the remainder of the session.

If you’ve deleted the wrong message, all is not lost. You can ‘reset’ the mailbox status to how it was when you first connected:

rset

And once you’re done mucking around, disconnect nicely:

quit

It’s also worth noting that the commands are all case-insensitive, though I’m sure the ‘correct’ way of doing it would be to use all caps for commands, the server doesn’t seem to mind either way.
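If you’d rather script the same session than type it, Python’s standard poplib module wraps each of the commands above. The sketch below is only an illustration: preview() and session() are made-up names, and the host, username and password are whatever you would have typed into Telnet.

```python
import poplib
from email import message_from_bytes


def preview(conn, lines=5):
    """Mirror "list" and "top": return {message id: (size, subject, sender)}."""
    result = {}
    _resp, listing, _octets = conn.list()
    for entry in listing:
        msg_id, size = entry.decode("ascii").split()[:2]
        _resp, top_lines, _octets = conn.top(int(msg_id), lines)
        headers = message_from_bytes(b"\r\n".join(top_lines))
        result[int(msg_id)] = (
            int(size),
            str(headers.get("Subject", "")),
            str(headers.get("From", "")),
        )
    return result


def session(host, user, password):
    """Log in, print a one-line preview of every message, then disconnect."""
    conn = poplib.POP3(host, 110)  # same as: telnet <host> 110
    try:
        conn.user(user)
        conn.pass_(password)
        for msg_id, (size, subject, sender) in preview(conn).items():
            print(msg_id, size, sender, subject)
        # conn.dele(msg_id) would mark a message for deletion,
        # and conn.rset() would undo any deletions made in this session.
    finally:
        conn.quit()
```

Calling session() with your own server and login details walks through the same list/top/quit sequence as the manual version.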

Have fun…

Dynamic IP hassles

Dunno if anyone would have noticed but the site was blinking on and off last week, with dynamic DNS issues.

I’ve been using an application which runs as a service on my Windows machine, but it seems to often give up if it can’t get a new IP or the update fails, and sometimes it just doesn’t bother even trying :-).

Anyway I slapped up a quick Python script to be run from a cron job at 5 minute intervals to check a website which provides my IP (like http://checkip.dyndns.org), grab the first IP it finds, and update my ZoneEdit account with the new IP.
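The "grab the first IP it finds" part boils down to a regular expression over the page text. Here is a minimal sketch of that step only, not the actual script from the Files page. The function names are made up, and the ZoneEdit update is deliberately left out since how that call works isn’t covered here.

```python
import re
from urllib.request import urlopen

# Four dot-separated groups of 1-3 digits; good enough to spot an IPv4 address.
IP_PATTERN = re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b")


def first_ip(text):
    """Return the first thing in text that looks like an IPv4 address, or None."""
    match = IP_PATTERN.search(text)
    return match.group(0) if match else None


def current_ip(url="http://checkip.dyndns.org"):
    """Fetch the IP-reporting page and pull the address out of it."""
    with urlopen(url, timeout=10) as response:
        return first_ip(response.read().decode("ascii", "replace"))
```

A crontab entry beginning with `*/5 * * * *` would run a script built on this every five minutes.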

Seems to have been running reliably the past few days now.

I’ve dumped it on the Files page if anyone would like to give it a go. It’s set up for ZoneEdit, but I’m sure it’s easy to adapt to other services as well.

Change your console resolution and colour depth

OK so not much is going on… Thought I might as well pass along some general knowledge.

Changing the resolution of a Linux console is a fairly simple task (and requires a reboot) and is generally a nice thing to do if you intend using the console a lot.

Start off by logging in as root, and open your Grub menu file (mine is in /boot/grub/menu.lst). Next, find the option that would normally boot your Linux system (probably looks something like the following):

kernel /vmlinuz-2.6.8-1-386 root=/dev/hda3 ro

Now, simply append to the end “vga=788”, so it ends up looking something like this:

kernel /vmlinuz-2.6.8-1-386 root=/dev/hda3 ro vga=788

The “788” is a code which tells the console to be 800×600 with a 16bit colour depth. The table of codes for other resolutions and colour depths is at the end of this post.

Save the file, reboot, and enjoy :-).

For reference, here are some VGA codes:

   Colors ( depth) 640x480 800x600 1024x768 1280x1024 1600x1200
   ---------------+-------+-------+--------+---------+---------
   256    ( 8 bit)| 769    771     773      775       796
   32,768 (15 bit)| 784    787     790      793       797
   65,536 (16 bit)| 785    788     791      794       798
   16.8M  (24 bit)| 786    789     792      795       799
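If you’d rather look the code up than squint at the table, the same values fit in a small Python dictionary. This is just the table above retyped; the names VGA_MODES and vga_option are made up.

```python
# Colour depth in bits -> {(width, height): vga code}, copied from the table.
VGA_MODES = {
    8: {(640, 480): 769, (800, 600): 771, (1024, 768): 773,
        (1280, 1024): 775, (1600, 1200): 796},
    15: {(640, 480): 784, (800, 600): 787, (1024, 768): 790,
         (1280, 1024): 793, (1600, 1200): 797},
    16: {(640, 480): 785, (800, 600): 788, (1024, 768): 791,
         (1280, 1024): 794, (1600, 1200): 798},
    24: {(640, 480): 786, (800, 600): 789, (1024, 768): 792,
         (1280, 1024): 795, (1600, 1200): 799},
}


def vga_option(width, height, depth):
    """Return the kernel option for a mode; raises KeyError if it isn't listed."""
    return "vga=%d" % VGA_MODES[depth][(width, height)]
```

For example, vga_option(800, 600, 16) gives the “vga=788” used earlier.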

Squid

I thought that it’s about time I messed around with proxies, so yesterday I set up Squid on my server, xan.

The configuration looked like a bit of a mission for a first-timer such as myself :P, so I whipped out Webmin and slapped on the Squid module. I’ll take a look at the config options it generated some other time and do it by hand in future.

I must say the variety of options available is quite impressive. The access control lists are particularly exciting too, there’s a helluva lot that can be done with this stuff.

It’s only being used for HTTP at the moment, and is doing an excellent job. I’ve managed to get AWStats to do some basic reporting for it, so I can see who’s using how much bandwidth, viewing how many pages, what file types are being accessed, etc. I’m a bit of a stats junkie :P.
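For a quick look at the “who’s using how much bandwidth” question without AWStats, the numbers can also be totalled straight from Squid’s access log. The sketch below is not what AWStats does. It assumes Squid’s default native log format, where the third whitespace-separated field is the client address and the fifth is the reply size in bytes, and bytes_per_client is a made-up name.

```python
from collections import Counter


def bytes_per_client(lines):
    """Total the reply sizes per client from native-format access.log lines."""
    totals = Counter()
    for line in lines:
        fields = line.split()
        if len(fields) < 5:
            continue  # skip blank or truncated lines
        client, size = fields[2], fields[4]
        if size.isdigit():
            totals[client] += int(size)
    return totals
```

Passing it an open log file and calling most_common() on the result lists the heaviest users first.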

Overall I’m pretty impressed…

Quick project…

Well I installed dictd on xan (this server) yesterday, and it seems to be working great for Nooblet (my IRC bot, powered by Supybot - http://sypybot.com/) since my ADSL is capped. Accessing it over the LAN and internet works great as well, though I have been hunting for a decent Windows dict client - they don’t seem to exist.

Anyway, so I’m creating a quick little client in Python/wxPython with the dictclient module (http://erwin.complete.org/devel). Not intended to be a great big feature packed client, it just needs to look up words after all :D
Should be done in a few hours…
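To give an idea of how little a lookup involves, here is a bare-bones sketch that talks the DICT protocol directly over a socket. It does not use the dictclient module or wxPython, so it is a stand-in for illustration rather than the client described above. The function names are made up, the default host is an assumption, and 2628 is the standard DICT port.

```python
import socket


def parse_definitions(lines):
    """Collect the definition texts from the lines of a DEFINE response."""
    definitions, current = [], None
    for line in lines:
        if current is None:
            if line.startswith("151 "):  # 151 introduces one definition
                current = []
            continue
        if line == ".":  # a lone dot ends the definition text
            definitions.append("\n".join(current))
            current = None
        else:
            # A leading ".." is an escaped line that really starts with ".".
            current.append(line[1:] if line.startswith("..") else line)
    return definitions


def define(word, host="localhost", port=2628, database="*"):
    """Ask a DICT server for a single word (no spaces) and return the texts."""
    with socket.create_connection((host, port), timeout=10) as sock:
        stream = sock.makefile("rw", encoding="utf-8", newline="")
        stream.readline()  # server banner
        stream.write("DEFINE %s %s\r\n" % (database, word))
        stream.flush()
        lines, in_text = [], False
        for raw in stream:
            line = raw.rstrip("\r\n")
            lines.append(line)
            if in_text:
                in_text = line != "."
            elif line.startswith("151 "):
                in_text = True
            elif line[:1] in ("2", "4", "5"):
                break  # final status: success, no match, or an error
        stream.write("QUIT\r\n")
        stream.flush()
    return parse_definitions(lines)
```

Something like define("linux", host="xan") would then return a list of definition strings, empty if the server has no match.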

Ubuntu & Gnoppix

So I thought I’d try out the highly praised Ubuntu Linux. I thought to myself, what better way to try than with their LiveCD - no need to mess up any existing setups.

Anyway so I downloaded the ISO, and after burning the CD, noticed it had a Windows auto-run feature. So I ran it, and was presented with a nice little window asking if I’d like to install Windows versions of OpenOffice.org, AbiWord, Audacity, Gimp, PDFCreator, Thunderbird or Firefox. That struck me as rather odd for a CD that’s supposed to be convincing you Linux is better (“Hey! No need to convert to Linux, just check out all this cool Windows software!”).

So I’m thinking oooookay, so I decide to try out the Linux bit of the disc. Everything boots up nicely, nice little GUI boot loader with a couple of options presented in easy-to-use menus, nice splash image hiding all the auto-detection of hardware and general stuff that goes on at boot time (pressing Esc kills the splash image so you can check that everything’s okay in the background). Once it booted up into Gnome, everything looked cool. Nice default desktop setup, theme, etc. I realised at this point that my router had DHCP disabled, so Ubuntu had me offline. There’s a ‘Network Setup’ option in the “Actions” menu, which presented me with a nice little wizard for IP, DNS, gateway, etc options. Upon completing this wizard and closing the application however, nothing would work at all. Icons on the panel did not launch applications, and neither did anything in the application menu. I’d have restarted X, but with LiveCDs, they seem to terminate and reboot as soon as X shuts down.

So anyway, I restarted the whole thing, but with the router’s DHCP enabled. Everything worked cool, the applications on the CD all worked as expected. At some point I entered the Network Setup again and needed another reboot though. Seems as soon as that is run it kills the setup…

It’s a very minimal system though, nothing really useful on it beyond OpenOffice.org - and who uses a LiveCD to do their general word processing. It even had Synaptic - but it prevents you from installing software or even updating the packages list.

I also tried Gnoppix, which is based off the Ubuntu LiveCD, but it suffered the same network configuration application problem, as well as lacking any interesting software. It also included all the Windows software Ubuntu had. In fact the only real difference I saw between Gnoppix and Ubuntu was the boot up splash image. Most of Gnoppix is still ‘branded’ as Ubuntu.

I think if they dumped the Windows software from these CDs, they’d be able to load on a LOT of extra Linux software to impress potential users more, as well as making it more useful as a general-use LiveCD.

The only thing that would make me want to install a proper Ubuntu system at some later date at the moment would be the fact that it’s a full desktop installation out-the-box, with the ability to install anything else on demand thanks to its Debian base.

For the moment I’ll be sticking to Knoppix when I need Linux-on-the-go, which is loaded with tons of useful and fun stuff (pity about KDE, though).