Back up forum's contents?

Discuss the talk-polywell site itself, including appearance, policies, and help-wanted requests from the administrators.

Moderators: tonybarry, MSimon

dch24
Posts: 142
Joined: Sat Oct 27, 2007 10:43 pm

Post by dch24 »

The thing is, it turns out wget isn't the right tool for the job, so I've got to write it up in Perl when I can find a spare minute.

Then running it every two weeks would be completely automatic.

Post by dch24 »

OK, I've modified the swish-e spider to crawl talk-polywell.org.

I posted the backup of the forums here:
http://polywell.nfshost.com/2008_12_06_few_links.tar.bz2 (6,047,321 bytes)
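
For anyone who wants to package their own crawl the same way, here is a rough sketch of building a date-stamped .tar.bz2 file. It is not the script linked below; the function names and the "backup" label are made up for illustration, and Python is used only for the example.

```python
import tarfile
from datetime import date
from pathlib import Path


def archive_name(day: date, label: str = "backup") -> str:
    """Build a date-stamped file name such as 2008_12_06_backup.tar.bz2.

    The label is a hypothetical placeholder, not the name used in the post.
    """
    return f"{day:%Y_%m_%d}_{label}.tar.bz2"


def pack_backup(crawl_dir: Path, out_dir: Path, day: date) -> Path:
    """Compress everything under crawl_dir into one bzip2 tarball."""
    target = out_dir / archive_name(day)
    with tarfile.open(target, "w:bz2") as tar:
        # arcname keeps paths inside the archive relative to the crawl folder
        tar.add(crawl_dir, arcname=crawl_dir.name)
    return target
```

Calling pack_backup(Path("crawl"), Path("."), date.today()) would write the archive into the current directory, assuming a folder named crawl holds the downloaded pages.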

The script I use is here:
http://polywell.nfshost.com/2008_12_06_phpbb_spider.txt

Joe, you'll see lots of accesses by the user agent 'swish-e'; that was me developing the spider. I've now set the user agent to 'polywell-backup-bot-0.1'.
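
To show what setting the user agent amounts to, here is a minimal sketch using Python's standard library. It is not the swish-e spider's own configuration; build_request and fetch are hypothetical helper names, and only the agent string comes from this post.

```python
import urllib.request

# The agent string named in the post; everything else here is illustrative.
USER_AGENT = "polywell-backup-bot-0.1"


def build_request(url: str) -> urllib.request.Request:
    """Attach the custom User-Agent header so the server logs can identify the bot."""
    return urllib.request.Request(url, headers={"User-Agent": USER_AGENT})


def fetch(url: str, timeout: float = 30.0) -> bytes:
    """Download one page with the custom agent (performs a real network request)."""
    with urllib.request.urlopen(build_request(url), timeout=timeout) as resp:
        return resp.read()
```

With a header like this, the admin can pick the bot's requests out of the access log by searching for the agent string.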

It still has lots of trouble loading outside links, but it did get a few. I think some of them might be spam. I'll try to track those down and filter them out.
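
One simple way to start on that filtering is to separate links on the forum's own host from everything else, so the outside ones can be reviewed by hand. The sketch below does only that host check and does not try to detect spam; the function names and the example paths are assumptions, not taken from the spider script.

```python
from urllib.parse import urljoin, urlparse

FORUM_HOST = "talk-polywell.org"


def is_forum_link(href: str, page_url: str) -> bool:
    """Return True if href, resolved against page_url, points at the forum host."""
    parts = urlparse(urljoin(page_url, href))
    if parts.scheme not in ("http", "https"):
        return False  # skip mailto:, javascript:, and similar
    host = (parts.hostname or "").lower()
    # Accept the bare host and subdomains such as www.
    return host == FORUM_HOST or host.endswith("." + FORUM_HOST)


def split_links(hrefs, page_url):
    """Split links into (on-forum, outside) lists of absolute URLs."""
    inside, outside = [], []
    for href in hrefs:
        absolute = urljoin(page_url, href)
        (inside if is_forum_link(href, page_url) else outside).append(absolute)
    return inside, outside
```

The outside list is the one that would need checking for spam before it goes into a backup.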

If anything here at talk-polywell.org didn't get downloaded, please let me know.

Unlike previous backups, this file is only 5.8 MB! That's because I improved the spidering a lot, so I can leave it posted for a while and not run out of money. :-)
