A BlogStorm reader emailed me today,
Hello,
I am replying to your post on http://www.blogstorm.co.uk/how-to-scrape-pages-with-coldfusion/
Wanted to see your experience in page scraping, may need your help on a project. HOw much did you do beyond the above Code?
Thanks
That guest post was written over a year ago and I haven't written anything in Coldfusion for quite some time. After responding with this information, he replied with,
Hello Guy,
Thanks for the info. We don't need something fast and efficient, since Bash or perl is faster probably. Just something I can maintain. Do you have a script that can extract the WHois info?
Thanks
So I put together a little sh/bash script that will read a list of domain names from a text file, whois each domain, and insert the results into a MySQL table. Basically a MySQL insert from bash. Here's the finished (tested/working) script:
#!/bin/sh
WHOISME="/tmp/list-of-domains-to-whois.tmp"
# mysql username
SQLUSER="root"
# mysql password
SQLPASS="h0h0h0"
# mysql server hostname/IP
SQLHOST="mysql-server.localdomain"
# prints date in YYYY-MM-DD format
SQLDATE=$(date +%F)
# database to use (created with: mysql> create database db_whois;)
SQLDB="db_whois"
# table inside the database to use
# created with: mysql> CREATE TABLE tbl_scraped(id INT NOT NULL AUTO_INCREMENT,PRIMARY KEY(id),domain VARCHAR(50),whois_data TEXT,date DATE);
SQLTBL="tbl_scraped"
# mysql> describe tbl_scraped;
# +------------+-------------+------+-----+---------+----------------+
# | Field | Type | Null | Key | Default | Extra |
# +------------+-------------+------+-----+---------+----------------+
# | id | int(11) | NO | PRI | NULL | auto_increment |
# | domain | varchar(50) | YES | | NULL | |
# | whois_data | text | YES | | NULL | |
# | date | date | YES | | NULL | |
# +------------+-------------+------+-----+---------+----------------+
# 4 rows in set (0.00 sec)
# the magic
cat $WHOISME |while true
do read LINE || break
WHOISD=$(whois $LINE |sed "s/'/\\\'/")
echo "INSERT INTO $SQLTBL VALUES (NULL,'$LINE','${WHOISD}','${SQLDATE}');"\
|mysql -u$SQLUSER -p$SQLPASS -h$SQLHOST $SQLDB
done; echo "Job Done - Exiting..."
exit 0
Enjoy...
Word Count: 350


