Check Google PageRank for multiple pages

One of the many useful modules on CPAN, the Perl module archive, is WWW::Google::PageRank. It gives a quick and easy way of checking the PageRank of a number of pages without delving into the particulars of the header exchanges with toolbarqueries.google.com.

In order to use the script below, first create a file named 'pages.txt' containing the full URLs of the pages you want to check, one per line. Save it in the directory you run the script from, which is normally the same directory as the script.
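The file can be a little looser than one full URL per line. As a rough sketch of the kind of line handling involved: blank lines and lines starting with '#' are skipped, surrounding whitespace is trimmed, and a line without a scheme gets 'http://' prepended. The helper name normalize_line and the example.com addresses are made up for illustration, and the sketch assumes that both http:// and https:// prefixes should be left alone.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not part of the script): turn one line of
# 'pages.txt' into a URL, or return undef if the line should be skipped.
sub normalize_line {
    my ($line) = @_;
    return if $line =~ /^\s*#/;      # comment line
    return if $line =~ /^\s*$/;      # blank line
    my ($url) = $line =~ /(\S+)/;    # first token, whitespace trimmed
    # Assumption: http:// and https:// prefixes are both left as they are
    $url = 'http://' . $url unless $url =~ m{^https?://}i;
    return $url;
}

# Sample lines as they might appear in 'pages.txt' (placeholder addresses)
my @lines = (
    "# pages to check",
    "",
    "http://www.example.com/",
    "  www.example.com/about.html  ",
);

for my $line (@lines) {
    my $url = normalize_line($line);
    print defined $url ? "$url\n" : "(skipped)\n";
}
```

Only the last two sample lines yield a URL; the comment and the empty line are reported as skipped.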

PERL:
  #!/usr/bin/perl
  use strict;
  use warnings;
  use Getopt::Std;
  use File::Basename;
  use WWW::Google::PageRank;

  # Options: -d <delimiter> sets the separator used for the terminal output,
  # -u or -h prints usage. -s is still accepted but has no effect.
  my %opts;
  getopts('uhsd:', \%opts);
  usage() if $opts{'u'} || $opts{'h'};

  my $urlfile = 'pages.txt';
  die "No file found at '$urlfile'!\n" unless -e $urlfile;
  my $urls = get_urls($urlfile);

  my $pr = WWW::Google::PageRank->new;
  my $delim = defined $opts{'d'} ? $opts{'d'} : ' ';

  open(my $csv, '>', 'pr.csv') or die "Could not open 'pr.csv': $!";
  foreach my $url (sort keys %$urls) {
    # Query once per URL. In scalar context get() returns just the
    # PageRank value, which is undef if the lookup failed.
    my $rank = $pr->get($url);
    $rank = '' unless defined $rank;
    print "$url$delim$rank\n";
    print $csv "$url,$rank\n";
  }
  close $csv;

  # Print a short usage message and exit
  sub usage {
    my $name = basename($0);
    print "Usage: $name [-d delimiter] [-u | -h]\n";
    print "Reads URLs from 'pages.txt' and writes their PageRank to 'pr.csv'.\n";
    exit 0;
  }

  # Subroutine to fetch URLs from 'pages.txt' file
  sub get_urls {
    my $urlfile = shift;
    my %urls;

    open(my $fh, '<', $urlfile) or die "Failed to open '$urlfile': $!";
    while (<$fh>) {
      chomp;
      next if /^\s*#/;        # skip comment lines
      next if /^\s*$/;        # skip blank lines
      my ($url) = /(\S+)/;    # first whitespace-delimited token on the line
      # Prepend a scheme only when the line does not already have one
      $url = 'http://' . $url unless $url =~ m{^https?://}i;
      $urls{$url} = 0;
    }
    close $fh;
    return \%urls;
  }

When you run the script at the command line ('perl script.pl'), it prints each URL with its PageRank and writes a comma-separated file, 'pr.csv', with the PageRank value next to each URL.
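Once 'pr.csv' exists, it can be post-processed like any other comma-separated file. As a rough sketch, assuming each line has the form URL,PageRank, the following lists pages by PageRank, highest first. The helper name parse_pr_lines is hypothetical and the sample values are invented for illustration; they are not real measurements.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: parse "URL,PageRank" lines into [url, rank] pairs.
# The rank is taken from after the last comma, so commas inside a URL are
# tolerated. A missing or non-numeric rank becomes -1 so it sorts last.
sub parse_pr_lines {
    my @rows;
    for my $line (@_) {
        chomp(my $copy = $line);
        my ($url, $rank) = $copy =~ /^(.*),([^,]*)$/;
        next unless defined $url;    # no comma on this line: ignore it
        $rank = -1 unless $rank =~ /^\d+$/;
        push @rows, [$url, $rank];
    }
    return @rows;
}

# Made-up sample data in the same shape as 'pr.csv'
my @sample = (
    "http://www.example.com/,5\n",
    "http://www.example.com/about.html,3\n",
    "http://www.example.com/new.html,\n",
    "http://www.example.org/,7\n",
);

# Highest PageRank first; ties broken alphabetically by URL
my @sorted = sort { $b->[1] <=> $a->[1] or $a->[0] cmp $b->[0] }
             parse_pr_lines(@sample);

printf "%-40s %s\n", $_->[0], ($_->[1] < 0 ? 'n/a' : $_->[1]) for @sorted;
```

To run this against real output, replace the sample list with the lines read from 'pr.csv'.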
