[philiptellis] /bb|[^b]{2}/
Never stop Grokking


Showing posts with label 404. Show all posts

Sunday, February 28, 2010

Convert IP to Geo info using YQL

For my missing kids hack, I needed to convert an IP address to a US or Canadian two-letter state code. This should have been pretty straightforward, but it turned out to require a little more effort than I initially wanted to put in.

First, the easy way. Rasmus Lerdorf has a web service that takes in an IP address and, based on the MaxMind data, returns a bunch of information including the country and state/region code. I initially decided to use this. His example page is pretty self-explanatory, so I won't re-document it here. The problem is that this service was really slow and increased page load time a lot, so I scrapped the idea.

I then started looking through YQL. YQL has a whole bunch of geo stuff, but nothing that specifically turns an IP address into a WoEID or a country/state code. I then looked at the community supported tables and found the ip.location table that uses the ipinfodb.com wrapper around the MaxMind database. This returned everything I needed, but the only problem was that the state was returned as a string rather than a two character code. This is the query:
SELECT * From ip.location Where ip=@ip
The output looks like this:
{
 "query":{
  "count":"1",
  "created":"2010-02-28T01:24:30Z",
  "lang":"en-US",
  "updated":"2010-02-28T01:24:30Z",
  "uri":"http://query.yahooapis.com/v1/yql?q=select+*+from+ip.location+where+ip%3D%27209.117.47.253%27",
  "results":{
   "Response":{
    "Ip":"209.117.47.253",
    "Status":"OK",
    "CountryCode":"US",
    "CountryName":"United States",
    "RegionCode":null,
    "RegionName":null,
    "City":null,
    "ZipPostalCode":null,
    "Latitude":"38",
    "Longitude":"-97",
    "Timezone":"-6",
    "Gmtoffset":"-6",
    "Dstoffset":"-5"
   }
  }
 }
}
Now it's pretty trivial to build an array that maps from state name to state code, but I'd have to keep growing that as I added support for more countries, so I decided against that route. Instead I started looking at how I could use the geo APIs to turn this information into what I wanted. Among other things, the data returned also contained the latitude and longitude of the location that the IP was in. I decided to do a reverse geo map from the lat/lon to the geo information. The only problem is that the geo API itself doesn't do this for you.

Tom Croucher then told me that the flickr.places API could turn a latitude and longitude pair into a WoEID, so I decided to explore that. This is the query that does it:
SELECT place.woeid From flickr.places
 Where lat=@lat And lon=@lon
Now I could tie the two queries together and get a single one that turns an IP address into a WoEID:
SELECT place.woeid From flickr.places
 Where (lat, lon) IN
   (
      SELECT Latitude, Longitude From ip.location
       Where ip=@ip
   )
This is what the output looks like:
{
 "query":{
  "count":"1",
  "created":"2010-02-28T01:25:34Z",
  "lang":"en-US",
  "updated":"2010-02-28T01:25:34Z",
  "uri":"http://query.yahooapis.com/v1/yql?q=SELECT+place.woeid+From+flickr.places%0A+Where+%28lat%2C+lon%29+IN%0A+++%28%0A++++++SELECT+Latitude%2C+Longitude+From+ip.location%0A+++++++Where+ip%3D%27209.117.47.253%27%0A+++%29",
  "results":{
   "places":{
    "place":{
     "woeid":"12588378"
    }
   }
  }
 }
}
The last step of the puzzle was to turn this WoEID into a country and state code. This I already knew how to do:
SELECT country.code, admin1.code
  From geo.places
 Where woeid=@woeid
country.code gets us the two-letter ISO 3166 country code while admin1.code gets us a code for the local administrative region. For the US and Canada, this is simply the country code followed by a hyphen, followed by the two-letter state code. Once I got this information, I could strip out the country code and the hyphen from admin1.code and get the two-letter state code.
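
A minimal sketch of that stripping step, assuming the admin1 code always has the shape "country code, hyphen, two-letter region code" as in the US-KS example. The function name state_from_admin1 is made up for this illustration and is not part of YQL or any library:

```php
<?php
// Hypothetical helper: extract the two-letter state code from an
// admin1 code such as "US-KS", given the country code "US".
// Returns null when the admin1 code does not have the expected shape.
function state_from_admin1($country, $admin1)
{
    $prefix = $country . '-';
    if (!is_string($admin1) || strpos($admin1, $prefix) !== 0) {
        return null;
    }
    $state = substr($admin1, strlen($prefix));
    return preg_match('/^[A-Z]{2}$/', $state) ? $state : null;
}
```

Returning null instead of a guess lets the caller fall back to a default when the lookup found no region, which can happen since ip.location sometimes returns null region fields.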

My final query looks like this:
SELECT country.code, admin1.code From geo.places
 Where woeid IN
   (
      SELECT place.woeid From flickr.places
       Where (lat, lon) IN
         (
            SELECT Latitude, Longitude From ip.location
             Where ip=@ip
         )
   )
And the output is:
{
 "query":{
  "count":"1",
  "created":"2010-02-28T01:26:32Z",
  "lang":"en-US",
  "updated":"2010-02-28T01:26:32Z",
  "uri":"http://query.yahooapis.com/v1/yql?q=SELECT+country.code%2C+admin1.code+From+geo.places%0A+Where+woeid+IN%0A%28SELECT+place.woeid+From+flickr.places%0A+Where+%28lat%2C+lon%29+IN%0A+++%28%0A++++++SELECT+Latitude%2C+Longitude+From+ip.location%0A+++++++Where+ip%3D%27209.117.47.253%27%0A+++%29%29",
  "results":{
   "place":{
    "country":{
     "code":"US"
    },
    "admin1":{
     "code":"US-KS"
    }
   }
  }
 }
}
Paste this code into the YQL console, make sure you've selected "Show community tables" and get the REST API from there. It's a terribly roundabout way to get something that should be a single API call, but at least from my application's point of view, I only need to call a single web service. Now if only we could convince the guys at missingkidsmap.com to use WoEIDs instead of state codes, that would make this all a lot easier.
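
The REST URL can also be assembled by hand. This is only a sketch under stated assumptions: the public endpoint path, the env value used to enable community tables, and the idea that @ip is filled in from an ip query parameter are all assumptions here, so compare the result with what the console actually generates. yql_url and the two constants are names invented for this example:

```php
<?php
// Assumed endpoint and community-tables environment; verify both
// against the REST query shown by the YQL console.
define('YQL_ENDPOINT', 'http://query.yahooapis.com/v1/public/yql');
define('YQL_COMMUNITY_ENV', 'store://datatables.org/alltableswithkeys');

// Build a REST URL for a YQL statement. Entries in $vars are sent as
// extra query parameters, which is how @name placeholders such as @ip
// are assumed to be substituted. The q, format and env keys win over
// any entries with the same name in $vars.
function yql_url($statement, array $vars = array())
{
    $params = array(
        'q'      => $statement,
        'format' => 'json',
        'env'    => YQL_COMMUNITY_ENV,
    ) + $vars;
    return YQL_ENDPOINT . '?' . http_build_query($params);
}
```

Calling yql_url with the final query and array('ip' => $_SERVER['REMOTE_ADDR']) would then give one URL per visitor IP, with http_build_query taking care of the URL encoding.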

Have I mentioned how much I like YQL?

Friday, February 19, 2010

Missing kids on your 404 page

It's been a long time since I last posted, and unfortunately I've been unable to churn out a post every week. The month of February has been filled with travel, so I haven't had much time to write.

My report on FOSDEM is up on the YDN blog, so I haven't been completely dormant. I also did some stuff at our internal hack day last week. This post is about one of my hacks.

The idea is quite simple. People land up on 404 pages all the time. 404 pages are pages that have either gone missing, or were never there to begin with. 404 is the HTTP error code for a missing resource. Most 404 pages are quite bland, simply stating that the requested resource was not found, and that's it. Back when I worked at NCST, I changed the default 404 page to use a local site search based on the requested URL. I used the namazu search engine since I was working on it at the time.

This time I decided to do something different. Instead of searching the local site for a missing resource, why not engage the user in trying to find missing kids?

I started by trying to find an API for missingkids.com and ended up finding missingkidsmap.com. This service takes the data from Missing Kids and puts it on a Google map. The cool thing about the service was that it could return data as XML.

Looking through the source code, I found the data URL:
http://www.missingkidsmap.com/read.php?state=CA
The state code is a two letter code for states in the US and Canada. To get all kids, just pass in ZZ as the state code.
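
As a small sketch of how that URL might be built, assuming anything that is not a two-letter code should fall back to ZZ (the function name missing_kids_data_url is invented for this example):

```php
<?php
// Hypothetical helper: build the missingkidsmap.com data URL for a
// two-letter state code, falling back to ZZ (all kids) for anything
// that does not look like a two-letter code.
function missing_kids_data_url($state = null)
{
    $state = strtoupper(trim((string) $state));
    if (!preg_match('/^[A-Z]{2}$/', $state)) {
        $state = 'ZZ';
    }
    return 'http://www.missingkidsmap.com/read.php?state=' . $state;
}
```

The check only validates the shape of the code, not whether it names a real state or province.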

The data returned looks like this:
<locations>
   <maplocation zoom="5"
                state_long="-119.838867"
                state_lat="37.370157"/>
   <location id="1"
             firstname="Anastasia"
             lastname=" Shearer "
             picture="img width=160 target=_new src=http://www.missingkids.com/photographs/NCMC1140669c1.jpg"
             picture2="img width=160 target=_new src=http://www.missingkids.com/photographs/NCMC1140669e1.jpg"
             medpic = "img width=60 border=0 target=_new src=http://www.missingkids.com/photographs/NCMC1140669c1.jpg"
             smallpic="img width=30 border=0 target=_new src=http://www.missingkids.com/photographs/NCMC1140669c1.jpg"
             policenum="1-661-861-3110"
             policeadd="Kern County Sheriff\'s Office (California)"
             policenum2=""
             policeadd2=""
             st=" CA"
             city="BAKERSFIELD"
             missing="12/26/2009"
             status="Endangered Runaway"
             age="16"
             url="1140669"
             lat="35.3733333333333"
             lng="-119.017777777778"/>
   ...
</locations>

Now I could keep hitting this URL for every 404, but I didn't want to kill their servers, so I decided to pass the URL through YQL and let them cache the data. Of course, now that I was passing it through YQL, I could also do some data transformation and get it out as JSON instead of XML. I ended up with this YQL statement:
SELECT * From xml
 Where url='http://www.missingkidsmap.com/read.php?state=ZZ'
Pass that through the YQL console to get the URL you should use. The JSON I got back looked like this:
{
   "query":{
      "count":"1",
      "created":"2010-02-19T07:30:44Z",
      "lang":"en-US",
      "updated":"2010-02-19T07:30:44Z",
      "uri":"http://query.yahooapis.com/v1/yql?q=SELECT+*+From+xml%0A+Where+url%3D%27http%3A%2F%2Fwww.missingkidsmap.com%2Fread.php%3Fstate%3DZZ%27",
      "results":{
         "locations":{
            "maplocation":{
               "state_lat":"40.313043",
               "state_long":"-94.130859",
               "zoom":"4"
            },
            "location":[{
                  "age":"7",
                  "city":"OMAHA",
                  "firstname":"Christopher",
                  "id":"Szczepanik",
                  "lastname":"Szczepanik",
                  "lat":"41.2586111111111",
                  "lng":"-95.9375",
                  "medpic":"img width=60 border=0 target=_new src=http://www.missingkids.com/photographs/NCMC1141175c1.jpg",
                  "missing":"12/14/2009",
                  "picture":"img width=160 target=_new src=http://www.missingkids.com/photographs/NCMC1141175c1.jpg",
                  "picture2":"",
                  "policeadd":"Omaha Police Department (Nebraska)",
                  "policeadd2":"",
                  "policenum":"1-402-444-5600",
                  "policenum2":"",
                  "smallpic":"img width=30 border=0 target=_new src=http://www.missingkids.com/photographs/NCMC1141175c1.jpg",
                  "st":" NE",
                  "status":"Missing",
                  "url":"1141175"
               },
               ...
            ]
         }
      }
   }
}
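
One thing worth guarding against when reading this JSON: the sketch below assumes (this is not confirmed by the output above) that a response with only one location may carry a single object under "location" instead of a list, and that "results" can be null when nothing matches. locations_from_yql is a made-up name for this illustration:

```php
<?php
// Return the list of location records from a YQL JSON response.
// Assumption: a result with a single location may arrive as one
// associative array rather than a list of them, so wrap that case.
function locations_from_yql($json)
{
    $o = json_decode($json, true);
    if (!isset($o['query']['results']['locations']['location'])) {
        return array();   // bad JSON, null results, or no locations
    }
    $loc = $o['query']['results']['locations']['location'];
    if (!is_array($loc) || !$loc) {
        return array();
    }
    // A list of records has the integer key 0; a single record has
    // only string keys such as "firstname".
    return isset($loc[0]) ? $loc : array($loc);
}
```

Normalizing to a list up front means the code that picks a random child never has to care which shape came back.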

Step 2 was to figure out whether the visitor was from the US or Canada, and if so, figure out which state they were from and pass that state code to the URL.

This is fairly easy to do at Yahoo!. Not so much on the outside, so I'm going to leave it to you to figure it out (and please let me know when you do).

In any case, my code looked like this:
$json = http_get($missing_kids_url);
$o = json_decode($json, true);
$children = $o['query']['results']['locations']['location'];

// array_rand() returns a random key, so use it to index into the list
$child = $children[array_rand($children)];

print_404($child);
http_get is a function I wrote that wraps around curl_multi to fetch a URL and cache it locally. print_404 is the function that prints out the HTML for the 404 page using the $child data object. The object's structure is the same as each of the location elements in the JSON above. The important parts of print_404 are:
function print_404($child)
{
   $img = preg_replace('/.*src=(.*)/', '$1', $child["medpic"]);
   $name = $child["firstname"] . " " . $child["lastname"];
   $age = $child['age'];
   $since = strtotime(preg_replace('|(\d\d)/(\d\d)/(\d\d\d\d)|', '$3-$1-$2', $child['missing']));
   if($age == 0) {
      $age = ceil((time()-$since)/60/60/24/30);
      $age .= ' month';
   }
   else
      $age .= ' year';

   $city = $child['city'];
   $state = $child['st'];
   $status = $child['status'];
   $police = $child['policeadd'] . " at " . $child['policenum'];

   header('HTTP/1.0 404 Not Found');
?>
<html>
<head>
...
<p>
<strong>Sorry, the page you're trying to find is missing.</strong>
</p>
<p>
We may not be able to find the page, but perhaps you could help find this missing child:
</p>
<div style="text-align:center;">
<img style="width:320px; padding: 1em;" alt="<?php echo $name ?>" src="<?php echo $img ?>"><br>
<div style="text-align: left;">
<?php echo $age ?> old <?php echo $name ?>, from <?php echo "$city, $state" ?> missing since <?php echo strftime("%B %e, %Y", $since); ?>.<br>
<strong>Status:</strong> <?php echo $status ?>.<br>
<strong>If found, please contact</strong> <?php echo $police ?><br>
</div>
</div>
...
</body>
</html>
<?php
}
Add in your own CSS and page header, and you've got missing kids on your 404 page.
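
The two preg_replace calls in print_404 return their input unchanged when the pattern does not match, for example on an empty picture field. A slightly more defensive sketch of those two steps follows. The names img_src and missing_since are invented here, and unlike the strtotime call above, the date is interpreted as midnight UTC rather than local time:

```php
<?php
// Pull the image URL out of a value such as
// "img width=60 border=0 target=_new src=http://example.com/a.jpg".
// Returns null when there is no src= part.
function img_src($attr)
{
    return preg_match('/\bsrc=(\S+)/', $attr, $m) ? $m[1] : null;
}

// Convert "12/26/2009" (month/day/year) to a Unix timestamp at
// midnight UTC, or null when the value has a different shape.
function missing_since($mdy)
{
    if (!preg_match('|^(\d\d)/(\d\d)/(\d{4})$|', $mdy, $m)) {
        return null;
    }
    return gmmktime(0, 0, 0, (int) $m[1], (int) $m[2], (int) $m[3]);
}
```

With null as the failure value, the page can skip the image or the "missing since" text instead of printing a broken one.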

The last thing to do is to tell Apache to use this script as your 404 handler. To do that, put the page (I call it 404.php) into your document root, and put this into your Apache config (or in a .htaccess file):
ErrorDocument 404 /404.php
Restart Apache and you're done.

Update: 2010-02-24 To see it in action, visit a missing page on my website, e.g. http://bluesmoon.info/foobar.

Update 2: The code is now on github: http://github.com/bluesmoon/404kids

Update: 2010-02-25 Scott Hanselman has a JavaScript implementation on his blog.

Update: 2010-03-28 There's now a Drupal module for this.
