walkthrough-classification-aws.md

Example Use: Creating a Neural Network to Find Populated Areas in Tanzania

Let's create a neural network to detect populated areas in Tanzania. Because there may not be a suitable existing dataset for this, let's use Label Maker to create our own. Afterwards we can use that data to train a simple classifier (with notes on how to use it with custom architectures).

Creating Training Data

First install label-maker and tippecanoe.

Then we need to define our desired data by setting up a config.json file. You can read more specifics about each property in the README.

{
  "country": "united_republic_of_tanzania",
  "bounding_box": [38.83563,-6.78309,39.142055,-6.57952],
  "zoom": 16,
  "classes": [
    { "name": "Populated Area", "filter": ["has", "building"] }
  ],
  "imagery": "http://a.tiles.mapbox.com/v4/mapbox.satellite/{z}/{x}/{y}.jpg?access_token=ACCESS_TOKEN",
  "background_ratio": 1,
  "ml_type": "classification"
}
  • country and bounding_box: The first two parameters are relatively straightforward: we'd like to create training data in Tanzania within a specific bounding box, given here as [min longitude, min latitude, max longitude, max latitude].
  • zoom: Choosing the zoom is a bit trickier. Too low and you can't see enough detail in the image; too high and you have to run your model over lots and lots of tiles to see any results. We're using 16 here after manually inspecting the imagery at http://geojson.io/.
  • classes: There is a single class "Populated Area" which defines a Mapbox GL Filter to find the appropriate OSM QA data. In this case we're using buildings as a proxy for populated areas.
  • imagery: We'll use satellite imagery from Mapbox (remember to replace ACCESS_TOKEN with your own access token).
  • ml_type: For each tile, we're looking to determine if it contains any residential land use, so we only need to classify the tile as a whole.
  • background_ratio: For single-class classification problems, we need to download tiles with no classes to help the algorithm learn what makes the class distinct from other images. This ratio tells us how many "no class" images to download.
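
The zoom trade-off described above can be made concrete by counting how many tiles the bounding box covers at each zoom level. The sketch below uses the standard Web Mercator (slippy map) tile formula. The function names are hypothetical and not part of Label Maker, and the count covers every tile in the box, which is not necessarily the number of tiles Label Maker ends up downloading.

```python
import math

def lon_lat_to_tile(lon, lat, zoom):
    """Return the (x, y) index of the Web Mercator tile containing a point."""
    n = 2 ** zoom
    x = int((lon + 180.0) / 360.0 * n)
    lat_rad = math.radians(lat)
    y = int((1.0 - math.asinh(math.tan(lat_rad)) / math.pi) / 2.0 * n)
    return x, y

def tiles_in_bbox(bbox, zoom):
    """Count tiles touched by a [min_lon, min_lat, max_lon, max_lat] box."""
    min_lon, min_lat, max_lon, max_lat = bbox
    # Tile y grows southward, so the south-west corner has the largest y.
    x_min, y_max = lon_lat_to_tile(min_lon, min_lat, zoom)
    x_max, y_min = lon_lat_to_tile(max_lon, max_lat, zoom)
    return (x_max - x_min + 1) * (y_max - y_min + 1)

bbox = [38.83563, -6.78309, 39.142055, -6.57952]  # bounding_box from config.json
for zoom in (14, 15, 16, 17):
    print(zoom, tiles_in_bbox(bbox, zoom))
```

Each extra zoom level roughly quadruples the tile count, which is why a higher zoom quickly becomes expensive to download and to run a model over.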

Once you've created the configuration file, you can run the following steps from the command line to create the data:

  1. label-maker download: Download OpenStreetMap QA tiles. You can inspect the file afterwards with mbview.
  2. label-maker labels: Create labels from the QA tiles. This will take a while since it first retiles the QA tiles to zoom level 16. It then writes the label data to labels.npz and a file for visual inspection to classification.geojson. Here's what the latter looks like in QGIS (burnt orange showing the tiles with populated areas):

  3. label-maker preview: Preview the data by downloading example imagery for each class. Example satellite images will be at data/examples.

  4. label-maker images: When you're ready, download all the necessary imagery tiles. Note that this will download 1,338 total images.
  5. label-maker package: This will create a file (data.npz) containing test and train data for both labels and images, which is easily loaded with numpy.
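
As a quick sanity check after packaging, the arrays can be loaded and inspected with numpy. This is a minimal sketch under stated assumptions: the key names `x_train`, `y_train`, `x_test`, and `y_test` and the one-hot label layout are assumed here and should be confirmed against `np.load(path).files` on the real file. The `summarize` and `class_fractions` helpers are hypothetical. To stay self-contained, the example writes a tiny synthetic file instead of reading the real data.npz.

```python
import os
import tempfile

import numpy as np

def summarize(path):
    """Return the shape of every array stored in an .npz file."""
    with np.load(path) as npz:
        return {key: npz[key].shape for key in npz.files}

def class_fractions(labels):
    """Fraction of samples per class, assuming one-hot label rows."""
    return labels.sum(axis=0) / labels.shape[0]

# Synthetic stand-in for data.npz (assumed keys, tiny 8x8 "images").
path = os.path.join(tempfile.mkdtemp(), "data.npz")
x = np.zeros((10, 8, 8, 3), dtype=np.uint8)
y = np.eye(2, dtype=int)[[0, 1] * 5]  # alternating background / class rows
np.savez(path, x_train=x[:8], y_train=y[:8], x_test=x[8:], y_test=y[8:])

print(summarize(path))
with np.load(path) as npz:
    print(class_fractions(npz["y_train"]))
```

On real data, the class fractions give a quick view of how the background_ratio setting affected the balance between "no class" tiles and populated tiles.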

Training a Neural Network Classifier

Requirements

Now that you have the necessary data, we can train a neural network classifier. This example uses the pre-existing ResNet50 architecture included with Keras, but feel free to try it with your own custom network.

To run the example included here:

  • Start an EC2 instance with ami-78994d02. This is the AWS Deep Learning image with Keras and TensorFlow installed. If you have the AWS CLI installed, you can start a g3.4xlarge instance by running aws ec2 run-instances --cli-input-json file://examples/utils/ec2runinst.json (just make sure to fill in ec2runinst.json with your SSH key and security group name).
  • Once the instance starts, copy the network code and data.npz file to the instance with scp (replace the SSH key and IP address in the following code):
  • Start the training by connecting to your instance with SSH and running python resnet.py. The default is to run 50 epochs with a batch size of 16.
  • Our test run of this network took about 40 minutes to train and gave a classifier with a test accuracy of ~89%. If you'd like to help improve the label accuracy, start mapping on OpenStreetMap.
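
For a rough sense of what the default settings mean in practice, the epoch and batch size figures can be turned into a count of gradient updates. This is back-of-the-envelope arithmetic only: the 80/20 train/test split is an assumption that this walkthrough does not state, and `training_steps` is a hypothetical helper, not part of resnet.py.

```python
import math

def training_steps(total_images, train_fraction, batch_size, epochs):
    """Return (training images, batches per epoch, total batches)."""
    n_train = int(total_images * train_fraction)
    steps_per_epoch = math.ceil(n_train / batch_size)
    return n_train, steps_per_epoch, steps_per_epoch * epochs

# 1,338 images, batch size 16, and 50 epochs come from the text above;
# the 0.8 train fraction is assumed.
n_train, per_epoch, total = training_steps(1338, 0.8, 16, 50)
print(n_train, per_epoch, total)  # 1070 67 3350
```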