grozwalker/de-zoomcamp-meterorite-landings

Meteorite landings map

Final project for data-engineer zoomcamp 2024

About project

For this project I decided to analyse meteorite landings on Earth, using their coordinates, years, and other parameters. The data comes from The Meteoritical Society and covers all known meteorite landings; the goal is to find out in which areas meteorite fragments are concentrated.

In this project I implement some data engineering best practices (partitioned tables, pre-commit hooks, and others) and derive interesting metrics, such as:

  • number of meteorites by year
  • distribution of the number of meteorites by latitude
  • distribution of meteorites by type
  • interactive map of meteorite landings

Go to https://lookerstudio.google.com/reporting/6c8488a2-2e39-4b79-a966-de9cba50b83c/page/lo3tD to view the report.

A partitioned table is not actually needed for such a small amount of data, but I added one to show that I can do it. The same goes for processing with Spark.

Dataset

This project uses the Meteorite Landings dataset from NASA's open data portal.

Technologies

Reproduction Steps

Prerequisites

  1. A Google Cloud Platform account
  2. Docker (https://www.docker.com/get-started/)
  3. Terraform (https://developer.hashicorp.com/terraform/install)
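
Before going further, it can help to confirm that the command-line tools are actually on your PATH. The following is a minimal sketch, assuming a POSIX shell. The helper name `missing_tools` is hypothetical and not part of the repository; `git` and `make` are included only because the later steps call them.

```shell
#!/bin/sh
# Print the name of every listed tool that is not found on PATH.
# missing_tools is a hypothetical helper, not part of the repository.
missing_tools() {
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
    done
}

# docker and terraform come from the prerequisites list;
# git and make are assumed because later steps use them.
missing=$(missing_tools docker terraform git make)
if [ -n "$missing" ]; then
    echo "Missing tools:" $missing
else
    echo "All prerequisites found"
fi
```

The check only reports what is missing; it does not verify versions or that the Docker daemon is running.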

Create a Google Cloud Project

Go to the Manage Resources page in the Google Cloud console. Click Create Project, fill in the fields, and then click Create. Then add a billing account to the project.

Enable necessary APIs

Clone repo

cd ~
git clone git@github.com:grozwalker/de-zoomcamp-meterorite-landings.git
cd de-zoomcamp-meterorite-landings # submodule commands must run inside the repository
git submodule update --init --recursive

Create a Service Account and key

  1. In the Google Cloud console, go to the Create service account page.
  2. Select a Google Cloud project.
  3. Fill in the necessary fields.
  4. Add these roles: BigQuery Admin, Cloud Datastore Owner, Cloud SQL Admin, Storage Admin, Storage Object Admin, Viewer.
  5. Click Done to finish creating the service account.
  6. In the Service accounts dashboard, find the newly created account and click Actions -> Manage keys.
  7. Click Add key -> Create new key and choose the key type JSON.
  8. Save the file as gcp-service-account.json and store it in your project folder, in {project_folder}/key.
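
The console steps above can also be sketched with the gcloud CLI. This is an illustration under stated assumptions, not the author's method: `PROJECT_ID`, `SA_NAME`, and the `run` helper are hypothetical, the role IDs are the standard identifiers for the role names in step 4, and the script is assumed to run from the project folder so that the key lands in `key/`. By default it only prints the commands; set `DRY_RUN=0` to execute them.

```shell
#!/bin/sh
# Sketch of the console steps using the gcloud CLI.
# PROJECT_ID and SA_NAME are placeholders (assumptions); replace them.
PROJECT_ID="${PROJECT_ID:-my-project-id}"
SA_NAME="${SA_NAME:-meteorite-landings}"
SA_EMAIL="${SA_NAME}@${PROJECT_ID}.iam.gserviceaccount.com"
KEY_FILE="${KEY_FILE:-key/gcp-service-account.json}"

# With DRY_RUN=1 (the default here) commands are printed, not executed.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Steps 1-3 and 5: create the service account.
run gcloud iam service-accounts create "$SA_NAME" --project "$PROJECT_ID"

# Step 4: grant the listed roles to the service account.
for role in roles/bigquery.admin roles/datastore.owner roles/cloudsql.admin \
            roles/storage.admin roles/storage.objectAdmin roles/viewer; do
    run gcloud projects add-iam-policy-binding "$PROJECT_ID" \
        --member "serviceAccount:${SA_EMAIL}" --role "$role"
done

# Steps 6-8: create a JSON key and save it under key/.
run gcloud iam service-accounts keys create "$KEY_FILE" --iam-account "$SA_EMAIL"
```

Reviewing the printed commands before switching off the dry run is a cheap way to catch a wrong project ID.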

Create BQ infrastructure

cd ~/de-zoomcamp-meterorite-landings/terraform
cp terraform.tfvars.example terraform.tfvars
nano terraform.tfvars # set project_id to the ID of the project you created above
terraform init
terraform apply

Start project locally

cd ~/de-zoomcamp-meterorite-landings
cp dev.env .env
nano .env # fill GOOGLE_PROJECT_ID
make build
make ingest_data # takes several minutes
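
Before running `make build`, it may be worth confirming that `GOOGLE_PROJECT_ID` was actually filled in. A minimal sketch follows; it assumes `.env` uses plain `KEY=VALUE` lines without quoting, which the source does not confirm, and the helper name `env_is_set` is hypothetical.

```shell
#!/bin/sh
# env_is_set FILE KEY: succeed if FILE has a line KEY=<non-empty value>.
# Hypothetical helper; assumes plain KEY=VALUE lines without quoting.
env_is_set() {
    grep -Eq "^$2=.+" "$1"
}

if [ -f .env ] && env_is_set .env GOOGLE_PROJECT_ID; then
    echo "GOOGLE_PROJECT_ID is set"
else
    echo "GOOGLE_PROJECT_ID is missing or empty in .env"
fi
```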

To access the Mage UI, run make ui and open http://localhost:6789/pipelines/meteorite_landings

Afterwards, you can destroy all infrastructure:

cd ~/de-zoomcamp-meterorite-landings/terraform
terraform destroy
