rbramwell/cloud-data-analysis-at-scale

Data Analysis at Scale in the Cloud

Course taught at Duke MIDS in Spring 2020 by Noah Gift.

Lecture Topics:

Getting Started: [Week 1]

Cloud Computing Foundations: [Week 2]

Virtualization and Containers: [Week 3 & Week 4]

Challenges and Opportunities in Distributed Computing: [Week 5 & Week 6]

Cloud Storage: [Week 7 & Week 8]

Serverless: [Week 9 & Week 10]

Big Data Platforms: [Week 11]

Managed Machine Learning Systems: [Week 12]

Edge Computing: [Week 13]

General

Student Example Projects

References:

A practical guide to Data Science, Machine Learning Engineering and Data Engineering

This book is being written "just in time", with a weekly release schedule.

cloud4data books

Book Table of Contents (TOC)

  • Chapter 1: Getting Started

  • Chapter 2: Cloud Computing Foundations

  • Chapter 3: Virtualization & Containerization

  • Chapter 4: Challenges and Opportunities in Distributed Computing

    • CAP Theorem
  • Chapter 5: Cloud Storage

    • Cloud Databases: HBase, MongoDB, Cassandra, DynamoDB, Google BigQuery
  • Chapter 6: Serverless

    • AWS Cloud9 Development Environment
    • FaaS (Function as a Service)
    • AWS Lambda
    • GCP Cloud Functions
    • Azure Functions
    • AWS Cloud-Native Primitives Overview
    • AWS Step Functions
    • AWS SQS
    • AWS SNS
    • AWS Cognito
    • AWS API Gateway
    • Google Cloud Shell Development Environment
    • Google App Engine
  • Chapter 7: Big Data Platforms

    • Batch Processing: EMR/Hadoop, AWS Batch
  • Chapter 8: Managed Machine Learning Systems, Platforms and AutoML

    • AutoML Overview
    • AWS SageMaker
    • AWS SageMaker Autopilot
    • GCP AI Platform
    • GCP AutoML Overview
    • GCP AutoML Vision
    • GCP AutoML Tables
    • Azure ML Studio
    • H2O AutoML
    • Open Source ML Platforms Overview
    • Ludwig
  • Chapter 9: Edge Computing

    • IoT Overview
    • AWS Greengrass
    • Raspberry Pi
    • Edge Machine Learning Solutions Overview
    • Google AutoML
    • TensorFlow Lite
    • Intel Movidius
    • Apple X12
  • Chapter 10: Data Science Case Studies and Projects

    • Case Study: Data Science Meets Intermittent Fasting
    • Case Study: Coronavirus Epidemic
    • Applied Computer Vision Overview
    • Project: AWS DeepLens Edge Computer Vision
    • Project: Raspberry Pi
    • Project: Intel Movidius Edge Computer Vision
    • Project: Serverless Data Engineering Pipelines
    • Project: Operationalizing Containerized Machine Learning Models
    • Project: Continuous Delivery of GCP PaaS
    • Project: Using Docker Containers and Registries
    • Project: Cloud Machine Learning with Kubernetes
  • Chapter 11: Essays

    • Why There Will Be No Data Science Job Titles By 2029
    • Exploiting The Unbundling Of Education
    • How Vertically Integrated AI Stacks Will Affect IT Organizations
    • Here Come The Notebooks
    • Cloud Native Machine Learning And AI
    • The "missing technical semester" for MBA programs
  • Chapter 12: Cloud Certifications

    • AWS Certification Guide Overview
    • AWS Certified Cloud Practitioner
    • AWS Certified Solutions Architect
    • AWS Certified Developer
    • AWS Certified Data Analytics Specialty
    • AWS Certified Machine Learning Specialty
    • GCP Certification Guide Overview
    • Azure Certification Guide Overview
  • Chapter 13: Career

Public Trello Board

Public status of tickets for course/book

Text and Code License

The text and code content of notebooks and documents is released under the CC-BY-NC-ND license.

About

[Course-2021] taught at Duke MIDS. This is also a Coursera course that goes live in early 2021. Stay tuned!
