diff --git a/.github/workflows/jekyll-gh-pages.yml b/.github/workflows/jekyll-gh-pages.yml
new file mode 100644
index 00000000..8d7c7590
--- /dev/null
+++ b/.github/workflows/jekyll-gh-pages.yml
@@ -0,0 +1,50 @@
+# Sample workflow for building and deploying a Jekyll site to GitHub Pages
+name: Deploy Jekyll with GitHub Pages dependencies preinstalled
+
+on:
+  # Runs on pushes targeting the default branch
+  push:
+    branches: ["master"]
+
+  # Allows you to run this workflow manually from the Actions tab
+  workflow_dispatch:
+
+# Sets permissions of the GITHUB_TOKEN to allow deployment to GitHub Pages
+permissions:
+  contents: read
+  pages: write
+  id-token: write
+
+# Allow one concurrent deployment
+concurrency:
+  group: "pages"
+  cancel-in-progress: true
+
+jobs:
+  # Build job
+  build:
+    runs-on: ubuntu-latest
+    steps:
+      - name: Checkout
+        uses: actions/checkout@v3
+      - name: Setup Pages
+        uses: actions/configure-pages@v2
+      - name: Build with Jekyll
+        uses: actions/jekyll-build-pages@v1
+        with:
+          source: ./
+          destination: ./_site
+      - name: Upload artifact
+        uses: actions/upload-pages-artifact@v1
+
+  # Deployment job
+  deploy:
+    environment:
+      name: github-pages
+      url: ${{ steps.deployment.outputs.page_url }}
+    runs-on: ubuntu-latest
+    needs: build
+    steps:
+      - name: Deploy to GitHub Pages
+        id: deployment
+        uses: actions/deploy-pages@v1
diff --git a/README.md b/README.md index 6b2cc442..385a104f 100644 --- a/README.md +++ b/README.md @@ -64,6 +64,11 @@ Get familiar and comfortable with manipulating data in a database with a common * SQL School [Mode Analytics / Tutorials](http://bit.ly/sqlschool) ### Math & Statistics + +#### Calculus + * Single Variable Calculus [MIT OpenCourseWare](http://ocw.mit.edu/courses/mathematics/18-01-single-variable-calculus-fall-2006/) + * Multivariable Calculus [MIT OpenCourseWare](http://ocw.mit.edu/courses/mathematics/18-02sc-multivariable-calculus-fall-2010/) + #### Linear Algebra The foundational mathematics for working with large 
samples of data. Spend time in exercises until you feel highly confident in the key topics of Linear Algebra. It will serve you well. * An Intuitive Guide to Linear Algebra [Better Explained / Article](https://betterexplained.com/articles/linear-algebra-guide/) @@ -71,6 +76,7 @@ The foundational mathematics for working with large samples of data. Spend time * Vector Calculus: Understanding the Cross Product [Better Explained / Article](https://betterexplained.com/articles/cross-product/) * Vector Calculus: Understanding the Dot Product [Better Explained / Article](https://betterexplained.com/articles/vector-calculus-understanding-the-dot-product/) * Linear Algebra [Khan Academy / Videos](http://bit.ly/khanlinalg) + * Linear Algebra [MIT](http://ocw.mit.edu/courses/mathematics/18-06-linear-algebra-spring-2010/) #### Statistics How can we answer questions with data? Everywhere you look, you'll see methods from statistics. Spend a lot of time here! @@ -120,7 +126,7 @@ A branch of statistics that uses graphical models and specialized statistics to ### Natural Language Processing The imperfect and immensely useful art (science?) of transforming human language into data. 
* From Languages to Information / Stanford CS147 [Materials](http://bit.ly/nlpcs124) - * NLP with Python (NLTK library) [Digital](http://bit.ly/ebook-nltk), [Book ```$55```](https://bookshop.org/a/2958/9780596516499) + * NLP with Python (NLTK library) [Digital](http://bit.ly/py-nltk), [Book ```$55```](https://bookshop.org/a/2958/9780596516499) * How to Write a Spelling Correcter / Norvig [Tutorial](http://norvig.com/spell-correct.html) ### Graph Analysis @@ -153,6 +159,7 @@ If you have interest in operations management, manufacturing, supply chains, or ### Deep Learning / Neural Networks * Neural Networks [Andrej Karpathy / Python Walkthrough](http://bit.ly/karpathyneuralnets) + * Neural Networks for Machine Learning [Geoffrey Hinton / U Toronto](https://www.youtube.com/playlist?list=PLoRl3Ht4JOcdU872GhiYWf6jwrk_SNhz9) * Deep Learning for Natural Language Processing CS224d [Stanford](http://cs224d.stanford.edu/syllabus.html) ## 🤝 Doing Data Science @@ -197,9 +204,9 @@ A document conveying the motives, direction, investment, and expected value of t #### Results Presentation A slide deck or document with the goal of conveying the results of the work and how the findings support an important decision(s). -Best appended to the Spec, and summarized in a slide deck for easy consumption. Depending on the culture of the group, slides or a short docuemnt may be easier to look through to understand the results of the work. In the remote work era, think about how your work will be passed around and make sure your "above the fold" is easy to understand and clearly conveys the "why" and results in particular. +Best appended to the Spec, and summarized in a slide deck for easy consumption. Depending on the culture of the group, slides or a short document may be easier to look through to understand the results of the work. 
In the remote work era, think about how your work will be passed around and make sure your "above the fold" is easy to understand and clearly conveys the "why" and results in particular. -__Example__: A particularly polished [presentation](https://medium.com/lyft-engineering/how-lyft-discovered-openstreetmap-is-the-freshest-map-for-rideshare-a7a41bf92ec) of [map quality study results](https://drive.google.com/file/d/1Sb-dOUjeP1Ljqz4ra931D3Pe8B5C3pde/view) showing higher data quality in US maps on OSM than commercially available alternatives. The impact of this work was a) increased confidence in service reliability and b) enabled the company to decide against buying a commercially available annual license costing ~$10mi/yr. +__Example__: A particularly polished [presentation](https://medium.com/lyft-engineering/how-lyft-discovered-openstreetmap-is-the-freshest-map-for-rideshare-a7a41bf92ec) of [map quality study results](https://drive.google.com/file/d/1Sb-dOUjeP1Ljqz4ra931D3Pe8B5C3pde/view) showing higher data quality in US maps on OSM than commercially available alternatives. The impact of this work was a) increased confidence in service reliability for the company and b) enabling the company to decide against buying a commercially available license costing millions of dollars per year.
## 🧑‍💻 Capstone Project _Choose a meaningful project or dataset to demonstrate what you've learned._ @@ -229,9 +236,8 @@ Show the process you used to disprove your hypothesis, preferably in a jupyter n * Exploratory Data Analysis [Tukey / Book ```$81```](http://amzn.to/1kNUEPa) [```$113```](https://bookshop.org/books/exploratory-data-analysis-classic-version/9780134995458) * Mining Massive Data Sets / Stanford [Course & Digital Textbook](http://bit.ly/mmds-course) & [Book ```$58```](https://bookshop.org/a/2958/9781108476348) * Introduction to Information Retrieval / Stanford [Digital](http://bit.ly/ebook-stanford-inforetrieval) & [Book ```$70```](https://bookshop.org/a/2958/9780521865715) - * [Data Science in IPython Notebooks](http://bit.ly/ipynb-ds) (Linear Regression, Logistic Regression, Random Forests, K-Means Clustering) * Probabilistic Graphical Models [Stanford / Coursera](http://bit.ly/stanford-pgm) - * Differential Equations in Data Science [Python Tutorial](http://bit.ly/ipynb-differentialeq) + * Differential Equations in Data Science [Python Tutorial](https://web.archive.org/web/20190617023702/https://nbviewer.jupyter.org/github/URXtech/techblog/blob/master/continuousTimeMarkovChain/markovChain.ipynb) * Algorithm Design, Kleinberg & Tardos [Book ```$125```](http://amzn.to/1iMnWm5) * [Tidy Data in Python](http://www.jeannicholashould.com/tidy-data-in-python.html) * Designing, Visualizing and Understanding Deep Neural Networks [Berkeley CS294-129](https://bcourses.berkeley.edu/courses/1453965/pages/cs294-129-designing-visualizing-and-understanding-deep-neural-networks) @@ -243,6 +249,7 @@ Show the process you used to disprove your hypothesis, preferably in a jupyter n * SQL Tutorials [SQLZOO / Tutorials](http://bit.ly/tut-sqlzoo) * Machine Learning [Caltech / Edx](http://bit.ly/caltech-ml) * A Course in Machine Learning [UMD / Digital Book](http://bit.ly/22WyV3N) + * Designing Data Intensive Applications [Book 
```$56```](https://bookshop.org/a/2958/9781449373320) *** diff --git a/datasets.md b/datasets.md index 1e7051f3..b299592e 100644 --- a/datasets.md +++ b/datasets.md @@ -25,6 +25,7 @@ * [Qandl](http://www.quandl.com) provides a lot of interesting data with a clean API. * [Time Series Data Library](http://datamarket.com/data/list/?q=provider:tsdl) * USA Congressional Voting Records [Voteview](http://voteview.org/downloads.asp) +* [Traffic data](https://github.com/graphhopper/open-traffic-collection) ### Datasets Sources diff --git a/r-resources.md b/r-resources.md index cdfa2414..0cbdfd6f 100644 --- a/r-resources.md +++ b/r-resources.md @@ -13,7 +13,7 @@ _[Note: The core of The Open Source Data Science Masters focuses on programmatic #### Basic Statistics with R - * An Introduction to Statistical Learning [Book pdf](http://www-bcf.usc.edu/~gareth/ISL/ISLR%20First%20Printing.pdf) ^also a Machine Learning resource + * An Introduction to Statistical Learning [Book pdf](https://www.statlearning.com/) ^also a Machine Learning resource #### Data Science with R * Introduction to Data Science [Syracuse University / ebook](http://jsresearch.net/index.html) diff --git a/scott-davis-transcript.md b/scott-davis-transcript.md new file mode 100644 index 00000000..e054318a --- /dev/null +++ b/scott-davis-transcript.md @@ -0,0 +1,133 @@ +

Scott Davis Transcript

+

Open Source Data Science Masters

+ +
I'm going to have some time for independent study this year, so I plan to work through as much as possible. I work in the real estate industry, and we have so much data that isn't used for meaningful analysis; the tools, though readily available, haven't caught up with the needs of real estate users. That's what I'm interested in working on. I use a lot of GIS and R, so my curriculum is tailored to follow [R](https://www.r-project.org/)/[Python](https://www.python.org) and [QGIS](https://www.qgis.org). I'm a bit of an open-source nut, so I much prefer learning this way. I'm looking for people to connect with, and possibly to work on projects.
+ +Want to collaborate? Get in touch: + * [linkedin](http://www.linkedin.com/in/scottcdavis); + * [twitter](http://www.twitter.com/scottdavisCRE); or + * [email](mailto:scott@tisonadevelopment.com) + + +

Open Source Curriculum

+

Base Introduction

+Data Science Introductions + - [ ] Intro to Data Science by UW / Coursera, online course + - [ ] Data Science Specialization by Johns Hopkins / Coursera + - [X] [Data Scientists Toolbox](https://www.coursera.org/account/accomplishments/certificate/UY4EBM46HL) + - [X] [R Programming](https://www.coursera.org/account/accomplishments/records/Va5vuEvGKyr7UyHEL) + - [X] [Getting and Cleaning Data](https://www.coursera.org/account/accomplishments/records/ENSGmvNfx24sANRW) + - [X] [Exploratory Data Analysis](https://www.coursera.org/account/accomplishments/records/2PPsRu2Us3sUehBQ) + - [X] [Reproducible Research] + - [ ] [Statistical Inference] (in progress) + - [ ] [Regression Models] (in progress) + - [X] [Practical Machine Learning] + - [ ] [Developing Data Products] + - [ ] [Data Science Capstone] +- [ ] [Data Science by Harvard](http://cs109.github.io/2015/) (online course) +- [ ] [Data Science with Open Source Tools](http://shop.oreilly.com/product/9780596802363.do) +- [50 Years of Data Science](http://pages.cs.wisc.edu/~anhai/courses/784-fall15/50YearsDataScience.pdf) +- [ ] [Datasmart](http://www.amazon.com/Data-Smart-Science-Transform-Information/dp/111866146X/ref=sr_1_1?s=books&ie=UTF8&qid=1458768727&sr=1-1&keywords=datasmart) - in Excel, but also works in LibreOffice and so much of business analytics is still in Excel. + + +

Mathematics/Statistics

+ - [ ] [Statistics for Spatial Data, Revised Edition](http://www.wiley.com/WileyCDA/WileyTitle/productCd-1119114616.html) + - [ ] [Statistics for Spatio-Temporal Data](http://www.wiley.com/WileyCDA/WileyTitle/productCd-EHEP002348.html) + - [ ] [Linear Algebra](http://www.amazon.com/Linear-Algebra-Dover-Books-Mathematics/dp/048663518X) + - [ ] Problem-Solving Heuristics: [How to Solve It](http://www.amazon.com/How-Solve-It-Mathematical-Princeton/dp/069111966X) + +

Computing

+R: + - [ ] [R in Action](https://www.manning.com/books/r-in-action-second-edition?a_bid=5c2b1e1d&a_aid=RiA2ed) + - [ ] [R Cookbook](http://shop.oreilly.com/product/9780596809164.do) + - [ ] [Forecasting: Principles and Practice](http://otexts.com/fpp/) + +R Libraries/Task Views + * [ProjectTemplate](http://projecttemplate.net/index.html) + * Spatial Data [CRAN Task View: Analysis of Spatial Data](https://cran.r-project.org/web/views/Spatial.html) + * Spatio-Temporal Data [CRAN Task View: Handling and Analyzing Spatio-Temporal Data](https://cran.r-project.org/web/views/SpatioTemporal.html) + * Optimization [CRAN Task View: Optimization and Mathematical Programming](https://cran.r-project.org/web/views/Optimization.html) + * Finance [CRAN Task View: Empirical Finance](https://cran.r-project.org/web/views/Finance.html) + +Python: + - [ ] [Dive Into Python](http://www.diveintopython.net/) + - [ ] [Google's Python Class](code.google.com/edu/languages/google-python-class/) + - [ ] [Python for Data Analysis](http://shop.oreilly.com/product/0636920023784.do) + - [ ] [Webscraping with Python](https://www.packtpub.com/big-data-and-business-intelligence/web-scraping-python) + +QGIS: + - [X] [QGIS Tutorials and Tips](http://www.qgistutorials.com/en/) + - [X] [Mastering QGIS](https://www.packtpub.com/application-development/mastering-qgis) + - [ ] [Building Mapping Applications with QGIS](https://www.packtpub.com/application-development/building-mapping-applications-qgis) + - [ ] [GIS Tutorial Workbook 1](https://esripress.esri.com/display/index.cfm?fuseaction=display&websiteID=232&moduleID=1) This is for ArcView, but you can work the examples in QGIS too + - [ ] [GIS Tutorial Workbook 2: Spatial Analysis](https://esripress.esri.com/display/index.cfm?fuseaction=display&websiteID=230&moduleID=0) This is for ArcView, but you can work the examples in QGIS too + - [ ] [QGIS Map Design](https://locatepress.com/qmd) I've just thumbed through this, but it's beautiful and belongs on 
any list of GIS books. + +MySQL: + - [ ] [Learn MySQL in One Video](https://www.youtube.com/watch?v=yPu6qV5byu4) + - [ ] [MySQL Workbench Starter](code.google.com/edu/languages/google-python-class/) + +Octave: + - [ ] [GNU Octave Beginners Guide](https://www.packtpub.com/big-data-and-business-intelligence/gnu-octave-beginners-guide) + - +PostGIS/PostgreSQL: + - [ ] [PostGIS Essentials](https://www.packtpub.com/big-data-and-business-intelligence/postgis-essentials) + - [ ] [PostgreSQL Tutorial](http://www.postgresqltutorial.com/) + - [ ] [PostgreSQL: Up and Running: A Practical Introduction to the Advanced Open Source Database](http://shop.oreilly.com/product/0636920032144.do) + 

Algorithms

+ - [ ] [Algorithms Design & Analysis](http://openclassroom.stanford.edu/MainFolder/CoursePage.php?course=IntroToAlgorithms) Stanford openclassroom + +

Distributed Computing Paradigms

+ - [ ] Intro to Hadoop and MapReduce by Cloudera and Udacity +*Note: I might swap the above course with an EdX course on Apache Spark and distributed computing* + +

Data Mining

+ - [ ] Mining Massive Data Sets, by Stanford and Coursera + - [ ] [Clean Data](https://www.packtpub.com/big-data-and-business-intelligence/clean-data) + +

Machine Learning/Predictive Analytics - Foundational/Theoretical/Practical

+ - [ ] Machine Learning by Andrew Ng, Stanford / Coursera (NB: this class requires a lot of higher-level math) + - [ ] [An Introduction to Statistical Learning with Applications in R](http://www.r-bloggers.com/in-depth-introduction-to-machine-learning-in-15-hours-of-expert-videos/) (by the authors of The Elements of Statistical Learning at Stanford.) + - [ ] [Machine Learning with R](https://www.packtpub.com/big-data-and-business-intelligence/machine-learning-r-second-edition) + - [ ] [Building a Recommendation System in R](https://www.packtpub.com/big-data-and-business-intelligence/building-recommendation-system-r) + - [ ] [Mastering Predictive Analytics in R](https://www.packtpub.com/application-development/mastering-predictive-analytics-r) + - [ ] [Bootstrapping Machine Learning](http://www.louisdorard.com/machine-learning-book/) + - [ ] [Applied Predictive Modeling](http://www.amazon.com/gp/product/1461468485?psc=1&redirect=true&ref_=oh_aui_detailpage_o08_s00) + 

Analysis

+ - [ ] [Practical Data Science Cookbook](https://www.packtpub.com/big-data-and-business-intelligence/practical-data-science-cookbook) + - [ ] [R Data Analysis Cookbook](http://www.amazon.com/Data-Analysis-Cookbook-Recipes-Deliver/dp/1783989068) + 

Spatial Analysis

+ - [ ] [An Introduction to R for Spatial Analysis and Mapping](https://us.sagepub.com/en-us/nam/an-introduction-to-r-for-spatial-analysis-and-mapping/book241031) + - [ ] [Applied Spatial Data Analysis with R](http://www.springer.com/us/book/9781461476177) + +

Land Use/Transport/Gravity Modeling

+ - [ ] [Integrated Land Use and Transport Modelling: Decision Chains and Hierarchies](http://www.amazon.com/gp/product/0521022177?psc=1&redirect=true&ref_=oh_aui_detailpage_o03_s00) + - [ ] [Gravity and Spatial Interaction Models (Scientific Geography Series)](http://www.amazon.com/gp/product/0803925441?psc=1&redirect=true&ref_=oh_aui_detailpage_o06_s00) + - [ ] [TRANUS Model](http://www.tranus.com/tranus-english) + - [ ] [Urban Sim](https://pypi.python.org/pypi/urbansim) + - [ ] [Huff-tools Package in R](http://rstudio-pubs-static.s3.amazonaws.com/42357_1e6fcc5bcfec439096eb86a106ebf22e.html) + - +

Data Design/Data Viz

+ - [ ] [Beautiful Evidence](http://www.edwardtufte.com/tufte/books_be) + - [ ] [Semiology of Graphics](http://www.amazon.com/Semiology-Graphics-Diagrams-Networks-Maps/dp/1589482611) + - [ ] [Visual Complexity Mapping Patterns of Information](http://www.visualcomplexity.com/vc/book/) + - [ ] [The Visual Display of Quantitative Information](http://www.edwardtufte.com/tufte/books_vdqi) + - [ ] [Design for Information](http://isabelmeirelles.com/book-design-for-information/) + - [ ] [Design Elements: A Graphical Style Manual](http://www.amazon.com/Design-Elements-Graphic-Style-Manual/dp/1592532616) + - [ ] [Storytelling with Data](http://www.amazon.com/gp/product/1119002257?psc=1&redirect=true&ref_=oh_aui_detailpage_o09_s00) + - [ ] [Mastering Python Data Visualization](https://www.packtpub.com/big-data-and-business-intelligence/mastering-python-data-visualization) + - [ ] [The Grammar of Graphics](http://www.springer.com/us/book/9780387245447) + - [ ] [R Graphics Cookbook](http://shop.oreilly.com/product/9780596809164.do) + 

Relevant prior studies

+ - [X] MS in Community and Regional Planning, UT-Austin + - [X] BA in Liberal Arts, concentration in geography, UT-Austin + +

OpenSource Data Science Masters Capstone Project

+I'm interested in using data science approaches for better intelligence behind real estate decisions, specifically evaluating population growth, transactions and location decisions. I'd also like to evaluate statistical learning techniques to make better pricing decisions. Finally, I'd like to develop a model to optimize real estate portfolios. + +If you'd like to pair up for the capstone, [let me know](http://www.twitter.com/scottdavisCRE) + diff --git a/transcripts/scott-davis-transcript.md b/transcripts/scott-davis-transcript.md new file mode 100644 index 00000000..d6ba2428 --- /dev/null +++ b/transcripts/scott-davis-transcript.md @@ -0,0 +1,138 @@ +

Scott Davis Transcript

+

Open Source Data Science Masters

+ +I'm going to have some time for independent study this year, so I plan to work through as much as possible. I work in the real estate industry, and we have so much data that isn't used for meaningful analysis; the tools, though readily available, haven't caught up with the needs of real estate users. That's what I'm interested in working on. I use a lot of GIS and R, so my curriculum is tailored to follow [R](https://www.r-project.org/)/[Python](https://www.python.org) and [QGIS](https://www.qgis.org). I'm a bit of an open-source nut, so I much prefer learning this way. I'm looking for people to connect with, and possibly to work on projects. Strictly speaking, the curriculum isn't purely open source, since it relies on a lot of books, which I've linked to here. + +Want to collaborate? Get in touch: + * [linkedin](http://www.linkedin.com/in/scottcdavis); + * [twitter](http://www.twitter.com/scottdavisCRE); or + * [email](mailto:scott@tisonadevelopment.com) + + 

Open Source Curriculum

+

Base Introduction

+Data Science Introductions +- [X] [Data Science with Open Source Tools](http://shop.oreilly.com/product/9780596802363.do) +- [X] [Data Science from Scratch](http://shop.oreilly.com/product/0636920033400.do) +- [X] [50 Years of Data Science](http://pages.cs.wisc.edu/~anhai/courses/784-fall15/50YearsDataScience.pdf) +- [X] [Datasmart](http://www.amazon.com/Data-Smart-Science-Transform-Information/dp/111866146X/ref=sr_1_1?s=books&ie=UTF8&qid=1458768727&sr=1-1&keywords=datasmart) - This book thoroughly covers data science techniques using Excel. Every aspiring data scientist should work through it because (1) Excel makes you do every step, so you'll learn a lot, and (2) you'll realize you need to learn R, Python, or some other tool to do these analyses. + - [X] [Data Science Specialization by Johns Hopkins / Coursera](https://www.coursera.org/account/accomplishments/specialization/3WN77YYQ7QK7) + - [X] [Data Scientists Toolbox](https://www.coursera.org/account/accomplishments/certificate/UY4EBM46HL) + - [X] [R Programming](https://www.coursera.org/account/accomplishments/records/Va5vuEvGKyr7UyHEL) + - [X] [Getting and Cleaning Data](https://www.coursera.org/account/accomplishments/records/ENSGmvNfx24sANRW) + - [X] [Exploratory Data Analysis](https://www.coursera.org/account/accomplishments/records/2PPsRu2Us3sUehBQ) + - [X] [Reproducible Research](https://www.coursera.org/account/accomplishments/certificate/YRP8NLFYPCV9) + - [X] [Statistical Inference](https://www.coursera.org/account/accomplishments/records/9733QCP94GEF) + - [X] [Regression Models](https://www.coursera.org/account/accomplishments/records/PP8SKS7CPSDC) + - [X] [Practical Machine Learning](https://www.coursera.org/account/accomplishments/certificate/AJJS85KTU6GZ) + - [X] [Developing Data Products](https://www.coursera.org/account/accomplishments/certificate/6QREL457PPKE) + - [X] [Data Science Capstone](https://www.coursera.org/account/accomplishments/certificate/A9M48VWHBAMT) + + 

Mathematics/Statistics

+ - [ ] [Statistics for Spatial Data, Revised Edition](http://www.wiley.com/WileyCDA/WileyTitle/productCd-1119114616.html) + - [ ] [Statistics for Spatio-Temporal Data](http://www.wiley.com/WileyCDA/WileyTitle/productCd-EHEP002348.html) + - [X] [Linear Programming: An Introduction With Applications (Second Edition)](http://www.amazon.com/Linear-Programming-Introduction-Applications-Edition/dp/1463543670?ie=UTF8&psc=1&redirect=true&ref_=oh_aui_detailpage_o01_s00) + - [X] Problem-Solving Heuristics: [How to Solve It](http://www.amazon.com/How-Solve-It-Mathematical-Princeton/dp/069111966X) + +

Computing

+R: + - [ ] [R in Action](https://www.manning.com/books/r-in-action-second-edition?a_bid=5c2b1e1d&a_aid=RiA2ed) + - [ ] [R Cookbook](http://shop.oreilly.com/product/9780596809164.do) + - [X] [Forecasting: Principles and Practice](http://otexts.com/fpp/) + +R Libraries/Task Views + * [ProjectTemplate](http://projecttemplate.net/index.html) + * Spatial Data [CRAN Task View: Analysis of Spatial Data](https://cran.r-project.org/web/views/Spatial.html) + * Spatio-Temporal Data [CRAN Task View: Handling and Analyzing Spatio-Temporal Data](https://cran.r-project.org/web/views/SpatioTemporal.html) + * Optimization [CRAN Task View: Optimization and Mathematical Programming](https://cran.r-project.org/web/views/Optimization.html) + * Finance [CRAN Task View: Empirical Finance](https://cran.r-project.org/web/views/Finance.html) + +Python: + - [X] [Jumpstart Python by Building 10 Apps](https://training.talkpython.fm/courses/details/python-language-jumpstart-building-10-apps) This is probably the best introduction to Python that I have seen. + - [X] [Dive Into Python](http://www.diveintopython.net/) + - [X] [Google's Python Class](code.google.com/edu/languages/google-python-class/) + - [X] [Introduction to Python for Data Science - edx](https://courses.edx.org/courses/course-v1:Microsoft+DAT208x+2T2016/info) + - [X] [Python for Data Analysis](http://shop.oreilly.com/product/0636920023784.do) + - [X] [Webscraping with Python](https://www.packtpub.com/big-data-and-business-intelligence/web-scraping-python) + +QGIS: + - [X] [QGIS Tutorials and Tips](http://www.qgistutorials.com/en/) + - [X] [Mastering QGIS](https://www.packtpub.com/application-development/mastering-qgis) + - [X] [QGIS 2.0 Cookbook](https://www.packtpub.com/application-development/qgis-2-cookbook) Advanced data management, data visualization and spatial analysis techniques with QGIS. 
+ - [X] [Building Mapping Applications with QGIS](https://www.packtpub.com/application-development/building-mapping-applications-qgis) + - [X] [GIS Tutorial Workbook 1](https://esripress.esri.com/display/index.cfm?fuseaction=display&websiteID=232&moduleID=1) This is for ArcView, but you can work the examples in QGIS too + - [X] [GIS Tutorial Workbook 2: Spatial Analysis](https://esripress.esri.com/display/index.cfm?fuseaction=display&websiteID=230&moduleID=0) This is for ArcView, but you can work the examples in QGIS too + - [ ] [QGIS Python Programming Cookbook](https://www.packtpub.com/application-development/qgis-python-programming-cookbook) Automated desktop QGIS processing. + - [ ] [QGIS Map Design](https://locatepress.com/qmd) I've just thumbed through this, but it's beautiful and belongs on any list of GIS books. + + +MySQL: + - [X] [Learn MySQL in One Video](https://www.youtube.com/watch?v=yPu6qV5byu4) + - [X] [MySQL Explained](https://www.ostraining.com/books/mysql/about/) + +Octave: + - [ ] [GNU Octave Beginners Guide](https://www.packtpub.com/big-data-and-business-intelligence/gnu-octave-beginners-guide) + +PostGIS/PostgreSQL: + - [ ] [PostGIS Essentials](https://www.packtpub.com/big-data-and-business-intelligence/postgis-essentials) + - [ ] [PostgreSQL Tutorial](http://www.postgresqltutorial.com/) + - [ ] [PostgreSQL: Up and Running: A Practical Introduction to the Advanced Open Source Database](http://shop.oreilly.com/product/0636920032144.do) + 

Algorithms

+- [ ] Data Structures and Algorithms by UCSD / Coursera (decided not to take the rest of the specialization) + - [X] [Algorithmic Toolbox](https://www.coursera.org/account/accomplishments/certificate/RUKKXTCFDAPV) (in progress) + 

Data Mining

+ - [ ] [Clean Data](https://www.packtpub.com/big-data-and-business-intelligence/clean-data) + 

Machine Learning/Predictive Analytics - Foundational/Theoretical/Practical

+ - [ ] [Statistical Learning with Trevor Hastie and Robert Tibshirani](http://www.r-bloggers.com/in-depth-introduction-to-machine-learning-in-15-hours-of-expert-videos/) + - [ ] [An Introduction to Statistical Learning with Applications in R](http://www.r-bloggers.com/in-depth-introduction-to-machine-learning-in-15-hours-of-expert-videos/) (by the authors of The Elements of Statistical Learning at Stanford.) + - [ ] [Machine Learning with R](https://www.packtpub.com/big-data-and-business-intelligence/machine-learning-r-second-edition) + - [ ] [Building a Recommendation System in R](https://www.packtpub.com/big-data-and-business-intelligence/building-recommendation-system-r) + - [ ] [Mastering Predictive Analytics in R](https://www.packtpub.com/application-development/mastering-predictive-analytics-r) + - [X] [Bootstrapping Machine Learning](http://www.louisdorard.com/machine-learning-book/) + - [ ] [Applied Predictive Modeling](http://www.amazon.com/gp/product/1461468485?psc=1&redirect=true&ref_=oh_aui_detailpage_o08_s00) + +

Analysis

+ - [ ] [Practical Data Science Cookbook](https://www.packtpub.com/big-data-and-business-intelligence/practical-data-science-cookbook) + - [X] [R Data Analysis Cookbook](http://www.amazon.com/Data-Analysis-Cookbook-Recipes-Deliver/dp/1783989068) + - [X] [Python Data Science Essentials](https://www.packtpub.com/big-data-and-business-intelligence/python-data-science-essentials) + +

Spatial Analysis

+ - [ ] [An Introduction to R for Spatial Analysis and Mapping](https://uk.sagepub.com/en-gb/eur/an-introduction-to-r-for-spatial-analysis-and-mapping/book241031) + - [ ] [Applied Spatial Data Analysis with R](http://www.springer.com/us/book/9781461476177) + - [ ] [Geospatial Analysis - 5th Edition, 2015 - de Smith, Goodchild, Longley](http://www.spatialanalysisonline.com/HTML/index.html) + - [X] [Learning Geospatial Analysis with Python](https://www.packtpub.com/application-development/learning-geospatial-analysis-python) + - [X] [Python Geospatial Development - Second Edition](https://www.packtpub.com/application-development/python-geospatial-development-second-edition) + +

Land Use/Transport/Gravity Modeling

+ - [ ] [Integrated Land Use and Transport Modelling: Decision Chains and Hierarchies](http://www.amazon.com/gp/product/0521022177?psc=1&redirect=true&ref_=oh_aui_detailpage_o03_s00) + - [ ] [Gravity and Spatial Interaction Models (Scientific Geography Series)](http://www.amazon.com/gp/product/0803925441?psc=1&redirect=true&ref_=oh_aui_detailpage_o06_s00) + - [ ] [TRANUS Model](http://www.tranus.com/tranus-english) + - [ ] [Urban Sim](https://pypi.python.org/pypi/urbansim) + - [ ] [Huff-tools Package in R](http://rstudio-pubs-static.s3.amazonaws.com/42357_1e6fcc5bcfec439096eb86a106ebf22e.html) + + +

Data Design/Data Viz

+ - [ ] [Beautiful Evidence](http://www.edwardtufte.com/tufte/books_be) + - [ ] [Semiology of Graphics](http://www.amazon.com/Semiology-Graphics-Diagrams-Networks-Maps/dp/1589482611) + - [ ] [Visual Complexity Mapping Patterns of Information](http://www.visualcomplexity.com/vc/book/) + - [ ] [The Visual Display of Quantitative Information](http://www.edwardtufte.com/tufte/books_vdqi) + - [ ] [Design for Information](http://isabelmeirelles.com/book-design-for-information/) + - [ ] [Design Elements: A Graphical Style Manual](http://www.amazon.com/Design-Elements-Graphic-Style-Manual/dp/1592532616) + - [X] [Storytelling with Data](http://www.amazon.com/gp/product/1119002257?psc=1&redirect=true&ref_=oh_aui_detailpage_o09_s00) + - [ ] [Mastering Python Data Visualization](https://www.packtpub.com/big-data-and-business-intelligence/mastering-python-data-visualization) + - [ ] [The Grammar of Graphics](http://www.springer.com/us/book/9780387245447) + - [X] [R Graphics Cookbook](http://shop.oreilly.com/product/9780596809164.do) + +

Relevant prior studies

+ - [X] MS in Community and Regional Planning, UT-Austin + - [X] BA in Liberal Arts, concentration in geography, UT-Austin + +

OpenSource Data Science Masters Capstone Project

+I'm interested in using data science approaches for better intelligence behind real estate decisions, specifically evaluating population growth, transactions and location decisions. I'd also like to evaluate statistical learning techniques to make better pricing decisions. Finally, I'd like to develop a model to optimize real estate portfolios. + +If you'd like to pair up for the capstone, [let me know](http://www.twitter.com/scottdavisCRE) +