It's been said that data is the new "dirt"—the raw material from which and on which you build the structures of the modern world. And like dirt, data can seem like a limitless, undifferentiated mass. The ability to take raw data, access it, filter it, process it, visualize it, understand it, and communicate it to others is possibly the most essential business problem for the coming decades.
"Machine learning," the process of automating tasks once considered the domain of highly trained analysts and mathematicians, is the key to efficiently extracting useful information from this sea of raw data. By implementing the core algorithms of statistical data processing, data analysis, and data visualization as reusable computer code, you can scale your capacity for data analysis well beyond the capabilities of individual knowledge workers.
Machine Learning in Action is a unique book that blends the foundational theories of machine learning with the practical realities of building tools for everyday data analysis. In it, you'll use the flexible Python programming language to build programs that implement algorithms for data classification, forecasting, recommendations, and higher-level features like summarization and simplification.
As you work through the numerous examples, you'll explore key topics like classification, numeric prediction, and clustering. Along the way, you'll be introduced to important established algorithms, such as Apriori, through which you identify association patterns in large datasets, and AdaBoost, a meta-algorithm that can increase the efficiency of many machine learning tasks.
Peter Harrington holds bachelor's and master's degrees in Electrical Engineering. He worked for Intel Corporation for seven years in California and China. Peter holds five US patents and his work has been published in three academic journals. He is currently the chief scientist for Zillabyte Inc. Peter spends his free time competing in programming competitions and building 3D printers.
Table of Contents
Part 1: Classification
1 Machine learning basics
2 Classifying with k-nearest neighbors
3 Splitting datasets one feature at a time: decision trees
4 Classifying with probability distributions: Naïve Bayes
5 Logistic regression
6 Support vector machines
7 Improving classification with a meta-algorithm: AdaBoost
Part 2: Forecasting numeric values with regression
8 Predicting numeric values: regression
9 Tree-based regression
Part 3: Unsupervised learning
10 Grouping unlabeled items using k-means clustering
11 Association analysis with the Apriori algorithm
12 Efficiently finding frequent itemsets with FP-Growth
Part 4: Additional tools
13 Using principal components analysis to simplify our data
14 Simplifying data with the singular value decomposition
15 Big data and MapReduce
Pros: High accuracy, insensitive to outliers, no assumptions about data
Cons: Computationally expensive, requires a lot of memory
Works with: Numeric values, nominal values
The first machine-learning algorithm we’ll look at is k-Nearest Neighbors (kNN). It
works like this: we have an existing set of example data, our training set. We have
labels for all of this data—we know what class each piece of the data should fall into.
When we’re given a new piece of data without a label, we compare that new piece of
data to the existing data, every piece of existing data. We then take the most similar
pieces of data (the nearest neighbors) and look at their labels. We look at the top k
most similar pieces of data from our known dataset; this is where the k comes from. (k
is an integer and it’s usua...
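The procedure described in this excerpt can be sketched in a few lines of Python. This is a minimal illustration, not the book's own listing: the function name `knn_classify`, the toy data, the use of Euclidean distance as the similarity measure, and the majority vote over the neighbors' labels are all assumptions, since the excerpt is cut off before it names them.

```python
import math
from collections import Counter


def knn_classify(query, training_points, labels, k=3):
    """Label `query` by majority vote among its k nearest training points."""
    # Compare the new piece of data to every piece of existing data.
    # Euclidean distance is an assumed similarity measure.
    distances = [
        (math.dist(query, point), label)
        for point, label in zip(training_points, labels)
    ]
    # Keep the k most similar pieces of data (the nearest neighbors).
    nearest = sorted(distances, key=lambda pair: pair[0])[:k]
    # Look at the neighbors' labels and take the most common one
    # (majority vote is an assumption; the excerpt is truncated here).
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Hypothetical toy training set, made up for illustration only.
training_points = [(1.0, 1.1), (1.0, 1.0), (0.0, 0.0), (0.0, 0.1)]
labels = ["A", "A", "B", "B"]

if __name__ == "__main__":
    print(knn_classify((0.1, 0.1), training_points, labels, k=3))
    print(knn_classify((0.9, 1.0), training_points, labels, k=3))
```

Because every query is compared against the whole training set, the cost grows with the size of that set, which matches the "computationally expensive, requires a lot of memory" entry in the cons listed above.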
Pros: Computationally cheap to use, easy for humans to understand learned results,
missing values OK, can deal with irrelevant features
Cons: Prone to overfitting
Works with: Numeric values, nominal values
1 helpful 算文解字 2016-03-17 02:54:45
Skimmed through it casually as a refresher on Python and the related libraries. Suitable for beginners.
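0 helpful 阿拉丁 2013-01-07 14:46:43
This book lets you get started with machine learning, Python, and MapReduce all at once. The author manages to explain all of these areas clearly, which is no small feat.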
0 helpful 安德 2017-06-22 02:48:01
Wow, FP-Growth is simply beautiful.
0 helpful iphyer 2014-03-15 15:26:18
Got the most out of it by reading it alongside several other books.
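0 helpful 剑南 2013-04-30 19:22:04
Read the chapters on LR and AdaBoost, and skimmed SVM and PSVM. A godsend for people who are weak at math, with the kind of worked examples programmers love. Everyone says it is poorly written, but it still works well as an introduction.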