Machine Learning, early version (a vintage series): Andrew Ng's Machine Learning course series
With a change of form, the cost turns into a convex function.
Higher-order terms rely on larger parameters; once the penalty is balanced against the fit, the model leans toward lower order, which is how regularization does its job.
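A rough sketch of the idea, assuming the squared-error cost from the course (names here are illustrative): the L2 penalty grows with the weights, so high-order terms only survive if they clearly pay for themselves.

```python
import numpy as np

def regularized_cost(w, b, X, y, lam):
    """Squared-error cost plus an L2 penalty on the weights."""
    m = X.shape[0]
    err = X @ w + b - y
    mse_term = (err @ err) / (2 * m)
    # The penalty grows with |w|, so gradient descent shrinks the
    # weights on high-order features unless they clearly reduce error.
    reg_term = (lam / (2 * m)) * np.sum(w ** 2)
    return mse_term + reg_term
```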
After watching interviews with Andrew Ng and Fei-Fei Li: today's AI is still separated from AGI by an unbridgeable gulf. We still do not know how the brain works. Each neuron in the brain is a living cell, while each "neuron" in a large model is still a very crude function. Why did the universe give rise to life? Where does consciousness come from? No one knows.
It's still useful to understand how they work under the hood, so that if something unexpected happens, you have a better chance of knowing how to fix it.
ReLU: rectified linear unit.
The ReLU function goes flat in only one part of the graph, whereas the sigmoid activation function goes flat in two places.
And if you're using gradient descent to train a neural network, a function that is flat in many places will make gradient descent really slow.
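A quick numeric check of that claim (a minimal sketch, not course code):

```python
import numpy as np

def sigmoid_grad(z):
    s = 1.0 / (1.0 + np.exp(-z))
    return s * (1.0 - s)          # near 0 for large |z|: flat on both sides

def relu_grad(z):
    return (z > 0).astype(float)  # 0 only for z < 0: flat on one side

z = np.array([-10.0, -1.0, 0.5, 10.0])
print(sigmoid_grad(z))  # tiny at both ends, so gradient descent crawls
print(relu_grad(z))     # stays 1 everywhere z is positive
```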
In effect, for better numerical accuracy, the softmax computation is folded into the loss function. The model then no longer outputs probabilities; its outputs must be passed through softmax afterwards to recover the predicted probabilities.
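In Keras this is the `from_logits=True` pattern; a minimal sketch (layer sizes are illustrative):

```python
import tensorflow as tf

# The output layer is linear: the model emits raw logits, not probabilities.
model = tf.keras.Sequential([
    tf.keras.layers.Dense(25, activation="relu"),
    tf.keras.layers.Dense(10, activation="linear"),
])

# from_logits=True folds the softmax into the loss for numerical stability.
model.compile(
    loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
    optimizer="adam",
)

# At inference time, push the logits through softmax to recover probabilities:
# probs = tf.nn.softmax(model(X_new))
```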
One learning rate per parameter dimension, adjusted automatically. Adam Algorithm Intuition
Adam: Adaptive Moment Estimation
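A minimal NumPy sketch of one Adam update (standard defaults; not library code) shows where the per-dimension learning rate comes from:

```python
import numpy as np

def adam_step(w, grad, m, v, t, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update; t is the 1-based step count."""
    m = beta1 * m + (1 - beta1) * grad        # 1st moment: running mean of grads
    v = beta2 * v + (1 - beta2) * grad ** 2   # 2nd moment: running mean of grad^2
    m_hat = m / (1 - beta1 ** t)              # bias correction for the zero init
    v_hat = v / (1 - beta2 ** t)
    # Dividing by sqrt(v_hat) gives each dimension its own effective step size.
    w = w - lr * m_hat / (np.sqrt(v_hat) + eps)
    return w, m, v
```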
The most important thing in machine learning is how you allocate your time: should you work on the data or on the algorithm?
Diagnostic: a test that you run to gain insight into what is/isn't working with a learning algorithm, and to gain guidance for improving its performance.
Diagnostics can take time to implement, but doing so can be a very good use of your time.
If a learning algorithm suffers from high bias, getting more training data will not (by itself) help much.
If a learning algorithm suffers from high variance, getting more training data is likely to help.
Establish a baseline level of performance as a reference; there are three cases here: high bias (training error well above the baseline), high variance (cross-validation error well above the training error), or both.
sometimes, luck is essential
Learn it in a moment, understand it over a lifetime. Roughly: one of Andrew Ng's PhD students realized, years later, that bias and variance take a lifetime to master. 07:00
Andrew Ng: “One of my PhD students from Stanford, many years after he had already graduated from Stanford, once said to me that while he was studying at Stanford, he learned about bias and variance. He understood it at the time, but subsequently, after many years of work experience at a few different companies, he realized that bias and variance are concepts that take a short time to learn, but a lifetime to master.”
Watched up to 13.2 Error Analysis, 00:24.
Data augmentation: adding purely random noise is of little value, because those cases will not show up in the test set.
Artificial data synthesis for photo OCR
traditional procedure of statistical study, machine learning <-> statistics
F1 score
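For reference, the standard definitions (the F1 score is the harmonic mean of precision P and recall R, so a model cannot score well by maxing out just one of them):

```latex
P = \frac{TP}{TP + FP}, \qquad
R = \frac{TP}{TP + FN}, \qquad
F_1 = \frac{2PR}{P + R}
```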
Entropy as a measure of impurity
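For a node where a fraction p1 of the examples is positive (with the convention 0·log 0 = 0, so H(0) = H(1) = 0 and the peak is H(0.5) = 1):

```latex
H(p_1) = -p_1 \log_2(p_1) - (1 - p_1) \log_2(1 - p_1)
```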
Fluff.
One-hot encoding
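A minimal pandas sketch (the feature name is made up for illustration):

```python
import pandas as pd

df = pd.DataFrame({"ear_shape": ["pointy", "floppy", "oval", "pointy"]})
# One categorical feature with k possible values becomes k binary features.
print(pd.get_dummies(df, columns=["ear_shape"]))
```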
A bit like a greedy algorithm: it keeps splitting on whichever choice yields the largest variance reduction.
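A rough sketch of that greedy step for a regression tree (toy data, candidate thresholds taken at the observed values):

```python
import numpy as np

def variance_reduction(y, mask):
    """Drop in variance from splitting y into y[mask] and y[~mask]."""
    n, y_l, y_r = len(y), y[mask], y[~mask]
    if len(y_l) == 0 or len(y_r) == 0:
        return 0.0
    weighted = (len(y_l) / n) * y_l.var() + (len(y_r) / n) * y_r.var()
    return y.var() - weighted

# Greedy step: try each feature/threshold, keep the best reduction.
X = np.array([[7.2], [8.8], [10.1], [15.0]])
y = np.array([1.5, 1.6, 3.1, 3.0])
best = max(
    ((j, t, variance_reduction(y, X[:, j] <= t))
     for j in range(X.shape[1]) for t in X[:, j]),
    key=lambda s: s[2],
)
print(best)  # (feature, threshold, variance reduction) of the chosen split
```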
Ensemble of trees. Learning is hard, especially from all-English material: lose focus for a moment and you no longer know what is being said. Need to concentrate ++.
Random forests use a technique called bagging: full decision trees are built in parallel on random bootstrap samples of the dataset. The final prediction is the average of all the trees' predictions.
The whole process is a bit like regularization; it shares ideas with dropout.
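A minimal scikit-learn sketch of bagging in a random forest (synthetic data; parameters are illustrative):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=500, n_features=10, random_state=0)

# Each tree sees a bootstrap sample of the rows and a random subset of
# features at each split; predictions are averaged across the forest.
forest = RandomForestClassifier(n_estimators=100, max_features="sqrt",
                                bootstrap=True, random_state=0)
forest.fit(X, y)
print(forest.score(X, y))
```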
Boosting is a bit like residual networks: it refocuses on the poorly classified examples and fits them again.
eXtreme Gradient Boosting https://www.nvidia.cn/glossary/data-science/xgboost/
How to explain the XGBoost regression algorithm to a 10-year-old; the theoretical foundations of eXtreme Gradient Boosting (XGBoost).
XGBoost is a boosted tree model: many tree models are ensembled together into one strong classifier.
The idea is to keep adding trees, growing each new tree through repeated feature splits. Each added tree learns a new function that fits the residual of the previous round's predictions; when training finishes you have k trees.
To predict, a sample falls into one leaf in each tree according to its features; each leaf carries a score, and summing the scores across all the trees gives the sample's prediction.
How XGBoost works:
XGBoost also resembles ResNet in a way: it repeatedly computes the model's "residuals". Compared with black-box models, it leans toward an explainable white box. Each round takes the current prediction as the baseline, and the next weak learner fits the residual of the loss with respect to the predictions (the gap between predicted and true values). Taylor expansion of the loss: XGBoost optimizes using a second-order Taylor expansion of the loss function, which improves computational efficiency.
Decision Trees and Tree Ensembles
Neural Networks
On tabular data, tree-based models still outperform deep learning methods. Tabular data tends to have heterogeneous features, small sample sizes, and extreme values, which makes it hard to find the corresponding invariances. Tree-based models are not differentiable and cannot be trained jointly with deep learning modules. On tabular data, tree-based methods reach good predictions far more easily than deep learning, even with modern architectures.
Increasingly, random forests feel like regularization in disguise, while XGBoost is essentially a residual algorithm.
The elbow method. Drawback: in most cases there may be no clear elbow at all.
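A minimal sketch of the method anyway (scikit-learn, synthetic data): print the inertia for each k and look for a bend.

```python
import numpy as np
from sklearn.cluster import KMeans

X = np.random.RandomState(0).randn(300, 2)

# Inertia = within-cluster sum of squares; it always decreases with k,
# and often does so smoothly, with no obvious elbow to pick.
for k in range(1, 9):
    km = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X)
    print(k, km.inertia_)
```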
Dividing by m-1 gives the unbiased sample variance; dividing by m gives the second central moment (the maximum-likelihood estimate).
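Side by side (for large m the two are nearly identical):

```latex
\hat{\sigma}^2_{\text{unbiased}} = \frac{1}{m-1} \sum_{i=1}^{m} \left(x^{(i)} - \mu\right)^2,
\qquad
\hat{\sigma}^2_{\text{MLE}} = \frac{1}{m} \sum_{i=1}^{m} \left(x^{(i)} - \mu\right)^2
```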
The multivariate normal distribution.
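Its density, as used for anomaly detection over x in R^n:

```latex
p(x; \mu, \Sigma) = \frac{1}{(2\pi)^{n/2} \, |\Sigma|^{1/2}}
\exp\!\left( -\tfrac{1}{2} (x - \mu)^\top \Sigma^{-1} (x - \mu) \right)
```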
| Anomaly detection | Supervised learning |
| --- | --- |
| Very small number of positive examples (y=1), 0-20 is common, plus a large number of negative (y=0) examples. | Large number of both positive and negative examples. |
| Many different "types" of anomalies: hard for any algorithm to learn from the positive examples what anomalies look like, and future anomalies may look nothing like any anomalous example seen so far, so it can handle previously unseen failures (e.g. fraud). | Enough positive examples for the algorithm to get a sense of what positives look like, and future positive examples are likely to resemble those in the training set (e.g. spam). |
| Fraud detection | Email spam classification |
| Manufacturing - finding new, previously unseen defects (e.g. aircraft engines) | Manufacturing - finding known, previously seen defects |
| Monitoring machines in a data center | Weather prediction (sunny / rainy / etc.) |
| … | Disease classification |
There is a hint of the SVM idea here.
A binary classification problem.
Collaborative filtering:
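A minimal NumPy sketch of the collaborative filtering cost in the spirit of the course (shapes and names are my convention; the mask R marks which ratings are observed):

```python
import numpy as np

def cofi_cost(X, W, b, Y, R, lam):
    """Collaborative filtering cost.

    X: item features (num_items, k), learned jointly with the user params.
    W: per-user weights (num_users, k); b: per-user biases (1, num_users).
    Y: ratings (num_items, num_users); R[i, j] = 1 if user j rated item i.
    """
    err = (X @ W.T + b - Y) * R  # only observed ratings contribute
    return 0.5 * np.sum(err ** 2) + (lam / 2) * (np.sum(W ** 2) + np.sum(X ** 2))
```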
Content-based filtering:
Somewhat like the idea behind CLIP.
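A tiny sketch of why (shapes are hypothetical): content-based filtering runs a user tower and an item tower, then scores with a dot product of the two embeddings, which is structurally the same trick CLIP uses for images and text.

```python
import numpy as np

rng = np.random.default_rng(0)
user_feats, item_feats = rng.normal(size=(4, 20)), rng.normal(size=(6, 30))

# Stand-ins for the two learned networks: each maps raw features
# into a shared 8-dimensional embedding space.
W_user, W_item = rng.normal(size=(20, 8)), rng.normal(size=(30, 8))
v_u = user_feats @ W_user   # user embeddings  (4, 8)
v_m = item_feats @ W_item   # item embeddings  (6, 8)

scores = v_u @ v_m.T        # dot products: predicted affinity, (4, 6)
print(scores.shape)
```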
CLIP & FAISS!!
A classic.
2024-09-05: review
Reference snapshots