# 7 Resources for Those Wanting to Learn Data Science

Sometimes all you need to know is how to get started. Here’s what worked for me.

Image source: https://tinyurl.com/y7bgtyjo

Data Science happens as a natural consequence of multiple skills and experiences acquired by working with computers, maths, people and businesses. Some will develop these skills naturally through varied experiences over many years… But what if there’s a shortcut?

I’ve decided to compile the top 7 resources that I regard as the fundamental steps in my personal journey towards data science. The resources below are meant to foster both the interest and the intuition required for dealing with the data and the science involved.

Image source: https://timoelliott.com/blog/

|
| ### A little bit of context

The key word in “Data Science” is not Data, it is Science — Jeff Leek

Jeff Leek, Professor at Johns Hopkins Bloomberg School of Public Health, wrote 5 years ago: “the key word in data science is not ‘data’; it is ‘science’. Data science is only useful when the data are used to answer a question. That is the science part of the equation. The problem with this view of data science is that it is much harder than the view that focuses on data size or tools. It is much, much easier to calculate the size of a data set and say ‘My data are bigger than yours’ or to say, ‘I can code in Hadoop, can you?’ than to say, ‘I have this really hard question, can I answer it with my data?’.”

Data Science is old. John Graunt did it when it was cool. Literally cool. It was during the Little Ice Age, in the XVII century, that he produced one of the first works in demography using probabilistic models. By 1960 the subject was already very mature and called datalogy, but it was only in 2012, when Harvard Business Review published the article “Data Scientist: The Sexiest Job of the 21st Century”, that the term (and the job) grew in popularity. A wide range of online courses on the subject became available only two years later. Google Trends shows something interesting: although the term “Statistician” had been trending downwards since 2004, the term “Data Scientist” moved upwards more strongly after HBR’s article and was boosted further when the supply of online courses increased.

https://trends.google.com/trends/explore?date=all&q=Data%20Scientist,%2Fm%2F0c_xl

With that in mind, I need to make it clear: data scientists are not statisticians and they don’t replace statisticians (and vice versa), but a genuine interest in statistics and maths is key for achieving proper data science. I am not a statistician myself, but the very first resource below made me love the subject.

Please also observe how the resources below focus more on “curiosity” and “understanding” rather than “applying”.

Img source: https://towardsdatascience.com/introduction-to-statistics-e9d72d818745

Do you like lists?

In a nutshell the 7 resources are:

  1. The Drunkard’s Walk, book by Leonard Mlodinow
  2. Machine Learning Course, Created by Stanford University and taught by Dr. Andrew Ng
  3. Introduction to Mathematical Thinking, also by Stanford and taught by Dr. Keith Devlin
  4. Coding
  5. A prejudice-free review on Maths
  6. People
  7. Reading about Heterodox and Orthodox Economics

The list is neither in chronological nor climactic order. A lot of these things will happen in parallel and items 4 to 7 will almost certainly be part of your routine as a Data Scientist forever! |
| ### Let’s jump into details…

The Drunkard’s Walk

Many years ago (2009?) I wasn’t really impressed when I got this book as a Secret Santa present, but it turned out to be one of my all-time favourites.

The Drunkard’s Walk is about understanding randomness’ influence in our lives and “reveals the psychological illusions that prevent us understanding everything from stock-picking to wine-tasting”, according to Amazon’s product description.

It made me realise how easily one can become a victim of chance, and why explaining all the factors that led to an occurrence is much simpler than predicting when, or if, a similar occurrence will ever happen again.

From the analytics perspective, this book shows the importance of predictive models and demonstrates the history behind statistical analysis, also showing how data can be used to answer hard questions and how some of these questions will remain unanswered. I recall this as my first contact with “data science”.

Another good read in the same style: Algorithms to Live by: The Computer Science of Human Decisions.

Representation of a random walk. Src: http://cu.t-ads.org/python-intro-02-random-walk/drunkard/
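To get a feel for the kind of randomness the book talks about, here is a minimal sketch of a one-dimensional random walk using only Python's standard library. It is my own illustration, not something taken from the book: the function names `drunkards_walk` and `mean_final_distance` are hypothetical, and the coin-flip step of +1 or -1 is an assumption chosen for simplicity.

```python
import random


def drunkards_walk(steps, seed=None):
    """Simulate a 1-D random walk: each step is +1 or -1 with equal probability."""
    rng = random.Random(seed)
    position = 0
    path = [position]
    for _ in range(steps):
        position += rng.choice((-1, 1))
        path.append(position)
    return path


def mean_final_distance(walks, steps, seed=0):
    """Average absolute distance from the start after `steps` steps, over many walks."""
    rng = random.Random(seed)
    total = 0
    for _ in range(walks):
        # Give every walk its own integer seed so the whole experiment is repeatable.
        final = drunkards_walk(steps, seed=rng.randrange(2**32))[-1]
        total += abs(final)
    return total / walks


if __name__ == "__main__":
    print(drunkards_walk(10, seed=42))
    print(mean_final_distance(walks=2000, steps=100))
```

Each individual path is unpredictable, yet the average distance from the starting point over many walks is quite stable (it grows roughly with the square root of the number of steps). That contrast between single outcomes and aggregate behaviour is what makes it so easy to read patterns into chance.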

Machine Learning

Offered by Stanford University on Coursera, this is the entry point into Machine Learning for many. It was mine. A must-have, in my opinion.

It first gives you a review of key concepts of linear algebra and teaches you basic Matlab (or Octave) programming. Only then are the very first concepts of regression with one variable presented. Further on, the course walks you through intriguing topics such as Principal Component Analysis and Neural Networks. If you are serious about learning it, with proper reviews, taking the quizzes and grasping concepts beyond the course, it will definitely foster your intuition and teach you the key artefacts and maths used in machine learning.
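As a taste of what "regression with one variable" involves, here is a minimal sketch that fits a straight line by batch gradient descent. It is written in plain Python rather than the Matlab/Octave used in the course, and the function name `fit_line`, the toy data and the default learning rate are my own assumptions, not course material.

```python
def fit_line(xs, ys, learning_rate=0.05, epochs=5000):
    """Fit y ~ theta0 + theta1 * x by batch gradient descent on mean squared error."""
    theta0, theta1 = 0.0, 0.0
    m = len(xs)
    for _ in range(epochs):
        # Prediction error for every training example under the current parameters.
        errors = [theta0 + theta1 * x - y for x, y in zip(xs, ys)]
        # Gradients of (1 / 2m) * sum(error^2) with respect to theta0 and theta1.
        grad0 = sum(errors) / m
        grad1 = sum(e * x for e, x in zip(errors, xs)) / m
        # Update both parameters simultaneously.
        theta0 -= learning_rate * grad0
        theta1 -= learning_rate * grad1
    return theta0, theta1


if __name__ == "__main__":
    # Toy data generated from y = 1 + 2x, so the fit should recover those values.
    xs = [1.0, 2.0, 3.0, 4.0, 5.0]
    ys = [3.0, 5.0, 7.0, 9.0, 11.0]
    print(fit_line(xs, ys))
```

The learning rate matters: too large and the updates diverge, too small and convergence takes far more iterations. Seeing that trade-off in a dozen lines of code is a good way to build the intuition that larger models rely on.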

Example slide from the course. Source: https://tinyurl.com/ybupqzzb

Introduction to Mathematical Thinking

I took discrete maths at uni back in 2008 and I loved it! Even so, I decided to review core concepts and boost my skills using this course provided by Stanford University via Coursera. The course covers part of what I saw in my first term back then, and it bolstered my analytical thinking. It can easily become a new paradigm for a lot of people, as the mathematical thinking proposed in the course differs a lot from ordinary reasoning, especially that of everyday written language. Besides, the last few sessions can become rather challenging on the mathematical-proof side of things. Inevitably, it will feel like learning a new spoken language.

You won’t mind learning this “new language” because critical thinking is crucial for data science. An important add-on (or painless alternative) to this course is getting to know the common fallacies, so that you can avoid (or detect) them during exploratory data analysis or when drawing insights from data, for example.

Master List of Logical Fallacies
utminers.utep.edu/omwilliamson/emgl1311utminers.utep.edu

Lock yourself in a cage for the first few weeks after starting these courses, as the excitement can make you over-criticise a lot of the things you hear and read. Remember the goal: data science!

Dilbert listing some common components in fallacies. Src: https://tinyurl.com/y97w2oej

Coding

First of all, you should learn how to code regardless of your career choice. It’s useful across many areas and can make your life a lot more convenient and fun!

It’s true that we may no longer need any coding at all for simpler Data Science routines on forecasting or clustering, thanks to platforms such as Alteryx, Azure Studio, Dataiku, H2O.ai, or Knime, which drastically cut the coding effort. However, expertise in coding and common frameworks will remain (for a long time) an essential asset for confidence, productivity and precision, particularly when dealing with absurd amounts of data or real-time applications of Machine Learning.

In the beginning your typical questions will be “How do I read this csv into a Data Frame? How do I make simple visualisations? How do I convert all categorical values into numerical ones? How do I deal with this multi-index data set?” Then at some point you’ll progress to things like “Is it OK to use for loops here, or is there a vectorised way of achieving this? How can I make my algorithm less complex and less expensive in computational power?”
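To make those beginner questions concrete, here is a minimal sketch with pandas that touches most of them (visualisation is left out). The CSV content, the column names and the 10% surcharge are hypothetical and exist only for illustration. In practice `pd.read_csv` would usually be given a file path instead of an in-memory string.

```python
import io

import pandas as pd

# Hypothetical CSV content, kept in memory so the example is self-contained.
CSV_TEXT = """city,plan,monthly_spend
Lima,basic,10.0
Cusco,premium,25.0
Lima,premium,30.0
Cusco,basic,12.0
"""

# 1. Read a CSV into a DataFrame.
df = pd.read_csv(io.StringIO(CSV_TEXT))

# 2. Convert categorical columns into numerical indicator (one-hot) columns.
encoded = pd.get_dummies(df, columns=["city", "plan"], dtype=int)

# 3. Build and query a multi-index: mean spend per (city, plan) pair.
by_city_plan = df.groupby(["city", "plan"])["monthly_spend"].mean()
lima_premium = by_city_plan.loc[("Lima", "premium")]

# 4. Loop versus vectorised: add a 10% surcharge to every row.
looped = [round(value * 1.1, 2) for value in df["monthly_spend"]]
vectorised = (df["monthly_spend"] * 1.1).round(2)

if __name__ == "__main__":
    print(encoded)
    print(by_city_plan)
    print(looped, vectorised.tolist())
```

On four rows the loop and the vectorised expression are equally fast. On millions of rows the vectorised form is usually much quicker, because the arithmetic runs in compiled code rather than in the Python interpreter.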

A simple command with Seaborn can help you build nice visualisations (own source) |
| Fortunately you have tons of options for dealing with these questions and challenges: more experienced people, Stack Overflow and Quora to name a few.

As for programming languages, I recommend Python to get started: it is easy to learn, popular among data scientists, and has a plethora of libraries such as Pandas, NumPy and Matplotlib for data preparation, wrangling and visualisation, plus frameworks like TensorFlow which, among many perks, allow you to take advantage of GPU processing without hassle.

Another common path is through R (programming language). Yes, it may take longer for you to jump into “data science” with Python, whereas R will lead you to Statistical Modelling without further ado. Python, on the other hand, will give you a better general understanding of programming principles commonly used beyond data science and in other programming languages such as C++ or Java. Whichever you choose, you’ll be fine.

Check this course out on Udemy. I didn’t do it, but the content and the price of approximately $6 make it look promising:

The Data Science Course 2018: Complete Data Science Bootcamp
Data scientist is one of the best suited professions to thrive in this century. Digital. Programming-oriented…
https://www.udemy.com/the-data-science-course-complete-data-science-bootcamp/

Another great resource is An Introduction to Statistical Learning with Applications in R by Gareth James et al. The book is publicly available at the link and covers statistical concepts with R programming. It will be challenging if Linear Algebra scares you (you’ll face it right on page 10). Regardless of the choice you make, you’ll need…

…A prejudice-free review on Maths

I always loved maths, but I wasn’t great at it in high school. I did very well at university with a lot of hard work, but it was only in 2013 (more or less when I decided to pursue a Master’s degree at some point in the future) that I recognised how many fundamental concepts were rusty and feeble. Because I was serious about getting into a Master’s programme, I decided to study for the GRE and GMAT assuming I knew nothing about maths, so that I could get back to the basics without feeling bad about it. Funny part: I started by watching videos on addition and subtraction for kids.

Many months later I moved on to the cornerstones of data science and machine learning: advanced linear algebra and calculus. Although they’re not required for becoming a data scientist, they’re extremely useful for understanding, for example, what a Principal Component Analysis is actually doing. Understanding the maths behind data science can help you a lot in your story-telling process, at the very minimum.

Plot of 2 clusters upon 3 Principal Components (own source). Maths intuition helps in interpreting this
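To show what a Principal Component Analysis is actually doing, here is a minimal sketch for the two-dimensional case in plain Python. It computes the covariance matrix of the points and then its eigenvalues and eigenvectors in closed form. The function name `pca_2d` and the toy points are my own assumptions, and real projects would normally use a library implementation that handles any number of dimensions.

```python
import math


def pca_2d(points):
    """Return (explained variances, principal axes) for a list of (x, y) points."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    # Entries of the 2x2 sample covariance matrix [[sxx, sxy], [sxy, syy]].
    sxx = sum((x - mean_x) ** 2 for x, _ in points) / (n - 1)
    syy = sum((y - mean_y) ** 2 for _, y in points) / (n - 1)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points) / (n - 1)
    # Closed-form eigenvalues of a symmetric 2x2 matrix: the variance
    # captured along each principal axis, largest first.
    half_trace = (sxx + syy) / 2
    spread = math.hypot((sxx - syy) / 2, sxy)
    variances = (half_trace + spread, half_trace - spread)
    # Angle between the first principal axis and the x-axis.
    theta = 0.5 * math.atan2(2 * sxy, sxx - syy)
    first = (math.cos(theta), math.sin(theta))
    second = (-math.sin(theta), math.cos(theta))
    return variances, (first, second)


if __name__ == "__main__":
    # Points lying exactly on the line y = 2x: all variance sits on one axis.
    variances, axes = pca_2d([(0, 0), (1, 2), (2, 4), (3, 6)])
    print(variances)
    print(axes)
```

The first axis is the direction along which the data varies the most, and the second is perpendicular to it. With points on a straight line, the second variance is zero, which is the intuition behind dropping low-variance components to reduce dimensionality.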

Although pricey, Manhattan Prep online courses and books were crucial in my re-encounter with maths. Nowadays I find it much easier to understand machine learning, statistics and financial analysis. Having solid foundations will definitely help you embrace new concepts faster and more naturally, playing a key role in productivity and self-esteem.

Check out the resources below:

GRE Just Math | Manhattan Prep
GRE Just Math is your live, comprehensive GRE Math prep course taught by a 99th-percentile GRE Math expert. In one…www.manhattanprep.com

Manhattan Prep GRE Set of 8 Strategy Guides, 4th Edition : Manhattan Prep : 9781937707910
Manhattan Prep GRE Set of 8 Strategy Guides, 4th Edition by Manhattan Prep, 9781937707910, available at Book Depository…www.bookdepository.com

Unless you’re in research or want to implement algorithms yourself, I would say that more advanced linear algebra and calculus are not mandatory, but it worked for me as a way to quickly acquire key concepts, particularly in Statistical Modelling.

People

Data Science may require you to sit down, type what seem to be endless lines of code and do feature engineering for weeks or months before getting any insightful output. One cannot fully enjoy the beauty of it without being able to explain, at least at a high level, what is being done and why it’s done that way. Most importantly, you’ll first need a clear definition of the problem you’re trying to solve.

Suppose you’re a data scientist working in an e-commerce company that has briefed you about a “sudden increase in fraudulent credit card transactions”. You still have no clue about the problem, although you may already know some approaches to take. A rookie mistake would be jumping straight into a classification or clustering model just because the word “fraudulent” nudged you into doing so… Exploratory data analysis could help you in the beginning, but key questions remain unanswered: what to analyse, where to start, what the end goal is and how to measure the impact of your work. You may end up finding that the original issue was actually a bug in the latest website release, something you wouldn’t need fancy data science work to figure out.

In fact, Design Thinking is a widely known add-on for dealing with people, especially for general communication, story-telling, problem definition and product development. Despite not being directly related to Data Science, its principles can be very useful for Data Scientists who deliver “data products” and interact with people on a daily basis.

Design Thinking Framework. Source: https://www.nngroup.com/articles/design-thinking/

Evidently, the more experienced one is in a specific domain, the easier it gets to prepare, model and visualise the problem through supporting data and propose solutions to gain productivity or efficiency. Unavoidably, even the most senior data scientists will have to talk to people whichever new challenge they face.

A Virtual Crash Course in Design Thinking
This is an online version of one of our most frequently sought after introductory learning experiences. Using a video…dschool.stanford.edu
|
| Grasping ideas from more experienced people (either in technology or in the business domain) is really useful and a big shortcut in a lot of cases. You can also meet people from all corners to share experiences, attend hackathons and meetups as most major cities around the world will have events of this sort. The key idea is to communicate, learn and share and there’s absolutely no excuse for being isolated.

Heterodox and Orthodox Economics

This resource is especially useful if you’re dealing with consumer data, but less relevant if you’re dealing with data centre monitoring, pharmaceuticals, financial forensics, biomedical data or sensors in a factory. Remember that this article is based on my experience, but the key takeaway here is to get knowledge from a specific domain.

In the simplest way I can put it: Orthodox Economics is concerned with explaining past, present and the future events with a bunch of mainstream models whereas Heterodox adds in interactions of individuals living within society, often bringing subjectivity into the equation.

Take car prices in Brazil, for example. While it’s true that the country struggles with high production costs, absurd taxes and only circa 12% of its roads paved, car makers still have high profit margins. Besides, people tend to flirt with a higher tier or with extras when buying a car, sometimes splitting the payment into up to 72 instalments at exorbitant interest rates. There are many objective and subjective reasons for this: the perceived quality of public transportation, a nice car as a sign of status, people wanting to feel comfortable in their car in heavy traffic… These variables can be estimated by orthodox economists, but not in a straightforward manner, as each individual will have a different perception of value. The challenge is to define a product and a price at which the number of buyers and the margins are maximised, so understanding what cultural groups value the most, versus how macroeconomic factors influence their perceptions, can give you a further advantage in your analysis. Moreover, understanding how an individual behaves is the key to personalisation, a key theme for data science. By the way, in the example above, even interest rates derive from subjective factors such as the time preference of borrowers and lenders. |
| Understanding economics is crucial when dealing with global businesses. Knowing that the macro-economic dynamics can’t be fully addressed by mainstream indicators — such as GDP or surplus — will naturally push you into pursuing alternative, yet compelling, explanations.

I was introduced to these subjects around 2008 due to my curiosity about the financial crisis, but it was only around 2012 that I came across these resources:

Winning At Innovation: The A-to-F Model
Innovation is a responsibility normally assigned to R&D departments but this is not enough. Companies need a systematic…www.amazon.co.uk

Economics
The MIT Press has been a leader in open access book publishing for two decades, beginning in 1995 with the publication…mitpress.mit.edu

Journals
Political Theory It seems to be universal that elected officials are seduced by the fantasy thesis that election to…mises.org

Demand: Creating What People Love Before They Know They Want It
Demand is one of the few economic terms almost everyone knows. Demand drives supply. When demand rises, it stimulates…www.amazon.co.uk
|
| ### How does it all come together?

Now suppose you work for an agency responsible for CRM and content management for a major provider of pet products by subscription.

With your understanding of the dynamics of business and people, you’ve coded a program which includes an algorithm able to classify which clients are prone to churn, and identified that the root causes are related to a combination of factors including “how loud the background noise in the call centre is” with the “increasing amount of content about ugly dogs” being published in the company’s app.

You’ve also realised that “call centre loudness” and “dogs’ ugliness” are not a major cause of churn when considered independently. You’ve presented this to the Chief Strategy Officer with the charts you built with Seaborn in Python, and you’ve made the data available to them so that they can play around with a data visualisation tool. You’ve successfully explained to the key leadership of the organisation how the conclusion was reached, like a maestro conducting an orchestra, except that you’re equipped with a solid understanding of Maths and Statistical Modelling instead of a conducting baton. You’re confident, your arguments make sense and this results in the leadership’s buy-in, as you’re now creating a thriving environment to discuss the real issue without fallacies.
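The idea that two factors matter only in combination can be checked with a simple grouped churn rate before any modelling. The sketch below uses made-up records for this fictional scenario: the counts, the field order and the helper name `churn_rate_by` are all assumptions for illustration, not real data.

```python
from collections import defaultdict

# Hypothetical customer records: (loud_call_centre, saw_ugly_dog_content, churned).
RECORDS = (
    [(False, False, False)] * 45 + [(False, False, True)] * 5
    + [(True, False, False)] * 45 + [(True, False, True)] * 5
    + [(False, True, False)] * 45 + [(False, True, True)] * 5
    + [(True, True, False)] * 20 + [(True, True, True)] * 30
)


def churn_rate_by(records, key):
    """Churn rate for each group, where `key` maps a record to its group label."""
    counts = defaultdict(lambda: [0, 0])  # group -> [churned, total]
    for record in records:
        group = key(record)
        counts[group][0] += record[2]  # True counts as 1, False as 0
        counts[group][1] += 1
    return {group: churned / total for group, (churned, total) in counts.items()}


# Group by both factors at once to expose the interaction.
by_both = churn_rate_by(RECORDS, key=lambda r: (r[0], r[1]))

if __name__ == "__main__":
    for group, rate in sorted(by_both.items()):
        print(group, rate)
```

In this toy data, customers exposed to only one of the two factors churn at the same 10% rate as customers exposed to neither, while those exposed to both churn at 60%. A table like this is also easy to show to a non-technical audience before presenting any model.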

The CRM will address the cases of its clients’ customers who are prone to churn, and the content managers will now start publishing more about llamas, which increases add-on sales during holiday seasons, since most of your customers are based in a city with frequent travellers to Peru.

Lastly, you’ve also trained this churn prediction algorithm using dense neural networks with TensorFlow running on GPUs to cope with billions of records and features. It has been deployed in a way that all interactions of customers within the app, website, physical stores and call centre are instantly assessed, allowing the system to understand patterns and notify you when it detects a high likelihood of churn.

Data Science is more than algorithms. Src: https://xkcd.com/1831/

Sounds crazy, but that’s the point of data science: transforming questions into answers and challenges into big opportunities (it takes, in many cases, several months or even years!) |
| ### General thoughts

Certainly the speed at which you’ll be exposed to new information will outpace your capacity to absorb it. I am still in the process of learning a lot of things I don’t fully understand. It’s true that my degrees and professional experiences have helped me land in this field almost naturally, but the resources above are degree-independent and I can safely say that virtually every career has transferable skills that can be used in Data Science.

Choosing somewhere to begin can be daunting, particularly with so much information available, but I hope my experiences can help you find some resources to get started.

Looking forward

I have a lot of fun with data science and I’m sure those who enjoy multidisciplinary areas and constant learning will too. Nowadays I still rely upon some of the resources above to keep myself moving forward. I give extra weight to hearing people’s experiences though.

It’s true that the hype and buzz around it can make a lot of people frustrated and a lot of myths — and oversimplification — appear, but whether you’re getting started or just passing by, I hope these resources are useful to you.

Want to hear more about Data Science and A.I.?

Follow me on Twitter and here on Medium, where I’ll start sharing and posting on these topics more often. |
