I would say that my data science learning path was fairly traditional. I did my undergrad in economics and have master’s degrees in global commerce and computer science (concentration in machine learning and artificial intelligence). I learned my business acumen from my coursework during my commerce degree and picked up most of the technical elements from my master’s in CS. I had a data science internship, and I was on my way.
Looking back, there was nothing wrong with my path, but knowing what I do now, what would I change about my learning journey? This question is particularly relevant for people who are new to the field. Many things have changed since I started. Positions are more competitive and there are far more learning options. I hope that my experience can help others learn data science faster and more completely, and find better job opportunities.
I will caveat this article by saying that learning is a little bit different for everyone. My word is not gospel, and there is a good chance that you will find something that works a little bit better for you. Still, I hope this is a good foundation on which to build. I hope it instills in you the big-picture priorities that are relevant when learning this field.
This article focuses more on how to learn than on where to learn (courses, bootcamps, degrees, etc.). I recommend these two articles for specific courses and online resources for learning the field.
Lesson 1: Break it down
When I first started learning data science I was overwhelmed by the size of the field. I had to learn programming languages and concepts from statistics, linear algebra, calculus, etc. When I was confronted with this many options, I didn’t know where to start.
Fortunately for me, I had coursework to guide my studies. My degree programs broke many of the concepts down into smaller chunks (classes) so they were digestible. While this worked for me, I find that schools take a one-size-fits-all approach. They also include many extraneous classes that you don’t actually need. If I could go back, I would definitely break my data science learning journey into chunks better suited to me.
Before diving into data science, it makes sense to understand the components that are used in the field. Rather than breaking things into “courses”, you can make data science into even smaller and more digestible chunks.
I generally break data science into programming and math.
Regression (linear, multiple linear, ridge, lasso, random forest, SVM, etc.)
Classification (naive Bayes, KNN, decision tree, random forest, SVM, etc.)
Clustering (k-means, hierarchical)
By breaking data science down into its components, you transform it from being an abstract concept into concrete steps.
Lesson 2: Start somewhere
When I was starting out, I was obsessed with learning things in the “correct” sequence. After entering the field, I found that many data scientists learned their skills in drastically different orders. I met PhDs who had studied the math first, and only learned the programming concepts after taking a bootcamp. I also met software engineers who were incredible programmers, and learned the math later through self-study and application.
I now realize that it is most important to start somewhere, preferably with a topic you are interested in. I found that learning is additive. If you learn one thing, you are not forgoing learning another concept.
If I had to go back, I would start with the concepts that were most interesting for me at the time. Once you learn a single concept, you can build on that knowledge to understand others. For example, if you learn a simple linear regression, a multiple linear regression is a fairly easy step.
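To make that step from simple to multiple linear regression concrete, here is a minimal sketch in plain Python. The helper names (fit_simple, solve, fit_multiple) and the toy data are made up for illustration. It assumes ordinary least squares solved through the normal equations, which is only one of several ways to fit these models.

```python
def fit_simple(xs, ys):
    """Least-squares fit of y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    slope = cov_xy / var_x
    return mean_y - slope * mean_x, slope


def solve(a, b):
    """Solve the square linear system a * w = b by Gauss-Jordan elimination."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        # Partial pivoting: bring the largest entry in this column to the diagonal
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [v - factor * p for v, p in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def fit_multiple(rows, ys):
    """Least-squares fit of y = w0 + w1*x1 + ... + wk*xk via the normal equations."""
    design = [[1.0] + list(row) for row in rows]  # leading 1.0 is the intercept column
    k = len(design[0])
    xtx = [[sum(r[i] * r[j] for r in design) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * y for r, y in zip(design, ys)) for i in range(k)]
    return solve(xtx, xty)


# Toy data (made up): y = 1 + 2*x1 + 3*x2 with no noise
rows = [(0, 0), (1, 0), (0, 1), (1, 1), (2, 1)]
ys = [1 + 2 * x1 + 3 * x2 for x1, x2 in rows]

# Simple regression uses only the first feature; multiple regression uses both
simple_intercept, simple_slope = fit_simple([r[0] for r in rows], ys)
weights = fit_multiple(rows, ys)
print(simple_intercept, simple_slope)
print(weights)
```

With a single feature, fit_multiple gives the same line as fit_simple, which is why the second model is a small step once you understand the first.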
Still, I probably wouldn’t jump right in and start with deep learning. It helps to start small and simple and build on that foundation.
Lesson 3: Build Minimum Viable Knowledge (MVK)
Over time, I’ve had a change of opinion about how much foundational knowledge you need. After experiencing many different types of learning myself, I believe that learning by doing real-world projects is the most effective way to grasp a field. I think that you should understand just enough of these concepts to be able to start exploring your own projects.
This is where minimum viable knowledge comes into play. You should start by learning just enough to be able to learn through doing. This stage is fairly hard to identify. Generally, you will feel like you aren’t ready when you first get here. This is a good thing, though: it means that you are pushing yourself out of your comfort zone.
You can reach this stage fairly easily. I think you can get to this level of knowledge with very introductory online courses, and I generally recommend the micro courses at kaggle.com.
To get to this step, all you really need is to understand the basics of Python or R and have some familiarity with the packages used. You can start learning the math later by applying some of the algorithms to real-world data.
Lesson 4: Get your hands dirty
With your basic knowledge, I recommend getting into projects as quickly as possible. Again, this sounds scary, but a project is all about how you define it.
At the early stages, a project could be something as simple as experimenting with a for loop. As you progress, you can graduate to projects using data on Kaggle, and eventually using data that you have collected.
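As an illustration of how small a first project can be, here is one possible for loop experiment: computing an average by hand and checking it against the standard library. The step counts are made-up sample data.

```python
import statistics

# Made-up sample data: daily step counts
steps = [4200, 8100, 6500, 9900, 3000]

# Add the values up one at a time with a for loop
total = 0
for value in steps:
    total += value

mean_by_loop = total / len(steps)

# Compare the hand-rolled result with the built-in helper
print(mean_by_loop, statistics.mean(steps))
```

Swapping in a different list, or tracking the largest value instead of the total, turns it into a new experiment.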
I am a HUGE believer that the best way to learn data science is to do data science. I think that the theory is VERY important, but no one says that you have to understand it all before you start applying it. Theory is something you can go back to after you have a functional understanding of the algorithms. For me, real-world examples were always what made things click. If you start with real-world examples through projects, I think there is a far higher chance of things “clicking” when you start learning the theory.
Projects also have the power to make data science smaller. One of the biggest challenges I see for new learners is that the field of data science can be overwhelming. Confining the things you are learning to the size of a small project allows you to break things down even further than you did in Lesson 1.
Projects offer one additional benefit. They give you immediate feedback on where you need to improve. If you are working on a project and you run into a roadblock about what package, algorithm, or visual to use, you now know that you should probably study that area of the field further.
Lesson 5: Learn from other people’s code
While doing your own projects is great, sometimes you don’t know what you don’t know. I highly recommend going through the code of more experienced data scientists to get ideas about what to learn next and to better understand logic or syntax.
On Kaggle and GitHub there are thousands (maybe millions) of kernels where people have shared the code that they used to analyze datasets. Going through these is a great way to complement your projects.
I recommend making a list of the packages, algorithms, and visuals that you see being used. You should go to the documentation for the packages and expand your knowledge there. The docs almost always include examples of how the packages should be used. Again, this list can be used to help you think of new project ideas and experiments.
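If you want to build the package part of that list automatically, one possible approach is to let Python's standard ast module read the import statements in someone's code. This is only a sketch: the imported_packages helper and the example source string are made up for illustration, and it handles plain Python source only, not notebook files.

```python
import ast
from collections import Counter


def imported_packages(source):
    """Count the top-level packages imported by a piece of Python source code."""
    counts = Counter()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            # e.g. "import matplotlib.pyplot as plt" counts as "matplotlib"
            for alias in node.names:
                counts[alias.name.split(".")[0]] += 1
        elif isinstance(node, ast.ImportFrom) and node.module and node.level == 0:
            # e.g. "from sklearn.linear_model import X" counts as "sklearn";
            # relative imports (level > 0) are skipped
            counts[node.module.split(".")[0]] += 1
    return counts


# Hypothetical code copied from a shared analysis
example_code = """
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
"""

packages = imported_packages(example_code)
for name, count in packages.most_common():
    print(name, count)
```

The code is only parsed, never run, so the packages it imports do not need to be installed.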
Lesson 6: Build algorithms from scratch
This is a rite of passage for most data scientists. After you have applied an algorithm and understand how it works in practice, I recommend trying to code it from scratch. This helps you to better understand the underlying math and other mechanisms that make it work. When doing this, you will undoubtedly have to learn the theory behind it as well.
I personally think that learning in this direction is far more intuitive than trying to master the theory and then apply it. This is the approach that fastai has taken with their free MOOC. I highly recommend it if you are interested in deep learning.
For this, I generally recommend starting with linear regression. This will help you to better understand gradient descent, which is an extremely important concept to build on.
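As a rough sketch of what that exercise can look like, here is a simple linear regression fitted with batch gradient descent in plain Python. The function name, learning rate, step count, and toy data are all illustrative choices rather than recommended settings, and the loss is assumed to be mean squared error.

```python
def gradient_descent(xs, ys, learning_rate=0.05, steps=2000):
    """Fit y = intercept + slope * x by minimizing mean squared error."""
    intercept, slope = 0.0, 0.0
    n = len(xs)
    for _ in range(steps):
        # Prediction error for every point under the current parameters
        errors = [intercept + slope * x - y for x, y in zip(xs, ys)]
        # Partial derivatives of the mean squared error
        grad_intercept = 2 / n * sum(errors)
        grad_slope = 2 / n * sum(e * x for e, x in zip(errors, xs))
        # Step downhill, scaled by the learning rate
        intercept -= learning_rate * grad_intercept
        slope -= learning_rate * grad_slope
    return intercept, slope


# Toy data (made up): y = 1 + 2x with no noise
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 5.0, 7.0, 9.0]

intercept, slope = gradient_descent(xs, ys)
print(intercept, slope)
```

Raising the learning rate until the fit diverges, or printing the error every few hundred steps, is a quick way to build intuition for how gradient descent behaves.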
As you advance your data science career further, I think theory becomes increasingly important. You bring value by matching the correct algorithm to the problem. The theory associated with the algorithm greatly facilitates this process.
Lesson 7: Never stop learning
The beauty of the data science journey is that it never ends. You will need to keep learning to stay on top of new packages and advancements in the field. I recommend doing this through (you guessed it) more projects. I also recommend continuing to review other people’s code and reading newly published research.
This is more of a mindset recommendation than anything practical. If you think that there is a pinnacle, you are in for a surprise!