About
I teach courses on data analytics and machine learning to MSc/MRes/MBA/PhD students. I previously taught in the computer science school and am now based in the business school. My teaching and research draw extensively on the following lists, which may be useful for students seeking guidance in directed studies. Because my expertise is limited, these lists may favour certain fields or topics and overlook many excellent materials. Your input is greatly appreciated; if you feel something important has been left out, please let me know.
DSML Theory
Probability, Statistics and Linear Algebra
I studied mathematical finance and statistics as a postgraduate and have taught myself machine
learning since my PhD research. The following are the fundamental mathematics materials that I read
and use in my study and teaching. Probability and statistics play a significant role. Note, however,
that many topics covered by these books are not widely used in data science and machine learning.
- Walter Rudin. Principles of Mathematical Analysis, McGraw-Hill, 3rd Edition, 1976.
- David Freedman. Statistical Models: Theory And Practice, Cambridge University Press, 2nd
Edition, 2009.
- Morris DeGroot and Mark Schervish. Probability and Statistics, Pearson, 4th Edition, 2013.
- George Casella and Roger Berger. Statistical Inference, 2nd Edition, 2002.
- Sheldon Ross. Introduction to Probability Models, Academic Press, 10th Edition, 2009.
- Geoffrey Grimmett, David Stirzaker. Probability and Random Processes, Oxford University Press,
2001.
- Alexander Mood, Franklin Graybill, Duane Boes. Introduction to the Theory of Statistics, 3rd
Edition, McGraw-Hill, 1974.
The following two books provide a good exposition of the essential mathematics of machine learning. I
recommend them for those who want to study the theory of machine learning algorithms:
Machine Learning and Data Mining (Introductory Level)
I recommend the following two books for students with a business background who do not yet know what
data science and machine learning are and want to grasp the big picture.
- Pedro Domingos. The Master Algorithm: How the Quest for the Ultimate Learning Machine Will
Remake Our World, Allen Lane, 2015.
The following books are easy to follow. I recommend them to students with a business background as a
first book on machine learning and data science. I use several materials from Gareth James's
book in my teaching at business schools.
- Jiawei Han, Micheline Kamber, Jian Pei. Data Mining: Concepts and Techniques, Morgan Kaufmann, 3rd Edition, 2011.
- Nong Ye. Data Mining: Theories, Algorithms, and Examples, CRC, 2014.
- Sandro Skansi. Introduction to Deep Learning: From Logical Calculus to Artificial Intelligence, Springer, 2018.
- Gareth James, Daniela Witten, Trevor Hastie, Robert Tibshirani. An Introduction to Statistical Learning: with Applications in R, Springer, 2013.
Machine Learning and Data Mining (Intermediate and Advanced Level)
The books in the following list are suitable for undergraduates, postgraduates,
PhD students and experienced researchers. In my research and teaching I draw heavily on the books by
David MacKay, Christopher Bishop, Kevin Murphy, Yaser Abu-Mostafa, Hang Li, and Zhihua Zhou.
- David MacKay. Information Theory, Inference and Learning Algorithms, Cambridge University Press, 2003.
- Charu Aggarwal. Data Mining: The Textbook, Springer, 2015.
- Mohammed Zaki and Wagner Meira. Data Mining and Analysis: Fundamental Concepts and Algorithms, Cambridge University Press, 2014.
- Trevor Hastie, Robert Tibshirani, Jerome Friedman. The Elements of Statistical Learning: Data Mining, Inference, and Prediction, 2nd Edition, Springer, 2011.
- Simon Rogers and Mark Girolami. A First Course in Machine Learning, CRC, 2nd Edition, 2016.
- Christopher Bishop. Pattern Recognition and Machine Learning, Springer, 2007.
- Kevin Murphy. Probabilistic Machine Learning: An Introduction, MIT Press, 2022.
- Kevin Murphy. Probabilistic Machine Learning: Advanced Topics, MIT Press, 2023.
- David Barber. Bayesian Reasoning and Machine Learning, Cambridge University Press, 2012.
- Mehryar Mohri, Afshin Rostamizadeh, Ameet Talwalkar. Foundations of Machine Learning, MIT Press, 2nd Edition, 2018.
- Ethem Alpaydin. Introduction to Machine Learning, MIT Press, 3rd Edition, 2014.
- Yaser Abu-Mostafa, Malik Magdon-Ismail, Hsuan-Tien Lin. Learning From Data, 2012. [Link](http://amlbook.com/)
- Hang Li. 统计学习方法 (Statistical Learning Methods), 2nd Edition, Tsinghua University Press, 2019.
- Zhihua Zhou. 机器学习 (Machine Learning), Tsinghua University Press, 2016.
- Ian Goodfellow, Yoshua Bengio, Aaron Courville. Deep Learning, MIT Press, 2016.
- Zhihua Zhou. Ensemble Methods: Foundations and Algorithms, CRC, 2012.
- Richard Sutton, Andrew Barto. Reinforcement Learning: An Introduction, MIT Press, 2nd Edition, 2018.
- Carl Rasmussen and Christopher Williams. Gaussian Processes for Machine Learning, MIT Press, 2006.
DSML Practice
It is very difficult to compare data programming tools without knowing:
- What do you plan to do?
- How much are you willing to invest (mainly your time), and what reward do you expect?
- Who do you work with, and with whom do you want to present and share your work?
I have used Python, R, and Matlab for years in my research and teaching. The following reflects my own experience.
Python
I mainly use Python in my research. It is a high-level, object-oriented, general-purpose programming language. It is easy to learn, reasonably fast, and supported by many machine learning packages and a wide range of code available online. I suspect the latter two are the main reasons why Python has been so successful for machine learning and data analytics today. The following are introductory Python materials for those who have not used it before.
- Wes McKinney. Python for Data Analysis: Data Wrangling with Pandas, NumPy, and IPython, O'Reilly, 2012.
- Peter Harrington. Machine Learning in Action, Manning Publications, 2012.
- Jake VanderPlas. Python Data Science Handbook: Essential Tools for Working with Data, O'Reilly, 2016.
- Sebastian Raschka. Python Machine Learning, Packt Publishing, 2015.
- Sheppard. Introduction to Python for Econometrics, Statistics and Data Analysis, University of Oxford Lecture Notes, 2014.
R
In my research, I like to use R for quick descriptive analytics and visualisation of experimental results. I came to R from S-Plus when I took statistics courses many years ago. My experience with R was not pleasant at the time, so I switched to Matlab and Mathematica for a couple of years until I discovered RStudio and ggplot2. R was developed mainly for statistical computing, but it has expanded into data science and machine learning in recent years. I am a big fan of the R packages developed by Hadley Wickham, which have significantly improved my experience of using R, so I strongly recommend the following R books of his:
Other good R books include:
- Bernd Bischl, Raphael Sonabend, Lars Kotthoff, Michel Lang. Applied Machine Learning Using mlr3 in R, CRC Press, 2024.
- Julia Silge, David Robinson. Text Mining with R, O'Reilly, 2017.
- Robert Kabacoff. R in Action: Data Analysis and Graphics with R, Manning Publications, 2015.
- W. John Braun, Duncan J. Murdoch. A First Course in Statistical Programming with R, Cambridge University Press, 3rd Edition, 2021.
- Hefin I. Rhys. Machine Learning with R, the tidyverse and mlr, Manning, 2021.
- Tianyuan Huang (黄天元). R语言数据高效处理指南 (A Guide to Efficient Data Processing in R), Peking University Press, 2019.
- Tianyuan Huang (黄天元). 机器学习全解R语言版 (Machine Learning Explained: R Edition), Posts & Telecom Press, 2024.
- Deborah Nolan, Duncan Lang. Data Science in R: A Case Studies Approach to Computational Reasoning and Problem Solving, CRC, 2015.
Matlab
Matlab was once my favourite research tool. It is perhaps one of the most successful commercial software packages for mathematical programming. It is very powerful, has a user-friendly interface (debugging is easy and the generated figures are editable), and is very good at simulating and modelling systems. Matlab has the File Exchange, although it is not as popular as the Python and R communities. A Matlab licence can be costly, though many universities and companies purchase licences each year for their students, staff and researchers. I found the following two books very helpful when I used and taught Matlab for data analytics.
- Wendy Martinez, Angel Martinez. Computational Statistics Handbook with Matlab, 3rd Edition, CRC, 2015.
- Jaan Kiusalaas. Numerical Methods in Engineering with Matlab, 2nd Edition, Cambridge University Press, 2012.