About

I teach courses related to data analytics and machine learning to MSc/MRes/MBA/PhD students. Previously, I taught at the computer science school, and now I am based in the business school. My teaching and research draw extensively on the following lists, which may be useful for students seeking guidance in directed studies. Because my expertise is limited, the lists may favour certain fields or topics and overlook many excellent materials. Your input is greatly appreciated; if you feel something important has been left out, please let me know.

DSML Theory

Probability, Statistics and Linear Algebra

I studied mathematical finance and statistics as a postgraduate, and have self-studied machine learning since my PhD research. The following are the fundamental mathematics materials that I read and use in my study and teaching. Probability and statistics play a significant role. However, it should be noted that many topics covered by these books are not widely used in data science and machine learning.

  • Walter Rudin. Principles of Mathematical Analysis, McGraw-Hill, 3rd Edition, 1976.
  • David Freedman. Statistical Models: Theory And Practice, Cambridge University Press, 2nd Edition, 2009.
  • Morris DeGroot and Mark Schervish. Probability and Statistics, Pearson, 4th Edition, 2013.
  • George Casella and Roger Berger. Statistical Inference, 2nd Edition, 2002.
  • Sheldon Ross. Introduction to Probability Models, Academic Press, 10th Edition, 2009.
  • Geoffrey Grimmett, David Stirzaker. Probability and Random Processes, Oxford University Press, 2001.
  • Alexander Mood, Franklin Graybill, Duane Boes. Introduction to the Theory of Statistics, 3rd Edition, McGraw-Hill, 1974.

The following two books provide a good exposition of the essential mathematics of machine learning. I recommend them for those who want to study the theory of machine learning algorithms:

Machine Learning and Data Mining (Introductory Level)

I recommend the following two books for students from a business background who do not know what data science and machine learning are and want to grasp the big picture.

  • Pedro Domingos. The Master Algorithm: How the Quest for the Ultimate Learning Machine Will Remake Our World, Allen Lane, 2015.

The following books are easy to follow. I recommend them to students from a business background as a first text for studying machine learning and data science. I use several materials from Gareth James's book in my teaching at business schools.

Machine Learning and Data Mining (Intermediate and Advanced Level)

The books in the following list are suitable for undergraduates, postgraduates, PhD students and experienced researchers. In my research and teaching I draw widely on material from the books by David MacKay, Christopher Bishop, Kevin Murphy, Yaser Abu-Mostafa, Hang Li, and Zhihua Zhou.

DSML Practice

It is very difficult to compare data programming tools without knowing:
  • What do you plan to do?
  • What balance of investment (mainly your time) and reward do you prefer?
  • Who do you work with, and with whom do you want to present and share your work?

I have used Python, R, and Matlab for years in my research and teaching. The following reflects my own experience.

Python

I mainly use Python in my research. It is a high-level, object-oriented, general-purpose programming language. It is easy to learn and quite fast, and it has many machine learning packages and a wide range of code available online. I suspect the latter two are the main reasons why Python has been so successful for machine learning and data analytics. The following are introductory Python materials for those who have not used it before.

R

In my research, I like to use R for quick descriptive analytics and visualisation of experimental results. I first came to R from S-Plus when I took statistics courses many years ago. My experience of using R was not pleasant at that time, so I switched to Matlab and Mathematica for a couple of years until I discovered RStudio and ggplot2. R was developed mainly for statistical computing, but it has expanded into data science and machine learning in recent years. I am a big fan of the R packages developed by Hadley Wickham, which have significantly improved my experience of using R, so I strongly recommend the following R books of his:

Other good R books include:

Matlab

Matlab was my favourite tool in my research. It is perhaps one of the most successful commercial software packages for mathematical programming. It is very powerful, has a user-friendly interface (debugging is easy and the generated figures are editable), and is very good at simulating and modelling systems. Matlab has the File Exchange, although it is not as popular as the Python and R communities. A Matlab licence can be costly, though many universities and companies purchase licences each year for students, staff and researchers. I found the following two books very helpful when I used and taught Matlab for data analytics.

  • Wendy Martinez, Angel Martinez. Computational Statistics Handbook with Matlab, 3rd Edition, CRC, 2015.
  • Jaan Kiusalaas. Numerical Methods in Engineering with Matlab, 2nd Edition, Cambridge University Press, 2012.
© Bowei Chen 2025