The proposed projects are described below; you are also free to set up your own. The project descriptions are deliberately vague, and initiative is expected: you should look for information, tips, etc. on your own.
For all projects, you have to do the following (and will be graded on these points):
Gather and preprocess data (Python code or notebook)
Extract information by data analysis or machine learning (Python code or notebook)
Generate visuals illustrating your findings (plots)
Present these results (notebook or slides)
In practice:
Groups of 3/4 students
Preparation in class
Presentation (5 min + 5 min questions)
Notebook/code handout
Several groups can take the same project (try not to overlap or collaborate)
Tree-based classifiers are classification procedures that determine a class by a succession of tests. For that reason, they are widely used in industry. However, they raise a number of questions in terms of learning performance. Scikit-learn's documentation covers this topic well.
Example of goals:
Investigate a tree-based classifier on the iris dataset then on bigger multi-class decision problems.
Produce images of the obtained decision trees.
Produce a tutorial notebook on tuning these classifiers.
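As a possible starting point for the goals above, here is a minimal sketch that fits a decision tree on the iris dataset and saves an image of it. The split ratio, `max_depth=3`, and the output file name `iris_tree.png` are illustrative assumptions, not recommended settings; choosing them well is part of the project.

```python
import matplotlib.pyplot as plt
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier, plot_tree

iris = load_iris()
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, test_size=0.3, stratify=iris.target, random_state=0
)

# max_depth=3 is an arbitrary illustrative value; tuning it is part of the project.
clf = DecisionTreeClassifier(max_depth=3, random_state=0)
clf.fit(X_train, y_train)
test_accuracy = clf.score(X_test, y_test)
print(f"test accuracy: {test_accuracy:.2f}")

# Draw the fitted tree and save it as an image.
fig, ax = plt.subplots(figsize=(10, 6))
plot_tree(
    clf,
    feature_names=iris.feature_names,
    class_names=list(iris.target_names),
    filled=True,
    ax=ax,
)
fig.savefig("iris_tree.png", dpi=150)
```

Varying `max_depth` (or `min_samples_leaf`) and comparing train and test accuracy is one simple way to start discussing over-fitting.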
Classifying into more than two classes can be much more involved than the binary case. Worse, an imbalance between the number of examples in each class may become a serious problem (you can find examples of datasets here and here). Compare different strategies (such as One versus All, tree-based approaches, etc.) and investigate different scoring metrics.
Example of goals:
Study the different scores on multiclass classification.
Compare the respective merits of different methods.
Explain your findings in a notebook.
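To illustrate why the choice of score matters with imbalanced classes, here is a small sketch comparing a One versus All strategy with a decision tree on several metrics. The data are synthetic (generated with `make_classification`, with assumed class proportions of roughly 80%, 15% and 5%), and the two models and their hyperparameters are arbitrary examples rather than a recommendation.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, balanced_accuracy_score, f1_score
from sklearn.model_selection import train_test_split
from sklearn.multiclass import OneVsRestClassifier
from sklearn.tree import DecisionTreeClassifier

# Synthetic 3-class problem with imbalanced classes (illustrative proportions).
X, y = make_classification(
    n_samples=3000, n_features=10, n_informative=5, n_classes=3,
    weights=[0.8, 0.15, 0.05], random_state=0,
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0
)

models = {
    "one-vs-rest logistic regression": OneVsRestClassifier(LogisticRegression(max_iter=1000)),
    "decision tree": DecisionTreeClassifier(max_depth=5, random_state=0),
}

scores = {}
for name, model in models.items():
    y_pred = model.fit(X_train, y_train).predict(X_test)
    scores[name] = {
        "accuracy": accuracy_score(y_test, y_pred),
        # Balanced accuracy and macro F1 give every class the same weight,
        # so they are more sensitive to errors on the rare classes.
        "balanced accuracy": balanced_accuracy_score(y_test, y_pred),
        "macro F1": f1_score(y_test, y_pred, average="macro"),
    }
    print(name, {k: round(v, 3) for k, v in scores[name].items()})
```

Looking at the per-class results (for instance with `sklearn.metrics.classification_report`) is a natural next step.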
In recent years, deep learning methods have become more and more popular, especially as they have reached impressive accuracy on machine learning tasks such as image classification. Most neural network frameworks have a Python interface, the most popular being Google's TensorFlow and Keras. A typical good use case for neural networks is image recognition.
Example of goals:
Install TensorFlow/Keras.
Write code that trains a classifier using a deep learning architecture.
Produce a tutorial notebook on how to do that.
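Once TensorFlow/Keras is installed, training a classifier could look like the sketch below. It assumes a recent TensorFlow 2 release, uses synthetic 2D data as a stand-in for a real dataset (images would need a different input shape and most likely convolutional layers), and the layer sizes and number of epochs are arbitrary.

```python
import numpy as np
from tensorflow import keras

rng = np.random.default_rng(0)

# Synthetic stand-in for real data: three Gaussian blobs in 2D, one per class.
centers = np.array([[0.0, 0.0], [4.0, 0.0], [0.0, 4.0]])
y = rng.integers(0, 3, size=600)
X = (centers[y] + rng.normal(size=(600, 2))).astype("float32")

# A small fully connected network; the architecture is an illustrative choice.
model = keras.Sequential([
    keras.Input(shape=(2,)),
    keras.layers.Dense(16, activation="relu"),
    keras.layers.Dense(3, activation="softmax"),
])
model.compile(
    optimizer="adam",
    loss="sparse_categorical_crossentropy",  # labels are integers, not one-hot vectors
    metrics=["accuracy"],
)
history = model.fit(X, y, epochs=20, batch_size=32, validation_split=0.2, verbose=0)

# Each row holds the predicted probability of each of the three classes.
probabilities = model.predict(X[:5], verbose=0)
print(probabilities.shape)
```

The `history.history` dictionary keeps the loss and accuracy per epoch, which is convenient for plotting learning curves in the tutorial notebook.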
Take a Kaggle problem you find interesting and try to reach a good score. For instance, taxi fare prediction, box office, and appointment no-shows are good choices, with lots of discussion on the data. However, you can choose your own.
Example of goals:
Investigate the problem and the evaluation procedure.
Produce your own solution and try to improve your score.
Produce a Kernel showing what you did and how you reached that score.
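Before submitting anything, it usually helps to reproduce the evaluation procedure locally and compare against a trivial baseline. The sketch below assumes a regression problem scored with RMSE and uses synthetic data; the actual metric, data, and model depend on the competition you pick.

```python
from sklearn.datasets import make_regression
from sklearn.dummy import DummyRegressor
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import cross_val_score

# Synthetic stand-in for a competition training set.
X, y = make_regression(n_samples=500, n_features=8, noise=10.0, random_state=0)

models = {
    "mean baseline": DummyRegressor(strategy="mean"),
    "random forest": RandomForestRegressor(n_estimators=50, random_state=0),
}

rmse = {}
for name, model in models.items():
    # scikit-learn returns the negated RMSE so that higher is always better.
    fold_scores = cross_val_score(
        model, X, y, cv=5, scoring="neg_root_mean_squared_error"
    )
    rmse[name] = -fold_scores.mean()
    print(f"{name}: cross-validated RMSE = {rmse[name]:.1f}")
```

A model that does not clearly beat the baseline under cross-validation is unlikely to score well on the leaderboard.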
We have seen matplotlib and seaborn, but many other libraries exist, such as plotnine/ggplot or bokeh. It is also possible to build user interfaces, using e.g. tkinter (see for instance here).
Example of goals:
Compare the different libraries in terms of ease of use and possibilities.
Generate plots comparing visually several datasets.
Write a notebook to explain to your fellow students how to use these libraries.
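As a reference point for the comparison, here is a minimal matplotlib sketch that shows two datasets side by side; the same figure can then be reproduced with the other libraries. The choice of the iris and wine datasets, of the first two features, and the output file name are assumptions made for illustration.

```python
import matplotlib.pyplot as plt
from sklearn.datasets import load_iris, load_wine

datasets = {"iris": load_iris(), "wine": load_wine()}

# One panel per dataset: first feature against second feature, coloured by class.
fig, axes = plt.subplots(1, len(datasets), figsize=(10, 4))
for ax, (name, data) in zip(axes, datasets.items()):
    ax.scatter(data.data[:, 0], data.data[:, 1], c=data.target, cmap="viridis", s=15)
    ax.set_xlabel(data.feature_names[0])
    ax.set_ylabel(data.feature_names[1])
    ax.set_title(name)

fig.tight_layout()
fig.savefig("datasets_comparison.png", dpi=150)
```

Counting how many lines each library needs for an equivalent figure is one simple way to compare ease of use.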