Course of the Data Science track of the MSIAM and MOSIG masters at Université Grenoble Alpes.

## Team

## Contents

- Course
  - Introduction to convex optimization: concepts in convex analysis (duality, proximal operators), how to identify potential difficulties in optimization problems. Illustrations in supervised learning (classification and regression problems) and in operations research (decomposition methods).
  - Algorithms in convex optimization (gradient, proximal gradient, conditional gradient, ADMM)
  - Stochastic gradient and incremental algorithms (SGD, SAGA, SVRG)
  - Introduction to distributed computation (architectures for computation, map-reduce scheme, MPI, Spark) + practical work
  - Distributed optimization algorithms, stochastic algorithms, asynchronous methods

- Tutorials
  - Incremental algorithms
  - Introduction to Spark
  - Sparse logistic regression in high dimension
  - Application to a recommendation system

## Grades

The final grade will be a convex combination of the grade on the report on the practical sessions and the grade of the presentation of a recent research article.

**Report on the practical sessions.**

We would like you to write a report on the two sessions "sparse regression" and "matrix completion", in groups of 1, 2, or 3 students. The format of the report is free; we expect between 2 and 7 pages, presenting an overview of your work with a focus on one or several specific aspects.

We do not expect you to give all the answers, question by question. Nor do we expect you to cover all the material of the two sessions. On the other hand, you can work out other developments.

You can emphasize any aspect of your work, depending on your personal interests and skills, for instance:

- implementation and numerical tests (further developments, more experiments...)
- applications in learning or statistics (interpretation of results, other models, other datasets...)
- theoretical or mathematical questions (convergence proofs of algorithms, convergence rates, advanced versions, theoretical analysis of special cases...)

The report is due *Friday, Dec 22*, by email to cdo.grenoble@gmail.com.

The quality of the presentation and of the analysis will obviously matter for the grade.

**Presentation of research articles.**

We would like you to present an article in groups of 1, 2, or 3 students. The article has to be chosen from a list that we will give on *Monday, Dec. 18*. The list contains various articles around the topics of the course: some are more theoretical, some are more algorithmic, others deal with applications. The presentation will be short: *8 mins + around 5 mins of questions*. In this short time, you can present an overview of the article or put an emphasis on a specific aspect that you find interesting. The slides (in pdf) will be projected from our machine (if you want to present an implementation or a script run, you should prepare slides on it). The presentations will take place in January. The presentation slides should be sent the day before the defense.
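The grading rule stated at the start of this section can be written out explicitly. The weight $\lambda$ below is only a placeholder: its value is not specified on this page.

```latex
\[
  \text{final grade}
  = \lambda \, G_{\text{report}} + (1 - \lambda) \, G_{\text{presentation}},
  \qquad \lambda \in [0, 1].
\]
```

Here $G_{\text{report}}$ and $G_{\text{presentation}}$ are illustrative names for the two grades. Whatever the weight, the final grade lies between them.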

## Labs and material

All *Labs* are in room F-203 at the UFR IM2AG.

- Lab 1: Incremental Algorithms
  - Notebook: 01_Incremental_Algs
  - [Correction]

- Lab 2: Introduction to Spark
  - Notebook: 02_Intro_Spark

- Lab 3: Machine Learning with Spark
  - Notebook: 03_ML_with_Spark
  - [Report]

- Lab 4: Optimization for Machine Learning with Spark
  - Notebook: 04_Opt_ML_with_Spark
  - [Report]

## Setup

**[Recommended] On your machine:**

- Install `Docker` (community edition)
- Check your install by running `docker run hello-world`
- Create a folder in which you will work for the whole course
- Launch the image used in the tutorials:

  `docker run -v absolute_path_to_folder:/home/jovyan/work -it --rm -p 8888:8888 -p 4040:4040 jupyter/pyspark-notebook`

  where *absolute_path_to_folder* is the absolute path to the folder you created.

  The first time you run this command, the image will be pulled, which requires some time for the download (approx. 2 GB).
- Download and extract the datasets and notebooks into this folder. Open your browser at localhost:8888 to open Jupyter; you should see the downloaded notebooks and be able to modify and run code inside them.

*(We do not provide support for Windows or Mac; use the documentation and Google!)*
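The launch command above needs the absolute path of your working folder. As a convenience, here is a minimal Python sketch that assembles that command from any folder path. The function name `docker_run_command` is hypothetical and not part of the course material; the image name, mount point and ports are copied from the command above.

```python
import os
import shlex

# Image, mount point and ports are copied from the launch command above.
IMAGE = "jupyter/pyspark-notebook"
CONTAINER_DIR = "/home/jovyan/work"


def docker_run_command(folder):
    """Return the `docker run` argument list for the given working folder.

    Hypothetical helper: it only builds the command, it does not run it.
    """
    # The instructions ask for an absolute path, so resolve relative ones.
    host_path = os.path.abspath(folder)
    return [
        "docker", "run",
        "-v", f"{host_path}:{CONTAINER_DIR}",
        "-it", "--rm",
        "-p", "8888:8888",
        "-p", "4040:4040",
        IMAGE,
    ]


if __name__ == "__main__":
    # Print a copy-pasteable command for the current directory.
    print(shlex.join(docker_run_command(".")))
```

Running the script from inside your course folder prints the full command, which you can then paste into a terminal.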

**In classrooms:**

- Edit your bashrc `~/.bashrc` (e.g. with `nano ~/.bashrc`) and add the following lines:

  `export SPARK_HOME=/opt/spark`

  `export PATH=$SPARK_HOME/bin:$PATH`

  `export PYSPARK_DRIVER_PYTHON=jupyter`

  `export PYSPARK_DRIVER_PYTHON_OPTS='notebook'`

  `export PYSPARK_PYTHON=python3`

- Close it and reload it using `source ~/.bashrc`
- Run `jupyter notebook` in a terminal from the folder where you decompressed the notebooks
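If Jupyter or Spark does not start as expected in the classroom, one quick check is whether the exported variables are actually visible to your session. The sketch below is a hypothetical helper, not part of the course material; the expected values are copied from the export lines above.

```python
import os

# Expected values, copied from the export lines above.
EXPECTED = {
    "SPARK_HOME": "/opt/spark",
    "PYSPARK_DRIVER_PYTHON": "jupyter",
    "PYSPARK_DRIVER_PYTHON_OPTS": "notebook",
    "PYSPARK_PYTHON": "python3",
}
SPARK_BIN = EXPECTED["SPARK_HOME"] + "/bin"


def check_spark_env(env):
    """Return a list of problems found in `env`; empty if everything matches."""
    problems = []
    for name, expected in EXPECTED.items():
        actual = env.get(name)
        if actual != expected:
            problems.append(f"{name}: expected {expected!r}, found {actual!r}")
    # PATH should contain the Spark bin directory as one of its entries.
    if SPARK_BIN not in env.get("PATH", "").split(os.pathsep):
        problems.append(f"PATH does not contain {SPARK_BIN}")
    return problems


if __name__ == "__main__":
    for line in check_spark_env(os.environ) or ["Spark environment looks as expected."]:
        print(line)
```

Run it in a new terminal after reloading your bashrc. Any line it prints names a variable that differs from the values listed above.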

## Reports

We would like you to write a report on Labs 3 and 4 in groups of 1, 2, or 3 students. The format of the report is free; we expect between 2 and 7 pages, presenting an overview of your work with a focus on one or several specific aspects.

We do not expect you to give answers to the questions nor to cover all the material of the two sessions, but rather to work out other developments.

You can emphasize any aspect of your work, depending on your personal interests and skills, for instance:

- implementation and numerical tests (further developments, more experiments...)
- applications in learning or statistics (interpretation of results, other models, other datasets...)
- theoretical or mathematical questions (convergence proofs of algorithms, convergence rates, advanced versions, theoretical analysis of special cases...)

The report is due *Friday, Dec 22*, by email to cdo.grenoble@gmail.com.

The quality of the presentation and of the analysis will obviously matter for the grade.

## Article presentation

**Link to the article list:** here

**Presentation of research articles.**

We would like you to present an article in groups of 1, 2, or 3 students. The list contains various articles around the topics of the course: some are more theoretical, some are more algorithmic, others deal with applications. The presentation will be short: *8 mins + around 3 mins of questions*. In this short time, you can present an overview of the article or put an emphasis on a specific aspect that you find interesting. The slides (in pdf) will be projected from our machine (if you want to present an implementation or a script run, you should prepare slides on it). The presentations will take place on January 15th. The presentation slides should be sent the day before to cdo.grenoble@gmail.com.

## Groups and Sessions

Below are the groups and assigned papers. If you are not in the list, contact us. The presentations are split into two consecutive sessions, A and B; you should attend all of the presentations of your session.

The presentations are on Monday, January 15th in **Amphi D**. Session A is 9:45 -- 11:15 and Session B is 11:20 -- 13:00.

The timetable is here. You have to be present for the full session you are scheduled in.

**The schedule is very tight, be there in advance!**

| Group # | Students | Article | Session |
|---|---|---|---|
| 1 | HuarteSalazar Ricardo, CerqueiraPonte Joel, Su Aimin | 11 | A |
| 2 | Audrey Cimadomo, Erick Santillan, Brendan Guérot | 7 | A |
| 3 | Rhalimi Mouna, Oussama Zerguine | 17 | A |
| 4 | Amela FEJZA, Jaime ROMERO, Quoc-Trung VUONG | 8 | A |
| 5 | Amir Asarbaev, Michal Lewandowski, mariia garkavenko | 13 | A |
| 6 | NGUYEN Minh Kha, Asif Mujtaba, Khan Shaheer Ahmad | 13 | B |
| 7 | Marvin Lerousseau, Clément Perny, Thomas Gerspacher | 7 | B |
| 8 | PHAM Tuan Hiep, PHAN Ly Huynh | 15 | A |
| 9 | Anastasiia Doinychko, Aleksandra Malkova, Artem Betlei | 12 | B |
| 10 | Camargo Manuel, Bresson Roman | 6 | B |
| 11 | Danilo Cazzolla, Moreno La Quatra | 11 | B |
| 12 | Feofanov Vasilii, Emelianov Vitalii, Vladimirova Mariia | 15 | B |
| 13 | LUU Duc-Anh, HOANG Van-Quy | 6 | A |
| 14 | Nieves CRASTO, Michel ARACTINGI, Khoder JNEID | 3 | B |
| 15 | Ivan Iudintsev, Ivan Koval, Anastasiia Kruk | 12 | A |
| 16 | BOROVEC Ondrej, LE Van Tuan | 8 | B |
| 17 | Nadia Yende, Mamady Nabe | 18 | B |