Universitat Internacional de Catalunya

MODULE 4: Machine Learning

ECTS credits: 10
Code: 13947
Year: 1
Second semester
Type: OB (compulsory)
Main language of instruction: Spanish

Other languages of instruction: Catalan, English

Teaching staff

Introduction

Over the last ten years, a whole series of advanced algorithms grouped under the name of Machine Learning (ML), until recently almost unknown to most companies, has become widely available. These algorithms have been gathered under a common theoretical and practical framework and, through open-source languages, they can be accessed, serialized and implemented in apps, in business decision models and in the automation of tasks in uncertain environments.

This course addresses the two best-known branches of Machine Learning, namely supervised and unsupervised algorithms. A light theoretical basis is offered in each case, but the focus is on the student acquiring working knowledge of the different types of ML algorithms available in environments such as R and Python.

The student will encounter both the challenges of working with this new paradigm and the advantages it brings, chief among them the powerful improvements in predictive performance compared with the classical statistics generally applied before 2010.

Finally, the course opens the door to more elaborate techniques, such as those based on Artificial Intelligence (AI), which are developed further in later modules.

Pre-course requirements

Knowledge of basic computer science and a basic command of both R and Python

Basic knowledge of statistics, especially regarding the classical models of Linear Regression, ANOVA, Decision Trees, ...

Knowledge of basic unsupervised techniques such as k-means, Ward's algorithm, etc.

Objectives

  • ML concept and difference from previous classic models

  • ML concept and alignment with business goals

  • ML concept and relationship with information sources: Putting into Production & Automatic Re-training

  • Concept of Serialization of ML Models (see the sketch after this list)

  • Concept and generation of an Analytical Framework
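
As an illustration of the serialization objective above, the following minimal Python sketch (scikit-learn, joblib and the file name model.joblib are illustrative choices, not prescribed course tooling) shows how a fitted model can be saved to disk and reloaded later, for example from within an application:

    # Minimal model serialization sketch (illustrative only; assumes scikit-learn and joblib).
    from sklearn.datasets import make_regression
    from sklearn.linear_model import LinearRegression
    import joblib

    # Fit a simple model on synthetic data.
    X, y = make_regression(n_samples=100, n_features=3, random_state=0)
    model = LinearRegression().fit(X, y)

    # Serialize the fitted model to disk, then load it back as if in another process.
    joblib.dump(model, "model.joblib")
    restored = joblib.load("model.joblib")
    print(restored.predict(X[:5]))   # the restored model predicts as before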

Learning outcomes of the subject

The student will be able to coordinate a group of 3-4 members to build an automated application in Shiny that shows the complete cycle of a statistical modelling project. The student is expected to achieve command of the fundamental ML algorithms and to develop intuition in applying them to the different problems posed at the business level, offering analytical support to tactical and strategic decision-making and helping to automate and systematize processes that would otherwise not respond quickly and effectively to the questions raised, in an objective way and based on the exploitation of the available information.

Syllabus

The syllabus covers techniques created and developed from the 1990s onwards and popularized from around 2010 under the concept of Machine Learning. The aim is not merely to enumerate statistical and mathematical algorithms, but to maintain coherence at all times between those algorithms and the methodological keys for their construction and application.

Topic 1: Association Rules:
1.1. Introduction to Unsupervised Learning
1.2. Cluster Analysis (see the sketch after this topic)
1.3. Market Basket Analysis
1.4. Bayesian Networks
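
As a first taste of the unsupervised techniques in this topic (point 1.2), a minimal Python sketch using scikit-learn on synthetic data (both illustrative choices; the course uses R and Python interchangeably):

    # Unsupervised learning sketch: k-means clustering (illustrative only).
    from sklearn.cluster import KMeans
    from sklearn.datasets import make_blobs

    # Generate synthetic data with three well-separated groups.
    X, _ = make_blobs(n_samples=300, centers=3, random_state=0)

    # Fit k-means with three clusters.
    km = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X)
    print(km.cluster_centers_)   # estimated cluster centres
    print(km.labels_[:10])       # cluster assigned to the first ten observations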

Topic 2: Linear Models:
2.1. Linear Regression Models
2.2. Extensions of the Linear Regression Model
2.3. Instrumental Variable Models
2.4. From Classic Statistics to Current Machine Learning Models
2.5. Lasso and Ridge Regressions (see the sketch after this topic)
2.6. Model Implementation in Shiny (I)
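
As an illustration of point 2.5, a minimal sketch of Ridge and Lasso regression in Python with scikit-learn on synthetic data (illustrative choices only):

    # Regularized linear regression sketch: Ridge and Lasso (illustrative only).
    from sklearn.datasets import make_regression
    from sklearn.linear_model import Lasso, Ridge
    from sklearn.model_selection import train_test_split

    # Synthetic regression problem with 20 predictors.
    X, y = make_regression(n_samples=200, n_features=20, noise=10.0, random_state=0)
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

    # Fit both penalized models and compare test R^2.
    for model in (Ridge(alpha=1.0), Lasso(alpha=0.1)):
        model.fit(X_tr, y_tr)
        print(type(model).__name__, "test R^2:", round(model.score(X_te, y_te), 3))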

Topic 3: Linear and Nonlinear Classifiers (see the sketch after this topic):
3.1. Linear and Quadratic Discriminant Analysis
3.2. Logistic regression
3.3. Ridge and Lasso Logistic Models
3.4. The KNN Model
3.5. Support Vector Machines
3.6. Model Implementation in Shiny (II)
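
As an illustration of this topic, a minimal sketch comparing several of the classifiers listed above on the classic iris data set, using Python's scikit-learn (an illustrative choice; equivalent R code would serve equally well):

    # Classifier comparison sketch: LDA, logistic regression, KNN and SVM (illustrative only).
    from sklearn.datasets import load_iris
    from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import cross_val_score
    from sklearn.neighbors import KNeighborsClassifier
    from sklearn.svm import SVC

    X, y = load_iris(return_X_y=True)
    classifiers = {
        "LDA": LinearDiscriminantAnalysis(),
        "Logistic regression": LogisticRegression(max_iter=1000),
        "KNN (k=5)": KNeighborsClassifier(n_neighbors=5),
        "SVM (RBF kernel)": SVC(kernel="rbf"),
    }
    for name, clf in classifiers.items():
        scores = cross_val_score(clf, X, y, cv=5)   # 5-fold cross-validated accuracy
        print(f"{name}: mean accuracy = {scores.mean():.3f}")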

Topic 4: Automation of Time Series Models:
4.1. Introduction to time series models
4.2. ARIMA models (see the sketch after this topic)
4.3. Financial Analysis and introduction to ARCH and VAR models
4.4. Feedback (recurrent) models on time series data
4.5. ML Models in Time Series
4.6. Prophet Model
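
As an illustration of point 4.2, a minimal sketch fitting an ARIMA model to a simulated autoregressive series, using Python's statsmodels (an illustrative choice; the series and its parameters are made up):

    # Time series sketch: fitting an ARIMA(1,0,0) model to simulated data (illustrative only).
    import numpy as np
    from statsmodels.tsa.arima.model import ARIMA

    # Simulate an AR(1) process: y_t = 0.7 * y_{t-1} + noise.
    rng = np.random.default_rng(0)
    y = np.zeros(200)
    for t in range(1, 200):
        y[t] = 0.7 * y[t - 1] + rng.normal()

    result = ARIMA(y, order=(1, 0, 0)).fit()
    print(result.params)              # estimated constant, AR coefficient and noise variance
    print(result.forecast(steps=5))   # five-step-ahead forecast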

Topic 5: Decision Trees, Random Forests and Ensemble Methods
5.1. Regression and decision trees
5.2. Ensemble Techniques: Bagging, Boosting, Random Forest, XGBoost (see the sketch after this topic)
5.3. Model Mixture: Weighting, Voting Classifier and Stacking
5.4. Interpretability of ML Models: LIME, SHAP, XEMP
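
As an illustration of point 5.2, a minimal sketch comparing a random forest with gradient boosting on a standard data set, using Python's scikit-learn (an illustrative choice):

    # Ensemble methods sketch: random forest vs. gradient boosting (illustrative only).
    from sklearn.datasets import load_breast_cancer
    from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
    from sklearn.model_selection import cross_val_score

    X, y = load_breast_cancer(return_X_y=True)
    for clf in (RandomForestClassifier(n_estimators=200, random_state=0),
                GradientBoostingClassifier(random_state=0)):
        scores = cross_val_score(clf, X, y, cv=5)   # 5-fold cross-validated accuracy
        print(type(clf).__name__, "mean accuracy:", round(scores.mean(), 3))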

Topic 6: Neural Networks
6.1. Unsupervised Learning in Neural Networks: Kohonen Networks
6.2. Supervised Learning: Simple and Multilayer Perceptrons (see the sketch after this topic)
6.3. Advanced Models of Neural Networks: Introduction to Deep Learning
6.4. Neural Networks and Time Series: Introduction to LSTM Models
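
As an illustration of point 6.2, a minimal sketch training a multilayer perceptron on the digits data set, using Python's scikit-learn (an illustrative choice; the deep learning frameworks referenced in the bibliography go considerably further):

    # Neural network sketch: a multilayer perceptron classifier (illustrative only).
    from sklearn.datasets import load_digits
    from sklearn.model_selection import train_test_split
    from sklearn.neural_network import MLPClassifier
    from sklearn.preprocessing import StandardScaler

    X, y = load_digits(return_X_y=True)
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

    # Standardize inputs, then train a two-hidden-layer perceptron.
    scaler = StandardScaler().fit(X_tr)
    mlp = MLPClassifier(hidden_layer_sizes=(64, 32), max_iter=500, random_state=0)
    mlp.fit(scaler.transform(X_tr), y_tr)
    print("Test accuracy:", round(mlp.score(scaler.transform(X_te), y_te), 3))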

Teaching and learning activities

In person



Throughout the module, a continuous assessment methodology is followed, combining various individual activities and several pieces of group work; the general assessment criteria are described in the following section.
The subject is taught through short theoretical "pills": a brief explanation and discussion of each statistical technique, immediately followed by an applied example of the concepts just discussed.
Attendance, participation and discussion in class will be valued.
The programming environments commonly used in industry will be employed, together with open data and GNU-licensed statistical software that students can download to their own PCs in order to follow the practical sessions in a more personalized way and at their own pace.
Students are encouraged to use R and Python interchangeably to solve the practical problems of the subject.

Evaluation systems and criteria

In person



Three blocks will be evaluated throughout the module. Each is graded on a scale from 1 to 10, and the grades are then combined as a weighted average as described below:

- Individual tests: multiple-choice tests of 30 to 40 questions, completed in 45-60 minutes, covering the material taught every two sessions. There will be 2 multiple-choice exams (one for sessions 1 and 2, another for sessions 3 and 4). Each question offers 4 possible answers, only 1 of which is correct, and the total score is expressed on a scale from 1 to 10. Incorrectly answered questions subtract ¼ of their value. The tests carry a total weight of 30% of the final grade.

- Group work: the group assignments will be detailed and announced at the end of the second session. They will be mini-projects closely focused on applying the techniques developed in the course, assessing above all teamwork, the originality the group brings to the proposed guidelines and the versatility of the deliverables (these points are detailed further in the description of each assignment). Submission is due at most 4 weeks after the end of the module. Group work carries a total weight of 60% of the final grade, and the grade applies to all students in the group.


- Collaboration and participation in class: students' behaviour and, above all, their participation in the module are assessed. Asking questions and critically questioning what is presented in class will be valued very positively, provided it is done in a constructive environment. Participation carries a total weight of 10% of the final grade.

The module is passed if the weighted grade exceeds the value of 5. Students will be informed of their grades after the deadline for submission of the group work.
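
For illustration only (hypothetical scores): a student who obtains 7.0 on the individual tests, 8.0 on the group work and 9.0 for participation would receive 0.30 × 7.0 + 0.60 × 8.0 + 0.10 × 9.0 = 2.1 + 4.8 + 0.9 = 7.8, and would therefore pass the module.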

Bibliography and resources

Topic 1:
Hahsler, M.; Grün, B.; Hornik, K. (2005) arules – A Computational Environment for Mining Association Rules and Frequent Item Sets. Journal of Statistical Software, October 2005, Vol. 14, Issue 15.
Grimmett, G.; Stirzaker, D. (2004) Probability and Random Processes, 3rd ed. Oxford University Press. ISBN 0-19-857223-9.
Korb, K.; Nicholson, A. (2011) Bayesian Artificial Intelligence, 2nd ed. CRC Press. ISBN 978-1-4398-1591-5.
Scutari, M. (2010) Learning Bayesian Networks with the bnlearn R Package. Journal of Statistical Software, July 2010, Vol. 35, Issue 3.

Topic 2:
Matilla García, M.; Pérez Pascual, P.; Sanz Carnero, B. (2013) Econometría y Predicción. UNED. ISBN 9788448183103.
James, G.; Witten, D.; Hastie, T.; Tibshirani, R. (2013) An Introduction to Statistical Learning with Applications in R. Springer Science+Business Media, New York. ISBN 978-1-4614-7137-0.
Grimmett, G.; Stirzaker, D. (2004) Probability and Random Processes, 3rd ed. Oxford University Press. ISBN 0-19-857223-9.
Müller, A. C.; Guido, S. (2017) Introduction to Machine Learning with Python. O'Reilly. ISBN 978144936415.
Munzert, S.; Rubba, C.; Meissner, P.; Nyhuis, D. (2015) Automated Data Collection with R. John Wiley & Sons, Ltd. ISBN 9781118834817.
Korb, K.; Nicholson, A. (2011) Bayesian Artificial Intelligence, 2nd ed. CRC Press. ISBN 978-1-4398-1591-5.
Scutari, M. (2010) Learning Bayesian Networks with the bnlearn R Package. Journal of Statistical Software, July 2010, Vol. 35, Issue 3.
Tennenbaum, J.; Director, B. (2005) How Gauss Determined the Orbit of Ceres. Journal of Statistical Software, October 2005, Vol 14, Issue 15.

Topic 3:
Carmona Suárez, J. (2014) Tutorial sobre Máquinas de Vectores Soporte (SVM). UNED. http://www.ia.uned.es/~ejcarmona/publicaciones/[2013-Carmona]%20SVM.pdf
Cortes, C.; Vapnik, V. (1995) Support-Vector Networks. Machine Learning, 20(3), 273-297.
James, G.; Witten, D.; Hastie, T.; Tibshirani, R. (2013) An Introduction to Statistical Learning with Applications in R. Springer Science+Business Media, New York. ISBN 978-1-4614-7137-0.
Fisher, R. A. (1936) The Use of Multiple Measurements in Taxonomic Problems. Annals of Eugenics, 7(2), 179-188.
Müller, A. C.; Guido, S. (2017) Introduction to Machine Learning with Python. O'Reilly. ISBN 978144936415.


Topic 4:
Matilla García, M.; Pérez Pascual, P.; Sanz Carnero, B. (2013) Econometría y Predicción. UNED. ISBN 9788448183103.
Cowpertwait, P. S. P.; Metcalfe, A. V. (2009) Introductory Time Series with R. Springer. ISBN 978-0387-88697-5.

Topic 5:
Hofner, B.; Mayr, A.; Robinzonov, N.; Schmid, M. (2012) Model-based Boosting in R: A Hands-on Tutorial Using the R Package mboost. Technical Report Number 120, 2012, Department of Statistics, University of Munich.
James, G.; Witten, D.; Hastie, T.; Tibshirani, R. (2013) An Introduction to Statistical Learning. Springer. ISBN 978-1-4614-7137-0.
Müller, A. C.; Guido, S. (2017) Introduction to Machine Learning with Python. O'Reilly. ISBN 978144936415.


Topic 6:
Martín del Brío, B.; Sanz Molina, A. (2001) Redes Neuronales y Sistemas Borrosos. Ed. RA-MA. ISBN 84-7897-466-0.
Chollet, F.; Allaire, J. J. (2018) Deep Learning with R. Manning. ISBN 9781617295546.
James, G.; Witten, D.; Hastie, T.; Tibshirani, R. (2013) An Introduction to Statistical Learning with Applications in R. Springer Science+Business Media, New York. ISBN 978-1-4614-7137-0.
Hastie, T.; Tibshirani, R.; Friedman, J. (2008) The Elements of Statistical Learning: Data Mining, Inference, and Prediction. Springer.
Fine, T. L. (1999) Feedforward Neural Network Methodology. Springer-Verlag, New York/Berlin/Heidelberg. ISBN 0-387-98745-2.
Wehrens, R.; Buydens, L. M. C. (2007) Self- and Super-organizing Maps in R: The kohonen Package. Journal of Statistical Software, October 2007, Vol. 21, Issue 5.