Universitat Internacional de Catalunya
MODULE 5: Big Data Technologies and Architectures
Other languages of instruction: Catalan, English
Teaching staff
Introduction
In the Advanced Analytics and Big Data Science program, the underlying digital technology plays a primary role, complementing the knowledge students are expected to acquire, which centers on estimator modeling. The purpose is not to provide in-depth knowledge of the technological aspects, but to give students enough grounding to lead, competently, the technology adoption decisions that are now widespread. Technologies such as the cloud, edge computing, GPUs, and distributed processing and storage are intrinsic to Big Data, which seeks to provide a technological solution for the field of Advanced Analytics.
For this reason, the module aims to lay the foundations and help students understand the implications of the technologies they will work with in their future professional roles.
Pre-course requirements
Basic computer skills are essential.
Objectives
- Understand the digital technologies involved in Advanced Analytics
- Understand the premises that technologies entail
- Associate the different phases of a project with technological infrastructure solutions
- Have the knowledge to build automated pipelines
- Assess the cost of technological resources
Learning outcomes of the subject
- The student will be able to understand and apply the underlying technologies used in the practice of Advanced Analytics.
- The student will be able to understand the implications of technology for deploying predictive models developed in laboratory and production environments.
- The student will be able to associate business problems with an architecture solution based on the type of data, the models to be used, the availability of new information and the inference requirements.
Syllabus
Big Data and Cloud architecture
- Introduction to Big Data and Cloud
- Datacenters
- Agile Analytics and Cloud
- Phases of the analytics methodology
- 2020 Data and AI Landscape
Databases (SQL, NoSQL, document, key-value and graph): theory, practice and application cases
- Not Only SQL (NoSQL)
- MongoDB
- Neo4j
- Practice with a Python and MongoDB lab
Cloud resources (servers, microservices, queues, databases, ML, graphics and other services): theory, practice and application cases
- Introduction to cloud services
- Virtualized servers
- The concept of microservices
- Queues
- Cloud databases
- Storage and Data Lakes
- Practice with labs on storage, databases, microservices and queues
Distributed processing (Hadoop and Spark), open source and cloud tools: theory, practice and application cases
- MapReduce
- Hadoop
- Spark
- Practice with Hadoop and Spark labs using Python
Batch, real-time and stream processing: theory, practice and application cases
- Types of processing: real-time, batch and stream
- Spark Streaming
- Practice with Spark Streaming labs
ML tools: theory, practice and application cases
- Spark MLlib
- ML and AutoML practice in the Cloud
Teaching and learning activities
In person
- Presentation with concepts and theory
- For each topic, students will work through labs, tutorials and individual self-study exercises, experimenting with the technology in question with the support of the student community and the teacher.
- Around a dozen real application cases will be proposed, in which students work together to find an architecture solution through group analysis of specific customer cases.
Evaluation systems and criteria
In person
- Design of an architecture for a specific customer case
- Individual labs: around a dozen self-study labs will be proposed, some compulsory and others optional but highly recommended, combining architecture with other knowledge acquired during the master's program
Bibliography and resources
Several papers and articles related to the topics discussed will be proposed as readings, combined with other topics from the master's program.
- G. Linden, B. Smith and J. York, "Amazon.com recommendations: item-to-item collaborative filtering," in IEEE Internet Computing, vol. 7, no. 1, pp. 76-80, Jan.-Feb. 2003, doi: 10.1109/MIC.2003.1167344.
- Overview of Amazon Web Services, AWS, August 2020
- J. Dean and S. Ghemawat, "MapReduce: simplified data processing on large clusters," Communications of the ACM, 2008
- M. Turck, "2020 Data and AI Landscape," FirstMark
- G. Liu, T. Nguyen, G. Zhao, W. Zha, J. Yang, J. Cao, M. Wu, P. Zhao and W. Chen, "Repeat Buyer Prediction for E-Commerce," 2016, pp. 155-164, doi: 10.1145/2939672.2939674.