This is the main plan for the master's course. Detailed plans for the current edition can be found in the Faculty Information System.
The following list of course units may not be complete. Detailed information about the course units in the current edition can be found on the faculty website.
Students will obtain a global perspective of the different steps of a typical Data Science project. For each of these steps, some of the main techniques and methods will be presented, with further details left to other, more specific UCs (curricular units). This UC will allow students to better frame the topics addressed by the other UCs, giving them a general perspective of how these topics fit within the Data Science area and of their importance in specific Data Science projects. The unit will also provide concrete examples of the application of the main Data Science techniques, thus providing a case-based approach to learning Data Science.
Introduction. Time series data and their characteristics. Measures of dependence: autocorrelation and cross-correlation. Stationary time series. Estimation of correlation. Use of R for time series analysis. Time series decomposition and exponential smoothing. Exploratory data analysis. Estimation of trend, cycle and seasonal components. Loess, STL and “Bureau of the Census” decompositions. Moving averages, exponential smoothing. Forecasting. Time series models. ARMA models. Estimation and forecasting. Autoregressive integrated moving average (ARIMA) models for nonstationary data. Multiplicative seasonal ARIMA models. Forecasting. Box-Jenkins methodology for building SARIMA models: identification, estimation and diagnostic checking. Model selection. Unit root tests. Forecasting. Visualizing and forecasting big time series data. Representation of many time series. Summarization of their main characteristics. Automatic model selection. Automatic forecasting.
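The following Python sketch illustrates two of the estimators listed above, although the course itself uses R. It computes the sample autocorrelation r_k = Σ_{t=1}^{n−k} (x_t − x̄)(x_{t+k} − x̄) / Σ_{t=1}^{n} (x_t − x̄)². It also computes the simple exponential smoothing recursion ℓ_t = α x_t + (1 − α) ℓ_{t−1}. The function names and the initialization ℓ_1 = x_1 are illustrative assumptions, not taken from the course material.

```python
def sample_acf(x, max_lag):
    """Sample autocorrelations r_0, ..., r_max_lag (each lag sum divided by the lag-0 sum)."""
    n = len(x)
    if not 0 <= max_lag < n:
        raise ValueError("max_lag must satisfy 0 <= max_lag < len(x)")
    mean = sum(x) / n
    dev = [v - mean for v in x]
    denom = sum(d * d for d in dev)  # n times the (biased) sample variance
    return [
        sum(dev[t] * dev[t + k] for t in range(n - k)) / denom
        for k in range(max_lag + 1)
    ]


def simple_exponential_smoothing(x, alpha):
    """Smoothed levels l_t = alpha * x_t + (1 - alpha) * l_{t-1}.

    Assumes the initialization l_1 = x_1; the last level serves as the
    flat forecast for all future periods.
    """
    if not 0 < alpha <= 1:
        raise ValueError("alpha must lie in (0, 1]")
    levels = [x[0]]
    for value in x[1:]:
        levels.append(alpha * value + (1 - alpha) * levels[-1])
    return levels
```

For example, `sample_acf([1, 2, 3, 4, 5], 1)` returns `[1.0, 0.4]`. With α = 0.5 on `[1.0, 2.0, 3.0]`, the levels are `[1.0, 1.5, 2.25]`.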
(This course should be taken by students with a CS background)
At the end of the course, the students are expected to:
(This course should be taken by students with a non-CS background)
Students will learn the main concepts of programming, illustrated through the two programming languages most used for data analysis (R and Python). Major data structures, object-oriented programming concepts and some essential search algorithms will be introduced. Students will also be introduced to key topics related to relational databases: basic concepts of data modeling will be taught using the EER model, together with basic SQL concepts.
Deployment of cloud-based infrastructures for big data applications. Programming big data applications using cloud programming models. Understanding of core problems and algorithms in big data applications. Hands-on practice with state-of-the-art tools for cloud computing.
Students should understand the algorithmic fundamentals of machine learning. They should be able to select the appropriate algorithms for each problem, apply them to new datasets, and understand and evaluate their results. Topics: linear models (least squares, shrinkage/Lasso); nearest neighbours; statistical decision theory; bias-variance tradeoff; mixture models; evaluation (cross-validation and bootstrap, evaluation measures, statistical testing); maximum likelihood; Expectation-Maximization and Gibbs sampling; MCMC; boosting and bagging; neural networks, auto-encoders and deep learning; kernel methods and SVM; embeddings, matrix factorization and gradient descent.
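To make the evaluation topic concrete, here is a minimal Python sketch that combines two of the listed topics: k-fold cross-validation of a 1-nearest-neighbour classifier. The helper names are hypothetical. Folds are contiguous unless a seed is given, in which case the indices are shuffled first.

```python
import random


def k_fold_indices(n, k, seed=None):
    """Split indices 0..n-1 into k folds whose sizes differ by at most one.

    If seed is given, the indices are shuffled first (usually advisable in practice).
    """
    indices = list(range(n))
    if seed is not None:
        random.Random(seed).shuffle(indices)
    folds, start = [], 0
    for i in range(k):
        size = n // k + (1 if i < n % k else 0)
        folds.append(indices[start:start + size])
        start += size
    return folds


def one_nn_predict(train_X, train_y, x):
    """Label of the training point closest to x in squared Euclidean distance."""
    best = min(
        range(len(train_X)),
        key=lambda i: sum((a - b) ** 2 for a, b in zip(train_X[i], x)),
    )
    return train_y[best]


def cross_val_accuracy(X, y, k, seed=None):
    """k-fold cross-validated accuracy of the 1-nearest-neighbour classifier."""
    correct = 0
    for fold in k_fold_indices(len(X), k, seed):
        held_out = set(fold)
        train_idx = [i for i in range(len(X)) if i not in held_out]
        train_X = [X[i] for i in train_idx]
        train_y = [y[i] for i in train_idx]
        # every point is predicted exactly once, by a model that never saw it
        correct += sum(one_nn_predict(train_X, train_y, X[i]) == y[i] for i in fold)
    return correct / len(X)
```

Each observation is held out exactly once, so the returned accuracy estimates performance on unseen data. The same loop works for any learner that is refit on each training split.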
The purpose of this course is to provide students with: a global vision of organizational management and a comprehensive knowledge of the major strategic issues that enterprises have to deal with; an understanding of the financial and economic analysis needed to evaluate financial and accounting reporting information; and basic entrepreneurship skills that may allow students to build their own business or financial project.
The course presents the main concepts and techniques of digital image analysis and processing. By the end of the course, students should be able to plan and implement algorithms for extracting information from images. The course emphasizes the understanding of concepts and methods and their effective use in the analysis of simulated and experimental data. Advanced computational tools (Matlab) will be used intensively.
To acquire solid knowledge of inductive statistics and to develop skills in statistical modelling techniques, which are fundamental to the presentation, analysis and interpretation of data sets. Upon completing this course, the student should:
At the end of this unit the students should be able to:
It is intended that students:
To introduce students to advanced concepts in the theory and practice of computational models for parallel and distributed memory architectures. Students will gain hands-on experience programming distributed memory architectures with MPI and shared memory architectures with processes, threads and OpenMP. On completing this course, students must be able to:
Fundamentals of data streams: sufficient statistics, Hoeffding bounds; Algorithms and tools: online algorithms for data stream learning; Evaluation: adapted cross-validation, prequential evaluation; Applications: sensor data, Internet of Things.
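Consider n independent observations of a variable with range R. The Hoeffding bound states that, with probability at least 1 − δ, the true mean lies within ε = sqrt(R² ln(1/δ) / (2n)) of the sample mean. The Python sketch below computes ε and uses it in the split test associated with Hoeffding trees: split when the gain difference between the two best attributes exceeds ε. It also illustrates prequential (test-then-train) evaluation with a hypothetical incremental majority-class learner. All function names are illustrative.

```python
import math


def hoeffding_bound(value_range, delta, n):
    """epsilon such that, with probability >= 1 - delta, the true mean is within
    epsilon of the mean of n independent observations with the given range."""
    return math.sqrt(value_range ** 2 * math.log(1 / delta) / (2 * n))


def should_split(best_gain, second_gain, value_range, delta, n):
    """Hoeffding-tree style test: the best attribute is reliably better than the runner-up."""
    return best_gain - second_gain > hoeffding_bound(value_range, delta, n)


def prequential_accuracy(labels):
    """Test-then-train over a label stream with an incremental majority-class learner.

    Each label is first predicted, then used for training. The very first
    example has no prediction and is counted as an error.
    """
    counts = {}
    correct = 0
    n = 0
    for label in labels:
        if counts:
            prediction = max(counts, key=counts.get)  # test
            correct += prediction == label
        counts[label] = counts.get(label, 0) + 1  # train
        n += 1
    return correct / n if n else 0.0
```

Because ε shrinks as 1/sqrt(n), a stream learner can defer a decision until it has seen enough examples, without storing them.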
The student should be able to: recognize different unsupervised and supervised classification problems and solve them using the methods discussed and the R software; prepare, solve and present computational data mining projects in which the models presented are discussed, evaluated and compared on concrete cases; and solve computational and non-computational exercises on the methodologies.
Frequent Pattern Mining: frequent itemsets and association rules; Apriori algorithm; itemset summarization and rule selection; FP-Growth algorithm. Sequential Pattern Mining: GSP algorithm; PrefixSpan algorithm. Web Mining: information retrieval; recommender systems; link analysis. Text Mining: document clustering; document classification. Outlier Mining: challenges; unsupervised, semi-supervised and supervised techniques.
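The Apriori principle states that every subset of a frequent itemset is itself frequent, and it drives candidate generation level by level. The minimal Python sketch below takes an absolute support threshold. It counts support by scanning the transactions and joins pairs of frequent k-itemsets whose union has k + 1 items. It then prunes candidates with any infrequent k-subset. Finally, it computes the confidence of a rule A → B as support(A ∪ B) / support(A). It is a teaching sketch, not an efficient implementation.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Frequent itemsets as {frozenset: support count}, keeping counts >= min_support."""
    transactions = [frozenset(t) for t in transactions]
    current = {frozenset([item]) for t in transactions for item in t}
    frequent = {}
    k = 1
    while current:
        # support counting: one pass over the transactions per level
        counts = {c: sum(1 for t in transactions if c <= t) for c in current}
        level = {c: n for c, n in counts.items() if n >= min_support}
        frequent.update(level)
        # join: unions of two frequent k-itemsets with exactly k + 1 items
        # prune: drop candidates having a k-subset that is not frequent
        current = {
            a | b
            for a, b in combinations(level, 2)
            if len(a | b) == k + 1
            and all(frozenset(s) in level for s in combinations(a | b, k))
        }
        k += 1
    return frequent


def rule_confidence(frequent, antecedent, consequent):
    """Confidence of antecedent -> consequent, i.e. support(A | B) / support(A)."""
    a = frozenset(antecedent)
    return frequent[a | frozenset(consequent)] / frequent[a]
```

On the four transactions {bread, milk}, {bread, butter}, {bread, milk, butter} and {milk} with a minimum support of 2, the sketch finds five frequent itemsets. The three-item candidate is pruned because {milk, butter} occurs only once.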
Review of fundamental concepts in artificial intelligence; Unsupervised learning; Knowledge-based Decision Systems; Search and Optimization Algorithms; Monte Carlo Learning and Methods; Neural networks and "deep learning"; Algorithms for search, learning and optimization.
Digital image: the human visual system, image formation, digital representation of an image, color, noise. Image processing: point-to-point manipulation, spatial filters, extraction of geometric structures, segmentation. Video processing: optical flow, video compression. Pattern recognition: introduction, knowledge representation, statistical pattern recognition, machine learning. Fields of application. New directions in bioinformatics.
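A spatial filter replaces each pixel with a function of its neighbourhood. The sketch below applies a 3×3 mean (box) filter to a grayscale image stored as a list of lists. At the borders it averages only the neighbours that fall inside the image. This is one of several possible border conventions, chosen here as an assumption.

```python
def mean_filter_3x3(image):
    """3x3 mean (box) filter on a 2D list of numbers.

    Border pixels average only the neighbours that lie inside the image.
    """
    h, w = len(image), len(image[0])
    out = [[0.0] * w for _ in range(h)]
    for r in range(h):
        for c in range(w):
            neighbourhood = [
                image[rr][cc]
                for rr in range(max(0, r - 1), min(h, r + 2))
                for cc in range(max(0, c - 1), min(w, c + 2))
            ]
            out[r][c] = sum(neighbourhood) / len(neighbourhood)
    return out
```

Smoothing with a box filter reduces noise at the cost of blurring edges. This trade-off motivates the edge-preserving filters and segmentation methods listed above.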
Overview of Dimensionality Reduction (DR): High-Dimensional Data Acquisition. Curse of Dimensionality. Intrinsic and Extrinsic Dimensions. Preliminary Calculus on Manifolds. Geometric Structure of High-Dimensional Data: Similarity and Dissimilarity of Data. Graphs on Data Sets. Spectral Analysis of Graphs. Data Models and Structures of Kernels of DR. Linear Dimensionality Reduction. Classical Multidimensional Scaling. Random Projection. Nonlinear Dimensionality Reduction. Locally Linear Embedding. Local Tangent Space Alignment. Laplacian Eigenmaps. Diffusion Maps. Fast Algorithms for DR Approximation.
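Random projection maps d-dimensional points to k dimensions by multiplying them by a random matrix. With i.i.d. Gaussian entries of variance 1/k, squared Euclidean lengths are preserved in expectation. By the Johnson-Lindenstrauss lemma, pairwise distances are approximately preserved with high probability when k is large enough. The minimal pure-Python sketch below assumes Gaussian entries; other random matrices are also used in practice.

```python
import math
import random


def random_projection(X, k, seed=0):
    """Project each row of X (a list of d-dimensional points) to k dimensions.

    Entries of the d x k matrix R are drawn i.i.d. from N(0, 1/k).
    """
    d = len(X[0])
    rng = random.Random(seed)
    scale = 1.0 / math.sqrt(k)
    R = [[rng.gauss(0.0, 1.0) * scale for _ in range(k)] for _ in range(d)]
    return [[sum(x[i] * R[i][j] for i in range(d)) for j in range(k)] for x in X]


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length sequences."""
    return sum((u - v) ** 2 for u, v in zip(a, b))
```

The projection is linear and does not depend on the data, which makes it cheap compared with the spectral methods listed above. The ratio of projected to original squared distances concentrates around 1 as k grows.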
Digital Signal Processing Review. Topics in probabilistic methods for signals, systems and time series. Measures of dependence and joint analysis. Stationarity and ergodicity. Linear modeling and prediction, spectral analysis and filtering. Data-driven signal analysis methods. Introduction to time-frequency analysis and wavelets. Fundamentals of optimal and adaptive signal processing. Least mean squares methods. Dimension reduction and data decompositions, such as PCA/Karhunen-Loève transform (KLT) and empirical mode decomposition. Introduction to novel paradigms in statistical signal processing and time series, with selected topics such as independent component analysis, Bayesian signal processing and Monte Carlo based approaches. Case study applications and critical insight into the studied methods.
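The least mean squares (LMS) algorithm adapts the weights w of an FIR filter with the stochastic-gradient update w ← w + μ e[n] u[n]. Here u[n] = (x[n], x[n−1], ..., x[n−M+1]) is the tap-input vector and e[n] = d[n] − wᵀu[n] is the error. The minimal Python sketch below assumes zero initial weights and zero-padding before the first sample.

```python
def lms_filter(x, d, num_taps, mu):
    """Least mean squares adaptive FIR filter.

    Returns the final weights and the error sequence e[n] = d[n] - w^T u[n].
    """
    w = [0.0] * num_taps
    errors = []
    for n in range(len(x)):
        # tap-input vector u[n] = [x[n], x[n-1], ..., x[n-num_taps+1]], zero-padded at the start
        u = [x[n - i] if n - i >= 0 else 0.0 for i in range(num_taps)]
        y = sum(wi * ui for wi, ui in zip(w, u))
        e = d[n] - y
        # stochastic-gradient step on the instantaneous squared error
        w = [wi + mu * e * ui for wi, ui in zip(w, u)]
        errors.append(e)
    return w, errors
```

As a system-identification usage, feed white Gaussian noise x through a hypothetical two-tap system d[n] = 0.5 x[n] − 0.3 x[n−1]. Running `lms_filter(x, d, num_taps=2, mu=0.01)` over a few thousand samples should drive the weights close to [0.5, −0.3]. The step size μ trades convergence speed against stability and steady-state error.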
1. Decision Analysis: decision trees; conditional probabilities.
2. Simulation: simulation models based on random number generators; decision problems combining different random variables.
3. Introduction to Linear Optimization Modeling: formulations, key concepts and graphical solution methods; constructing, solving and interpreting the solution; sensitivity and economic analysis; informed decision-making with linear optimization.
4. Introduction to Nonlinear Optimization: similarities and differences between linear and nonlinear optimization; applications.
5. Introduction to Discrete Optimization: modeling with discrete variables; discrete optimization to make informed and efficient decisions.
6. Dynamic Optimization: the role of data; applications.
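A decision tree is solved by rolling back from the leaves. A chance node takes the probability-weighted average of its branches, and a decision node takes its best branch. The minimal Python sketch below assumes a hypothetical tuple-based node encoding and uses expected monetary value as the decision criterion.

```python
import math


def rollback(node):
    """Expected value of a decision-tree node.

    Hypothetical node encodings:
      ("payoff", value)
      ("chance", [(probability, child), ...])
      ("decision", {option_name: child, ...})
    """
    kind = node[0]
    if kind == "payoff":
        return node[1]
    if kind == "chance":
        branches = node[1]
        if not math.isclose(sum(p for p, _ in branches), 1.0):
            raise ValueError("branch probabilities must sum to 1")
        return sum(p * rollback(child) for p, child in branches)
    if kind == "decision":
        return max(rollback(child) for child in node[1].values())
    raise ValueError(f"unknown node type: {kind!r}")


def best_option(decision_node):
    """Name of the option with the highest expected value at a decision node."""
    options = decision_node[1]
    return max(options, key=lambda name: rollback(options[name]))
```

For example, suppose a "launch" option pays 100 with probability 0.6 and −50 with probability 0.4. Its expected value is 40, so it is preferred to a "do not launch" option paying 0.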
This course will introduce the concepts of Data Visualization, with a focus on Data Science and on Visual Analytics that support tasks taking the user from raw data to insights. Topics include basic concepts of information visualization; visual analytics of evolving phenomena; analysis of spatial and temporal data sets; visual social media analytics; and visual analytics of text and multimedia collections. Coursework will integrate graphics developed in R (ggplot2) / Python (plotly) into interactive environments, namely data access dashboards for the interactive manipulation of multiple graphs.