Data Science with R and SQL Server

4 days / 32 hours | online / onsite

An introduction to the R language, statistics, data mining, and machine learning, and to using data science in SQL Server and the Microsoft BI stack.

What Does this Course Cover?

R is one of the most popular environments and languages for statistical analysis, data mining, and machine learning. A managed, scalable version of R runs in SQL Server, Power BI, and Azure ML. The main topic of the course is the R language. However, the course also shows how to use the other languages and tools available in the MS BI suite for data science applications, including Python, T-SQL, Power BI, Azure ML, and Excel. The labs focus on R; the demos also show the code in other languages.

Next Dates

02/06/2023 – 02/09/2023, QA, online
04/03/2023 – 04/06/2023, QA, online

For classes managed by our partner QA (UK), please visit the partner's website.


Attendees should have a basic understanding of data analysis and basic familiarity with SQL Server tools.

Course Outline

Module 1. Introducing data science and R
  • What are statistics, data mining, machine learning…
  • Data science projects and their lifetime
  • Introducing R
  • R tools
  • R data structures
    Lab 1
Module 2. Introducing Python
  • Basic syntax and objects
  • Data manipulation with NumPy and Pandas
  • Visualizations with matplotlib and seaborn libraries
  • Data science with Scikit-Learn
    Lab 2: Discussion – R vs Python
Module 3. Data overview
  • Datasets, cases and variables
  • Types of variables
  • Introductory statistics for discrete variables
  • Descriptive statistics for continuous variables
  • Basic graphs
  • Sampling, confidence level, confidence interval
    Lab 2
Module 4. Data preparation
  • Derived variables
  • Missing values and outliers
  • Smoothing and normalization
  • Time series
  • Training and test sets
    Lab 3
Module 5. Associations between two variables and visualizations of associations
  • Covariance and correlation
  • Contingency tables and chi-squared test
  • T-test and analysis of variance
  • Bayesian inference
  • Linear models
    Lab 4
Module 6. Feature selection and matrix operations
  • Feature selection in linear models
  • Basic matrix algebra
  • Principal component analysis
  • Exploratory factor analysis
    Lab 5
Module 7. Unsupervised learning
  • Hierarchical clustering
  • K-means clustering
  • Association rules
    Lab 6
Module 8. Supervised learning
  • Neural Networks
  • Logistic Regression
  • Decision and regression trees
  • Random forests
  • Gradient boosting trees
  • K-nearest neighbors
    Lab 7
Module 9. Modern topics
  • Support vector machines
  • Time series
  • Text mining
  • Deep learning
  • Reinforcement learning
    Lab 8
Module 10. R in SQL Server and MS BI
  • ML Services (In-Database) structure
  • Executing external scripts in SQL Server
  • Storing a model and performing native predictions
  • R in Azure ML and Power BI
    Lab 9

Dejan Sarka

Trainer & Author

Dejan Sarka focuses on database and business intelligence application development. Besides projects, he spends about half of his time on training and mentoring. He is the founder of the Slovenian SQL Server and .NET Users Group and the main author or co-author of nine books on databases and SQL Server (including training kits for Microsoft exams 70-461 and 70-463). He has also developed several courses for Lucient. As an MCT, he speaks at many local and international events. The international events include conferences such as PASS, TechEd, and DevWeek. He is also indispensable at regional MS events. In addition, he is a co-organizer of the high-profile technical conference Bleeding Edge.