In this day-long training, you will learn about R, the premier language for data analysis. We will approach the language from the standpoint of data professionals: database developers, database administrators, and data scientists. We will see how data professionals can translate existing skills with SQL to get started with R. We will also dive into the tidyverse, an opinionated set of libraries which has modernized R development. We will see how to use libraries such as dplyr, tidyr, and purrr to write powerful, set-based code. In addition, we will use ggplot2 to create production-quality data visualizations.
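As a rough illustration of what translating SQL skills to the tidyverse can look like, the sketch below rewrites a simple aggregate query as a dplyr pipeline. The `backups` data frame, its column names, and its values are hypothetical and made up for this example; they are not taken from the course materials. The sketch assumes the dplyr package is installed.

```r
# Assumes the dplyr package is installed.
library(dplyr)

# Hypothetical backup history standing in for a SQL Server table.
# All names and values here are invented for illustration.
backups <- data.frame(
  database_name = c("Sales", "Sales", "HR", "HR", "HR"),
  duration_sec  = c(120, 180, 30, 45, 60),
  stringsAsFactors = FALSE
)

# Roughly equivalent SQL:
#   SELECT database_name,
#          COUNT(*)          AS backup_count,
#          AVG(duration_sec) AS avg_duration_sec
#   FROM backups
#   WHERE duration_sec >= 45
#   GROUP BY database_name
#   ORDER BY avg_duration_sec DESC;
backup_summary <- backups %>%
  filter(duration_sec >= 45) %>%            # WHERE
  group_by(database_name) %>%               # GROUP BY
  summarize(
    backup_count     = n(),                 # COUNT(*)
    avg_duration_sec = mean(duration_sec)   # AVG(...)
  ) %>%
  arrange(desc(avg_duration_sec))           # ORDER BY ... DESC

print(backup_summary)
```

Each dplyr verb lines up with one SQL clause, which is why set-based thinking from T-SQL tends to carry over with little friction.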
Over the course of the day, we will look at several problem domains. For database administrators, areas of note will include visualizing SQL Server data, predicting error occurrences, and estimating backup times for new databases. We will also look at areas of general interest, including analysis of open source data sets.
No experience with R is necessary. The only requirements are a laptop and an interest in leveling up your data professional skillset.
Database developers looking to tame unruly data
Database administrators with an interest in visualizing SQL Server metrics
Data analysts and budding data scientists looking for an overview of the R landscape
Business intelligence professionals needing a powerful language to cleanse and analyze data efficiently
Module 0 — Prep Work
Review data sources we will cover during the training
Ensure laptops are ready to go
Module 1 — Basics of R
What is R?
Basic mechanics of R
Embracing functional programming in R
Connecting to SQL Server with R
Identifying missing values, outliers, and obvious errors
Module 2 — Intro to the Tidyverse
What is the Tidyverse?
Tidyverse basics: dplyr, tidyr, readr, tibble
Module 3 — Dive into the Tidyverse
Data loading: rvest, httr, readxl, jsonlite, xml2
Data wrangling: stringr, lubridate, forcats, broom
Functional programming: purrr
Module 4 — Plotting
Data visualization principles
Types of plots: good, bad, and ugly
Plotting data with ggplot2
Building professional quality plots
Module 5 — R for the DBA
A capstone notebook that brings together many of the day's topics, focusing on database administration use cases
Use cases include:
Gathering CPU statistics
Analyzing disk utilization
Analyzing wait stats
Investigating expensive reports
Analyzing temp table creation stats
Analyzing backup times
Upon completion of this course, attendees will be able to:
Perform basic data analysis with the R programming language
Take advantage of R functions and libraries to clean up dirty data
Build a notebook using Jupyter
Create data visualizations with ggplot2
No experience with R is necessary, though it would be helpful. Please bring a laptop to follow along with exercises and get the most out of this course.
Kevin Feasel is a Data Platform MVP and Engineering Manager of the Predictive Analytics team at ChannelAdvisor, where he specializes in T-SQL and R development, fighting with Kafka, and pulling rabbits out of hats on demand. He is the lead contributor to Curated SQL (https://curatedsql.com), a contributing author to Tribal SQL (http://www.tribalsql.com), and one of the contributors behind We Speak Linux (https://wespeaklinux.com). A resident of Durham, North Carolina, he can be found cycling the trails around the Triangle whenever the weather’s nice enough.