
UC BUSINESS ANALYTICS R PROGRAMMING GUIDE


CREATING TEXT FEATURES WITH BAG-OF-WORDS, N-GRAMS, PARTS-OF-SPEECH AND MORE

02 Oct 2018

Historically, data has been available to us in the form of numeric features
(e.g. customer age, income, household size) and categorical features (e.g.
region, department, gender). However, as organizations look for ways to collect
new forms of information such as unstructured text, images, and social media
posts, we need to understand how to convert this information into structured
features for use in data science tasks such as customer segmentation or
prediction. In this post, we explore a few fundamental feature engineering
approaches for converting unstructured text into structured features.
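
As a quick, hedged illustration of the bag-of-words and n-gram ideas (not the
post's exact code), the sketch below builds term-count and bigram features with
the tidytext, dplyr, and tidyr packages; the small reviews data frame is made up
for the example.

```r
library(dplyr)
library(tidyr)
library(tidytext)

# Hypothetical example documents
reviews <- tibble(
  doc_id = 1:3,
  text   = c("great product and fast shipping",
             "product arrived broken",
             "fast shipping but average product")
)

# Bag-of-words: tokenize into single words, count term frequency per document,
# then spread terms into columns so each term becomes a structured feature
bow_features <- reviews %>%
  unnest_tokens(word, text) %>%
  count(doc_id, word) %>%
  spread(word, n, fill = 0)

# Bigrams (n-grams with n = 2) capture short word sequences as features
bigram_counts <- reviews %>%
  unnest_tokens(bigram, text, token = "ngrams", n = 2) %>%
  count(doc_id, bigram, sort = TRUE)
```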


MULTIVARIATE ADAPTIVE REGRESSION SPLINES

08 Sep 2018

Several previous tutorials (e.g. linear regression, logistic regression,
regularized regression) discussed algorithms that are intrinsically linear. Many
of these models can be adapted to nonlinear patterns in the data by manually
adding model terms (e.g. squared terms, interaction effects); however, to do so
you must know the specific nature of the nonlinearity a priori. Alternatively,
there are numerous algorithms that are inherently nonlinear. When using these
models, the exact form of the nonlinearity does not need to be known explicitly
or specified prior to model training. Rather, these algorithms search for, and
discover, nonlinearities in the data that help maximize predictive accuracy.
This latest tutorial discusses multivariate adaptive regression splines (MARS),
an algorithm that essentially creates a piecewise linear model and provides an
intuitive stepping stone into nonlinearity after grasping linear regression and
other intrinsically linear models.
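
For a flavor of what fitting MARS looks like in R, here is a minimal sketch
using the earth package (the standard MARS implementation) on the built-in
mtcars data; it is illustrative rather than the tutorial's exact code.

```r
library(earth)

# degree = 2 allows interactions between hinge functions; earth() searches for
# knot locations automatically, producing a piecewise linear fit
mars_fit <- earth(mpg ~ ., data = mtcars, degree = 2)

summary(mars_fit)                          # selected hinge-function terms
predict(mars_fit, newdata = head(mtcars))  # predictions from the fitted model
```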


INTERPRETING MACHINE LEARNING MODELS WITH THE IML PACKAGE

01 Aug 2018



With machine learning interpretability growing in importance, several R packages
designed to provide this capability are gaining in popularity. In recent blog
posts I assessed lime for model-agnostic local interpretability and DALEX for
both local and global machine learning explanation plots. This newest tutorial
examines the iml package and assesses its machine learning interpretability
functionality to help you determine whether it should become part of your
preferred machine learning toolbox.
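
As a rough sketch of the iml workflow (illustrative, not the tutorial's exact
code), the example below wraps a random forest in a Predictor object and
computes permutation-based feature importance on the built-in iris data.

```r
library(iml)
library(randomForest)

# Fit any model -- here a random forest on the built-in iris data
rf <- randomForest(Species ~ ., data = iris, ntree = 100)

# Predictor couples the fitted model with the data it should be explained on
predictor <- Predictor$new(rf, data = iris[, -5], y = iris$Species)

# Permutation feature importance: how much does performance degrade when each
# feature is shuffled? ("ce" = classification error loss)
imp <- FeatureImp$new(predictor, loss = "ce")
plot(imp)
```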


MODEL INTERPRETABILITY WITH DALEX

11 Jul 2018



As advanced machine learning algorithms gain acceptance across many
organizations and domains, machine learning interpretability is growing in
importance to help extract insight and clarity regarding how these algorithms
perform and why one prediction is made over another. There are many
methodologies for interpreting machine learning results (e.g. variable
importance via permutation, partial dependence plots, local interpretable
model-agnostic explanations), and many machine learning R packages implement
their own versions of one or more of them. However, some recent R packages that
focus purely on ML interpretability, agnostic to any specific ML algorithm, are
gaining popularity. One such package is DALEX, and this latest tutorial covers
what this package does (and does not do) so that you can determine whether it
should become part of your preferred machine learning toolbox.
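
For a sense of the DALEX workflow (again a minimal sketch with an illustrative
model and data, not the tutorial's exact code), explain() wraps a fitted model
in an explainer object, from which permutation-based variable importance can be
computed.

```r
library(DALEX)
library(randomForest)

# A model to explain -- a random forest on the Boston housing data
boston <- MASS::Boston
rf <- randomForest(medv ~ ., data = boston, ntree = 100)

# explain() couples the model with the data and true outcomes
explainer <- explain(rf,
                     data  = boston[, names(boston) != "medv"],
                     y     = boston$medv,
                     label = "random forest")

# Permutation-based variable importance, measured as the increase in RMSE loss
vi <- variable_importance(explainer, loss_function = loss_root_mean_square)
plot(vi)
```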


GRADIENT BOOSTING MACHINES

14 Jun 2018

Gradient boosting machines (GBMs) are extremely popular machine learning
algorithms that have proven successful across many domains and are among the
leading methods for winning Kaggle competitions. Whereas random forests build an
ensemble of deep, independent trees, GBMs build an ensemble of shallow, weak,
successive trees, with each tree learning from and improving on the previous
one. When combined, these many weak successive trees produce a powerful
“committee” that is often hard to beat with other algorithms. This latest
tutorial covers the fundamentals of GBMs for regression problems.
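
As a minimal sketch of a regression GBM (hyperparameter values are illustrative
rather than tuned, and the Boston housing data stands in for the tutorial's
data), the gbm package fits the sequence of shallow trees and cross-validation
picks how many of them to keep.

```r
library(gbm)

boston <- MASS::Boston

set.seed(123)
gbm_fit <- gbm(
  medv ~ .,
  data              = boston,
  distribution      = "gaussian",  # squared-error loss for regression
  n.trees           = 1000,        # number of shallow successive trees
  interaction.depth = 3,           # depth of each weak tree
  shrinkage         = 0.05,        # learning rate
  cv.folds          = 5
)

# Number of trees that minimizes the cross-validated error
best_iter <- gbm.perf(gbm_fit, method = "cv")

# Relative influence of each predictor at the selected number of trees
summary(gbm_fit, n.trees = best_iter)
```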
