
Published in Towards Data Science

Jae Duk Seo · Sep 10, 2018 · 6 min read


KERNEL PCA VS PCA VS ICA IN TENSORFLOW/SKLEARN


GIF from this website

Principal Component Analysis performs a linear transformation on given data;
however, much real-world data is not linearly separable. So can we take
advantage of higher-dimensional feature spaces without greatly increasing the
required computational power?

> Please note that this post is for my future self to look back on and review
> the material in this post. (and for self-study)



Lecture: Kernel PCA


PPT from this website

From the PPT above, here are some short notes that were helpful to me.



Vapnik–Chervonenkis theory tells us that projecting our data into a higher-dimensional
space can give us better classification power (example seen on the left). This
might be similar to what a neural network is doing overall: as the depth
increases, more abstract features are extracted, which serve as better features
for classification.



The kernel trick is a method to project the original data into a higher
dimension without sacrificing too much computation time (a non-linear feature
mapping). The slides also give the matrix form used to center (normalize) the
kernel matrix in feature space.
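As a small, self-contained illustration (my own sketch, not the slides' code; the gamma value is an arbitrary choice), the RBF kernel matrix and its feature-space centering can be computed like this:

```python
# A minimal NumPy sketch of the RBF kernel matrix and its centering in
# feature space: K' = K - 1_N K - K 1_N + 1_N K 1_N, with 1_N = (1/N) * ones.
import numpy as np
from scipy.spatial.distance import pdist, squareform

def centered_rbf_kernel(X, gamma=15.0):
    sq_dists = squareform(pdist(X, "sqeuclidean"))  # pairwise squared distances
    K = np.exp(-gamma * sq_dists)                   # RBF (Gaussian) kernel
    N = K.shape[0]
    one_n = np.ones((N, N)) / N
    return K - one_n @ K - K @ one_n + one_n @ K @ one_n
```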



An example of the effective use of KPCA is seen above.



Different Use Cases of KPCA


Paper from this website

Paper from this website

In the first paper, the authors use KPCA as a preprocessing step, as a means of
feature transformation, and pair it with a Least Squares Support Vector Machine
to perform classification on DNA micro-arrays. (Micro-array data have high
dimensionality, so it is a good idea to apply dimensionality-reduction
techniques before performing classification.) In the second paper, KPCA was
used to extract features from functional Magnetic Resonance Imaging (fMRI) to
perform automatic diagnosis of Attention-Deficit Hyperactivity Disorder (ADHD).



KPCA (RBF) Layer In Tensorflow



A simple feed-forward operation can be implemented as shown above; at the time
of writing this article, I won't work my way into implementing back-propagation
with respect to the input data.
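The embedded code does not survive this copy, so below is a minimal sketch of what such a feed-forward KPCA (RBF) operation could look like; the function name, the gamma default, and the use of tf.linalg.eigh (the current name for the tf.self_adjoint_eig cited in the references) are my assumptions:

```python
# A sketch of an RBF kernel PCA forward pass in TensorFlow (no training,
# no gradients with respect to the input data).
import tensorflow as tf

def kpca_rbf_forward(X, gamma=15.0, n_components=2):
    # Pairwise squared distances via the non-loop (vectorized) trick:
    # ||a - b||^2 = ||a||^2 - 2 a.b + ||b||^2
    sq_norms = tf.reduce_sum(tf.square(X), axis=1, keepdims=True)
    sq_dists = sq_norms - 2.0 * tf.matmul(X, X, transpose_b=True) + tf.transpose(sq_norms)
    K = tf.exp(-gamma * sq_dists)  # RBF kernel matrix

    # Center the kernel matrix in feature space
    n = tf.cast(tf.shape(K)[0], K.dtype)
    one_n = tf.ones_like(K) / n
    K_centered = K - tf.matmul(one_n, K) - tf.matmul(K, one_n) \
        + tf.matmul(tf.matmul(one_n, K), one_n)

    # Eigendecomposition; tf.linalg.eigh returns eigenvalues in ascending
    # order, so the last columns are the top principal components.
    _, eigvecs = tf.linalg.eigh(K_centered)
    return eigvecs[:, -n_components:][:, ::-1]
```

For reference, sklearn exposes the same projection as KernelPCA(kernel="rbf"), which is handy for checking the TensorFlow output.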



KPCA vs PCA vs ICA



Let's start simple: we have 2D data points that are not linearly separable. To
verify that our implementation is working, let's project the data into
two-dimensional space using each of KPCA, PCA, and ICA.
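A sketch of this comparison with sklearn (make_circles is my stand-in for the post's linearly inseparable 2D data, and the gamma value is an assumption):

```python
# A sketch of projecting a linearly inseparable 2D dataset with each method.
from sklearn.datasets import make_circles
from sklearn.decomposition import PCA, KernelPCA, FastICA

X, y = make_circles(n_samples=400, factor=0.3, noise=0.05, random_state=0)

X_kpca = KernelPCA(n_components=2, kernel="rbf", gamma=15).fit_transform(X)
X_pca = PCA(n_components=2).fit_transform(X)
X_ica = FastICA(n_components=2, random_state=0).fit_transform(X)
```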



Left Image → Projection using KPCA
Middle Image → Projection using PCA
Right Image → Projection using ICA

From the above example we can see that our implementation is working correctly
and our data is now linearly separable. But to make things more interesting,
let's see how these methods do on histopathological images. I am using the
dataset from Histopathology data of bone marrow biopsies (HistBMP).



As seen above, each image is a 28×28 grayscale image, and we are going to find
the eigenimages by compressing 1,000 images into 100.
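As a rough sketch of the idea with sklearn (the placeholder array stands in for the 1,000 HistBMP images; the loading code is omitted, and note that the post uses its own KPCA implementation, since KernelPCA's eigenvectors live in sample space rather than pixel space):

```python
# A sketch of computing eigenimages from 1,000 flattened 28x28 images.
import numpy as np
from sklearn.decomposition import PCA, FastICA

X_images = np.random.rand(1000, 28 * 28)  # placeholder for the HistBMP data

pca = PCA(n_components=100).fit(X_images)
pca_eigenimages = pca.components_.reshape(-1, 28, 28)

ica = FastICA(n_components=100, random_state=0).fit(X_images)
ica_eigenimages = ica.components_.reshape(-1, 28, 28)
```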



Left Image → Projection using KPCA
Middle Image → Projection using PCA
Right Image → Projection using ICA

In general, we can see that PCA tends to capture global changes while ICA tends
to capture local changes. KPCA seems to capture global changes first, but as we
get to the lower part of the eigenimages, we can see that it captures local
changes as well.



Code



For Google Colab, you would need a Google account to view the code. Also, you
can't run read-only scripts in Google Colab, so make a copy in your own
playground. Finally, I will never ask for permission to access your files on
Google Drive, just FYI. Happy coding!

To access the code for this post, please click here.



Final Words

Please note that for the distance matrix I borrowed the non-loop form from this
website, and the overall implementation was adapted from 'Kernel tricks and
nonlinear dimensionality reduction via RBF kernel PCA' by Sebastian Raschka.

I always wondered how to plot how much variance is retained by each individual
eigenvalue, and this was a good post that explained the know-how.


Image from this website
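A minimal sketch of such a plot (variable names and the placeholder data are mine):

```python
# A sketch of plotting how much variance each principal component retains.
import matplotlib.pyplot as plt
import numpy as np
from sklearn.decomposition import PCA

X = np.random.rand(1000, 784)  # placeholder data
ratios = PCA(n_components=10).fit(X).explained_variance_ratio_

plt.bar(range(1, len(ratios) + 1), ratios)
plt.xlabel("Principal component")
plt.ylabel("Explained variance ratio")
plt.show()
```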

Also this was an interesting video I found.


Video from this website

Finally, it was interesting to learn that PCA/KPCA suffer from variance
inflation and a lack of generalizability; the paper below proposes a solution
to this problem.


Paper from this website

If any errors are found, please email me at jae.duk.seo@gmail.com; if you wish
to see the list of all of my writing, please view my website here.

Meanwhile, follow me on Twitter here, and visit my website or my YouTube
channel for more content. I also implemented Wide Residual Networks; please
click here to view the blog post.



Reference

 1. Principal Component Analysis. (2015). Dr. Sebastian Raschka. Retrieved 7 September 2018, from https://sebastianraschka.com/Articles/2015_pca_in_3_steps.html
 2. About Feature Scaling and Normalization. (2014). Dr. Sebastian Raschka. Retrieved 7 September 2018, from https://sebastianraschka.com/Articles/2014_about_feature_scaling.html
 3. Implementing a Principal Component Analysis (PCA). (2014). Dr. Sebastian Raschka. Retrieved 7 September 2018, from https://sebastianraschka.com/Articles/2014_pca_step_by_step.html
 4. tf.ones | TensorFlow. (2018). TensorFlow. Retrieved 10 September 2018, from https://www.tensorflow.org/api_docs/python/tf/ones
 5. Distance Matrix Vectorization Trick — Manifold Blog — Medium. (2016). Medium. Retrieved 10 September 2018, from https://medium.com/dataholiks-distillery/l2-distance-matrix-vectorization-trick-26aa3247ac6c
 6. Plot two histograms at the same time with matplotlib. (2018). Stack Overflow. Retrieved 10 September 2018, from https://stackoverflow.com/questions/6871201/plot-two-histograms-at-the-same-time-with-matplotlib
 7. tf.self_adjoint_eig | TensorFlow. (2018). TensorFlow. Retrieved 10 September 2018, from https://www.tensorflow.org/api_docs/python/tf/self_adjoint_eig
 8. Kernel tricks and nonlinear dimensionality reduction via RBF kernel PCA. (2014). Dr. Sebastian Raschka. Retrieved 10 September 2018, from https://sebastianraschka.com/Articles/2014_kernel_pca.html
 9. Vapnik–Chervonenkis theory. (2018). En.wikipedia.org. Retrieved 10 September 2018, from https://en.wikipedia.org/wiki/Vapnik%E2%80%93Chervonenkis_theory
 10. Sidhu, G., Asgarian, N., Greiner, R., & Brown, M. (2012). Kernel Principal Component Analysis for dimensionality reduction in fMRI-based diagnosis of ADHD. Frontiers in Systems Neuroscience, 6. doi:10.3389/fnsys.2012.00074
 11. Thomas, M., Brabanter, K., & Moor, B. (2014). New bandwidth selection criterion for Kernel PCA: Approach to dimensionality reduction and classification problems. BMC Bioinformatics, 15(1), 137. doi:10.1186/1471-2105-15-137
 12. Abrahamsen, T., & Hansen, L. (2011). A Cure for Variance Inflation in High Dimensional Kernel Principal Component Analysis. Journal of Machine Learning Research, 12(Jun), 2027–2044. Retrieved from http://jmlr.csail.mit.edu/papers/v12/abrahamsen11a.html
 13. Tomczak, J. (2018). Histopathology data of bone marrow biopsies (HistBMP). Zenodo. Retrieved 10 September 2018, from https://zenodo.org/record/1205024#.W5bcCOhKiUm



