INTRODUCING MUR.AI


REAL-TIME NEURAL STYLE TRANSFER FOR VIDEO

Jeffrey Rainy · Published in Element AI Lab · 8 min read · Feb 12, 2018



BY JEFFREY RAINY AND ARCHY DE BERKER

Mur.AI is a system for real-time stylized video. It takes a reference style
image, such as a painting, and a video stream, and processes the stream so that
it takes on the style of the reference image. The stylization happens in real
time, producing a stylized video stream.



In a previous post, we described a technique to stabilize style transfer so that
it works well for videos. We’ve subsequently deployed those techniques to
produce a system which we’ve demoed at a variety of conferences, including C2,
Art Basel and NIPS. You can check out some of the resulting selfies on Twitter.

In this post, we present some high-level explanations of how the system works.
We’ll provide a primer on style transfer and convolution, and detail some of the
engineering challenges we overcame to deploy our algorithm to the real world.


WHAT IS STYLE TRANSFER?

Style transfer is the application of the artistic style of one image to another
image:


Here the source image is of two developers (Phil Mathieu and Jean Raby) and the
style comes from Notre Dame de Grâce by A’Shop, a mural in Montreal.

The technique was first introduced in the paper A Neural Algorithm of Artistic
Style by Leon A. Gatys, Alexander S. Ecker, and Matthias Bethge. Since then
we’ve seen style transfer applied all over the place, and a variety of
improvements to the original algorithm. In particular, our implementation builds
upon Perceptual Losses for Real-Time Style Transfer and Super-Resolution by
Justin Johnson, Alexandre Alahi, and Fei-Fei Li, which provides a faster
approach to style transfer that is more readily applicable to video.


HOW DOES IT WORK?

Each frame of the video that is stylized goes through a Convolutional Neural
Network (CNN) and is then displayed on screen. For a beautiful introduction to
convolutional networks, see Chris Olah’s post and for a visual guide to
convolution, see Vincent Dumoulin’s GitHub repo.

Briefly, a CNN performs multiple convolution operations on an image to obtain
another image. Each convolution is an operation on the pixels in a square
region of the image, and in a CNN the same operation is repeated across the
whole image to compute each output pixel. The diagram below illustrates the
processing done by a single convolution:



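To make the operation concrete, here is a minimal sketch of a single convolution
in plain NumPy. The 3x3 kernel and the image size are illustrative; in a real
CNN the kernel weights are learned rather than hand-picked, and each layer
applies many kernels at once.

import numpy as np

def convolve2d(image, kernel):
    # Slide the same small kernel over every position of the image,
    # producing one output pixel per position.
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    output = np.zeros((out_h, out_w))
    for y in range(out_h):
        for x in range(out_w):
            patch = image[y:y + kh, x:x + kw]      # square region of pixels
            output[y, x] = np.sum(patch * kernel)  # weighted sum -> one output pixel
    return output

image = np.random.rand(64, 64)               # a single-channel 64x64 image
edge_kernel = np.array([[-1., 0., 1.],
                        [-2., 0., 2.],       # a hand-made edge-detecting kernel
                        [-1., 0., 1.]])
print(convolve2d(image, edge_kernel).shape)  # (62, 62)
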
At deployment time, we use a single CNN, which we’ll call the stylization CNN,
to produce our stylized image. When we talk about “learning a style,” we are
talking about finding the parameters of the stylization network so that it
produces the right style.

In order to find those parameters, we’re going to use a second network: a
classification CNN. We use this as a feature extractor to provide
representations of the style and the content of our input images.


CLASSIFICATION NETWORK

A classification CNN takes an image and tries to identify what the image
contains. For example, if you had some photos of cats and dogs, you might train
a classification CNN to figure out which ones are which. The classifier CNN we
use is trained to classify small images into one of 1,000 categories, the hugely
popular task known as ImageNet.

The classifier network performs multiple convolutions on an image to produce
features useful for classification. As an image goes through the network, it
becomes smaller and smaller in size (pixels) but grows in terms of components
per pixel. Starting from the RGB input (3 components per pixel) at full
resolution, the network iteratively shrinks the image down to a single pixel
with many components: the probability that the image belongs to each of the
many categories.


A typical schema for a convolutional neural network for classification.
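A toy sketch of this pattern, written in PyTorch purely for illustration (this
is not our production code): each block halves the spatial resolution while
increasing the number of components per pixel, ending in a single "pixel" of
class scores.

import torch
import torch.nn as nn

toy_classifier = nn.Sequential(
    nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),   # 224 -> 112
    nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),  # 112 -> 56
    nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),  # 56 -> 28
    nn.AdaptiveAvgPool2d(1),                                      # 28 -> 1 pixel
    nn.Conv2d(64, 1000, 1),                                       # 1000 class scores
)

x = torch.randn(1, 3, 224, 224)   # one RGB image at full resolution
for layer in toy_classifier:
    x = layer(x)
    print(type(layer).__name__, tuple(x.shape))
# final shape: (1, 1000, 1, 1) -- a single pixel with 1000 components
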

Although the network is originally trained for classification, we’re not going
to use it for that. An attractive feature of CNNs is that they naturally
recapitulate the hierarchies that exist in natural images. Earlier in the
network, we capture small-scale features such as edges. Later in the network, we
capture larger-scale features, like whole objects. You can explore this
phenomenon in your browser here.

Imagine feeding a picture of yourself in pajamas to the network. After the first
layers, the information the network processes would map to local features (“thin
vertical blue stripes”) whereas the last layers of the network would capture
features that describe the picture as a whole (“someone standing”). Thus, early
layers capture the style of an image, whilst the features learned by late layers
capture the content.


Some examples of the kind of features that different layers in a CNN prefer (we
move deeper in the network as we move left to right). From Olah, et al.,
“Feature Visualization”, Distill, 2017.

This project relies on the pre-trained VGG16 network, from the University of
Oxford, for classification. This provides one of the two CNNs we are going to
need to perform style transfer.
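
As a sketch of how a classifier can serve as a feature extractor, the snippet
below loads a pre-trained VGG16 with torchvision and collects activations from
a few early layers (small-scale, style-like features) and one deeper layer
(larger-scale, content-like features). The framework and the specific layer
indices are illustrative assumptions, not a description of our exact setup.

import torch
from torchvision import models

vgg = models.vgg16(pretrained=True).features.eval()  # convolutional layers only
for p in vgg.parameters():
    p.requires_grad_(False)                           # frozen: used only as a feature extractor

STYLE_LAYERS = {3, 8, 15, 22}   # relu1_2, relu2_2, relu3_3, relu4_3 (small-scale features)
CONTENT_LAYER = 15              # relu3_3 (larger-scale features)

def extract_features(image):
    # image: a (batch, 3, H, W) tensor; returns style activations and one content activation
    style_feats, content_feat = [], None
    x = image
    for i, layer in enumerate(vgg):
        x = layer(x)
        if i in STYLE_LAYERS:
            style_feats.append(x)
        if i == CONTENT_LAYER:
            content_feat = x
    return style_feats, content_feat

style_feats, content_feat = extract_features(torch.randn(1, 3, 256, 256))
print([tuple(f.shape) for f in style_feats], tuple(content_feat.shape))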


STYLIZATION NETWORK

Our stylization network does the job of actually producing the stylized images.
We’re going to learn parameters which allow us to do this by using the
classifier network as a training tool.

To do so, we take a large set of images for training and repeat the following
steps:

 1. We feed the classifier CNN with the Style image, the Source image, and the
    Stylized image produced by the current stylization CNN
 2. We extract representations of style and content from the classifier network
 3. We adjust the stylization CNN so that the stylized image has a style that
    more closely resembles the Style image, and content that more closely
    resembles the Source image

In pseudo-python-code:

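The sketch below gives the flavour of that loop. It is illustrative rather than
our exact code: it assumes PyTorch, the extract_features helper sketched above,
and a stylization_net placeholder for the image-transformation CNN being
trained. The Gram matrix used here is the standard way of summarizing the
style statistics of a feature map.

import torch
import torch.nn.functional as F
from torch.utils.data import DataLoader

def gram_matrix(feat):
    # Channel-by-channel correlations of a feature map: a summary of its style.
    b, c, h, w = feat.shape
    flat = feat.view(b, c, h * w)
    return flat @ flat.transpose(1, 2) / (c * h * w)

def train_style(stylization_net, style_image, dataset, epochs=2, lr=1e-3):
    optimizer = torch.optim.Adam(stylization_net.parameters(), lr=lr)
    style_feats, _ = extract_features(style_image)        # fixed target style
    style_grams = [gram_matrix(f) for f in style_feats]

    for _ in range(epochs):
        for source in DataLoader(dataset, batch_size=4, shuffle=True):
            stylized = stylization_net(source)                          # step 1
            stylized_style, stylized_content = extract_features(stylized)
            _, source_content = extract_features(source)                # step 2

            style_loss = sum(F.mse_loss(gram_matrix(f), g.expand(f.shape[0], -1, -1))
                             for f, g in zip(stylized_style, style_grams))
            content_loss = F.mse_loss(stylized_content, source_content)
            loss = style_loss + content_loss                            # step 3

            optimizer.zero_grad()
            loss.backward()
            optimizer.step()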


And pictorially:



Capturing a style consists of keeping the small-scale features of the Style
image while keeping the large-scale features of the Source image. In a nutshell,
we’re looking for the stylization CNN that maintains “blue and brown swirling
colors” of style while maintaining “two programmers, a column and a bunch of
desks” of source.

To capture this, we compute the difference between Style and Stylized early in
the network, and between Source and Stylized later in the network. Our loss
function, the quantity we minimize by changing the parameters of our
stylization network, is the sum of these two terms. In fact, as detailed in our
previous post, we also incorporated an extra stability-loss term which helps
generate stable style transfer from frame to frame. You can find the code for
our updated implementation here.
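
For completeness, here is one hedged way such a stability term can be written,
assuming the noise-resilience idea from our previous post (the network is
penalized for producing different outputs when its input is perturbed
slightly); the noise level and the weights are illustrative.

import torch
import torch.nn.functional as F

def stability_loss(stylization_net, source, noise_std=0.01):
    # The stylized output should barely change when the input changes slightly,
    # which translates into stable stylization from frame to frame on video.
    noisy = source + noise_std * torch.randn_like(source)
    return F.mse_loss(stylization_net(noisy), stylization_net(source))

# Total loss for one batch (weights are illustrative):
# loss = style_weight * style_loss + content_weight * content_loss \
#        + stability_weight * stability_loss(stylization_net, source)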

All told, our implementation now trains a new style from a 512 x 512 pixel image
in ~6 hours, utilizing 4 GPUs.


DEPLOYING OUR SYSTEM


Our demo has been deployed on-screen around the world, and has even been
projected onto buildings here in Montreal (as part of the Up375 project).

We faced several challenges in deploying the demo to run in real-time. The main
issues were throughput and latency: how could we capture the video, run it
through our model, and render the result, all in near real-time?


The finished system runs on a Zotac EN1070 minicomputer with an NVIDIA GeForce
GTX 1070 GPU, and is easily portable.

H264 decoding on the GPU

The camera we use (Logitech C920) outputs pre-compressed H264 video. The naive
approach would be to decode this video with FFmpeg, bring the decoded RGB
pixels onto the CPU, and upload them again to the GPU as input for the CNN.

However, this CPU-GPU transfer turned out to add significant latency by forcing
synchronization points between the CPU and GPU whenever a copy occurs. The
solution was to decode the H264 video directly with the onboard NVDEC engine (a
hardware-accelerated decoder included in the Zotac EN1070). This allowed
decoded frames to be passed directly as inputs to our CNN, letting the GPU and
CPU run fully asynchronously.

OpenGL rendering from the GPU

Having decoded our video and run our network on the GPU, we faced a similar
bottleneck in passing the resulting matrix back to the CPU for conventional
rendering. Again, the solution was on-GPU computation: using the CUDA/OpenGL
interop API, we render the outputs to screen straight from the GPU, avoiding
further I/O bottlenecks.


TWITTER INTEGRATION

The demo booth integrates functionality to publish stylized images to Twitter.
The feed can be seen at @ElementAIArt.


STYLES USED

We trained stylization networks for a variety of our favourite murals around
Montreal.

For each style, we cropped a section that had the characteristics we wished the
network to learn. Below, you’ll find each whole work and the cropped sections
we used for training. You can see some of these samples in action in the video
at the start of this post.


Notre Dame de Grâce, by A’Shop. 6310 rue Sherbrooke Ouest. Spray paint on brick,
2011.

Sans titre, by Bicicleta Sem Freio. 3527 boulevard St-Laurent. Spray paint on
brick, 2015

Sans titre, by El Curiot. 265 rue Sherbrooke Ouest. Acrylic on brick, 2015

Quai des Arts, by EN MASSE. 4890 boulevard St-Laurent. Spray paint on brick,
2011

Galaktic Giant, by Chris Dyer. 3483 rue Coloniale. Spray paint on brick, 2013

Sans titre, by David ‘Meggs’ Hook. 3527 boulevard St-Laurent. 2016

Autumn Foliage Series #1 by Matt W. Moore. 4660 boulevard St-Laurent. 2014

Mémoire du coeur by Julian Palma. 4505 rue Notre-Dame Ouest. Spray paint on
brick, 2016

Sans titre, by SBuONe. 4243 boulevard St-Laurent. 2016

Sans titre, by Zilon. 53 rue Marie-Anne. Spray paint on various media


ACKNOWLEDGMENTS

This work was carried out by Jeffrey Rainy, Eric Robert, Jean Raby and Philippe
Mathieu, with support from the team at Element AI. Thanks to Xavier Snelgrove
for comments on the post.

It relied upon the open-source codebase of Faster Neural Style Transfer by
Yusuketomoto, which is an implementation of Perceptual Losses for Real-Time
Style Transfer and Super-Resolution by Justin Johnson, Alexandre Alahi, and Li
Fei-Fei.



