INTRODUCING MUR.AI
REAL-TIME NEURAL STYLE TRANSFER FOR VIDEO

Jeffrey Rainy · Published in Element AI Lab · 8 min read · Feb 12, 2018

BY JEFFREY RAINY AND ARCHY DE BERKER

Mur.AI is a system for real-time stylized video. It takes a reference style image, such as a painting, and a video stream to process. The video stream is processed so that it takes on the style of the reference image, producing a stylized video stream in real time.

In a previous post, we described a technique to stabilize style transfer so that it works well for video. We've subsequently deployed that technique to produce a system which we've demoed at a variety of conferences, including C2, Art Basel and NIPS. You can check out some of the resulting selfies on Twitter.

In this post, we present some high-level explanations of how the system works. We'll provide a primer on style transfer and convolution, and detail some of the engineering challenges we overcame to deploy our algorithm in the real world.

WHAT IS STYLE TRANSFER?

Style transfer is the application of the artistic style of one image to another image. Here the source image is of two developers (Phil Mathieu and Jean Raby) and the style comes from Notre Dame de Grâce by A'Shop, a mural in Montreal.

The technique was first introduced in the paper A Neural Algorithm of Artistic Style by Leon A. Gatys, Alexander S. Ecker, and Matthias Bethge. Since then we've seen style transfer applied all over the place, along with a variety of improvements to the original algorithm. In particular, our implementation builds upon Perceptual Losses for Real-Time Style Transfer and Super-Resolution by Justin Johnson, Alexandre Alahi, and Fei-Fei Li, which provides a faster approach to style transfer that is more readily applicable to video.

HOW DOES IT WORK?

Each frame of the video is passed through a Convolutional Neural Network (CNN) and then displayed on screen. For a beautiful introduction to convolutional networks, see Chris Olah's post, and for a visual guide to convolution, see Vincent Dumoulin's GitHub repo.

Briefly, a CNN performs multiple convolution operations on an image to produce another image. Each convolution is an operation on the pixels in a square region of the image, and the same operation is repeated all over the image to compute each output pixel. The diagram below illustrates the processing done by a single convolution.

At deployment time, we use a single CNN, which we'll call the stylization CNN, to produce our stylized image. When we talk about "learning a style," we are talking about finding the parameters of the stylization network so that it produces the right style. In order to find those parameters, we're going to use a second network: a classification CNN. We use this as a feature extractor to provide representations of the style and the content of our input images.
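If you prefer code to diagrams, the convolution operation described above can be written in a few lines of NumPy. This is a purely illustrative sketch, not part of the Mur.AI codebase: real CNN layers operate on many channels at once, use padding and strides, and learn their kernel weights rather than having them hand-picked, and the frame here is just random numbers.

import numpy as np

def convolve2d(image, kernel):
    """Slide a small kernel over every position of a 2-D image.

    The same weights are reused at every location, which is what lets a
    convolutional layer detect the same feature anywhere in the frame.
    """
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    out = np.zeros((out_h, out_w))
    for y in range(out_h):
        for x in range(out_w):
            patch = image[y:y + kh, x:x + kw]   # square region of pixels
            out[y, x] = np.sum(patch * kernel)  # one output pixel
    return out

# A hand-picked vertical-edge kernel; a trained CNN learns many such kernels.
edge_kernel = np.array([[1.0, 0.0, -1.0],
                        [1.0, 0.0, -1.0],
                        [1.0, 0.0, -1.0]])

frame = np.random.rand(64, 64)            # stand-in for one grayscale video frame
edges = convolve2d(frame, edge_kernel)    # responds strongly at vertical edges
print(edges.shape)                        # (62, 62): no padding in this sketch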
CLASSIFICATION NETWORK

A classification CNN takes an image and tries to identify what the image contains. For example, if you had some photos of cats and dogs, you might train a classification CNN to figure out which ones are which. In the classifier CNN we use, the task is to classify small images into one of 1000 categories, the hugely popular task known as ImageNet. The classifier network performs multiple convolutions on an image to produce features useful for classification.

As an image goes through the network, it becomes smaller and smaller in size (pixels) but grows in terms of components per pixel. From an RGB input (3 components per pixel) at full resolution, the network iteratively shrinks the image down to a single pixel with many components: the probability of the image belonging to each of the many categories.

A typical schema for a convolutional neural network for classification.

Although the network is originally trained for classification, we're not going to use it for that. An attractive feature of CNNs is that they naturally recapitulate the hierarchies that exist in natural images. Earlier in the network, we capture small-scale features such as edges. Later in the network, we capture larger-scale features, like whole objects. You can explore this phenomenon in your browser here.

Imagine feeding a picture of yourself in pajamas to the network. After the first layers, the information the network processes would map to local features ("thin vertical blue stripes"), whereas the last layers of the network would capture features that describe the picture as a whole ("someone standing"). Thus, early layers capture the style of an image, whilst the features learned by late layers capture the content.

Some examples of the kind of features that different layers in a CNN prefer (we move deeper in the network as we move from left to right). From Olah, et al., "Feature Visualization", Distill, 2017.

This project relies on the pre-trained VGG16 network, from the University of Oxford, for classification. This provides one of the two CNNs we need to perform style transfer.

STYLIZATION NETWORK

Our stylization network does the job of actually producing the stylized images. We learn its parameters by using the classifier network as a training tool. To do so, we take a large set of training images and repeat the following steps:

1. We feed the classifier CNN the Style image, the Source image, and the Stylized image produced by the current stylization CNN
2. We extract representations of style and content from the classifier network
3. We adjust the stylization CNN so that the stylized image has a style that more closely resembles the style image, and content that resembles the source image

In pseudo-Python, the loop looks like the sketch at the end of this section.

Capturing a style consists of keeping the small-scale features of the Style image while keeping the large-scale features of the Source image. In a nutshell, we're looking for the stylization CNN that maintains the "blue and brown swirling colors" of the style while maintaining the "two programmers, a column and a bunch of desks" of the source. To capture this, we compute the difference between Style and Stylized early in the network, and between Source and Stylized later in the network. Our loss function, the quantity we're trying to minimize by changing the parameters of our stylization network, is the sum of these two terms.

In fact, as detailed in our previous post, we incorporated an extra stability-loss term which helps generate stable style transfer from frame to frame. You can find the code for our updated implementation here. All told, our implementation now trains a new style from a 512 x 512 pixel image in ~6 hours, utilizing 4 GPUs.
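Here is that training loop as a simplified, self-contained PyTorch sketch. This is not our production implementation (the code linked above also includes the stability loss): the tiny stylization network, the choice of VGG16 layer indices, and the random stand-in images are illustrative assumptions only.

import torch
import torch.nn.functional as F
from torch import nn, optim
from torchvision import models

# Tiny stand-in for the stylization CNN; the real network is a much deeper
# image-to-image architecture (see the implementation linked above).
stylize_net = nn.Sequential(
    nn.Conv2d(3, 32, 3, padding=1), nn.ReLU(),
    nn.Conv2d(32, 3, 3, padding=1), nn.Sigmoid(),
)

# Frozen, pre-trained VGG16 classifier: we only read off its activations.
vgg = models.vgg16(weights="IMAGENET1K_V1").features.eval()
for p in vgg.parameters():
    p.requires_grad_(False)

STYLE_LAYERS = (3, 8, 15)   # early layers: small-scale features ("style")
CONTENT_LAYER = 22          # a later layer: large-scale features ("content")

def extract(image, layers):
    """Run the image through VGG16 and keep the requested activations."""
    feats, out = [], image
    for i, layer in enumerate(vgg):
        out = layer(out)
        if i in layers:
            feats.append(out)
    return feats

def gram(f):
    """Channel-correlation (Gram) matrix, a standard summary of style."""
    b, c, h, w = f.shape
    flat = f.view(b, c, h * w)
    return flat @ flat.transpose(1, 2) / (c * h * w)

style_image = torch.rand(1, 3, 256, 256)   # stand-in for the mural crop
style_grams = [gram(f) for f in extract(style_image, STYLE_LAYERS)]

optimizer = optim.Adam(stylize_net.parameters(), lr=1e-3)

for step in range(1000):                    # in practice: a large image dataset
    source = torch.rand(1, 3, 256, 256)     # stand-in for one training image
    stylized = stylize_net(source)          # 1. run the current stylization CNN

    # 2. compare representations extracted by the (fixed) classifier CNN
    style_loss = sum(F.mse_loss(gram(f), g) for f, g in
                     zip(extract(stylized, STYLE_LAYERS), style_grams))
    content_loss = F.mse_loss(extract(stylized, (CONTENT_LAYER,))[0],
                              extract(source, (CONTENT_LAYER,))[0])
    loss = style_loss + content_loss        # (+ a stability term in our system)

    # 3. adjust only the stylization CNN
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

Note that the classifier network is never updated; it only provides the representations against which the stylized output is judged.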
DEPLOYING OUR SYSTEM

Our demo has been deployed on-screen around the world, and has even been projected onto buildings here in Montreal (as part of the Up375 project). We faced several challenges in getting the demo to run in real time. The main issues were throughput and latency: how could we take the video, run it through our model, then render it again in near real-time? The finished system runs on a Zotac EN1070 minicomputer with an NVIDIA GeForce 1070 GPU, and is easily portable.

H264 decoding on the GPU

The camera we use (Logitech C920) outputs pre-compressed H264 video. The naive approach would be to decode this video with FFmpeg on the GPU, bring the RGB pixels back to the CPU, and upload them again to the GPU as input for the CNN. However, CPU-GPU transfers turned out to add significant latency by forcing synchronization points between the CPU and GPU whenever a copy occurs. The solution was to decode the H264 video directly with the onboard NVDEC engine (a hardware-accelerated decoder included in the Zotac EN1070). This allowed decoded frames to be passed directly as inputs to our CNN, letting the CPU and GPU run fully asynchronously.

OpenGL rendering from the GPU

Having decoded our video and run our network on the GPU, we faced a similar bottleneck in passing the resulting matrix back to the CPU for conventional rendering. Again, the solution was on-GPU computation. Using the CUDA/OpenGL Interop API, we can render the outputs to screen straight from the GPU, avoiding further I/O bottlenecks.
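The NVDEC and CUDA/OpenGL interop details are beyond the scope of this post, but the value of keeping data GPU-resident is easy to illustrate. The toy PyTorch timing sketch below is unrelated to our production pipeline: the random frames, the small network and the frame count are made up, and it only demonstrates why per-frame CPU round-trips hurt latency.

import time
import torch
from torch import nn

if torch.cuda.is_available():
    device = torch.device("cuda")
    # Small stand-in network; the point is the data movement, not the model.
    net = nn.Sequential(nn.Conv2d(3, 32, 3, padding=1), nn.ReLU(),
                        nn.Conv2d(32, 3, 3, padding=1)).to(device)

    def naive_pipeline(n_frames=100):
        """'Decode' on the CPU, upload, infer, download: a sync point per frame."""
        for _ in range(n_frames):
            frame = torch.rand(1, 3, 512, 512)        # frame lands in CPU memory
            out = net(frame.to(device))               # upload, then inference
            _ = out.cpu()                             # download forces a sync
        torch.cuda.synchronize()

    def gpu_resident_pipeline(n_frames=100):
        """Keep frames on the GPU end to end; the CPU never blocks on copies."""
        for _ in range(n_frames):
            frame = torch.rand(1, 3, 512, 512, device=device)  # stays on GPU
            _ = net(frame)                            # output also stays on GPU
        torch.cuda.synchronize()

    for pipeline in (naive_pipeline, gpu_resident_pipeline):
        start = time.time()
        pipeline()
        print(f"{pipeline.__name__}: {time.time() - start:.3f}s for 100 frames")
else:
    print("CUDA device required for this comparison.")

In our system the same principle is simply pushed further: the decoded frame never has to exist in CPU memory at all.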
TWITTER INTEGRATION

The demo booth integrates functionality to publish stylized images to Twitter. The feed can be seen at @ElementAIArt.

STYLES USED

We trained stylization networks for a variety of our favourite murals around Montreal. For each style, we cropped a section that had the characteristics we wished the network to learn. Below, you'll find the whole work and the cropped sections we used for training. You can see some of these samples in action in the video at the start of this post.

Notre Dame de Grâce, by A'Shop. 6310 rue Sherbrooke Ouest. Spray paint on brick, 2011.
Sans titre, by Bicicleta Sem Freio. 3527 boulevard St-Laurent. Spray paint on brick, 2015.
Sans titre, by El Curiot. 265 rue Sherbrooke Ouest. Acrylic on brick, 2015.
Quai des Arts, by EN MASSE. 4890 boulevard St-Laurent. Spray paint on brick, 2011.
Galaktic Giant, by Chris Dyer. 3483 rue Coloniale. Spray paint on brick, 2013.
Sans titre, by David 'Meggs' Hook. 3527 boulevard St-Laurent. 2016.
Autumn Foliage Series #1, by Matt W. Moore. 4660 boulevard St-Laurent. 2014.
Mémoire du coeur, by Julian Palma. 4505 rue Notre-Dame Ouest. Spray paint on brick, 2016.
Sans titre, by SBuONe. 4243 boulevard St-Laurent. 2016.
Sans titre, by Zilon. 53 rue Marie-Anne. Spray paint on various media.

ACKNOWLEDGMENTS

This work was carried out by Jeffrey Rainy, Eric Robert, Jean Raby and Philippe Mathieu, with support from the team at Element AI. Thanks to Xavier Snelgrove for comments on the post. It relied upon the open-source codebase Faster Neural Style Transfer by Yusuketomoto, which is an implementation of Perceptual Losses for Real-Time Style Transfer and Super-Resolution by Justin Johnson, Alexandre Alahi, and Li Fei-Fei.