Gyarados, a water type

Gyarados as fire type

Gyarados as grass type

Gyarados as electric type

pokemon2pokemon: Using Neural Networks to Generate Pokemon as Different Elemental Types

June 3, 2019

Have you ever wondered what a Gyarados would look like as a fire type? Or grass type, or electric type?

For my last project at the Recurse Center, I trained CycleGAN, an image-to-image translation model, on images of Pokémon of different types.

Ho-oh, a fire type

Ho-oh as dark type

Model Overview

CycleGAN is an image-to-image translation model that allows us to “translate” from one set of images to another. For more on CycleGAN, see previous blog posts on image-to-image translation with CycleGAN and pix2pix.

The open-source implementation used to train and generate these images of Pokémon uses PyTorch and can be found on GitHub. For this project, I trained the model to translate between sets of Pokémon images of different types, e.g. translating images of water types to fire types.

Training Data

I found the original dataset of Pokémon images and their types on Kaggle, containing Generations 1-7. I wrote a script to sort the Pokémon images by their primary type.

Both the resulting dataset and the script can be found on my GitHub.
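As a rough sketch of what such a sorting step could look like (this is not the actual script from the repo): the function below reads a CSV of Pokémon and copies each image into a folder named after its primary type. The column names `Name` and `Type1`, the one-image-per-Pokémon file naming, and the function name `sort_images_by_type` are all assumptions made for illustration.

```python
import csv
import shutil
from pathlib import Path


def sort_images_by_type(csv_path, image_dir, output_dir):
    """Copy each Pokémon image into a subfolder named after its primary type."""
    image_dir, output_dir = Path(image_dir), Path(output_dir)
    copied = 0
    with open(csv_path, newline="", encoding="utf-8") as f:
        for row in csv.DictReader(f):
            # "Name" and "Type1" are assumed column names, not confirmed by the post.
            name = row["Name"].strip()
            primary_type = row["Type1"].strip().lower()
            # Assumes one image per Pokémon, stored as <name>.<extension>.
            matches = sorted(image_dir.glob(f"{name}.*"))
            if not matches:
                continue  # no image for this row, so skip it
            type_dir = output_dir / primary_type
            type_dir.mkdir(parents=True, exist_ok=True)
            shutil.copy(matches[0], type_dir / matches[0].name)
            copied += 1
    return copied
```

Sorting into one folder per type is convenient because CycleGAN-style training only needs two unpaired folders of images, e.g. `water/` and `fire/`.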

Results

For each pair of images, on the left is the original image of the Pokémon, and on the right is the type-translated version. (Results are best viewed if you turn off f.lux, night shift, or any other display mode that changes the color of your screen.)

Water type -> other types

Dewgong, water type

Dewgong as grass type

Lapras, water type

Lapras as grass type

Azumarill, water type

Azumarill as grass type

Kingdra, water type

Kingdra as grass type

Clawitzer, water type

Clawitzer as fire type

Empoleon, water type

Empoleon as grass type

Greninja, water type

Greninja as grass type

Keldeo, water type

Keldeo as grass type

Cloyster, water type

Cloyster as electric type

Lapras, water type

Lapras as fire type

Kyogre, water type

Kyogre as grass type

Feraligatr, water type

Feraligatr as grass type

Clawitzer, water type

Clawitzer as grass type

Carracosta, water type

Carracosta as fire type

Greninja, water type

Greninja as fire type

Clauncher, water type

Clauncher as grass type

Fire type -> other types

Slugma, fire type

Slugma as dark type

Ponyta, fire type

Ponyta as dark type

Combusken, fire type

Combusken as dark type

Torkoal, fire type

Torkoal as water type

Darmanitan, fire type

Darmanitan as dark type

Delphox, fire type

Delphox as dark type

Simisear, fire type

Simisear as dark type

Pignite, fire type

Pignite as water type

Heatmor, fire type

Heatmor as electric type

Ho-oh, fire type

Ho-oh as electric type

Rapidash, fire type

Rapidash as dark type

Blaziken, fire type

Blaziken as dark type

Flareon, fire type

Flareon as water type

Darmanitan, fire type

Darmanitan as electric type

Delphox, fire type

Delphox as water type

Simisear, fire type

Simisear as water type

Magmortar, fire type

Magmortar as electric type

Talonflame, fire type

Talonflame as water type

Grass type -> other types

Bellossom, grass type

Bellossom as water type

Grovyle, grass type

Grovyle as water type

Maractus, grass type

Maractus as water type

Leafeon, grass type

Leafeon as water type

Sceptile, grass type

Sceptile as water type

Pansage, grass type

Pansage as water type

Electric type -> other types

Electivire, electric type

Electivire as dark type

Thundurus, electric type

Thundurus as fire type

Dragon type -> other types

Latios, dragon type

Latios as grass type

Kyurem, dragon type

Kyurem as dark type

Garchomp, dragon type

Garchomp as dark type

Zekrom, dragon type

Zekrom as fire type

Rayquaza, dragon type

Rayquaza as fire type

Salamence, dragon type

Salamence as fire type

Haxorus, dragon type

Haxorus as fire type

Zygarde, dragon type

Zygarde as fire type

Dark type -> other types

Darkrai, dark type

Darkrai as dragon type

Yveltal, dark type

Yveltal as electric type

Hydreigon, dark type

Hydreigon as fire type

Localhost Talk: creative applications of deep learning, aka, neural networks for fun and not profit :-)

May 16, 2019

Earlier this week I gave a talk at Localhost, the Recurse Center’s public-facing technical speaker series. Slides embedded below. Here’s also a link to the talk slides if you want to see my notes included.

The talk covers some of the creative deep learning projects I’ve worked on while at RC:

generating jazz with an LSTM [+ github]
generating punchlines to jokes with seq2seq [+ github]
generating maps and buildings using circuit boards with pix2pix
neural style transfer [+ github]
translating between Pokemon types with CycleGAN [+ dataset] (blog post and repo link to come!)

Overall I received a lot of enthusiastic positive feedback and felt pretty good about how it went! I do feel somewhat proud of all of the fun projects I was able to explore while at RC, and it feels nice to be able to share that with others.

Implementing char-RNN from Scratch in PyTorch, and Generating Fake Book Titles

April 24, 2019

This week, I implemented a character-level recurrent neural network (or char-rnn for short) in PyTorch, and used it to generate fake book titles. The code, training data, and pre-trained models can be found on my GitHub repo.

Heart in the Dark

Me the Bean

Be the Life

Yours

Model Overview

Diagram of the char-rnn network architecture. Source.

The char-rnn language model is a recurrent neural network that makes predictions on the character level. In contrast, many language models operate on the word level.

Making character-level predictions can be a bit more chaotic, but might be better for making up fake words (e.g. Harry Potter spells, band names, fake slang, fake cities, fantasy terms, etc.). Word-level language models might have an advantage for generating longer pieces of text, like summaries or fiction, as they don’t need to figure out how to spell, in a sense.

There do exist character-word hybrid approaches. For example, the GPT-2 model uses byte pair encoding, an approach that interpolates between the word-level for common sequences and the character-level for rare sequences.
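To make the byte pair encoding idea concrete, here is a toy sketch of the merge-learning loop. It starts from individual characters and repeatedly merges the most frequent adjacent pair, so common sequences end up as single word-like symbols while rare ones stay as characters. It is a simplified illustration working on characters of whitespace-split words, not GPT-2's actual byte-level tokenizer, and the function names are made up for this example.

```python
from collections import Counter


def most_frequent_pair(words):
    """Count adjacent symbol pairs across all words, weighted by word frequency."""
    pairs = Counter()
    for symbols, freq in words.items():
        for a, b in zip(symbols, symbols[1:]):
            pairs[(a, b)] += freq
    return max(pairs, key=pairs.get) if pairs else None


def merge_pair(words, pair):
    """Replace every occurrence of `pair` with a single merged symbol."""
    merged = {}
    for symbols, freq in words.items():
        out, i = [], 0
        while i < len(symbols):
            if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == pair:
                out.append(symbols[i] + symbols[i + 1])
                i += 2
            else:
                out.append(symbols[i])
                i += 1
        merged[tuple(out)] = merged.get(tuple(out), 0) + freq
    return merged


def learn_bpe(corpus, num_merges):
    """Start from characters and greedily merge the most frequent adjacent pair."""
    words = Counter(tuple(word) for word in corpus.split())
    merges = []
    for _ in range(num_merges):
        pair = most_frequent_pair(words)
        if pair is None:
            break
        words = merge_pair(words, pair)
        merges.append(pair)
    return merges, words
```

With a corpus where "dark" is frequent and "bean" is rare, a few merges are enough to turn "dark" into one symbol while "bean" is still spelled out character by character.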

This particular char-rnn implementation is set up to handle multiple categories of text. In this use case, it is able to make predictions for different book genres, e.g. Romance, Fantasy, Young Adult, etc.
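One common way to condition a char-rnn on a category is to concatenate a one-hot category vector with the one-hot character vector at every time step. The sketch below shows that encoding in plain Python; the character set, the three-genre list, and the function names are assumptions for illustration, and the real implementation may encode its inputs differently.

```python
import string

ALPHABET = string.ascii_letters + " .,;'-"      # assumed character set
GENRES = ["Romance", "Fantasy", "Young Adult"]  # subset of genres, for illustration


def one_hot(index, size):
    """Return a list of zeros with a single 1.0 at `index`."""
    vec = [0.0] * size
    vec[index] = 1.0
    return vec


def encode_step(genre, char):
    """Input vector for one time step: [genre one-hot | character one-hot]."""
    genre_vec = one_hot(GENRES.index(genre), len(GENRES))
    char_vec = one_hot(ALPHABET.index(char), len(ALPHABET))
    return genre_vec + char_vec


def encode_title(genre, title):
    """Encode a whole title as a sequence of per-character input vectors."""
    return [encode_step(genre, ch) for ch in title]
```

Because the genre vector is repeated at every step, the network sees which genre it is writing for throughout the whole title rather than only at the start.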

Training Data

The training data used for this model is a modified version of a Goodreads data scrape of 20K book titles. I transformed the CSV file into separate text files for the top 30 genres. The resulting split dataset can be found in my GitHub repo.
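The CSV-to-text-files step could be sketched roughly as follows. The column names `title` and `genre`, the one-genre-per-row layout, and the function name are assumptions; the actual Goodreads scrape may be structured differently.

```python
import csv
from collections import Counter, defaultdict
from pathlib import Path


def split_titles_by_genre(csv_path, output_dir, top_n=30):
    """Write one text file of titles per genre, keeping only the most common genres."""
    titles = defaultdict(list)
    with open(csv_path, newline="", encoding="utf-8") as f:
        for row in csv.DictReader(f):
            # "title" and "genre" are assumed column names.
            titles[row["genre"].strip()].append(row["title"].strip())

    counts = Counter({genre: len(items) for genre, items in titles.items()})
    kept = [genre for genre, _ in counts.most_common(top_n)]

    output_dir = Path(output_dir)
    output_dir.mkdir(parents=True, exist_ok=True)
    for genre in kept:
        text = "\n".join(titles[genre]) + "\n"  # one title per line
        (output_dir / f"{genre}.txt").write_text(text, encoding="utf-8")
    return kept
```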

Training this model took about 20 minutes on an NVIDIA GeForce GTX 1080 Ti GPU. Generating samples takes only a few seconds.

Results

The following results are a selected sampling of outputs. Note that I’m mainly including examples that consist of real words, with a few exceptions.

Romance
Heart in the Dark

Years of the Dark

You the Book

The Stove to the Story

Fantasy
Growing the Dark

Book of the Dark

Red Sande

Fiction
In the Bead Store

Jen the Bead

King the Bean

Historical
A to the Bean

Other and Story

Science Fiction
Darke Sers

Voringe

In the Beantire

Mystery
Bed Singe

Kiss of the Dark

Red Story

Classics
A Mander of the Suckers

Gorden the Story of Merica

Childrens
Dark Book of the Story of the Sures of the Surating

Late

Story of the Bean

Paranormal
A Store of the Store

Red Store

Stariss and Storiss

Wind Store

New Adult
Live Me Life

Growing Me

In the Bean

Me the Bean

Poetry
Yours

Me

Erotica
Volle the Story of Men

King of the Dark

Dork of the Dark

Work of the Dark

Bed Storys of the Dark

Your Mind

Biography
Be the Life

On Anger and Of Mand Anger

Comically, there are many book titles that revolve around beans, beads, stores, and darkness. While I did notice some subtle differences between genres, they don’t appear to be particularly drastic overall.

samoyed2bernese: Using CycleGAN for Image-to-Image Translation between Samoyeds and Bernese Mountain Dogs

April 19, 2019

Dogs!!! More dogs this week!!! Is it possible I picked this project because I was in the mood for dog pictures? Absolutely.

This week, I used the CycleGAN image-to-image translation model to translate between images of Samoyeds and Bernese mountain dogs, two of my favorite dogs. If you’re not familiar with these breeds, you’re in luck, because here are some dog pictures for your reference. (Such good dogs!!)

Model Overview

CycleGAN builds off of the pix2pix network, a conditional generative adversarial network (or cGAN) that can map paired input and output images. Unlike pix2pix, CycleGAN is able to train on unpaired sets of images. For more on pix2pix and CycleGAN, see my previous blog post here.

The CycleGAN implementation used to train and generate dog pictures uses PyTorch and can be found on GitHub here. (This repo also contains a pix2pix implementation, which I had used previously to generate circuit cities.)

A major strength of CycleGAN over pix2pix is that your datasets can be unpaired. For pix2pix, you may have to really dig, curate, or create your own dataset of 1-to-1 paired images. For example, if you wanted to translate daytime photos to nighttime photos with pix2pix, you would need a pair of daytime and nighttime photos of the same location. With CycleGAN, you can just have a set of daytime photos of any location and a set of nighttime photos of any location and call it a day (no pun intended).

Another strength of CycleGAN over, say, neural style transfer, is that the translations can be localized. In the following examples, you’ll see that the translation applies only to the dog. Object recognition is implied, and the non-dog portions of the images are not really affected. With neural style transfer, you’re applying a style transformation to the entire image.

As an aside, I originally ran CycleGAN on a set of images of forests, and a set of images of forest paintings. While the results did turn out as expected, I realized this kind of task is really best suited for neural style transfer. (Which inspired me to implement it from scratch! See previous blog post on implementing neural style transfer from scratch in PyTorch.)

Training Data

To train the model, I used 218 images of Samoyeds and 218 images of Bernese mountain dogs from one of my favorite datasets currently on the internet: the Stanford Dogs Dataset. So many good dogs!! My heart!!

Training took a couple of hours on an NVIDIA GeForce GTX 1080 Ti GPU, and generating results took only a few minutes.

Results

In the following examples, on the left is the input, a real photo of a Samoyed. On the right is the CycleGAN output, a generated image translated from the input into a Bernese mountain dog.

Notes

Note that, since this is a blog post and not a scientific paper, I’ve only included the more effective results in this post. For example, bernese2samoyed doesn’t look quite as good — it just looks like white-out was applied to the dog lol.

I would add that a major strength of CycleGAN is that the changes are applied locally, and not to the entire image. The network is able to identify the boundaries of dog and not-dog.

Another note is that this approach seems to work best when translating between inputs with similar shapes. In these results, mainly the coloring was transferred, and not so much the dog shape. I would posit that breeds that are similar in shape would yield more effective results, e.g. translating between golden and chocolate labs, or between tabby cats and tortoiseshell cats.

Implementing Neural Style Transfer from Scratch using PyTorch

March 19, 2019

This past week, I’ve been playing around with more image processing and generation techniques. In particular, I implemented the neural style transfer algorithm by Gatys, Ecker, and Bethge in PyTorch following this tutorial. The paper and technique have been around for a few years, but it wasn’t until now that I had access to a GPU, here at Recurse. This was so much fun to implement and experiment with!

My GitHub repo contains instructions on setup and usage, as well as a directory containing many results, if you would like to try it out and explore for yourself!

Model Overview

Neural style transfer takes two images as input and applies the style of one image onto the content of the other. In the example below, the first image is the style input, the second image is the content input, and the third image is the result of the style transfer. (The style image used here is one of my favorite paintings: Nocturne in Black and Gold, the Falling Rocket by James Abbott McNeill Whistler.)

The approach builds off of the VGG-19, a convolutional neural network pretrained on millions of images. It’s 19 layers deep and built by the Visual Geometry Group, hence VGG-19.

For neural style transfer, we modify the network architecture as follows: we insert a content loss layer, using mean squared error, after the fourth convolutional layer; and we insert style loss layers, using mean squared error on normalized Gram matrices, after each of the first five convolutional layers.
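The two losses can be sketched in a few lines. To stay self-contained, this illustration uses plain Python lists instead of PyTorch tensors: a feature map is a list with one row per channel, each row being that channel's activations flattened over height and width. Dividing by the total number of elements is one common normalization and is an assumption here, as are the function names.

```python
def mse(a, b):
    """Mean squared error between two equally shaped 2-D lists."""
    flat_a = [x for row in a for x in row]
    flat_b = [x for row in b for x in row]
    return sum((x - y) ** 2 for x, y in zip(flat_a, flat_b)) / len(flat_a)


def gram_matrix(features):
    """Normalized Gram matrix: channel-by-channel dot products of the feature map."""
    channels, positions = len(features), len(features[0])
    norm = channels * positions  # assumed normalization: total number of elements
    return [
        [sum(x * y for x, y in zip(fi, fj)) / norm for fj in features]
        for fi in features
    ]


def content_loss(generated, content):
    """Content loss compares feature maps directly, position by position."""
    return mse(generated, content)


def style_loss(generated, style):
    """Style loss compares Gram matrices, which discard spatial arrangement."""
    return mse(gram_matrix(generated), gram_matrix(style))
```

The Gram matrix only records how strongly channels co-activate, not where. That is why shuffling the spatial positions of a feature map changes the content loss but leaves the style loss at zero.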

Results

These are some of my favorite images that resulted from my explorations. Note that this is a curated collection of results.

Circuit Cities with Pix2Pix: Using Image-to-Image Translation with Generative Adversarial Networks to Create Buildings, Maps, and Satellite Images from Circuit Boards

March 6, 2019

I’ve been playing around with generative adversarial networks this week. In particular, using image-to-image translation to see what we can create using images of circuit boards.

I’ve noticed before that circuit boards mildly resemble aerial geospatial images. What kinds of cities could we build with them?

Model Overview

GANs

GAN stands for Generative Adversarial Network: generative, because we are using it to generate data; adversarial, because it comprises two competing networks; and network, because we are describing a neural network architecture.

Essentially, you have two models competing: a generator that generates fake images, and a discriminator that judges whether an image is fake or real.

First, we generate a bunch of fake images using the generator. Then, we take these fake images to the discriminator, which classifies images as fake or real. Using the information on how the discriminator determined which images are fake, we take that back to the generator so we can generate better fake images. We repeat this process, taking turns training the generator, then the discriminator, until the discriminator can no longer tell which images are real or fake (generated).
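The back-and-forth described above is usually written as a minimax game between the generator G and the discriminator D. This is the standard formulation from the original GAN paper, not something specific to the pix2pix code used in this post; here x is a real image, z is random noise, and D outputs the probability that its input is real.

```latex
\min_G \max_D \;
  \mathbb{E}_{x \sim p_{\text{data}}}\big[\log D(x)\big]
  + \mathbb{E}_{z \sim p_z}\big[\log\big(1 - D(G(z))\big)\big]
```

The discriminator tries to make both terms large (calling real images real and generated images fake), while the generator tries to make the second term small by producing images that D scores as real.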

Pix2Pix

The pix2pix model uses conditional adversarial networks (aka cGANs, conditional GANs) trained to map input to output images, where the output is a “translation” of the input. For image-to-image translation, instead of simply generating realistic images, we add the condition that the generated image is a translation of an input image. To train cGANs, we use pairs of images, one as an input and one as the translated output.

For example, if we train on pairs of black-and-white images (input) alongside the color image (translation), we then have a model that can generate color photos given a black-and-white photo. If we train on pairs of day (input) and night (translation) images of the same location, we have a model that can generate night photos from day photos.

CycleGAN

A related model architecture is CycleGAN (original CycleGAN paper), which builds off of the pix2pix architecture, but allows you to train the model without having explicit pairings. For example, we can have one dataset of day images, and one dataset of night images; it’s not necessary to have a specific pairing of a day and night image of the same location. To train CycleGAN, we can use unpaired training data. (CycleGAN is not used here but I hope to explore it more this week!)
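What lets CycleGAN get away with unpaired data is a cycle-consistency term, as described in the CycleGAN paper. With one generator G mapping domain X to Y (say, day to night) and a second generator F mapping Y back to X, translating an image there and back should return something close to the original:

```latex
\mathcal{L}_{\text{cyc}}(G, F) =
  \mathbb{E}_{x \sim p_{\text{data}}(x)}\big[\lVert F(G(x)) - x \rVert_1\big]
  + \mathbb{E}_{y \sim p_{\text{data}}(y)}\big[\lVert G(F(y)) - y \rVert_1\big]
```

This term is added to the usual adversarial losses for both generators, so each translation has to look realistic in the target domain and still preserve enough of the input to be reversible.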

The pretrained models I used for these explorations are from a PyTorch implementation of pix2pix that can be found on GitHub.

Results

In the results below, on the left is the circuit board image input, and on the right is the generated translation.

Circuit Boards to Buildings

For these, I used the facades_label2photo pretrained model, originally trained on paired images like this:

Circuits to Maps

For these, I used the sat2map pretrained model, originally trained on pairs of satellite aerial images (input) and Google maps (translation).

Circuits to Satellite Images

For these, I used the map2sat pretrained model, originally trained on pairs of Google maps images (input) and satellite aerial images (translation).

Dogspotting: Using Machine Learning to Draw Bounding Boxes around Dogs in Pictures

March 5, 2019

I wanted to try out a computer vision project, and what better way to do that than to point out where dogs are in photos??

Project Overview

I’ve included a GitHub repo and Jupyter notebook for this project.

This project uses the ImageAI computer vision library for Python, which offers support for RetinaNet, YOLOv3, and TinyYOLOv3 algorithms for object detection. The model used is a RetinaNet model pretrained on the COCO dataset, also provided by ImageAI.

Official guide and documentation for ImageAI detection classes are provided as well.

Overall Impressions

I was pleasantly surprised at how easy out-of-the-box object detection has become. The ImageAI library supports custom object detection for the following categories:

person, bicycle, car, motorcycle, airplane, bus, train, truck, boat, traffic light, fire hydrant, stop_sign, parking meter, bench, bird, cat, dog, horse, sheep, cow, elephant, bear, zebra, giraffe, backpack, umbrella, handbag, tie, suitcase, frisbee, skis, snowboard, sports ball, kite, baseball bat, baseball glove, skateboard, surfboard, tennis racket, bottle, wine glass, cup, fork, knife, spoon, bowl, banana, apple, sandwich, orange, broccoli, carrot, hot dog, pizza, donot, cake, chair, couch, potted plant, bed, dining table, toilet, tv, laptop, mouse, remote, keyboard, cell phone, microwave, oven, toaster, sink, refrigerator, book, clock, vase, scissors, teddy bear, hair dryer, toothbrush.

This made it very easy to detect dogs specifically! All I had to do is set up my project, download the pretrained model, and set a few parameters and filepaths. The entire project only took about 20 minutes from setup to output image.

Some parameters of interest:

custom_objects = detector.CustomObjects(dog=True, cat=True)

Any of the object categories can be included here. We are not just limited to dogs, and we can include as many categories as we want -- or potentially all of them -- in the same detector.

detections = detector.detectCustomObjectsFromImage(input_image=input_path, output_image_path=output_path, custom_objects=custom_objects, minimum_percentage_probability=45)

minimum_percentage_probability refers to how confident the model should be before drawing a bounding box. We can set it to a low percentage, e.g. 15%, if we want it to flag everything it sees. We can set it to a high percentage, e.g. 85%, if we only want it to flag objects when the model is confident about what it’s detecting.
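To illustrate what the threshold does, here is a small stand-alone sketch that applies the same cutoff to a list of detections. ImageAI applies this filter internally; the helper name `filter_detections`, the sample values, and the exact dictionary layout (a `name`, a `percentage_probability`, and `box_points`) are assumptions for illustration.

```python
def filter_detections(detections, minimum_percentage_probability=45):
    """Keep only detections at or above the confidence threshold (in percent)."""
    return [
        d for d in detections
        if d["percentage_probability"] >= minimum_percentage_probability
    ]


# Hypothetical detections, shaped like a list of per-object dictionaries.
sample = [
    {"name": "dog", "percentage_probability": 92.1, "box_points": [10, 20, 200, 240]},
    {"name": "cat", "percentage_probability": 51.3, "box_points": [220, 40, 300, 160]},
    {"name": "dog", "percentage_probability": 17.8, "box_points": [5, 5, 40, 60]},
]

lenient = filter_detections(sample, minimum_percentage_probability=15)
default = filter_detections(sample)  # uses the threshold of 45 from the call above
strict = filter_detections(sample, minimum_percentage_probability=85)
```

A lower threshold keeps the uncertain boxes as well, which helps recall but tends to add false positives; a higher one keeps only the detections the model is most sure about.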

Results

Overall, fairly decent results, especially considering the short amount of time it took to set up. The boxes are bounding boxes for the dog (or animal, or bird) detected, with the object name and prediction probability.

Text Generation with GPT-2, OpenAI's Recently Released Language Model

February 28, 2019

Venus, planet of love Was destroyed by global warming, while the other suns have been ravaged by the tides of time. There are no suns and there are no tides, except the sun itself. A few suns exist now in Mars (the moon is now in the orbit of Sol and Jupiter), but they are gone in the future. (It is implied that we can go back home to Earth) Jupiter is the only other planet that is not the source; that planet would be the nearest known red planet to us. So is Earth.

The Earth's only visible source of energy is the sun itself. (In Greek it means "sun" or "heaven.") According to the Old Testament story, Jupiter was so cold that it was able to cause the death of children when they died in a ship. The only real star in the solar system that is capable of causing death is the sun, which must be one of the most powerful stars in the universe. Only the moon can cause death from its star at once, and Venus must be at least one of the most powerful star systems in the entire galaxy (more details here). Earth was never seen as an "open" planet.

Earlier this month, OpenAI released a new text generation model, called GPT-2. GPT-2 stands for “Generative Pre-Training 2”: generative, because we are generating text; pre-training, because instead of training the model for any one specific task, we’re using unsupervised “pre-training” such that the general model can perform on a variety of tasks; and 2, because it’s the second model using this approach, following the first GPT model.

TLDR: The model is pretty good at generating fiction and fantasy, but it’s bad at math and at telling jokes. Skip to the end for my favorite excerpts.

Model Overview

The GPT-2 model uses conditional probability language modeling with a Transformer neural network architecture that relies on self-attention mechanisms (inspired by attention mechanisms from image processing tasks) in lieu of recurrence or convolution. (Side note: interesting to see how advancements in neural networks for image and language processing co-evolve.)
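As a minimal sketch of the self-attention idea, the function below computes scaled dot-product attention with a causal mask, so each position only looks at itself and earlier positions, as a left-to-right language model needs. It is a simplified illustration in plain Python: the real model uses learned projections for queries, keys, and values, multiple attention heads, and many stacked layers, none of which appear here.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def causal_self_attention(queries, keys, values):
    """Scaled dot-product attention where position i attends only to positions <= i."""
    dim = len(queries[0])
    outputs, all_weights = [], []
    for i, q in enumerate(queries):
        visible_keys = keys[: i + 1]      # causal mask: no peeking at later tokens
        visible_values = values[: i + 1]
        scores = [
            sum(a * b for a, b in zip(q, k)) / math.sqrt(dim)
            for k in visible_keys
        ]
        weights = softmax(scores)
        # Each output is a weighted average of the visible value vectors.
        outputs.append([
            sum(w * v[j] for w, v in zip(weights, visible_values))
            for j in range(len(values[0]))
        ])
        all_weights.append(weights)
    return outputs, all_weights
```

Unlike a recurrent network, nothing here is processed step by step: every position's output depends only on dot products with earlier positions, so all positions can be computed in parallel during training.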

The model is trained on about 8 million documents, or about 40 GB of text, from web pages. The dataset, scraped for this model, is called WebText, and is the result of scraping outbound links from Reddit with at least 3 karma. (Some thoughts on this later. See section on “Training Data”)

In the original GPT model, the unsupervised pre-training was used as an initial step, followed by a supervised fine-tuning step for various tasks, such as question answering. GPT-2, however, is assessed using only the pre-training step, without the supervised fine-tuning. In other words, the model performs well in a zero-shot setting.

First Impressions

When I first saw the blog post, I was both very impressed and also highly skeptical of the results.

Generated Final Fantasy MIDI files, visualized

Generating Jazz Music with an LSTM Recurrent Neural Network

February 25, 2019

Last week was my first week at the Recurse Center! I’m having so much fun lol. While here, I’m exploring creative applications of deep learning.

As my starter project, I wanted to generate jazz music using a neural network. LSTM stands for Long Short-Term Memory, and is a type of recurrent neural network that is capable of processing sequences. You can think of this as having short-term memory capable of learning long-term dependencies.

Using this tutorial as a starting point, I trained an LSTM model on two datasets: Final Fantasy music (conveniently provided by the tutorial, which let me focus on the model building over finding data), and Herbie Hancock jazz music (my original goal!).

Here are the results:

Final Fantasy

For this composition, I generated a bunch of MIDI files using the model, picked 3 I liked, set them to different instruments, and composited them into one piece.

Herbie Hancock

For these, I didn’t compose or edit the songs much. After generating a few MIDI files, I picked some I liked, and set each one individually to an instrument I thought sounded nice.

My favorite parts are 0:22-0:45 in the Wurlitzer piece, and the first 5 seconds of the Vibraphone piece.

Honestly I’m quite happy with the results! This was my first time working with music data. I also had access to the Recurse Center’s GPU cluster, which made this project possible. I’ve pushed my code up to this GitHub repo for reference.

Project components:

Learning about the MIDI file format and how to encode it using the Python Music21 library
Finding MIDI files for my training data
Getting set up on the GPU cluster (and using screen so I don’t disconnect and interrupt my training session when I leave for the day!)
Training the LSTM model using Keras, saving the weights as I go. Learned from a friend: if you have access to a GPU, you’ll want to use CuDNNLSTM rather than LSTM layers, to save on training times! Generating doesn’t take that long but it would improve on generating times as well.
Generating music using the LSTM model (same architecture, load up the most recent weights file). Using the first 100 notes, predict the next note. Shift the window for the input sequence by one note, repeat. Stop whenever you feel your song is long enough lol. The songs in this post are about 250 notes each.
Opening up the MIDI files in GarageBand so I can play them with various fun instruments and sounds :-)
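The generation step in the list above (predict the next note from a window of 100 notes, shift the window by one, repeat) can be sketched independently of Keras. Here `predict_next` is a hypothetical stand-in for the trained LSTM: any callable that takes the current window and returns the next note. The function name and defaults are assumptions chosen to match the numbers in this post.

```python
def generate_notes(predict_next, seed, length=250, window=100):
    """Generate `length` notes by sliding a fixed-size window over the sequence."""
    if len(seed) < window:
        raise ValueError("seed must contain at least `window` notes")
    sequence = list(seed[:window])  # start from the first `window` notes
    generated = []
    for _ in range(length):
        next_note = predict_next(sequence)  # model call would go here
        generated.append(next_note)
        # Shift the window by one note: drop the oldest, append the newest.
        sequence = sequence[1:] + [next_note]
    return generated
```

For example, with a dummy predictor such as `lambda seq: (seq[-1] + 1) % 12` in place of the model, the loop simply walks up a chromatic scale, which is a handy way to check the windowing logic before plugging in real weights.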

There are a number of possible extensions from here. For example, right now the rhythm is pretty straightforward, as notes are set to offset from the last note by 0.5 seconds. A possible extension is to encode the rhythms of the training data (doable since MIDI file formats are essentially note+time offset). Another extension is to add some music rules, e.g. counterpoint, harmony, consonance, etc. which I would have to research more about. It would also be really interesting to train a network on multiple instrumental parts, such as an orchestral score, where different instruments would have musical relationships or dependencies with one another.

Sheet Music

For fun, I used an online converter to generate sheet music for piano from the output MIDI files :-)

Here’s the first song (Wurlitzer Electric):

Here’s the second song (Vibraphone):

Predicting Readmission Risk after Orthopedic Surgery

May 23, 2018

My colleagues and I from the Clinical Research Informatics Core at Penn Medicine gave poster presentations at the Public Health session of the Symposium on Data Science and Statistics last week.

Here's the abstract:

Our project examined hospital readmissions after knee and hip replacement surgeries that took place within the University of Pennsylvania health system. We used a variety of information available within patient electronic health records and an assortment of machine learning tools to predict the risk of readmission for any given patient at the time of discharge after a primary joint replacement surgery. We faced challenges related to missing data. We used a number of different machine learning models such as logistic regression, random forest and gradient boosted trees. We also used an automated machine learning pipeline tool, TPOT, that uses a genetic algorithm to search through the machine learning model/parameter space to automatically suggest successful machine learning pipelines. We trained multiple models that predicted readmissions better than the existing clinical methods, with statistically significant increases in AUC over the clinical baseline. Finally our models suggested a number of features useful for readmission prediction that are not used at all in the existing clinician model. We hope our new models can be used in practice to help target patients at high risk of readmission after joint replacement surgery, and to help inform which interventions may be most useful.

Machine Learning for Healthcare

May 3, 2018

Yesterday I gave a dev talk at Philly Tech Week on machine learning for healthcare, slides embedded below.

Description: "How are machine learning and data science being adopted in healthcare? From diagnostics, risk predictions, and more, this session will provide an overview of machine learning applications using electronic health records, walk through the process of how a model might be trained and used, and discuss methods for improving interpretability to augment medical decision-making."

Here's a link to the talk slides with notes included.

I think the talk went pretty well. In fact, I think I am actually a pretty good speaker, although I'm not sure how much I get out of speaking personally. The talk was pretty well attended, and I received a lot of positive feedback, so hopefully I inspired some people in healthcare or machine learning in some way or another.

Music and Mood: Assessing the Predictive Value of Audio Features on Lyrical Sentiment

January 3, 2018

aka - what's the relationship between the audio features of a song and how positive or negative its lyrics are?

aka - data analysis of my spotify music data + sentiment analysis + supervised machine learning

aka - my senior thesis

the full jupyter notebook used to conduct this data analysis can be found on my github here: Spotify Data Analysis

(pg. 32 and onward is just the full python jupyter notebook in the appendix.)

Algorithmic Bias

June 8, 2016

I recently wrote a final paper for my Digital Culture course, titled "Algorithmic Bias and the Myth of Big Data Neutrality" - a really interesting and really important topic to consider in moving forward in our increasingly technological society.

Computational Creativity

May 26, 2016

I gave a presentation this week about some applications of artificial neural networks in computational creativity. It consists of an overview and discussion of 3 different papers:

Here are the slides: