Localhost Talk: creative applications of deep learning, aka, neural networks for fun and not profit :-)

Earlier this week I gave a talk at Localhost, the Recurse Center’s public-facing technical speaker series. The slides are embedded below, and here’s a link to the slides with my speaker notes included.

The talk covers some of the creative deep learning projects I’ve worked on while at RC:

Overall I received a lot of enthusiastic feedback and felt pretty good about how it went! I do feel somewhat proud of all the fun projects I was able to explore while at RC, and it feels nice to be able to share them with others.

Implementing char-RNN from Scratch in PyTorch, and Generating Fake Book Titles

This week, I implemented a character-level recurrent neural network (or char-rnn for short) in PyTorch, and used it to generate fake book titles. The code, training data, and pre-trained models can be found on my GitHub repo.

Heart in the Dark
Me the Bean
Be the Life
Yours

Model Overview

Diagram of the char-rnn network architecture. Source.

The char-rnn language model is a recurrent neural network that makes predictions on the character level. In contrast, many language models operate on the word level.
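To make “predicting on the character level” concrete, here’s one way to write it down, assuming a plain (Elman) RNN cell; the actual implementation may use a GRU or LSTM cell instead, and the weight names are illustrative. The model factors a title’s probability one character at a time, where $x_t$ is the one-hot encoding of the previous character $c_{t-1}$ (a start symbol at $t = 1$) and $h_0$ is an initial hidden state, typically zeros:

```latex
\begin{aligned}
p(c_1, \dots, c_T) &= \prod_{t=1}^{T} p(c_t \mid c_1, \dots, c_{t-1}) \\
h_t &= \tanh\left(W_{xh} x_t + W_{hh} h_{t-1} + b_h\right) \\
p(c_t \mid c_1, \dots, c_{t-1}) &= \operatorname{softmax}\left(W_{hy} h_t + b_y\right)_{c_t}
\end{aligned}
```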

Making character-level predictions can be a bit more chaotic, but might be better for making up fake words (e.g. Harry Potter spells, band names, fake slang, fake cities, fantasy terms, etc.). Word-level language models might have an advantage for generating longer pieces of text, like summaries or fiction, as they don’t need to figure out how to spell, in a sense.
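A quick way to see the difference is to tokenize a few of the sample titles both ways. This is a minimal sketch in plain Python: splitting on whitespace is a crude stand-in for a real word-level tokenizer, and the titles are just examples from this post. A made-up word like “Dork” can still be spelled from the character vocabulary, but a word-level model with this vocabulary has no token for it.

```python
def char_tokens(text):
    """Split a string into individual characters (what a char-rnn sees)."""
    return list(text)


def word_tokens(text):
    """Split on whitespace (a crude stand-in for a word-level tokenizer)."""
    return text.split()


titles = ["Heart in the Dark", "Book of the Dark", "Kiss of the Dark"]

# Vocabularies are the sets of distinct tokens seen in the training titles.
char_vocab = sorted({c for t in titles for c in char_tokens(t)})
word_vocab = sorted({w for t in titles for w in word_tokens(t)})

print(len(char_vocab), len(word_vocab))

novel = "Dork"
print(all(c in char_vocab for c in novel))  # spellable from known characters
print(novel in word_vocab)                  # but not a known word
```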

There do exist character-word hybrid approaches. For example, the GPT-2 model uses byte pair encoding, an approach that interpolates between the word-level for common sequences and the character-level for rare sequences.
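For intuition on how byte pair encoding lands in between, here’s a toy version of the merge-learning loop: it repeatedly fuses the most frequent adjacent pair of symbols. This is a simplified sketch on characters, not GPT-2’s actual tokenizer (which works on UTF-8 bytes and adds pre-tokenization rules), and the corpus is made up. After a few merges, the frequent word “the” becomes a single token while the rare word “bean” stays split into characters.

```python
from collections import Counter


def most_frequent_pair(words):
    """Count adjacent symbol pairs in a {tuple_of_symbols: frequency} corpus."""
    pairs = Counter()
    for symbols, freq in words.items():
        for a, b in zip(symbols, symbols[1:]):
            pairs[(a, b)] += freq
    return pairs.most_common(1)[0][0] if pairs else None


def merge_pair(words, pair):
    """Replace every occurrence of `pair` with one merged symbol."""
    merged = {}
    for symbols, freq in words.items():
        out, i = [], 0
        while i < len(symbols):
            if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == pair:
                out.append(symbols[i] + symbols[i + 1])
                i += 2
            else:
                out.append(symbols[i])
                i += 1
        key = tuple(out)
        merged[key] = merged.get(key, 0) + freq
    return merged


def learn_bpe(corpus, num_merges):
    """Start from single characters and learn `num_merges` merge rules."""
    words = Counter(tuple(w) for w in corpus.split())
    merges = []
    for _ in range(num_merges):
        pair = most_frequent_pair(words)
        if pair is None:
            break
        merges.append(pair)
        words = merge_pair(words, pair)
    return merges, words


merges, words = learn_bpe("the the the dark dark the dark bean", num_merges=3)
print(merges)
print(words)
```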

This particular char-rnn implementation is set up to handle multiple categories of text. In this use case, it is able to make predictions for different book genres, e.g. Romance, Fantasy, Young Adult, etc.
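One common way to condition a char-rnn on a category (the PyTorch “Generating Names with a Character-Level RNN” tutorial does something similar) is to feed a one-hot genre vector alongside each character. Below is a minimal, untrained, pure-Python sketch of a single RNN step under that assumption; the class and weight names are hypothetical, biases are omitted, and this isn’t necessarily how the repo’s model is wired. The point is that the same character and hidden state yield a different next-character distribution depending on the genre.

```python
import math
import random


def one_hot(index, size):
    """Return a list of zeros with a 1.0 at `index`."""
    vec = [0.0] * size
    vec[index] = 1.0
    return vec


def matvec(matrix, vec):
    """Multiply a list-of-rows matrix by a vector."""
    return [sum(w * x for w, x in zip(row, vec)) for row in matrix]


def softmax(logits):
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


class ConditionalCharRNN:
    """Toy Elman RNN: each step's input is [genre one-hot] + [char one-hot]."""

    def __init__(self, n_genres, n_chars, hidden_size, seed=0):
        rng = random.Random(seed)

        def rand_matrix(rows, cols):
            # Random weights; a real model would learn these during training.
            return [[rng.gauss(0.0, 0.5) for _ in range(cols)] for _ in range(rows)]

        self.n_genres = n_genres
        self.n_chars = n_chars
        self.W_xh = rand_matrix(hidden_size, n_genres + n_chars)
        self.W_hh = rand_matrix(hidden_size, hidden_size)
        self.W_hy = rand_matrix(n_chars, hidden_size)

    def step(self, genre, char, hidden):
        """Consume one character under a genre; return (next-char probs, new hidden)."""
        x = one_hot(genre, self.n_genres) + one_hot(char, self.n_chars)
        pre = [a + b for a, b in zip(matvec(self.W_xh, x), matvec(self.W_hh, hidden))]
        new_hidden = [math.tanh(v) for v in pre]
        probs = softmax(matvec(self.W_hy, new_hidden))
        return probs, new_hidden


rnn = ConditionalCharRNN(n_genres=3, n_chars=5, hidden_size=8)
h0 = [0.0] * 8
romance_probs, _ = rnn.step(genre=0, char=2, hidden=h0)
fantasy_probs, _ = rnn.step(genre=1, char=2, hidden=h0)
print(romance_probs)
print(fantasy_probs)
```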

Training Data

The training data used for this model is a modified version of a Goodreads data scrape of 20K book titles. I transformed the CSV file into separate text files for the top 30 genres. The resulting split dataset can be found in my GitHub repo.
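The preprocessing step could look roughly like this. It’s a sketch under loud assumptions: the `title` and `genre` column names are hypothetical, and it assumes one genre per row, whereas the real Goodreads scrape may store genres differently. It counts titles per genre, keeps the `top_n` most common genres, and writes one title per line to `<genre>.txt`.

```python
import csv
import io
import os
import tempfile
from collections import Counter, defaultdict


def split_by_genre(csv_file, out_dir, top_n=30):
    """Write one titles-per-line text file for each of the top_n genres.

    Assumes hypothetical `title` and `genre` columns, one genre per row.
    Genre names are used directly as filenames.
    """
    titles_by_genre = defaultdict(list)
    for row in csv.DictReader(csv_file):
        genre = row["genre"].strip()
        title = row["title"].strip()
        if genre and title:
            titles_by_genre[genre].append(title)

    counts = Counter({g: len(ts) for g, ts in titles_by_genre.items()})
    written = []
    for genre, _ in counts.most_common(top_n):
        path = os.path.join(out_dir, f"{genre}.txt")
        with open(path, "w", encoding="utf-8") as f:
            f.write("\n".join(titles_by_genre[genre]) + "\n")
        written.append(path)
    return written


sample = io.StringIO(
    "title,genre\n"
    "Heart in the Dark,Romance\n"
    "Book of the Dark,Fantasy\n"
    "Kiss of the Dark,Romance\n"
)
with tempfile.TemporaryDirectory() as tmp:
    paths = split_by_genre(sample, tmp, top_n=1)
    names = [os.path.basename(p) for p in paths]
    with open(paths[0], encoding="utf-8") as f:
        romance_titles = f.read().splitlines()
print(names, romance_titles)
```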

Training this model took about 20 minutes on an NVIDIA GeForce GTX 1080 Ti GPU. Generating samples takes only a few seconds.
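Generation works by repeatedly asking the model for next-character scores and sampling one character at a time until an end-of-title character appears. Below is a minimal sketch of that loop with temperature sampling; `next_logits` is a hypothetical stand-in for the trained network, the toy model in the usage example is made up, and temperature is a common generation knob rather than something this post says the repo uses. Lower temperatures give safer, more repetitive titles; higher ones give stranger spellings.

```python
import math
import random


def sample_with_temperature(logits, temperature=1.0, rng=random):
    """Sample an index from unnormalized scores, scaled by temperature."""
    if temperature <= 0:
        # Treat temperature 0 as greedy decoding (always take the argmax).
        return max(range(len(logits)), key=lambda i: logits[i])
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    weights = [math.exp(z - m) for z in scaled]
    return rng.choices(range(len(logits)), weights=weights, k=1)[0]


def generate(next_logits, start, max_len=30, temperature=0.8, end_char="\n", seed=0):
    """Extend `start` one character at a time until end_char or max_len.

    `next_logits(prefix)` is hypothetical: it returns {char: score}.
    """
    rng = random.Random(seed)
    text = start
    while len(text) < max_len:
        scores = next_logits(text)
        chars = list(scores)
        idx = sample_with_temperature([scores[c] for c in chars], temperature, rng)
        if chars[idx] == end_char:
            break
        text += chars[idx]
    return text


TARGET = "King of the Dark\n"
ALPHABET = "abcdefghijklmnopqrstuvwxyzDK \n"


def toy_next_logits(prefix):
    # Toy "model": strongly prefers the next character of TARGET.
    wanted = TARGET[len(prefix)] if len(prefix) < len(TARGET) else "\n"
    return {c: (10.0 if c == wanted else 0.0) for c in ALPHABET}


print(generate(toy_next_logits, "K", temperature=0.0))
print(generate(toy_next_logits, "K", temperature=1.0, seed=1))
```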

Results

The following results are a selected sampling of outputs. Note that I’m mainly including examples that consist of real words, with a few exceptions.

Romance

Heart in the Dark
Years of the Dark
You the Book
The Stove to the Story

Fantasy

Growing the Dark
Book of the Dark
Red Sande

Fiction

In the Bead Store
Jen the Bead
King the Bean

Historical

A to the Bean
Other and Story

Science Fiction

Darke Sers
Voringe
In the Beantire

Mystery

Bed Singe
Kiss of the Dark
Red Story

Classics

A Mander of the Suckers
Gorden the Story of Merica

Childrens

Dark Book of the Story of the Sures of the Surating
Late
Story of the Bean

Paranormal

A Store of the Store
Red Store
Stariss and Storiss
Wind Store

New Adult

Live Me Life
Growing Me
In the Bean
Me the Bean

Poetry

Yours
Me

Erotica

Volle the Story of Men
King of the Dark
Dork of the Dark
Work of the Dark
Bed Storys of the Dark
Your Mind

Biography

Be the Life
On Anger and Of Mand Anger

Comically, many of the book titles revolve around beans, beads, stores, and darkness. While I did notice some subtle differences between genres, they don’t appear particularly drastic overall.

Dogspotting: Using Machine Learning to Draw Bounding Boxes around Dogs in Pictures

Dog in shark costume

I wanted to try out a computer vision project, and what better way to do that than to point out where dogs are in photos??

Project Overview

I’ve included a GitHub repo and Jupyter notebook for this project.

This project uses the ImageAI computer vision library for Python, which offers support for RetinaNet, YOLOv3, and TinyYOLOv3 algorithms for object detection. The model used is a RetinaNet model pretrained on the COCO dataset, also provided by ImageAI.

Official guide and documentation for ImageAI detection classes are provided as well.

Overall Impressions

I was pleasantly surprised at how easy out-of-the-box object detection has become. The ImageAI library supports custom object detection for the following categories:

person, bicycle, car, motorcycle, airplane, bus, train, truck, boat, traffic light, fire hydrant, stop sign, parking meter, bench, bird, cat, dog, horse, sheep, cow, elephant, bear, zebra, giraffe, backpack, umbrella, handbag, tie, suitcase, frisbee, skis, snowboard, sports ball, kite, baseball bat, baseball glove, skateboard, surfboard, tennis racket, bottle, wine glass, cup, fork, knife, spoon, bowl, banana, apple, sandwich, orange, broccoli, carrot, hot dog, pizza, donut, cake, chair, couch, potted plant, bed, dining table, toilet, tv, laptop, mouse, remote, keyboard, cell phone, microwave, oven, toaster, sink, refrigerator, book, clock, vase, scissors, teddy bear, hair dryer, toothbrush.

This made it very easy to detect dogs specifically! All I had to do was set up my project, download the pretrained model, and set a few parameters and filepaths. The entire project took only about 20 minutes from setup to output image.

Some parameters of interest:

# Enable only the dog and cat categories; every other category stays switched off.
custom_objects = detector.CustomObjects(dog=True, cat=True)

Any of the object categories can be included here. We are not just limited to dogs, and we can include as many categories as we want -- or potentially all of them -- in the same detector.

# Saves an annotated copy of the image to output_path and returns the list of detected objects.
detections = detector.detectCustomObjectsFromImage(input_image=input_path, output_image_path=output_path, custom_objects=custom_objects, minimum_percentage_probability=45)

minimum_percentage_probability refers to how confident the model should be before drawing a bounding box. We can set it to a low percentage, e.g. 15%, if we want it to flag everything it sees. We can set it to a high percentage, e.g. 85%, if we only want it to flag objects when the model is confident about what it’s detecting.
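The effect of the threshold can be sketched by filtering a list of detections by hand. ImageAI applies `minimum_percentage_probability` inside the detector, so this is only an illustration of the idea; the detections below are made up, and their dict shape (`name`, `percentage_probability`, `box_points`) is assumed to mirror what ImageAI’s detection methods return.

```python
def filter_detections(detections, minimum_percentage_probability):
    """Keep only detections whose confidence (0-100) meets the threshold."""
    return [
        d for d in detections
        if d["percentage_probability"] >= minimum_percentage_probability
    ]


# Made-up detections; box_points are assumed to be [x1, y1, x2, y2] pixels.
detections = [
    {"name": "dog", "percentage_probability": 92.1, "box_points": [34, 50, 210, 300]},
    {"name": "dog", "percentage_probability": 48.7, "box_points": [250, 80, 330, 190]},
    {"name": "cat", "percentage_probability": 17.3, "box_points": [5, 5, 60, 70]},
]

# Low thresholds flag everything; high thresholds keep only confident boxes.
for threshold in (15, 45, 85):
    kept = filter_detections(detections, threshold)
    print(threshold, [(d["name"], d["percentage_probability"]) for d in kept])
```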

Results

Overall, the results are fairly decent, especially considering how little time the setup took. Each bounding box marks a detected dog (or other animal, or bird) and is labeled with the object name and prediction probability.

Text Generation with GPT-2, OpenAI's Recently Released Language Model

Venus, planet of love Was destroyed by global warming, while the other suns have been ravaged by the tides of time. There are no suns and there are no tides, except the sun itself. A few suns exist now in Mars (the moon is now in the orbit of Sol and Jupiter), but they are gone in the future. (It is implied that we can go back home to Earth) Jupiter is the only other planet that is not the source; that planet would be the nearest known red planet to us. So is Earth.

The Earth's only visible source of energy is the sun itself. (In Greek it means "sun" or "heaven.") According to the Old Testament story, Jupiter was so cold that it was able to cause the death of children when they died in a ship. The only real star in the solar system that is capable of causing death is the sun, which must be one of the most powerful stars in the universe. Only the moon can cause death from its star at once, and Venus must be at least one of the most powerful star systems in the entire galaxy (more details here). Earth was never seen as an "open" planet.

Earlier this month, OpenAI released a new text generation model, called GPT-2. GPT-2 stands for “Generative Pre-Training 2”: generative, because we are generating text; pre-training, because instead of training the model for any one specific task, we’re using unsupervised “pre-training” such that the general model can perform on a variety of tasks; and 2, because it’s the second model using this approach, following the first GPT model.

TLDR: The model is pretty good at generating fiction and fantasy, but it’s bad at math and at telling jokes. Skip to the end for my favorite excerpts.

Model Overview

The GPT-2 model uses conditional probability language modeling with a Transformer neural network architecture that relies on self-attention mechanisms (inspired by attention mechanisms from image processing tasks) in lieu of recurrence or convolution. (Side note: interesting to see how advancements in neural networks for image and language processing co-evolve.)
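Here’s a minimal pure-Python sketch of the core idea behind self-attention in a left-to-right language model: each position takes a weighted average over itself and earlier positions, with weights from scaled dot products, and a causal mask keeps it from peeking at future tokens. It’s a simplification, not GPT-2’s actual layer: there are no learned query/key/value projections, no multiple heads, and no layer norm or feed-forward blocks.

```python
import math


def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def causal_self_attention(queries, keys, values):
    """Scaled dot-product attention where position t sees only positions <= t.

    queries/keys/values: lists of equal-length vectors, one per token position.
    Returns one output vector per position.
    """
    d_k = len(keys[0])
    outputs = []
    for t, q in enumerate(queries):
        # Causal mask: score only tokens 0..t, so no position looks ahead.
        scores = [dot(q, keys[s]) / math.sqrt(d_k) for s in range(t + 1)]
        weights = softmax(scores)
        out = [
            sum(w * values[s][j] for s, w in enumerate(weights))
            for j in range(len(values[0]))
        ]
        outputs.append(out)
    return outputs


x = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
out = causal_self_attention(x, x, x)
print(out[0])  # the first token can only attend to itself
```

Changing a later token leaves the outputs for earlier positions untouched, which is what lets the model be trained to predict each next token from only the tokens before it.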

The model is trained on about 8 million documents, or about 40 GB of text, from web pages. The dataset, scraped for this model, is called WebText, and is the result of scraping outbound links from Reddit with at least 3 karma. (Some thoughts on this later. See section on “Training Data”)

In the original GPT model, unsupervised pre-training was used as an initial step, followed by a supervised fine-tuning step for specific tasks, such as question answering. GPT-2, however, is evaluated using only the pre-trained model, without any supervised fine-tuning. In other words, it is tested in a zero-shot setting, and it still performs well there.

First Impressions

When I first saw the blog post, I was both very impressed and also highly skeptical of the results.


Predicting Readmission Risk after Orthopedic Surgery

My colleagues and I from the Clinical Research Informatics Core at Penn Medicine gave poster presentations at the Public Health session of the Symposium on Data Science and Statistics last week.

Here's the abstract:

Our project examined hospital readmissions after knee and hip replacement surgeries that took place within the University of Pennsylvania health system. We used a variety of information available within patient electronic health records and an assortment of machine learning tools to predict the risk of readmission for any given patient at the time of discharge after a primary joint replacement surgery. We faced challenges related to missing data. We used a number of different machine learning models such as logistic regression, random forest and gradient boosted trees. We also used an automated machine learning pipeline tool, TPOT, that uses a genetic algorithm to search through the machine learning model/parameter space to automatically suggest successful machine learning pipelines. We trained multiple models that predicted readmissions better than the existing clinical methods, with statistically significant increases in AUC over the clinical baseline. Finally, our models suggested a number of features useful for readmission prediction that are not used at all in the existing clinician model. We hope our new models can be used in practice to help target patients at high risk of readmission after joint replacement surgery, and to help inform which interventions may be most useful.
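For readers less familiar with AUC: it measures how well a model’s risk scores rank patients who were readmitted above those who weren’t. Below is a minimal sketch of the pairwise (Mann-Whitney) formulation with made-up labels and scores; it is not the study’s code or data, and the significance testing mentioned in the abstract isn’t shown. The nested loop is O(positives × negatives), which is fine for illustration; a library implementation would be the practical choice on real data.

```python
def roc_auc(labels, scores):
    """AUC = probability a random positive outranks a random negative.

    labels: 1 for readmitted, 0 otherwise. Ties count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Made-up example: a baseline score vs. a model score for six patients.
labels = [1, 0, 1, 0, 0, 1]
baseline = [0.6, 0.5, 0.4, 0.3, 0.7, 0.55]
model = [0.8, 0.3, 0.6, 0.2, 0.5, 0.7]
print(roc_auc(labels, baseline), roc_auc(labels, model))
```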

SDSS Poster Presentation

Music and Mood: Assessing the Predictive Value of Audio Features on Lyrical Sentiment

aka - what's the relationship between the audio features of a song and how positive or negative its lyrics are? 

aka - data analysis of my spotify music data + sentiment analysis + supervised machine learning

aka - my senior thesis

the full jupyter notebook used to conduct this data analysis can be found on my github here: Spotify Data Analysis

(pg. 32 and onward is just the full python jupyter notebook in the appendix.)

Computational Creativity

I gave a presentation this week about some applications of artificial neural networks in computational creativity. It consists of an overview and discussion of 3 different papers:

  1. A Computational Model of Poetic Creativity with Neural Network as Measure of Adaptive Fitness

  2. A Neural Algorithm of Artistic Style

  3. What Happens Next? Event Prediction Using a Compositional Neural Network Model (part of the What-If Machine project)


Here are the slides: