Dogspotting: Using Machine Learning to Draw Bounding Boxes around Dogs in Pictures

Dog in shark costume


I wanted to try out a computer vision project, and what better way to do that than to point out where dogs are in photos??

Project Overview

I’ve included a GitHub repo and Jupyter notebook for this project.

This project uses the ImageAI computer vision library for Python, which supports the RetinaNet, YOLOv3, and TinyYOLOv3 models for object detection. The model used is a RetinaNet model pretrained on the COCO dataset, also provided by ImageAI.

ImageAI also provides an official guide and documentation for its detection classes.

Overall Impressions

I was pleasantly surprised at how easy out-of-the-box object detection has become. The ImageAI library supports custom object detection for the following categories:


person, bicycle, car, motorcycle, airplane, bus, train, truck, boat, traffic light, fire hydrant, stop_sign, parking meter, bench, bird, cat, dog, horse, sheep, cow, elephant, bear, zebra, giraffe, backpack, umbrella, handbag, tie, suitcase, frisbee, skis, snowboard, sports ball, kite, baseball bat, baseball glove, skateboard, surfboard, tennis racket, bottle, wine glass, cup, fork, knife, spoon, bowl, banana, apple, sandwich, orange, broccoli, carrot, hot dog, pizza, donot, cake, chair, couch, potted plant, bed, dining table, toilet, tv, laptop, mouse, remote, keyboard, cell phone, microwave, oven, toaster, sink, refrigerator, book, clock, vase, scissors, teddy bear, hair dryer, toothbrush.


This made it very easy to detect dogs specifically! All I had to do was set up my project, download the pretrained model, and set a few parameters and filepaths. The entire project only took about 20 minutes from setup to output image.

Some parameters of interest:


custom_objects = detector.CustomObjects(dog=True, cat=True)


Any of the object categories can be included here. We are not just limited to dogs, and we can include as many categories as we want -- or potentially all of them -- in the same detector.


detections = detector.detectCustomObjectsFromImage(input_image=input_path, output_image_path=output_path, custom_objects=custom_objects, minimum_percentage_probability=45)


minimum_percentage_probability refers to how confident the model should be before drawing a bounding box. We can set it to a low percentage, e.g. 15%, if we want it to flag everything it sees. We can set it to a high percentage, e.g. 85%, if we only want it to flag objects when the model is confident about what it’s detecting.
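To make the threshold concrete, here is a minimal sketch in plain Python that mimics its effect on a list of detections. It does not call ImageAI. The helper filter_detections and the sample values are made up for illustration, and the dictionary keys (name, percentage_probability, box_points) are assumed to match the shape of the detections that ImageAI returns.

```python
def filter_detections(detections, minimum_percentage_probability=50.0, names=None):
    """Keep detections at or above the threshold, optionally limited to some names.

    Hypothetical helper for illustration; not part of ImageAI.
    """
    kept = []
    for det in detections:
        # Drop anything the model is less confident about than the threshold.
        if det["percentage_probability"] < minimum_percentage_probability:
            continue
        # Optionally restrict to a set of category names (like dog=True, cat=True).
        if names is not None and det["name"] not in names:
            continue
        kept.append(det)
    return kept


# Made-up detections for illustration only, not real model output.
sample = [
    {"name": "dog", "percentage_probability": 92.3, "box_points": [34, 50, 210, 300]},
    {"name": "cat", "percentage_probability": 47.1, "box_points": [220, 80, 330, 240]},
    {"name": "bird", "percentage_probability": 18.6, "box_points": [5, 5, 40, 40]},
]

for threshold in (15, 45, 85):
    kept = filter_detections(sample, threshold)
    print(threshold, [d["name"] for d in kept])
```

A low threshold keeps all three sample boxes, while a high one keeps only the confident dog detection, which is the trade-off described above.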


Overall, fairly decent results, especially considering how little time it took to set up. Each bounding box marks a detected dog (or other animal, or bird) and is labeled with the object name and prediction probability.

Text Generation with GPT-2, OpenAI's Recently Released Language Model

Earlier this month, OpenAI released a new text generation model, called GPT-2. GPT-2 stands for “Generative Pre-Training 2”: generative, because we are generating text; pre-training, because instead of training the model for any one specific task, we’re using unsupervised “pre-training” such that the general model can perform on a variety of tasks; and 2, because it’s the second model using this approach, following the first GPT model.

TLDR: The model is pretty good at generating fiction and fantasy, but it’s bad at math and at telling jokes. Skip to the end for my favorite excerpts.

Model Overview

The GPT-2 model uses conditional probability language modeling with a Transformer neural network architecture that relies on self-attention mechanisms (inspired by attention mechanisms from image processing tasks) in lieu of recurrence or convolution. (Side note: interesting to see how advancements in neural networks for image and language processing co-evolve.)
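As a rough illustration of the self-attention idea, here is a minimal sketch of scaled dot-product attention with a causal mask, written in plain Python. It is a toy under stated assumptions, not GPT-2's implementation: the real model uses learned query, key, and value projections, multiple attention heads, and many stacked layers, whereas this sketch reuses the input vectors directly. The function names are hypothetical.

```python
import math


def softmax(scores):
    """Turn raw scores into weights that sum to 1."""
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def causal_self_attention(queries, keys, values):
    """Scaled dot-product attention where position i sees only positions 0..i."""
    d_k = len(keys[0])
    outputs = []
    for i, q in enumerate(queries):
        # Score the query against the current and earlier positions only,
        # so a token never attends to tokens that come after it.
        scores = [
            sum(a * b for a, b in zip(q, keys[j])) / math.sqrt(d_k)
            for j in range(i + 1)
        ]
        weights = softmax(scores)
        # Output is a weighted average of the visible value vectors.
        out = [
            sum(w * values[j][c] for j, w in enumerate(weights))
            for c in range(len(values[0]))
        ]
        outputs.append(out)
    return outputs


# Two toy token vectors, used as queries, keys, and values alike.
tokens = [[1.0, 0.0], [0.0, 1.0]]
attended = causal_self_attention(tokens, tokens, tokens)
print(attended)
```

The causal mask is what ties this to conditional language modeling: each position is computed only from the tokens before it, so the model can be trained to predict the next token given the previous ones.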

The model is trained on about 8 million documents, or about 40 GB of text, from web pages. The dataset, called WebText, was built for this model by scraping outbound links from Reddit posts with at least 3 karma. (Some thoughts on this later. See section on “Training Data”)

In the original GPT model, the unsupervised pre-training was used as an initial step, followed by a supervised fine-tuning step for various tasks, such as question answering. GPT-2, however, is assessed using only the pre-training step, without the supervised fine-tuning. In other words, the model is evaluated, and performs well, in a zero-shot setting.

First Impressions

When I first saw the blog post, I was both very impressed and also highly skeptical of the results.

Black Patients Miss Out On Promising Cancer Drugs

Wrapped up my summer fellowship at ProPublica last week when our investigative piece was published! Give it a read here:

Black Patients Miss Out On Promising Cancer Drugs

A ProPublica analysis found that black people and Native Americans are under-represented in clinical trials of new drugs, even when the treatment is aimed at a type of cancer that disproportionately affects them.

The accompanying data methodology is here: How We Compared Clinical Trial and Cancer Incidence Data

This story was co-published with STAT and can also be found on Mother Jones.


For this story, I pitched the idea and did a ton of research, data analysis, reporting, and interviews, as well as all the data visualization. A huge thank you to my wonderful co-author Caroline Chen and amazing editor Sisi Wei!

The story was on the front page the day it published and seemed to be received well. I’ve learned so much from this fellowship and have been super grateful for this opportunity from ProPublica and the Google News Lab.


Update (statement of impact since our story was published):

Our story was featured on Information is the Best Medicine, a black-owned talk radio station in Pennsylvania, as well as Axios, Vice, Mother Jones and The Atlantic’s People v. Cancer forum. It was reprinted in the Boston Globe and Indianz, a Native American publication. Nonprofit BIO Ventures for Global Health also wrote an op-ed in response to our story, noting that “clinical trials are perpetuating existing health care disparities across the globe.”

In the course of interviewing these patients, we realized that many people don’t understand how trials work, which prompted us to create the Cancer Patient’s Guide to Clinical Trials. The guide has been shared by the Leukemia and Lymphoma Society.

Predicting Readmission Risk after Orthopedic Surgery

My colleagues and I from the Clinical Research Informatics Core at Penn Medicine gave poster presentations at the Public Health session of the Symposium on Data Science and Statistics last week.

Here's the abstract:

Our project examined hospital readmissions after knee and hip replacement surgeries that took place within the University of Pennsylvania health system. We used a variety of information available within patient electronic health records and an assortment of machine learning tools to predict the risk of readmission for any given patient at the time of discharge after a primary joint replacement surgery. We faced challenges related to missing data. We used a number of different machine learning models such as logistic regression, random forest and gradient boosted trees. We also used an automated machine learning pipeline tool, TPOT, that uses a genetic algorithm to search through the machine learning model/parameter space to automatically suggest successful machine learning pipelines. We trained multiple models that predicted readmissions better than the existing clinical methods, with statistically significant increases in AUC over the clinical baseline. Finally our models suggested a number of features useful for readmission prediction that are not used at all in the existing clinician model. We hope our new models can be used in practice to help target patients at high risk of readmission after joint replacement surgery, and to help inform which interventions may be most useful.
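For readers unfamiliar with AUC, the metric mentioned in the abstract, here is a minimal sketch of how it can be computed from predicted risk scores. It uses the pairwise definition (the probability that a randomly chosen readmitted patient gets a higher score than a randomly chosen non-readmitted one). The function name and the toy labels and scores are made up for illustration and are not from the study.

```python
def auc(labels, scores):
    """Probability that a random positive outranks a random negative (ties count half)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative example")
    # Compare every positive score against every negative score.
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


# Toy data: 1 = readmitted, 0 = not readmitted; scores are predicted risks.
toy_labels = [0, 0, 1, 1]
toy_scores = [0.1, 0.4, 0.35, 0.8]
print(auc(toy_labels, toy_scores))
```

An AUC of 0.5 corresponds to random ranking and 1.0 to perfect ranking, so a higher AUC than the clinical baseline means the model orders patients by risk more accurately.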

SDSS Poster Presentation

Machine Learning for Healthcare

Yesterday I gave a dev talk at Philly Tech Week on machine learning for healthcare, slides embedded below.

Description: "How are machine learning and data science being adopted in healthcare? From diagnostics, risk predictions, and more, this session will provide an overview of machine learning applications using electronic health records, walk through the process of how a model might be trained and used, and discuss methods for improving interpretability to augment medical decision-making."

Here's a link to the slides if you want to see my notes.

I think the talk went pretty well. In fact, I think I am actually a pretty good speaker, although I'm not sure how much I get out of speaking personally. The talk was pretty well attended, and I did receive a lot of positive feedback, so hopefully I inspired some people in healthcare or machine learning in some way or another.

Music and Mood: Assessing the Predictive Value of Audio Features on Lyrical Sentiment


aka - what's the relationship between the audio features of a song and how positive or negative its lyrics are? 

aka - data analysis of my spotify music data + sentiment analysis + supervised machine learning

aka - my senior thesis

the full jupyter notebook used to conduct this data analysis can be found on my github here: Spotify Data Analysis

(pg. 32 and onward is just the full python jupyter notebook in the appendix.)


Algorithmic Bias

I recently wrote a final paper for my Digital Culture course, titled "Algorithmic Bias and the Myth of Big Data Neutrality" - a really interesting and really important topic to consider as we move forward in our increasingly technological society.

Computational Creativity

I gave a presentation this week about some applications of artificial neural networks in computational creativity. It consists of an overview and discussion of 3 different papers:

  1. A Computational Model of Poetic Creativity with Neural Network as Measure of Adaptive Fitness

  2. A Neural Algorithm of Artistic Style

  3. What Happens Next? Event Prediction Using a Compositional Neural Network Model (part of the What-If Machine project)

Here are the slides:

penn play promotional profile pictures

for short, ppppp

adobe photoshop, lightroom  

fnar 247: environmental animation master post

I'm currently taking FNAR247: Environmental Animation. I am really enjoying the course so far, so I figured I'd create a post documenting my progress. We have 1-week mini projects that are relatively open-ended but are based loosely on some theme. These are mainly modeled and scripted in 3ds Max using MAXScript, with post-processing work done in Adobe After Effects.

Here are gif-ified (aka a bit too compressed for my liking but also gifs are cool) versions of a few of the animations:


adobe photoshop, digital painting.


nice to get back in the art groove for the summer.


rotary telephone

maya 2014, rendered w mental ray

ambient occlusion pass

made in 3d computer modeling w scott white

it was our first project, but i went back the past few days and tightened it up, added the cord, played around with lighting. it's crazy to see how much i've improved over the semester!

definitely taught me a lot of patience. i'm pretty sure i've attempted to make every single piece on this thing at least twice. i have at least 8 copies of the body, and i know i made the rotary face at least just as many times. i'd like to think all of the hard work paid off, though!

next project: maya dartboard...

project dump

some misc work from the year

morton salt girl - work in progress
3d modeling, sp '14
made in maya 2014 and mudbox

art design and digital culture, sp '14
made in blender and after effects

figure drawing, fall '13
