joke2punchline, punchline2joke: Using a Seq2Seq Neural Network to "Translate" Between Jokes and Punchlines

> what do you call an unpredictable chef ?
< ouch .

After implementing the seq2seq model, an encoder-decoder network with attention, I wanted to get it to translate between jokes and punchlines. The scripts, pre-trained models, and training data can be found on my GitHub repo.

Model Overview

The underlying model is a PyTorch implementation of the sequence-to-sequence (seq2seq) network: an encoder-decoder network with an attention mechanism. Seq2seq maps an input text sequence to an output text sequence, and the two sequences need not be the same length. A more typical application would be translating English to French or vice versa. For this project, I trained the seq2seq model on question-answer format jokes, so that it can output a punchline given a joke, or output a joke given a punchline.
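
The attention mechanism lets the decoder look back at every word of the input joke while producing each word of the output. As a rough, framework-free illustration (not the project's PyTorch code), here is plain dot-product attention for a single decoder step. The function names and toy vectors are hypothetical, and real implementations usually learn the scoring function instead of using a raw dot product.

```python
import math

def softmax(scores):
    # Subtract the max before exponentiating for numerical stability.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attend(decoder_hidden, encoder_outputs):
    """Return (attention_weights, context_vector) for one decoder step.

    decoder_hidden: the current decoder state (list of floats).
    encoder_outputs: one hidden vector per input token (list of lists).
    """
    # Score each input position by its dot product with the decoder state.
    scores = [sum(h * e for h, e in zip(decoder_hidden, enc)) for enc in encoder_outputs]
    weights = softmax(scores)
    # The context vector is the attention-weighted sum of the encoder outputs.
    dim = len(decoder_hidden)
    context = [sum(w * enc[i] for w, enc in zip(weights, encoder_outputs)) for i in range(dim)]
    return weights, context

# Toy example: three input tokens with 2-dimensional hidden states.
weights, context = attend([2.0, 0.0], [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
print([round(w, 3) for w in weights])  # [0.468, 0.063, 0.468]
```

The first and third input tokens align best with the decoder state, so they receive most of the weight, and the context vector passed to the decoder is dominated by them.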


Overall, the results were somewhat nonsensical, as one might expect. I curated the examples below based on whether they made me at least smile. Yes, I do laugh at my own jokes.

In the examples below, > marks the text input and < marks the model output. I selected only examples where the joke or punchline does not come directly from the training set, i.e. I excluded any output that simply regurgitated the original dataset.


For the following joke2punchline examples, the first line is a fake joke I wrote using words within the model’s joke vocabulary and fed into the model (>), and the second line is the punchline the model output (<).
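
A word-level seq2seq model can only embed words it saw during training, which is why each fake joke has to stick to the existing vocabulary. A small helper like the one below can flag unknown words before a prompt is fed in. The names build_vocab and out_of_vocab and the two training jokes are hypothetical, for illustration only.

```python
def build_vocab(sentences):
    """Collect every whitespace-separated token seen in the training sentences."""
    vocab = set()
    for sentence in sentences:
        vocab.update(sentence.split())
    return vocab

def out_of_vocab(prompt, vocab):
    """Return the tokens in prompt that the model has never seen, in order."""
    return [tok for tok in prompt.split() if tok not in vocab]

# Hypothetical toy training jokes, already normalized.
training_jokes = [
    "what do you call a sleeping bull ?",
    "why was the math book sad ?",
]
vocab = build_vocab(training_jokes)
print(out_of_vocab("what do you call a sad bull ?", vocab))  # []
print(out_of_vocab("what do you call a sad chef ?", vocab))  # ['chef']
```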

> what do you call an unpredictable chef ?
< ouch .

> what do you call a pile of pillowcases ?
< screw music

> why was the sun hospitalized ?
< because he was sitting on me .

> what do you call an unhappy star ?
< stay here !

> what do you call an unhappy star ?
< days numbered !

> what is a strawberry s favorite month ?
< a cat !

> who s there ?
< in the dictionary .

> what is red and bad for your teeth ?
< a a gummy bear

> what treasure can you find on a thursday ?
< the lettuce !

> when is a sun a moon ?
< a barber driver

> how many bananas can the moon eat ?
< fish and the train .

> what do you call an upside down egg ?
< the dough

> why was the sun unhappy ?
< because he wanted to tearable time paste !

> what did the skeleton say when they died the wrong year ?
< it march

> how many snails does it take to get to the moon ?
< to the hot hot shakespeare !

> why was the moon crying ?
< because he was on the deck !

> where do sheep go to school ?
< they take the mile bison of course !

> how many emotions does the sun have ?
< he got cents


For the following punchline2joke examples, I fed the model fake punchlines written using words within the model’s punchline vocabulary, and the model output a joke meant to lead to that punchline. The first line is the fake punchline I fed into the model (>), and the second line is the joke the model output (<).
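
One straightforward way to get a punchline2joke model from the same data is to swap each pair, so the punchline becomes the source sequence and the joke becomes the target, and then train a second model on the swapped pairs. This is an assumption about the setup, sketched with a hypothetical helper and a made-up pair.

```python
def reverse_pairs(pairs):
    """Swap (joke, punchline) pairs into (punchline, joke) pairs for punchline2joke."""
    return [(punchline, joke) for joke, punchline in pairs]

# Hypothetical normalized training pair, for illustration only.
joke2punchline_pairs = [("why did the cow cross the road ?", "to get to the udder side !")]
punchline2joke_pairs = reverse_pairs(joke2punchline_pairs)
print(punchline2joke_pairs[0])
# ('to get to the udder side !', 'why did the cow cross the road ?')
```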

> two parents
< what has four wheels and flies over the world ?

> watermelon concentrate
< when do you stop at green and go at the orange juice factory ?

> cool space
< what do you call an alligator in a vest with a scoop of ice cream ?

> meteor milk
< what do you call a cow that is crossing ?

> one two three four
< what did the buffalo say to the bartender ?

> jalapeno ketchup
< what do you call a boy with no socks on ?

> ice cream salad !
< what did the fish say to the younger chimney ?

> the impossible !
< what did the worker say when he swam into the wall ?

> both !
< what do you call a ghosts mom and dad ?

> pasta party
< what do you call the sound a dog makes ?

> salad party
< what did the buffalo say to the patella ?

> dreams party
< what do you call the sound with a fever ?

> a thesaurus and a dictionary
< what kind of shorts do all spies wear ?



Training Data

To train the model, I needed a dataset of clean jokes in question-answer text format.

I did find a dataset of question-answer format jokes, but the jokes were scraped from Reddit’s r/jokes subreddit. Going through the file, I found most of them highly problematic: they were often racist, sexist, queerphobic, etc. I would rather compile my own dataset than feed bad data into my model.

One option would be to filter this dataset using a set of “bad” keywords, but trying to filter a heavily biased dataset was less appealing to me than creating a new set entirely. An alternative could be to write a scraper for r/cleanjokes that keeps only question-answer format jokes, but I didn’t want to invest too much time/energy in this toy project, and I personally am not a fan of using Reddit for training data in general.

I ended up compiling my own small dataset of clean jokes in the question-answer format, consisting of a little over 500 jokes total. A major trade-off was that the model’s vocabulary is relatively limited, but I enjoyed the jokes much more and felt much better about the data I was feeding into the model.
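
The jokes and punchlines shown above appear in a normalized form: lowercase, apostrophes dropped (as in "strawberry s"), and sentence punctuation split off as its own token. A minimal sketch of preprocessing consistent with that form follows. It assumes a hypothetical raw file with one joke per line and a tab between the question and the punchline; the file layout and the names normalize and parse_pairs are illustrative, not taken from the repo.

```python
import re
import unicodedata

def normalize(text):
    # Strip accents by dropping combining marks, then lowercase and trim.
    text = "".join(
        c for c in unicodedata.normalize("NFD", text)
        if unicodedata.category(c) != "Mn"
    )
    text = text.lower().strip()
    # Put a space before sentence punctuation so it becomes its own token.
    text = re.sub(r"([.!?])", r" \1", text)
    # Collapse anything that is not a letter or . ! ? into a single space.
    text = re.sub(r"[^a-z.!?]+", " ", text)
    return text.strip()

def parse_pairs(lines):
    """Turn 'question<TAB>answer' lines into normalized (joke, punchline) pairs."""
    pairs = []
    for line in lines:
        if "\t" not in line:
            continue  # skip blank or malformed lines
        joke, punchline = line.split("\t", 1)
        pairs.append((normalize(joke), normalize(punchline)))
    return pairs

print(normalize("What is a strawberry's favorite month?"))
# what is a strawberry s favorite month ?
```

With only about 500 such pairs, every distinct token ends up in a small vocabulary, which is the trade-off described above.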

Teacher Forcing

For the joke2punchline and punchline2joke models, the teacher forcing ratio (the probability that, during training, the decoder is fed the ground-truth previous word rather than its own prediction) was set to 0.5. I’d be curious to adjust this parameter and see the results. I would expect a lower ratio to result in more nonsensical output, whereas a higher ratio would likely result in more outputs that come directly from the training set.
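
To make the ratio concrete, here is a framework-free sketch of one training-time decoding pass. The step function stands in for the real decoder, and the coin is flipped once per sequence, which is a common choice but an assumption here; all names are hypothetical.

```python
import random

def decode(step, target, start_token, teacher_forcing_ratio, rng=random):
    """Run one training-time decoding pass and return the predicted tokens.

    step(prev_token) -> predicted next token (a stand-in for the decoder).
    target: the reference output sequence as a list of tokens.
    """
    # Decide once for the whole sequence whether to use teacher forcing.
    use_teacher_forcing = rng.random() < teacher_forcing_ratio
    predictions = []
    prev = start_token
    for gold in target:
        pred = step(prev)
        predictions.append(pred)
        # Teacher forcing feeds the ground-truth token as the next input;
        # otherwise the decoder is fed its own prediction.
        prev = gold if use_teacher_forcing else pred
    return predictions

# Toy decoder that always guesses "x" and records what it was fed.
fed = []
def toy_step(prev):
    fed.append(prev)
    return "x"

decode(toy_step, ["ouch", "."], "<SOS>", teacher_forcing_ratio=1.0)
print(fed)  # ['<SOS>', 'ouch']  (ground truth fed back)
fed.clear()
decode(toy_step, ["ouch", "."], "<SOS>", teacher_forcing_ratio=0.0)
print(fed)  # ['<SOS>', 'x']  (the model's own guess fed back)
```

In a real training loop, the loss would be computed between the predictions and the target tokens in both cases; the ratio only changes what the decoder sees as its next input.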

I think an ideal setup would be to lower the teacher forcing ratio in addition to having a much larger training set.

Possible Extensions

I do think it would be fun to generate jokes and punchlines using an RNN or LSTM before feeding them into these models, so that less human intervention is needed (i.e. no manually written fake jokes/punchlines).

I also think the model would be way more fun to play with if I could train it on a much larger dataset, e.g. 10K+ jokes.