New Yorker Short Story Recommender

Abstract

The goal of this project is to build a recommendation system for New Yorker short stories. This recommender might be of use to any New Yorker fiction reader who wants to find similar stories based on topic, writing style, or author.

Design and Data

The data was scraped from the fiction section of The New Yorker website and includes 944 short stories by 333 different authors, published from 2001 to 2022. Story lengths range from 593 to 16,122 words, with an average of 5,717 words.

The initial data included each story’s URL, author, title, date of publication, and text.

Algorithms

Preprocessing

Topic Modeling

Clustering

Recommender

Given a short story, the recommender can return a list of short story recommendations based on the cosine similarity of topics, style group, and author. Users can also adjust the weights of these criteria based on their own preferences.
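The weighted scoring described above can be sketched roughly as follows. This is a minimal illustration, not the code from the project notebooks. It assumes each story has a topic vector (for example, from the topic model), a style group label (from clustering), and an author name. The function names, the dictionary layout, the 0-or-1 treatment of style and author matches, and the story titles are all hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def recommend(query, stories, w_topic=1.0, w_style=1.0, w_author=1.0, top_n=5):
    """Rank every other story by a weighted mix of three criteria.

    Assumption: style group and author are scored as exact matches (1.0 or 0.0),
    while topics are compared with cosine similarity.
    """
    weights = (w_topic, w_style, w_author)
    if min(weights) < 0 or sum(weights) == 0:
        raise ValueError("weights must be non-negative and not all zero")

    target = stories[query]
    scored = []
    for title, story in stories.items():
        if title == query:
            continue  # never recommend the query story itself
        topic_score = cosine_similarity(target["topics"], story["topics"])
        style_score = 1.0 if story["style"] == target["style"] else 0.0
        author_score = 1.0 if story["author"] == target["author"] else 0.0
        # Normalize by the total weight so scores stay comparable across settings.
        score = (
            w_topic * topic_score + w_style * style_score + w_author * author_score
        ) / sum(weights)
        scored.append((title, score))

    # Highest score first; sort is stable, so ties keep insertion order.
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_n]


# Toy data with placeholder titles, authors, and topic vectors.
toy_stories = {
    "Story A": {"topics": [1.0, 0.0], "style": 0, "author": "Author X"},
    "Story B": {"topics": [1.0, 0.0], "style": 0, "author": "Author Y"},
    "Story C": {"topics": [0.0, 1.0], "style": 1, "author": "Author X"},
    "Story D": {"topics": [0.6, 0.8], "style": 1, "author": "Author Z"},
}

balanced = recommend("Story A", toy_stories, top_n=3)
topics_only = recommend("Story A", toy_stories, w_style=0.0, w_author=0.0, top_n=3)
```

Setting a weight to zero removes that criterion entirely, so a reader who only cares about subject matter could pass `w_style=0.0, w_author=0.0`, as in the `topics_only` call.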

Tools

Notebooks:

  1. Web Scraping
  2. Data Cleaning
  3. Preprocessing with spaCy
  4. Topic Modeling
  5. Clustering
  6. Recommendation System

Results

Clustering to find the writing style of each short story (using Alice Munro stories): [Figure: clustering plot]

Recommender in action, recommending 5 short stories similar to “Unread Messages” by Sally Rooney: [Figure: screenshot of recommender output]

See GitHub for the code notebooks and a .pdf file of the Google Slides presentation about the project!