
About This Project
GoodReads Scraper is a robust, Python-based data collection tool built to extract structured book metadata from Goodreads. Originally developed to seed the database for my ReadUniverse project, it automates gathering large volumes of book data: titles, authors, descriptions, ISBNs, publication dates, categories, ratings, and reviews. The scraper offers configurable URL batching, rate limiting, and multiple output formats, including JSON and CSV, for efficient, scalable, and respectful data collection. This project showcases my ability to build practical tools that support real-world development needs and streamline the process of populating complex databases with high-quality dummy data.
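The rate-limiting behaviour described above can be sketched as a simple throttled fetch loop. This is an illustrative pattern, not the project's actual code; the function name `fetch_with_delay` and its parameters are hypothetical, and a stub fetch function stands in for real HTTP requests:

```python
import random
import time

def fetch_with_delay(urls, fetch, min_delay=1.0, max_delay=3.0):
    """Fetch each URL in turn, sleeping a randomized interval between
    requests so the target server is not hammered.

    `fetch` is any callable taking a URL and returning a response;
    the randomized gap makes the request pattern less bot-like.
    """
    results = []
    for i, url in enumerate(urls):
        results.append(fetch(url))
        if i < len(urls) - 1:  # no need to sleep after the final request
            time.sleep(random.uniform(min_delay, max_delay))
    return results

# Stub fetch so the sketch runs without touching the network.
pages = fetch_with_delay(
    [
        "https://www.goodreads.com/book/show/1",
        "https://www.goodreads.com/book/show/2",
    ],
    fetch=lambda url: f"<html>{url}</html>",
    min_delay=0.0,
    max_delay=0.0,
)
```

In a real run, `fetch` would wrap an HTTP client call with a timeout, and the delay bounds would come from the scraper's configuration.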
Features
Gather book URLs from specific Goodreads shelves, search results, or lists with configurable limits and flexible export options.
Scrape detailed book data including titles, authors, descriptions, ISBNs, publication dates, genres, ratings, and reviews.
Built-in configurable delays between requests to respect Goodreads’ servers and minimize the risk of IP blocking.
Export results in JSON or CSV formats for seamless integration with databases, analytics pipelines, or development seeding.
Automatically generate additional dummy fields such as price, stock, and likes to support testing and database seeding.
Clean separation between URL collection and data extraction modules for easy customization and maintenance.
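The dummy-field generation and JSON/CSV export features above can be sketched roughly as follows. The field names (`price`, `stock`, `likes`) come from the feature list, but the function names and record schema here are illustrative assumptions, not the project's real API:

```python
import csv
import io
import json
import random

def add_dummy_fields(book, rng=random):
    """Attach synthetic seeding fields (price, stock, likes) to a
    scraped record. Values are randomly generated for testing and
    database seeding; pass a seeded random.Random for reproducibility."""
    return {
        **book,
        "price": round(rng.uniform(4.99, 29.99), 2),
        "stock": rng.randint(0, 100),
        "likes": rng.randint(0, 5000),
    }

def export(records, fmt="json"):
    """Serialize a list of flat dict records to a JSON or CSV string."""
    if fmt == "json":
        return json.dumps(records, indent=2)
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=list(records[0].keys()))
    writer.writeheader()
    writer.writerows(records)
    return buf.getvalue()

books = [add_dummy_fields({"title": "Dune", "author": "Frank Herbert"},
                          rng=random.Random(0))]
csv_header = export(books, fmt="csv").splitlines()[0]
```

Keeping the export step behind a single `fmt` switch is what makes it easy to feed the same scraped records into a database seeder (JSON) or a spreadsheet/analytics pipeline (CSV).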
Tech Stack
