From fe9178672376327c88dd075da21ce030b3a598a2 Mon Sep 17 00:00:00 2001 From: Luis Roberto Mercado Diaz Date: Mon, 6 May 2024 12:31:19 -0400 Subject: [PATCH] Add functionality to fetch and save PubMed records by year This commit introduces significant enhancements to the PubMedDownloader class, enabling the automated fetching of PubMed records by topic and year using the NCBI E-utilities API. Each set of records fetched for a specific year is now saved in a dedicated text file formatted in MEDLINE style, facilitating easier access and organization of the data. Key Changes: - Added file saving functionality that organizes records into files named by topic and year. - Implemented error handling with retry logic and exponential backoff to manage network and API errors more robustly. - Configured the fetch function to retrieve records in MEDLINE format, ensuring that the data is structured according to PubMed's bibliographic standards. The records are stored in the './results/baseline_doc' directory, with each file representing a specific year's data on the chosen topic. This update is crucial for researchers needing structured and easily accessible bibliographic information from PubMed. --- code/step_1_data_collection_Luis.py | 24 ++++++++++++++++++++++++ 1 file changed, 24 insertions(+) diff --git a/code/step_1_data_collection_Luis.py b/code/step_1_data_collection_Luis.py index 5945465..eb607da 100644 --- a/code/step_1_data_collection_Luis.py +++ b/code/step_1_data_collection_Luis.py @@ -1,3 +1,27 @@ +""" +Code created by: lrmercadod +Date: 5/6/2024 10:43:45 +PubMed Record Fetcher and Saver + +This script is designed to automate the retrieval of PubMed records based on a specific topic and year. It uses the NCBI E-utilities API to fetch data in MEDLINE format and saves each year's data in a separate text file within a structured directory. + +Features: +- Fetches PubMed records using a combination of the topic and year to form a query. +- Retrieves data in MEDLINE format, which includes structured bibliographic information. +- Saves the fetched data into text files, organizing them by topic and year under the './results/baseline_doc' directory. +- Handles network and API request errors by implementing retry logic with exponential backoff. + +Usage: +- The user must provide an NCBI API key and email for using NCBI's E-utilities. +- Modify the 'topic' variable and the year range in the script to fetch records for different topics or years. + +Dependencies: +- BioPython for interacting with NCBI's E-utilities. +- requests for making HTTP requests. + +Example: +To use the script, simply run it in a Python environment with the necessary dependencies installed. Ensure that the API key and email are correctly set up in the script. +""" import requests from Bio import Entrez from io import StringIO