diff --git a/code/step_1_data_collection_Luis.py b/code/step_1_data_collection_Luis.py
index 5945465..eb607da 100644
--- a/code/step_1_data_collection_Luis.py
+++ b/code/step_1_data_collection_Luis.py
@@ -1,3 +1,27 @@
+"""
+Code created by: lrmercadod
+Date: 5/6/2024 10:43:45
+PubMed Record Fetcher and Saver
+
+This script is designed to automate the retrieval of PubMed records based on a specific topic and year. It uses the NCBI E-utilities API to fetch data in MEDLINE format and saves each year's data in a separate text file within a structured directory.
+
+Features:
+- Fetches PubMed records using a combination of the topic and year to form a query.
+- Retrieves data in MEDLINE format, which includes structured bibliographic information.
+- Saves the fetched data into text files, organizing them by topic and year under the './results/baseline_doc' directory.
+- Handles network and API request errors by implementing retry logic with exponential backoff.
+
+Usage:
+- The user must provide an NCBI API key and email for using NCBI's E-utilities.
+- Modify the 'topic' variable and the year range in the script to fetch records for different topics or years.
+
+Dependencies:
+- BioPython for interacting with NCBI's E-utilities.
+- requests for making HTTP requests.
+
+Example:
+To use the script, simply run it in a Python environment with the necessary dependencies installed. Ensure that the API key and email are correctly set up in the script.
+"""
 import requests
 from Bio import Entrez
 from io import StringIO