Luis #2

lrm22005 · 2024-05-06T16:31:36Z

Add functionality to fetch and save PubMed records by year

This commit introduces significant enhancements to the PubMedDownloader class, enabling the automated fetching of PubMed records by topic and year using the NCBI E-utilities API. Each set of records fetched for a specific year is now saved in a dedicated text file formatted in MEDLINE style, facilitating easier access and organization of the data.

Key Changes:

- Added file saving functionality that organizes records into files named by topic and year.
- Implemented error handling with retry logic and exponential backoff to manage network and API errors more robustly.
- Configured the fetch function to retrieve records in MEDLINE format, ensuring that the data is structured according to PubMed's bibliographic standards.
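
For illustration, the retry logic with exponential backoff might look roughly like the sketch below. It is not the code from this PR: `fetch_with_retry`, its parameters, and the default delays are hypothetical, and the `opener` and `sleep` arguments exist only so the behavior can be exercised without network access.

```python
import time
import urllib.request


def fetch_with_retry(url, max_retries=3, base_delay=1.0, timeout=30,
                     opener=urllib.request.urlopen, sleep=time.sleep):
    """Return the response body as text, retrying on network errors.

    Hypothetical helper (not from the PR). The wait between attempts
    doubles each time: base_delay, 2 * base_delay, 4 * base_delay, ...
    """
    for attempt in range(max_retries):
        try:
            # The timeout keeps a stalled request from hanging forever.
            with opener(url, timeout=timeout) as response:
                return response.read().decode("utf-8")
        except OSError:
            # urllib.error.URLError, HTTPError and socket timeouts are
            # all subclasses of OSError.
            if attempt == max_retries - 1:
                raise  # out of retries: surface the last error
            sleep(base_delay * 2 ** attempt)
```

With the defaults, a request that fails twice and then succeeds waits 1 second and then 2 seconds before returning the body.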

The records are stored in the './results/baseline_doc' directory, with each file representing a specific year's data on the chosen topic. This update is crucial for researchers needing structured and easily accessible bibliographic information from PubMed.
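
A minimal sketch of how a year's records could be written to that directory follows. The function name `save_records` and the `<topic>_<year>.txt` naming scheme are assumptions for illustration; the PR only states that files are named by topic and year.

```python
from pathlib import Path


def save_records(records_text, topic, year, output_dir="./results/baseline_doc"):
    """Write one year's MEDLINE-formatted text to its own file.

    Hypothetical helper: the file name pattern is an assumption.
    """
    out_dir = Path(output_dir)
    # Create the directory (and any parents) if it does not exist yet.
    out_dir.mkdir(parents=True, exist_ok=True)
    # Replace spaces so the topic is safe to use in a file name.
    safe_topic = topic.strip().replace(" ", "_")
    path = out_dir / f"{safe_topic}_{year}.txt"
    path.write_text(records_text, encoding="utf-8")
    return path
```

Because the directory is created on demand, the caller does not need to prepare `./results/baseline_doc` before the first run.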

This is my implementation, which downloads all the records for a specific year and writes them to a file. This approach works.

Enhance PubMed Data Fetching and Saving Mechanism

This commit introduces several enhancements to the PubMedDownloader class, improving its functionality and usability:

1. **Dynamic Year Querying**: Added support for dynamic querying by year. This allows users to specify a range of years for which the PubMed records should be fetched.
2. **Structured Data Saving**: Implemented functionality to save the fetched PubMed records in MEDLINE format. Each year's data is saved in a separate text file, named according to the query and the year, facilitating easier data management and retrieval.
3. **Error Handling**: Enhanced error handling capabilities to manage network issues and API limitations more robustly. This includes retry mechanisms with exponential backoff and timeout settings to prevent hanging requests.
4. **Directory Management**: Automated directory creation for storing the output files, ensuring that the user does not need to manually create directories before running the script.

These enhancements make the script more robust and user-friendly, suitable for handling large-scale data retrieval tasks in biomedical research environments.
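
As a rough sketch of the per-year querying, the E-utilities request URLs for a range of years could be built as shown below. The helper names (`esearch_url`, `efetch_url`, `yearly_search_urls`) and the `retmax` default are hypothetical, and the PR's actual code may use a client library instead. The query parameters themselves (`datetype`, `mindate`, `maxdate`, `rettype`, `retmode`) are standard E-utilities parameters.

```python
from urllib.parse import urlencode

EUTILS_BASE = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils"


def esearch_url(topic, year, retmax=100):
    """Build an ESearch URL restricted to one publication year."""
    params = {
        "db": "pubmed",
        "term": topic,
        "datetype": "pdat",   # filter on publication date
        "mindate": str(year),
        "maxdate": str(year),
        "retmax": retmax,     # assumed page size, adjust as needed
        "retmode": "json",
    }
    return f"{EUTILS_BASE}/esearch.fcgi?{urlencode(params)}"


def efetch_url(pmids):
    """Build an EFetch URL that returns the given PMIDs in MEDLINE text format."""
    params = {
        "db": "pubmed",
        "id": ",".join(pmids),
        "rettype": "medline",
        "retmode": "text",
    }
    return f"{EUTILS_BASE}/efetch.fcgi?{urlencode(params)}"


def yearly_search_urls(topic, start_year, end_year):
    """Map each year in the inclusive range to its ESearch URL."""
    return {year: esearch_url(topic, year)
            for year in range(start_year, end_year + 1)}
```

Each search URL returns the PMIDs for one year; passing those PMIDs to `efetch_url` gives the request for the MEDLINE text that ends up in that year's file.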


lrm22005 added 4 commits May 6, 2024 12:12

Found a method to fetch

afd4e22

This is my implementation, which downloads all the records for a specific year and writes them to a file. This approach works.

Delete pubmed_data.txt

77d0ba9

lrm22005 merged commit 1655990 into main May 6, 2024


lrm22005 commented May 6, 2024