
Luis #2

Merged
merged 4 commits into from
May 6, 2024

Conversation

lrm22005 (Owner) commented May 6, 2024

Add functionality to fetch and save PubMed records by year

This commit introduces significant enhancements to the PubMedDownloader class, enabling the automated fetching of PubMed records by topic and year using the NCBI E-utilities API. Each set of records fetched for a specific year is now saved in a dedicated text file formatted in MEDLINE style, facilitating easier access and organization of the data.

Key Changes:

  • Added file saving functionality that organizes records into files named by topic and year.
  • Implemented error handling with retry logic and exponential backoff to manage network and API errors more robustly.
  • Configured the fetch function to retrieve records in MEDLINE format, ensuring that the data is structured according to PubMed's bibliographic standards.
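The "fetch by topic and year in MEDLINE format" step maps onto two NCBI E-utilities calls: ESearch to find the PubMed IDs for one year, then EFetch to download them as MEDLINE text. Below is a minimal sketch of the request URLs, assuming the ESearch-then-EFetch history-server flow (`usehistory`, `WebEnv`, `query_key`) and a publication-date (`pdat`) filter. The function names `build_esearch_url` and `build_efetch_url` are hypothetical and not taken from `PubMedDownloader`.

```python
from urllib.parse import urlencode

# Public NCBI E-utilities base URL.
EUTILS_BASE = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils"


def build_esearch_url(topic: str, year: int, retmax: int = 10000) -> str:
    """ESearch URL that finds PubMed IDs for `topic` published in `year` (hypothetical helper)."""
    params = {
        "db": "pubmed",
        "term": topic,
        "datetype": "pdat",    # filter on publication date
        "mindate": str(year),  # identical min and max restrict the search to one year
        "maxdate": str(year),
        "retmax": retmax,
        "usehistory": "y",     # keep the result set on NCBI's server for EFetch
    }
    return f"{EUTILS_BASE}/esearch.fcgi?{urlencode(params)}"


def build_efetch_url(webenv: str, query_key: str, retstart: int = 0, retmax: int = 500) -> str:
    """EFetch URL returning one batch of records as MEDLINE plain text (hypothetical helper)."""
    params = {
        "db": "pubmed",
        "WebEnv": webenv,        # returned by the ESearch call
        "query_key": query_key,  # also returned by ESearch
        "retstart": retstart,    # offset into the result set, for batching
        "retmax": retmax,
        "rettype": "medline",    # MEDLINE tagged format
        "retmode": "text",
    }
    return f"{EUTILS_BASE}/efetch.fcgi?{urlencode(params)}"
```

Paging with `retstart`/`retmax` keeps each EFetch response small, which matters for topics that return thousands of records in a single year.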

The records are stored in the './results/baseline_doc' directory, with one file per year for the chosen topic. This gives researchers structured, easily accessible bibliographic data from PubMed.
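The one-file-per-year layout can be sketched as follows. This assumes a `<topic>_<year>.txt` naming scheme with the topic lower-cased and non-alphanumeric runs replaced by underscores; the PR only says files are named by topic and year, so the exact scheme and the helper names `output_path` and `save_year_records` are hypothetical.

```python
import os
import re

# Output directory named in the PR description.
BASE_DIR = "./results/baseline_doc"


def output_path(topic: str, year: int, base_dir: str = BASE_DIR) -> str:
    """Hypothetical naming scheme: <topic>_<year>.txt with unsafe characters replaced."""
    safe_topic = re.sub(r"[^A-Za-z0-9]+", "_", topic).strip("_").lower()
    return os.path.join(base_dir, f"{safe_topic}_{year}.txt")


def save_year_records(medline_text: str, topic: str, year: int, base_dir: str = BASE_DIR) -> str:
    """Write one year's MEDLINE text to its own file and return the file path."""
    os.makedirs(base_dir, exist_ok=True)  # create the directory on first use; no error if it exists
    path = output_path(topic, year, base_dir)
    with open(path, "w", encoding="utf-8") as fh:
        fh.write(medline_text)
    return path
```

Sanitizing the topic keeps queries such as `"heart failure"` from producing file names with spaces or shell-unfriendly characters.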

lrm22005 added 4 commits May 6, 2024 12:12
This is my implementation, which downloads the records and creates a file with all the information for a specific year. It is one working approach.
Enhance PubMed Data Fetching and Saving Mechanism

This commit introduces several enhancements to the PubMedDownloader class, improving its functionality and usability:

1. **Dynamic Year Querying**: Added support for dynamic querying by year. This allows users to specify a range of years for which the PubMed records should be fetched.

2. **Structured Data Saving**: Implemented functionality to save the fetched PubMed records in MEDLINE format. Each year's data is saved in a separate text file, named according to the query and the year, facilitating easier data management and retrieval.

3. **Error Handling**: Enhanced error handling capabilities to manage network issues and API limitations more robustly. This includes retry mechanisms with exponential backoff and timeout settings to prevent hanging requests.

4. **Directory Management**: Automated directory creation for storing the output files, ensuring that the user does not need to manually create directories before running the script.

These enhancements make the script more robust and easier to use, and suitable for large-scale data retrieval in biomedical research.
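The retry, timeout, and year-range behaviour described in items 1 and 3 can be sketched as below. This is a minimal illustration, not the `PubMedDownloader` code: `with_backoff`, `http_get`, and `fetch_years` are hypothetical names, the delays double from `base_delay` (1 s, 2 s, 4 s, ...), and which errors count as transient (`URLError`, which includes `HTTPError`, plus `TimeoutError`) is an assumption.

```python
import time
import urllib.error
import urllib.request
from functools import partial
from typing import Callable, Dict, Iterable, Tuple, Type, TypeVar

T = TypeVar("T")

# Assumed transient errors; URLError also covers HTTPError (e.g. HTTP 429 or 503).
TRANSIENT: Tuple[Type[BaseException], ...] = (urllib.error.URLError, TimeoutError)


def with_backoff(
    call: Callable[[], T],
    max_retries: int = 3,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Run `call`, retrying transient failures after base_delay * 2**attempt seconds."""
    for attempt in range(max_retries + 1):
        try:
            return call()
        except TRANSIENT:
            if attempt == max_retries:
                raise  # out of retries: surface the last error
            sleep(base_delay * 2 ** attempt)
    raise RuntimeError("unreachable")


def http_get(url: str, timeout: float = 30.0) -> str:
    """Plain GET; the timeout stops a stalled connection from hanging the script."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.read().decode("utf-8")


def fetch_years(
    topic: str,
    years: Iterable[int],
    fetch_year: Callable[[str, int], str],
    **retry_options,
) -> Dict[int, str]:
    """Fetch every year in the range, giving each year its own retry budget."""
    results: Dict[int, str] = {}
    for year in years:
        results[year] = with_backoff(partial(fetch_year, topic, year), **retry_options)
    return results
```

Passing `fetch_year` in as a function keeps the retry logic testable offline; in real use it would wrap `http_get` around the E-utilities requests. Independently of retries, NCBI asks clients to send no more than three requests per second without an API key (ten with one).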
lrm22005 merged commit 1655990 into main on May 6, 2024