Changelog
Source: NEWS.md
ethord 0.1.0 (2026-01-14)
Major Changes - Complete Dataset Restructuring
This release is a breaking change: the package datasets have been completely restructured to provide more granular, structured access to ETH Board Open Research Data (ORD) program information.
Removed Datasets
The following datasets from version 0.0.3 have been removed and replaced with a new structure:
- `portal` - Project metadata from the ORD portal
- `docs_detail` - Detailed project information
- `docs_proposal` - Project proposal documents
- `docs_report` - Project report data
New Dataset Structure
The package now provides 10 structured datasets organized by data type:
Application Data
- `application_budget` - Budget information from project applications (78 projects, 16 variables)
  - Personnel costs (senior staff, postdocs, PhDs, students, technicians)
  - Material, travel, publication, and other costs
  - Total budget per project
- `application_ethics` - Ethics-related information from applications (77 projects)
  - Human subjects research declarations
  - Animal research declarations
  - Recombinant DNA usage
  - Hazardous materials handling
- `application_metadata` - Core project metadata (105 projects, comprehensive variables)
  - Project titles, acronyms, abstracts
  - Keywords and descriptions
  - Start dates and duration
  - Funding amounts requested
- `application_metadata_applicants` - Detailed applicant information (114 records)
  - Applicant names, titles, ORCID IDs
  - Institutions, departments, laboratories
  - Primary vs. secondary applicants
- `application_metadata_keywords` - Project keywords (304 keyword entries)
  - Structured keyword associations
  - Searchable by project
- `application_metadata_work_packages` - Work package descriptions (274 entries)
  - Detailed work package information per project
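Because keywords now live in their own table, a keyword search amounts to a lookup in `application_metadata_keywords` followed by a match on `project_id`. The sketch below uses small toy data frames in base R; apart from `project_id`, the column names (`title`, `keyword`) are assumptions for illustration and may not match the shipped datasets.

```r
# Toy stand-ins for application_metadata and application_metadata_keywords.
# Only project_id is documented; `title` and `keyword` are hypothetical columns.
metadata <- data.frame(
  project_id = c("P1", "P2", "P3"),
  title = c("Open lab notebooks", "FAIR imaging data", "Shared code review")
)
keywords <- data.frame(
  project_id = c("P1", "P1", "P2", "P3"),
  keyword = c("reproducibility", "notebooks", "imaging", "reproducibility")
)

# The keyword table has one row per keyword, so first collect matching project IDs ...
hits <- unique(keywords$project_id[keywords$keyword == "reproducibility"])

# ... then keep the project-level rows for those IDs.
matched <- metadata[metadata$project_id %in% hits, ]
print(matched)
```

Keeping keywords in a separate long table lets a project carry any number of keywords without widening the project-level table.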
Reporting Data
- `report_metadata` - Metadata from project reports (64 reports)
  - Reporting periods
  - Project progress information
- `report_metadata_coapplicants` - Co-applicant information from reports (133 entries)
  - Co-applicant details and roles
- `report_output` - Project outputs and deliverables (1034 output records)
  - Publications, datasets, software
  - Presentations and other research outputs
  - Organized by output category and metrics
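With one row per output in `report_output`, summaries by category reduce to simple tabulations. A minimal base-R sketch on toy data follows; the `output_category` column name is an assumption, not the documented schema.

```r
# Toy stand-in for report_output; `output_category` is a hypothetical column name.
outputs <- data.frame(
  project_id = c("P1", "P1", "P2", "P2", "P2"),
  output_category = c("publication", "dataset", "software", "publication", "publication")
)

# Number of outputs per category across all projects
per_category <- table(outputs$output_category)
print(per_category)

# Cross-tabulation: outputs per project (rows) and category (columns)
per_project <- table(outputs$project_id, outputs$output_category)
print(per_project)
```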
Data Improvements
Data Quality Enhancements
- Standardized date formats - All dates converted to YYYY-MM-DD format
- Removed empty placeholder columns - Cleaner, more focused datasets
- Consistent project identifiers - `project_id` used across all datasets for joining
- Cleaned edge cases - Better handling of missing values and data inconsistencies
- Tidy data structure - Normalized tables following tidy data principles
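Because dates follow the ISO `YYYY-MM-DD` layout, base R's `as.Date()` parses them without a `format` argument, and the parsed values then sort and subtract chronologically. The sketch below assumes a date column that arrives as character text; the `start_date` column name and the values are illustrative only.

```r
# Toy stand-in for a table with an ISO-formatted date column;
# `start_date` is a hypothetical column name.
projects <- data.frame(
  project_id = c("P1", "P2"),
  start_date = c("2023-03-01", "2024-01-15")
)

# YYYY-MM-DD is as.Date()'s default input format, so no format string is needed.
projects$start_date <- as.Date(projects$start_date)

# Date arithmetic now works: which project started last, and how many days apart?
latest <- projects$project_id[which.max(projects$start_date)]
gap_days <- as.numeric(diff(projects$start_date))
```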
Package Infrastructure
Technical Improvements
- Updated minimum R version requirement to R >= 4.1.0 (required for the native pipe `|>`)
- Added package dependencies: dplyr, tidyr, stringr, lubridate, purrr, readr, fs
- Excluded raw data files from the package build for portability
- Added comprehensive package documentation
- Created `CLAUDE.md` for AI-assisted development
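R 4.1.0 introduced the native pipe `|>`, which rewrites `x |> f(y)` as `f(x, y)`, together with the `\(x)` shorthand for anonymous functions. A small base-R illustration of why the version floor matters:

```r
# Native pipe (R >= 4.1.0): the left-hand side becomes the first argument.
total <- c(1, 4, 9) |> sqrt() |> sum()   # same as sum(sqrt(c(1, 4, 9)))

# The right-hand side must be a function call; an anonymous function written
# with the \(x) shorthand (also R >= 4.1.0) is wrapped in parentheses and called.
doubled <- c(1, 2, 3) |> (\(x) x * 2)()
```

On R versions older than 4.1.0 both lines fail to parse, which is why the minimum version was raised.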
Migration Guide
Users of version 0.0.3 will need to update their code:
New Code (v0.1.0)
```r
library(ethord)
library(dplyr) # provides left_join() and filter()

# Access application metadata for Contribute projects
application_metadata |>
  left_join(project_mapping, by = "project_id") |>
  filter(project_category == "Contribute")

# Access budget information
application_budget |>
  left_join(project_mapping, by = "project_id")

# Access research outputs
report_output |>
  left_join(project_mapping, by = "project_id")
```

Key Changes for Users
- Join datasets using `project_id` - Datasets are now normalized; use joins to combine information
- Use `project_mapping` for categories - Project categories (Contribute/Explore/Establish) are in `project_mapping`
- More granular access - Access specific data types directly instead of one large table
- Better column names - All variables follow consistent naming conventions
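Since the datasets are normalized, per-category figures come from joining a data table to `project_mapping` on `project_id` and then aggregating. The base-R sketch below shows that join-then-summarize pattern on toy data; `project_id` and `project_category` appear in this changelog, while the `total_budget` column and all values are hypothetical.

```r
# Toy stand-ins for application_budget and project_mapping.
# `total_budget` is a hypothetical column name; values are made up.
budget <- data.frame(
  project_id = c("P1", "P2", "P3", "P4"),
  total_budget = c(150000, 90000, 200000, 60000)
)
mapping <- data.frame(
  project_id = c("P1", "P2", "P3", "P4"),
  project_category = c("Explore", "Contribute", "Establish", "Contribute")
)

# Left join on the shared key: keep every budget row and attach its category.
joined <- merge(budget, mapping, by = "project_id", all.x = TRUE)

# Summarize the joined table: summed budget per project category.
by_category <- aggregate(total_budget ~ project_category, data = joined, FUN = sum)
print(by_category)
```

`merge(..., all.x = TRUE)` plays the same role as `dplyr::left_join()`: budget rows without a matching mapping entry are kept with a missing category rather than dropped.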