# Load required packages for data analysis with ethord R data package
library(ggthemes)
library(tidyverse)
library(ggtext)
library(gt)
library(ethord)

Publishing Open Metadata for Open Research Data Projects of the ETH Domain
This manuscript presents an analysis of ETH Board Open Research Data (ORD) Program metadata and is currently a work in progress. It serves as an internal reporting tool during the data collection and validation phase. The data is not yet complete and has not been fully validated. Results and interpretations should be considered preliminary and subject to revision as additional project data becomes available and quality checks are completed.
The ETH Domain Open Research Data (ORD) Program represents a significant investment in advancing open research practices across Swiss federal institutions. This report provides a draft analysis of data extracted from 76 out of 96 funded projects, focusing on the program’s structure, budget distribution, and metadata accessibility.
The primary goal of the measure is to support ETH researchers in engaging in and developing ORD practices, and in becoming ORD leaders in their fields.
The program has funded 96 projects with a total investment of 15 million CHF.
The ETH Domain maintains an ORD portal that showcases funded projects. While the portal provides basic information, there are significant opportunities for improvement in data accessibility and structured metadata provision.

The current portal has several limitations:

The following visualization shows how projects are distributed across institutions and project categories using the ethord package. It’s a first example of how we can use the structured metadata to analyze the program’s reach and impact.
application_metadata |>
  # Label each project category with its funding ceiling
  mutate(category = case_when(
    project_category == "Contribute" ~ "Contribute (30k)",
    project_category == "Explore" ~ "Explore (150k)",
    project_category == "Establish" ~ "Establish (1.5m)"
  )) |>
  # Order the categories from smallest to largest funding ceiling
  mutate(category = factor(category,
                           levels = c("Contribute (30k)",
                                      "Explore (150k)",
                                      "Establish (1.5m)"))) |>
  count(main_applicant_institution, category) |>
  mutate(main_applicant_institution = str_wrap(main_applicant_institution, width = 30)) |>
  ggplot(aes(x = fct_reorder(main_applicant_institution, n),
             y = n,
             fill = category)) +
  geom_col(position = "dodge") +
  geom_label(aes(label = n),
             position = position_dodge(width = 0.9),
             show.legend = FALSE,
             color = "white",
             fontface = "bold",
             size = 3) +
  coord_flip() +
  labs(
    title = "Open Research Data Program of the ETH Board",
    subtitle = "Number of funded projects per institution of lead applicant and project category",
    y = "Number of projects",
    x = NULL,
    fill = "Project category:"
  ) +
  scale_fill_colorblind() +
  theme_minimal(base_size = 10) +
  theme(panel.grid.major.y = element_blank(),
        axis.text.y = element_text(size = 8))
Data from: Massari, Schöbitz, and Tilley (2025)
While proposals, scientific reports, and lists of outputs contain valuable information, this data is not publicly available as open, structured, machine-readable data. This creates several challenges:
Without structured metadata, we cannot easily answer questions such as:
The primary limitations stem from confidentiality requirements:
This has several consequences:
Our approach to addressing these challenges:
The ethord R data package provides structured, open access to metadata from the ETH Domain ORD Program.
Package details:
Reach out to project leads to:
application_metadata - Core project info, timeline, budget
application_budget - Detailed budget breakdown by category
application_ethics - Ethics considerations and flags
application_metadata_applicants - Co-applicant information
application_metadata_keywords - Research keywords
application_metadata_work_packages - Work package structure
report_metadata - Project reports and updates
report_output - Project outputs and user reach
report_metadata_coapplicants - Co-applicant contributions

How is the budget distributed across different cost categories? We can answer this question using the structured metadata in the ethord package.
The results in Table 1 show that program funds were used largely for personnel costs, which account for 83.6% of the total budget across the projects analyzed. This substantial investment in human resources reflects the program’s emphasis on capacity building and expertise development. An important consideration for future program planning is whether funded staff were already employed at the institutions or were hired specifically for these projects. Any follow-up program should also consider sustainability mechanisms: many staff hired for these projects may face funding gaps when projects conclude, which could limit the long-term impact of the expertise and infrastructure developed.
# Calculate total budgets by category
budget_summary <- application_budget |>
  summarise(
    Personnel = sum(total_budget_personnel_total_direct, na.rm = TRUE),
    Travel = sum(total_budget_travel, na.rm = TRUE),
    Equipment = sum(total_budget_equipment, na.rm = TRUE),
    `Other Direct` = sum(total_budget_other_total_direct, na.rm = TRUE),
    Subcontracting = sum(total_budget_subcontracting, na.rm = TRUE)
  ) |>
  pivot_longer(cols = everything(),
               names_to = "cost_category",
               values_to = "total_chf") |>
  filter(total_chf > 0) |>
  mutate(percent = total_chf / sum(total_chf) * 100) |>
  arrange(desc(total_chf))

# Display as formatted table
budget_summary |>
  gt() |>
  tab_header(
    title = "ORD Program Budget Distribution",
    subtitle = str_glue("Cost breakdown across {nrow(application_metadata)} funded projects")
  ) |>
  cols_label(
    cost_category = "Cost Category",
    total_chf = "Total (CHF)",
    percent = "Percentage"
  ) |>
  fmt_number(
    columns = total_chf,
    decimals = 0,
    use_seps = TRUE
  ) |>
  # Values are already percentages, so do not rescale them
  fmt_percent(
    columns = percent,
    decimals = 1,
    scale_values = FALSE
  ) |>
  tab_style(
    style = list(
      cell_text(weight = "bold")
    ),
    locations = cells_column_labels()
  ) |>
  tab_options(
    table.font.size = px(14),
    heading.title.font.size = px(18),
    heading.subtitle.font.size = px(14)
  )

Table 1: ORD Program Budget Distribution
Cost breakdown across 76 funded projects

| Cost Category | Total (CHF) | Percentage |
|---|---|---|
| Personnel | 6,701,489 | 83.6% |
| Other Direct | 689,569 | 8.6% |
| Equipment | 302,240 | 3.8% |
| Travel | 181,770 | 2.3% |
| Subcontracting | 143,000 | 1.8% |
Data from: Massari, Schöbitz, and Tilley (2025)
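The percentages in Table 1 can be re-derived from the published totals alone. The following is a minimal sketch in base R, assuming only the five CHF totals shown in the table; the object names `budget`, `share`, and `table1_check` are illustrative and not part of the ethord package.

```r
# Totals copied from Table 1 (CHF); base R only, so no packages are needed
budget <- c(
  Personnel      = 6701489,
  `Other Direct` = 689569,
  Equipment      = 302240,
  Travel         = 181770,
  Subcontracting = 143000
)

# Share of each cost category in the summed budget, in percent
share <- budget / sum(budget) * 100

# Assemble a small check table mirroring the columns of Table 1
table1_check <- data.frame(
  cost_category = names(budget),
  total_chf     = unname(budget),
  percent       = round(unname(share), 1),
  stringsAsFactors = FALSE
)

print(table1_check)
```

Each share is the category total divided by the sum of all five categories, which is the same calculation as the `mutate(percent = ...)` step in the pipeline above.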
A treemap provides an alternative view of the budget distribution, with area proportional to spending.
# Reshape the budget table to one row per project and budget item
application_budget_long <- application_budget |>
  select(project_id, !ends_with("total_direct"), -phase, -total_budget_total_costs) |>
  pivot_longer(cols = !project_id, names_to = "budget_item", values_to = "amount") |>
  filter(!is.na(amount), amount > 0) |>
  mutate(budget_item = str_remove(budget_item, "total_budget_")) |>
  mutate(
    # Derive the main category from the column name prefix
    main_category = case_when(
      str_starts(budget_item, "personnel_") ~ "Personnel",
      str_starts(budget_item, "other_") ~ "Other Costs",
      budget_item == "travel" ~ "Travel",
      budget_item == "equipment" ~ "Equipment",
      budget_item == "subcontracting" ~ "Subcontracting",
      TRUE ~ "Other Costs"
    ),
    # Turn the remainder of the column name into a readable label
    sub_category = str_remove(budget_item, "^(personnel|other)_") |>
      str_replace_all("_", " ") |>
      str_to_title()
  )

library(treemap)

# Prepare data for treemap
treemap_data <- application_budget_long |>
  group_by(main_category, sub_category) |>
  summarise(total = sum(amount), .groups = "drop") |>
  mutate(
    label = str_glue("{sub_category}\nCHF {scales::comma(total, accuracy = 1)}")
  )

# Create treemap with annotations
treemap(
  treemap_data,
  index = c("main_category", "sub_category"),
  vSize = "total",
  vColor = "main_category",
  type = "categorical",
  palette = "Set2",
  title = "Budget Distribution Treemap",
  fontsize.labels = c(14, 10),
  fontcolor.labels = c("white", "black"),
  fontface.labels = c(2, 1),
  bg.labels = c("transparent"),
  align.labels = list(c("center", "center"), c("center", "center")),
  overlap.labels = 0.5,
  border.col = c("white", "gray90"),
  border.lwds = c(4, 2)
)
Data from: Massari, Schöbitz, and Tilley (2025)
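The reshaping step above derives a main category and a subcategory from each budget column name. The sketch below replays that string logic in base R on a handful of hypothetical column names. The names in `budget_cols` are assumptions modelled on the `total_budget_`, `personnel_`, and `other_` prefixes used in the code, and `classify_budget_item()` is an illustrative helper, not a function from ethord.

```r
# Hypothetical column names, modelled on the prefixes used in the code above
budget_cols <- c(
  "total_budget_personnel_senior_staff",
  "total_budget_personnel_postdocs",
  "total_budget_other_publication_fees",
  "total_budget_travel",
  "total_budget_equipment"
)

# Illustrative helper: map column names to main category and subcategory
classify_budget_item <- function(cols) {
  # Drop the common prefix shared by all budget columns
  item <- sub("^total_budget_", "", cols)

  # Main category from the remaining prefix, with "Other Costs" as fallback
  main <- ifelse(startsWith(item, "personnel_"), "Personnel",
          ifelse(startsWith(item, "other_"), "Other Costs",
          ifelse(item == "travel", "Travel",
          ifelse(item == "equipment", "Equipment",
          ifelse(item == "subcontracting", "Subcontracting", "Other Costs")))))

  # Subcategory label: strip the group prefix, replace underscores, title-case
  sub_label <- sub("^(personnel|other)_", "", item)
  sub_label <- gsub("_", " ", sub_label)
  sub_label <- gsub("\\b([a-z])", "\\U\\1", sub_label, perl = TRUE)

  data.frame(budget_item = item,
             main_category = main,
             sub_category = sub_label,
             stringsAsFactors = FALSE)
}

classified <- classify_budget_item(budget_cols)
print(classified)
```

Items without a `personnel_` or `other_` prefix, such as `travel`, keep their own name as the subcategory, which is why Travel, Equipment, and Subcontracting appear as single-tile groups in the treemap.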
Table 2 provides a detailed breakdown of the budget allocation, with particular focus on the Personnel category. Within the personnel budget of approximately CHF 7.4 million, the distribution reveals investment across different career stages: Senior Staff accounts for 40.4% (CHF 3.0 million across 25 projects) and Postdocs for 30.9% (CHF 2.3 million across 29 projects). The “Other” category, which likely encompasses technical staff, research assistants, and other support roles, comprises 23.8% (CHF 1.8 million across 43 projects). Student positions received the smallest allocation at 4.8% (CHF 356,000 across 23 projects). This distribution suggests that projects prioritized experienced researchers and professional staff.
# Create detailed budget table with grouping and totals
budget_table_data <- application_budget_long |>
  group_by(main_category, sub_category) |>
  summarise(total_chf = sum(amount), .groups = "drop") |>
  group_by(main_category) |>
  mutate(
    category_total = sum(total_chf),
    percent_of_category = total_chf / category_total * 100
  ) |>
  ungroup() |>
  arrange(desc(category_total), desc(total_chf))

budget_table_data |>
  gt(groupname_col = "main_category") |>
  tab_header(
    title = "Detailed Budget Breakdown",
    subtitle = "Data of 76/95 projects"
  ) |>
  cols_label(
    sub_category = "Subcategory",
    total_chf = "Amount (CHF)",
    percent_of_category = "% of Category"
  ) |>
  # Add summary rows for each group
  summary_rows(
    groups = everything(),
    columns = total_chf,
    fns = list(
      "Category Total" = ~sum(., na.rm = TRUE)
    ),
    fmt = ~fmt_number(., decimals = 0, use_seps = TRUE)
  ) |>
  # Add grand total row
  grand_summary_rows(
    columns = total_chf,
    fns = list(
      "Grand Total" = ~sum(., na.rm = TRUE)
    ),
    fmt = ~fmt_number(., decimals = 0, use_seps = TRUE)
  ) |>
  fmt_number(
    columns = total_chf,
    decimals = 0,
    use_seps = TRUE
  ) |>
  fmt_percent(
    columns = percent_of_category,
    decimals = 1,
    scale_values = FALSE
  ) |>
  tab_style(
    style = cell_text(weight = "bold"),
    locations = cells_row_groups()
  ) |>
  # Shade subcategories that make up more than half of their category
  tab_style(
    style = cell_fill(color = "gray95"),
    locations = cells_body(columns = everything(), rows = percent_of_category > 50)
  ) |>
  tab_style(
    style = list(
      cell_fill(color = "lightblue"),
      cell_text(weight = "bold")
    ),
    locations = cells_summary()
  ) |>
  tab_style(
    style = list(
      cell_fill(color = "steelblue"),
      cell_text(weight = "bold", color = "white")
    ),
    locations = cells_grand_summary()
  ) |>
  tab_options(
    table.font.size = px(11),
    heading.title.font.size = px(16),
    heading.subtitle.font.size = px(12),
    row_group.background.color = "lightblue",
    row_group.font.weight = "bold",
    summary_row.background.color = "lightblue"
  ) |>
  cols_hide(columns = category_total)

Table 2: Detailed Budget Breakdown
Data of 76/95 projects

| Subcategory | Amount (CHF) | % of Category |
|---|---|---|
| Personnel | | |
| Senior Staff | 2,979,824 | 40.4% |
| Postdocs | 2,280,533 | 30.9% |
| Other | 1,752,967 | 23.8% |
| Students | 355,750 | 4.8% |
| Category Total | 7,369,074 | — |
| Equipment | | |
| Equipment | 302,240 | 100.0% |
| Category Total | 302,240 | — |
| Other Costs | | |
| Conferences Workshops | 165,455 | 73.3% |
| Publication Fees | 35,250 | 15.6% |
| Other | 24,927 | 11.0% |
| Category Total | 225,632 | — |
| Travel | | |
| Travel | 181,770 | 100.0% |
| Category Total | 181,770 | — |
| Subcontracting | | |
| Subcontracting | 143,000 | 100.0% |
| Category Total | 143,000 | — |
| Grand Total | 8,221,716 | — |
Data from: Massari, Schöbitz, and Tilley (2025)
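The category totals, within-category percentages, and grand total in Table 2 can be cross-checked from the subcategory amounts. The following is a minimal base R sketch, assuming only the amounts printed in the table; `detail`, `category_totals`, and `share_of_grand_total` are illustrative names.

```r
# Subcategory amounts copied from Table 2 (CHF)
detail <- data.frame(
  main_category = c(rep("Personnel", 4), "Equipment",
                    rep("Other Costs", 3), "Travel", "Subcontracting"),
  sub_category  = c("Senior Staff", "Postdocs", "Other", "Students", "Equipment",
                    "Conferences Workshops", "Publication Fees", "Other",
                    "Travel", "Subcontracting"),
  total_chf     = c(2979824, 2280533, 1752967, 355750, 302240,
                    165455, 35250, 24927, 181770, 143000),
  stringsAsFactors = FALSE
)

# ave() repeats each group's sum on every row of that group,
# which plays the role of group_by() + mutate() in the pipeline above
detail$category_total <- ave(detail$total_chf, detail$main_category, FUN = sum)
detail$percent_of_category <- round(detail$total_chf / detail$category_total * 100, 1)

# Totals per main category and across the whole table
category_totals <- tapply(detail$total_chf, detail$main_category, sum)
grand_total <- sum(detail$total_chf)

# Share of each main category in the grand total, in percent
share_of_grand_total <- round(category_totals / grand_total * 100, 1)

print(detail[, c("main_category", "sub_category", "total_chf", "percent_of_category")])
print(share_of_grand_total)
```

With these amounts, Personnel comes to about 89.6% of the Table 2 grand total. This differs from the personnel share in Table 1 because the two tables are built from different budget columns.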
The ethord package has a comprehensive website with documentation, vignettes, and examples.
Visit the package website: global-health-engineering.github.io/ethord/
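Once the package is installed, its datasets can also be listed directly from R. The sketch below uses only base R (`requireNamespace()` and `utils::data()`). `list_ethord_datasets()` is an illustrative helper, not a function from ethord, and it returns an empty vector when the package is not available locally.

```r
# Illustrative helper: list the datasets shipped with ethord, if installed
list_ethord_datasets <- function() {
  if (!requireNamespace("ethord", quietly = TRUE)) {
    message("Package 'ethord' is not installed; see the package website for instructions.")
    return(character(0))
  }
  # data() returns an object whose $results matrix has an "Item" column
  available <- utils::data(package = "ethord")$results
  sort(unname(available[, "Item"]))
}

datasets <- list_ethord_datasets()
print(datasets)
```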
Recommendations for future program iterations:
How could we make such administrative data “open by default” in the future?
Switzerland has established a comprehensive framework for open data publication:
Federal Foundation: The Open Government Data Strategy 2019-2023 (Swiss Federal Council 2019) established the “open by default” principle for all federal agencies.
Legal Mandate: Federal Act EMBAG Article 10 (Swiss Federal Assembly 2024) legally requires open data publication unless restricted by privacy or security.
Implementation: OGD Masterplan 2024-2027 (Federal Statistical Office 2024a) operationalizes through:
Switzerland is unique in having a legal mandate for open by default, not just policy recommendations. This creates accountability and consistency across agencies. The FAIR alignment shows how government OGD principles apply directly to research data contexts.
The ethord package demonstrates how structured, open metadata can enhance transparency and enable deeper analysis of research funding programs. By applying FAIR principles to administrative data, we create opportunities for better understanding program outcomes and researcher engagement with open research data practices.
All data and analyses presented in this report are available through the ethord R package (Massari, Schöbitz, and Tilley 2025):