
The goal of ghedata is to share data resources that document the work of the Global Health Engineering group at ETH Zurich. The data can be used for research, teaching, and learning. The data are available as an R data package and as CSV and XLSX files.

Installation

You can install the development version of ghedata from GitHub with:

# install.packages("devtools")
devtools::install_github("global-health-engineering/ghedata")
## Run the following line in the console if you don't have these packages installed
## install.packages(c("dplyr", "knitr", "readr", "stringr", "gt", "kableExtra", "ggthemes", "ggplot2"))
library(dplyr)
library(knitr)
library(readr)
library(stringr)
library(gt)
library(kableExtra)
library(ggthemes)
library(ggplot2)

Alternatively, you can download the individual datasets as a CSV or XLSX file from the table below.

  1. Click Download CSV. A window opens that displays the CSV in your browser.
  2. Right-click anywhere inside the window and select “Save Page As…”.
  3. Save the file in a folder of your choice.
dataset CSV XLSX
people Download CSV Download XLSX
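If you work from a downloaded CSV rather than the R package, column types have to be restored after reading, because a CSV stores dates as plain text. The sketch below is a minimal base-R example under stated assumptions: the helper `read_people_csv()` and the file name `people.csv` are hypothetical, and the demo writes two of the rows shown in the `head(3)` output further down to a temporary file. `readr::read_csv()`, which the setup code loads, typically detects ISO dates like these on its own.

```r
# Minimal sketch: read a saved copy of the people CSV and restore column types.
# read_people_csv() is a hypothetical helper, not part of ghedata.
read_people_csv <- function(path) {
  people <- read.csv(path, stringsAsFactors = FALSE)
  # The CSV stores dates as text; convert start_date back to the Date class
  people$start_date <- as.Date(people$start_date)
  people$year <- as.numeric(people$year)
  people
}

# Demo: write two rows from the README to a temporary file and read them back
demo_path <- tempfile(fileext = ".csv")
writeLines(c(
  "title,degree,type,b_m_student,start_date,year,thesis_title",
  "2021-bsc-thesis-225343,bsc,thesis,yes,2021-02-01,2021,Self-cleaning Preliminary Screening Mechanisms to Improve Health and Wastewater Treatment Efficiency in Low-income Countries",
  "2021-msc-thesis-554324,msc,thesis,yes,2021-09-19,2021,Barriers to successful data management in the Global South: Case Study of Malawi"
), demo_path)

demo <- read_people_csv(demo_path)
str(demo)

# Usage with a real download (hypothetical file name):
# people <- read_people_csv("people.csv")
```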

Data

The package provides access to one data resource.

people

The dataset people contains data about people who have worked for the group, including staff as well as BSc and MSc student projects.

It has 129 observations and 7 variables.

people |> 
  head(3) |> 
  gt::gt() |>
  gt::as_raw_html()
title degree type b_m_student start_date year thesis_title
2021-bsc-thesis-225343 bsc thesis yes 2021-02-01 2021 Self-cleaning Preliminary Screening Mechanisms to Improve Health and Wastewater Treatment Efficiency in Low-income Countries
2021-msc-thesis-554324 msc thesis yes 2021-09-19 2021 Barriers to successful data management in the Global South: Case Study of Malawi
2022-bsc-thesis-164543 bsc thesis yes 2022-03-25 2022 The development of a recycled plastic wheel for cardboard collector trollies - improving the working conditions of informal cardboard recyclers in Durban, South Africa

For an overview of the variable names, see the following table.

variable_name variable_type description
project_id character Unique identifier for each person, built from a combination of other metadata. Also used as the folder name on Google Drive.
degree character Categorical variable with four levels: bsc, msc, phd, staff.
type character Categorical variable with six levels: hiwi, intern, post-doc, scientific-assitant, sem-proj, thesis.
b_m_student character Binary variable that identifies whether the person is a BSc or MSc student. Levels: yes, no.
start_date Date Date on which the person started working with the group.
year numeric Year of the start date.
thesis_title character Title of the thesis.
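The levels listed in the table above can double as a simple data check, for example after editing the CSV or XLSX files by hand. The sketch below is a minimal base-R example: `check_people()` is a hypothetical helper, not part of ghedata. It flags values outside the documented levels of `degree`, `type`, and `b_m_student` (spelled exactly as documented), and rows where `year` differs from the calendar year of `start_date`. The demo uses the three rows from the `head(3)` output.

```r
# Minimal sketch: check a people data frame against the documented levels.
# check_people() is a hypothetical helper, not part of ghedata.
check_people <- function(df) {
  documented_levels <- list(
    degree = c("bsc", "msc", "phd", "staff"),
    type = c("hiwi", "intern", "post-doc", "scientific-assitant",
             "sem-proj", "thesis"),
    b_m_student = c("yes", "no")
  )
  problems <- character(0)
  for (col in names(documented_levels)) {
    # Missing values are ignored; only unexpected non-missing values are flagged
    bad <- setdiff(unique(na.omit(df[[col]])), documented_levels[[col]])
    if (length(bad) > 0) {
      problems <- c(problems, paste0(col, ": unexpected value(s) ",
                                     paste(bad, collapse = ", ")))
    }
  }
  # year should equal the calendar year of start_date
  start_year <- as.numeric(format(df$start_date, "%Y"))
  mismatch <- which(!is.na(start_year) & start_year != df$year)
  if (length(mismatch) > 0) {
    problems <- c(problems, paste0("year does not match start_date in row(s) ",
                                   paste(mismatch, collapse = ", ")))
  }
  problems
}

# Demo on the three rows shown in the head(3) output
demo <- data.frame(
  degree = c("bsc", "msc", "bsc"),
  type = c("thesis", "thesis", "thesis"),
  b_m_student = c("yes", "yes", "yes"),
  start_date = as.Date(c("2021-02-01", "2021-09-19", "2022-03-25")),
  year = c(2021, 2021, 2022)
)
check_people(demo)  # character(0): no problems found
```

Returning a character vector of messages, rather than stopping at the first problem, lets you see every issue in one pass.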

Example

library(ghedata)

undergrad_students <- people |> 
  filter(b_m_student == "yes") |>
  filter(!is.na(title)) 

So far, GHE has supervised 108 student projects, of which 25 were done by BSc students and 83 by MSc students.
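With dplyr, `undergrad_students |> count(degree)` gives the per-degree totals behind these figures. For readers who prefer base R, the sketch below does the same tally: it keeps rows where `b_m_student` is `"yes"` and `title` is not missing, mirroring the `filter()` calls above, and then counts rows per degree. `count_projects_by_degree()` is a hypothetical helper, and the demo uses only the three rows from the `head(3)` output, not the full dataset.

```r
# Minimal sketch: base-R equivalent of filtering to students and count(degree).
# count_projects_by_degree() is a hypothetical helper, not part of ghedata.
count_projects_by_degree <- function(df) {
  # Like dplyr::filter(), drop rows where a condition is NA
  keep <- df$b_m_student == "yes" & !is.na(df$b_m_student) & !is.na(df$title)
  students <- df[keep, ]
  counts <- table(students$degree)
  setNames(as.integer(counts), names(counts))
}

# Demo on the three rows shown in the head(3) output
demo <- data.frame(
  title = c("2021-bsc-thesis-225343", "2021-msc-thesis-554324",
            "2022-bsc-thesis-164543"),
  degree = c("bsc", "msc", "bsc"),
  b_m_student = c("yes", "yes", "yes")
)
count_projects_by_degree(demo)
```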

undergrad_students |> 
  count(degree, year) |> 
  ggplot(aes(x = year, y = n, label = n, fill = degree, color = degree)) +
  geom_col(position = "dodge") +
  geom_text(position = position_dodge(width = 0.9),
            vjust = -0.5,
            show.legend = FALSE) +
  labs(x = "",
       y = "Number of students") +
  scale_fill_colorblind() +
  scale_color_colorblind() +
  theme_minimal() +
  theme(panel.grid = element_blank(),
        axis.text.y = element_blank())

License

Data are available under a CC-BY license.

Citation

Please cite this package using:

citation("ghedata")
#> To cite package 'ghedata' in publications use:
#> 
#>   Schöbitz L (2024). _ghedata: What the Package Does (One Line, Title
#>   Case)_. R package version 0.0.0.9000,
#>   <https://github.com/global-health-engineering/ghedata>.
#> 
#> A BibTeX entry for LaTeX users is
#> 
#>   @Manual{,
#>     title = {ghedata: What the Package Does (One Line, Title Case)},
#>     author = {Lars Schöbitz},
#>     year = {2024},
#>     note = {R package version 0.0.0.9000},
#>     url = {https://github.com/global-health-engineering/ghedata},
#>   }
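If you manage references in LaTeX, you can write the BibTeX entry above straight to a `.bib` file instead of copying it by hand. The sketch below uses `citation()` and `toBibtex()` from the utils package that ships with R; the wrapper `write_citation_bib()` and the file name `ghedata.bib` are hypothetical.

```r
# Minimal sketch: save a package's citation as a .bib file.
# write_citation_bib() is a hypothetical helper; citation() and toBibtex()
# come from the utils package that ships with R.
write_citation_bib <- function(package, path) {
  bib <- toBibtex(citation(package))
  writeLines(bib, path)
  invisible(path)
}

# Usage with ghedata installed (hypothetical output file name):
# write_citation_bib("ghedata", "ghedata.bib")
```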