The Future of NA Data

Nicholas Tierney

Telethon Kids Institute

rstudio::conf(2022)

I hate missing data

Redacted

I fully hate missing data. It disrupts your data analysis, because you need to stop, work out how much is missing, and think: why is it missing, why!?

I redacted hate missing data. It redacted your data analysis. Because redacted, and work out how redacted is missing and think: why redacted redacted redacted, why!?

I ❤️ Missing Data

Plan

  1. Explore missing data: Overview -> Relationship

  2. Brief tour of missing data visualisations.

Overview

library(naniar)  # provides the oceanbuoys data and re-exports vis_miss() from visdat
vis_miss(oceanbuoys)
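
The same overview can also be read off as numbers. A minimal sketch, assuming the naniar package is installed; the variable names on the left are only illustrative:

```r
library(naniar)

# Total number of missing values in the whole data set
total_missing <- n_miss(oceanbuoys)

# Percentage (0 to 100) of all cells that are missing
pct_missing <- pct_miss(oceanbuoys)

# Percentage (0 to 100) of rows with at least one missing value
pct_rows_missing <- pct_miss_case(oceanbuoys)

total_missing
pct_missing
pct_rows_missing
```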

Missing Relationship

library(ggplot2)
ggplot(oceanbuoys, aes(x = air_temp_c, y = humidity)) + 
  geom_point()
Warning: Removed 171 rows containing missing values (geom_point).
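
The warning appears because geom_point() drops every row where either plotted variable is NA. A small base R check of how many rows that affects, which should agree with the count reported in the warning:

```r
library(naniar)

# TRUE for rows where either plotted variable is missing
dropped <- is.na(oceanbuoys$air_temp_c) | is.na(oceanbuoys$humidity)

# Number of rows ggplot2 would leave out of the scatterplot
sum(dropped)
```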

Missing Relationship

ggplot(oceanbuoys, aes(x = air_temp_c, y = humidity)) + 
  geom_miss_point()
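
Roughly, geom_miss_point() keeps the missing values on the plot by placing them about 10% below the minimum of each variable and colouring them separately. The sketch below imitates that idea by hand. It is an approximation, not naniar's implementation: shift_below() and the *_shift columns are hypothetical names, and the jitter that naniar adds is left out.

```r
library(naniar)
library(ggplot2)

# Hypothetical helper: replace NA with a value 10% of the range below the minimum
shift_below <- function(x) {
  rng <- range(x, na.rm = TRUE)
  ifelse(is.na(x), rng[1] - 0.1 * diff(rng), x)
}

buoys <- transform(
  oceanbuoys,
  # Label rows where either plotted variable is missing
  any_missing = ifelse(is.na(air_temp_c) | is.na(humidity), "Missing", "Not Missing"),
  air_temp_shift = shift_below(air_temp_c),
  humidity_shift = shift_below(humidity)
)

# Shifted values sit in the margins of the plot, coloured by missingness
ggplot(buoys, aes(x = air_temp_shift, y = humidity_shift, colour = any_missing)) +
  geom_point()
```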

Missingness relationship + explore

ggplot(oceanbuoys, aes(x = air_temp_c, y = humidity)) + 
  geom_miss_point() +
  facet_wrap(~year)

Moar missingness vis

Missingness in Variables

gg_miss_var(oceanbuoys)

Missingness in Variables %

gg_miss_var(oceanbuoys, show_pct = TRUE)
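
The numbers behind these two plots are available as a table. A minimal sketch using naniar's miss_var_summary(); var_summary is just an illustrative name:

```r
library(naniar)

# One row per variable, with the count (n_miss) and percentage (pct_miss) of missing values
var_summary <- miss_var_summary(oceanbuoys)
var_summary
```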

Missingness in Variables + faceted

gg_miss_var(oceanbuoys, facet = year)
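
The faceted plot has a tabular counterpart as well. A sketch assuming dplyr is installed: grouping the data first makes miss_var_summary() report missingness per variable within each year. The name by_year is illustrative.

```r
library(naniar)
library(dplyr)

# Missingness per variable, computed separately for each year
by_year <- miss_var_summary(group_by(oceanbuoys, year))
by_year
```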

Combinations of missings

gg_miss_upset(oceanbuoys)
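
The upset plot counts how often each combination of variables is missing together. A base R sketch of that counting idea (not how gg_miss_upset() is implemented; na_flags and pattern are illustrative names):

```r
library(naniar)

# Logical matrix: TRUE where a value is missing, one column per variable
na_flags <- is.na(oceanbuoys)

# Label each row by the set of variables that are missing in it
pattern <- apply(na_flags, 1, function(row) {
  missing_vars <- colnames(na_flags)[row]
  if (length(missing_vars) == 0) "none" else paste(missing_vars, collapse = " + ")
})

# Number of rows per combination, most common first
sort(table(pattern), decreasing = TRUE)
```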

(Complex) Combinations of missings

gg_miss_upset(riskfactors)

gg_miss_fct()

gg_miss_fct(x = riskfactors, fct = marital)

Future work: moar geoms

Future work: geom_miss_histogram()

Future work: geom_imputed_point()

A book: “The Missing Book”

Nicholas Tierney & Allison Horst

The Future of Missing Data is presence