r - Replace NaNs in dataframe with values from another dataframe based on two criteria -
hi first post stackoverflow. have been trying solve problem, have not been able figure out answer alone nor find other posts answer question.
i need replace missing values dataset values dataframe; however, gets tricky values need match have factor associated them, matching dates.
here simplified version of first dataframe:
> df1 date site value 1991-07-08 22.5 1991-07-09 nan 1992-07-13 b 23.1 1992-07-14 nan 1993-07-07 b 27.3 here simplified version of second dataframe:
> df2 date site value 1991-07-08 22.5 1991-07-09 nan 1992-07-14 nan 1991-07-08 b 10.6 1992-07-09 b 23 1992-07-14 b nan 1992-07-09 c 11.3 1992-07-14 c 12.4 what want when there missing value replace value b (with same date), , if there not value b, using value of c (with same date). thus, resulting dataframe this:
> dffin date site value 1991-07-08 22.5 1991-07-09 23 1992-07-13 b 23.1 1992-07-14 12.4 1993-07-07 b 27.3 this have come far:
dffin<-replace(df1[which(df1$site=="a"),], df1$value[which(df$value=="nan")], df2$value[which(df2$site=="b" & df2$date==df1$date[which(df1$value=="nan" & df1$site=="a")])]) however, following error message:
error in [<-.data.frame(*tmp*, list, value = numeric(0)) : missing values not allowed in subscripted assignments of data frames and have not incorporated site c yet. not quite sure , appreciate help.
welcome so! first of all, problem seems bit underdefined, went ahead , made several alterations. i'm starting 2 data frames:
df1 <- read.table(text = " date site value 1991-07-08 22.5 1991-07-09 nan 1992-07-13 b 23.1 1992-07-14 nan 1993-07-07 b 27.3 ", head = t) df2 <- read.table(text = " date site value 1991-07-08 22.5 1991-07-09 nan 1992-07-14 nan 1991-07-08 b 10.6 1991-07-09 b 23 1992-07-14 b nan 1992-07-09 c 11.3 1992-07-14 c 12.4 ", head = t) replacing nan more traditional na:
df1$value[is.nan(df1$value)] <- na df2$value[is.nan(df2$value)] <- na merging (left joining) data frames cast long wide format (reshape2), date serves key:
library(reshape2) dd1 <- dcast(df1, date ~ site) dd2 <- dcast(df2, date ~ site) dm <- merge(dd1, dd2, = "date", all.x = true, suffixes = c("", ".y")) dm looks so:
date b a.y b.y c 1 1991-07-08 22.5 na 22.5 10.6 na 2 1991-07-09 na na na 23.0 na 3 1992-07-13 na 23.1 na na na 4 1992-07-14 na na na na 12.4 5 1993-07-07 na 27.3 na na na now super easy replace na want without need bother dates. i'm using following rule: if a missing, use b.y, if b.y missing, use c.
dm$a <- ifelse(is.na(dm$a), ifelse(is.na(dm$b.y), dm$c, dm$b.y), dm$a) now restore original format:
dffin <- na.omit(melt(dm[, c("date", "a", "b")], id = "date", variable.name = "site")) date site value 1 1991-07-08 22.5 2 1991-07-09 23.0 4 1992-07-14 12.4 8 1992-07-13 b 23.1 10 1993-07-07 b 27.3
Comments
Post a Comment