r - Replace NaNs in dataframe with values from another dataframe based on two criteria -
hi first post stackoverflow. have been trying solve problem, have not been able figure out answer alone nor find other posts answer question.
i need replace missing values dataset values dataframe; however, gets tricky values need match have factor associated them, matching dates.
here simplified version of first dataframe:
> df1 date site value 1991-07-08 22.5 1991-07-09 nan 1992-07-13 b 23.1 1992-07-14 nan 1993-07-07 b 27.3
here simplified version of second dataframe:
> df2 date site value 1991-07-08 22.5 1991-07-09 nan 1992-07-14 nan 1991-07-08 b 10.6 1992-07-09 b 23 1992-07-14 b nan 1992-07-09 c 11.3 1992-07-14 c 12.4
what want when there missing value replace value b (with same date), , if there not value b, using value of c (with same date). thus, resulting dataframe this:
> dffin date site value 1991-07-08 22.5 1991-07-09 23 1992-07-13 b 23.1 1992-07-14 12.4 1993-07-07 b 27.3
this have come far:
dffin<-replace(df1[which(df1$site=="a"),], df1$value[which(df$value=="nan")], df2$value[which(df2$site=="b" & df2$date==df1$date[which(df1$value=="nan" & df1$site=="a")])])
however, following error message:
error in [<-.data.frame(*tmp*, list, value = numeric(0)) : missing values not allowed in subscripted assignments of data frames
and have not incorporated site c yet. not quite sure , appreciate help.
welcome so! first of all, problem seems bit underdefined, went ahead , made several alterations. i'm starting 2 data frames:
df1 <- read.table(text = " date site value 1991-07-08 22.5 1991-07-09 nan 1992-07-13 b 23.1 1992-07-14 nan 1993-07-07 b 27.3 ", head = t) df2 <- read.table(text = " date site value 1991-07-08 22.5 1991-07-09 nan 1992-07-14 nan 1991-07-08 b 10.6 1991-07-09 b 23 1992-07-14 b nan 1992-07-09 c 11.3 1992-07-14 c 12.4 ", head = t)
replacing nan
more traditional na
:
df1$value[is.nan(df1$value)] <- na df2$value[is.nan(df2$value)] <- na
merging (left joining) data frames cast long wide format (reshape2
), date serves key:
library(reshape2) dd1 <- dcast(df1, date ~ site) dd2 <- dcast(df2, date ~ site) dm <- merge(dd1, dd2, = "date", all.x = true, suffixes = c("", ".y"))
dm
looks so:
date b a.y b.y c 1 1991-07-08 22.5 na 22.5 10.6 na 2 1991-07-09 na na na 23.0 na 3 1992-07-13 na 23.1 na na na 4 1992-07-14 na na na na 12.4 5 1993-07-07 na 27.3 na na na
now super easy replace na
want without need bother dates. i'm using following rule: if a
missing, use b.y
, if b.y
missing, use c
.
dm$a <- ifelse(is.na(dm$a), ifelse(is.na(dm$b.y), dm$c, dm$b.y), dm$a)
now restore original format:
dffin <- na.omit(melt(dm[, c("date", "a", "b")], id = "date", variable.name = "site")) date site value 1 1991-07-08 22.5 2 1991-07-09 23.0 4 1992-07-14 12.4 8 1992-07-13 b 23.1 10 1993-07-07 b 27.3
Comments
Post a Comment