r - Replace NaNs in dataframe with values from another dataframe based on two criteria -


hi first post stackoverflow. have been trying solve problem, have not been able figure out answer alone nor find other posts answer question.

i need replace missing values dataset values dataframe; however, gets tricky values need match have factor associated them, matching dates.

here simplified version of first dataframe:

> df1  date       site       value    1991-07-08          22.5  1991-07-09          nan  1992-07-13 b          23.1  1992-07-14          nan  1993-07-07 b          27.3 

here simplified version of second dataframe:

> df2 date       site         value 1991-07-08          22.5 1991-07-09           nan 1992-07-14           nan 1991-07-08 b          10.6 1992-07-09 b            23 1992-07-14 b           nan 1992-07-09 c          11.3 1992-07-14 c          12.4 

what want when there missing value replace value b (with same date), , if there not value b, using value of c (with same date). thus, resulting dataframe this:

> dffin date       site       value   1991-07-08          22.5 1991-07-09            23 1992-07-13 b          23.1 1992-07-14          12.4 1993-07-07 b          27.3 

this have come far:

dffin<-replace(df1[which(df1$site=="a"),],            df1$value[which(df$value=="nan")],            df2$value[which(df2$site=="b" &                    df2$date==df1$date[which(df1$value=="nan" & df1$site=="a")])]) 

however, following error message:

error in [<-.data.frame(*tmp*, list, value = numeric(0)) :      missing values not allowed in subscripted assignments of data frames 

and have not incorporated site c yet. not quite sure , appreciate help.

welcome so! first of all, problem seems bit underdefined, went ahead , made several alterations. i'm starting 2 data frames:

df1 <- read.table(text = " date       site       value   1991-07-08          22.5 1991-07-09          nan 1992-07-13 b          23.1 1992-07-14          nan 1993-07-07 b          27.3 ", head = t) df2 <- read.table(text = " date       site         value 1991-07-08          22.5 1991-07-09           nan 1992-07-14           nan 1991-07-08 b          10.6 1991-07-09 b            23 1992-07-14 b           nan 1992-07-09 c          11.3 1992-07-14 c          12.4 ", head = t) 

replacing nan more traditional na:

df1$value[is.nan(df1$value)] <- na df2$value[is.nan(df2$value)] <- na 

merging (left joining) data frames cast long wide format (reshape2), date serves key:

library(reshape2) dd1 <- dcast(df1, date ~ site) dd2 <- dcast(df2, date ~ site)  dm <- merge(dd1, dd2, = "date", all.x = true, suffixes = c("", ".y")) 

dm looks so:

        date       b  a.y  b.y    c 1 1991-07-08 22.5   na 22.5 10.6   na 2 1991-07-09   na   na   na 23.0   na 3 1992-07-13   na 23.1   na   na   na 4 1992-07-14   na   na   na   na 12.4 5 1993-07-07   na 27.3   na   na   na 

now super easy replace na want without need bother dates. i'm using following rule: if a missing, use b.y, if b.y missing, use c.

dm$a <- ifelse(is.na(dm$a),                 ifelse(is.na(dm$b.y),                       dm$c, dm$b.y),                 dm$a) 

now restore original format:

dffin <- na.omit(melt(dm[, c("date", "a", "b")], id = "date", variable.name = "site"))           date site value 1  1991-07-08     22.5 2  1991-07-09     23.0 4  1992-07-14     12.4 8  1992-07-13    b  23.1 10 1993-07-07    b  27.3 

Comments

Popular posts from this blog

css - SVG using textPath a symbol not rendering in Firefox -

Java 8 + Maven Javadoc plugin: Error fetching URL -

order - Notification for user in user account opencart -