r - Merge based on 2 columns values - Check for duplicate key values error -


i have 2 data tables following columns - ddate,fnumber,file,model , fnumber,ddate,model,model_id,file. update first table values second table matched ddate , fnumber columns.

if use merge:

dtpt <- merge(dtpt, dtat, = c("fnumber", "ddate"), all.x = true) 

then following error -

error in vecseq(f__, len__, if (allow.cartesian || notjoin) null else as.integer(max(nrow(x), : join results in 8568291 rows; more 8537179 = max(nrow(x),nrow(i)). check duplicate key values in i, each of join same group in x on , on again. if that's ok, try including j , dropping by (by-without-by) j runs each group avoid large allocation. if sure wish proceed, rerun allow.cartesian=true. otherwise, please search error message in faq, wiki, stack overflow , datatable-help advice.

i tried search duplicated records in dtat:

setkeyv(dtat, c("fnumber", "ddate")) dtat[duplicated(dtat)] 

but returns 0 rows.

i tried use match instead, doesn't work expected (sample data used below):

tpt <- "ddate,fnumber,file,model 2014-05-26,s71149,ps1.csv,320 2014-09-26,,ps2.csv, 2014-10-16,,ps3.csv,"   tat <-  "fnumber,ddate,model,model_id,file s71149,2014-05-26,319,vu-bhp,as1.csv s71149,2014-05-25,320,,as2.csv s71149,2014-05-23,322,vu-btp,as3.csv"  columnclasses <- c("posixct", "factor", "character", "factor") dtpt <- read.csv(text=tpt, header = true, sep = ",", na.strings = c(""), colclasses = columnclasses) dtpt <- as.data.table(dtpt)  columnclasses <- c("character", "posixct", "character", "factor", "factor") dtat <- read.csv(text=tat, header = true, sep = ",", na.strings = c(""), colclasses = columnclasses) dtat <- as.data.table(dtat)  dtpt$model_code <- dtat[match(paste(dtat$fnumber, dtat$ddate), paste(dtpt$fnumber, dtpt$ddate)), dtat$model]  

the code above in result assigns model_code values rows:

        ddate fnumber    file model model_code 1: 2014-05-26  s71149 ps1.csv   320        319 2: 2014-09-26      na ps2.csv    na        320 3: 2014-10-16      na ps3.csv    na        322 

when 320 should assigned first row only:

        ddate fnumber    file model model_code 1: 2014-05-26  s71149 ps1.csv   320        319 2: 2014-09-26      na ps2.csv    na         na 3: 2014-10-16      na ps3.csv    na         na 

since second , third rows there no match combination of ddate , fnumber.

(the merge, of course, works example below, returns error production data)

i update first table values second table

instead of merge function there update on join feature in data.table, faster updates reference , don't need materialize bigger dataset after joins.
general syntax is

dt1[dt2, value := i.lookup_value] 

so going example

library(data.table)  tpt <- "ddate,fnumber,file,model 2014-05-26,s71149,ps1.csv,320 2014-09-26,,ps2.csv, 2014-10-16,,ps3.csv,"   tat <- "fnumber,ddate,model,model_id,file s71149,2014-05-26,319,vu-bhp,as1.csv s71149,2014-05-25,320,,as2.csv s71149,2014-05-23,322,vu-btp,as3.csv"  columnclasses <- c("posixct", "factor", "character", "factor") dtpt <- read.csv(text=tpt, header = true, sep = ",", na.strings = c(""), colclasses = columnclasses) dtpt <- as.data.table(dtpt)  columnclasses <- c("character", "posixct", "character", "factor", "factor") dtat <- read.csv(text=tat, header = true, sep = ",", na.strings = c(""), colclasses = columnclasses) dtat <- as.data.table(dtat)  setkeyv(dtpt, c("fnumber","ddate")) setkeyv(dtat, c("fnumber","ddate"))  dtpt[dtat, model_code := i.model] #         ddate fnumber    file model model_code # 1: 2014-09-26      na ps2.csv    na         na # 2: 2014-10-16      na ps3.csv    na         na # 3: 2014-05-26  s71149 ps1.csv   320        319 

if mentioned error should start checking duplicates not on full rows on composite key. simple code below.

dtpt[,.n,c("fnumber","ddate")][n>1l] dtat[,.n,c("fnumber","ddate")][n>1l] 

Comments

Popular posts from this blog

css - SVG using textPath a symbol not rendering in Firefox -

Java 8 + Maven Javadoc plugin: Error fetching URL -

order - Notification for user in user account opencart -