r - Merge based on 2 columns' values - "Check for duplicate key values" error
I have 2 data tables with the following columns: ddate, fnumber, file, model and fnumber, ddate, model, model_id, file. I want to update the first table with values from the second table, matched on the ddate and fnumber columns.
If I use merge:

dtpt <- merge(dtpt, dtat, by = c("fnumber", "ddate"), all.x = TRUE)
then I get the following error:
Error in vecseq(f__, len__, if (allow.cartesian || notjoin) NULL else as.integer(max(nrow(x), :
  Join results in 8568291 rows; more than 8537179 = max(nrow(x),nrow(i)). Check for duplicate key values in i, each of which join to the same group in x over and over again. If that's ok, try including `j` and dropping `by` (by-without-by) so that j runs for each group to avoid the large allocation. If you are sure you wish to proceed, rerun with allow.cartesian=TRUE. Otherwise, please search for this error message in the FAQ, Wiki, Stack Overflow and datatable-help for advice.
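As far as I understand, the rerun with allow.cartesian=TRUE that the message suggests would look roughly like the line below (merge.data.table does accept that argument), but it only permits the row multiplication rather than explaining where the extra rows come from:

# not a fix: this just allows the many-to-many join the error is warning about
dtpt <- merge(dtpt, dtat, by = c("fnumber", "ddate"), all.x = TRUE,
              allow.cartesian = TRUE)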
I tried to search for duplicated records in dtat:

setkeyv(dtat, c("fnumber", "ddate"))
dtat[duplicated(dtat)]
but it returns 0 rows.
I tried to use match instead, but it doesn't work as expected (sample data used below):

tpt <- "ddate,fnumber,file,model
2014-05-26,s71149,ps1.csv,320
2014-09-26,,ps2.csv,
2014-10-16,,ps3.csv,"

tat <- "fnumber,ddate,model,model_id,file
s71149,2014-05-26,319,vu-bhp,as1.csv
s71149,2014-05-25,320,,as2.csv
s71149,2014-05-23,322,vu-btp,as3.csv"

columnclasses <- c("POSIXct", "factor", "character", "factor")
dtpt <- read.csv(text = tpt, header = TRUE, sep = ",", na.strings = c(""), colClasses = columnclasses)
dtpt <- as.data.table(dtpt)

columnclasses <- c("character", "POSIXct", "character", "factor", "factor")
dtat <- read.csv(text = tat, header = TRUE, sep = ",", na.strings = c(""), colClasses = columnclasses)
dtat <- as.data.table(dtat)

dtpt$model_code <- dtat[match(paste(dtat$fnumber, dtat$ddate), paste(dtpt$fnumber, dtpt$ddate)), dtat$model]
The code above as a result assigns model_code values to all rows:

        ddate fnumber    file model model_code
1: 2014-05-26  s71149 ps1.csv   320        319
2: 2014-09-26      NA ps2.csv    NA        320
3: 2014-10-16      NA ps3.csv    NA        322
when a value should be assigned to the first row only:

        ddate fnumber    file model model_code
1: 2014-05-26  s71149 ps1.csv   320        319
2: 2014-09-26      NA ps2.csv    NA         NA
3: 2014-10-16      NA ps3.csv    NA         NA
since for the second and third rows there is no match on the combination of ddate and fnumber.
(The merge, of course, works with the example above, but returns the error with the production data.)
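For what it's worth, a contrived variant of the sample data with a duplicated (fnumber, ddate) pair in dtat should reproduce the same kind of error, since the join then returns more rows than max(nrow(x), nrow(i)):

# hypothetical lookup table with a duplicated composite key
tat_dup <- "fnumber,ddate,model,model_id,file
s71149,2014-05-26,319,vu-bhp,as1.csv
s71149,2014-05-26,320,,as2.csv
s71149,2014-05-23,322,vu-btp,as3.csv"
dtat_dup <- as.data.table(read.csv(text = tat_dup, header = TRUE, sep = ",", na.strings = c(""),
                                   colClasses = c("character", "POSIXct", "character", "factor", "factor")))
# expected to fail with the "Check for duplicate key values in i" error,
# because the one matching key now joins to two rows in dtat_dup
merge(dtpt, dtat_dup, by = c("fnumber", "ddate"), all.x = TRUE)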
How can I update the first table with the values from the second table?
Instead of the merge function, you can use the update-on-join feature in data.table, which is faster as it updates by reference and you don't need to materialize a bigger dataset after the join.
The general syntax is:
dt1[dt2, value := i.lookup_value]
So, going with your example:
library(data.table)

tpt <- "ddate,fnumber,file,model
2014-05-26,s71149,ps1.csv,320
2014-09-26,,ps2.csv,
2014-10-16,,ps3.csv,"

tat <- "fnumber,ddate,model,model_id,file
s71149,2014-05-26,319,vu-bhp,as1.csv
s71149,2014-05-25,320,,as2.csv
s71149,2014-05-23,322,vu-btp,as3.csv"

columnclasses <- c("POSIXct", "factor", "character", "factor")
dtpt <- read.csv(text = tpt, header = TRUE, sep = ",", na.strings = c(""), colClasses = columnclasses)
dtpt <- as.data.table(dtpt)

columnclasses <- c("character", "POSIXct", "character", "factor", "factor")
dtat <- read.csv(text = tat, header = TRUE, sep = ",", na.strings = c(""), colClasses = columnclasses)
dtat <- as.data.table(dtat)

setkeyv(dtpt, c("fnumber", "ddate"))
setkeyv(dtat, c("fnumber", "ddate"))

dtpt[dtat, model_code := i.model]

#         ddate fnumber    file model model_code
# 1: 2014-09-26      NA ps2.csv    NA         NA
# 2: 2014-10-16      NA ps3.csv    NA         NA
# 3: 2014-05-26  s71149 ps1.csv   320        319
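If you prefer not to set keys, the same update join can be written with the on argument (available in data.table 1.9.6+); a minimal sketch:

# same result as above, without setkeyv; join columns given explicitly
dtpt[dtat, model_code := i.model, on = c("fnumber", "ddate")]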
As for the mentioned error, you should start by checking for duplicates not on full rows but on the composite key. See the simple code below.
dtpt[, .N, c("fnumber", "ddate")][N > 1L]
dtat[, .N, c("fnumber", "ddate")][N > 1L]
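If those checks do turn up duplicates in dtat, one way forward (assuming the first record per key is the one you want to keep, which is a judgement call on your data) is to deduplicate the lookup table on the composite key before joining:

# keep one row per (fnumber, ddate); which row to keep depends on your data
dtat_unique <- unique(dtat, by = c("fnumber", "ddate"))
dtpt[dtat_unique, model_code := i.model, on = c("fnumber", "ddate")]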