java - How to process two files in Hadoop MapReduce?
I have to process two related files in a Hadoop environment using MapReduce. The first file is a huge log file that records users' activity. The second file is relatively small and contains details about the users. Both are .txt files. The first file (the log file) has the format:
userid | logintime | logouttime | roomnum | machineid
This file is huge (a couple of TB).
The second file (the user file, a small file of about 20 MB) is:
userid | userfname | userlname | dob | address
I have to find how frequently users use the lab machines, identify the most frequent users, and list their names.
I know how to process one file on its own, but since the user details are in a separate file, it is becoming hard for me to process. I am new to MapReduce and am seeking advice here. The problem looks to me like joining two tables on a foreign key in an RDBMS.
You can use the distributed cache to hold the small file. Files in the distributed cache are copied to every node in the cluster that runs map or reduce tasks, so each task can read them locally.
Add the file to the distributed cache in the following way:
Configuration conf = new Configuration();
DistributedCache.addCacheFile(new URI("/user/xxxx/cachefile/exmple.txt"), conf);
Job job = new Job(conf, "wordcount");
Then read the file in the setup() method of the Mapper and use the data in the map() or reduce() method:
public void setup(Context context) throws IOException, InterruptedException {
    Configuration conf = context.getConfiguration();
    Path[] localFiles = DistributedCache.getLocalCacheFiles(conf);
    // read the user file from localFiles and build an in-memory lookup
}
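To make the join step concrete, here is the core logic sketched as plain Java, outside the Hadoop types (class and method names are my own, not from any library): setup() would build a userid-to-name map from the cached user file, and map() would look up each log record's userid and emit the name, which a summing reducer then counts.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class LabUsageJoin {

    // What setup() does: parse "userid | userfname | userlname | dob | address"
    // lines from the cached small file into a userid -> "fname lname" map.
    static Map<String, String> buildUserLookup(List<String> userLines) {
        Map<String, String> names = new HashMap<>();
        for (String line : userLines) {
            String[] f = line.split("\\|");
            names.put(f[0].trim(), f[1].trim() + " " + f[2].trim());
        }
        return names;
    }

    // What map() plus a summing reducer produce: count
    // "userid | logintime | logouttime | roomnum | machineid" records
    // per user, keyed by the user's full name.
    static Map<String, Integer> countByName(List<String> userLines,
                                            List<String> logLines) {
        Map<String, String> names = buildUserLookup(userLines);
        Map<String, Integer> counts = new HashMap<>();
        for (String line : logLines) {
            String userId = line.split("\\|")[0].trim();
            // keep unknown userids under their raw id rather than dropping them
            String name = names.getOrDefault(userId, userId);
            counts.merge(name, 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        List<String> users = List.of(
                "u1 | Alice | Smith | 1990-01-01 | 1 Main St",
                "u2 | Bob | Jones | 1985-05-05 | 2 Oak Ave");
        List<String> logs = List.of(
                "u1 | 08:00 | 09:00 | 101 | m1",
                "u1 | 10:00 | 11:00 | 101 | m2",
                "u2 | 09:00 | 09:30 | 102 | m1");
        System.out.println(countByName(users, logs));
    }
}
```

Because the user file fits in memory, this map-side join avoids shuffling the multi-TB log file's user details at all; only the (name, 1) pairs go through the shuffle.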
Alternatively, you can use a different mapper for each input file and join the records in the reducer (a reduce-side join).
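The reduce-side variant works by having each mapper tag its output with the file it came from, so the reducer can tell names apart from log records when it receives all values for one userid. A minimal sketch of that tagging idea in plain Java (the class, method names, and "U#"/"L#" tags are illustrative, not a Hadoop API):

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class ReduceSideJoinSketch {

    // Simulate the map + shuffle phases: each input is tagged by source
    // and all values are grouped under their userid key.
    static Map<String, List<String>> shuffle(List<String> userLines,
                                             List<String> logLines) {
        Map<String, List<String>> grouped = new HashMap<>();
        for (String line : userLines) {
            String[] f = line.split("\\|");
            grouped.computeIfAbsent(f[0].trim(), k -> new ArrayList<>())
                   .add("U#" + f[1].trim() + " " + f[2].trim()); // user-file tag
        }
        for (String line : logLines) {
            String id = line.split("\\|")[0].trim();
            grouped.computeIfAbsent(id, k -> new ArrayList<>())
                   .add("L#" + line); // log-file tag
        }
        return grouped;
    }

    // Reducer logic: for each userid, pair the single name record
    // with the count of its log records.
    static Map<String, Integer> reduce(Map<String, List<String>> grouped) {
        Map<String, Integer> counts = new HashMap<>();
        for (List<String> values : grouped.values()) {
            String name = null;
            int logCount = 0;
            for (String v : values) {
                if (v.startsWith("U#")) name = v.substring(2);
                else logCount++;
            }
            if (name != null) counts.put(name, logCount);
        }
        return counts;
    }

    public static void main(String[] args) {
        Map<String, List<String>> g = shuffle(
                List.of("u1 | Alice | Smith | 1990-01-01 | 1 Main St"),
                List.of("u1 | 08:00 | 09:00 | 101 | m1",
                        "u1 | 10:00 | 11:00 | 101 | m2"));
        System.out.println(reduce(g));
    }
}
```

Unlike the distributed-cache approach, this shuffles every log record by userid, which is expensive for a multi-TB file; it is the option to reach for when the second file is too big to fit in each task's memory.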