java - How to process two files in Hadoop MapReduce?


I have to process two related files in a Hadoop environment using MapReduce. The first file is a huge log file that records users' activity. The second file is a relatively small file containing details about the users. Both are .txt files. The first file (the log file) has the format:

userid | logintime | logouttime | roomnum | machineid 

This file is huge (a couple of TB).

The second file (the user file, a small file of about 20 MB) is:

userid | userfname | userlname | dob | address 

I have to find out the frequency of users' usage of the lab machines, identify the most frequent users, and list their names.

I know how to process a single file. But since the user details are in a separate file, it is becoming hard for me to process them together. I am new to MapReduce and am seeking advice here. The problem is similar to joining two tables on a foreign key in an RDBMS.

You can use the distributed cache to handle the small file. A file added to the distributed cache is copied to the local disk of every node running map or reduce tasks, so each task can read it locally.

Add the file to the distributed cache in the following way:

Configuration conf = new Configuration();
DistributedCache.addCacheFile(new URI("/user/xxxx/cachefile/exmple.txt"), conf);
Job job = new Job(conf, "wordcount");

Then read the file in the setup method of your mapper and use the data in the map or reduce method.

public void setup(Context context) throws IOException, InterruptedException {
    Configuration conf = context.getConfiguration();
    Path[] localFiles = DistributedCache.getLocalCacheFiles(conf);
    // etc.
}
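Independent of the Hadoop API, the map-side join itself is just a hash lookup: load the small user file into a map in setup, then look up each log record's userid as it streams through the mapper. A minimal plain-Java sketch of that logic (class, method, and sample records are illustrative, not part of the Hadoop API):

```java
import java.util.HashMap;
import java.util.Map;

public class MapSideJoinSketch {
    // Build userid -> "fname lname" from the small user file's lines.
    // Format: userid | userfname | userlname | dob | address
    static Map<String, String> buildUserMap(String[] userLines) {
        Map<String, String> users = new HashMap<>();
        for (String line : userLines) {
            String[] f = line.split("\\|");
            users.put(f[0].trim(), f[1].trim() + " " + f[2].trim());
        }
        return users;
    }

    // Count sessions per user name by joining each log record
    // against the in-memory user map (what the mapper would do per record).
    // Log format: userid | logintime | logouttime | roomnum | machineid
    static Map<String, Integer> countUsage(String[] logLines, Map<String, String> users) {
        Map<String, Integer> counts = new HashMap<>();
        for (String line : logLines) {
            String userId = line.split("\\|")[0].trim();
            String name = users.getOrDefault(userId, "unknown");
            counts.merge(name, 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        String[] userLines = {
            "u1 | Alice | Smith | 1990-01-01 | 1 Main St",
            "u2 | Bob | Jones | 1985-05-05 | 2 Oak Ave"
        };
        String[] logLines = {
            "u1 | 09:00 | 10:00 | 101 | m1",
            "u1 | 11:00 | 12:00 | 101 | m2",
            "u2 | 09:30 | 10:30 | 102 | m1"
        };
        Map<String, Integer> counts = countUsage(logLines, buildUserMap(userLines));
        System.out.println(counts.get("Alice Smith")); // 2
        System.out.println(counts.get("Bob Jones"));   // 1
    }
}
```

In the real job, buildUserMap would run once in setup over the cached file, and countUsage's per-record lookup would happen in map, emitting (name, 1) pairs for a counting reducer.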

Alternatively, you can use a different mapper for each input and join the records in the reducer (a reduce-side join): both mappers emit userid as the key, tag each value with its source, and the reducer matches the user record against the log records for that key.
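The tagging and per-key matching that the reduce-side join relies on can be sketched in plain Java, outside the Hadoop API (the "U|"/"L|" tags and all names here are illustrative assumptions):

```java
import java.util.ArrayList;
import java.util.List;

public class ReduceSideJoinSketch {
    // What each mapper would emit as the value: the record tagged with its source.
    static String tagUser(String line) { return "U|" + line; }
    static String tagLog(String line)  { return "L|" + line; }

    // What the reducer does for one userid: find the user record among the
    // tagged values, then pair its name with every log record for that key.
    static List<String> join(String userId, List<String> taggedValues) {
        String userName = "unknown";
        List<String> logs = new ArrayList<>();
        for (String v : taggedValues) {
            if (v.startsWith("U|")) {
                // userid | userfname | userlname | dob | address
                String[] f = v.substring(2).split("\\|");
                userName = f[1].trim() + " " + f[2].trim();
            } else {
                logs.add(v.substring(2));
            }
        }
        List<String> out = new ArrayList<>();
        for (String log : logs) {
            // userid | logintime | logouttime | roomnum | machineid
            out.add(userId + "\t" + userName + "\t" + log.split("\\|")[4].trim());
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> values = new ArrayList<>();
        values.add(tagUser("u1 | Alice | Smith | 1990-01-01 | 1 Main St"));
        values.add(tagLog("u1 | 09:00 | 10:00 | 101 | m1"));
        values.add(tagLog("u1 | 11:00 | 12:00 | 101 | m2"));
        for (String row : join("u1", values)) {
            System.out.println(row);
        }
    }
}
```

This avoids holding the side file in memory, at the cost of shuffling both inputs; with a 20 MB user file the map-side join above is usually the cheaper choice.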
