hadoop - NameNode HA when using hdfs:// URI


With the hdfs or hftp URI scheme (e.g. hdfs://namenode/path/to/file) I can access HDFS clusters without requiring their XML configuration files. This is very handy when running shell commands like hdfs dfs -get, hadoop distcp, or reading files from Spark with sc.hadoopFile(), because I don't have to copy and manage XML files for all the relevant HDFS clusters on all the nodes where that code might potentially run.

One drawback of this approach is that I have to use the active NameNode's hostname; otherwise Hadoop throws an exception complaining that the NN is in standby.

A usual workaround is to try one NameNode and then try the other if an exception is caught, or to connect to ZooKeeper directly and parse the binary data using protobuf.
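The first workaround (try one NameNode, then the other if an exception is caught) can be sketched roughly as below. This is only an illustration using the Java standard library: the class NameNodeFallback, the method firstReachable and the Action callback are hypothetical names, and the real HDFS call made against each candidate URI is assumed to be supplied by the caller.

```java
import java.util.List;

/** Hypothetical helper: tries each candidate NameNode URI in order until one call succeeds. */
final class NameNodeFallback {

    /** Stand-in for whatever HDFS operation the caller wants to run against one NameNode. */
    interface Action<T> {
        T run(String nameNodeUri) throws Exception;
    }

    static <T> T firstReachable(List<String> nameNodeUris, Action<T> action) {
        Exception last = null;
        for (String uri : nameNodeUris) {
            try {
                return action.run(uri);
            } catch (Exception e) {
                // For example the standby NameNode rejecting the request:
                // remember the failure and move on to the next candidate.
                last = e;
            }
        }
        // Every candidate failed (or the list was empty); surface the last failure as the cause.
        throw new IllegalStateException("no NameNode accepted the request", last);
    }
}
```

Every call site has to go through a wrapper like this and has to know all NameNode hosts up front, which is part of what makes the workaround cumbersome.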

Both of these methods are cumbersome when compared to (for example) MySQL's loadbalance URI or ZooKeeper's connection string, where I can just comma-separate all the hosts in the URI and the driver automatically finds a node to talk to.

Say I have active and standby NameNode hosts nn1 and nn2. What is the simplest way to refer to a specific path in HDFS, which:

  • can be used in command-line tools like hdfs, hadoop
  • can be used in the Hadoop Java API (and tools depending on it, like Spark) with minimum configuration
  • works regardless of which NameNode is currently active.

In this scenario, instead of checking for the active NameNode host and port combination, you should use a nameservice, as the nameservice will automatically transfer client requests to the active NameNode.

The nameservice acts like a proxy among the NameNodes: it is a logical name that the HDFS client resolves to whichever NameNode is currently active.

Example: hdfs://nameservice_id/file/path/in/hdfs


Sample steps to create a nameservice

In your hdfs-site.xml file

Create a nameservice by adding an ID for it (here the nameservice_id is mycluster):

    <property>
      <name>dfs.nameservices</name>
      <value>mycluster</value>
      <description>Logical name for this new nameservice</description>
    </property>

Now specify the NameNode IDs to determine the NameNodes in the cluster

dfs.ha.namenodes.[$nameservice ID]:

    <property>
      <name>dfs.ha.namenodes.mycluster</name>
      <value>nn1,nn2</value>
      <description>Unique identifiers for each NameNode in the nameservice</description>
    </property>

Then link the NameNode IDs to the NameNode hosts

dfs.namenode.rpc-address.[$nameservice ID].[$name node ID]:

    <property>
      <name>dfs.namenode.rpc-address.mycluster.nn1</name>
      <value>machine1.example.com:8020</value>
    </property>
    <property>
      <name>dfs.namenode.rpc-address.mycluster.nn2</name>
      <value>machine2.example.com:8020</value>
    </property>

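There are many more properties involved in configuring NameNode HA with a nameservice. On the client side, dfs.client.failover.proxy.provider.[$nameservice ID] is also needed (typically set to org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider), since this is what the client uses to find the active NameNode.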
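To keep the property names consistent for a given nameservice, the client-side keys can be generated from the nameservice ID and the NameNode addresses. The following is a minimal sketch using only the Java standard library; the class HaClientProperties and its build method are hypothetical names, and the set of keys is an assumption about what a client minimally needs. It includes dfs.client.failover.proxy.provider.[$nameservice ID], which the HDFS client uses to pick the active NameNode.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical helper that derives the client-side HA property names for one nameservice. */
final class HaClientProperties {

    /**
     * @param nameservice            logical nameservice ID, e.g. "mycluster"
     * @param rpcAddressByNameNodeId NameNode ID -> host:port, e.g. "nn1" -> "machine1.example.com:8020"
     */
    static Map<String, String> build(String nameservice, Map<String, String> rpcAddressByNameNodeId) {
        Map<String, String> props = new LinkedHashMap<>();
        props.put("dfs.nameservices", nameservice);
        // Comma-separated NameNode IDs, in the iteration order of the input map.
        props.put("dfs.ha.namenodes." + nameservice,
                String.join(",", rpcAddressByNameNodeId.keySet()));
        for (Map.Entry<String, String> e : rpcAddressByNameNodeId.entrySet()) {
            props.put("dfs.namenode.rpc-address." + nameservice + "." + e.getKey(), e.getValue());
        }
        // Failover proxy provider class shipped with HDFS.
        props.put("dfs.client.failover.proxy.provider." + nameservice,
                "org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider");
        return props;
    }
}
```

Each resulting entry could then be written as a property in hdfs-site.xml or applied to a Hadoop Configuration object with conf.set(key, value).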

With this setup the HDFS URL for a file will look like this

hdfs://mycluster/file/location/in/hdfs/wo/namenode/host 
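As a small illustration of how such a URL is put together, the sketch below builds it with java.net.URI. The class NameserviceUri and the method forPath are hypothetical names. The point it shows is that the authority part of the URI carries only the logical nameservice ID, with no NameNode host and no port.

```java
import java.net.URI;

/** Hypothetical helper: builds an hdfs:// URI that addresses a nameservice instead of a host. */
final class NameserviceUri {

    static URI forPath(String nameserviceId, String absolutePath) {
        if (!absolutePath.startsWith("/")) {
            throw new IllegalArgumentException("path must be absolute: " + absolutePath);
        }
        // The nameservice ID takes the place of host:port in the authority.
        return URI.create("hdfs://" + nameserviceId + absolutePath);
    }
}
```

For example, NameserviceUri.forPath("mycluster", "/file/location/in/hdfs") gives a URI whose host is mycluster and whose port is unset.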

Edit:

Applying the properties in Java code

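    import java.net.URI;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;

    // false: do not load the default XML resources, rely only on the values set below
    Configuration conf = new Configuration(false);
    conf.set("dfs.nameservices", "mycluster");
    conf.set("dfs.ha.namenodes.mycluster", "nn1,nn2");
    conf.set("dfs.namenode.rpc-address.mycluster.nn1", "machine1.example.com:8020");
    conf.set("dfs.namenode.rpc-address.mycluster.nn2", "machine2.example.com:8020");
    // Needed on the client so it can locate the active NameNode
    conf.set("dfs.client.failover.proxy.provider.mycluster",
            "org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider");

    // FileSystem.get takes a URI, not a plain path string; point it at the nameservice
    FileSystem fsObj = FileSystem.get(URI.create("hdfs://mycluster"), conf);

    // now use fsObj to perform HDFS shell-like operations
    fsObj ...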
