Hadoop - NameNode HA when using hdfs:// URI
With the hdfs or hftp URI scheme (e.g. hdfs://namenode/path/to/file) you can access HDFS clusters without requiring XML configuration files. This is handy when running shell commands like hdfs dfs -get or hadoop distcp, or when reading files from Spark with sc.hadoopFile(), because you don't have to copy and manage XML files for every relevant HDFS cluster to all the nodes the code might potentially run on.
One drawback of this approach is that you have to use the active NameNode's hostname, otherwise Hadoop will throw an exception complaining that the NN is in standby.
The usual workarounds are to try one NameNode and then try the other if an exception is caught, or to connect to ZooKeeper directly and parse the binary data using protobuf.
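For concreteness, the first workaround looks roughly like this (a hedged sketch; the helper name and the probe call are my own, and the exact exception types vary by Hadoop version):

import java.net.URI;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class TryEachNameNode {
    // Hypothetical helper: probe each NameNode host in turn and return a
    // FileSystem bound to the first one that answers as active.
    public static FileSystem firstActive(Configuration conf, String... hosts)
            throws Exception {
        Exception last = null;
        for (String host : hosts) {
            try {
                FileSystem fs = FileSystem.get(URI.create("hdfs://" + host), conf);
                fs.getFileStatus(new Path("/")); // a standby NN rejects this call
                return fs;
            } catch (Exception e) {
                last = e; // standby (or unreachable); try the next host
            }
        }
        throw last;
    }
}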
Both of these methods are cumbersome when compared to (for example) MySQL's loadbalance URI or ZooKeeper's connection string, where you can comma-separate hosts in the URI and the driver automatically finds a node to talk to.
Say I have active and standby NameNode hosts nn1 and nn2. What is the simplest way to refer to a specific path in HDFS, which:
- can be used in the command-line tools hdfs and hadoop
- can be used in the Hadoop Java API (and tools depending on it, such as Spark) with minimum configuration
- works regardless of which NameNode is currently active.
In this scenario, instead of checking for the active NameNode host and port combination, you should use a nameservice, as the nameservice automatically transfers client requests to the active NameNode. The nameservice acts like a proxy among the NameNodes, diverting each HDFS request to the active NameNode.
Example: hdfs://nameservice_id/file/path/in/hdfs
Sample steps to create a nameservice:
In the hdfs-site.xml file, create a nameservice by adding an ID to it (here the nameservice_id is mycluster):
<property>
  <name>dfs.nameservices</name>
  <value>mycluster</value>
  <description>Logical name for this new nameservice</description>
</property>
Now specify the NameNode IDs to determine the NameNodes in the cluster, with dfs.ha.namenodes.[$nameservice ID]:
<property>
  <name>dfs.ha.namenodes.mycluster</name>
  <value>nn1,nn2</value>
  <description>Unique identifiers for each NameNode in the nameservice</description>
</property>
Then link the NameNode IDs with the NameNode hosts, using dfs.namenode.rpc-address.[$nameservice ID].[$namenode ID]:
<property>
  <name>dfs.namenode.rpc-address.mycluster.nn1</name>
  <value>machine1.example.com:8020</value>
</property>
<property>
  <name>dfs.namenode.rpc-address.mycluster.nn2</name>
  <value>machine2.example.com:8020</value>
</property>
There are many more properties involved in configuring NameNode HA with a nameservice.
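One client-side property deserves a call-out, since without it the client cannot decide which NameNode is active. It is standard in Apache Hadoop's HA setup, though not spelled out in the steps above; ConfiguredFailoverProxyProvider is the stock implementation:

<property>
  <name>dfs.client.failover.proxy.provider.mycluster</name>
  <value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
  <description>Class the HDFS client uses to locate the active NameNode</description>
</property>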
With this setup, the HDFS URL for a file will look like this:
hdfs://mycluster/file/location/in/hdfs/wo/namenode/host
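Once these properties are in the hdfs-site.xml visible to the client, the command-line tools resolve the nameservice the same way; for example (the paths are placeholders):

hdfs dfs -ls hdfs://mycluster/file/location/in/hdfs
hadoop distcp hdfs://mycluster/source/path hdfs://mycluster/target/path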
Edit:
Applying the properties in Java code:
import java.net.URI;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;

Configuration conf = new Configuration(false);
conf.set("dfs.nameservices", "mycluster");
conf.set("dfs.ha.namenodes.mycluster", "nn1,nn2");
conf.set("dfs.namenode.rpc-address.mycluster.nn1", "machine1.example.com:8020");
conf.set("dfs.namenode.rpc-address.mycluster.nn2", "machine2.example.com:8020");
// Required on the client so it can find the active NameNode
conf.set("dfs.client.failover.proxy.provider.mycluster",
    "org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider");
FileSystem fsObj = FileSystem.get(URI.create("hdfs://mycluster"), conf);
// Use fsObj to perform HDFS shell-like operations
// fsObj ...
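Since the question mentions reading files from Spark, here is a hedged sketch of the same idea through Spark's Java API (the app name and paths are placeholders; hadoopConfiguration() is the standard way to pass Hadoop settings to Spark's input formats):

import org.apache.hadoop.conf.Configuration;
import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;

public class NameserviceFromSpark {
    public static void main(String[] args) {
        JavaSparkContext jsc = new JavaSparkContext(
                new SparkConf().setAppName("nameservice-example"));

        // Same HA client settings as above, applied to the Hadoop
        // configuration Spark hands to its input formats.
        Configuration conf = jsc.hadoopConfiguration();
        conf.set("dfs.nameservices", "mycluster");
        conf.set("dfs.ha.namenodes.mycluster", "nn1,nn2");
        conf.set("dfs.namenode.rpc-address.mycluster.nn1", "machine1.example.com:8020");
        conf.set("dfs.namenode.rpc-address.mycluster.nn2", "machine2.example.com:8020");
        conf.set("dfs.client.failover.proxy.provider.mycluster",
                "org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider");

        // The path names the nameservice, not a specific NameNode host.
        JavaRDD<String> lines = jsc.textFile("hdfs://mycluster/file/location/in/hdfs");
        System.out.println(lines.count());
        jsc.stop();
    }
}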