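HDFS estimates the network bandwidth between two nodes from their distance in the cluster's network topology. The distance from a node to its parent node is assumed to be one, so the distance between two nodes is the sum of their distances to their closest common ancestor. The shorter the distance between two nodes, the greater the bandwidth they can use to transfer data.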
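To make the distance rule concrete, here is a minimal Python sketch. It assumes node locations are written as slash-separated tree paths such as `/d1/r1/n1` (data center, rack, node); that naming and the `network_distance` function are hypothetical illustrations, not HDFS's own code. Every node-to-parent edge counts as one hop, so the result is the sum of each node's hops up to their closest common ancestor.

```python
def network_distance(a: str, b: str) -> int:
    """Hop count between two locations in a tree-shaped topology.

    Locations are hypothetical slash-separated paths like "/d1/r1/n1".
    Each node-to-parent edge has distance one, so the distance is the sum
    of both nodes' distances to their closest common ancestor.
    """
    path_a = [part for part in a.split("/") if part]
    path_b = [part for part in b.split("/") if part]

    # Count how many leading path components the two locations share.
    common = 0
    for x, y in zip(path_a, path_b):
        if x != y:
            break
        common += 1

    # Hops from each node up to the shared ancestor, added together.
    return (len(path_a) - common) + (len(path_b) - common)


print(network_distance("/d1/r1/n1", "/d1/r1/n1"))  # same node: 0
print(network_distance("/d1/r1/n1", "/d1/r1/n2"))  # same rack: 2
print(network_distance("/d1/r1/n1", "/d1/r2/n3"))  # same data center, other rack: 4
print(network_distance("/d1/r1/n1", "/d2/r3/n4"))  # other data center: 6
```

Smaller values mean the two nodes share more of the path, which is why a shorter distance is used as a proxy for more available bandwidth.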
Replica placement is critical to HDFS data reliability and read/write performance. A good replica placement policy should improve data reliability, availability, and network bandwidth utilization. HDFS currently provides a configurable block placement policy interface, so users and researchers can experiment with and test whichever policy is optimal for their applications.
The default HDFS block placement policy tries to strike a balance between minimizing write cost and maximizing data reliability, availability, and aggregate read bandwidth. When a new block is created, the first replica is placed on the node where the writer is located, and the second and third replicas on two different nodes in…
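The placement rule above can be sketched in Python as follows. `Node`, `choose_targets`, and the rack layout are hypothetical names for illustration, not HDFS's actual Java placement interface. The first replica goes to the writer's node, as stated. Because the description is cut off before it says exactly where the second and third replicas go, the sketch assumes, for illustration only, that they land on two distinct nodes sharing one rack other than the writer's. Only the first three replicas are handled.

```python
from __future__ import annotations

import random
from dataclasses import dataclass


@dataclass(frozen=True)
class Node:
    name: str
    rack: str


def choose_targets(writer: Node, cluster: list[Node], replication: int = 3,
                   rng: random.Random | None = None) -> list[Node]:
    """Pick target nodes for up to three replicas of a new block."""
    if not 1 <= replication <= 3:
        raise ValueError("this sketch only places one to three replicas")
    rng = rng or random.Random()

    # Replica 1: the node where the writer runs (stated in the text).
    targets = [writer]
    needed = replication - 1
    if needed == 0:
        return targets

    # ASSUMPTION (text is truncated here): the remaining replicas go to
    # distinct nodes on a single rack that differs from the writer's rack.
    by_rack: dict[str, list[Node]] = {}
    for node in cluster:
        if node.rack != writer.rack:
            by_rack.setdefault(node.rack, []).append(node)

    racks = [rack for rack, nodes in sorted(by_rack.items()) if len(nodes) >= needed]
    if not racks:
        raise RuntimeError("no remote rack has enough nodes")

    rack = rng.choice(racks)
    # sample() draws without replacement, so the chosen nodes are distinct.
    targets.extend(rng.sample(by_rack[rack], needed))
    return targets


cluster = [Node("n1", "r1"), Node("n2", "r1"), Node("n3", "r2"),
           Node("n4", "r2"), Node("n5", "r3")]
print(choose_targets(cluster[0], cluster, rng=random.Random(0)))
```

Writing the first replica locally keeps the write cheap, while sending the others off the writer's rack is what lets a block survive the loss of that rack; a pluggable policy interface is what allows a different trade-off to be substituted for this one.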