
Hadoop Source Code Analysis (Part 3): Startup and Script Walkthrough

1. Startup

Hadoop is started through the scripts in its sbin directory. The startup-related scripts are the following:

start-all.sh, start-dfs.sh, start-yarn.sh, hadoop-daemon.sh, and yarn-daemon.sh.

hadoop-daemon.sh starts the individual HDFS-related daemons.

yarn-daemon.sh starts the individual YARN-related daemons.

start-dfs.sh starts the HDFS cluster.

start-yarn.sh starts the YARN cluster.

start-all.sh starts both the HDFS and the YARN cluster.

The scripts whose names begin with start all do their work by calling those two daemon scripts.
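For example, the two daemon scripts can also be invoked by hand on a single node. The commands below are a sketch based on the usage strings quoted later in this article; the sbin path under $HADOOP_PREFIX and the resourcemanager command for yarn-daemon.sh are assumptions about a typical installation:

# start one HDFS daemon on the local machine only
$HADOOP_PREFIX/sbin/hadoop-daemon.sh --config $HADOOP_CONF_DIR start namenode

# start one YARN daemon on the local machine only (assumed symmetric usage)
$HADOOP_PREFIX/sbin/yarn-daemon.sh --config $HADOOP_CONF_DIR start resourcemanager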

2. Script Analysis

We start with start-all.sh and then work down through the scripts it calls.

The content of start-all.sh is as follows:


#!/usr/bin/env bash

 

# Licensed to the Apache Software Foundation (ASF) under one or more

# contributor license agreements.  See the NOTICE file distributed with

# this work for additional information regarding copyright ownership.

# The ASF licenses this file to You under the Apache License, Version 2.0

# (the "License"); you may not use this file except in compliance with

# the License.  You may obtain a copy of the License at

#

#     http://HdhCmsTestapache.org/licenses/LICENSE-2.0

#

# Unless required by applicable law or agreed to in writing, software

# distributed under the License is distributed on an "AS IS" BASIS,

# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.

# See the License for the specific language governing permissions and

# limitations under the License.

 

 

# Start all hadoop daemons.  Run this on master node.

 

echo "This script is Deprecated. Instead use start-dfs.sh and start-yarn.sh"

 

bin=`dirname "${BASH_SOURCE-$0}" `

bin=`cd "$bin" ; pwd`

 

DEFAULT_LIBEXEC_DIR="$bin"/libexec

HADOOP_LIBEXEC_DIR=${HADOOP_LIBEXEC_DIR:-$DEFAULT_LIBEXEC_DIR}

. $HADOOP_LIBEXEC_DIR/hadoop-config.sh

 

# start hdfs daemons if hdfs is present

if [ -f "${HADOOP_HDFS_HOME}"/sbin/start-dfs.sh ]; then

   "${HADOOP_HDFS_HOME}"/sbin/start-dfs.sh --config $HADOOP_CONF_DIR

fi

 

# start yarn daemons if yarn is present

if [ -f "${HADOOP_YARN_HOME}"/sbin/start-yarn.sh ]; then

   "${HADOOP_YARN_HOME}"/sbin/start-yarn.sh --config $HADOOP_CONF_DIR

fi

The important part of this script is from line 31 to the end, which consists of two if blocks: the first invokes start-dfs.sh and the second invokes start-yarn.sh, each passing --config $HADOOP_CONF_DIR.
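In other words, start-all.sh (which prints a deprecation notice) is roughly equivalent to running the two scripts yourself, for example (using $HADOOP_PREFIX as the install root is an assumption; the script itself resolves the paths via $HADOOP_HDFS_HOME and $HADOOP_YARN_HOME):

$HADOOP_PREFIX/sbin/start-dfs.sh --config $HADOOP_CONF_DIR
$HADOOP_PREFIX/sbin/start-yarn.sh --config $HADOOP_CONF_DIR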

Next, taking start-dfs.sh as an example, we continue down the call chain.

The content of start-dfs.sh is as follows:


#!/usr/bin/env bash

 

# Licensed to the Apache Software Foundation (ASF) under one or more

# contributor license agreements.  See the NOTICE file distributed with

# this work for additional information regarding copyright ownership.

# The ASF licenses this file to You under the Apache License, Version 2.0

# (the "License"); you may not use this file except in compliance with

# the License.  You may obtain a copy of the License at

#

#     http://HdhCmsTestapache.org/licenses/LICENSE-2.0

#

# Unless required by applicable law or agreed to in writing, software

# distributed under the License is distributed on an "AS IS" BASIS,

# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.

# See the License for the specific language governing permissions and

# limitations under the License.

 

 

# Start hadoop dfs daemons.

# Optinally upgrade or rollback dfs state.

# Run this on master node.

 

usage="Usage: start-dfs.sh [-upgrade|-rollback] [other options such as -clusterId]"

 

bin=`dirname "${BASH_SOURCE-$0}" `

bin=`cd "$bin" ; pwd`

 

DEFAULT_LIBEXEC_DIR="$bin"/libexec

HADOOP_LIBEXEC_DIR=${HADOOP_LIBEXEC_DIR:-$DEFAULT_LIBEXEC_DIR}

. $HADOOP_LIBEXEC_DIR/hdfs-config.sh

 

# get arguments

if [[ $# -ge 1 ]]; then

   startOpt="$1"

   shift

   case "$startOpt" in

     -upgrade)

       nameStartOpt="$startOpt"

     ;;

     -rollback)

       dataStartOpt="$startOpt"

     ;;

     *)

       echo $usage

       exit 1

     ;;

   esac

fi

 

#Add other possible options

nameStartOpt="$nameStartOpt $@"

 

#---------------------------------------------------------

# namenodes

 

NAMENODES=$($HADOOP_PREFIX/bin/hdfs getconf -namenodes)

 

echo "Starting namenodes on [$NAMENODES]"

 

"$HADOOP_PREFIX/sbin/hadoop-daemons.sh" \

   --config "$HADOOP_CONF_DIR" \

   --hostnames "$NAMENODES" \

   --script "$bin/hdfs" start namenode $nameStartOpt

 

#---------------------------------------------------------

# datanodes (using default slaves file)

 

if [ -n "$HADOOP_SECURE_DN_USER" ]; then

   echo \

     "Attempting to start secure cluster, skipping datanodes. " \

     "Run start-secure-dns.sh as root to complete startup."

else

   "$HADOOP_PREFIX/sbin/hadoop-daemons.sh" \

     --config "$HADOOP_CONF_DIR" \

     --script "$bin/hdfs" start datanode $dataStartOpt

fi

 

#---------------------------------------------------------

# secondary namenodes (if any)

 

SECONDARY_NAMENODES=$($HADOOP_PREFIX/bin/hdfs getconf -secondarynamenodes 2>/dev/null)

 

if [ -n "$SECONDARY_NAMENODES" ]; then

   echo "Starting secondary namenodes [$SECONDARY_NAMENODES]"

 

   "$HADOOP_PREFIX/sbin/hadoop-daemons.sh" \

       --config "$HADOOP_CONF_DIR" \

       --hostnames "$SECONDARY_NAMENODES" \

       --script "$bin/hdfs" start secondarynamenode

fi

 

#---------------------------------------------------------

# quorumjournal nodes (if any)

 

SHARED_EDITS_DIR=$($HADOOP_PREFIX/bin/hdfs getconf -confKey dfs.namenode.shared.edits.dir 2>&-)

 

case "$SHARED_EDITS_DIR" in

qjournal://*)

   JOURNAL_NODES=$(echo "$SHARED_EDITS_DIR" | sed 's,qjournal://\([^/]*\)/.*,\1,g; s/;/ /g; s/:[0-9]*//g' )

   echo "Starting journal nodes [$JOURNAL_NODES]"

   "$HADOOP_PREFIX/sbin/hadoop-daemons.sh" \

       --config "$HADOOP_CONF_DIR" \

       --hostnames "$JOURNAL_NODES" \

       --script "$bin/hdfs" start journalnode ;;

esac

 

#---------------------------------------------------------

# ZK Failover controllers, if auto-HA is enabled

AUTOHA_ENABLED=$($HADOOP_PREFIX/bin/hdfs getconf -confKey dfs.ha.automatic-failover.enabled)

if [ "$(echo " $AUTOHA_ENABLED " | tr A-Z a-z)" = "true" ]; then

   echo "Starting ZK Failover Controllers on NN hosts [$NAMENODES]"

   "$HADOOP_PREFIX/sbin/hadoop-daemons.sh" \

     --config "$HADOOP_CONF_DIR" \

     --hostnames "$NAMENODES" \

     --script "$bin/hdfs" start zkfc

fi

 

# eof

First, lines 23 to 30 set up Hadoop's paths (the script's own directory, the libexec directory, and the sourcing of hdfs-config.sh).

Lines 32 to 51 then handle the arguments passed to the script. After that, the individual HDFS roles are started.
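As a quick illustration of that argument handling (a sketch based on the case statement above; the option names come from the usage string):

sbin/start-dfs.sh -upgrade     # captured in nameStartOpt and passed on to the namenodes
sbin/start-dfs.sh -rollback    # captured in dataStartOpt and passed on to the datanodes
sbin/start-dfs.sh -foo         # any other option prints the usage string and exits with status 1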

The namenode is started first, on lines 60 to 63, by calling the hadoop-daemons.sh script. The difference between this script and the hadoop-daemon.sh script mentioned earlier is that hadoop-daemons.sh can start the required roles on other machines in the cluster, whereas hadoop-daemon.sh only starts roles on the current machine. In fact hadoop-daemons.sh itself works by calling hadoop-daemon.sh, which we analyze a little later. The call also passes several parameters, the most important of which are start namenode, meaning: start the namenode.
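To make the difference concrete, here is a comparison of the two invocation styles (a sketch; both usage strings are quoted verbatim further down, and the paths are abbreviated):

# hadoop-daemon.sh: starts the role only on the machine it is run on
sbin/hadoop-daemon.sh --config $HADOOP_CONF_DIR --script "$bin/hdfs" start namenode

# hadoop-daemons.sh: fans the same command out over ssh to the hosts given by --hostnames (or the slaves file)
sbin/hadoop-daemons.sh --config $HADOOP_CONF_DIR --hostnames "$NAMENODES" --script "$bin/hdfs" start namenode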

Lines 65 to 76 start the datanodes in the same way as the namenode; the call that actually starts the DataNodes is on line 73.

Lines 78 to 90 start the secondarynamenode. If NameNode high availability is configured, the secondarynamenode is not started.

Lines 92 to 105 start the journalnodes. The journalnodes are started only when NameNode high availability is configured, that is, when dfs.namenode.shared.edits.dir is a qjournal:// URI.

Finally, from line 107 to the end, the zkfc (ZooKeeper Failover Controller) daemons are started. They, too, are started only when automatic-failover high availability is configured.

With the high-availability configuration described in part (2) of this series, the roles started here are: namenode, datanode, journalnode, and zkfc.
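Which of these roles will actually be started can be checked in advance with the same getconf queries that start-dfs.sh itself runs, for example:

$HADOOP_PREFIX/bin/hdfs getconf -namenodes                                    # hosts used for namenode and zkfc
$HADOOP_PREFIX/bin/hdfs getconf -secondarynamenodes                           # empty when NameNode HA is configured
$HADOOP_PREFIX/bin/hdfs getconf -confKey dfs.namenode.shared.edits.dir        # a qjournal:// URI means journalnodes are started
$HADOOP_PREFIX/bin/hdfs getconf -confKey dfs.ha.automatic-failover.enabled    # true means zkfc is started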

The content of the hadoop-daemons.sh script used to start the roles above is as follows:


#!/usr/bin/env bash

# Licensed to the Apache Software Foundation (ASF) under one or more

# contributor license agreements.  See the NOTICE file distributed with

# this work for additional information regarding copyright ownership.

# The ASF licenses this file to You under the Apache License, Version 2.0

# (the "License"); you may not use this file except in compliance with

# the License.  You may obtain a copy of the License at

#

#     http://HdhCmsTestapache.org/licenses/LICENSE-2.0

#

# Unless required by applicable law or agreed to in writing, software

# distributed under the License is distributed on an "AS IS" BASIS,

# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.

# See the License for the specific language governing permissions and

# limitations under the License.

# Run a Hadoop command on all slave hosts.

usage="Usage: hadoop-daemons.sh [--config confdir] [--hosts hostlistfile] [start|stop] command args..."

# if no args specified, show usage

if [ $# -le 1 ]; then

   echo $usage

   exit 1

fi

bin=`dirname "${BASH_SOURCE-$0}" `

bin=`cd "$bin" ; pwd`

DEFAULT_LIBEXEC_DIR="$bin"/libexec

HADOOP_LIBEXEC_DIR=${HADOOP_LIBEXEC_DIR:-$DEFAULT_LIBEXEC_DIR}

. $HADOOP_LIBEXEC_DIR/hadoop-config.sh

exec "$bin/slaves.sh" --config $HADOOP_CONF_DIR cd "$HADOOP_PREFIX" \; "$bin/hadoop-daemon.sh" --config $HADOOP_CONF_DIR "$@"

The key part of this script is its last line, which involves two scripts: slaves.sh and hadoop-daemon.sh. slaves.sh logs into the specified servers over ssh and then runs the hadoop-daemon.sh script there. We will not analyze slaves.sh in detail here.
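For reference, the essence of slaves.sh is just an ssh loop. A minimal sketch (simplified; the real script also honours --hostnames/HADOOP_SLAVES overrides, strips comments from the host list, and supports HADOOP_SSH_OPTS and HADOOP_SLAVE_SLEEP) looks like this:

# simplified sketch of slaves.sh: run the remaining arguments on every host in the slaves file
for slave in $(cat "$HADOOP_CONF_DIR/slaves"); do
  ssh $HADOOP_SSH_OPTS "$slave" "$@" 2>&1 | sed "s/^/$slave: /" &
done
wait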

We now move on to the hadoop-daemon.sh script.

Its content is as follows:


#!/usr/bin/env bash

 

# Licensed to the Apache Software Foundation (ASF) under one or more

# contributor license agreements.  See the NOTICE file distributed with

# this work for additional information regarding copyright ownership.

# The ASF licenses this file to You under the Apache License, Version 2.0

# (the "License"); you may not use this file except in compliance with

# the License.  You may obtain a copy of the License at

#

#     http://HdhCmsTestapache.org/licenses/LICENSE-2.0

#

# Unless required by applicable law or agreed to in writing, software

# distributed under the License is distributed on an "AS IS" BASIS,

# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.

# See the License for the specific language governing permissions and

# limitations under the License.

 

 

# Runs a Hadoop command as a daemon.

#

# Environment Variables

#

#   HADOOP_CONF_DIR  Alternate conf dir. Default is ${HADOOP_PREFIX}/conf.

#   HADOOP_LOG_DIR   Where log files are stored.  PWD by default.

#   HADOOP_MASTER    host:path where hadoop code should be rsync'd from

#   HADOOP_PID_DIR   The pid files are stored. /tmp by default.

#   HADOOP_IDENT_STRING   A string representing this instance of hadoop. $USER by default

#   HADOOP_NICENESS The scheduling priority for daemons. Defaults to 0.

##

 

usage="Usage: hadoop-daemon.sh [--config <conf-dir>] [--hosts hostlistfile] [--script script] (start|stop) <hadoop-command> <args...>"

 

# if no args specified, show usage

if [ $# -le 1 ]; then

   echo $usage

   exit 1

fi

 

bin=`dirname "${BASH_SOURCE-$0}" `

bin=`cd "$bin" ; pwd`

 

DEFAULT_LIBEXEC_DIR="$bin"/libexec

HADOOP_LIBEXEC_DIR=${HADOOP_LIBEXEC_DIR:-$DEFAULT_LIBEXEC_DIR}

. $HADOOP_LIBEXEC_DIR/hadoop-config.sh

 

# get arguments

 

#default value

hadoopScript="$HADOOP_PREFIX"/bin/hadoop

if [ "--script" = "$1" ]

   then

     shift

     hadoopScript=$1

     shift

fi

startStop=$1

shift

command=$1

shift

 

hadoop_rotate_log ()

{

     log=$1;

     num=5;

     if [ -n "$2" ]; then

     num=$2

     fi

     if [ -f "$log" ]; then # rotate logs

     while [ $num -gt 1 ]; do

         prev=`expr $num - 1`

         [ -f "$log.$prev" ] && mv "$log.$prev" "$log.$num"

         num=$prev

     done

     mv "$log" "$log.$num" ;

     fi

}

 

if [ -f "${HADOOP_CONF_DIR}/hadoop-env.sh" ]; then

   . "${HADOOP_CONF_DIR}/hadoop-env.sh"

fi

 

# Determine if we're starting a secure datanode, and if so, redefine appropriate variables

if [ "$command" == "datanode" ] && [ "$EUID" -eq 0 ] && [ -n "$HADOOP_SECURE_DN_USER" ]; then

   export HADOOP_PID_DIR=$HADOOP_SECURE_DN_PID_DIR

   export HADOOP_LOG_DIR=$HADOOP_SECURE_DN_LOG_DIR

   export HADOOP_IDENT_STRING=$HADOOP_SECURE_DN_USER

   starting_secure_dn="true"

fi

 

#Determine if we're starting a privileged NFS, if so, redefine the appropriate variables

if [ "$command" == "nfs3" ] && [ "$EUID" -eq 0 ] && [ -n "$HADOOP_PRIVILEGED_NFS_USER" ]; then

     export HADOOP_PID_DIR=$HADOOP_PRIVILEGED_NFS_PID_DIR

     export HADOOP_LOG_DIR=$HADOOP_PRIVILEGED_NFS_LOG_DIR

     export HADOOP_IDENT_STRING=$HADOOP_PRIVILEGED_NFS_USER

     starting_privileged_nfs="true"

fi

 

if [ "$HADOOP_IDENT_STRING" = "" ]; then

   export HADOOP_IDENT_STRING="$USER"

fi

 

 

# get log directory

if [ "$HADOOP_LOG_DIR" = "" ]; then

   export HADOOP_LOG_DIR="$HADOOP_PREFIX/logs"

fi

 

if [ ! -w "$HADOOP_LOG_DIR" ] ; then

   mkdir -p "$HADOOP_LOG_DIR"

   chown $HADOOP_IDENT_STRING $HADOOP_LOG_DIR

fi

 

if [ "$HADOOP_PID_DIR" = "" ]; then

   HADOOP_PID_DIR=/tmp

fi

 

# some variables

export HADOOP_LOGFILE=hadoop-$HADOOP_IDENT_STRING-$command-$HOSTNAME.log

export HADOOP_ROOT_LOGGER=${HADOOP_ROOT_LOGGER:-"INFO,RFA"}

export HADOOP_SECURITY_LOGGER=${HADOOP_SECURITY_LOGGER:-"INFO,RFAS"}

export HDFS_AUDIT_LOGGER=${HDFS_AUDIT_LOGGER:-"INFO,NullAppender"}

log=$HADOOP_LOG_DIR/hadoop-$HADOOP_IDENT_STRING-$command-$HOSTNAME.out

pid=$HADOOP_PID_DIR/hadoop-$HADOOP_IDENT_STRING-$command.pid

HADOOP_STOP_TIMEOUT=${HADOOP_STOP_TIMEOUT:-5}

 

# Set default scheduling priority

if [ "$HADOOP_NICENESS" = "" ]; then

     export HADOOP_NICENESS=0

fi

 

case $startStop in

 

   (start)

 

     [ -w "$HADOOP_PID_DIR" ] ||  mkdir -p "$HADOOP_PID_DIR"

 

     if [ -f $pid ]; then

       if kill -0 `cat $pid` > /dev/null 2>&1; then

         echo $command running as process `cat $pid`.  Stop it first.

         exit 1

       fi

     fi

 

     if [ "$HADOOP_MASTER" != "" ]; then

       echo rsync from $HADOOP_MASTER

       rsync -a -e ssh --delete --exclude=.svn --exclude='logs/*' --exclude='contrib/hod/logs/*' $HADOOP_MASTER/ "$HADOOP_PREFIX"

     fi

 

     hadoop_rotate_log $log

     echo starting $command, logging to $log

     cd "$HADOOP_PREFIX"

     case $command in

       namenode|secondarynamenode|datanode|journalnode|dfs|dfsadmin|fsck|balancer|zkfc)

         if [ -z "$HADOOP_HDFS_HOME" ]; then

           hdfsScript="$HADOOP_PREFIX"/bin/hdfs

         else

           hdfsScript="$HADOOP_HDFS_HOME"/bin/hdfs

         fi

         nohup nice -n $HADOOP_NICENESS $hdfsScript --config $HADOOP_CONF_DIR $command "$@" > "$log" 2>&1 < /dev/null &

       ;;

       (*)

         nohup nice -n $HADOOP_NICENESS $hadoopScript --config $HADOOP_CONF_DIR $command "$@" > "$log" 2>&1 < /dev/null &

       ;;

     esac

     echo $! > $pid

     sleep 1

     head "$log"

     # capture the ulimit output

     if [ "true" = "$starting_secure_dn" ]; then

       echo "ulimit -a for secure datanode user $HADOOP_SECURE_DN_USER" >> $log

       # capture the ulimit info for the appropriate user

       su --shell=/bin/bash $HADOOP_SECURE_DN_USER -c 'ulimit -a' >> $log 2>&1

     elif [ "true" = "$starting_privileged_nfs" ]; then

         echo "ulimit -a for privileged nfs user $HADOOP_PRIVILEGED_NFS_USER" >> $log

         su --shell=/bin/bash $HADOOP_PRIVILEGED_NFS_USER -c 'ulimit -a' >> $log 2>&1

     else

       echo "ulimit -a for user $USER" >> $log

       ulimit -a >> $log 2>&1

     fi

     sleep 3;

     if ! ps -p $! > /dev/null ; then

       exit 1

     fi

     ;;

 

   (stop)

 

     if [ -f $pid ]; then

       TARGET_PID=`cat $pid`

       if kill -0 $TARGET_PID > /dev/null 2>&1; then

         echo stopping $command

         kill $TARGET_PID

         sleep $HADOOP_STOP_TIMEOUT

         if kill -0 $TARGET_PID > /dev/null 2>&1; then

           echo "$command did not stop gracefully after $HADOOP_STOP_TIMEOUT seconds: killing with kill -9"

           kill -9 $TARGET_PID

         fi

       else

         echo no $command to stop

       fi

       rm -f $pid

     else

       echo no $command to stop

     fi

     ;;

 

   (*)

     echo $usage

     exit 1

     ;;

 

esac

The important part of this code runs from line 131 (the case on $startStop) to the end; this is where the service is actually started or stopped. When the script is invoked it is passed two key arguments, start or stop followed by a command name, which decide whether a service is started or stopped. Taking the start case as an example, the key is the inner case on $command beginning at line 153, which launches the $hdfsScript script via nohup. That variable is defined at line 155, where we can see that it is actually the hdfs file in Hadoop's bin directory.
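To make this concrete: for the namenode, hadoop-daemons.sh ends up invoking hadoop-daemon.sh --config $HADOOP_CONF_DIR --script "$bin/hdfs" start namenode, so after the argument parsing at the top of the script the relevant variables are:

hadoopScript="$bin/hdfs"   # taken from --script; otherwise defaults to $HADOOP_PREFIX/bin/hadoop
startStop=start
command=namenode

and with default settings the start branch effectively runs roughly the following (a sketch with the variables expanded; the exact log and pid file names come from the assignments earlier in the script):

nohup nice -n 0 "$HADOOP_HDFS_HOME"/bin/hdfs --config $HADOOP_CONF_DIR namenode \
  > $HADOOP_LOG_DIR/hadoop-$USER-namenode-$HOSTNAME.out 2>&1 < /dev/null &
echo $! > /tmp/hadoop-$USER-namenode.pid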

The content of that file is as follows:


#!/usr/bin/env bash

 

# Licensed to the Apache Software Foundation (ASF) under one or more

# contributor license agreements.  See the NOTICE file distributed with

# this work for additional information regarding copyright ownership.

# The ASF licenses this file to You under the Apache License, Version 2.0

# (the "License"); you may not use this file except in compliance with

# the License.  You may obtain a copy of the License at

#

#     http://HdhCmsTestapache.org/licenses/LICENSE-2.0

#

# Unless required by applicable law or agreed to in writing, software

# distributed under the License is distributed on an "AS IS" BASIS,

# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.

# See the License for the specific language governing permissions and

# limitations under the License.

 

# Environment Variables

#

#   JSVC_HOME  home directory of jsvc binary.  Required for starting secure

#              datanode.

#

#   JSVC_OUTFILE  path to jsvc output file.  Defaults to

#                 $HADOOP_LOG_DIR/jsvc.out.

#

#   JSVC_ERRFILE  path to jsvc error file.  Defaults to $HADOOP_LOG_DIR/jsvc.err.

 

bin=`which $0`

bin=`dirname ${bin}`

bin=`cd "$bin" > /dev/null; pwd`

 

DEFAULT_LIBEXEC_DIR="$bin"/libexec

 

HADOOP_LIBEXEC_DIR=${HADOOP_LIBEXEC_DIR:-$DEFAULT_LIBEXEC_DIR}

. $HADOOP_LIBEXEC_DIR/hdfs-config.sh

 

function print_usage(){

   echo "Usage: hdfs [--config confdir] [--loglevel loglevel] COMMAND"

   echo "       where COMMAND is one of:"

   echo "  dfs                  run a filesystem command on the file systems supported in Hadoop."

   echo "  classpath            prints the classpath"

   echo "  namenode -format     format the DFS filesystem"

   echo "  secondarynamenode    run the DFS secondary namenode"

   echo "  namenode             run the DFS namenode"

   echo "  journalnode          run the DFS journalnode"

   echo "  zkfc                 run the ZK Failover Controller daemon"

   echo "  datanode             run a DFS datanode"

   echo "  dfsadmin             run a DFS admin client"

   echo "  haadmin              run a DFS HA admin client"

   echo "  fsck                 run a DFS filesystem checking utility"

   echo "  balancer             run a cluster balancing utility"

   echo "  jmxget               get JMX exported values from NameNode or DataNode."

   echo "  mover                run a utility to move block replicas across"

   echo "                       storage types"

   echo "  oiv                  apply the offline fsimage viewer to an fsimage"

   echo "  oiv_legacy           apply the offline fsimage viewer to an legacy fsimage"

   echo "  oev                  apply the offline edits viewer to an edits file"

   echo "  fetchdt              fetch a delegation token from the NameNode"

   echo "  getconf              get config values from configuration"

   echo "  groups               get the groups which users belong to"

   echo "  snapshotDiff         diff two snapshots of a directory or diff the"

   echo "                       current directory contents with a snapshot"

   echo "  lsSnapshottableDir   list all snapshottable dirs owned by the current user"

   echo "                        Use -help to see options"

   echo "  portmap              run a portmap service"

   echo "  nfs3                 run an NFS version 3 gateway"

   echo "  cacheadmin           configure the HDFS cache"

   echo "  crypto               configure HDFS encryption zones"

   echo "  storagepolicies      list/get/set block storage policies"

   echo "  version              print the version"

   echo ""

   echo "Most commands print help when invoked w/o parameters."

   # There are also debug commands, but they don't show up in this listing.

}

 

if [ $# = 0 ]; then

   print_usage

   exit

fi

 

COMMAND=$1

shift

 

case $COMMAND in

   # usage flags

   --help|-help|-h)

     print_usage

     exit

     ;;

esac

 

# Determine if we're starting a secure datanode, and if so, redefine appropriate variables

if [ "$COMMAND" == "datanode" ] && [ "$EUID" -eq 0 ] && [ -n "$HADOOP_SECURE_DN_USER" ]; then

   if [ -n "$JSVC_HOME" ]; then

     if [ -n "$HADOOP_SECURE_DN_PID_DIR" ]; then

       HADOOP_PID_DIR=$HADOOP_SECURE_DN_PID_DIR

     fi

 

     if [ -n "$HADOOP_SECURE_DN_LOG_DIR" ]; then

       HADOOP_LOG_DIR=$HADOOP_SECURE_DN_LOG_DIR

       HADOOP_OPTS="$HADOOP_OPTS -Dhadoop.log.dir=$HADOOP_LOG_DIR"

     fi

 

     HADOOP_IDENT_STRING=$HADOOP_SECURE_DN_USER

     HADOOP_OPTS="$HADOOP_OPTS -Dhadoop.id.str=$HADOOP_IDENT_STRING"

     starting_secure_dn="true"

   else

     echo "It looks like you're trying to start a secure DN, but \$JSVC_HOME" \

       "isn't set. Falling back to starting insecure DN."

   fi

fi

 

# Determine if we're starting a privileged NFS daemon, and if so, redefine appropriate variables

if [ "$COMMAND" == "nfs3" ] && [ "$EUID" -eq 0 ] && [ -n "$HADOOP_PRIVILEGED_NFS_USER" ]; then

   if [ -n "$JSVC_HOME" ]; then

     if [ -n "$HADOOP_PRIVILEGED_NFS_PID_DIR" ]; then

       HADOOP_PID_DIR=$HADOOP_PRIVILEGED_NFS_PID_DIR

     fi

 

     if [ -n "$HADOOP_PRIVILEGED_NFS_LOG_DIR" ]; then

       HADOOP_LOG_DIR=$HADOOP_PRIVILEGED_NFS_LOG_DIR

       HADOOP_OPTS="$HADOOP_OPTS -Dhadoop.log.dir=$HADOOP_LOG_DIR"

     fi

 

     HADOOP_IDENT_STRING=$HADOOP_PRIVILEGED_NFS_USER

     HADOOP_OPTS="$HADOOP_OPTS -Dhadoop.id.str=$HADOOP_IDENT_STRING"

     starting_privileged_nfs="true"

   else

     echo "It looks like you're trying to start a privileged NFS server, but" \

       "\$JSVC_HOME isn't set. Falling back to starting unprivileged NFS server."

   fi

fi

 

if [ "$COMMAND" = "namenode" ] ; then

   CLASS='org.apache.hadoop.hdfs.server.namenode.NameNode'

   HADOOP_OPTS="$HADOOP_OPTS $HADOOP_NAMENODE_OPTS"

elif [ "$COMMAND" = "zkfc" ] ; then

   CLASS='org.apache.hadoop.hdfs.tools.DFSZKFailoverController'

   HADOOP_OPTS="$HADOOP_OPTS $HADOOP_ZKFC_OPTS"

elif [ "$COMMAND" = "secondarynamenode" ] ; then

   CLASS='org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode'

   HADOOP_OPTS="$HADOOP_OPTS $HADOOP_SECONDARYNAMENODE_OPTS"

elif [ "$COMMAND" = "datanode" ] ; then

   CLASS='org.apache.hadoop.hdfs.server.datanode.DataNode'

   if [ "$starting_secure_dn" = "true" ]; then

     HADOOP_OPTS="$HADOOP_OPTS -jvm server $HADOOP_DATANODE_OPTS"

   else

     HADOOP_OPTS="$HADOOP_OPTS -server $HADOOP_DATANODE_OPTS"

   fi

elif [ "$COMMAND" = "journalnode" ] ; then

   CLASS='org.apache.hadoop.hdfs.qjournal.server.JournalNode'

   HADOOP_OPTS="$HADOOP_OPTS $HADOOP_JOURNALNODE_OPTS"

elif [ "$COMMAND" = "dfs" ] ; then

   CLASS=org.apache.hadoop.fs.FsShell

   HADOOP_OPTS="$HADOOP_OPTS $HADOOP_CLIENT_OPTS"

elif [ "$COMMAND" = "dfsadmin" ] ; then

   CLASS=org.apache.hadoop.hdfs.tools.DFSAdmin

   HADOOP_OPTS="$HADOOP_OPTS $HADOOP_CLIENT_OPTS"

elif [ "$COMMAND" = "haadmin" ] ; then

   CLASS=org.apache.hadoop.hdfs.tools.DFSHAAdmin

   CLASSPATH=${CLASSPATH}:${TOOL_PATH}

   HADOOP_OPTS="$HADOOP_OPTS $HADOOP_CLIENT_OPTS"

elif [ "$COMMAND" = "fsck" ] ; then

   CLASS=org.apache.hadoop.hdfs.tools.DFSck

   HADOOP_OPTS="$HADOOP_OPTS $HADOOP_CLIENT_OPTS"

elif [ "$COMMAND" = "balancer" ] ; then

   CLASS=org.apache.hadoop.hdfs.server.balancer.Balancer

   HADOOP_OPTS="$HADOOP_OPTS $HADOOP_BALANCER_OPTS"

elif [ "$COMMAND" = "mover" ] ; then

   CLASS=org.apache.hadoop.hdfs.server.mover.Mover

   HADOOP_OPTS="${HADOOP_OPTS} ${HADOOP_MOVER_OPTS}"

elif [ "$COMMAND" = "storagepolicies" ] ; then

   CLASS=org.apache.hadoop.hdfs.tools.StoragePolicyAdmin

elif [ "$COMMAND" = "jmxget" ] ; then

   CLASS=org.apache.hadoop.hdfs.tools.JMXGet

elif [ "$COMMAND" = "oiv" ] ; then

   CLASS=org.apache.hadoop.hdfs.tools.offlineImageViewer.OfflineImageViewerPB

elif [ "$COMMAND" = "oiv_legacy" ] ; then

   CLASS=org.apache.hadoop.hdfs.tools.offlineImageViewer.OfflineImageViewer

elif [ "$COMMAND" = "oev" ] ; then

   CLASS=org.apache.hadoop.hdfs.tools.offlineEditsViewer.OfflineEditsViewer

elif [ "$COMMAND" = "fetchdt" ] ; then

   CLASS=org.apache.hadoop.hdfs.tools.DelegationTokenFetcher

elif [ "$COMMAND" = "getconf" ] ; then

   CLASS=org.apache.hadoop.hdfs.tools.GetConf

elif [ "$COMMAND" = "groups" ] ; then

   CLASS=org.apache.hadoop.hdfs.tools.GetGroups

elif [ "$COMMAND" = "snapshotDiff" ] ; then

   CLASS=org.apache.hadoop.hdfs.tools.snapshot.SnapshotDiff

elif [ "$COMMAND" = "lsSnapshottableDir" ] ; then

   CLASS=org.apache.hadoop.hdfs.tools.snapshot.LsSnapshottableDir

elif [ "$COMMAND" = "portmap" ] ; then

   CLASS=org.apache.hadoop.portmap.Portmap

   HADOOP_OPTS= "$HADOOP_OPTS $HADOOP_PORTMAP_OPTS"

elif [ "$COMMAND" = "nfs3" ] ; then

   CLASS=org.apache.hadoop.hdfs.nfs.nfs3.Nfs3

   HADOOP_OPTS= "$HADOOP_OPTS $HADOOP_NFS3_OPTS"

elif [ "$COMMAND" = "cacheadmin" ] ; then

   CLASS=org.apache.hadoop.hdfs.tools.CacheAdmin

elif [ "$COMMAND" = "crypto" ] ; then

   CLASS=org.apache.hadoop.hdfs.tools.CryptoAdmin

elif [ "$COMMAND" = "version" ] ; then

   CLASS=org.apache.hadoop.util.VersionInfo

elif [ "$COMMAND" = "debug" ]; then

   CLASS=org.apache.hadoop.hdfs.tools.DebugAdmin

elif [ "$COMMAND" = "classpath" ]; then

   if [ "$#" -gt 0 ]; then

     CLASS=org.apache.hadoop.util.Classpath

   else

     # No need to bother starting up a JVM for this simple case.

     if $cygwin; then

       CLASSPATH=$(cygpath -p -w "$CLASSPATH" 2>/dev/null)

     fi

     echo $CLASSPATH

     exit 0

   fi

else

   CLASS="$COMMAND"

fi

 

# cygwin path translation

if $cygwin; then

   CLASSPATH=$(cygpath -p -w "$CLASSPATH" 2>/dev/null)

   HADOOP_LOG_DIR=$(cygpath -w "$HADOOP_LOG_DIR" 2>/dev/null)

   HADOOP_PREFIX=$(cygpath -w "$HADOOP_PREFIX" 2>/dev/null)

   HADOOP_CONF_DIR=$(cygpath -w "$HADOOP_CONF_DIR" 2>/dev/null)

   HADOOP_COMMON_HOME=$(cygpath -w "$HADOOP_COMMON_HOME" 2>/dev/null)

   HADOOP_HDFS_HOME=$(cygpath -w "$HADOOP_HDFS_HOME" 2>/dev/null)

   HADOOP_YARN_HOME=$(cygpath -w "$HADOOP_YARN_HOME" 2>/dev/null)

   HADOOP_MAPRED_HOME=$(cygpath -w "$HADOOP_MAPRED_HOME" 2>/dev/null)

fi

 

export CLASSPATH=$CLASSPATH

 

HADOOP_OPTS="$HADOOP_OPTS -Dhadoop.security.logger=${HADOOP_SECURITY_LOGGER:-INFO,NullAppender}"

 

# Check to see if we should start a secure datanode

if [ "$starting_secure_dn" = "true" ]; then

   if [ "$HADOOP_PID_DIR" = "" ]; then

     HADOOP_SECURE_DN_PID="/tmp/hadoop_secure_dn.pid"

   else

     HADOOP_SECURE_DN_PID="$HADOOP_PID_DIR/hadoop_secure_dn.pid"

   fi

 

   JSVC=$JSVC_HOME/jsvc

   if [ ! -f $JSVC ]; then

     echo "JSVC_HOME is not set correctly so jsvc cannot be found. jsvc is required to run secure datanodes. "

     echo "Please download and install jsvc from http://archive.apache.org/dist/commons/daemon/binaries/ " \

       "and set JSVC_HOME to the directory containing the jsvc binary."

     exit

   fi

 

   if [[ ! $JSVC_OUTFILE ]]; then

     JSVC_OUTFILE="$HADOOP_LOG_DIR/jsvc.out"

   fi

 

   if [[ ! $JSVC_ERRFILE ]]; then

     JSVC_ERRFILE="$HADOOP_LOG_DIR/jsvc.err"

   fi

 

   exec "$JSVC" \

            -Dproc_$COMMAND -outfile "$JSVC_OUTFILE" \

            -errfile "$JSVC_ERRFILE" \

            -pidfile "$HADOOP_SECURE_DN_PID" \

            -nodetach \

            -user "$HADOOP_SECURE_DN_USER" \

             -cp "$CLASSPATH" \

            $JAVA_HEAP_MAX $HADOOP_OPTS \

            org.apache.hadoop.hdfs.server.datanode.SecureDataNodeStarter "$@"

elif [ "$starting_privileged_nfs" = "true" ] ; then

   if [ "$HADOOP_PID_DIR" = "" ]; then

     HADOOP_PRIVILEGED_NFS_PID="/tmp/hadoop_privileged_nfs3.pid"

   else

     HADOOP_PRIVILEGED_NFS_PID="$HADOOP_PID_DIR/hadoop_privileged_nfs3.pid"

   fi

 

   JSVC=$JSVC_HOME/jsvc

   if [ ! -f $JSVC ]; then

     echo "JSVC_HOME is not set correctly so jsvc cannot be found. jsvc is required to run privileged NFS gateways. "

     echo "Please download and install jsvc from http://archive.apache.org/dist/commons/daemon/binaries/ " \

       "and set JSVC_HOME to the directory containing the jsvc binary."

     exit

   fi

 

   if [[ ! $JSVC_OUTFILE ]]; then

     JSVC_OUTFILE="$HADOOP_LOG_DIR/nfs3_jsvc.out"

   fi

 

   if [[ ! $JSVC_ERRFILE ]]; then

     JSVC_ERRFILE="$HADOOP_LOG_DIR/nfs3_jsvc.err"

   fi

 

   exec "$JSVC" \

            -Dproc_$COMMAND -outfile "$JSVC_OUTFILE" \

            -errfile "$JSVC_ERRFILE" \

            -pidfile "$HADOOP_PRIVILEGED_NFS_PID" \

            -nodetach \

            -user "$HADOOP_PRIVILEGED_NFS_USER" \

            -cp "$CLASSPATH" \

            $JAVA_HEAP_MAX $HADOOP_OPTS \

            org.apache.hadoop.hdfs.nfs.nfs3.PrivilegedNfsGatewayStarter "$@"

else

   # run it

   exec "$JAVA" -Dproc_$COMMAND $JAVA_HEAP_MAX $HADOOP_OPTS $CLASS "$@"

fi

The key parts of this code are lines 134 to 219 and line 304. Lines 134 to 219, although long, are just a chain of if/else statements, and their logic is simple: based on the value of COMMAND passed in, they assign CLASS and HADOOP_OPTS. Line 304 then executes the class held in CLASS. Taking the namenode branch at line 134 as an example, CLASS is set to org.apache.hadoop.hdfs.server.namenode.NameNode, a Java class that is subsequently executed to start the namenode.
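Putting the pieces together, for COMMAND=namenode the exec on line 304 expands to roughly the following (a sketch; $JAVA, $JAVA_HEAP_MAX and the classpath are prepared by hadoop-config.sh, which is sourced indirectly via hdfs-config.sh near the top of the script):

exec "$JAVA" -Dproc_namenode $JAVA_HEAP_MAX $HADOOP_OPTS \
  org.apache.hadoop.hdfs.server.namenode.NameNode "$@"

The NameNode class's main() method then runs inside this JVM; the secure-datanode and privileged-NFS branches above use jsvc instead of plain java.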

If you want to debug the HDFS source code, this is the best place to set up remote debugging: each service has its own class and its own startup options here, so you can target exactly the service you need.
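One common way to do that (an example, not something the scripts prescribe; the port 8888 and suspend=y are arbitrary choices) is to append standard JDWP options to the per-service *_OPTS variable that the hdfs script folds into HADOOP_OPTS, e.g. in $HADOOP_CONF_DIR/hadoop-env.sh:

# make the NameNode JVM wait for a remote debugger on port 8888 (example values)
export HADOOP_NAMENODE_OPTS="$HADOOP_NAMENODE_OPTS -agentlib:jdwp=transport=dt_socket,server=y,suspend=y,address=8888"

Because HADOOP_NAMENODE_OPTS is only appended in the namenode branch, the other daemons are unaffected.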

That concludes the detailed walkthrough of Hadoop startup and its scripts. The next article in this series is Hadoop Source Code Analysis (4): Remote Debugging.

Original article: https://blog.csdn.net/qq_39210987/article/details/113922124
