Install Apache Spark over a Hadoop cluster

spark-submit is the shell command used to deploy a Spark application on a cluster. It drives every supported cluster manager through a uniform interface, so you do not have to configure your application separately for each one.
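
That uniform interface shows up directly in the command line: once the application jar is built (wordcount.jar, produced in the steps below), the same invocation can target different cluster managers just by changing the --master value. A minimal sketch using Spark 1.x master strings; the standalone master URL is illustrative:

# Same application, three different cluster managers; only --master changes.
$ spark-submit --class SparkWordCount --master local wordcount.jar
$ spark-submit --class SparkWordCount --master yarn-client wordcount.jar
$ spark-submit --class SparkWordCount --master spark://master-host:7077 wordcount.jar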

Let us take the same word count example we used before with shell commands, this time written as a standalone Spark application. The following text is the input data, and the file holding it is named in.txt:

People are not as beautiful as they look,

Look at the following program − SparkWordCount.scala:

import org.apache.spark._
import org.apache.spark.SparkContext._

object SparkWordCount {
   def main(args: Array[String]) {

      /* local = master URL; Word Count = application name */
      /* /usr/local/spark = Spark home; Nil = jars; Map() = environment */
      val sc = new SparkContext("local", "Word Count", "/usr/local/spark", Nil, Map(), Map())

      /* Create an inputRDD that reads the text file (in.txt) through the Spark context */
      val input = sc.textFile("in.txt")

      /* Transform the inputRDD into countRDD */
      val count = input.flatMap(line => line.split(" "))
                       .map(word => (word, 1))
                       .reduceByKey(_ + _)

      /* saveAsTextFile is an action that writes the RDD out to disk */
      count.saveAsTextFile("outfile")
      println("OK")
   }
}

Note − While transforming the inputRDD into countRDD, we use flatMap() to tokenize the lines of the text file into words, the map() method to pair each word with a count of 1, and the reduceByKey() method to sum up the repetitions of each word.

Save the above program into a file named SparkWordCount.scala and place it in a user-defined directory named spark-application. Then use the following steps to submit the application, executing every step in the spark-application directory through the terminal.
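
As a quick aside, the same transformation chain can be sanity-checked interactively before packaging anything. A minimal sketch for spark-shell, assuming the shell is started in the directory that contains in.txt (sc is the SparkContext the shell predefines):

// Paste into spark-shell; sc already exists there.
val counts = sc.textFile("in.txt")
  .flatMap(line => line.split(" "))
  .map(word => (word, 1))
  .reduceByKey(_ + _)

// Prints one (word, count) pair per line, e.g. (beautiful,1)
counts.collect().foreach(println)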

Step 1: Download the Spark core jar

The Spark core jar is required for compilation; therefore, download spark-core_2.10-1.3.0.jar from the Spark core jar download link and move the jar file from the download directory to the spark-application directory.
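
Assuming the browser saved the jar in ~/Downloads and the application directory is ~/spark-application (both paths are illustrative), the move is a single command:

$ mv ~/Downloads/spark-core_2.10-1.3.0.jar ~/spark-application/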

Step 2: Compile the program

Compile the above program using the command given below. This command should be executed from the spark-application directory. Here, /usr/local/spark/lib/spark-assembly-1.4.0-hadoop2.6.0.jar is a Hadoop support jar taken from the Spark library.

$ scalac -classpath "spark-core_2.10-1.3.0.jar:/usr/local/spark/lib/spark-assembly-1.4.0-hadoop2.6.0.jar" SparkWordCount.scala
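
If compilation succeeds, scalac leaves the class files in the current directory; a Scala object compiles to a pair of .class files plus synthetic classes for its closures, so a quick check could look like this (the exact listing varies by Scala version):

$ ls SparkWordCount*.class
SparkWordCount$.class  SparkWordCount.class  ...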

Step 3: Create a JAR

Create a jar file of the Spark application using the following command. Here, wordcount is the file name of the jar file.

$ jar -cvf wordcount.jar SparkWordCount*.class spark-core_2.10-1.3.0.jar /usr/local/spark/lib/spark-assembly-1.4.0-hadoop2.6.0.jar
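
Before submitting, you can confirm what actually went into the archive; jar -tf prints the table of contents (the entries shown are illustrative):

$ jar -tf wordcount.jar
META-INF/
META-INF/MANIFEST.MF
SparkWordCount.class
SparkWordCount$.class
...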

Step 4: Submit the Spark application

Submit the Spark application using the following command −

$ spark-submit --class SparkWordCount --master local wordcount.jar

If it is executed successfully, you will find the output given below. The OK near the end of that output is printed by the last line of the program; it is there so the user can identify that the job ran to completion. If you carefully read the rest of the output, you will find different things, such as −

successfully started service 'sparkDriver' on port 42954
MemoryStore started with capacity 267.3 MB
Added JAR file:/home/hadoop/piapplication/count.jar
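
The word counts themselves do not appear on the console: the saveAsTextFile("outfile") action writes them into a directory of part files. A sketch of inspecting the result, assuming the job ran with the local master so the output lands on the local filesystem (the counts shown are illustrative):

$ ls outfile
part-00000  _SUCCESS
$ cat outfile/part-00000
(People,1)
(as,2)
...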














