How to solve java.lang.ClassNotFoundException: org.apache.tinkerpop.gremlin.spark.structure.io.gryo.GryoSerializer

Posted by: admin, February 25, 2020

Questions:

I am using TinkerPop + JanusGraph + Spark.

build.gradle

compile group: 'org.apache.tinkerpop', name: 'spark-gremlin', version: '3.1.0-incubating'

Below is some critical configuration that we have:

spark.serializer: org.apache.tinkerpop.gremlin.spark.structure.io.gryo.GryoSerializer
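For context, this key typically lives in the Hadoop-Gremlin graph properties file alongside the other Spark settings. A minimal sketch follows; only the serializer line comes from the post, the surrounding keys are illustrative assumptions (the host appears in the log entry below, the port is the standalone-master default):

gremlin.graph=org.apache.tinkerpop.gremlin.hadoop.structure.HadoopGraph
# illustrative master URL; replace with your cluster's actual master
spark.master=spark://gdp-identity-stage.target.com:7077
spark.serializer=org.apache.tinkerpop.gremlin.spark.structure.io.gryo.GryoSerializer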

In the logs there is a corresponding entry showing that the JAR containing the above class was loaded:

{"@timestamp":"2020-02-18T07:24:21.720+00:00","@version":1,"message":"Added JAR /opt/data/janusgraph/applib2/spark-gremlin-827a65ae26.jar at spark://gdp-identity-stage.target.com:38876/jars/spark-gremlin-827a65ae26.jar with timestamp 1582010661720","logger_name":"o.a.s.SparkContext","thread_name":"SparkGraphComputer-boss","level":"INFO","level_value":20000}

But the Spark job submitted by SparkGraphComputer fails; when we look at the executor logs, we see:

Caused by: java.lang.ClassNotFoundException: org.apache.tinkerpop.gremlin.spark.structure.io.gryo.GryoSerializer

Why is this exception thrown even though the corresponding JAR is loaded? Can anyone suggest what might be wrong?

As I mentioned, this exception appears in the Spark executor. When I opened one of the worker logs, I found the complete exception below:

Spark Executor Command: "/usr/lib/jvm/java-1.8.0-openjdk-1.8.0.222.b10-0.el7_6.x86_64/bin/java" "-cp" "/opt/spark/spark-2.4.0/conf/:/opt/spark/spark-2.4.0/jars/*:/opt/hadoop/hadoop-3_1_1/etc/hadoop/" "-Xmx56320M" "-Dspark.driver.port=43137" "-XX:+UseG1GC" "-XX:+PrintGCDetails" "-XX:+PrintGCTimeStamps" "-Xloggc:/opt/spark/gc.log" "-Dtinkerpop.gremlin.io.kryoShimService=org.apache.tinkerpop.gremlin.hadoop.structure.io.HadoopPoolShimService" "org.apache.spark.executor.CoarseGrainedExecutorBackend" "--driver-url" "spark://[email protected]:43137" "--executor-id" "43392" "--hostname" "192.168.192.10" "--cores" "6" "--app-id" "app-20200220094335-0001" "--worker-url" "spark://[email protected]:36845"
========================================

Exception in thread "main" java.lang.reflect.UndeclaredThrowableException
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1713)
    at org.apache.spark.deploy.SparkHadoopUtil.runAsSparkUser(SparkHadoopUtil.scala:64)
    at org.apache.spark.executor.CoarseGrainedExecutorBackend$.run(CoarseGrainedExecutorBackend.scala:188)
    at org.apache.spark.executor.CoarseGrainedExecutorBackend$.main(CoarseGrainedExecutorBackend.scala:281)
    at org.apache.spark.executor.CoarseGrainedExecutorBackend.main(CoarseGrainedExecutorBackend.scala)
Caused by: java.lang.ClassNotFoundException: org.apache.tinkerpop.gremlin.spark.structure.io.gryo.GryoSerializer
    at java.net.URLClassLoader.findClass(URLClassLoader.java:382)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
    at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:349)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
    at java.lang.Class.forName0(Native Method)
    at java.lang.Class.forName(Class.java:348)
    at org.apache.spark.util.Utils$.classForName(Utils.scala:238)
    at org.apache.spark.SparkEnv$.instantiateClass$1(SparkEnv.scala:259)
    at org.apache.spark.SparkEnv$.instantiateClassFromConf$1(SparkEnv.scala:280)
    at org.apache.spark.SparkEnv$.create(SparkEnv.scala:283)
    at org.apache.spark.SparkEnv$.createExecutorEnv(SparkEnv.scala:200)
    at org.apache.spark.executor.CoarseGrainedExecutorBackend$$anonfun$run$1.apply$mcV$sp(CoarseGrainedExecutorBackend.scala:221)
    at org.apache.spark.deploy.SparkHadoopUtil$$anon$2.run(SparkHadoopUtil.scala:65)
    at org.apache.spark.deploy.SparkHadoopUtil$$anon$2.run(SparkHadoopUtil.scala:64)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:422)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1698)
    ... 4 more

When I set the spark.jars property on the graph, I pass this JAR location as well.
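For reference, a sketch of what that setting looks like, reusing the JAR path from the log entry above (properties syntax assumed; adjust to however your graph configuration is built):

spark.jars=/opt/data/janusgraph/applib2/spark-gremlin-827a65ae26.jar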

The JAR we built from the application is a fat JAR, meaning it contains the application code as well as all required dependencies. (Screenshots of the JAR contents were attached to the original post.)

Answers:

If you look at the logs, you see this:

java" "-cp" "/opt/spark/spark-2.4.0/conf/:/opt/spark/spark-2.4.0/jars/*:/opt/hadoop/hadoop-3_1_1/etc/hadoop/"

Unless the Gremlin JARs are in the /opt/spark/spark-2.4.0/jars/ folder on each Spark worker, the class you're using doesn't exist on the executor's classpath.

The recommended way to include it for your specific application is the Gradle Shadow plugin, which bundles the dependency into your application JAR, rather than --packages or spark.jars.
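A minimal build.gradle sketch of that approach, assuming the Shadow plugin's standard coordinates (the plugin version shown is an assumption; pick one compatible with your Gradle release):

plugins {
    id 'java'
    // Gradle Shadow plugin; the version here is an assumption
    id 'com.github.johnrengelman.shadow' version '5.2.0'
}

dependencies {
    compile group: 'org.apache.tinkerpop', name: 'spark-gremlin', version: '3.1.0-incubating'
}

// shadowJar produces build/libs/<project>-all.jar containing your classes
// plus all runtime dependencies, so the executors can find GryoSerializer.
shadowJar {
    zip64 true // the combined Spark/TinkerPop dependency tree can exceed the 65,535-entry ZIP limit
}

Submit the resulting -all.jar to the cluster (for example via spark.jars) so that every executor, not just the driver, has GryoSerializer on its classpath.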