
java – Requested array size exceeds VM limit in Spark Executors

Posted by: admin, February 25, 2020

Questions:

I have a directory containing many log files, and I need to parse each log file based on its file name.
My first approach was to create a wholeTextFiles RDD to parallelize the data, as shown below.

When I pass each file's content to the corresponding parser, I get java.lang.OutOfMemoryError: Requested array size exceeds VM limit for files larger than roughly 700 MB.

Spark deploy mode: cluster
Driver memory: 20 GB
Executor memory: 16 GB
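
For reference, these settings would typically be passed at submission time roughly as follows; the application class and jar names are placeholders, not taken from the question:

    # Placeholder class and jar names; only the deploy-mode and memory flags come from the question.
    spark-submit \
      --deploy-mode cluster \
      --driver-memory 20g \
      --executor-memory 16g \
      --class com.example.LogParserJob \
      log-parser.jar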

 // Pair each file path with its entire content loaded as a single String
 val fileRDD = spark.sparkContext.wholeTextFiles(logBundle.path.trim)

 // Parse the content (._2) of every file; the path (._1) is not used here
 fileRDD.map(tupleOfFileAndContent => parseLog(tupleOfFileAndContent._2))


  import java.util.Scanner
  import scala.collection.mutable.ListBuffer

  // Parse one log file's full content and collect every "EVENT SENDING:" line
  // together with the line number at which it appears.
  def parseLog(logfilecontent: String): List[Map[String, String]] = {
    val txt = new Scanner(logfilecontent)
    var linNum = 1
    val logEntries = new ListBuffer[Map[String, String]]()
    while (txt.hasNextLine) {
      val line = txt.nextLine()
      var logEntry = Map[String, String]()
      if (line.startsWith("    EVENT SENDING:")) {
        logEntry += ("line_number" -> linNum.toString)
        logEntry += ("event_sending" -> line.splitAt(18)._2.trim)
        logEntries += logEntry
      }
      linNum += 1
    }
    logEntries.toList
  }


Answers:
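
One possible direction, shown here only as a minimal sketch rather than a drop-in fix: wholeTextFiles materializes every file as a single String on an executor, and for files in the 700 MB+ range the intermediate byte/char arrays can exceed the JVM's maximum array size, which is what this error reports. Reading the files with binaryFiles instead yields a PortableDataStream per file that can be scanned line by line, so the full file never has to live in one array. The sketch assumes the same logBundle.path directory and the same "EVENT SENDING:" filter as in the question:

    import java.util.Scanner
    import scala.collection.mutable.ListBuffer

    // binaryFiles, like wholeTextFiles, pairs each file path with its content,
    // but hands back a PortableDataStream that is read incrementally on the executor.
    val streamRDD = spark.sparkContext.binaryFiles(logBundle.path.trim)

    val parsedRDD = streamRDD.flatMap { case (filePath, stream) =>
      val in = stream.open()                // InputStream over the file, opened on the executor
      try {
        val txt = new Scanner(in, "UTF-8")  // only one line is buffered at a time
        val logEntries = new ListBuffer[Map[String, String]]()
        var linNum = 1
        while (txt.hasNextLine) {
          val line = txt.nextLine()
          if (line.startsWith("    EVENT SENDING:")) {
            logEntries += Map(
              "line_number"   -> linNum.toString,
              "event_sending" -> line.splitAt(18)._2.trim)
          }
          linNum += 1
        }
        logEntries.toList
      } finally {
        in.close()
      }
    }

The file path is still available inside the closure (filePath), so the file-name-based choice of parser from the question can be kept.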