
java – Flink CEP not working in event time but working in processing time

Posted by: admin February 25, 2020

Questions:

When I use Flink CEP with processing time (the default configuration) I get the expected pattern matches, but after configuring the environment for event time I get no pattern matches at all.

    def main(args: Array[String]): Unit = {
      val env = StreamExecutionEnvironment.getExecutionEnvironment
      env.setStreamTimeCharacteristic(TimeCharacteristic.EventTime)
      env.enableCheckpointing(3000) // checkpoint every 3000 msec
      val lines = env.addSource(consumerKafkaSource.consume("bank_transaction_2", "192.168.2.201:9092", "192.168.2.201:2181", "http://192.168.2.201:8081"))

      val eventdate = ExtractAndAssignEventTime.assign(lines, "unix", "datetime", 3) // extracting the event time here

      val event = eventdate.keyBy(v => v.get("customer_id").toString.toInt)
      val pattern1 = Pattern.begin[GenericRecord]("start")
        .where(v => v.get("state").toString == "FAILED")
        .next("d")
        .where(v => v.get("state").toString == "FAILED")
      val patternStream = CEP.pattern(event, pattern1)
      val warnID = patternStream.sideOutputLateData(latedata).select(value => {
        val v = value.mapValues(c => c.toList.toString)
        Json(DefaultFormats).write(v).replace("\\\"", "\"")
        //.replace("List(","{").replace(")","}")
      })
      val latedatastream = warnID.getSideOutput(latedata)
      latedatastream.print("late_data")

      warnID.print("warning")
      event.print("event")

      env.execute() // job submission, truncated in the original post
    }

Timestamp extraction code

object ExtractAndAssignEventTime {
  def assign(stream: DataStream[GenericRecord], timeFormat: String, timeColumn: String, outOfOrderTime: Int): DataStream[GenericRecord] = {
    if (!timeFormat.equalsIgnoreCase("unix")) {
      // the time column holds a formatted date string; parse it to epoch millis
      stream.assignTimestampsAndWatermarks(
        new BoundedOutOfOrdernessTimestampExtractor[GenericRecord](Time.seconds(outOfOrderTime)) {
          override def extractTimestamp(t: GenericRecord): Long =
            new java.text.SimpleDateFormat(timeFormat).parse(t.get(timeColumn).toString).getTime
        })
    } else {
      // "unix" mode: the time column already holds epoch millis
      stream.assignTimestampsAndWatermarks(
        new BoundedOutOfOrdernessTimestampExtractor[GenericRecord](Time.seconds(outOfOrderTime)) {
          override def extractTimestamp(t: GenericRecord): Long =
            t.get(timeColumn).toString.toLong
        })
    }
  }
}
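For reference, the two branches of the extractor reduce to the following plain-Scala conversion (no Flink needed; the helper name is illustrative). The important point is that `extractTimestamp` must return epoch milliseconds:

```scala
import java.text.SimpleDateFormat

// Convert an event's time column to epoch milliseconds, mirroring the
// extractor above: either treat the value as epoch millis directly
// ("unix" mode) or parse a formatted date string.
def toEpochMillis(value: String, timeFormat: String): Long =
  if (timeFormat.equalsIgnoreCase("unix")) value.toLong
  else new SimpleDateFormat(timeFormat).parse(value).getTime
```

Note that `SimpleDateFormat` parses in the JVM's default time zone unless one is set explicitly, which is a common source of shifted timestamps.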

Please help me solve this issue. Thanks in advance!

Answers:

Since you are using an AssignerWithPeriodicWatermarks (which is what BoundedOutOfOrdernessTimestampExtractor is), you also need to set the auto-watermark interval so that Flink actually generates watermarks at that interval.

You can do this by calling env.getConfig.setAutoWatermarkInterval([interval]).

In event time, CEP is driven by watermarks, so if none are generated there will be basically no output.
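To see why: a bounded-out-of-orderness assigner keeps a watermark trailing the highest timestamp seen so far, and the runtime only emits it periodically. A minimal sketch of that bookkeeping in plain Scala (no Flink; the class name is illustrative):

```scala
// Sketch of bounded-out-of-orderness watermarking: the watermark trails
// the maximum timestamp seen so far by a fixed bound. Flink emits it
// periodically (every autoWatermarkInterval); if watermarks are never
// emitted, event-time operators like CEP never see time advance and
// never finalize matches.
class BoundedOutOfOrderness(boundMillis: Long) {
  private var maxTs = Long.MinValue

  def onEvent(timestamp: Long): Unit =
    maxTs = math.max(maxTs, timestamp)

  // called periodically by the runtime
  def currentWatermark: Long =
    if (maxTs == Long.MinValue) Long.MinValue else maxTs - boundMillis
}
```

An event-time operator can only finalize results up to `currentWatermark`; events with timestamps at or below it are treated as late.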

Answer:

I had this same problem and I “solved” it just now, but the answer doesn’t make much sense (at least to me), as you’ll see.

Explanation:

In my original code, I had this:

var env = StreamExecutionEnvironment.getExecutionEnvironment
env.setStreamTimeCharacteristic(TimeCharacteristic.EventTime)
env.setParallelism(1)
env.getConfig.setAutoWatermarkInterval(1)

...

var stream : DataStream[String] = env.readTextFile("/home/luca/Desktop/input")


var tupleStream = stream.map(new S2TMapFunction())
tupleStream.assignTimestampsAndWatermarks(new PlacasPunctualTimestampAssigner())

val pattern = Pattern.begin[(String,Double,Double,String,Int,Int)]("follow").where(new SameRegionFunction())

val patternStream = CEP.pattern(tupleStream, pattern)

val result = patternStream.process(new MyPatternProcessFunction())

According to my logging, I saw that neither SameRegionFunction nor MyPatternProcessFunction was being executed, which is very unexpected, to say the least.


Since I was clueless, I decided to make my stream go through one more transformation function, just to check whether my events were really being inserted into the stream. So I submitted tupleStream to a map operation, generating newTupleStream, like this:

var env = StreamExecutionEnvironment.getExecutionEnvironment
env.setStreamTimeCharacteristic(TimeCharacteristic.EventTime)
env.setParallelism(1)
env.getConfig.setAutoWatermarkInterval(1)

...

var stream : DataStream[String] = env.readTextFile("/home/luca/Desktop/input")


/* I created 'DoNothingMapFunction', where the output event = input event*/
var tupleStream = stream.map(new S2TMapFunction())
var newTupleStream = tupleStream.assignTimestampsAndWatermarks(new PlacasPunctualTimestampAssigner()).map(new DoNothingMapFunction())


val pattern = Pattern.begin[(String,Double,Double,String,Int,Int)]("follow").where(new SameRegionFunction())

val patternStream = CEP.pattern(newTupleStream,pattern)

val result = patternStream.process(new MyPatternProcessFunction())

And then SameRegionFunction and MyPatternProcessFunction decided to run.

Note:

I changed the line:

var newTupleStream = tupleStream.assignTimestampsAndWatermarks(new PlacasPunctualTimestampAssigner()).map(new DoNothingMapFunction())

to this:

var newTupleStream = tupleStream.assignTimestampsAndWatermarks(new PlacasPunctualTimestampAssigner())

and it also worked. Apparently just another level of indirection is enough to make it work, although it’s not clear to me why it happens.
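One detail worth noting here: `assignTimestampsAndWatermarks` returns a new `DataStream` rather than modifying the one it is called on, so a discarded return value is silently lost; capturing the result (as `newTupleStream` does) is what makes the timestamps reach `CEP.pattern`. A toy pure-Scala sketch of that behavior (illustrative names, no Flink needed):

```scala
// A toy immutable "stream": transformations return a new instance
// rather than mutating the receiver, just like DataStream operations.
final case class ToyStream(events: List[Long], hasTimestamps: Boolean) {
  def assignTimestamps: ToyStream = copy(hasTimestamps = true)
}

val tupleStream = ToyStream(List(1L, 2L, 3L), hasTimestamps = false)

tupleStream.assignTimestamps // result discarded: tupleStream is unchanged

val newTupleStream = tupleStream.assignTimestamps // result captured: carries timestamps
```

The same pattern explains why adding the intermediate `newTupleStream` variable "fixed" the job: it was never the extra map, but the captured return value.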