java – Apache Beam create timeseries from event stream-Exceptionshub

Posted by: admin, February 25, 2020

Questions:

I am trying to create a timeseries of the count of events that happened over a given time.

The events are encoded as

PCollection<KV<String, Long>> events;

where the String is the id of the event source and the Long is the timestamp of the event.

What I want out is a PCollection<Timeseries> of timeseries that have the form

class Timeseries  {
  String id;
  List<TimeseriesWindow> windows;
}

class TimeseriesWindow  {
  long timestamp;
  long count;
}

Toy example with a fixed window size (is this the correct term?) of 10 seconds and a total timeseries duration of 60 seconds:

Input:

[("one", 1), ("one", 13), ("one", 2), ("one", 43), ("two", 3)]

Output:

[
  {
    id: "one",
    windows: [
      {
        timestamp: 0,
        count: 2
      },
      {
        timestamp: 10,
        count: 1
      },
      {
        timestamp: 20,
        count: 0
      },
      {
        timestamp: 30,
        count: 0
      },
      {
        timestamp: 40,
        count: 1
      },
      {
        timestamp: 50,
        count: 0
      }
    ]
  },
  {
    id: "two",
    windows: [
      {
        timestamp: 0,
        count: 1
      },
      {
        timestamp: 10,
        count: 0
      },
      {
        timestamp: 20,
        count: 0
      },
      {
        timestamp: 30,
        count: 0
      },
      {
        timestamp: 40,
        count: 0
      },
      {
        timestamp: 50,
        count: 0
      }
    ]
  }
]

I hope this makes sense 🙂

Answers:

You can do a GroupByKey to transform your input into

[
    ("one", [1, 13, 2, 43]),
    ("two", [3]),
]

at which point you can apply a DoFn to convert each key's list of timestamps into a Timeseries object (e.g. by first creating the list of TimeseriesWindow entries at the appropriate start times, and then iterating over the timestamps and incrementing the count of the window each one falls into).
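
As a rough sketch of that conversion step, here is the bucketing logic in plain Java, without any Beam dependency. The class name TimeseriesSketch, the build method and its windowSeconds/totalSeconds parameters are made up for illustration, and the code assumes the timestamps are seconds in the range [0, totalSeconds); anything outside that range is simply dropped.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper; the class and method names are illustrative, not Beam API.
final class TimeseriesSketch {

  static final class TimeseriesWindow {
    final long timestamp; // start of the window, in seconds
    long count;           // number of events that fell into the window

    TimeseriesWindow(long timestamp) {
      this.timestamp = timestamp;
    }
  }

  static final class Timeseries {
    final String id;
    final List<TimeseriesWindow> windows;

    Timeseries(String id, List<TimeseriesWindow> windows) {
      this.id = id;
      this.windows = windows;
    }
  }

  // Builds one Timeseries from the grouped timestamps of a single key.
  // Assumes timestamps are seconds in [0, totalSeconds); others are ignored.
  static Timeseries build(String id, Iterable<Long> timestamps,
                          long windowSeconds, long totalSeconds) {
    // Pre-create every window so that empty ones are reported with count 0.
    List<TimeseriesWindow> windows = new ArrayList<>();
    for (long start = 0; start < totalSeconds; start += windowSeconds) {
      windows.add(new TimeseriesWindow(start));
    }
    for (long ts : timestamps) {
      if (ts < 0 || ts >= totalSeconds) {
        continue; // outside the series; dropping these is an assumption
      }
      // Integer division maps a timestamp to the index of its window.
      windows.get((int) (ts / windowSeconds)).count++;
    }
    return new Timeseries(id, windows);
  }
}
```

Inside a DoFn applied to the GroupByKey output, the @ProcessElement method would call something like TimeseriesSketch.build(element.getKey(), element.getValue(), 10, 60) and output the result. For key "one" in the toy example this gives counts 2, 1, 0, 0, 1, 0 for the windows starting at 0, 10, 20, 30, 40 and 50.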

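You may also look into Beam's built-in windowing capabilities (for example, fixed windows of 10 seconds combined with a per-key count) to see if that will meet your needs. Keep in mind that a windowed aggregation only emits output for windows that contain at least one element, so the zero-count windows in your example would still have to be filled in separately.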