python – How to make calls to REST end-point from PySpark?

Posted by: admin February 24, 2020

Questions:

I have the following Spark dataframe df:

 col1    col2    col3
  1       0.5     10
  1       0.3     11
  5       1.4     1
  3       1.5     2
  1       0.9     10
  3       0.4     7
  1       1.2     9
  3       0.1     11

I want to call a REST endpoint that accepts N rows of data as input and returns output of the same dimensions, e.g.:

 col1    col2    col3
  3       0.5     11
  3       0.3     9
  4       1.1     1
  5       1.3     2
  1       0.8     11
  2       0.3     8
  2       1.3     8
  2       0.2     10

So the whole dataset could be sent in a single request. Alternatively, if the input dataset is huge, the endpoint could be called many times with batches of rows, and all the outputs then merged into one output dataset.

How can I implement either of these two approaches in Spark? I searched for PySpark REST clients but haven’t found anything. I was also thinking of using mapPartitions.

Any recommendations and examples are highly welcome.

Thanks.

Answers:
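
One way to do the batched variant with mapPartitions is sketched below. This is only a sketch under several assumptions that are not in the question: the endpoint URL is a placeholder, the service is assumed to accept a JSON list of row objects and to return a JSON list of the same length with the same keys, and col1 and col3 are assumed to be integers and col2 a double. ENDPOINT_URL, BATCH_SIZE, post_batch, score_partition and score_dataframe are hypothetical names made up for this example. It uses only the standard library for HTTP, so nothing extra has to be installed on the executors.

```python
import json
import urllib.request
from itertools import islice

ENDPOINT_URL = "http://example.com/score"  # placeholder, not a real service
BATCH_SIZE = 100                           # assumed number of rows per request


def post_batch(batch, url=ENDPOINT_URL):
    """POST a list of row dicts as JSON and return the decoded JSON reply.

    Assumes the service replies with one output row per input row.
    """
    request = urllib.request.Request(
        url,
        data=json.dumps(batch).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.loads(response.read().decode("utf-8"))


def score_partition(rows, send=post_batch, batch_size=BATCH_SIZE):
    """Generator for mapPartitions: send rows in batches, yield output tuples."""
    rows = iter(rows)
    while True:
        # Spark Row objects expose asDict(); plain dicts are accepted as well.
        batch = [r.asDict() if hasattr(r, "asDict") else dict(r)
                 for r in islice(rows, batch_size)]
        if not batch:
            return
        for out in send(batch):
            # Casts reflect the assumed column types (int, double, int).
            yield (int(out["col1"]), float(out["col2"]), int(out["col3"]))


def score_dataframe(spark, df):
    """Run score_partition on every partition and rebuild a DataFrame.

    Reuses df.schema, which assumes the output has the same column types.
    """
    return spark.createDataFrame(df.rdd.mapPartitions(score_partition),
                                 schema=df.schema)
```

With a SparkSession named spark, this would be called as result = score_dataframe(spark, df). Each partition makes its own requests on the executor, so the number of partitions controls how many calls hit the endpoint in parallel. An explicit schema is passed so that Spark does not need to evaluate part of the RDD just to infer the column types, which would trigger extra calls to the endpoint. If the data is small enough to send in a single request, collecting it to the driver and calling post_batch once is simpler.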