How to configure an EMR Spark Datasource

This article is for comments to:

Please comment +1 if this How to is important to you.

It seems that using the S3GlobReaderBatchKwargsGenerator translates an s3:// path into s3a://, which prevents Spark from opening the file within the EMRFS context.
I might be doing something wrong; having documentation would help clarify whether it’s a bug or not.
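
In case it helps anyone hitting the same thing, here is a minimal sketch of a possible workaround: rewrite the path back to the s3:// scheme before handing it to Spark, since EMRFS is what serves s3:// URIs on EMR. The helper name `to_emrfs_path` is hypothetical and not part of Great Expectations, and this assumes you can intercept the path the generator produces before Spark reads it.

```python
def to_emrfs_path(path: str) -> str:
    """Rewrite s3a:// or s3n:// URIs to s3://; leave any other path unchanged.

    Hypothetical helper, not a Great Expectations API.
    """
    for prefix in ("s3a://", "s3n://"):
        if path.startswith(prefix):
            # Keep the bucket and key, swap only the scheme.
            return "s3://" + path[len(prefix):]
    return path


# Example: normalise a path before passing it to a Spark reader.
emrfs_path = to_emrfs_path("s3a://my-bucket/data/file.csv")
```

The bucket and key above are placeholders. Whether the scheme rewrite in the generator is intended behaviour is still an open question, so treat this as a stopgap rather than a fix.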

Just joined so I could +1 this! I’ve been playing with ge for a few weeks locally now. Good work so far, very impressed! I see a lot of potential to assist us with our data quality problems and the next step would be to try it on a bigger scale on our AWS instance.
Keep up the good work :+1: