I'm trying to save a Spark DataFrame (of more than 20GB) to a single JSON file in Amazon S3. My code to save the DataFrame is:
dataframe.repartition(1).save("s3n://mybucket/testfile","json")
But I'm getting an error from S3: "Your proposed upload exceeds the maximum allowed size". I know the maximum file size Amazon allows for a single upload is 5GB.
Is it possible to use S3 multipart upload with Spark? Or is there another way to solve this?
BTW, I need the data in a single file because a user is going to download it afterwards.
*I'm using Apache Spark 1.3.1 on a 3-node cluster created with the spark-ec2 script.
Thanks a lot,
JG
I would try separating the large DataFrame into a series of smaller DataFrames that you then append to the same file in the target:
df.write.mode('append').json(yourtargetpath)
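A minimal PySpark sketch of that idea, assuming Spark 1.4 or later (for the df.write API used above); the target path and the split weights are placeholders, not something from your setup:

# Sketch, assuming Spark 1.4+: split the big DataFrame into smaller
# pieces and append each piece to the same target path.
target_path = "s3n://mybucket/testfile"  # placeholder path from the question

# randomSplit returns a list of smaller DataFrames whose union is the original
pieces = dataframe.randomSplit([1.0] * 10)

for piece in pieces:
    # each append writes additional part-*.json files under target_path,
    # so every individual upload stays well below the 5GB single-PUT limit
    piece.write.mode('append').json(target_path)

Keep in mind this produces a directory of part files under the target path rather than one object; if you truly need a single downloadable file, you would still have to combine the parts afterwards (for example with S3's multipart upload-part-copy API).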