I have read questions about the multiprocessing module and about serialization on workers where cStringIO.StringO can't be pickled. However, I have cStringIO.StringI (notice the last capital 'I', for 'input' I presume), and I have no idea how to fix this. I'm trying to map a series of S3 bucket keys to PDF objects using PyPDF2. So far it works if I run it on my local system, but if I run it on the workers it goes wrong. Here is the code and the error message:
    import traceback
    import PyPDF2 as pdf  # the alias "pdf" is assumed from how the code uses it

    def createpdfobject(pdfidkey):
        # pdfidkey is an (id, S3 key) pair
        pdfid = str(pdfidkey[0])
        pdfkey = pdfidkey[1]
        # download the S3 object to a local file named after the id
        pdfkey.get_contents_to_filename(pdfid + '.pdf')
        try:
            pdfobj = pdf.PdfFileReader(pdfid + '.pdf')
        except pdf.utils.PdfReadError as ex:
            pdfobj = None
            message = "Traceback: {0}\nException: {1} PDF id: {2}. Arguments:\n{3!r}\nIgnoring file. Returned NoneType"\
                .format(traceback.format_exc(), type(ex).__name__, pdfid, ex.args)
            print(message)
        return (pdfid, pdfobj)

    pdfdatardd = filesrdd.map(lambda x: createpdfobject(x))

Here is the error, without the full traceback from the Databricks server (which is super long):
    PicklingError: Can't pickle <type 'cStringIO.StringI'>: attribute lookup cStringIO.StringI failed
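
A minimal sketch of what may be going on here, offered as a guess rather than a confirmed diagnosis: if the reader object returned from the mapped function keeps a reference to a stream, then pickling the result on a worker would fail on that stream, while plain data extracted from it would pickle fine. The class `StreamHolder` and the helpers `is_picklable` and `demo` below are hypothetical stand-ins, not part of PyPDF2 or Spark, and an open file handle is used in place of the cStringIO stream so the example runs without either library.

```python
import os
import pickle
import tempfile


class StreamHolder(object):
    """Hypothetical stand-in for a reader object that keeps a stream."""

    def __init__(self, path):
        # Open file handles cannot be pickled, which mimics the failure.
        self.stream = open(path, "rb")

    def summary(self):
        # Extract plain data (a dict with an int), which pickle can handle.
        self.stream.seek(0)
        return {"num_bytes": len(self.stream.read())}

    def close(self):
        self.stream.close()


def is_picklable(obj):
    """Return True if pickle.dumps(obj) succeeds, False otherwise."""
    try:
        pickle.dumps(obj)
        return True
    except Exception:
        return False


def demo():
    """Return (holder picklable?, extracted summary picklable?)."""
    with tempfile.NamedTemporaryFile(suffix=".pdf", delete=False) as tmp:
        tmp.write(b"%PDF-1.4 dummy")
        path = tmp.name
    holder = StreamHolder(path)
    try:
        return is_picklable(holder), is_picklable(holder.summary())
    finally:
        holder.close()
        os.remove(path)


if __name__ == "__main__":
    print(demo())
```

If this assumption holds for the code above, one option would be to have the mapped function return only extracted values (page count, text, and so on) instead of the reader object itself, so that nothing stream-like has to leave the worker.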