mysql - How to cleanse 'emojis' from JSON file, before inputting into SQL -


i'm taking text fields json file , putting them mysql database. sql not care emoji code \xf0\x9f\x98\x81 instance. says 'incorrect string value' when attempting insert data row.

i'm bit out of depth here, take utf-8 code emoji, in case smiley face.

i'm using java-based etl client (spoon pentaho) take json data , pass mysql database.

the thing is, i'm not sure how write code cleanse this. pentaho, can use java or javascript attempt fix this.

the json file, in browser, seems render emojis. pentaho seems write them question mark (?) in output files ... however, internally seems pass actual symbol/ utf-8 code, hence error.

i'm confused because java-based system passing around utf code '\xf0\x9f\x98\x81' .. correct?

well no, it's not, because mysql gladly insert string '\xf0\x9f\x98\x81' ... row, won't insert 'smiley face' or tiny man or whatever row ... physically being passed, , how can fix code? ... bit confused, maybe need read on utf-8 here. feel there 2 levels of data here ...

to make things more interesting, ms sql seems reference symbol (upon attempted insertion , subsequent error) it's utf-16 code.

the question is, how can reference data in java/ javascript , destroy emojis?

alright ... easy fix...

if helps else...

i used custom java script code found elsewhere on stack overflow....

var new_custom_fields = custom_fields.replace(/([\ue000-\uf8ff]|\ud83c[\udf00-\udfff]|\ud83d[\udc00-\uddff])/g, '')

i believe javascript speaks in utf-16, yeah. has removed emoji garbage free form fields in zendesk (zendesk not permit input mask ticket fields, order numbers, sadly).


Comments