I need help verifying the training steps below. Also, can I add my classifier to the -loadClassifier list?
-loadClassifier sample-ner-model.ser.gz,classifiers/english.all.3class.distsim.crf.ser.gz,classifiers/english.conll.4class.distsim.crf.ser.gz,classifiers/english.muc.7class.distsim.crf.ser.gz \
sample.txt
The fate of Lehman Brothers, the beleaguered investment bank, hung in the balance on Sunday as Federal Reserve officials and the leaders of major financial institutions continued to gather in emergency meetings trying to complete a plan to rescue the stricken bank. Several possible plans emerged from the talks, held at the Federal Reserve Bank of New York and led by Timothy R. Geithner, the president of the New York Fed, and Treasury Secretary Henry M. Paulson Jr.
Step 1 tokenize
java -cp stanford-ner.jar edu.stanford.nlp.process.PTBTokenizer sample.txt > sample.tok
The fate of Lehman Brothers , the beleaguered investment bank , hung in the balance
. . .
president of the New York Fed , and Treasury Secretary Henry M. Paulson Jr. .
Step 2 classify
I need a better command to replace each EOL "\n" with "\tO\n". Perl chomp is not working, so I edited sample.tsv manually.
perl -ne 'chomp; print "$_\tO"' sample.tok > sample.tsv
The	O
fate	O
of	O
Lehman	O
Brothers	O
,	O
the	O
beleaguered	O
investment	O
bank	O
,	O
hung	O
in	O
the	O
balance	O
. . .
president	O
of	O
the	O
New	O
York	O
Fed	O
,	O
and	O
Treasury	O
Secretary	O
Henry	O
M.	O
Paulson	O
Jr.	O
.	O
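The Perl one-liner in this step strips each newline with chomp but never prints one back, which is why every token ends up on a single line. Putting "\n" at the end of the print string should fix it. As an alternative, here is a minimal awk sketch; demo.tok and demo.tsv are hypothetical stand-in files so the snippet runs on its own.

```shell
# Stand-in for sample.tok: one token per line (hypothetical demo file).
printf 'The\nfate\nof\n' > demo.tok

# Append a tab and the background label O to every line.
# awk's print adds the newline back, so each token stays on its own line.
awk '{ print $0 "\tO" }' demo.tok > demo.tsv

# Equivalent Perl, with the newline restored after chomp:
# perl -ne 'chomp; print "$_\tO\n"' demo.tok > demo.tsv

cat demo.tsv
```

For the real data, swap in sample.tok and sample.tsv.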
Step 3 adjust properties (sample.prop)
# location of the training file
trainFile = sample.tsv
# location where you would like to save (serialize) your
# classifier; adding .gz at the end automatically gzips the file,
# making it smaller, and faster to load
serializeTo = sample-ner-model.ser.gz
. . .
useTypeySequences=true
wordShape=chris2useLC
Step 4 modify gold standard (sample.tsv)
The	O
fate	O
of	O
Lehman	ORG
Brothers	ORG
,	O
the	O
beleaguered	O
investment	O
bank	O
,	O
hung	O
in	O
the	O
balance	O
. . .
president	O
of	O
the	O
New	ORG
York	ORG
Fed	ORG
,	O
and	O
Treasury	PERS
Secretary	PERS
Henry	PERS
M.	PERS
Paulson	PERS
Jr.	PERS
.	O
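Since the gold standard is edited by hand, a quick format check before training may save a wasted run. The sketch below assumes the training file is token, tab, label with one token per line; demo-gold.tsv is a hypothetical stand-in file. It reports malformed lines and counts each label, so a stray 0 (zero) typed in place of the letter O would show up as its own label.

```shell
# Stand-in training file (hypothetical name): token<TAB>label, one per line.
printf 'Lehman\tORG\nBrothers\tORG\n,\tO\nhung\tO\n' > demo-gold.tsv

# Report any non-empty line that does not have exactly two tab-separated columns.
# Blank lines are skipped.
awk -F'\t' 'NF > 0 && NF != 2 { printf "line %d: %d columns\n", NR, NF }' demo-gold.tsv

# Count how often each label occurs in the second column.
cut -f2 demo-gold.tsv | sort | uniq -c
```

No output from the awk command means every line has two columns.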
Step 5 train
java -cp stanford-ner.jar edu.stanford.nlp.ie.crf.CRFClassifier -prop sample.prop
Step 6 test and verify
java -cp stanford-ner.jar edu.stanford.nlp.ie.crf.CRFClassifier -loadClassifier sample-ner-model.ser.gz -testFile sample.tsv
Production maybe:
java -mx1g edu.stanford.nlp.ie.NERClassifierCombiner -textFile sample.txt -ner.model \
-loadClassifier classifiers/english.all.3class.distsim.crf.ser.gz,classifiers/english.conll.4class.distsim.crf.ser.gz,classifiers/english.muc.7class.distsim.crf.ser.gz \
-outputFormat tabbedEntities -textFile sample.txt > samplenew.tsv
This seems correct to me.
Yes, if you build a new model with Stanford CoreNLP, you can add it to the list.
Note that the models are run in order: the earlier NER taggers in the list tag first, and later models cannot overwrite tags (e.g. ORG, PER) written by previous ones (except O, of course). So where you put your models matters; the closer a model is to the front, the higher its priority.
Also, ner.combinationMode = HIGH_RECALL will allow every classifier in the list to apply all of its tags. ner.combinationMode = NORMAL means that only the first classifier that applies a tag (e.g. ORG, PER) can apply it. You can set ner.combinationMode in the .prop file.
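As a rough sketch of how the ordering and combination mode could look together, the snippet below writes a small properties file with the custom model first in the list. The file name combine.prop is hypothetical, the model paths are the ones from the question, and the upper-case mode value is an assumption that may need checking against your CoreNLP version.

```shell
# Write a hypothetical properties file for the combined tagger.
cat > combine.prop <<'EOF'
# Models run in list order, so the custom model goes first to take priority.
ner.model = sample-ner-model.ser.gz,classifiers/english.all.3class.distsim.crf.ser.gz,classifiers/english.conll.4class.distsim.crf.ser.gz,classifiers/english.muc.7class.distsim.crf.ser.gz
# NORMAL: only the first classifier that applies a tag can apply it.
# HIGH_RECALL: every classifier in the list can apply all of its tags.
ner.combinationMode = HIGH_RECALL
EOF

cat combine.prop
```

With NORMAL, putting the custom model first means its ORG tag wins over the ORG tags of the later models; with HIGH_RECALL the later models can still fill in tokens the custom model left as O.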