I am trying to create bigrams from text, following the different approaches proposed on the web. The simplest example is the following:
    library(RTextTools)
    texts <- c("this first document.", "this second file.", "this third text.")
    matrix <- create_matrix(texts, ngramLength=3)

which results in:
    Error in simple_triplet_matrix(i = i, j = j, v = as.numeric(v), nrow = length(allTerms),  :
      'i, j, v' different lengths
    In addition: Warning messages:
    1: In mclapply(unname(content(x)), termFreq, control) :
      all scheduled cores encountered errors in user code
    2: In simple_triplet_matrix(i = i, j = j, v = as.numeric(v), nrow = length(allTerms),  :
      NAs introduced by coercion

This is my setup:
    R version 3.1.1 (2014-07-10)
    Platform: x86_64-pc-linux-gnu (64-bit)

    locale:
     [1] LC_CTYPE=en_GB.UTF-8       LC_NUMERIC=C
     [3] LC_TIME=en_GB.UTF-8        LC_COLLATE=en_GB.UTF-8
     [5] LC_MONETARY=en_GB.UTF-8    LC_MESSAGES=en_GB.UTF-8
     [7] LC_PAPER=en_GB.UTF-8       LC_NAME=C
     [9] LC_ADDRESS=C               LC_TELEPHONE=C
    [11] LC_MEASUREMENT=en_GB.UTF-8 LC_IDENTIFICATION=C

    attached base packages:
    [1] graphics  grDevices utils     datasets  stats     methods   base

    other attached packages:
    [1] tm_0.6-2         NLP_0.1-8        RTextTools_1.4.2 SparseM_1.6
    [5] SnowballC_0.5.1  reshape2_1.4.1   ggplot2_1.0.0    plyr_1.8.1

    loaded via a namespace (and not attached):
     [1] bitops_1.0-6        caTools_1.17.1      class_7.3-11
     [4] codetools_0.2-9     colorspace_1.2-4    digest_0.6.8
     [7] e1071_1.6-4         foreach_1.4.2       glmnet_2.0-2
    [10] grid_3.1.1          gtable_0.1.2        ipred_0.9-4
    [13] iterators_1.0.7     lattice_0.20-29     lava_1.4.1
    [16] MASS_7.3-34         Matrix_1.1-4        maxent_1.3.3.1
    [19] munsell_0.4.2       nnet_7.3-8          parallel_3.1.1
    [22] prodlim_1.5.1       proto_0.3-10        randomForest_4.6-10
    [25] Rcpp_0.11.3         rpart_4.1-8         scales_0.2.4
    [28] slam_0.1-32         splines_3.1.1       stringr_0.6.2
    [31] survival_2.37-7     tau_0.0-18          tcltk_3.1.1
    [34] tools_3.1.1         tree_1.0-36

Any hints on what is happening?