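I want the input to be a List<string[]>, and the output to be a dictionary whose keys are the unique strings (used as an index) and whose values are arrays of floats, with each position in the array representing the count of the key in the corresponding string[] of the List<string[]>.
Here is what I have attempted so far: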
    using System;
    using System.Collections.Generic;
    using System.Linq;

    static class ct
    {
        // counts terms in array
        public static Dictionary<string, float[]> termfreq(List<string[]> text)
        {
            List<string> unique = new List<string>();
            foreach (string[] s in text)
            {
                List<string> groups = s.Distinct().ToList();
                unique.AddRange(groups);
            }
            string[] index = unique.Distinct().ToArray();
            // stuck here: countset is returned without ever being filled
            Dictionary<string, float[]> countset = new Dictionary<string, float[]>();
            return countset;
        }
    }

    class Program
    {
        static void Main()
        {
            /* local variable definition */
            List<string[]> doc = new List<string[]>();
            string[] a = { "that", "is", "a", "cat" };
            string[] b = { "that", "bat", "flew", "over", "the", "cat" };
            doc.Add(a);
            doc.Add(b);
            // Console.WriteLine(doc);
            Dictionary<string, float[]> ret = ct.termfreq(doc);
            foreach (KeyValuePair<string, float[]> kvp in ret)
            {
                Console.WriteLine("Key = {0}, Value = {1}", kvp.Key, kvp.Value);
            }
            Console.ReadLine();
        }
    }

I got stuck on the dictionary part. Is there an effective way to implement this?
It sounds like you could use something like:
    var dictionary = doc
        .SelectMany(array => array)
        .Distinct()
        .ToDictionary(word => word,
                      word => doc.Select(array => array.Count(x => x == word))
                                 .ToArray());

In other words, first find the distinct set of words, then for each word, create a mapping.
To create the mapping, look at each array in the original document, and find the count of occurrences of the word in that array. (So each array maps to an int.) Use LINQ to perform that mapping over the whole document, with ToArray creating an int[] for a particular word... and that's the value for that word's dictionary entry.
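For comparison, the same mapping can be written out with explicit loops, which may make the "one count per array" idea easier to see. This is only a sketch: the class and method names (TermCountSketch, CountPerArray) are made up for illustration, and it assumes a compiler that supports `out var` (C# 7 or later).

```csharp
using System.Collections.Generic;

public static class TermCountSketch
{
    // Hypothetical helper: maps each distinct word to an array holding
    // its number of occurrences in each string[] of the document.
    public static Dictionary<string, int[]> CountPerArray(List<string[]> doc)
    {
        var result = new Dictionary<string, int[]>();
        for (int i = 0; i < doc.Count; i++)
        {
            foreach (string word in doc[i])
            {
                // First time this word is seen: allocate one zeroed slot per array.
                if (!result.TryGetValue(word, out var counts))
                {
                    counts = new int[doc.Count];
                    result[word] = counts;
                }
                // Position i holds the count for the i-th array.
                counts[i]++;
            }
        }
        return result;
    }
}
```

With the two arrays from the question, "that" and "cat" both map to { 1, 1 }, "is" maps to { 1, 0 } and "bat" maps to { 0, 1 }. This version walks the document once, whereas the LINQ query rescans the whole document for every distinct word; for small inputs the difference is unlikely to matter.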
Note that this creates a Dictionary<string, int[]> rather than a Dictionary<string, float[]> - that seems more sensible to me, but you could always cast the result of Count to float if you really wanted to.
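If float[] really is required, the cast might look roughly like the sketch below. The wrapper class name TermFreqSketch is invented for illustration; the query itself is the one above with a cast added.

```csharp
using System.Collections.Generic;
using System.Linq;

public static class TermFreqSketch
{
    // Same query as above, with each count cast to float so the
    // result matches the Dictionary<string, float[]> signature from the question.
    public static Dictionary<string, float[]> TermFreq(List<string[]> doc)
    {
        return doc
            .SelectMany(array => array)   // flatten every array into one word sequence
            .Distinct()                   // one dictionary key per distinct word
            .ToDictionary(
                word => word,
                word => doc.Select(array => (float)array.Count(x => x == word))
                           .ToArray());
    }
}
```

When printing the result, passing kvp.Value straight to Console.WriteLine shows the array's type name rather than its contents; something like string.Join(", ", kvp.Value) prints the counts themselves.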