I work with large data sets (1.5 GB+) and need to run partial string searches on them.
I was able to write a script that works, but it takes too long:
    fhand = open('c:/users/promotor/documents/tce-sagres/tce-pb-sagres-empenhos_esfera_municipal.txt', 'r')
    pergunta = raw_input('pesquisa: ')
    fresult = open('resultado.csv', 'w')
    for line in fhand:
        # linha = linha + 0.001
        # update_progress(int(linha)*1000)
        if pergunta in line:
            print line
            fresult.write(line)
    print "terminado."

I was wondering if there is a faster way to do this with pandas. I tried "str.contains" to search on one column.
Best regards.
You are iterating over the file in a loop, which takes a lot of time. I recommend reading the whole file into a string and using a regex to match the pattern.
Try the following code:
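    import re

    # your_file_name is a placeholder for the path to the data file
    with open(your_file_name, 'r') as f:
        lines = f.read()  # read the whole file into one string

    name = input('pattern :')
    # name is inserted into the regex as-is, so regex metacharacters keep their special meaning
    pattern_to_match = r'(?<=\n).*%s.*(?=\n)' % name
    matched_pattern = re.findall(pattern_to_match, lines, re.IGNORECASE)
    print(matched_pattern)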
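One thing to watch with the regex approach: the lookbehind and lookahead on "\n" skip the first line of the file, and also the last line if the file does not end with a newline. The search term is also treated as a regex, so a "." in it matches any character. Below is a minimal sketch of a variant that handles both cases, assuming the search term is meant as literal text. The function name find_matching_lines and the sample string are made up for illustration and are not from the original post.

```python
import re

def find_matching_lines(text, term, ignore_case=True):
    """Return every line of `text` that contains `term` as a literal substring."""
    flags = re.MULTILINE
    if ignore_case:
        flags |= re.IGNORECASE
    # re.escape makes characters such as '.' or '(' match literally.
    # With re.MULTILINE, ^ and $ anchor at every line boundary, so the
    # first and last lines of the text are candidates too.
    pattern = re.compile(r'^.*' + re.escape(term) + r'.*$', flags)
    return pattern.findall(text)

# Hypothetical sample data standing in for the contents of the real file.
sample = "Prefeitura A;100.00\nCamara B;250.50\nprefeitura C;75.10"
print(find_matching_lines(sample, "prefeitura"))
# prints ['Prefeitura A;100.00', 'prefeitura C;75.10']
```

With a real file you would pass the string returned by f.read() instead of the sample. Keep in mind that this still loads the whole file into memory, just like the snippet above.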