python - file separation on the basis of matching character


    atom    856  ce alys 104       0.809   0.146  26.161  0.54 29.14           c
    atom    857  ce blys 104       0.984  -0.018  26.394  0.46 31.19           c
    atom    858  nz alys 104       1.988   0.923  26.662  0.54 33.17           n
    atom    859  nz blys 104       1.708   0.302  27.659  0.46 37.61           n
    atom    860  oxt lys 104      -0.726  -6.025  27.180  1.00 26.53           o
    atom    862  n   lys b 276      17.010 -16.138   9.618  1.00 41.00           n
    atom    863  ca  lys b 276      16.764 -16.524  11.005  1.00 31.05           c
    atom    864  c   lys b 276      16.428 -15.306  11.884  1.00 26.93           c
    atom    865  o   lys b 276      16.258 -15.447  13.090  1.00 29.67           o
    atom    866  cb  lys b 276      17.863 -17.347  11.617  1.00 33.62           c

I have the text file above and need to make two text files from it on the basis of the differences at position 21 in each line. I wrote a script that prints the required lines when I already know the character. If I do not know the character at column 21, how can I do the job? Suppose I do not know whether column 21 holds "a" and "b", or "b" and "g", or some other combination, and I need to separate the lines on the basis of that column. How can I do this? The following is the script I tried:

    import sys

    for fn in sys.argv[1:]:
        f = open(fn, 'r')
        while 1:
            line = f.readline()
            if not line:
                break
            # keep only the lines whose character at index 21 is 'b'
            if line[21:22] == 'b':
                chns = line[0:80]
                print(chns)
        f.close()

  • Storing the character at index 21 (line[21:22]) of the previous line and printing a blank line at every non-match (which marks the start of a new group of the same letter) prints the lines grouped by that character.

  • Note that this groups lines with a matching character in the order they appear in the file, which means unsorted lines can produce more than one separate group for the same character.

    A modified file to show this case:

    atom    856  ce alys 104       0.809   0.146  26.161  0.54 29.14           c
    atom    857  ce blys 104       0.984  -0.018  26.394  0.46 31.19           c
    atom    862  n   lys b 276      17.010 -16.138   9.618  1.00 41.00           n
    atom    863  ca  lys b 276      16.764 -16.524  11.005  1.00 31.05           c
    atom    864  c   lys b 276      16.428 -15.306  11.884  1.00 26.93           c
    atom    865  o   lys b 276      16.258 -15.447  13.090  1.00 29.67           o
    atom    866  cb  lys b 276      17.863 -17.347  11.617  1.00 33.62           c
    atom    858  nz alys 104       1.988   0.923  26.662  0.54 33.17           n
    atom    859  nz blys 104       1.708   0.302  27.659  0.46 37.61           n
    atom    860  oxt lys 104      -0.726  -6.025  27.180  1.00 26.53           o

    Code producing this case (without sorting the lines):

    import sys

    for fn in sys.argv[1:]:
        with open(fn, 'r') as file:
            prev = 0
            for line in file:
                line = line.strip()
                if line[21:22] != prev:
                    # blank line as a separator before each group
                    print('')
                print(line)
                prev = line[21:22]

    A sample output showing this case:

    atom    856  ce alys 104       0.809   0.146  26.161  0.54 29.14           c
    atom    857  ce blys 104       0.984  -0.018  26.394  0.46 31.19           c

    atom    862  n   lys b 276      17.010 -16.138   9.618  1.00 41.00           n
    atom    863  ca  lys b 276      16.764 -16.524  11.005  1.00 31.05           c
    atom    864  c   lys b 276      16.428 -15.306  11.884  1.00 26.93           c
    atom    865  o   lys b 276      16.258 -15.447  13.090  1.00 29.67           o
    atom    866  cb  lys b 276      17.863 -17.347  11.617  1.00 33.62           c

    atom    858  nz alys 104       1.988   0.923  26.662  0.54 33.17           n
    atom    859  nz blys 104       1.708   0.302  27.659  0.46 37.61           n
    atom    860  oxt lys 104      -0.726  -6.025  27.180  1.00 26.53           o

  • So, if you want only one group for each character at index 21, putting the lines in a list and sorting them with list.sort() will do.

    Code (sorting the lines first, before grouping):

    import sys

    for fn in sys.argv[1:]:
        with open(fn, 'r') as file:
            lines = file.readlines()

            # creates a list of pairs (char at index 21, line)
            lines = [[line[21:22], line.strip()] for line in lines]

            # sorts the pairs by key (char at index 21), then by line text
            lines.sort()

            # back to a plain list of lines, now in sorted order
            lines = [line[1] for line in lines]

            prev = 0
            for line in lines:
                if line[21:22] != prev:
                    # blank line as a separator before each group
                    print('')
                print(line)
                prev = line[21:22]

    This outputs:

    atom    856  ce alys 104       0.809   0.146  26.161  0.54 29.14           c
    atom    857  ce blys 104       0.984  -0.018  26.394  0.46 31.19           c
    atom    858  nz alys 104       1.988   0.923  26.662  0.54 33.17           n
    atom    859  nz blys 104       1.708   0.302  27.659  0.46 37.61           n
    atom    860  oxt lys 104      -0.726  -6.025  27.180  1.00 26.53           o

    atom    862  n   lys b 276      17.010 -16.138   9.618  1.00 41.00           n
    atom    863  ca  lys b 276      16.764 -16.524  11.005  1.00 31.05           c
    atom    864  c   lys b 276      16.428 -15.306  11.884  1.00 26.93           c
    atom    865  o   lys b 276      16.258 -15.447  13.090  1.00 29.67           o
    atom    866  cb  lys b 276      17.863 -17.347  11.617  1.00 33.62           c
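The difference between the two behaviours above (grouping in file order versus grouping after a sort) can also be sketched with itertools.groupby. This is only an illustration written for Python 3: the helper names chain_key, consecutive_groups and sorted_groups are made up here, and the sample lines are invented and use index 2 instead of 21 to stay short.

```python
from itertools import groupby


def chain_key(line, index=21):
    """Return the character at the 0-based index ('' if the line is too short)."""
    return line[index:index + 1]


def consecutive_groups(lines, index=21):
    """Group adjacent lines that share the same key, keeping file order."""
    return [(key, list(group))
            for key, group in groupby(lines, key=lambda l: chain_key(l, index))]


def sorted_groups(lines, index=21):
    """Sort by key first, so each key ends up in exactly one group.

    sorted() is stable, so lines keep their original relative order
    inside each group.
    """
    ordered = sorted(lines, key=lambda l: chain_key(l, index))
    return consecutive_groups(ordered, index)


# Invented sample lines; the key sits at index 2 (hypothetical layout).
sample = ["1 a x", "2 b x", "3 a x", "4 b x"]
print(consecutive_groups(sample, index=2))  # four groups: a, b, a, b
print(sorted_groups(sample, index=2))       # two groups: a, b
```

Unlike sorting [key, line] pairs, sorting with a key function does not reorder the lines within a group by their text.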

Edit:

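Writing the grouped lines to different files does not need a check against the previous line's value to separate the groups, because changing the filename based on the character at index 21 opens a new file, which already separates the lines. Here, prev is used so that an existing file with the same filename is overwritten rather than appended to, since appending may cause clutter or inconsistency in the file's contents.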

    import sys

    for fn in sys.argv[1:]:
        with open(fn, 'r') as file:
            lines = file.readlines()

        # creates a list of pairs (char at index 21, line)
        lines = [[line[21:22], line] for line in lines]

        # sorts the pairs by key (char at index 21), then by line text
        lines.sort()

        # back to a plain list of lines, now in sorted order
        lines = [line[1] for line in lines]

        filename = 'file'
        prev = 0
        for line in lines:
            # 'w' creates (or truncates) the file for a new group,
            # 'a' appends to the file of the current group
            mode = 'w' if line[21:22] != prev else 'a'
            with open(filename + line[21:22] + '.txt', mode) as out:
                out.write(line)
            prev = line[21:22]
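As a variation on the same idea, here is a minimal sketch (Python 3) that skips the sort and keeps one open file per character, so every output file is opened only once in 'w' mode. The function name split_to_files and its parameters out_dir, prefix and index are hypothetical. It also assumes the character at the given index is valid in a filename, which may not hold for short or blank lines.

```python
import os


def split_to_files(lines, out_dir, prefix="file", index=21):
    """Write each line to <prefix><char>.txt, where char is line[index].

    Every output file is opened once in 'w' mode, so files left over
    from an earlier run are overwritten instead of appended to.
    Returns the sorted list of characters that were seen.
    """
    handles = {}
    try:
        for line in lines:
            key = line[index:index + 1]
            if key not in handles:
                path = os.path.join(out_dir, prefix + key + ".txt")
                handles[key] = open(path, "w")
            handles[key].write(line)
    finally:
        # close every file, even if writing failed part-way
        for handle in handles.values():
            handle.close()
    return sorted(handles)
```

It could be called as split_to_files(open(fn), '.') for each input file; lines keep their original order inside each output file.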

The file-writing part of the script can be simplified if appending to already existing files is not a problem. However, this risks writing to a file with the same filename that was not created by the script, or that was created by the script during an earlier execution/session.

    filename = 'file'
    for line in lines:
        # 'a' appends, so any existing content in the file is kept
        with open(filename + line[21:22] + '.txt', 'a') as out:
            out.write(line)
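If the risk of touching a file that already exists matters, one option in Python 3.3 and later is the 'x' (exclusive creation) mode of open(), which raises FileExistsError instead of overwriting or appending. The sketch below only demonstrates that behaviour; the helper write_new_file and the temporary directory are illustrative and not part of the script above.

```python
import os
import tempfile


def write_new_file(path, lines):
    """Create path and write lines; raise FileExistsError if it already exists."""
    with open(path, "x") as out:
        out.writelines(lines)


out_dir = tempfile.mkdtemp()
target = os.path.join(out_dir, "filea.txt")

write_new_file(target, ["first run\n"])
try:
    # second attempt on the same name is refused
    write_new_file(target, ["second run\n"])
except FileExistsError:
    print("refusing to overwrite", os.path.basename(target))
```

With this mode the lines for one character would have to be collected first and written in a single call, since a second open() on the same name fails.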
