bash - Linux - Split records and generate required format


I have hundreds of thousands of these records in a text file:

    2015-05-16      testing112 alpha1        {}      {}      {beta1}
    2015-05-16      testing124  gamma1   {xbgtd1} {}      {hjhjje;g76gr}
    2015-05-16      testing124  asdasdg   {xbgtd1;dfdfgg} {}      {hjhjje;g76gr}

  1. The file has 6 columns.
  2. Fields are delimited by a space (one or many).
  3. The first, second and third fields are never empty.
  4. The 4th, 5th and 6th fields are enclosed in {}. If there is no value, there are just 2 braces {}. If these fields have more than 1 value, the values within the curly braces are separated by a semicolon, such as {a;b}.
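A minimal sketch of rules 1-4 as a Perl check, assuming one record per line. The name parse_record is a hypothetical helper, not something from the original code:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: checks one record against rules 1-4 and returns the
# three plain fields followed by one array ref per braced field.
sub parse_record {
    my ($line) = @_;

    # Rule 2: one or more spaces between fields. split ' ' never yields
    # empty fields, which also covers rule 3 for the first three fields.
    my @f = split ' ', $line;
    die "expected 6 fields, got " . scalar(@f) . "\n" unless @f == 6;    # rule 1

    my @lists;
    for my $braced ( @f[ 3 .. 5 ] ) {
        # Rule 4: {} or {a} or {a;b;...}
        my ($inner) = $braced =~ /^\{([^{}]*)\}$/
            or die "field '$braced' is not enclosed in {}\n";
        push @lists, [ split /;/, $inner ];    # '{}' gives an empty list
    }
    return ( @f[ 0 .. 2 ], @lists );
}

my ( $date, $name, $third, @lists ) = parse_record(
    "2015-05-16      testing124  asdasdg   {xbgtd1;dfdfgg} {}      {hjhjje;g76gr}");
print join( ' | ', $date, $name, $third,
    map { '[' . join( ' ', @$_ ) . ']' } @lists ), "\n";
```

Returning array refs keeps the "empty braces" case ({} gives an empty list) distinct from a missing field.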

For each line in the file, I'd like to loop through the fields and generate the following:

    1) <some sentence>field1, field2, field3;
    2) <some sentence>field1, field2, field4;
    3) <some sentence>field1, field2, field5;
    4) <some sentence>field1, field2, field6;

For (2), (3) and (4) above, if there are multiple values in those fields within the curly brackets, separated by semicolons, I need to generate the same statement for each of the values, as below:

    1) <some sentence>field1, field2, field4_first;
    2) <some sentence>field1, field2, field4_second;
    3) <some sentence>field1, field2, field5_first;
    4) <some sentence>field1, field2, field5_second;
    5) <some sentence>field1, field2, field6_first;
    6) <some sentence>field1, field2, field6_second;
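As an illustration of that output format, here is a sketch for a single line. It assumes the numbering runs across field 3 and then every value of fields 4-6, that an empty {} produces no statement, and that the hypothetical $sentence argument stands in for "<some sentence>":

```perl
use strict;
use warnings;

# Hypothetical helper: builds the numbered statements for one input line.
sub statements_for {
    my ( $sentence, $line ) = @_;
    my ( $f1, $f2, $f3, @braced ) = split ' ', $line;

    # Strip the outer braces from fields 4-6 and split each on ';'.
    my @values = map { ( my $v = $_ ) =~ s/^\{|\}$//g; split /;/, $v } @braced;

    # Number field 3 first, then every individual value.
    my $n = 0;
    return map { ++$n . ") $sentence$f1, $f2, $_;" } ( $f3, @values );
}

print "$_\n" for statements_for( 'Record: ',
    '2015-05-16      testing124  asdasdg   {xbgtd1;dfdfgg} {}      {hjhjje;g76gr}' );
```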

I'm trying to use Perl to achieve this. However, the split of the strings isn't coming out right. I'm using something along the lines of split(s/ {1,}//,$_), since there can be any number of spaces between the fields, but it's not working. I've tried a few other options that don't seem to work either. Can you please help me here?

I am running on CentOS. Any language is fine for me to achieve the result.

Below is the code I'm using to parse the file and print the values, to check them before proceeding further and writing to the file:

    #!/usr/bin/perl -w

    $i_file      =   'input.txt';
    $o_file      =   'output.txt';
    $text_cont   =   "";
    our $ins_1  =   "";
    our $ins_2  =   "";
    our $ins_3  =   "";
    our $ins_4  =   "";

    open (FILE, $i_file) or die "could not read $i_file, program halting.";
        while(<FILE>)
        {
            (my $map_date,my $nam,my $ins_name_1, $ins_name_2, $ins_name_3, $ins_name_4) = split(s/ \{1,\}//,$_);

            $name1_refined   =   $ins_name_1 =~ s/\{|\}//;
            $name2_refined   =   $ins_name_2 =~ s/\{|\}//;
            $name3_refined   =   $ins_name_3 =~ s/\{|\}//;
            $name4_refined   =   $ins_name_4 =~ s/\{|\}//;

            @nam1_values =   split(';', $name1_refined);
            @nam2_values =   split(';', $name2_refined);
            @nam3_values =   split(';', $name3_refined);
            @aod_values  =   split(';', $name4_refined);

            print "$name1_refined\n";
            print "$name2_refined\n";
            print "$name3_refined\n";
            print "$name4_refined\n";
        }

    close FILE;

For the first part, I'd suggest you're overthinking it.

split with no arguments splits $_ on whitespace (runs of whitespace, ignoring any leading whitespace).
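A small comparison sketch of split ' ' (what a bare split does) against a regex split; the leading spaces on the sample line are added purely for illustration. It most likely also explains the original attempt: in split(s/ {1,}//,$_) the first argument is an ordinary substitution that edits $_ and returns a success flag, and split then uses that flag, not a regex, as its pattern.

```perl
use strict;
use warnings;

# Sample record with extra leading spaces (added for illustration).
my $line = "  2015-05-16      testing112 alpha1        {}      {}      {beta1}";

# split ' ' : splits on runs of whitespace and ignores leading whitespace.
my @awk_style = split ' ', $line;

# split / +/ : also splits on runs of spaces, but the leading run
# produces an empty first field.
my @regex_style = split / +/, $line;

print scalar(@awk_style),   " fields: ", join( '|', @awk_style ),   "\n";
print scalar(@regex_style), " fields: ", join( '|', @regex_style ), "\n";
```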

So, taking your input data:

    #!/usr/local/bin/perl
    use strict;
    use warnings;

    use Data::Dumper;

    while ( <DATA> ) {
       my @stuff = split;
       print Dumper \@stuff;
    }

    __DATA__
    2015-05-16      testing112 alpha1        {}      {}      {beta1}
    2015-05-16      testing124  gamma1   {xbgtd1} {}      {hjhjje;g76gr}
    2015-05-16      testing124  asdasdg   {xbgtd1;dfdfgg} {}      {hjhjje;g76gr}

You get an array per line, such as:

    $VAR1 = [
              '2015-05-16',
              'testing124',
              'asdasdg',
              '{xbgtd1;dfdfgg}',
              '{}',
              '{hjhjje;g76gr}'
            ];

etc.

You can then apply a cleanup and split again on the subfields:

    my @subfields = map { s/^\{|\}$//g; split( /;/ ) } @stuff[ 3 .. 5 ];
    print Dumper \@subfields;

This uses map to remove the outside squiggly brackets from each element in fields 3-5 (remember Perl starts at zero) and then split it on ';'.

map is quite a clever higher-order function. It's sort of like a foreach loop, in that it 'transforms' every element of a list and returns a new list, by applying the code block to each item in the list in turn (and implicitly 'returning' the result of the last statement, e.g. the elements coming out of the split function).

Giving (for the last row):

    $VAR1 = [
              'xbgtd1',
              'dfdfgg',
              'hjhjje',
              'g76gr'
            ];

So you can then:

    foreach my $field ( @subfields ) {
        print "some_sentence $stuff[0] $stuff[1] $field\n";
    }
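For comparison, a small sketch of the map described above next to the equivalent foreach loop. One difference from the one-liner: inside map, $_ is an alias to the original element, so s/// there edits @stuff itself; this sketch copies each element into $v first so the input array is left untouched.

```perl
use strict;
use warnings;

my @fields = ( '{xbgtd1;dfdfgg}', '{}', '{hjhjje;g76gr}' );

# map: the block runs once per element, and whatever its last statement
# returns is appended to the result, so one input can yield zero, one or
# several outputs.
my @via_map = map { ( my $v = $_ ) =~ s/^\{|\}$//g; split /;/, $v } @fields;

# The same thing written as an explicit foreach loop.
my @via_loop;
foreach my $field (@fields) {
    ( my $v = $field ) =~ s/^\{|\}$//g;
    push @via_loop, split /;/, $v;
}

print "map:  @via_map\n";
print "loop: @via_loop\n";
```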

Note: in the last row, this has skipped the empty field 5. That's not hard to keep if you need it. My first thought on doing that would be altering the map a little:

    my @subfields =
        map { s/^\{|\}$//g; m/./ ? split( /;/ ) : '' }
        @stuff[ 3 .. 5 ];

This means the map now:

  • iterates over elements 3 - 5.

  • applies the 'remove brackets' transform.

  • tests if there's anything left.

  • either returns the split string, or an empty string, depending on that test.
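A quick sketch of both variants side by side on the last sample row, to show the extra empty-string placeholder for field 5. It works on a copy of each element (an assumption made here so @stuff stays unchanged):

```perl
use strict;
use warnings;

# The last sample row, already split on whitespace.
my @stuff = ( '2015-05-16', 'testing124', 'asdasdg',
              '{xbgtd1;dfdfgg}', '{}', '{hjhjje;g76gr}' );
my @braced = @stuff[ 3 .. 5 ];

# Original version: an empty {} field yields nothing.
my @skipped = map { ( my $v = $_ ) =~ s/^\{|\}$//g; split /;/, $v } @braced;

# Altered version: an empty field yields one '' placeholder.
my @kept = map {
    ( my $v = $_ ) =~ s/^\{|\}$//g;
    $v =~ m/./ ? split( /;/, $v ) : '';
} @braced;

print scalar(@skipped), " values: @skipped\n";
print scalar(@kept),    " values: @kept\n";
```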

Also, a couple of general points on your code; you'd be better off with:

  • changing open to a 3-argument open with lexical filehandles, e.g. open ( my $input_fh, "<", $i_file ) or die $!;

  • using my ( $var1, $var2 ) = split; instead of separate declarations.

  • when you're numbering similar variables, it often means you want to be using a list (array) instead.
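A short sketch of those three points together. It reads from an in-memory string (\$text, an assumption so it runs without input.txt), and the hash keys date, name and rest are made up for illustration:

```perl
use strict;
use warnings;

# In-memory input so the sketch runs without a real file; with a real
# file you would pass the filename instead of \$text.
my $text = "2015-05-16      testing112 alpha1        {}      {}      {beta1}\n"
         . "2015-05-16      testing124  gamma1   {xbgtd1} {}      {hjhjje;g76gr}\n";

my @records;

# 3-argument open with a lexical filehandle.
open( my $input_fh, '<', \$text ) or die "could not read input: $!";
while ( my $line = <$input_fh> ) {
    # One list assignment instead of numbered variables: everything after
    # the first two fields lands in the array @ins_name.
    my ( $map_date, $nam, @ins_name ) = split ' ', $line;
    push @records, { date => $map_date, name => $nam, rest => \@ins_name };
}
close($input_fh);

for my $r (@records) {
    printf "%s %s -> %d more fields\n", $r->{date}, $r->{name}, scalar @{ $r->{rest} };
}
```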

So, making your code a bit more like:

    #!/usr/local/bin/perl
    use strict;
    use warnings;

    use Data::Dumper;

    my $i_file = 'input.txt';
    open( my $input_fh, "<", $i_file )
        or die "could not read $i_file, program halting : $!";

    while (<$input_fh>) {
        my ( $map_date, $nam, @ins_name ) = split;
        print Dumper \@ins_name;
        my @subfields =
            map { s/^\{|\}$//g; m/./ ? split(/;/) : '' } @ins_name;
        print Dumper \@subfields;

        foreach my $field (@subfields) {
            print "some_sentence $map_date $nam $field\n";
        }
    }
    close($input_fh);

(You can remove Dumper; it's only there for printing diagnostics.)
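To go on and write the numbered statements from the question to output.txt, one possible sketch wraps the loop above in a hypothetical write_statements sub that takes the input and output filehandles. It uses the skip-empty map, and the numbering restarting at 1 for each input line plus the 'Sentence: ' text standing in for "<some sentence>" are assumptions:

```perl
use strict;
use warnings;

# Hypothetical helper: reads records from $in_fh and writes one numbered
# statement per value to $out_fh.
sub write_statements {
    my ( $in_fh, $out_fh, $sentence ) = @_;
    while ( my $line = <$in_fh> ) {
        next unless $line =~ /\S/;    # skip blank lines
        my ( $map_date, $nam, @ins_name ) = split ' ', $line;
        my @subfields =
            map { ( my $v = $_ ) =~ s/^\{|\}$//g; split /;/, $v } @ins_name;
        my $n = 0;
        foreach my $field (@subfields) {
            printf {$out_fh} "%d) %s%s, %s, %s;\n",
                ++$n, $sentence, $map_date, $nam, $field;
        }
    }
}

# Demo with in-memory handles; for the real files you would instead use
#   open( my $in,  '<', 'input.txt' )  or die "could not read input.txt: $!";
#   open( my $out, '>', 'output.txt' ) or die "could not write output.txt: $!";
my $input  = "2015-05-16      testing124  gamma1   {xbgtd1} {}      {hjhjje;g76gr}\n";
my $output = '';
open( my $in,  '<', \$input )  or die $!;
open( my $out, '>', \$output ) or die $!;
write_statements( $in, $out, 'Sentence: ' );
close($in);
close($out);
print $output;
```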
