Shrink
I recently wrote a small program that made me realise how important it is to pin down all your requirements before you start to code. I needed to shrink a text file so that no line was more than 80 characters wide, so I thought I’d write a small Perl script to do it for me. The text file had paragraphs separated by the “%” symbol. On the face of it the requirements seemed really straightforward:
- Read in a text file with lines longer than 80 characters.
- Output a text file with lines no longer than 80 characters, maintaining the “%” between paragraphs.
So I thought about the following algorithm:
- Pass two arguments to the program: first the file F you want to split, and second the new length NL that you do not want lines to exceed.
- Take the text file and read it into an array A, putting each paragraph into an element in A.
- Loop through A and split each element of A into a new array B, one character per element.
- Loop through B and print each character to stdout. Maintain a counter X as you go.
- If X reaches NL (80 in my case), print out a newline (\n) and start counting again.
- After printing the last character in B, print a “%”.
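The steps above can be sketched roughly as follows. This is a minimal illustration rather than the original script: the helper names split_paragraphs and naive_wrap are made up for the example, and it assumes paragraphs are separated by a bare “%” and that the text is already in a string rather than read from the file F.

```perl
use strict;
use warnings;

# Split the whole text into paragraphs on the "%" separator (array A).
# Assumption: surrounding whitespace and empty pieces can be discarded.
sub split_paragraphs {
    my ($text) = @_;
    my @parts = split /%/, $text;
    my @paragraphs;
    foreach my $p (@parts) {
        $p =~ s/^\s+//;
        $p =~ s/\s+$//;
        push @paragraphs, $p if length $p;
    }
    return @paragraphs;
}

# Naive wrap: add a newline after every $nl characters, ignoring word
# boundaries, then finish the paragraph with the "%" separator.
sub naive_wrap {
    my ($paragraph, $nl) = @_;
    my @chars = split //, $paragraph;   # array B: one character per element
    my $out = '';
    my $x   = 0;                        # counter X
    foreach my $char (@chars) {
        $out .= $char;
        $x++;
        if ($x == $nl) {                # limit reached: break the line here
            $out .= "\n";
            $x = 0;
        }
    }
    return $out . "\n%\n";
}

print naive_wrap($_, 8) for split_paragraphs("the quick brown fox\n%\n");
```

With a limit of 8 the first line comes out as "the quic", which shows the weakness of counting characters alone: words get cut wherever the counter happens to run out.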
The problem with this, as I found, was that it indiscriminately split words in order to maintain the NL character limit. So I thought I needed to check whether the character at NL is a space. If it is not, backtrack through the array until you find one, then output a newline there.
This introduced a new problem though: if you did not find a space at position NL but found one at, say, NL-10, you would have to start your search for the next newline at NL+(NL-10), not at NL+NL. So I had to maintain a way of remembering where I had cut off the previous line, and use that to tell me where to start looking for the next one. Here is the main loop of the program:
    # @array holds the whole text file, one paragraph per element
    foreach (@array) {
        $i = 0;
        @small_array = ();
        $prev_end_of_line = 0;
        # read the current paragraph into a new array, one character per element
        @small_array = split(//, $_);
        foreach (@small_array) {
            # option 1: space found at standard position
            if (($small_array[$prev_end_of_line + $new_length] =~ /\s/) && $found != 1) {
                $prev_end_of_line = $prev_end_of_line + $new_length;
                $found = 1;
            }
            elsif ($found != 1) {
                # start backtracking
                for ($j = ($prev_end_of_line + $new_length) - 1; ; $j--) {
                    # option 2: space found at position $j
                    if ($small_array[$j] =~ /\s/) {
                        $prev_end_of_line = $j;
                        $found = 1;
                        last;
                    }
                }
            }
            print $_;
            $i++;
            # numeric comparison: both values are character positions
            if ($i == $prev_end_of_line) {
                print "\n";
                $found = 0;
            }
        }
        print "\n%\n";
        $found = 0;
    }
Once I had solved that I started experimenting with values of NL other than 80, and found that a really low one like 2 broke the program. I had made another oversight: I had not allowed for the possibility that there was no space between NL and the start of the line. I decided that in this case I would have to simply split the array at NL, regardless of whether it cut up a word or not. This is when the real fun and games started. I thought I could check whether the backtracking had reached the start of the current line; if it had, that meant there was no space, so I would cut my losses and put the newline in at the default position:
    # start backtracking
    for ($j = ($prev_end_of_line + $new_length) - 1; ; $j--) {
        # option 2: space found at position $j
        if ($small_array[$j] =~ /\s/) {
            $prev_end_of_line = $j;
            $found = 1;
            last;
        }
        # option 3: space not found
        elsif ($j == $prev_end_of_line) {
            $prev_end_of_line = $prev_end_of_line + $new_length;
            $found = 1;
            last;
        }
    }
For the longest time I could not work out why this would not work. Eventually I realised that, because of the way I backtracked to the start of the line looking for a space, a very small NL combined with a space at the start of the line meant there was a chance $j would be set incorrectly. Instead, what I needed to do was this:
    # start backtracking
    for ($j = ($prev_end_of_line + $new_length) - 1; ; $j--) {
        # option 3: no space found
        if ($j == $i && $i > 0) {
            $prev_end_of_line = $prev_end_of_line + $new_length;
            $found = 1;
            last;
        }
        # option 2: space found at position $j
        if ($small_array[$j] =~ /\s/) {
            $prev_end_of_line = $j;
            $found = 1;
            last;
        }
    }
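For comparison, here is one way the three options could be folded into a single self-contained routine. It is a sketch under my own assumptions, not the loop from the full source: wrap_chars is a hypothetical name, it returns the lines instead of printing them, and it drops the space at which a line is broken.

```perl
use strict;
use warnings;

# Wrap one paragraph to at most $nl characters per line, working on a
# character array as the original script does.
sub wrap_chars {
    my ($paragraph, $nl) = @_;
    my @chars = split //, $paragraph;
    my @lines;
    my $start = 0;                      # index where the current line begins
    while (@chars - $start > $nl) {
        my $end = $start + $nl;         # default break position
        # options 1 and 2: check $end for a space, then backtrack towards $start
        my $j = $end;
        $j-- while $j > $start && $chars[$j] !~ /\s/;
        if ($j > $start) {
            push @lines, join('', @chars[$start .. $j - 1]);
            $start = $j + 1;            # next line begins after the space
        }
        else {
            # option 3: no space in this stretch, so cut the word at the limit
            push @lines, join('', @chars[$start .. $end - 1]);
            $start = $end;
        }
    }
    # whatever is left fits on one line
    push @lines, join('', @chars[$start .. $#chars]) if $start <= $#chars;
    return @lines;
}

print "$_\n" for wrap_chars("the quick brown fox", 8);
```

Because each line starts from wherever the previous one actually ended ($start), the NL+(NL-10) bookkeeping falls out naturally, and the backtracking can never run past the start of the current line, even with a limit as small as 2.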
I hope my slapdash review of the code has not been entirely uneducational. You may get a better picture by downloading the full source and trying it out for yourself. Any problems or comments, please let me know!
Update
As seems to be the way with Perl, there is always more than one way to do something. After consulting a Perl guru, it seems substr() is the way forward when doing anything with strings. This script is a far more elegant solution than mine, but it was not half as much stress (I mean fun) to create!
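To give a flavour of the substr() approach, here is a minimal sketch of my own. It is not the guru's script: wrap_substr is a hypothetical name, and pairing substr() with rindex() to find the last space within the limit is my assumption about how such a version might look.

```perl
use strict;
use warnings;

# Wrap one paragraph to at most $nl characters per line using string
# functions instead of a character array.
sub wrap_substr {
    my ($paragraph, $nl) = @_;
    my @lines;
    while (length($paragraph) > $nl) {
        # position of the last space at or before index $nl, or -1 if none
        my $cut = rindex($paragraph, ' ', $nl);
        if ($cut > 0) {
            push @lines, substr($paragraph, 0, $cut);
            $paragraph = substr($paragraph, $cut + 1);   # drop the space
        }
        else {
            # no usable space: cut the word at the limit
            push @lines, substr($paragraph, 0, $nl);
            $paragraph = substr($paragraph, $nl);
        }
    }
    push @lines, $paragraph if length $paragraph;
    return @lines;
}

print "$_\n" for wrap_substr("the quick brown fox", 8);
```

Chopping the front off the string each time means there is no running offset to remember at all, which is where most of my bugs came from.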
After setting up a
Over the years, I have found myself a victim of the bright, flashy, lit-up machines known to those in the trade as "fruities". You'll find them scattered throughout just about every watering hole in the land, and on more occasions than I care to admit I have wasted a small fortune on them.