Archive for October, 2006

Shrink

Thursday, October 12th, 2006 at 9:21 pm

I recently wrote a small program that really made me realise how important it is to get all your requirements defined before you start to code. I needed to shrink a text file to no more than 80 characters wide, so I thought I’d write a small Perl script to do this for me. The text file had paragraphs split on the “%” symbol. On the face of it the requirements seemed really straightforward:

  • Read in a text file with sentences longer than 80 characters.
  • Output a text file with sentences no longer than 80 characters, maintaining the “%” between paragraphs.

So I thought about the following algorithm:

  1. Pass two arguments to the program: first the file F you want to split, and second the new length NL that you want the sentences not to exceed.
  2. Take the text file and read it into an array A, putting each paragraph into an element in A.
  3. Loop through A and put each element of A into a new array B.
  4. Loop through B and print each character to stdout. Maintain a counter X as you go.
  5. If X reaches NL, print out a newline (\n) and reset X to zero.
  6. After printing the last character in B, print a “%”.
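
The steps above can be sketched roughly as follows. This is a minimal sketch rather than the original script: it works on an in-memory string instead of reading file F, it assumes NL is at least 1, and the names naive_wrap and naive_shrink are made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the original script): hard-wrap one
# paragraph at $new_length characters, ignoring word boundaries.
sub naive_wrap {
    my ($paragraph, $new_length) = @_;
    my $out = '';
    my $x   = 0;                       # counter X from step 4
    foreach my $char (split //, $paragraph) {
        $out .= $char;
        $x++;
        if ($x == $new_length) {       # step 5: limit reached
            $out .= "\n";
            $x = 0;
        }
    }
    return $out;
}

# Steps 2 to 6, with the text passed in as a string (an assumption;
# the original reads it from a file).
sub naive_shrink {
    my ($text, $new_length) = @_;
    my @paragraphs = split /%/, $text;     # one paragraph per element
    return join '', map { naive_wrap($_, $new_length) . "\n%\n" } @paragraphs;
}

print naive_shrink("hello world again%second paragraph", 8);
```

Running it with a length of 8 breaks the first paragraph after "hello wo", in the middle of a word.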

The problem with this, as I found, was that it indiscriminately split words in sentences in order to maintain the NL character limit. So I thought you probably need to check whether the character at NL is a space. If it is not, backtrack through the array until you find one, then output a newline.

This introduced a new problem though: if you did not find a space at position NL but found one at, say, NL-10, you would have to start your search for your next newline at NL+(NL-10), not at NL+NL. So I had to maintain a way of remembering where I cut off the previous sentence and use that to tell me where to start looking for the next newline. Here is the main loop of the program:

#array holds whole text file
foreach(@array){
	$i = 0;
	@small_array = ();
	$prev_end_of_line = 0;
	#read the current paragraph into a new array, one character per element
	@small_array = split(//,$_);
	foreach(@small_array){
		#option 1: space found at standard position
		if(($small_array[$prev_end_of_line+$new_length] =~ /\s/)&&$found!=1){
			$prev_end_of_line = $prev_end_of_line+$new_length;
			$found = 1;
		}
		elsif($found!=1){
			#start backtracking
			for($j=($prev_end_of_line+$new_length)-1; ;$j--){
				#option 2: space found at position $j
				if($small_array[$j] =~ /\s/){
					$prev_end_of_line = $j;
					$found = 1;
					last;
				}
			}
		}
		print $_;
		$i++;
		#numeric comparison: have we reached the chosen break position?
		if($i == $prev_end_of_line){
			print "\n";
			$found = 0;
		}
	}
	print "\n%\n";
	$found = 0;

}

Once I had solved that I started experimenting with NLs other than 80, and found that if you entered a really low one like 2 it broke the program. It seemed I had made another oversight: I had not allowed for the possibility that there was no space between NL and the start of the line. I decided that in order to deal with this I would have to simply split the array at NL regardless of whether it cut up a word or not. This is when the real fun and games started. To deal with sentences without a space, I thought I could check whether we had reached the start of the current line; if we had, that meant there was no space, so we just cut our losses and put the newline in at the default position:

#start backtracking
for($j=($prev_end_of_line+$new_length)-1; ;$j--){
	#option 2: space found at position $j
	if($small_array[$j] =~ /\s/){
		$prev_end_of_line = $j;
		$found = 1;
		last;
	}
	#option 3: space not found
	elsif($j==$prev_end_of_line){
		$prev_end_of_line = $prev_end_of_line+$new_length;
		$found = 1;
		last;
	}
}

For the longest time I could not work out why this would not work. But eventually I realised that, because of the way I backtracked to the start of the line looking for a space, if I was dealing with a very small NL and there was a space at the start of the line, there was a chance $j would be incorrectly set. Instead what I needed to do was this:

#start backtracking
for($j=($prev_end_of_line+$new_length)-1; ;$j--){
	#option 3: no space found
	if($j==$i&&$i>0){
		$prev_end_of_line = $prev_end_of_line+$new_length;
		$found = 1;
		last;
	}
	#option 2: space found at position $j
	if($small_array[$j] =~ /\s/){
		$prev_end_of_line = $j;
		$found = 1;
		last;
	}
}
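
To see the three options side by side, here is a compact sketch that only works out where the breaks would fall, without printing anything. It is not the code from the full script: it uses substr() on a string rather than a character array, it treats only a literal space as a break point, it assumes NL is at least 1, and break_positions is a made-up name.

```perl
use strict;
use warnings;

# Hypothetical helper: return the indices at which a line break would
# fall, searching each time from the previous break rather than from a
# fixed multiple of $new_length.
sub break_positions {
    my ($text, $new_length) = @_;
    die "new_length must be at least 1\n" if $new_length < 1;
    my @breaks;
    my $prev_end_of_line = 0;
    while ($prev_end_of_line + $new_length < length($text)) {
        # Option 1: try the standard position first.
        my $j = $prev_end_of_line + $new_length;
        # Option 2: backtrack to the nearest space, but never as far as
        # the previous break itself.
        $j-- while $j > $prev_end_of_line && substr($text, $j, 1) ne ' ';
        # Option 3: no space on this line, so use the default position.
        $j = $prev_end_of_line + $new_length if $j == $prev_end_of_line;
        push @breaks, $j;
        $prev_end_of_line = $j;
    }
    return @breaks;
}

my @breaks = break_positions("the quick brown fox jumps", 10);
print "@breaks\n";
```

With a length of 10, the first break lands on the space at index 9 rather than at 10, so the next search starts from 9+10 and not from 20.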

I hope my slapdash review of the code has not been entirely un-educational. You may get a better picture by downloading the full source and trying it out for yourself. Any problems or comments, please let me know!

Update

As seems to be the way with Perl, there is always more than one way to do something. After consulting with a Perl guru, it seems substr()ing is the way forward when doing anything with strings. This script is a far more elegant solution than mine but was not half as much fun (or stress) to create!
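
To give a flavour of the substr() approach, here is a minimal sketch. It is not the guru's script, just one way the same idea might look using Perl's built-in substr(), rindex() and length(). The name wrap_paragraph is made up, only a literal space counts as a break point, and NL is assumed to be at least 1.

```perl
use strict;
use warnings;

# Hypothetical helper (not the script mentioned above): wrap one
# paragraph to at most $new_length characters per line using substr()
# and rindex() instead of a character array.
sub wrap_paragraph {
    my ($text, $new_length) = @_;
    die "new_length must be at least 1\n" if $new_length < 1;
    my @lines;
    while (length($text) > $new_length) {
        # Look one character past the limit, so that a space sitting
        # exactly at position $new_length counts as a break point.
        my $cut = rindex(substr($text, 0, $new_length + 1), ' ');
        if ($cut <= 0) {
            # No usable space: cut the word at the limit.
            push @lines, substr($text, 0, $new_length);
            $text = substr($text, $new_length);
        }
        else {
            push @lines, substr($text, 0, $cut);
            $text = substr($text, $cut + 1);    # drop the space itself
        }
    }
    push @lines, $text if length $text;
    return join "\n", @lines;
}

print wrap_paragraph("the quick brown fox jumps", 10), "\n%\n";
```

Unlike my version, this one drops the space at each break instead of carrying it onto the next line.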

Posted in Geeky
by Hopkins

My Birthday

Monday, October 2nd, 2006 at 1:56 pm

[Image: my 25th birthday in St Albans]

I think you can tell how old you are getting by the presents you receive. The quality and the quantity seem to diminish as you grow older, but going on this year’s list you would think I was 135. This does make me sound quite ungrateful, which I am, but I can’t say it bothered me for too long. What more than made up for it was a nice little drink in St Albans this Saturday with a top bunch of chaps. We visited the St Albans Beer Festival, which was not quite the on-a-par-with-Oktoberfest event I had envisaged, but served up a vast array of strange beers and even stranger people all the same. Caleb, I think, hit the nail squarely on the head when he commented that his pint tasted as though it had been filtered through a pair of granny’s knickers.

Posted in Birthdays
by Hopkins

Zenphoto

Sunday, October 1st, 2006 at 7:59 pm

I’ve made a few updates to the site: bikes has now got its own page (with some trackday pictures on), but more noticeably I created a page called zenphoto, which uses some software called, unsurprisingly, Zenphoto. It will allow me to upload photos that don’t warrant a whole story to go with them. Hopefully I’ll update it more frequently and it’ll grow nicely over time.

Posted in Other
by Hopkins