Text Box 1

$INPTBffr =~ s/([^\x0A\x0D\x20-\x7E]+)//g;

The single line of Perl shown above strips all non-text data from a file's contents, leaving only text behind. More precisely, it deletes every byte other than printable ASCII (0x20 through 0x7E), line feed (0x0A), and carriage return (0x0D), so line endings survive while tabs and other control characters are removed along with the binary data. The entire contents of the file are first loaded into $INPTBffr by a simple read() call; the substitution operator on the right-hand side of =~ then strips the binary information in place. This tiny fragment is by no means a whole program, but for a single statement it does a surprising amount of the work needed to transform files into a form suitable for input to a semantic analysis engine.
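
For readers who want to try the statement, the following is a minimal sketch of how it might be wrapped into a runnable script. The subroutine names strip_non_text and load_and_strip, the use of binmode, and the convention of taking the file name from the command line are assumptions made for illustration; only the substitution itself and the use of read() to fill $INPTBffr come from the text above.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: return a copy of the buffer with everything except
# printable ASCII, LF, and CR removed.
sub strip_non_text {
    my ($INPTBffr) = @_;
    $INPTBffr =~ s/([^\x0A\x0D\x20-\x7E]+)//g;   # the statement from the text box
    return $INPTBffr;
}

# Hypothetical helper: load a whole file with read() and strip it.
sub load_and_strip {
    my ($path) = @_;
    open(my $fh, '<', $path) or die "Cannot open $path: $!";
    binmode($fh);                    # assumption: treat the file as raw bytes
    my $size = (-s $fh) || 0;        # file size in bytes (0 for an empty file)
    my $INPTBffr = '';
    defined(read($fh, $INPTBffr, $size)) or die "Cannot read $path: $!";
    close($fh);
    return strip_non_text($INPTBffr);
}

# When run as a script with a file name, print the stripped text.
if (!caller && @ARGV) {
    binmode(STDOUT);
    print load_and_strip($ARGV[0]);
}
```

Saved under a name of your choosing and invoked with a single file argument, the script would write the surviving text to standard output. Note that the character class as written also removes tabs; adding \x09 to the class would keep them, if that suits the downstream analysis better.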