Perl 5 By Example
by David Medinets
Que, Macmillan Computer Publishing
ISBN: 0789708663
Pub Date: 10/03/96

Chapter 1
Getting Your Feet Wet

CONTENTS
* Origins
* Similar to C?
* Cost and Licensing
* Do You Have Perl Installed?
* Getting and Installing Perl
* Your First Perl Program
  + Creating the Program
  + Invocation
* Comments in Your Program
* Summary
* Review Questions
* Review Exercises

You are about to embark on a journey through the world of Perl programming. You'll find that the trip has been made easier by the many examples sprinkled liberally along the trail. The beginning of the trip covers the basic concepts of the Perl language. Then you move on to some of the more advanced concepts: how to create Perl statements and whole programs. At the end of the trip, some guideposts are placed, in the form of Internet sites, to show you how to explore more advanced programming topics on your own.

Do you know any other programming languages? If so, then learning Perl will be a snap. If not, take it slow, try all of the examples, and have fun experimenting as you read.

I thought about adding a section here about programming ideals, or perhaps a discussion about the future of Perl.
Then I realized that when I was first learning computer languages, I didn't really care about that stuff. I just wanted to know about the language and what I could do with it. With that in mind, the next section, on Perl's origins, is very short. After all, you can read all the background information you'd like with a web browser, starting at http://www.perl.com, the Perl Home Page.

Origins

Perl began as the result of one man's frustration and, by his own account, inordinate laziness. It is a unique language in ways that cannot be conveyed simply by describing its technical details. Perl is a state of mind as much as a language grammar.

One of the oddities of the language is that its name has been given quite a few definitions. Originally, Perl stood for the Practical Extraction and Report Language. However, programmers also refer to it as the Pathologically Eclectic Rubbish Lister, or even Practically Everything Really Likable.

Let's take a few minutes to look at the external forces that provoked Perl into being; they should give you some insight into the way Perl was meant to be used. Back in 1986, Larry Wall found himself working on a task that involved generating reports from a lot of text files with cross-references. Being a UNIX programmer, and because the problem involved manipulating the contents of text files, he started to use awk for the task. But it soon became clear that awk wasn't up to the job, and with no other obvious candidate, he'd just have to write some code.

Now here's the interesting bit: Larry could have just written a utility to manage the particular job at hand and gotten on with his life. He could see, though, that it wouldn't be long before he'd have to write another special utility to handle something else that the standard tools couldn't quite hack. (It's possible that he realized that most programmers were always writing special utilities to handle things that the standard tools couldn't quite hack.)
So rather than waste any more of his time, he invented a new language and wrote an interpreter for it. If that seems like a paradox, it isn't really: it's always a bit more effort to set yourself up with the right tools, but if you do it right, the effort pays off.

The new language had an emphasis on system management and text handling. After a few revisions, it could handle regular expressions, signals, and network sockets, too. It became known as Perl and quickly became popular with frustrated, lazy UNIX programmers. And with the rest of us.

Note
Is it "Perl" or "perl"? The definitive word from Larry Wall is that it doesn't matter. Many programmers like to refer to languages with capitalized names (Perl), but the program originated on a UNIX system, where short, lowercase names (awk, sed, and so forth) were the norm. As with so many things about the language, there's no single "right way" to do it; just use it the way you want. It's a tool, after all, not a dogma. If you're sufficiently pedantic, you may want to call it "[Pp]erl" after you've read Chapter 10, "Regular Expressions."

Similar to C?

Perl programs bear a passing resemblance to C programs, perhaps because Perl was written in C, or perhaps because Larry found some of C's syntactic conventions handy. But Perl is less pedantic and a lot more concise than C.

Perl can handle low-level tasks quite well, particularly since Perl 5, when the whole messy business of references was put on a sound footing. In this sense, it has a lot in common with C. But Perl handles the internals of data types, memory allocation, and such automatically and seamlessly.

Perl has a habit of picking up interesting features as it goes along: regular expressions here, database handling there. In Perl 5 this habit has been regularized, and it is now fairly easy to add your favorite bag of tricks to Perl by using modules.
It is likely that many of the added-on features of Perl, such as socket handling, will eventually be dropped from the core of Perl and moved out to modules.

Cost and Licensing

Perl is free. The full source code and documentation are free to copy, compile, print, and give away. Any programs you write in Perl are yours to do with as you please; there are no royalties to pay and no restrictions on distributing them as far as Perl is concerned.

It's not completely "public domain," though, and for very good reason. If the source were completely public domain, someone could make minor alterations to it, compile it, and sell it; in other words, rip off its creator. On the other hand, without distributing the source code, it's hard to make sure that everyone who wants to use it can.

The GNU General Public License is one way to distribute free software without the danger of someone taking advantage of you. Under this type of license, source code may be distributed freely and used by anybody, but any programs derived from such code must be released under the same type of license. In other words, if you derive any of your source code from GNU-licensed source code, you have to release your source code to anyone who wants it.

This is often sufficient to protect the interests of the author, but it can lead to a plethora of derivative versions of the original package. This may deprive the original author of a say in the development of his or her own creation. It can also lead to confusion on the part of end users, as it becomes hard to establish which is the definitive version of the package, whether a particular script will work with a given version, and so on.

That's why Perl is also released under the terms of the "Artistic" license. This license says that anyone who releases a package derived from Perl must make it clear that the package is not actually Perl.
All modifications must be clearly flagged, executables renamed if necessary, and the original modules distributed along with the modified versions. The effect is that the original author is clearly recognized as the "owner" of the package. You may choose to copy Perl under the terms of either the Artistic License or the GNU General Public License, as the version message shown below points out.

Do You Have Perl Installed?

It's critically important to have Perl installed on your computer before reading much further. As you read the examples, you'll want to try them. If Perl is not already installed, you'll lose momentum and time.

It is very easy to see if your system already has Perl installed. Simply go to a command-line prompt and type:

    perl -v

Hopefully, the response will be similar to this:

    This is perl, version 5.001
    Unofficial patchlevel 1m.
    Copyright 1987-1994, Larry Wall
    Win32 port Copyright 1995 Microsoft Corporation. All rights reserved.
    Developed by hip communications inc., http://info.hip.com/info/
    Perl for Win32 Build 107
    Built Apr 16 1996@14:47:22
    Perl may be copied only under the terms of either the Artistic License or the
    GNU General Public License, which may be found in the Perl 5.0 source kit.

If you get an error message, or if you have version 4 of Perl, please see your system administrator or install Perl yourself. The next section describes how to get and install Perl.

Getting and Installing Perl

New versions of Perl are released on the Internet and distributed to Web sites and FTP archives across the world. UNIX binaries are generally not made available on the Internet, because it is better to build Perl on your own system so that you can be certain it will work. UNIX systems almost always come with a C compiler, after all.

Each operating system has its own way of getting and installing Perl.

For UNIX and OS/2: The Perl Home Page contains a software link (http://www.perl.com/perl/info/software.html) that will enable you to download the latest Perl source code. The page also explains why Perl binaries are not available.
Hopefully, your system will already have Perl installed. If not, try to get your system administrator to install it.

For Windows 95/Windows NT: The home page of hip communications, inc. (http://www.perl.hip.com) contains a link to download the i86 Release Binary. This link lets you download a zip file that contains the Perl files in compressed form.

Instructions for compiling Perl or for installing it on each operating system are included with the distribution files. Follow the instructions provided and you should have a working Perl installation rather quickly. If you have trouble installing Perl, skip ahead to Chapter 22, "Internet Resources," connect to the #perl IRC channel, and ask for help. Don't be shy!

Your First Perl Program

Your first Perl program will display a line of text on your monitor. First, you create a text file to hold the Perl program. Then you run, or execute, the Perl program file.

Creating the Program

A Perl program consists of an ordinary text file containing a series of Perl statements. Statements are written in what looks like an amalgam of C, UNIX shell script, and English. In fact, that's pretty much what it is.

Perl code can be quite free-flowing. The broad syntactic rules governing where a statement starts and ends are:

* Leading spaces on a line are ignored. You can start a Perl statement anywhere you want: at the beginning of the line, indented for clarity (recommended), or even right-justified (definitely frowned on, because the code would be difficult to understand).
* Statements are terminated with a semicolon.
* Spaces, tabs, and blank lines outside of strings are irrelevant; one space is as good as a hundred. That means you can split statements over several lines for clarity. A string is basically a series of characters enclosed in quotes. Chapter 2, "Numeric and String Literals," contains a better definition of strings.
* Anything after a hash sign (#) is ignored, except in strings.
Use this fact to pepper your code with useful comments.

Here's a Perl statement inspired by Kurt Vonnegut:

    print("My name is Yon Yonson\n");

No prizes for guessing what happens when Perl runs this code: it prints out My name is Yon Yonson. If the "\n" doesn't look familiar, don't worry; it simply means that Perl should print a newline character after the text, or in other words, go to the start of the next line.

Printing more text is a matter of either stringing together statements like this or giving multiple arguments to the print() function:

    print("My name is Yon Yonson,\n");
    print("I live in Wisconsin,\n", "I work in a lumbermill there.\n");

So what does a complete Perl program look like? Here's a small example, complete with the invocation line at the top and a few comments:

    #!/usr/local/bin/perl -w
    # Print the first line of the verse.
    print("My name is Yon Yonson,\n");
    # print() accepts a list of strings, so one call can print two lines.
    print("I live in Wisconsin,\n", "I work in a lumbermill there.\n");

The -w switch on the first line asks Perl to print warnings about questionable code. The program itself is not at all typical, though; it's just a linear sequence of commands with no complexity.

You can create your Perl program with any text editor:

* In UNIX, you can use emacs or vi.
* In Windows 95/Windows NT, you can use notepad or edit.
* In OS/2, you can use e or epm.

Create a file called test.pl that contains the preceding program.

Invocation

Assuming that Perl is correctly installed and working on your system, the simplest way to run a Perl program is to type the following:

    perl filename.pl

Replace filename.pl with the name of the program that you are trying to run, or execute. If you created a test.pl file while reading the previous section, you can run it like this:

    perl test.pl

This example assumes that perl is in the execution path; if not, you will need to supply the full path to perl, too. For example, on UNIX the command might be:

    /usr/local/bin/perl test.pl

Whereas on Windows NT, you might need to use:

    c:\perl5\bin\perl test.pl

UNIX systems have another way to invoke a program. However, you need to do two things.
The first is to place a line like

    #!/usr/local/bin/perl

at the start of the Perl file. This tells UNIX that the rest of this script file is to be run by /usr/local/bin/perl. The second step is to make the program file itself executable by changing its mode:

    chmod +x test.pl

Now you can execute the program file directly and let the program file tell the operating system which interpreter to use while running it. The new command line is simply:

    ./test.pl

The ./ tells the shell to look for test.pl in the current directory; you can leave it off if the current directory is in your search path.

Comments in Your Program

It is very important to place comments in your Perl programs. Comments will enable you to figure out the intent behind the mechanics of your program. For example, it is very easy to understand that your program adds 66 to another value. But in two years, you may forget how you derived the number 66 in the first place.

Comments are placed inside a program file using the # character. Everything after the # is ignored (except inside strings). For example:

    # This whole line is ignored.
    print("Perl is easy.\n");  # Here's a half-line comment.

Summary

You've finished the first chapter of the book and already written and executed a Perl program. Believe it or not, you've now done more than most people that I talk to on the Web. Let's quickly review what you've read so far.

Perl was created to solve a need, not to match the ideals of computer science. It has evolved from being a simple hack to a full-fledged modern programming language.

Perl's syntax is similar to that of the C programming language. However, Perl has a lot of features that were borrowed from UNIX tools.

Perl is very cost-effective in a lot of situations because it is free. There are legal restrictions that you need to follow, but they are spelled out in the documentation that comes with Perl, so they aren't repeated here.

You can get Perl by reading the http://www.perl.com/perl/info/software.html Web page. It has links to both the source code and the executables for Windows 95 and Windows NT.

Perl programs are simply text files.
They can be created in any text editor. By convention, Perl program files are given an extension of .pl. Most systems will run a Perl program file called test.pl with the following command:

    perl test.pl

You can add comments to your Perl program using the # character. Anything after the # character is ignored.

I hope the journey has been very smooth so far. The only difficulty may have been if you did not have Perl installed. The next part of the journey will be to learn some basic building blocks in the form of numeric and string literals. But literals will have to wait until the next chapter.

Review Questions

Answers to Review Questions are in Appendix A.

1. What is the address of Perl's home page?
2. Who was the creator of Perl?
3. How much does Perl cost?
4. Why are comments important to programming?

Review Exercises

1. Connect to the Perl Home Page and spend a few minutes looking at the links.
2. Create and run a Perl program that prints "Hello, World" on the monitor.

Chapter 2
Numeric and String Literals

CONTENTS
* Numeric Literals
  + Example: Numbers
* String Literals
  + Example: Single-Quoted Strings
  + Example: Double-Quoted Strings
  + Example: Back-Quoted Strings
* Array Literals
  + Example: Printing an Array
  + Example: Nesting Arrays
  + Example: Using a Range of Values
* Summary
* Review Questions
* Review Exercises
In this chapter, we'll take a look at some of the ways that Perl handles data. All computer programs use data in some way. Some use it to personalize the program. For example, a mail program might need to remember your name so that it can greet you upon starting. Another program, say one that searches your hard disk for files, might remember your last search parameters in case you want to perform the same search twice.

A literal is a value that is represented "as is," or hard-coded, in your source code. When you see the four characters 45.5 in a program, they really refer to a value of forty-five and a half. Perl uses four types of literals. Here is a quick glimpse at them:

* Numbers: This is the most basic data type.
* Strings: A string is a series of characters that are handled as one unit.
* Arrays: An array is a series of numbers and strings handled as a unit. You can also think of an array as a list.
* Associative Arrays: This is the most complicated data type. Think of it as a list in which every value has an associated lookup item. Associative arrays will be discussed in Chapter 3, "Variables."

Numbers, strings, and regular arrays will be discussed in the following sections.

Numeric Literals

Numeric literals are frequently used. They represent a number that your program will need to work with. Most of the time you will use numbers in base ten, the base that everyone uses. However, Perl will also let you use base 8 (octal) or base 16 (hexadecimal).

Note
For those of you who are not familiar with non-decimal numbering systems, here is a short explanation. In decimal notation (base ten), the value 15 signifies $(1 \times 10) + 5$, or $15_{10}$. The subscript indicates which base is being used. In octal notation (base eight), the value 15 signifies $(1 \times 8) + 5$, or $13_{10}$.
In hexadecimal notation (base 16), the value 15 signifies $(1 \times 16) + 5$, or $21_{10}$. Base 16 needs six extra characters in addition to 0 through 9 so that each position can hold a total of 16 values. The letters A through F are used to represent the values 10 through 15. So the value $BD_{16}$ is equal to $(B_{16} \times 16) + D_{16} = (11_{10} \times 16) + 13_{10} = 189_{10}$.

If you will be using very large or very small numbers, you might also find scientific notation to be of use.

Note
If you're like me, you probably forgot most of the math you learned in high school. However, scientific notation has always stuck with me, perhaps because I liked moving decimal points around. Scientific notation looks like 10.23E+4, which is equivalent to 102,300. You can also represent small numbers by using a negative exponent. For example, 10.23E-4 is .001023. Simply move the decimal point to the right if the exponent is positive and to the left if the exponent is negative.

Example: Numbers

Let's take a look at some different types of numbers that you can use in your program code. First, here are some integers. Integers are numbers with no decimal components. Perl treats an integer literal that starts with 0 as octal and one that starts with 0x as hexadecimal.

    123     # An integer in decimal format.
    043     # An integer in octal format: (4 * 8) + 3, or 35 in base 10.
    0x23    # An integer in hexadecimal format: (2 * 16) + 3, also 35 in base 10.

Now, some numbers with fractions, also called floating point values. You will frequently see these values referred to as float values for simplicity's sake.

    100.5   # A float with a value in the tenths place: 100 and 5/10.
    54.534  # A float with a fraction out to the thousandths place: 54 and 534/1000.

Here's a very small number:

    .000034 # A very small float; in scientific notation, 3.4E-5.

String Literals

String literals are groups of characters surrounded by quotes so that they can be used as a single datum.
They are frequently used in programs to identify filenames, display messages, and prompt for input. In Perl you can use single quotes ('), double quotes ("), and back quotes (`).

Example: Single-Quoted Strings

The following examples show you how to use string literals. First, we'll look at single-quoted strings, then double-quoted strings. A single-quoted string is pretty simple. Just surround the text that you'd like to use with single quotes.

Note
The real value of single-quoted strings won't become apparent until you read about variable interpolation in the section "Examples: Variable Interpolation" in Chapter 3, "Variables."

    'WasWaldo the Illusionist'    # One of my favorite role-playing characters.
    'Morganna the Fair'           # The blessed cleric who frequently helps WasWaldo stay alive.

Strings are pretty simple, huh? But what if you wanted to use a single quote inside the literal? If you did, Perl would think you wanted to end the string early, and a compiler error would result. Perl uses the backslash (\) character to indicate that the normal function of the single quote, ending a literal, should be ignored for a moment.

Tip
The backslash character is also called an escape character, perhaps because it lets the next character escape from its normal interpretation.

Here's a literal that comments on WasWaldo's fighting ability, followed by another comment from the peanut gallery. Notice how the backslash keeps the apostrophe in "can't" from ending the string, and how double quotes can be used directly inside single-quoted strings:

    'WasWaldo can\'t hit the broad side of a barn.'
    'Morganna said, "WasWaldo can\'t hit anything."'

The single quotes are used here specifically so that the double quotes can be used to surround the spoken words.
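To try these literals yourself, here is a minimal sketch that stores them and prints them. It borrows the my keyword and scalar variables from Chapter 3, "Variables," purely so that each string can be inspected; the names $comment, $quote, $backslash, and $no_newline are illustrative. It also shows a consequence of the backslash rule: inside single quotes, a backslash is special only before a single quote or another backslash, so '\n' is two ordinary characters rather than a newline.

```perl
#!/usr/local/bin/perl -w
use strict;

# \' puts a literal single quote inside a single-quoted string.
my $comment = 'WasWaldo can\'t hit the broad side of a barn.';

# Double quotes need no backslash inside single quotes.
my $quote = 'Morganna said, "WasWaldo can\'t hit anything."';

# \\ produces one backslash. A backslash before any other character
# is kept as-is, so '\n' here is a backslash followed by the letter n.
my $backslash  = 'C:\\perl5';
my $no_newline = 'a\nb';

print $comment, "\n";
print $quote, "\n";
print $backslash, "\n";            # prints C:\perl5
print length($no_newline), "\n";   # prints 4
```

Running the sketch with perl should print the two comments, the path C:\perl5, and the number 4.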
Later, in the section on double-quoted literals, you'll see that the single quotes can be replaced by double quotes if you'd like.

You must know only one more thing about single-quoted strings. You can add a line break to a single-quoted string simply by adding line breaks to your source code, as demonstrated by Listing 2.1. The print statement tells Perl to begin printing, the following lines are more lines for Perl to display, and the closing single quote ends the string literal.

Listing 2.1 02LST01.PL: Using Embedded Line Breaks to Skip to a New Line

    print 'Bill of Goods
    Bread:  $34.45
    Fruit:  $45.00
            ======
            $79.45';

Figure 2.1 shows the bill of goods displayed using one long, single-quoted literal.

Figure 2.1: A bill of goods displayed using one long single-quoted literal.

You can see that with single-quoted literals, even the line breaks in your source code are part of the string.

Example: Double-Quoted Strings

Double-quoted strings start out simple, then become a bit more involved than single-quoted strings. With double-quoted strings, you can use the backslash to add some special characters to your string. Chapter 3, "Variables," will talk about how double-quoted strings and variables interact.

Note
Variables, which are described in Chapter 3, "Variables," are simply locations in the computer's memory where Perl holds the various data types. They're called variables because the content of the memory can change as needed.

The basic double-quoted string is a series of characters surrounded by double quotes. If you need to use the double quote inside the string, you can use the backslash character. The first of the following literals is similar to one you've already seen; just the quotes are different. The second uses double quotes inside a double-quoted string.
    "WasWaldo the Illusionist"
    "Morganna said, \"WasWaldo can't hit anything.\""

Notice how the backslash in the second line is used to escape the double quote characters, while the single quote can be used without a backslash.

One major difference between double- and single-quoted strings is that double-quoted strings have some special escape sequences that can be used. Escape sequences represent characters that are not easily entered using the keyboard or that are difficult to see inside an editor window. Table 2.1 shows the escape sequences that Perl understands. The examples following the table will illustrate some of them.

Table 2.1 Escape Sequences

    Escape Sequence   Description or Character
    \a                Alarm (bell)
    \b                Backspace
    \e                Escape
    \f                Form feed
    \n                Newline
    \r                Carriage return
    \t                Tab
    \$                Dollar sign
    \@                At sign
    \nnn              Any octal byte (for example, \033 is Escape)
    \xnn              Any hexadecimal byte
    \cX               Any control character (for example, \cC is Control-C)
    \l                Change the next character to lowercase
    \u                Change the next character to uppercase
    \L                Change the following characters to lowercase until a \E sequence is encountered
    \Q                Quote meta-characters as literals until a \E sequence is encountered; see Chapter 10, "Regular Expressions," for more information on meta-characters
    \U                Change the following characters to uppercase until a \E sequence is encountered
    \E                Terminate the \L, \Q, or \U sequence
    \\                Backslash

The \E that ends \L, \Q, and \U must use an uppercase E; a lowercase \e will not work, because it means the Escape character instead.

Note
In the next chapter, "Variables," you'll see why you might need to use a backslash when using the $ and @ characters.

The first of the following literals represents WasWaldo is 34 years old. The \u is used twice in the first word to capitalize the w characters, and hexadecimal notation is used to represent the age with the ASCII codes for 3 and 4. The second literal represents The kettle was HOT!; the \U capitalizes all characters until a \E sequence is seen.

    "\uwas\uwaldo is \x33\x34 years old."
    "The kettle was \Uhot\E!"

For more information about ASCII codes, see Appendix E, "ASCII Table."

Now let's look at the \t and \n escape sequences. This example isn't too difficult, but it does involve looking at more than one literal at once. Listing 2.2, a program displaying a bill with several items, will produce the output shown in Figure 2.2. The first print statement displays the first two lines of the output; the remaining statements display what was purchased, a separator line, and the total.

Listing 2.2 02LST02.PL: Using Tabs and Newline Characters to Print

    print "Bill of Goods
    Bread:\t\$34.45\n";
    print "Fruit:\t";
    print "\$45.00\n";
    print "\t======\n";
    print "\t\$79.45\n";

Figure 2.2: A bill of goods displayed using newline and tab characters.

Tip
Notice that Figures 2.1 and 2.2 look identical. This illustrates a cardinal rule of Perl: there's always more than one way to do something.

This program uses two methods to cause a line break:

* The first is simply to include the line break in the source code.
* The second is to use the \n, or newline, character.

I recommend using the \n character so that when looking at your code in the future, you can be assured that you meant to cause a line break and did not simply press the ENTER key by mistake.

Caution
If you are a C/C++ programmer, this material is not new to you. However, Perl strings are not identical to C/C++ strings, because they have no ending NULL character. If you are thinking of converting C/C++ programs to Perl, take care to modify any code that relies on the NULL character to end a string.
Example: Back-Quoted Strings

It might be argued that back-quoted strings are not really a data type. That's because Perl uses back-quoted strings to execute system commands. When Perl sees a back-quoted string, it passes the contents to Windows, UNIX, or whatever operating system you are using, and the command's output becomes the value of the string.

Let's see how to use a back-quoted string to display a directory listing of all the text files in the current directory. Figure 2.3 shows what the output of such a program might look like when run from the perl5 directory.

Figure 2.3: Using a back-quoted string to display a directory.

    print `dir *.txt`;    # Print the directory listing (on UNIX, use ls *.txt instead).

All of the escape sequences used with double-quoted strings can be used with back-quoted strings.

Array Literals

Perl uses arrays, or lists, to store a series of items. You could use an array to hold all of the lines in a file, to help sort a list of addresses, or to store a variety of items. We'll look at some simple arrays in this section. In the next chapter, "Variables," you'll see more examples of how useful arrays can be.

Example: Printing an Array

In this section, we'll look at printing an array and see how arrays are represented in Perl source code. This example shows an empty array, an array of numbers, an array of strings, and an array that mixes different data types. Figure 2.4 shows the output of Listing 2.3.

Figure 2.4: The output from Listing 2.3, showing different array literals.

Listing 2.3 02LST03.PL: Printing Some Array Literals

    print "Here is an empty array:" . () . "<-- Nothing there!\n";
    print (12, 014, 0x0c, 34.34, 23.3E-3);
    print "\n";
    print ("This", "is", 'an', "array", 'of', "strings");
    print "\n";
    print ("This", 30, "is", 'a', "mixed array", 'of', 0x08, "items");
The fourth line of this listing shows that you can mix single- and double-quoted strings in the same array. You can also mix numbers and strings interchangeably, as shown in the last line.

Note
Listing 2.3 uses the period, or concatenation, operator to join a string representation of the empty array with the string "Here is an empty array:" and the string "<-- Nothing there!\n". You can read more about operators in Chapter 4, "Operators."

Note
In this and other examples in this chapter, the elements of an array will be printed with no spaces between them. You will see how to print with spaces in the section "Strings Revisited" in Chapter 3, "Variables."

Example: Nesting Arrays

Many times a simple list is not enough. If you're a painter, you might have one array that holds the names of orange hues and one that holds the names of yellow hues. To print them, you can use Perl's ability to specify a sub-array inside your main array definition. While this example is not very "real-world," it gives you the idea behind specifying an array by using sub-arrays. The first statement prints an array that consists of two sub-arrays; the second prints an array that consists of an array, a string, and another array.

    print (("Bright Orange", "Burnt"), ("Canary Yellow", "Sunbeam"));
    print (("Bright Orange", "Burnt"), " Middle ", ("Canary Yellow", "Sunbeam"));

So far, we haven't talked about the internal representations of data types. That's because you almost never have to worry about such things with Perl. However, it is important to know that, internally, the sub-arrays are merged into the main array. In other words, the array

    (("Bright Orange", "Burnt"), ("Canary Yellow", "Sunbeam"))

is exactly equivalent to

    ("Bright Orange", "Burnt", "Canary Yellow", "Sunbeam")

Example: Using a Range of Values

At times you might need an array that consists of sequential numbers or letters.
Instead of making you list out the entire array, Perl has a shorthand notation: two periods (..) replace a consecutive series of values. This method is quicker to type and less prone to error, and it is also easier to understand. Only the end points of the series are specified, so you don't need to manually verify that every value is represented. When you see .., you know at once that a range of values is being used.

Print an array consisting of the numbers from 1 to 15.
Print an array consisting of the numbers from 1 to 15 using the shorthand method.

print (1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15);
print "\n";
print (1..15);

The two arrays used in the previous example are identical, but they were specified differently.

Note
The double periods in the array specification are called the range operator. The range operator is also discussed in Chapter 4, "Operators."

You can also use the shorthand method to specify values in the middle of an array.

Print an array consisting of the numbers 1, 2, 7, 8, 9, 10, 14, and 15.
Print an array consisting of the letters A, B, F, G, H, Y, and Z.

print (1, 2, 7..10, 14, 15);
print "\n";
print ("A", "B", "F".."H", "Y", "Z");

The range operator starts with the left-hand value and repeatedly adds one to it, appending each new value to the array, until it reaches the right-hand value. You can also use letters with the range operator, because Perl knows how to increment a string of letters: the value after "F" is "G", the value after "G" is "H", and so on. Consecutive letters also have consecutive codes in the ASCII table. For more information about ASCII codes, see Appendix E, "ASCII Table."

Summary

This chapter introduced you to both numeric and string literals. You learned that literals are values that are placed directly into your source code and never changed by the program. They are sometimes referred to as hard-coded values. You read about numbers and the three different bases that can be used to represent them: decimal, octal, and hexadecimal.
Very large or small numbers can also be described using scientific notation.

Strings were perhaps a bit more involved. Single-, double-, and back-quoted strings are used to hold strings of characters. Back-quoted strings have an additional purpose: they tell Perl to send the string to the operating system for execution.

Escape sequences are used to represent characters that are difficult to enter through the keyboard or that have more than one purpose. For example, using a double quote inside a double-quoted string would end the string before you really intended. The backslash character was introduced to escape the double quote and change its meaning.

The next chapter, "Variables," will show you how Perl uses your computer memory to store data types and also will show you ways that you can manipulate data.

Review Questions

Answers to Review Questions are in Appendix A.

1. What are the four types of literals?
2. What is a numeric literal?
3. How many types of string literals are there?
4. What is the major difference between single- and double-quoted strings?
5. What are three escape sequences and what do they mean?
6. What would the following one-line program display?
   print 'dir /*.log';
7. What is scientific notation?
8. How can you represent the number 64 in hexadecimal inside a double-quoted string?
9. What is the easiest way to represent an array that includes the numbers 56 to 87?

Review Exercises

1. Write a program that prints the decimal number 32. However, in the print command, specify the value of 32 using hexadecimal notation.
2. Create a program that uses the tab character in three literals to align text.
3. Write a program that prints using embedded new lines in a single-quoted literal.
4. Convert the number 56,500,000 into scientific notation.
5. Write a program that prints an array that uses the range operator. The left value should be AA and the right value should be BB. What happens and why?
6. Write a program that prints its own source code using a back-quoted string.

____________________________________________________________________________________________________________________

Chapter 3 Variables

____________________________________________________________________________________________________________________

CONTENTS
* Scalar Variables
  + Example: Assigning Values to Scalar Variables
  + Changing Values in Scalar Variables
* Array Variables
  + Example: Assigning Values to Array Variables
  + Example: Using Array Elements
  + Example: Using Negative Subscripts
  + Example: Determining the Number of Elements in an Array
  + Example: How to Grab a Slice (or Part) of an Array
* Associative Array Variables
  + Example: Assigning Values to Associative Array Variables
* Double-Quoted Strings Revisited
  + Example: Variable Interpolation
  + Example: Using the $" Special Variable
* Summary
* Review Questions
* Review Exercises
____________________________________________________________________________________________________________________

In the last chapter, you learned about literals: values that don't change while your program runs because you represent them in your source code exactly as they should be used. Most of the time, however, you will need to change the values that your program uses. To do this, you need to set aside pieces of computer memory to hold the changeable values. And you need to keep track of where all these little areas of memory are so you can refer to them while your program runs.

Perl, like virtually every other computer language, uses variables to keep track of the usage of computer memory.
Every time you need to store a new piece of information, you assign it to a variable. You've already seen how Perl uses numbers, strings, and arrays. Now, you'll see how to use variables to hold this information. Perl has three types of variables:

Variable Type        Description
Scalars              Hold one number or string value at a time. Scalar variable names always begin with a $.
Arrays               Hold a list of values. The values can be numbers, strings, or even the contents of another array. Array variable names always begin with an @.
Associative Arrays   Use any value as an index into an array. Associative array variable names always begin with a %.

The different beginning characters help you understand how a variable is used when you look at someone else's Perl code. If you see a variable called @Value, you automatically know that it is an array variable. They also provide a different namespace for each variable type. Namespaces separate one set of names from another. Thus, Perl can keep track of scalar variables in one table of names (or namespace) and array variables in another. This lets you use $name, @name, and %name to refer to different values.

Tip
I recommend against using identical variable names for different data types unless you have a very good reason to do so. And, if you do need to use the same name, try using the plural of it for the array variable. For example, use $name for the scalar variable name and @names for the array variable name. This might avoid some confusion about what your code does in the future.

Note
Variable names in Perl are case-sensitive. This means that $varname, $VarName, $varName, and $VARNAME all refer to different variables.

Each variable type will be discussed in its own section. You'll see how to name variables, how to set their values, and some of the uses to which they can be put.

Scalar Variables

Scalar variables are used to track single pieces of information. You would use them to hold the title of a book or the number of rooms in a house.
You can use just about any name imaginable for a scalar variable as long as it begins with a $.

Tip
If you have programmed in Visual Basic, you need to be especially careful when naming variables. Just remember that all scalars begin with a $, not just strings, and that the $ starts the name; it doesn't end it.

Let's jump right in and look at some variable names.

This scalar variable will hold the number of rooms.
This scalar variable will hold the title of a book.
This scalar variable conflicts with a Perl special variable that you'll learn about in Chapter 12, "Using Special Variables."

$numberOfRooms
$bookTitle
$0

Note
It is generally a good idea to stay away from short variable names. Longer variable names are more descriptive and aid in understanding programs.

Let me say a quick word about variable names. I always start my variable names with a lowercase letter and then make the first letter of each "word" in the name uppercase. Some programmers like to separate each word with an underscore. For example, $numberOfRooms would look like $number_of_rooms. Choose a method that you feel comfortable with and then stick with it. Being consistent will make your program more understandable.

Most programmers try to use descriptive names for their variables. There is no practical limit to the length of a Perl variable name, but I like to keep them under 15 characters. Longer names take a while to type and increase the chance of spelling errors.

Example: Assigning Values to Scalar Variables

Now that you know what scalar variable names look like, we'll look at how you can assign a value to them. Assigning values to a variable is done with the equals (=) sign.

Assign a value of 23 to a variable called $numberOfRooms.
Assign a value of Perl by Example to a variable called $bookTitle.

$numberOfRooms = 23;
$bookTitle = "Perl by Example";

Notice that you are assigning literal values to the variables.
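For comparison, here is a minimal sketch of the same assignments written in a style many Perl programmers use today. The use strict and use warnings lines turn on extra checking, and my declares each variable (my() is covered in Chapter 5, "Functions"). The extra variable $copyOfTitle is a hypothetical name added only to show that assigning one scalar to another copies its value.

```perl
use strict;
use warnings;

# my declares each variable before its first use; use strict requires this.
my $numberOfRooms = 23;
my $bookTitle     = "Perl by Example";

# Hypothetical extra variable: assigning one scalar to another copies the value.
my $copyOfTitle = $bookTitle;
$bookTitle = "Perl 5 By Example";    # changing the original...

print "$numberOfRooms\n";    # prints 23
print "$copyOfTitle\n";      # ...leaves the copy alone: prints Perl by Example
print "$bookTitle\n";        # prints Perl 5 By Example
```

Under use strict, a misspelled name such as $numberOfRoom becomes a compile-time error. Without it, the misspelling would silently create a new, empty variable.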
After assigning the values, you then can change them.

Changing Values in Scalar Variables

The next example will make a variable assignment and then change that variable's value using a second assignment. The second assignment will increment the value by five.

Assign a value of 23 to a variable called $numberOfRooms.
Add 5 to the $numberOfRooms variable.

$numberOfRooms = 23;
$numberOfRooms = $numberOfRooms + 5;

Note
In Perl, you never have to declare, define, or allocate simple data types (for example: scalars, arrays, or associative arrays). When you use a variable for the first time, Perl treats it as zero if you need a number, as an empty string if you need a string, or as an empty list if you need an array. Using a variable name is equivalent to defining it.

Caution
Letting Perl automatically initialize variables is fine for small programs. However, if you write professional programs that need to be maintained, you'll want to explicitly declare variables using the my() function. Explicitly declaring variables will reduce errors and improve the internal documentation of your programs. The my() function is discussed in Chapter 5, "Functions."

Array Variables

You had a short introduction to arrays in the last chapter when you printed out entire arrays (with no spaces, remember?) using Perl's print statement. Now, you'll learn about arrays in more detail. Array variable names always begin with an @ character.

Tip
I remember that the @ sign starts array variables because "at" and "array" start with the same letter. Simple...but it works for me.

The rules for naming array variables are the same as those for scalar variables. There are no rules. Well, none that you need to worry about. In fact, let's skip looking at variable names and get right to assigning arrays to variables instead.

Example: Assigning Values to Array Variables

You use the equals (=) sign to assign values to array variables just as you do with scalar values.
We'll use one of the examples from Chapter 2, "Numeric and String Literals," reworked a little, so you'll already be familiar with part of the example.

Tip
The printing of the newline character is separated from the printing of the array for a reason. It has to do with how Perl interprets variables in different contexts. Suppose you tried to use

print @numberArray . "\n";

Perl would think that you want to use @numberArray in a scalar context, and it wouldn't print the elements of the array. It would print the number of elements instead. See the section "Example: Determining the Number of Elements in an Array," later in this chapter.

Assign values to array variables.
Print the array variables.

Listing 3.1 shows values assigned to array variables.

____________________________________________________________________________________________________________________
Listing 3.1 03LST01.PL: Assigning Values to Array Variables

@emptyArray = ();
@numberArray = (12, 014, 0x0c, 34.34, 23.3E-3);
@stringArray = ("This", "is", 'an', "array", 'of', "strings");
@mixedArray = ("This", 30, "is", 'a', "mixed array", 'of', 0x08, "items");

print "Here is an empty array:" . @emptyArray . "<- Nothing there!\n";
print @numberArray;
print "\n";
print @stringArray;
print "\n";
print @mixedArray;
print "\n";
____________________________________________________________________________________________________________________

This program will display:

Here is an empty array:0<- Nothing there!
12121234.340.0233
Thisisanarrayofstrings
This30isamixed arrayof8items

In this example, we assign literal values to array variables and then display them using the print command. This is very similar to what we did in Chapter 2, "Numeric and String Literals," except that we are temporarily storing the array values in variables before printing them.

Suppose that you want to create one array from two smaller ones. You can do this by using the sub-arrays inside the assignment statement.
Create two small arrays using the range operator.
Create an array that consists of the two small arrays.
Print the array.

@smallArrayOne = (5..10);
@smallArrayTwo = (1..5);
@largeArray = (@smallArrayOne, @smallArrayTwo);
print @largeArray;

When run, this program prints 567891012345. That is the array (5, 6, 7, 8, 9, 10, 1, 2, 3, 4, 5) with no spaces between its elements. Notice that the 5 is duplicated in the new array and that the elements are still in the same order as the sub-arrays. When you concatenate arrays in this manner, Perl does not sort them or modify their contents in any way.

Example: Using Array Elements

Individual elements of an array are accessed by prefixing the array name with a $ and using an index that indicates to Perl which element you want to use. Listing 3.2 creates an array of five elements and then prints each individual element.

Create an array with five elements.
Print the array.
Print each element of the array.

____________________________________________________________________________________________________________________
Listing 3.2 03LIST02.PL: Accessing Array Elements

@array = (1..5);
print @array;
print "\n";
print $array[0];
print "\n";
print $array[1];
print "\n";
print $array[2];
print "\n";
print $array[3];
print "\n";
print $array[4];
print "\n";
____________________________________________________________________________________________________________________

Listing 3.2 will print the following:

12345
1
2
3
4
5

Perl array indexes start at 0. Well, they actually start at $[, but for the moment zero is good enough. Almost every Perl program uses zero as the base array subscript.

Note
The special variable $[ is used to hold the base array subscript; usually, it is zero. However, it can be changed to any integer you want, even negative ones. Using a negative base array subscript probably will make your programs hard to understand, and I recommend against it. (In current versions of Perl, $[ is deprecated, and assigning it a nonzero value is an error.) Other special variables are mentioned in Chapter 12, "Using Special Variables."
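Listing 3.2 prints each element with its own statement. Here is a shorter sketch that does the same job. It assumes the foreach loop, which the book covers later, and the special form $#array, which holds the subscript of the last element of @array. Neither appears in the original listing.

```perl
use strict;
use warnings;

my @array = (1..5);

# $#array is the subscript of the last element (4 for a five-element array),
# so 0 .. $#array lists every valid subscript when the base subscript is zero.
for my $index (0 .. $#array) {
    print "\$array[$index] is $array[$index]\n";
}
```

This loop prints lines such as "$array[0] is 1". Because the loop bounds come from the array itself, the same code keeps working if you add or remove elements.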
You can replace the numeric literal indexes in the above example with scalar variables. You can say:

$index = 2;
@array = (1..5);
print $array[$index];
print "\n";

which would print 3.

Example: Using Negative Subscripts

Perl is definitely a language that will surprise you at times. In many other languages, subscripts must be zero or positive integers. However, Perl lets you use negative subscripts to access array elements in reverse order: $array[-1] is the last element, $array[-2] is the next-to-last element, and so on.

Tip
Using a negative subscript may come in handy if you need a fast way to get the value of the last element in an array.

The program in Listing 3.3 assigns a five-element array to @array. Then, it uses the print statement and negative subscripts to print each array element in reverse order.

____________________________________________________________________________________________________________________
Listing 3.3 03LIST03.PL: Accessing Array Elements Using Negative Subscripts

@array = (1..5);
print @array;
print "\n";
print $array[-1];
print "\n";
print $array[-2];
print "\n";
print $array[-3];
print "\n";
print $array[-4];
print "\n";
print $array[-5];
print "\n";
____________________________________________________________________________________________________________________

Listing 3.3 will print the following:

12345
5
4
3
2
1

Example: Determining the Number of Elements in an Array

If you need to determine the number of elements that an array contains, you can assign the array to a scalar variable. In fact, any time an array is used where a scalar is needed, the value used will be the number of array elements.

Create an array with five elements.
Assign the array size to the $numberOfElements scalar variable.
Multiply the array size by 2 and assign that value to $doubleTheSize.
Print the scalar variables.

@array = (1..5);
$numberOfElements = @array;
$doubleTheSize = 2 * @array;
print "The number of array elements is: " . $numberOfElements . "\n";
print "Double the number of array elements is: " . $doubleTheSize . "\n";

When this program runs, it will assign a value of 5 to $numberOfElements and 10 to $doubleTheSize.

Tip
Perl has the powerful ability to return the number of array elements when the array variable is used in a scalar context. However, this ability can be confusing when you read someone else's program if you don't remember that there is a difference between scalar contexts and array contexts.

Example: How to Grab a Slice (or Part) of an Array

At times you will need to use some elements of an array and not others. You might want to assign array elements to scalars or to another array. Using only part of an array is done with an array slice. An array slice uses an @ character and square brackets ([]) to create a sub-array consisting of selected individual elements. For example:

Create a four-element array and assign it to @array.
Use an array slice to assign the first and third elements to $first and $third.
Use an array slice to assign the second half of the array to @half.
Print @array, $first, $third, and @half to verify their values.
Transpose the first and last elements in @array.
Print @array to verify that the elements have been switched.

@array = ("One", "Two", "Three", "Four");
($first, $third) = @array[0, 2];
@half = @array[2, 3];
print("\@array=@array\n");
print("\$first=$first \$third=$third\n");
print("\@half=@half\n");
@array[0, 3] = @array[3, 0];
print("\@array=@array\n");

This program will display:

@array=One Two Three Four
$first=One $third=Three
@half=Three Four
@array=Four Two Three One

You won't really understand the power of array slices until you learn about functions in Chapter 5, "Functions." At that point, you'll see that functions (subprograms that you invoke using a function name) can return a value. When calling a function that returns the time and date in an array, a slice can be used to "grab" just those elements in which you are interested: for example, just the year or just the hour.
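Here is a sketch of that idea. It uses Perl's built-in localtime function, even though functions are not covered until Chapter 5. In list context, localtime returns nine values describing the current time, and an array slice picks out just the hour and the year. The variable names are illustrative only.

```perl
use strict;
use warnings;

# In list context, localtime returns nine values:
# (sec, min, hour, mday, mon, year, wday, yday, isdst)
my @timeParts = localtime();

# The array slice grabs only subscripts 2 (hour) and 5 (year).
my ($hour, $year) = @timeParts[2, 5];

# The year element counts years since 1900.
$year += 1900;

print "Hour: $hour\n";
print "Year: $year\n";
```

The output depends on when you run the program, so no sample output is shown.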
Associative Array Variables

Now it's time to look at associative arrays. These are definitely the most complicated of the three data types. And yet, they are just another type of array. You've already seen that array elements can be accessed with both positive and negative integer indexes. Well, with associative arrays you can use any scalar data type as an index. Associative array names start with the % character.

You will see associative arrays called hashes at times. The term "hash" refers to how associative array elements are stored in memory. "Hash" also is much shorter than "associative array," and therefore much easier to type and talk about.

Example: Assigning Values to Associative Array Variables

Before we discuss associative arrays further, let's see how to assign values to them. To define a whole associative array, you can use the same representation that was used for arrays. Just remember that you need two items for every element in the associative array. You also can assign values to individual elements of an associative array by using curly braces ({}) around the index key.

Create an associative array with three elements. Each element consists of two values: the lookup key and its associated value.
Add a single element to the associative array.

%associativeArray = ("Jack A.", "Dec 2", "Joe B.", "June 2", "Jane C.", "Feb 13");
$associativeArray{"Jennifer S."} = "Mar 20";
print "Joe's birthday is: " . $associativeArray{"Joe B."} . "\n";
print "Jennifer's birthday is: " . $associativeArray{"Jennifer S."} . "\n";

This program will print the following:

Joe's birthday is: June 2
Jennifer's birthday is: Mar 20

Perl will extend the associative array as needed when you assign values to keys. An internal table is used to keep track of which keys are defined. If you try to access a key that has not been defined, Perl returns the undefined value, which prints as an empty string.

You can do a lot with associative arrays, but first you need more background in operators, functions, and statements.
We'll handle these topics in future chapters. In the next section, we look at string literals and how they interact with variables.

Double-Quoted Strings Revisited

Perl strings have some additional functionality that was not mentioned in Chapter 2, "Numeric and String Literals," because you needed to know a little about variables beforehand. Now that you are familiar with how Perl handles basic variables, let's look a little deeper at double-quoted strings.

Example: Variable Interpolation

Interpolation is a big word for a simple concept: replacement of a variable name with its value. You already know that variable names are a "stand-in" for a value. If $var is equal to 10, then $var + 20 is really 10 + 20. In Perl, this concept also is used inside strings. You can combine variables and strings in a very natural way using Perl. Simply place the variable directly inside a double-quoted string, and its value automatically will be interpolated as needed.

Tip
Until now, each time you printed an array, all of the elements were mashed together (concatenated). Without delimiting spaces, it was very difficult to tell the individual items apart. If, when printing, you enclose the array in double quotes, Perl automatically will separate the array elements with a space.

Create a five-element array.
Print the elements with spaces between them.

@array = (1..5);
print "@array\n";

This program will print:

1 2 3 4 5

Perl runs into a problem when you want to use a variable and then append some letters to the end. Let's illustrate this with scalar variables.

Assign the value large to a scalar variable.
Print a string with an embedded variable.

$word = "large";
print "He was a $wordr fellow.";

This program will print:

He was a  fellow.

In this example, Perl looks for the variable $wordr, which is obviously not what I intended. I meant for the string "He was a larger fellow." to print.
This problem can be corrected by doing the following:

$word = "large";
print "He was a " . $word . "r fellow.";

Because the variable is separate, Perl sees the correct variable name. Then the string concatenation operator joins the three strings together. This method of programming makes it very easy to see where the variable is.

Remember when I said that Perl enables you to do something in many different ways? You also could do the following:

print "He was a ${word}r fellow.";

The curly braces around the variable name tell Perl where the name starts and ends.

Note
If you're ever on IRC and see longhair_ or Kirby Hughes (khughes@netcom.com), tell him I said "thanks." He remembered that curly braces can be used in this manner.

Example: Using the $" Special Variable

Perl has a number of special variables. These variables each have a predefined meaning. Chapter 12, "Using Special Variables," introduces you to quite a few Perl special variables. However, because we were just looking at strings and arrays, we also should spend a moment and talk about the $" special variable. It holds the separator that Perl places between array elements when an array is interpolated into a double-quoted string.

Set the $" special variable to the comma character.
Create a five-element array.
Print the elements with commas between them.

$" = ",";
@array = (1..5);
print "@array\n";

This program will print:

1,2,3,4,5

Of course, because $" is a scalar variable, you also could assign a longer string to it. For instance, you could use $" = ", " to add both a comma and a space between the array elements.

Summary

This chapter introduced you to the concept of variables: places in computer memory that are used to hold values as your program runs. They are called variables because you can assign different values to them as needed.

You read about three types of variables: scalars, arrays, and associative arrays. Each variable type has its own unique character that is used to begin a variable name. Scalars use a $, arrays use an @, and associative arrays use a %.
Tip
When I first started to learn Perl, I found it difficult to remember which character begins which variable type. Then, I saw this chart on the Internet and things became clearer:

$ = "the" (singular)
@ = "those" (plural)
% = "relationship"

Each variable type starts with a different character and has its own namespace. This means that $varName and @varName are different variables. Remember, too, that variable names in Perl are case-sensitive.

A lot of this chapter looked at assigning values to variables using the equals (=) sign. We also reviewed how to use positive and negative subscripts (such as $array[1]) to access array elements. Associative array elements are accessed a little differently: curly braces are used instead of square brackets (for example, $associativeArray{"Jack A."}).

And finally, we took another look at double-quoted strings to see how variable interpolation works. You saw that Perl automatically replaces variables inside double-quoted strings. When arrays are interpolated into strings, their elements are separated by the value of $", which is usually a space.

Review Questions

Answers to Review Questions are in Appendix A.

1. What are the three basic data types that Perl uses?
2. How can you determine the number of elements in an array?
3. What is a namespace?
4. What is the special variable $[ used for?
5. What is the special variable $" used for?
6. What is the value of a variable when it is first used?
7. What is an associative array?
8. How can you access associative array elements?

Review Exercises

1. Create an array called @months. It should have 12 elements in it with the names of the months represented as strings.
2. Create a string that interpolates the value of the variable $numberOfBooks.
3. Using the range operator (..), create an array with the following elements: 1, 5, 6, 7, 10, 11, 12.
4. Using the array created in the last exercise, create a print command to display the last element.
5. Create an associative array that holds a list of five music artists and a rating for them. Use "good," "bad," and "indifferent" as the ratings.
6. Using the array created in the last exercise, create a print command to display the last element.

____________________________________________________________________________________________________________________

Chapter 4 Operators

____________________________________________________________________________________________________________________

CONTENTS
* Operator Types
* The Binary Arithmetic Operators
  + Example: The Exponentiation Operator
  + Example: The Modulus Operator
* The Unary Arithmetic Operators
  + Example: The Pre-increment Operator
  + Example: The Pre-decrement Operator
  + Example: The Post-increment Operator
* The Logical Operators
  + Example: The "AND" Operator (&&)
  + Example: The "OR" Operator (||)
  + Example: The "NOT" Operator (!)
* The Bitwise Operators
  + Example: Using the &, |, and ^ Operators
  + Example: Using the >> and << Operators
* The Numeric Relational Operators
  + Example: Using the <=> Operator
* The String Relational Operators
  + Example: Using the cmp Operator
* The Ternary Operator
  + Example: Using the Ternary Operator to Assign Values
* The Range Operator (..)
  + Example: Using the Range Operator
* The String Operators (. and x)
  + Example: Using the Concatenation Operator
  + Example: Using the Repetition Operator
* The Assignment Operators
  + Example: Assignment Using Array Slices
  + Example: Assigning an Array to Scalar Variables
* Order of Precedence
  + Example: Order of Precedence
* Summary
* Review Questions
* Review Exercises
____________________________________________________________________________________________________________________

The operators in a computer language tell the computer what actions to perform. Perl has more operators than most languages. You've already seen some operators in this book, such as the equals or assignment operator (=). As you read about the other operators, you'll undoubtedly realize that you are familiar with some of them. Trust your intuition; the definitions that you already know will probably still be true.

Operators are instructions you give to the computer so that it can perform some task or operation. All operators cause actions to be performed on operands. An operand can be anything that you perform an operation on. In practical terms, any particular operand will be a literal, a variable, or an expression. You've already been introduced to literals and variables. A good working definition of expression is some combination of operators and operands that are evaluated as a unit. Chapter 6, "Statements," has more information about expressions.

Operands are also recursive in nature. In Perl, the expression 3 + 5 (two operands and a plus operator) can be considered as one operand with a value of 8. For instance, (3 + 5) - 12 is an expression that consists of two operands, the second of which is subtracted from the first. The first operand is (3 + 5) and the second operand is 12.

This chapter will discuss most of the operators available to you in Perl. You'll find out about many operator types and how to determine their order of precedence. And, of course, you'll see many examples.
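Here is a small sketch of that idea. The inner expression 3 + 5 produces a value that then serves as an operand of the subtraction. The variable names are only for illustration.

```perl
use strict;
use warnings;

# 3 + 5 is an expression; its value, 8, can itself be used as an operand.
my $firstOperand = 3 + 5;

# The parenthesized expression is evaluated first, then 12 is subtracted.
my $result = (3 + 5) - 12;

print "$firstOperand\n";    # prints 8
print "$result\n";          # prints -4
```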
Precedence is very important in every computer language and Perl is no exception. The order of precedence indicates which operator should be evaluated first. I like to think about operators in the same way I would give instructions to the driver of a car. I might say "turn left" or "turn right." These commands could be considered directional operators in the same way that + and - say "add this" or "subtract this." If I yell "stop" while the car is moving, on the other hand, it should supersede the other commands. This means that "stop" has precedence over "turn left" and "turn right." The "Order of Precedence" section later in this chapter will discuss precedence in more detail. Operator Types Perl supports many types of operators. Table 4.1 shows all of the operator types in the Perl language. This chapter discusses the more commonly used types in detail. You can learn about any type not discussed in this chapter by looking in the chapter referenced in that type's description in Table 4.1. Table 4.1 The Perl Operator Types Operator Types Description Arithmetic These operators mirror those you learned in grade school. Addition, Subtraction, and Multiplication are the bread and butter of most mathematical statements. Assignment These operators are used to assign a value to a variable. Algebra uses assignment operators. For example, in the statement X = 6, the equal sign is the assignment operator. Binding These operators are used during string comparisons and are ex-plained in [62]Chapter 10, "Regular Expressions." Bitwise These operators affect the individual bits that make up a value. For example, the value 3 is also 11 in binary notation or ((1 ¥ 2) + 1). Each character in binary notation represents a bit, which is the smallest piece of a computer's memory that you can modify. Comma The comma operator has two functions. 
                    It serves to separate array or list elements (see Chapter 2, "Numeric and String Literals"), and it serves to separate expressions (see Chapter 6, "Statements").
File Test           These operators are used to test for various conditions associated with files. You can test for file existence, file type, and file access rights, among other things. See Chapter 9, "Using Files," for more information.
List                List operators are funny things in Perl. They resemble function calls in other languages. List operators are discussed in Chapter 5, "Functions."
Logical             These operators implement Boolean, or true/false, logic. In the sentence "If John has a fever AND John has clogged sinuses OR an earache AND John is NOT over 60 years old, then John has a cold," the AND, OR, and NOT are acting as logical operators. The low-precedence logical operators are discussed separately in Chapter 13, "Handling Errors and Signals."
Numeric Relational  These operators allow you to test the relationship of one numeric value to another. For example, is 5 GREATER THAN 12?
Postfix             A member of this group of operators, (), [], and {}, appears at the end of the affected objects. You've already seen them used in Chapter 3, "Variables," for arrays and associative arrays. The parentheses operators are also used for list operators, as discussed in Chapter 5, "Functions."
Range               The range operator is used to create a range of elements in arrays. It can also be used in a scalar context.
Reference           The reference operators are used to create and use references to variables. For more information, see Chapter 8, "References."
String              The string concatenation operator is used to join two strings together. The string repetition operator is used to repeat a string.
String Relational   These operators allow you to test the relationship of one string value to another. For example, is "abc" GREATER THAN "ABC"?
Ternary             The ternary operator is used to choose between two values based on a given condition.
                    For instance: if the park is within one mile, John can walk; otherwise, he must drive.

The Binary Arithmetic Operators

There are six binary arithmetic operators: addition, subtraction, multiplication, exponentiation, division, and modulus. While you may be unfamiliar with the modulus operator, the rest act exactly as you would expect them to. Table 4.2 lists these arithmetic operators that act on two operands, the binary arithmetic operators. For example, the addition (+) operator can be used to add two numbers together like this: 4 + 5. The other binary operators act in a similar fashion.

Table 4.2 The Binary Arithmetic Operators

Operator      Description
op1 + op2     Addition
op1 - op2     Subtraction
op1 * op2     Multiplication
op1 ** op2    Exponentiation
op1 / op2     Division
op1 % op2     Modulus

Example: The Exponentiation Operator

The exponentiation operator is used to raise a number to a power. For instance, 2 ** 4 is equivalent to 2 * 2 * 2 * 2, which equals 16. You'll occasionally see exponentiation when the discussion turns to how efficient a given algorithm is, but I've never needed it for my everyday programming tasks. In any case, here's a quick look at how it works.

This example shows how to raise the number 4 to the 3rd power, which is equivalent to 4 * 4 * 4, or 64.

Assign $firstVar the value of 4. Raise 4 to the 3rd power using the exponentiation operator and assign the new value to $secondVar. Print $secondVar.

    $firstVar = 4;
    $secondVar = $firstVar ** 3;
    print("$secondVar\n");

This program produces the following output:

    64

Example: The Modulus Operator

The modulus operator is used to find the remainder of the division between two integer operands. For instance, 10 % 7 equals 3 because 7 goes into 10 once with 3 left over.

I've found the modulus operator to be useful when my programs need to run down a list and do something every few items. This example shows you how to do something every 10 items.
Start a loop that begins with $index equal to zero. If the value of $index % 10 is equal to zero, then the print statement will be executed. Print the value of $index followed by a space. The program will increase the value of $index by one and then loop back to the start of the if statement.

____________________________________________________________________________________________________________________
Listing 4.1 04LST01.PL: How to Display a Message Every Ten Items

    for ($index = 0; $index <= 100; $index++) {
        if ($index % 10 == 0) {
            print("$index ");
        }
    }
____________________________________________________________________________________________________________________

When this program is run, the output should look like the following:

    0 10 20 30 40 50 60 70 80 90 100

Notice that every tenth item is printed. By changing the value on the right side of the modulus operator, you can affect how many items are processed before the message is printed. Changing the value to 15 means that a message will be printed every 15 items. Chapter 7, "Control Statements," describes the if and for statements in detail.

The Unary Arithmetic Operators

The unary arithmetic operators act on a single operand. They are used to change the sign of a value, to increment a value, or to decrement a value. Incrementing a value means adding one to it; decrementing a value means subtracting one from it. Table 4.3 lists Perl's unary operators.

Table 4.3 The Unary Arithmetic Operators

Operator    Description
Changing the sign of op1
+op1        Positive operand
-op1        Negative operand
Changing the value of op1 before usage
++op1       Pre-increment operand by one
--op1       Pre-decrement operand by one
Changing the value of op1 after usage
op1++       Post-increment operand by one
op1--       Post-decrement operand by one

Arithmetic operators start to get complicated when unary operators are introduced.
Just between you and me, I didn't get the hang of negative numbers until someone said, "If you have five pieces of chocolate, and add negative two pieces..." You might think that adding negative numbers is strange. Not so. You will probably never write a mathematical statement such as 345 + -23. However, you might use 345 + $gasBill, where $gasBill represents a 23-dollar debit; in other words, a negative number.

Using the unary plus operator does nothing, and Perl ignores it. The unary negative operator, however, changes the meaning of a value from positive to negative or vice versa. For instance, if you had a variable called $firstVar equal to 34, then printing -$firstVar would display -34.

The ++ and -- operators are examples of Perl's shorthand notation. If the ++ or -- operator appears in front of the operand, the operand is incremented or decremented before its value is used. If the ++ or -- operator appears after the operand, then the value of the operand is used first, and then the operand is incremented or decremented as required.

Example: The Pre-increment Operator

This example shows how to use the pre-increment operator (++).

The $numPages variable is assigned a value of 5. The $numPages variable is incremented by 1. The $numPages variable is printed. The $numPages variable is assigned a value of 5 again. The $numPages variable is incremented using the pre-increment operator and then printed.

____________________________________________________________________________________________________________________
Listing 4.2 04LST02.PL: Using the Pre-increment Operator

    # Original Way
    $numPages = 5;
    $numPages = $numPages + 1;
    print($numPages, "\n");

    # New Way
    $numPages = 5;
    print(++$numPages, "\n");
____________________________________________________________________________________________________________________

This program produces the following output:

    6
    6

You can see that the new way of coding is shorter than the original way.
The statement print(++$numPages, "\n"); first increments the $numPages variable and then allows the print command to use it.

Example: The Pre-decrement Operator

This example shows how to use the pre-decrement operator (--).

The $numPages variable is assigned a value of 5. The $numPages variable is decremented by 1. The $totalPages variable is assigned the value of $numPages + 5. The $numPages and $totalPages variables are printed. The $numPages variable is assigned a value of 5 again. The $numPages variable is decremented, and then $numPages + 5 is assigned to $totalPages. The $numPages and $totalPages variables are printed.

____________________________________________________________________________________________________________________
Listing 4.3 04LST03.PL: Using the Pre-decrement Operator

    # Original Way
    $numPages = 5;
    $numPages = $numPages - 1;
    $totalPages = $numPages + 5;
    print("$numPages $totalPages \n");

    # New Way
    $numPages = 5;
    $totalPages = --$numPages + 5;
    print("$numPages $totalPages \n");
____________________________________________________________________________________________________________________

This program produces the following output:

    4 9
    4 9

The statement $totalPages = --$numPages + 5; first decrements the $numPages variable and then allows the plus operator to use it.

Example: The Post-increment Operator

This example shows how to use the post-increment operator (++).

The $numPages variable is assigned a value of 5. The $totalPages variable is assigned the value of $numPages. The $numPages variable is incremented by one. The $numPages and $totalPages variables are printed. The $numPages variable is assigned a value of 5 again. The $totalPages variable is assigned the value of $numPages, and then the $numPages variable is incremented. The $numPages and $totalPages variables are printed.
____________________________________________________________________________________________________________________
Listing 4.4 04LST04.PL: Using the Post-increment Operator

    # Original Way
    $numPages = 5;
    $totalPages = $numPages;
    $numPages = $numPages + 1;
    print("$numPages $totalPages \n");

    # New Way
    $numPages = 5;
    $totalPages = $numPages++;
    print("$numPages $totalPages \n");
____________________________________________________________________________________________________________________

The program produces the following output:

    6 5
    6 5

The statement $totalPages = $numPages++; first assigns the value of $numPages to $totalPages and then increments the $numPages variable. It may help to know that post-increment and post-decrement operators do not affect the value of the variable on the left side of the assignment operator. If you see post-increment or post-decrement operators, evaluate the statement by ignoring them. Then, when done, apply the post-increment and post-decrement operators as needed.

Tip
The Perl programming language has many ways of achieving the same objective. You will become a more efficient programmer if you decide on one approach to incrementing and decrementing and use it consistently.

The Logical Operators

Logical operators are mainly used to control program flow. Usually, you will find them as part of an if, a while, or some other control statement. Control statements are discussed in Chapter 7, "Control Statements." Table 4.4 lists the logical operators.

Table 4.4 The Logical Operators

Operator      Description
op1 && op2    Performs a logical AND of the two operands.
op1 || op2    Performs a logical OR of the two operands.
!op1          Performs a logical NOT of the operand.

The concept of logical operators is simple. They allow a program to make a decision based on multiple conditions. Each operand is considered a condition that can be evaluated to a true or false value. Then the values of the conditions are used to determine the overall value of the op1 operator op2 or !op1 grouping.
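The truth tables for these operators can be generated by Perl itself rather than written out by hand. This is a minimal sketch with illustrative variable names. One detail it has to account for: && and || return the value of the last operand they evaluated rather than a strict 1 or 0, and ! returns 1 or an empty string, so each result is normalized to 1 or 0 with the ternary operator before it is printed.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# A minimal sketch that builds the AND/OR truth tables from the operators.
# && and || return one of their operands, so each result is normalized
# to 1 or 0 with the ternary operator before it is printed.
my @rows;
for my $op1 (0, 1) {
    for my $op2 (0, 1) {
        my $and = ($op1 && $op2) ? 1 : 0;
        my $or  = ($op1 || $op2) ? 1 : 0;
        push(@rows, [$op1, $op2, $and, $or]);
        print("$op1 $op2  AND: $and  OR: $or\n");
    }
}

# ! returns 1 or an empty string; normalize it the same way.
for my $op1 (0, 1) {
    my $not = (!$op1) ? 1 : 0;
    print("NOT $op1: $not\n");
}
```

The rows come out in the order 0 0, 0 1, 1 0, 1 1, which differs from the row order used in the result tables that follow but covers the same four combinations.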
The following examples demonstrate different ways that logical conditions can be used.

Example: The "AND" Operator (&&)

The && operator is used to determine whether both operands or conditions are true. Table 4.5 shows the results of using the && operator on the four sets of true/false values.

Table 4.5 The && Result Table

Op1   Op2   Op1 && Op2
0     0     0
1     0     0
0     1     0
1     1     1

If the value of $firstVar is 10 AND the value of $secondVar is 9, then print the error message.

    if ($firstVar == 10 && $secondVar == 9) {
        print("Error!");
    }

If either of the two conditions is false, then the print command is bypassed.

Example: The "OR" Operator (||)

The || operator is used to determine whether either of the conditions is true. Table 4.6 shows the results of using the || operator on the four sets of true/false values.

Table 4.6 The || Result Table

Op1   Op2   Op1 || Op2
0     0     0
1     0     1
0     1     1
1     1     1

If the value of $firstVar is 9 OR the value of $firstVar is 10, then print the error message.

    if ($firstVar == 9 || $firstVar == 10) {
        print("Error!");
    }

If either of the two conditions is true, then the print command is run.

Caution
If the first operand of the || operator evaluates to true, the second operand will not be evaluated. This could be a source of bugs if you are not careful. For instance, in the following code fragment:

    if ($firstVar++ || $secondVar++) {
        print("\n");
    }

the variable $secondVar will not be incremented if $firstVar++ evaluates to true.

Note
You might be tempted to try the following:

    if ($firstVar == (9 || 10)) {
        print("Error!");
    }

to determine if $firstVar is equal to either 9 or 10. Don't do it. Perl doesn't work this way. First, the expression (9 || 10) will be evaluated to be equal to 9. And then, Perl will evaluate $firstVar == 9. The correct method for testing $firstVar is to explicitly state each sub-condition that needs to be met in order for the entire condition to return true.
The correct way is:

    if ($firstVar == 9 || $firstVar == 10) {
        print("Error!");
    }

Example: The "NOT" Operator (!)

The ! operator is used to convert true values to false and false values to true. In other words, it inverts a value. Perl considers 0, the string "0", the empty string, and undefined values to be false; every other value, including most strings, is true. Table 4.7 shows the results of using the ! operator on true and false values.

Table 4.7 The ! Result Table

Op1   !Op1
0     1
1     0

Assign a value of 10 to $firstVar. Negate $firstVar (!10 is false, which compares equal to 0) and assign the new value to $secondVar. If the $secondVar variable is equal to zero, then print the string "zero."

    $firstVar = 10;
    $secondVar = !$firstVar;
    if ($secondVar == 0) {
        print("zero\n");
    }

The program produces the following output:

    zero

You could replace the 10 in the first line with "ten", 'ten', or any other true value.

The Bitwise Operators

The bitwise operators, listed in Table 4.8, are similar to the logical operators, except that they work on a smaller scale: they operate on each pair of corresponding bits in their operands.

Table 4.8 The Bitwise Operators

Operator      Description
op1 & op2     The AND operator compares two bits and generates a result of 1 if both bits are 1; otherwise, it returns 0.
op1 | op2     The OR operator compares two bits and generates a result of 1 if either or both bits are 1; otherwise, it returns 0.
op1 ^ op2     The EXCLUSIVE-OR operator compares two bits and generates a result of 1 if the bits are complementary (one is 1 and the other is 0); otherwise, it returns 0.
~op1          The COMPLEMENT operator is used to invert all of the bits of the operand. I've never found this useful, so we'll skip looking at an example of it.
op1 >> op2    The SHIFT RIGHT operator moves the bits of op1 to the right op2 times. Each move discards the far right bit and assigns the leftmost bit a value of 0, which effectively divides op1 in half.
op1 << op2    The SHIFT LEFT operator moves the bits of op1 to the left op2 times. Each move discards the far left bit and assigns the rightmost bit a value of 0, which effectively multiplies op1 by 2.
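To see Table 4.8 in action, the following sketch applies each bitwise operator to two small integers and prints each result in decimal and as an 8-bit pattern. It assumes a reasonably modern Perl (5.6 or later), because it relies on the %b binary format of printf. It also shows one practical use for the COMPLEMENT operator: masking its result with & 0xFF keeps only the low 8 bits, since Perl integers are wider than one byte. The operand values 12 and 10 and the hash name %result are arbitrary choices for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Sketch: apply the bitwise operators to two small integers and show
# each result as an 8-bit pattern.
my ($op1, $op2) = (12, 10);           # 00001100 and 00001010

my %result = (
    'op1 & op2'  => $op1 & $op2,      # 1 only where both bits are 1
    'op1 | op2'  => $op1 | $op2,      # 1 where either bit is 1
    'op1 ^ op2'  => $op1 ^ $op2,      # 1 where the bits differ
    'op1 >> 2'   => $op1 >> 2,        # shift right twice: divide by 4
    'op1 << 1'   => $op1 << 1,        # shift left once: multiply by 2
    '~op1 & 255' => ~$op1 & 0xFF,     # complement, masked to the low 8 bits
);

for my $expr (sort keys %result) {
    printf("%-10s = %3d  (%08b)\n", $expr, $result{$expr}, $result{$expr});
}
```

With 12 (00001100) and 10 (00001010), AND keeps only the shared bit (8), OR collects every set bit (14), and EXCLUSIVE-OR keeps the bits that differ (6).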
Note
Both operands associated with the bitwise operators must be integers.

Bitwise operators are used to change individual bits in an operand. A single byte of computer memory, when viewed as 8 bits, can signify the true/false status of 8 flags because each bit can be used as a boolean variable that can hold one of two values: true or false. A flag variable is typically used to indicate the status of something. For instance, computer files can be marked as read-only. So you might have a $fReadOnly variable whose job would be to hold the read-only status of a file. This variable is called a flag variable because when $fReadOnly has a true value, it's equivalent to a football referee throwing a flag. The variable says, "Whoa! Don't modify this file."

When you have more than one flag variable, it might be more efficient to use a single variable to indicate the value of more than one flag. The next example shows you how to do this.

Example: Using the &, |, and ^ Operators

The first step to using bitwise operators to indicate more than one flag in a single variable is to define the meaning of the bits that you'd like to use. Figure 4.1 shows an example of 8 bits that could be used to control the attributes of text on a display.

Figure 4.1: The bit definition of a text attribute control variable.

If you assume that $textAttr is used to control the text attributes, then you could set the italic attribute by setting $textAttr equal to 128 like this:

    $textAttr = 128;

because the bit pattern of 128 is 10000000. The bit that is turned on corresponds to the italic position in $textAttr.

Now let's set both the italic and underline attributes on at the same time. The underline value is 16, which has a bit pattern of 00010000. You already know the value for italic is 128. So we call on the OR operator to combine the two values.
    $textAttr = 128 | 16;

or using the bit patterns (this is just an illustration; you can't write the values this way in Perl):

    $textAttr = 10000000 | 00010000;

If you look back at Table 4.8 and evaluate each bit, you will see that $textAttr gets assigned a value of 144 (or 10010000 as a bit pattern). This will set both the italic and underline attributes on.

The next step might be to turn the italic attribute off. Because the italic bit is currently on, this can be done with the EXCLUSIVE-OR operator, which flips the bit, like so:

    $textAttr = $textAttr ^ 128;

Example: Using the >> and << Operators

The bitwise shift operators are used to move all of the bits in the operand left or right a given number of times. They come in quite handy when you need to divide or multiply integer values by powers of 2.

This example will divide by 4 using the >> operator.

Assign a value of 128 to the $firstVar variable. Shift the bits inside $firstVar two places to the right and assign the new value to $secondVar. Print the $secondVar variable.

    $firstVar = 128;
    $secondVar = $firstVar >> 2;
    print("$secondVar\n");

The program produces the following output:

    32

Let's look at the bit patterns of the variables before and after the shift operation. First, $firstVar is assigned 128, or 10000000. Then, the value in $firstVar is shifted right by two places. So the new value is 00100000, or 32, which is assigned to $secondVar.

The rightmost bit of a value is lost when the bits are shifted right. You can see this in the next example.

This example will divide by 8 using the >> operator.

Assign a value of 129 (a bit pattern of 10000001) to $firstVar. Every odd value has the rightmost bit set. Shift the bits inside $firstVar three places to the right and assign the new value to $secondVar. Print the $secondVar variable.

    $firstVar = 129;
    $secondVar = $firstVar >> 3;
    print("$secondVar\n");

The program produces the following output:

    16

Since the bit pattern of 16 is 00010000, you can tell that the rightmost bit has disappeared.

Here's a quick example using the << operator.
We'll multiply 128 by 8.

Assign a value of 128 to the $firstVar variable. Shift the bits inside $firstVar three places to the left and assign the new value to $secondVar. Print the $secondVar variable.

    $firstVar = 128;
    $secondVar = $firstVar << 3;
    print("$secondVar\n");

The program produces the following output:

    1024

The value of 1024 is beyond the bounds of the 8 bits that the other examples used. This was done to show you that the number of bits available for your use is not limited to one byte. You are really limited by however many bytes Perl uses for an integer value: typically 4 or 8, depending on how your copy of Perl was built. Running perl -V:ivsize displays this size in bytes.

The Numeric Relational Operators

The numeric relational operators, listed in Table 4.9, are used to test the relationship between two operands. You can see if one operand is equal to another, if one operand is greater than another, or if one operand is less than another.

Note
It is important to realize that the equality operator is a pair of equal signs and not just one. Quite a few bugs are introduced into programs because people forget this rule and use a single equal sign when testing conditions.

Table 4.9 The Numeric Relational Operators

Operator       Description
The Equality Operators
op1 == op2     This operator returns true if op1 is equal to op2. For example, 6 == 6 is true.
op1 != op2     This operator returns true if op1 is not equal to op2. For example, 6 != 7 is true.
The Comparison Operators
op1 < op2      This operator returns true if op1 is less than op2. For example, 6 < 7 is true.
op1 <= op2     This operator returns true if op1 is less than or equal to op2. For example, 7 <= 7 is true.
op1 > op2      This operator returns true if op1 is greater than op2. For example, 6 > 5 is true.
op1 >= op2     This operator returns true if op1 is greater than or equal to op2. For example, 7 >= 7 is true.
op1 <=> op2    This operator returns 1 if op1 is greater than op2, 0 if op1 equals op2, and -1 if op1 is less than op2.

You will see many examples of these operators when you read about controlling program flow in Chapter 7, "Control Statements." Therefore, I'll show only an example of the <=> comparison operator here.

Example: Using the <=> Operator

The number comparison operator is used to quickly tell the relationship between one operand and another. It is frequently used during sorting activities.

Tip
You may sometimes see the <=> operator called the spaceship operator because of the way that it looks.

Set up three variables. Print the relationship of each variable to the variable $midVar.

    $lowVar = 8;
    $midVar = 10;
    $hiVar = 12;
    print($lowVar <=> $midVar, "\n");
    print($midVar <=> $midVar, "\n");
    print($hiVar <=> $midVar, "\n");

The program produces the following output:

    -1
    0
    1

The -1 indicates that $lowVar (8) is less than $midVar (10). The 0 indicates that $midVar is equal to itself. And the 1 indicates that $hiVar (12) is greater than $midVar (10).

The String Relational Operators

The string relational operators, listed in Table 4.10, are used to test the relationship between two operands. You can see if one operand is equal to another, if one operand is greater than another, or if one operand is less than another.

Table 4.10 The String Relational Operators

Operator       Description
The Equality Operators
op1 eq op2     This operator returns true if op1 is equal to op2. For example, "b" eq "b" is true.
op1 ne op2     This operator returns true if op1 is not equal to op2. For example, "b" ne "c" is true.
The Comparison Operators
op1 lt op2     This operator returns true if op1 is less than op2. For example, "b" lt "c" is true.
op1 le op2     This operator returns true if op1 is less than or equal to op2. For example, "b" le "b" is true.
op1 gt op2     This operator returns true if op1 is greater than op2. For example, "b" gt "a" is true.
op1 ge op2     This operator returns true if op1 is greater than or equal to op2. For example, "b" ge "b" is true.
op1 cmp op2    This operator returns 1 if op1 is greater than op2, 0 if op1 equals op2, and -1 if op1 is less than op2.

String values are compared using the ASCII values of each character in the strings. You will see examples of these operators when you read about controlling program flow in Chapter 7, "Control Statements." So, we'll only show an example of the cmp comparison operator here. You may want to glance at Appendix E, "ASCII Table," to see all of the possible ASCII values.

Example: Using the cmp Operator

The string comparison operator acts exactly like the <=> operator except that it is designed to work with string operands. This example will compare the values of three different strings.

Set up three variables. Print the relationship of each variable to the variable $midVar.

    $lowVar = "AAA";
    $midVar = "BBB";
    $hiVar = "ccC";
    print($lowVar cmp $midVar, "\n");
    print($midVar cmp $midVar, "\n");
    print($hiVar cmp $midVar, "\n");

The program produces the following output:

    -1
    0
    1

Notice that even though strings are being compared, a numeric value is returned.

You may be wondering what happens if the strings have spaces in them. Let's explore that for a moment.

    $firstVar = "AA";
    $secondVar = " A";
    print($firstVar cmp $secondVar, "\n");

The program produces the following output:

    1

which means that "AA" is greater than " A" according to the criteria used by the cmp operator: the space character has a lower ASCII value (32) than the letter A (65).

The Ternary Operator

The ternary operator is actually a pair of operators, ? and :, used together with three operands. The operator is used like this:

    CONDITION-PART ? TRUE-PART : FALSE-PART

which is shorthand for the following statement:

    if (CONDITION-PART) {
        TRUE-PART
    } else {
        FALSE-PART
    }

You can find more information about if statements in Chapter 7, "Control Statements." The value of the entire operation depends on the evaluation of the CONDITION-PART section of the statement.
If the CONDITION-PART evaluates to true, then the TRUE-PART is the value of the entire operation. If the CONDITION-PART evaluates to false, then the FALSE-PART is the value of the entire operation.

Tip
The ternary operator is also referred to as the conditional operator by some references.

Example: Using the Ternary Operator to Assign Values

I frequently use the ternary operator to assign a value to a variable when it can take one of two values. This use of the operator is fairly straightforward.

If $firstVar is zero, then assign $secondVar a value of zero. Otherwise, assign $secondVar the value in the first element in the array @array.

    $secondVar = ($firstVar == 0) ? 0 : $array[0];

The ternary operator can also be used to control which code sections are performed. However, I recommend against this use because it makes the program harder to read. I believe that operators should affect variables, not program flow.

The CONDITION-PART evaluates to true, so the $firstVar variable is incremented.

    1 ? $firstVar++ : $secondVar++;

The CONDITION-PART evaluates to false, so the $secondVar variable is incremented.

    0 ? $firstVar++ : $secondVar++;

In this example, you get a chance to see how the language can be abused. When you have more than two actions to consider, you can nest ternary operators inside each other. However, as you can see, the result is confusing code.

Assign one of four values to $firstVar depending on the value of $temp.

    $firstVar = $temp == 0 ? $numFiles++ : ($temp == 1 ? $numRecords++ : ($temp == 3 ? $numBytes++ : $numErrors++));

Tip
Abusing the language in this manner will make your programs difficult to understand and maintain. You can use the if statement for better looking and more maintainable code. See Chapter 7, "Control Statements," for more information.

If you'd like to see a really strange use of the ternary operator, take a look at this next example.
It uses the ternary operator to determine which variable gets assigned a value.

    $firstVar = 1;
    $secondVar = 1;
    $thirdVar = 1;
    ($thirdVar == 0 ? $firstVar : $secondVar) = 10;
    print "$firstVar\n";
    print "$secondVar\n";
    print "$thirdVar\n";

The program produces the following output:

    1
    10
    1

The line ($thirdVar == 0 ? $firstVar : $secondVar) = 10; is equivalent to the following control statement:

    if ($thirdVar == 0) {
        $firstVar = 10;
    } else {
        $secondVar = 10;
    }

This use of the ternary operator works because Perl lets you use the results of evaluations as lvalues. An lvalue is anything that you can assign a value to. It's called an lvalue because it goes on the left side of an assignment operator.

Note
Some programmers might think that this use of the ternary operator is as bad as using it to control program flow. However, I like it because it lets you concisely determine which variable is the target of an assignment.

The Range Operator (..)

The range operator was already introduced to you in Chapter 3, "Variables," when you read about arrays. I review its use here, in an array context, in a bit more detail.

Example: Using the Range Operator

When used with arrays, the range operator simplifies the process of creating arrays with contiguous sequences of numbers and letters. We'll start with an array of the numbers one through ten.

Create an array with ten elements that include 1, 2, 3, 4, 5, 6, 7, 8, 9, and 10.

    @array = (1..10);

You can also create an array of contiguous letters.

Create an array with ten elements that include A, B, C, D, E, F, G, H, I, and J.

    @array = ("A".."J");

And, of course, you can have other things in the array definition besides the range operator.

Create an array that includes AAA, 1, 2, 3, 4, 5, A, B, C, D, and ZZZ.

    @array = ("AAA", 1..5, "A".."D", "ZZZ");

You can use the range operator to create a list with zero-padded numbers.
Create an array with ten elements that include the strings 01, 02, 03, 04, 05, 06, 07, 08, 09, and 10.

    @array = ("01".."10");

And you can use variables as operands for the range operator.

Assign a string literal to $firstVar. Create an array with ten elements that include the strings 01, 02, 03, 04, 05, 06, 07, 08, 09, and 10.

    $firstVar = "10";
    @array = ("01"..$firstVar);

If you use strings of more than one character as operands, the range operator will increment the rightmost character by one and perform the appropriate carry operation when the number 9 or letter z is reached. You'll probably need to see some examples before this makes sense. I know that I had trouble figuring it out. So here goes.

A range such as "A".."Z" is pretty simple to understand: Perl counts up the alphabet until Z is reached.

Caution
The two ranges "A".."Z" and "a".."Z" are not identical. The second range does not contain all lowercase letters and all uppercase letters. Instead, Perl creates an array that contains just the lowercase letters. After "z", the next value in the sequence would be "aa", which is longer than the right-hand operand "Z", so the incrementing stops.

What happens when a two-character string is used as an operand for the range operator? Let's find out.

Create an array that includes the strings aa, ab, ac, ad, ae, and af.

    @array = ("aa" .. "af");

This behaves as you'd expect, incrementing along the alphabet until the letter f is reached. However, if you change the first character of one of the operands, watch what happens.

Create an array that includes the strings ay, az, ba, bb, bc, bd, be, and bf.

    @array = ("ay" .. "bf");

When the second character is incremented past z, the first character is incremented to b and the second character is set to a.

Note
For numeric ranges, if the left side of the range operator is greater than the right side (for example, 10..1), an empty array is created.

The String Operators (. and x)

Perl has two different string operators: the concatenation (.) operator and the repetition (x) operator. These operators make it easy to manipulate strings in certain ways. Let's start with the concatenation operator.

Example: Using the Concatenation Operator

The concatenation operator is used to join two strings together. If you have a numeric value as one of the two operands, Perl will quietly convert it to a string. Here is an example that shows Perl converting a number into a string.

Assign a string value to $firstVar. The string will be three values concatenated into one string.

    $firstVar = "This box can hold " . 55 . " items.";
    print("$firstVar\n");

The program produces the following output:

    This box can hold 55 items.

The number 55 is automatically converted to a string and then combined with the other strings. Notice that the string literals have spaces in them so that when the final string is created, the number will be surrounded with spaces, making the sentence readable.

You can also use variables as operands with the concatenation operator.

Assign string values to $firstVar and $secondVar. Assign the concatenation of $firstVar and $secondVar to $thirdVar. Print $thirdVar.

    $firstVar = "AAA";
    $secondVar = "BBB";
    $thirdVar = $firstVar . $secondVar;
    print("$thirdVar\n");

The program produces the following output:

    AAABBB

Notice that Perl concatenates the strings together without adding any spaces or other separating characters. If you want a space between the strings after they are concatenated, you must ensure that one of the original strings has the space character, either at the end of the first string or at the start of the second.

Example: Using the Repetition Operator

The repetition operator is used to repeat any string a given number of times. Like the concatenation operator, any numbers will be quietly converted to strings so that they can be repeated. Here is an example that shows how to repeat a string 7 times.
The program assigns $firstVar the value "1", assigns $secondVar the value of $firstVar repeated seven times, and prints $secondVar.

    $firstVar = "1";
    $secondVar = $firstVar x 7;
    print("$secondVar\n");

The program produces the following output:

    1111111

The string that gets repeated can be longer than one character. This example repeats the three-character string "11 " seven times.

    $firstVar = "11 ";
    $secondVar = $firstVar x 7;
    print("$secondVar\n");

The program produces the following output:

    11 11 11 11 11 11 11

You can also use an array as the left operand of the repetition operator. However, the array is evaluated in a scalar context, so the number of elements is returned. That number is converted to a string and then repeated. (A list enclosed in parentheses is treated differently: in list context, ('A') x 3 repeats the list itself.) The next example assigns the elements "A" through "G" to @array, takes the number of elements in @array, converts that number to a string, repeats it twice, and assigns the new string to $firstVar. Finally, it prints @array and $firstVar.

    @array = ('A'..'G');
    $firstVar = @array x 2;
    print("@array\n");
    print("$firstVar\n");

This program produces the following output:

    A B C D E F G
    77

Tip: If you want to repeat an array element, explicitly say which element you want to repeat by using an array index.

The Assignment Operators

The last type of operator that we'll look at is the assignment operator. You've already used the basic assignment operator (=) to assign values to variables in some of the examples earlier in this chapter. In addition, Perl has shortcut assignment operators that combine the basic assignment operator with another operator. For instance, instead of writing $firstVar = $firstVar + $secondVar, you can write $firstVar += $secondVar. The advantage of using the shortcut operators, besides having less to type, is that your intentions regarding the assignment are made clear. Table 4.11 lists all of Perl's assignment operators.
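The shortcut operators are easiest to understand by watching a variable change one step at a time. The following is a minimal sketch, not an example from the original text: the variable names and starting values are arbitrary choices for illustration. It applies several of the operators from Table 4.11 in turn, and each comment shows the value the variable holds after that line.

```perl
use strict;
use warnings;

# Each shortcut operator is equivalent to: var = var OP op1
my $total = 10;
$total += 5;      # $total = $total + 5     -> 15
$total -= 3;      # $total = $total - 3     -> 12
$total *= 2;      # $total = $total * 2     -> 24
$total /= 4;      # $total = $total / 4     -> 6
$total %= 4;      # $total = $total % 4     -> 2
$total **= 3;     # $total = $total ** 3    -> 8
$total <<= 1;     # shift left one bit      -> 16
$total >>= 2;     # shift right two bits    -> 4

# The bitwise shortcuts work the same way.
my $flags = 5;    # binary 101
$flags |= 2;      # 101 | 010 -> 111 (7)
$flags &= 6;      # 111 & 110 -> 110 (6)
$flags ^= 3;      # 110 ^ 011 -> 101 (5)

# String shortcuts: .= appends, x= repeats.
my $label = "Item";
$label .= "-";    # "Item-"
$label .= 42;     # "Item-42" (the number is converted to a string)

my $bar = "=";
$bar x= 5;        # "====="

# ||= assigns only when the variable is currently false (here, undefined).
my $name;
$name ||= "anonymous";

print("$total $flags $label $bar $name\n");
```

Running the sketch prints 4 5 Item-42 ===== anonymous. The ||= line is worth remembering: it is a common way to supply a default value for a variable that may not have been set.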
After reading the other sections in this chapter about the various operator types, you should be familiar with all of the operations described in the table.

Table 4.11 The Assignment Operators

    Operator       Description
    var = op1;     This operator assigns the value of op1 to var.
    var += op1;    This operator assigns the value of var + op1 to var.
    var -= op1;    This operator assigns the value of var - op1 to var.
    var *= op1;    This operator assigns the value of var * op1 to var.
    var /= op1;    This operator assigns the value of var / op1 to var.
    var %= op1;    This operator assigns the value of var % op1 to var.
    var .= op1;    This operator assigns the value of var . op1 to var.
    var **= op1;   This operator assigns the value of var ** op1 to var.
    var x= op1;    This operator assigns the value of var x op1 to var.
    var <<= op1;   This operator assigns the value of var << op1 to var.
    var >>= op1;   This operator assigns the value of var >> op1 to var.
    var &= op1;    This operator assigns the value of var & op1 to var.
    var |= op1;    This operator assigns the value of var | op1 to var.
    var ||= op1;   This operator assigns the value of var || op1 to var.
    var ^= op1;    This operator assigns the value of var ^ op1 to var.

The examples in this section do not describe the different assignment operators; their use is straightforward. However, when assigning values to arrays, there are some special situations. The first is assigning values to array slices, and the second is assigning array elements to scalars. Let's start with array slices.

Example: Assignment Using Array Slices

If you recall from Chapter 3, "Variables," array slices let you directly access multiple elements of an array by using either the comma or range operators. For instance, the slice @array[10, 12] refers to the elements at indices 10 and 12 of the @array array (the eleventh and thirteenth elements, because indices start at 0). You can use the assignment operator in conjunction with array slices to assign values to multiple array elements in one statement.
If you have an array with 11 elements and you need to change the elements at indices 4 and 7, you can do so like this. The example creates an array containing the numbers 0 through 10, assigns values to elements 4 and 7, and prints the array.

    @array = (0..10);
    @array[4, 7] = ("AA", "BB");
    print("@array\n");

This program produces the following output:

    0 1 2 3 AA 5 6 BB 8 9 10

Tip: The elements to which an array slice refers do not have to be consecutive.

You can look at an array slice assignment in the following way: the list on the left is the target and the list on the right is the source, so the target elements get assigned the values in the source list.

There are a number of variations on the basic idea of using array slices in assignment statements. You can use scalar variables in place of the literals on the right side of the assignment. This example assigns the value of $firstVar to both elements 4 and 7 and prints the array.

    $firstVar = "AA";
    @array = (0..10);
    @array[4, 7] = ($firstVar, $firstVar);
    print("@array\n");

This program produces the following output:

    0 1 2 3 AA 5 6 AA 8 9 10

You can use array variables, too. The next example creates an array with 11 elements and an array with 2 elements, assigns the elements of @array2 to elements 4 and 7 of @array1, and prints @array1.

    @array1 = (0..10);
    @array2 = ("AA", "BB");
    @array1[4, 7] = @array2;
    print("@array1\n");

This program produces the following output:

    0 1 2 3 AA 5 6 BB 8 9 10

An array slice assignment is also a quick and convenient way to swap two elements of the same array. This example swaps elements 4 and 7 and prints the array.

    @array = (0..10);
    @array[4, 7] = @array[7, 4];
    print "@array\n";

This program produces the following output:

    0 1 2 3 7 5 6 4 8 9 10

Notice that the elements at indices 4 and 7 have swapped places.

You can also use the range operator when using array slice assignment. The next example creates @array1 with 11 elements and @array2 with 26 elements.
It then assigns the elements at indices 23, 24, and 25 of @array2 (X, Y, and Z) to elements 1, 2, and 3 of @array1 and prints @array1.

    @array1 = (0..10);
    @array2 = ("A".."Z");
    @array1[1..3] = @array2[23..25];
    print "@array1\n";

This program produces the following output:

    0 X Y Z 4 5 6 7 8 9 10

Figure 4.2 shows which array elements in @array2 are assigned to which array elements in @array1.

Figure 4.2: Assigning array elements using an array slice and the range operator.

If you need only certain elements from an array, you can use an array slice to create a new array in one step. This example creates @array1 with the ten elements A through J, assigns the elements at indices 2, 4, and 6 of @array1 to @array2, and prints both arrays.

    @array1 = ("A".."J");
    @array2 = @array1[2, 4, 6];
    print("@array1\n");
    print("@array2\n");

This program produces the following output:

    A B C D E F G H I J
    C E G

Example: Assigning an Array to Scalar Variables

At times, you may want to take array elements and assign them to scalar variables. This ability is especially useful inside functions, and you'll read about that usage in Chapter 5, "Functions." It's also useful when you want to make your code more readable. For example, instead of referring to a town stored in an array as $array[2], you can refer to the value as $town or whatever variable name you choose.

In this next example, we'll take an array that holds an address and separate its elements into four scalar variables. The program creates an array with Tom Jones' address, assigns each element of the array to a separate scalar variable, and prints the scalar variables.

    @array = ("Tom Jones", "123 Harley Lane", "Birmingham", "AR");
    ($name, $street, $town, $state) = @array;
    print("$name, $street, $town, $state\n");

This program prints:

    Tom Jones, 123 Harley Lane, Birmingham, AR

The first element of @array is assigned to the first scalar on the left side of the assignment operator.
Because the scalars are surrounded by parentheses, Perl sees them as another list. If you couldn't do this type of multiple-element-to-multiple-scalar assignment, you would have to do this:

    @array = ("Tom Jones", "123 Harley Lane", "Birmingham", "AR");
    $name   = $array[0];
    $street = $array[1];
    $town   = $array[2];
    $state  = $array[3];
    print("$name, $street, $town, $state\n");

I think that the first example is easier to understand, don't you?

If the array has more elements than there are scalars, the extra elements are ignored. Conversely, if there are not enough elements, some of the scalar variables are left with an undefined value.

Tip: You can also use the array slice and range operators with this type of assignment.

Order of Precedence

We briefly touched on the order of precedence concept at the beginning of the chapter. Now that you are familiar with most of Perl's operators, we can explore the subject in more detail. Table 4.12 is an exhaustive list of operators and how they rank in terms of precedence: the higher the level, the higher the precedence. Operators at higher levels are evaluated before operators at lower levels. When operators share the same level, Perl uses their associativity to decide how they group. For instance, 2 ** 3 ** 2 is evaluated as 2 ** (3 ** 2), which is 512, because exponentiation associates right to left, while 10 - 4 - 3 is evaluated as (10 - 4) - 3 because subtraction associates left to right. Unary operators such as unary minus associate right to left because they apply to the operand immediately to their right.

Table 4.12 The Order of Precedence and Associativity for Perl Operators

    Level  Operator                        Description                                                   Associativity
    22     (), [], {}                      Function calls, parentheses, array subscripts                 Left to right
    21     ->                              Infix dereference operator                                    Left to right
    20     ++, --                          Auto-increment, auto-decrement                                None
    19     **                              Exponentiation                                                Right to left
    18     !, ~, +, -, \                   Logical not, bitwise not, unary plus, unary minus, reference  Right to left
    17     =~, !~                          Match, not match                                              Left to right
    16     *, /, %, x                      Multiply, divide, modulus, repetition                         Left to right
    15     +, -, .                         Add, subtract, string concatenation                           Left to right
    14     <<, >>                          Bitwise left shift, bitwise right shift                       Left to right
    13     -e, -r, -w, etc.                File test operators                                           None
    12     <, >, <=, >=, lt, gt, le, ge    Relational operators                                          None
    11     ==, !=, <=>, eq, ne, cmp        Equality operators                                            None
    10     &                               Bitwise and                                                   Left to right
    9      |, ^                            Bitwise or, bitwise xor                                       Left to right
    8      &&                              Logical and                                                   Left to right
    7      ||                              Logical or                                                    Left to right
    6      ..                              Range operator                                                None
    5      ?:                              Ternary (conditional) operator                                Right to left
    4      =, +=, -=, etc.                 Assignment operators                                          Right to left
    3      ,                               Comma operator                                                Left to right
    2      not                             Low-precedence logical operator                               Left to right
    1      and                             Low-precedence logical operator                               Left to right
    0      or, xor                         Low-precedence logical operators                              Left to right

Operators that are not discussed in this chapter are discussed elsewhere in this book. Table 4.1, at the beginning of the chapter, points out where you can get more information on those operators. In addition, you can read about the low-precedence logical operators in Chapter 13, "Handling Errors and Signals."

Example: Order of Precedence

While it is not possible to show examples of all the ramifications of operator precedence, we can look at one or two so that you can get a feel for the concept. First, here is an example using the ternary operator and various arithmetic operators. It assigns values to $firstVar and $secondVar, assigns either 1 or 0 to $thirdVar based on the evaluation of the condition 34 + $firstVar-- + $secondVar ? 1 : 0, and prints $thirdVar.

    $firstVar = 4;
    $secondVar = 1;
    $thirdVar = 34 + $firstVar-- + $secondVar ? 1 : 0;
    print("$thirdVar\n");

The program produces the following output:

    1

The ternary operator has a precedence level of 5; every other operator in the statement except the assignment has a higher precedence level and is evaluated first. So Perl computes 34 + 4 + 1, which is 39, and because 39 is true, the ternary operator yields 1.

The next example differs only in that parentheses isolate the ternary operator. It assigns the value of 34 + $firstVar-- + ($secondVar ? 1 : 0) to $thirdVar and prints $thirdVar.

    $firstVar = 4;
    $secondVar = 1;
    $thirdVar = 34 + $firstVar-- + ($secondVar ? 1 : 0);
    print "$thirdVar\n";

The program produces the following output:

    39

This program results in a value of 39 for $thirdVar because parentheses have a precedence level of 22. They isolate a region of the statement and tell Perl to evaluate what is inside them before evaluating the rest of the statement.

Caution: Remember that these examples are contrived to show a point. I don't program in this manner. I recommend using parentheses to tell Perl exactly how you want your code to be evaluated. So, I would normally write the following:

    $thirdVar = 34 + $firstVar + ($secondVar ? 1 : 0);
    $firstVar--;

The decrementing of $firstVar has been pulled out of the first line because the post-decrement has no effect on the value computed in that line (the old value of $firstVar is used anyway), and it makes the line harder to understand.

Here is an example of operator precedence using the exponentiation operator. It also shows you how to determine operator precedence on your own. The program assigns an expression to $firstVar, assigns an expression to $secondVar using parentheses to indicate one precedence order, assigns an expression to $thirdVar using parentheses to indicate a different order, and prints the variables.

    $firstVar = -2 ** 4;
    $secondVar = -(2 ** 4);
    $thirdVar = (-2) ** 4;
    print "$firstVar\n";
    print "$secondVar\n";
    print "$thirdVar\n";

The program produces the following output:

    -16
    -16
    16

From this example, you can see that exponentiation has a higher precedence level than unary minus, because the first and second variables are equal.

Tip: If you always use parentheses to indicate how you want the operators to be evaluated, you'll never need to worry about operator precedence in your code.

Summary

This chapter was pretty long, and you've seen quite a few examples of how operators can be used. Let's review. You learned that operators are used to tell Perl what actions to perform.
Some operators take precedence over others, so they and their operands are evaluated first. An operand can be as simple as the number 10 or very complex, involving variables, literals, and other operators. In other words, operands can themselves contain operators and operands; they are recursive in nature.

Perl has many different types of operators: arithmetic, assignment, binding, bitwise, comma, file test, list, logical, postfix, range, reference, relational (both numeric and string), string, and ternary. Most of these operator types were discussed in this chapter, and the rest are scattered throughout the rest of the book. Table 4.1 lists the chapters where more information can be found on the operators not covered in this chapter.

The bulk of the chapter talked about the various types of operators, starting with binary arithmetic operators and then unary arithmetic operators. You were introduced to the pre- and post-increment and pre- and post-decrement operators. Next came the logical operators and the bitwise operators. The bitwise shift operators are sometimes used when fast integer multiplication and division by powers of two are needed. Then came the numeric and string relational operators, followed by the ternary operator. The ternary operator was used to show you what an lvalue is. An lvalue is the value on the left side of an assignment operator. It must evaluate to some variable that Perl can use to hold a value.

The range operator was used to create sequential elements of an array, the concatenation operator was used to join two strings together, and the string repetition operator was used to repeat a string a given number of times.

Then you looked at the list of assignment operators. Most are shortcuts that reduce typing and clarify the meaning of the assignment.

Finally, you saw a detailed list of Perl's operators and their order of precedence. Several examples were given to illustrate how precedence works. My recommendation is to use parentheses to explicitly tell Perl how, and in which order, to evaluate operators.
The next chapter, "Functions," will look at how functions and list operators are the same thing. You will be introduced to subroutines and parameters.

Review Questions

Answers to the Review Questions are in Appendix A.

1. What are three arithmetic operators?
2. What does the x operator do?
3. What does it mean to pre-decrement a variable?
4. What is the value of 1 ^ 1?
5. What is the value of 1 << 3?
6. What is the ternary operator used for?
7. Can the x operator be used with arrays?
8. What is the precedence level of the range operator?
9. What is the value of 2 * 5 + 10?
10. What is the value of 65 >> 1?
11. What is the spaceship operator used for?
12. If an array were defined with ("fy".."gb"), what would its elements be?

Review Exercises

1. Assign a value to $firstVar, using both division and subtraction.
2. Using the post-decrement operator, subtract one from $firstVar.
3. Write a program that assigns values to $firstVar and $secondVar and uses the >= operator to test their relationship to each other. Print the resulting value.
4. Use the **= assignment operator to assign a value to $firstVar.
5. Use the ternary operator to decide between two different values.
6. Write a program that assigns values to $firstVar and $secondVar and uses the <=> operator to test their relationship to each other. Print the resulting value.
7. Use the concatenation operator to join the following values together: "A" x 4 and "B" x 3.
8. Use the exponentiation operator to find the value of 2 to the 5th power.
9. Write an assignment statement that uses the &&, ||, and ! operators.
10. Write a program that prints the value of the fifth bit from the right in a scalar variable.
11. Write a program that uses a bitwise assignment to set the fifth bit from the right in a scalar variable.
12. Write a program that shows the difference in operator precedence between the % operator and the && operator.
Chapter 5 Functions

CONTENTS
  * Example: Using the Parameter Array (@_)
  * Example: Passing Parameters by Reference
  * Example: Scope of Variables
  * Example: Using a List as a Function Parameter
  * Example: Nesting Function Calls
  * Example: Using a Private Function
String Functions
  * Example: Changing a String's Value
  * Example: Searching a String
Array Functions
  * Example: Printing an Associative Array
  * Example: Checking the Existence of an Element
Summary
Review Questions
Review Exercises

This chapter takes a look at functions. Functions are blocks of code that are given names so that you can use them as needed. Functions help you to organize your code into pieces that are easy to understand and work with. They let you build your program step by step, testing the code along the way.

After you get the idea for a program, you need to develop a program outline, either in your head or on paper. Each step in the outline might be one function in your program. This is called modular programming. Modular programming is very good at allowing you to hide the details so that readers of your source code can understand the overall aim of your program.
For instance, if your program has a function that calculates the area of a circle, the following line of code might be used to call it:

    $areaOfFirstCircle = areaOfCircle($firstRadius);

By looking at the function call, the reader knows what the program is doing. A detailed understanding of the function itself is not needed.

Tip: Well-thought-out function and variable names help people to understand your program. If the line of code were

    $areaFC = areaCirc($fRad);

its meaning would not be as clear.

Note: Calling a function means that Perl stops executing the current series of program lines. Program flow jumps into the program code inside the function. When the function is finished, Perl jumps back to the point at which the function call was made, and program execution continues from that point onward.

Let's look at the function call a little more closely. The first thing on the line is a scalar variable and an assignment operator. You already know this means Perl assigns the value on the right of the assignment operator to $areaOfFirstCircle. But what exactly is on the right?

The first thing you see is the function name areaOfCircle(). The parentheses directly to the right, together with the absence of a $, @, or % at the beginning of the name, indicate that this is a function call. Inside the parentheses is a list of parameters, or values, that get passed to the function. You can think of a parameter just like a football. When it is passed, the receiver (that is, the function) has several options: run (modify it in some way), pass (call other routines), or fumble (call the error handler).

Note: Perl enables you to use the & character to start function names, and in a few cases it is needed. The few situations in which the & character is needed are beyond the scope of this book.

Listing 5.1 shows a short program that calls and defines the areaOfCircle() function. The program assigns $areaOfFirstCircle the value that is returned by the function areaOfCircle(), prints $areaOfFirstCircle, and then defines the areaOfCircle() function.
The function gets the first parameter from the @_ parameter array, calculates the area, and returns the new value.

Listing 5.1 05LST01.PL: Calculating the Area of a Circle

    $areaOfFirstCircle = areaOfCircle(5);
    print("$areaOfFirstCircle\n");

    sub areaOfCircle {
        $radius = $_[0];
        return(3.1415 * ($radius ** 2));
    }

This program prints:

    78.5375

The fact that something prints tells you that the program flow returned to the print line after calling the areaOfCircle() function.

A function definition is very simple. It consists of:

    sub functionName {
    }

That's it. Perl function definitions never get any more complex. The complicated part comes when dealing with parameters. Parameters are values passed to the function (remember the football?). The parameters are specified inside the parentheses that immediately follow the function name. In Listing 5.1, the function call was areaOfCircle(5). There was only one parameter, the number 5. Even though there is only one parameter, Perl creates a parameter array for the function to use.

Inside the areaOfCircle() function, the parameter array is named @_. All parameters specified during the function call are stored in the @_ array so that the function can retrieve them. Our small function did this with the line:

    $radius = $_[0];

This line of code assigns the first element of the @_ array to the $radius scalar.

Note: Because parameters are always passed as lists, Perl functions are also referred to as list operators. And, if only one parameter is used, they are sometimes referred to as unary operators. However, I'll continue to call them functions and leave the finer points of distinction to others.
The next line of the function:

    return(3.1415 * ($radius ** 2));

calculates the circle's area and returns the newly calculated value. In this case, the returned value is assigned to the $areaOfFirstCircle scalar variable.

Note: If you prefer, you don't need to use the return() function to return a value, because Perl automatically returns the value of the last expression evaluated. I prefer to use the return() function and be explicit so that there is no mistaking my intention.

You may have used programming languages that distinguish between a function and a subroutine, the difference being that a function returns a value and a subroutine does not. Perl makes no such distinction. Everything is a function, whether or not it returns a value.

Example: Using the Parameter Array (@_)

All parameters to a function are stored in an array called @_. One side effect of this is that you can find out how many parameters were passed by evaluating @_ in a scalar context. The next program calls the firstSub() function with a variety of parameters. Inside firstSub(), $numParameters is assigned the number of elements in the array @_, and then the function prints how many parameters were passed.

    firstSub(1, 2, 3, 4, 5, 6);
    firstSub(1..3);
    firstSub("A".."Z");

    sub firstSub {
        $numParameters = @_;
        print("The number of parameters is $numParameters\n");
    }

This program prints out:

    The number of parameters is 6
    The number of parameters is 3
    The number of parameters is 26

Perl lets you pass any number of parameters to a function. The function decides which parameters to use and in what order. The @_ array is used like any other array.

Let's say that you want to use scalar variables to reference the parameters so you don't have to use the clumsy and uninformative $_[0] array element notation. By using the assignment operator, you can assign array elements to scalars in one easy step. The next program calls the areaOfRectangle() function with varying parameters and then defines the areaOfRectangle() function.
The function assigns the first two elements of @_ to $height and $width, respectively, calculates the area, and prints the three variables: $height, $width, and $area.

    areaOfRectangle(2, 3);
    areaOfRectangle(5, 6);

    sub areaOfRectangle {
        ($height, $width) = @_;
        $area = $height * $width;
        print("The height is $height. The width is $width. The area is $area.\n\n");
    }

This program prints out:

    The height is 2. The width is 3. The area is 6.

    The height is 5. The width is 6. The area is 30.

The statement

    ($height, $width) = @_;

does the array-element-to-scalar assignment. The first element is assigned to $height, and the second element is assigned to $width. After the assignment is made, you can use the scalar variables to represent the parameters.

Example: Passing Parameters by Reference

Using scalar variables inside your functions is a good idea for another reason besides readability. When you change the value of the elements of the @_ array, you also change the value of the parameters in the rest of the program. This is because Perl parameters are passed by reference: each element of @_ is an alias for the corresponding variable in the function call. When parameters are passed by reference, changing their value in the function also changes their value in the main program. Listing 5.2 shows how this happens. The program creates an array with 6 elements, prints the elements of the array, calls the firstSub() function, and prints the elements of the array again. The firstSub() function changes the values of the first two elements of @_.
Listing 5.2 05LST02.PL: Using the @_ Array to Show Call by Reference

    @array = (0..5);
    print("Before function call, array = @array\n");
    firstSub(@array);
    print("After function call, array = @array\n");

    sub firstSub {
        $_[0] = "A";
        $_[1] = "B";
    }

This program prints:

    Before function call, array = 0 1 2 3 4 5
    After function call, array = A B 2 3 4 5

You can see that the function was able to affect the @array variable in the main program. Generally, this is considered bad programming practice because it does not isolate what the function does from the rest of the program. If you change the function so that scalars are used inside the function, this problem goes away. Listing 5.3 shows how to redo the program in Listing 5.2 so that scalars are used inside the function. The main program is the same as before; this time, firstSub() assigns the first two elements of @_ to $firstVar and $secondVar and then changes the values of those scalar variables.

Listing 5.3 05LST03.PL: Using Scalars Instead of the @_ Array Inside Functions

    @array = (0..5);
    print("Before function call, array = @array\n");
    firstSub(@array);
    print("After function call, array = @array\n");

    sub firstSub {
        ($firstVar, $secondVar) = @_;
        $firstVar = "A";
        $secondVar = "B";
    }

This program prints:

    Before function call, array = 0 1 2 3 4 5
    After function call, array = 0 1 2 3 4 5

This example shows that the original @array variable is left untouched.
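The aliasing in Listing 5.2 works the same way for a single scalar argument. The following minimal sketch contrasts a function that modifies $_[0] directly with one that first copies the parameter. The function names doubleInPlace() and doubleCopy() are hypothetical, made up for this illustration, and the sketch uses my(), which is explained in the scope example later in this chapter.

```perl
use strict;
use warnings;

# Changes the caller's variable: $_[0] is an alias for the argument.
sub doubleInPlace {
    $_[0] *= 2;
}

# Leaves the caller's variable alone: it works on a private copy.
sub doubleCopy {
    my ($value) = @_;    # copy the first parameter out of @_
    $value *= 2;
    return($value);
}

my $count = 21;
doubleInPlace($count);
print("After doubleInPlace, count = $count\n");

my $other  = 21;
my $result = doubleCopy($other);
print("After doubleCopy, other = $other, result = $result\n");
```

The first print shows count = 42, while the second shows that $other is still 21 even though the function returned 42. Because $_[0] is an alias, calling doubleInPlace() with a literal, as in doubleInPlace(21), fails at run time with a "Modification of a read-only value attempted" error; doubleCopy(21) has no such problem.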
However, another problem has quietly arisen. Let's change the program a little so that the value of $firstVar is printed before and after the function call. Listing 5.4 shows how changing a variable in the function affects the main program. The program assigns a value to $firstVar, creates an array with 6 elements, prints $firstVar and the array, calls the firstSub() function, and prints them again. As before, firstSub() assigns the first two elements of @_ to $firstVar and $secondVar and then changes the values of those scalar variables.

Listing 5.4 05LST04.PL: Using Variables in Functions Can Cause Unexpected Results

    $firstVar = 10;
    @array = (0..5);

    print("Before function call\n");
    print("\tfirstVar = $firstVar\n");
    print("\tarray = @array\n");

    firstSub(@array);

    print("After function call\n");
    print("\tfirstVar = $firstVar\n");
    print("\tarray = @array\n");

    sub firstSub {
        ($firstVar, $secondVar) = @_;
        $firstVar = "A";
        $secondVar = "B";
    }

This program prints:

    Before function call
            firstVar = 10
            array = 0 1 2 3 4 5
    After function call
            firstVar = A
            array = 0 1 2 3 4 5

By using the $firstVar variable in the function, you also change its value in the main program. By default, all Perl variables are accessible everywhere inside a program. This ability to access variables globally can be a good thing at times, but it does not help when you are trying to isolate a function from the rest of your program. The next section shows you how to create variables that can be used only inside functions.

Example: Scope of Variables

Scope refers to the visibility of variables, in other words, which parts of your program can see or use them. Normally, every variable has a global scope: once a variable is defined, every part of your program can access it.
It is very useful to be able to limit a variable's scope to a single function. In other words, the variable will have a limited scope. This way, changes inside the function can't affect the main program in unexpected ways. Listing 5.5 introduces two of Perl's built-in functions that create variables of limited scope. The my() function creates a variable that only the current function can see. The local() function creates a variable that the current function, and any functions it calls, can see. If that sounds confusing, don't worry. It is confusing, but Listing 5.5 should clear things up. In this case, it's a listing that is worth a thousand words, not a picture!

The program calls firstSub() with two parameters. The firstSub() function assigns the first parameter to the local() variable $firstVar and the second parameter to the my() variable $secondVar, prints the variables, calls secondSub() without any parameters, and then prints the variables again to see what changed. The secondSub() function prints the variables, assigns new values to them, and prints them again to show that the new values were assigned correctly.
Listing 5.5 05LST05.PL: Using the local() and my() Functions to Create Local Variables

    firstSub("AAAAA", "BBBBB");

    sub firstSub {
        local ($firstVar) = $_[0];
        my($secondVar) = $_[1];

        print("firstSub: firstVar = $firstVar\n");
        print("firstSub: secondVar = $secondVar\n\n");

        secondSub();

        print("firstSub: firstVar = $firstVar\n");
        print("firstSub: secondVar = $secondVar\n\n");
    }

    sub secondSub {
        print("secondSub: firstVar = $firstVar\n");
        print("secondSub: secondVar = $secondVar\n\n");

        $firstVar = "ccccC";
        $secondVar = "DDDDD";

        print("secondSub: firstVar = $firstVar\n");
        print("secondSub: secondVar = $secondVar\n\n");
    }

This program prints:

    firstSub: firstVar = AAAAA
    firstSub: secondVar = BBBBB

    secondSub: firstVar = AAAAA
    Use of uninitialized value at test.pl line 19.
    secondSub: secondVar =

    secondSub: firstVar = ccccC
    secondSub: secondVar = DDDDD

    firstSub: firstVar = ccccC
    firstSub: secondVar = BBBBB

The output from this example shows that secondSub() could not access the $secondVar variable that was created with my() inside firstSub(). When warnings are turned on (for example, with the -w switch), Perl even prints a message that warns about the uninitialized value. The $firstVar variable, however, can be accessed and changed by secondSub().

Tip: It's generally a better idea to use my() instead of local() so that you can tightly control the scope of local variables. Think about it this way: it's 4:00 in the morning and the project is due. Is that the time to be checking variable scope? No. Using my() enforces good programming practices and reduces headaches.

Actually, the my() function is even more complex than I've said. The easy definition is that it creates variables that only the current function can see. The true definition is that it creates variables with lexical scope.
This distinction is only important when creating modules or objects, so let's ignore the complicated definition for now. You'll hear more about it in Chapter 15, "Perl Modules."

If you remember, I mentioned passing parameters by reference. Passing parameters by reference means that functions can change the variable's value, and the main program sees the change. When you copy the elements of the @_ array into variables created with local() (or my()), the parameters are essentially being passed by value. The function can change the value of its copy, but only the function is affected. The rest of the program sees the old value.

Example: Using a List as a Function Parameter

Now that you understand the scope of variables, let's take another look at parameters. Because all parameters are passed to a function in one array, what if you need to pass both a scalar and an array to the same function? This next example shows you what happens.

Call the firstSub() function with two parameters: a list and a scalar. Define the firstSub() function. Assign the elements of the @_ array to @array and $firstVar. Print @array and $firstVar.

    firstSub((0..10), "AAAA");

    sub firstSub{
        local(@array, $firstVar) = @_ ;

        print("firstSub: array = @array\n");
        print("firstSub: firstVar = $firstVar\n");
    }

This program prints:

    firstSub: array = 0 1 2 3 4 5 6 7 8 9 10 AAAA
    Use of uninitialized value at test.pl line 8.
    firstSub: firstVar =

When the local variables are initialized, the @array variable grabs all of the elements in the @_ array, leaving none for the scalar variable. This results in the uninitialized value warning displayed in the output. You can fix this by merely reversing the order of the parameters. If the scalar value comes first, then the function processes the parameters without a problem.

Call the firstSub() function with two parameters: a scalar and a list. Define the firstSub() function.
Assign the elements of the @_ array to $firstVar and @array. Print @array and $firstVar.

    firstSub("AAAA", (0..10));

    sub firstSub{
        local($firstVar, @array) = @_ ;

        print("firstSub: array = @array\n");
        print("firstSub: firstVar = $firstVar\n");
    }

This program prints:

    firstSub: array = 0 1 2 3 4 5 6 7 8 9 10
    firstSub: firstVar = AAAA

Note

You can pass as many scalar values as you want to a function, but only one array, and it must be the last parameter. If you try to pass more than one array, the array elements are flattened into one list and passed to the function as a single array. Your function won't be able to tell where one array ends and the next begins.

Example: Nesting Function Calls

Function calls can be nested many levels deep. Nested function calls simply means that one function can call another, which in turn can call another. Exactly how many levels you can nest depends on which version of Perl you are running and how your machine is configured. Normally, you don't have to worry about it. If you want to see how many levels your system can recurse, try the following small program:

Call the firstSub() function. Define the firstSub() function. Print $count. Increment $count by one. Call the firstSub() function recursively.

    firstSub();

    sub firstSub{
        print("$count\n");
        $count++;
        firstSub();
    }

My system counts up to 127 before displaying the following message:

    Error: Runtime exception

While it is important to realize that there is a limit to the number of times your program can nest function calls, you should rarely run into this limitation unless you are working with recursive functions, such as recursive mathematical functions.

Example: Using a Private Function

Occasionally, you might want to create a private function. A private function is one that is only available inside the scope where it was defined.

Assign the return value from performCalc() to $temp. Print $temp. Define the performCalc() function. Assign values from the @_ parameter array to my scalar variables. Define the private function referred to by $square.
Return the first element of the @_ parameter array raised to the 2nd power. Return the sum of $firstVar raised to the 2nd power and $secondVar raised to the 2nd power.

    $temp = performCalc(10, 10);
    print("temp = $temp\n");

    sub performCalc {
        my ($firstVar, $secondVar) = @_;

        my $square = sub {
            return($_[0] ** 2);
        };

        return(&$square($firstVar) + &$square($secondVar));
    };

This program prints:

    temp = 200

This example is rather trivial, but it shows that in Perl it pays to create little helper routines. A fine line needs to be drawn between what should be included as a private function and what shouldn't. I would draw the line at 5 or 6 lines of code. Anything longer probably should be made into its own function. I would also say that a private function should have only one purpose for existence. Performing a calculation and then opening a file is too much functionality for a single private function to have.

The rest of the chapter is devoted to showing you some of Perl's built-in functions. These little nuggets of functionality will become part of your arsenal of programming weapons.

String Functions

The first set of functions that we'll look at are those that deal with strings. These functions let you determine a string's length, search for a sub-string, and change the case of the characters in the string, among other things. Table 5.1 shows Perl's string functions.

Table 5.1 String Functions

Function
    Description

chomp(STRING) or chomp(ARRAY)
    Uses the value of the $/ special variable to remove line endings from STRING or from each element of ARRAY. The line ending is only removed if it matches the current value of $/.
chop(STRING) or chop(ARRAY)
    Removes the last character from a string or the last character from every element in an array. The last character chopped is returned.
chr(NUMBER)
    Returns the character represented by NUMBER in the ASCII table. For instance, chr(65) returns the letter A. For more information about the ASCII table, see Appendix E, "ASCII Table."
crypt(STRING1, STRING2)
    Encrypts STRING1, using STRING2 as the salt. Unfortunately, Perl does not provide a decrypt function.
index(STRING, SUBSTRING, POSITION)
    Returns the position of the first occurrence of SUBSTRING in STRING at or after POSITION. If you don't specify POSITION, the search starts at the beginning of STRING. If SUBSTRING is not found, -1 is returned.
join(STRING, ARRAY)
    Returns a string that consists of all of the elements of ARRAY joined together by STRING. For instance, join(">>", ("AA", "BB", "cc")) returns "AA>>BB>>cc".
lc(STRING)
    Returns a string with every letter of STRING in lowercase. For instance, lc("ABCD") returns "abcd".
lcfirst(STRING)
    Returns a string with the first letter of STRING in lowercase. For instance, lcfirst("ABCD") returns "aBCD".
length(STRING)
    Returns the length of STRING in characters.
rindex(STRING, SUBSTRING, POSITION)
    Returns the position of the last occurrence of SUBSTRING in STRING at or before POSITION. If you don't specify POSITION, the search starts at the end of STRING.
split(PATTERN, STRING, LIMIT)
    Breaks up a string based on some delimiter. In an array context, it returns a list of the things that were found. In a scalar context, it returns the number of things found.
substr(STRING, OFFSET, LENGTH)
    Returns a portion of STRING as determined by the OFFSET and LENGTH parameters. If LENGTH is not specified, then everything from OFFSET to the end of STRING is returned. A negative OFFSET can be used to start from the right side of STRING.
uc(STRING)
    Returns a string with every letter of STRING in uppercase. For instance, uc("abcd") returns "ABCD".
ucfirst(STRING)
    Returns a string with the first letter of STRING in uppercase. For instance, ucfirst("abcd") returns "Abcd".

Note

As a general rule, if Perl sees a number where it expects a string, the number is quietly converted to a string without your needing to do anything.

Note

Some of these functions use the special variable $_ as the default string to work with.
More information about $_ can be found in Chapter 9, "Using Files," and Chapter 12, "Using Special Variables."

The next few sections demonstrate some of these functions. After seeing some of them work, you'll be able to use the rest of them.

Example: Changing a String's Value

Frequently, I find that I need to change part of a string's value, usually somewhere in the middle of the string. When this need arises, I turn to the substr() function. Normally, the substr() function returns a sub-string based on three parameters: the string to use, the position to start at, and the length of the sub-string to return.

Assign $firstVar the return value from substr(). Print $firstVar.

    $firstVar = substr("0123BBB789", 4, 3);
    print("firstVar = $firstVar\n");

This program prints:

    firstVar = BBB

The substr() function starts at the fifth position (offset 4, because offsets start at zero) and returns the next three characters. The returned string can be printed as in the above example, used as an array element, used in string concatenation, or used in any of a hundred other ways.

Things become more interesting when you put the substr() function on the left-hand side of the assignment statement. Then, you actually can assign a value to the part of the string that substr() returns.

Initialize $firstVar with a string literal. Replace the string returned by the substr() function with "AAA". Print $firstVar.

    $firstVar = "0123BBB789";
    substr($firstVar, 4, 3) = "AAA";
    print("firstVar = $firstVar\n");

This program prints:

    firstVar = 0123AAA789

Example: Searching a String

Another useful thing you can do with strings is search them to see if they contain a given sub-string. For example, if you have a full path name such as "C:\\WINDOWS\\TEMP\\WSREWE.DAT", you might need to extract the file name at the end of the path. You might do this by searching for the last backslash and then using substr() to return the sub-string.
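The path example that follows searches backward from the end with rindex(). As a sketch of searching forward instead, the hypothetical allPositions() helper below (it is not a Perl built-in) uses index() with its optional POSITION argument to find every backslash in the same path name. It relies on index() returning -1 when there is no further match, and it assumes the default array base of zero.

```perl
use strict;
use warnings;

# Hypothetical helper: collect the offset of every occurrence of $needle
# in $haystack by restarting index() one character past the previous match.
sub allPositions {
    my ($haystack, $needle) = @_;
    my @found;
    my $pos = index($haystack, $needle);     # -1 means "not found"
    while ($pos != -1) {
        push(@found, $pos);
        $pos = index($haystack, $needle, $pos + 1);
    }
    return @found;
}

my $pathName = "C:\\WINDOWS\\TEMP\\WSREWE.DAT";
my @slashes  = allPositions($pathName, "\\");
print("backslashes at: @slashes\n");         # prints "backslashes at: 2 10 15"

# A simple yes/no test: index() returns -1 when the sub-string is absent.
print("has TEMP\n") if index($pathName, "TEMP") != -1;
```

Restarting at $pos + 1 rather than at $pos keeps the loop from finding the same match forever. The last offset found (15) is the same one rindex() reports, so adding one to it points at the "W" of the file name.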
Note

The path name string has double backslashes to indicate to Perl that we really want a backslash in the string and not some other escape sequence. You can read more about escape sequences in Chapter 2, "Numeric and String Literals."

Assign a string literal to $pathName. Find the location of the last backslash by starting at the end of the string and working backward using the rindex() function. When the position of the last backslash is found, add one to it so that $position points at the first character ("W") of the file name. Use the substr() function to extract the file name and assign it to $fileName. Print $fileName.

    $pathName = "C:\\WINDOWS\\TEMP\\WSREWE.DAT";
    $position = rindex($pathName, "\\") + 1;
    $fileName = substr($pathName, $position);
    print("$fileName\n");

This program prints:

    WSREWE.DAT

If the third parameter (the length) is not supplied to substr(), it simply returns the sub-string that starts at the position specified by the second parameter and continues until the end of the string specified by the first parameter.

Array Functions

Arrays are a big part of the Perl language, and Perl has a lot of functions to help you work with them. Some of the actions these functions perform include deleting elements, checking for the existence of an element, reversing all of the elements in an array, and sorting the elements. Table 5.2 lists the functions you can use with arrays.

Table 5.2 Array Functions

Function
    Description

defined(VARIABLE)
    Returns true if VARIABLE has a real value and false if the variable has not yet been assigned a value. This is not limited to arrays; any data type can be checked. Also see the exists function for information about associative array keys.
delete(KEY)
    Removes the key-value pair from the given associative array. If you delete a value from the %ENV array, the environment of the current process is changed, not that of the parent.
each(ASSOC_ARRAY)
    Returns a two-element list that contains a key and value pair from the given associative array. The function is mainly used so you can iterate over the associative array elements. A null list is returned when the last element has been read.
exists(KEY)
    Returns true if KEY is part of the specified associative array. For instance, exists($array{"Orange"}) returns true if the %array associative array has a key with the value of "Orange."
join(STRING, ARRAY)
    Returns a string that consists of all of the elements of ARRAY joined together by STRING. For instance, join(">>", ("AA", "BB", "cc")) returns "AA>>BB>>cc".
keys(ASSOC_ARRAY)
    Returns a list that holds all of the keys in a given associative array. The list is not in any particular order.
map(EXPRESSION, ARRAY)
    Evaluates EXPRESSION for every element of ARRAY and returns a list of the results. The special variable $_ is assigned each element of ARRAY immediately before EXPRESSION is evaluated.
pack(STRING, ARRAY)
    Creates a binary structure, using STRING as a guide, of the elements of ARRAY. You can look in Chapter 8, "References," for more information.
pop(ARRAY)
    Returns the last value of an array. It also reduces the size of the array by one.
push(ARRAY1, ARRAY2)
    Appends the contents of ARRAY2 to ARRAY1. This increases the size of ARRAY1 as needed.
reverse(ARRAY)
    Reverses the elements of a given array when used in an array context. When used in a scalar context, the array is converted to a string, and the string is reversed.
scalar(ARRAY)
    Evaluates the array in a scalar context and returns the number of elements in the array.
shift(ARRAY)
    Returns the first value of an array. It also reduces the size of the array by one.
sort(ARRAY)
    Returns a list containing the elements of ARRAY in sorted order. See Chapter 8, "References," for more information.
splice(ARRAY1, OFFSET, LENGTH, ARRAY2)
    Removes LENGTH elements of ARRAY1, starting at OFFSET, and replaces them with the elements of ARRAY2. It returns a list holding any elements that were removed. Remember that the $[ variable may change the base array subscript when determining the OFFSET value.
split(PATTERN, STRING, LIMIT)
    Breaks up a string based on some delimiter. In an array context, it returns a list of the things that were found. In a scalar context, it returns the number of things found.
undef(VARIABLE)
    Always returns the undefined value. In addition, it undefines VARIABLE, which must be a scalar, an entire array, or a subroutine name.
unpack(STRING, ARRAY)
    Does the opposite of pack().
unshift(ARRAY1, ARRAY2)
    Adds the elements of ARRAY2 to the front of ARRAY1. Note that the added elements retain their original order. The size of the new ARRAY1 is returned.
values(ASSOC_ARRAY)
    Returns a list that holds all of the values in a given associative array. The list is not in any particular order.

As with the string functions, only a few of these functions will be explored. Once you see the examples, you'll be able to handle the rest with no trouble.

Example: Printing an Associative Array

The each() function returns the key, value pairs of an associative array one by one in a list. This is called iterating over the elements of the array. Iteration is a synonym for looping. So, you also could say that the each() function starts at the beginning of an array and loops through each element until the end of the array is reached. This ability lets you work with key, value pairs in a quick, easy manner.

The each() function does not loop by itself. It needs a little help from some Perl control statements. For this example, we'll use the while loop to print an associative array. The while (CONDITION) {} control statement continues to execute any program code surrounded by the curly braces until CONDITION becomes false.

Create an associative array with number, color pairs. Using a while loop, iterate over the array elements. Print the key, value pair.
    %array = ( "100", "Green", "200", "Orange");

    while (($key, $value) = each(%array)) {
        print("$key = $value\n");
    }

This program prints:

    100 = Green
    200 = Orange

Because an associative array is not stored in any particular order, the pairs may be printed in a different order on your system.

When the end of the array is reached, each() returns an empty list, which makes the assignment in the while condition false. Therefore, you can use it as the basis of the while loop's condition. When the end of the array is reached, the program continues execution after the closing curly brace. In this example, the program simply ends.

Example: Checking the Existence of an Element

You can use the defined() function to check if an array element exists before you assign a value to it. This ability is very handy if you are reading values from a disk file and don't want to overlay values already in memory. For instance, suppose you have a disk file of customers' addresses and you would like to know if any of them are duplicates. You check for duplicates by reading the information one address at a time and assigning the address to an associative array using the customer name as the key value. If the customer name already exists as a key value, then that address should be flagged for follow-up.

Because we haven't talked about disk files yet, we'll need to emulate a disk file with an associative array. And, instead of using customers' addresses, we'll use customer number and customer name pairs. First, we see what happens when an associative array is created and two values have the same key.

Call the createPair() function three times to create three key, value pairs in the %array associative array. Loop through %array, printing each key, value pair. Define the createPair() function. Create local variables to hold the key, value pair passed as parameters. Create an array element to hold the key, value pair.
    createPair("100", "Kathy Jones");
    createPair("200", "Grace Kelly");
    createPair("100", "George Orwell");

    while (($key, $value) = each %array) {
        print("$key, $value\n");
    };

    sub createPair{
        my($key, $value) = @_ ;

        $array{$key} = $value;
    };

This program prints (the order of the lines may vary):

    100, George Orwell
    200, Grace Kelly

This example takes advantage of the global nature of variables. Even though the %array element is set in the createPair() function, the array is still accessible by the main program. Notice that the first key, value pair (100 and Kathy Jones) is overwritten when the third key, value pair is encountered. You can see that it is a good idea to be able to determine when an associative array element is already defined so that duplicate entries can be handled. The next program does this.

Call the createPair() function three times to create three key, value pairs in the %array associative array. Loop through %array, printing each key, value pair. Define the createPair() function. Create local variables to hold the key, value pair passed as parameters. If the key, value pair already exists in %array, then increment the customer number by one. Check to see if the new key, value pair exists. If so, keep incrementing until a nonexistent key, value pair is found. Create an array element to hold the key, value pair.

    createPair("100", "Kathy Jones");
    createPair("200", "Grace Kelly");
    createPair("100", "George Orwell");

    while (($key, $value) = each %array) {
        print("$key, $value\n");
    };

    sub createPair{
        my($key, $value) = @_ ;

        while (defined($array{$key})) {
            $key++;
        }

        $array{$key} = $value;
    };

This program prints (the order of the lines may vary):

    100, Kathy Jones
    101, George Orwell
    200, Grace Kelly

You can see that the customer number for George Orwell, the second customer to use 100, has been changed to 101, while Kathy Jones keeps 100. If the array had already had an entry for 101, George Orwell's new customer number would be 102.

Summary

In this chapter you've learned about functions: what they are and how to call them.
You saw that you can create your own functions or use one of Perl's many built-in functions. Each function can accept any number of parameters, which get delivered to the function in the form of the @_ array. This array, like any other array in Perl, can be accessed using an array element to access an individual element. (For instance, $_[0] accesses the first element in the @_ array.) Because Perl parameters are passed by reference, changing the elements of the @_ array changes the values in the main program as well as in the function.

You learned about the scope of variables and how all variables are global by default. Then, you saw how to create variables with local scope using local() and my(). The my() function is the better choice in almost all situations because it enforces local scope and limits a function's side effects to the inside of the function.

Then you saw that it was possible to nest function calls, which means that one function can call another, which in turn can call another. You also might call this a chain of function calls. Private functions were introduced next. A private function is one that can be used only inside the function that defines it.

A list of string functions then was presented. These included functions to remove the last character, encrypt a string, find a sub-string, convert array elements into a string, change the case of a string's characters, and find the length of a string. Examples were shown about how to change a string's characters and how to search a string.

The section on array functions showed that Perl has a large number of functions that deal specifically with arrays. The list of functions included the ability to delete elements, return key, value pairs from associative arrays, reverse an array's elements, and sort an array. Examples were shown for printing an associative array and checking for the existence of an element.

The next chapter, "Statements," goes into detail about what statements are and how you create them.
The information that you learned about variables and functions will come into play. You'll see how to link variables and functions together to form expressions and statements.

Review Questions

Answers to Review Questions are in Appendix A.

1. What is a parameter?
2. What two functions are used to create variables with local scope?
3. What does parameter passing by reference mean?
4. What is the @_ array used for?
5. Do Perl variables have global or local scope by default?
6. Why is it hard to pass two arrays to a function?
7. What is the difference between variables created with local() and variables created with my()?
8. What does the map() function do?

Review Exercises

1. Create a function that prints its own parameter list.
2. Create a program that uses three functions to demonstrate function call nesting.
3. Use the chop() function in a program. Print both the returned character and the string that was passed as a parameter.
4. Run the following program to see how many levels of recursion your system configuration supports:

       firstSub();

       sub firstSub{
           print("$count\n");
           $count++;
           firstSub();
       }

5. Write a function that uses the substr() and uc() functions to change the tenth through twentieth characters to uppercase.
6. Write a function that uses the keys() function to print out the values of an associative array.
7. Create a program that uses a private function to subtract two numbers and multiply the result by four.
8. Write a program that shows what the shift() and unshift() functions do.
9. Write a program that shows what the push() and pop() functions do.
____________________________________________________________________________________________________________________

Chapter 6 Statements
____________________________________________________________________________________________________________________
CONTENTS
 * Understanding Expressions
 * Statement Blocks
 * Statement Blocks and Local Variables
 * Statement Types
    + Example: Using the if Modifier
    + Example: Using the unless Modifier
    + Example: Using the until Modifier
    + Example: Using the while Modifier
 * Summary
 * Review Questions
 * Review Exercises
____________________________________________________________________________________________________________________

If you look at a Perl program from a very high level, it is made of statements. Statements are a complete unit of instruction for the computer to process. The computer executes each statement it sees, in sequence, until a jump or branch is processed.

Statements can be very simple or very complex. The simplest statement is this:

    123;

which is a numeric literal followed by a semicolon. The semicolon is very important. It tells Perl that the statement is complete. A more complicated statement might be

    $bookSize = ($numOfPages >= 1200 ? "Large" : "Normal");

which says that if the number of pages is 1,200 or greater, then assign "Large" to $bookSize; otherwise, assign "Normal" to $bookSize.

In Perl, every statement has a value. In the first example, the value of the statement is 123. In the second example, the value of the statement could be either "Large" or "Normal", depending on the value of $numOfPages. The last value that is evaluated becomes the value of the statement.
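One place where a statement's value really matters is a function's return value. Here is a minimal sketch built around the $bookSize example above; the bookSize() function is hypothetical. Because the ternary expression is the last statement evaluated, its value is what the function returns, even though return() is never called.

```perl
use strict;
use warnings;

# Hypothetical helper: there is no explicit return(), so the value of the
# last statement evaluated (the ternary expression) is returned.
sub bookSize {
    my ($numOfPages) = @_;
    $numOfPages >= 1200 ? "Large" : "Normal";
}

my $big   = bookSize(1500);
my $small = bookSize(300);
print("1500 pages: $big\n");     # prints "1500 pages: Large"
print("300 pages: $small\n");    # prints "300 pages: Normal"
```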
Like human language, in which you put statements together from parts of speech (nouns, verbs, and modifiers), Perl statements can also be broken down into parts. The parts are the literals, variables, and functions you have already seen in the earlier chapters of this book. Human language phrases, like "walk the dog," also have their counterparts in computer languages. The computer equivalent is an expression. Expressions are a sequence of literals, variables, and functions connected by one or more operators that evaluate to a single value, either scalar or array. An expression can be promoted to a statement by adding a semicolon. This was done for the first example earlier. Simply adding a semicolon to the literal made it into a statement that Perl could execute.

Expressions may also have side effects. Functions that are called can do things that are not immediately obvious (like setting global variables), or the pre- and post-increment operators can be used to change a variable's value.

Let's take a short diversion from our main discussion about statements and look at expressions in isolation. Then we'll return to statements to talk about statement blocks and statement modifiers.

Understanding Expressions

You can break the universe of expressions up into four types:

 * Simple Expressions
 * Simple Expressions with Side Effects
 * Simple Expressions with Operators
 * Complex Expressions

Simple expressions consist of a single literal or variable. Table 6.1 shows some examples. Not much can be said about these expressions because they are so basic. It might be a matter for some debate whether or not an array or associative array variable can be considered a simple expression. My vote is yes, they can. The confusion might arise because of the notation used to describe an array or associative array. For example, an array can be specified as (12, 13, 14). You can see this specification as three literal values surrounded by parentheses or one array.
I choose to see one array, which fits the definition of a simple expression: a single variable.

Table 6.1 The Simplest Perl Expressions

Simple Expression          Description
123                        Integer literal
"Chocolate is great!"      String literal
(1, 2, 3)                  Array literal
$numPages                  Variable

Simple expressions with side effects are the next type of expression we'll examine. A side effect is when a variable's value is changed by the expression. Side effects can be caused using the unary increment and decrement operators, ++ and --. These operators change the value of a variable just by the evaluation of the expression. No other Perl operators have this effect, other than the assignment operators, of course. Function calls can also have side effects, especially if local variables were not used and changes were made to global variables. Table 6.2 shows examples of different side effects.

Table 6.2 Perl Expressions with Side Effects

Simple Expression                    Description
$numPages++                          Increments a variable
++$numPages                          Increments a variable
chop($firstVar)                      Changes the value of $firstVar, a global variable
sub firstsub { $firstVar = 10; }     Also changes $firstVar

Note that when the expressions $numPages++ and ++$numPages are evaluated, they have the same side effect even though they evaluate to different values. The first evaluates to $numPages, and the second evaluates to $numPages + 1. The side effect is to increment $numPages by 1. The firstsub() function shown in Table 6.2 changes the value of the $firstVar variable, which has a global scope. This can also be considered a side effect, especially if $firstVar should have been declared as a local variable.

Simple expressions with operators are expressions that include one operator and two operands. Any of Perl's binary operators can be used in this type of expression. Table 6.3 shows a few examples of this type of expression.

Table 6.3 Perl Expressions with Operators

Simple Expression          Description
10 + $firstVar             Adds ten to $firstVar
$firstVar . "AAA"          Concatenates $firstVar and "AAA"
"ABC" x 5                  Repeats "ABC" five times

Another way of viewing 10 + $firstVar is as a simple expression plus a simple expression. Thus, you can say that a simple expression with an operator is defined as two simple expressions connected by an operator. When computer programmers define something in terms of itself, we call it recursion. Each time a recursion is done, the expression is broken down into simpler and simpler pieces until the computer can evaluate the pieces properly.

A complex expression can use any number of literals, variables, operators, and functions in any sequence. Table 6.4 shows some complex expressions.

Table 6.4 Complex Perl Expressions

Complex Expression
(10 + 2) + 20 / (5 ** 2)
20 - (($numPages - 1) * 2)
(($numPages++ / $numChapters) * (1.5 / log(10)) + 6)

There is an infinite number of expressions you can form with the Perl operator set. You can get extremely complicated in your use of operators and functions if you are not careful. I prefer to keep the expressions short, easy to document, and easy to maintain.

Tip

Sometimes it is difficult to tell whether you have enough closing parentheses for all of your opening parentheses. Starting at the left, add one to a running count for each open parenthesis, and subtract one for each closing parenthesis. If the count never drops below zero and you reach zero at the end of the expression, the parentheses are balanced.

Now we'll go back to looking at statements.

Statement Blocks

A statement block is a group of statements surrounded by curly braces. Perl views a statement block as one statement. The last statement executed becomes the value of the statement block. This means that any place you can use a single statement, like the map function, you can use a statement block. You can also create variables that are local to a statement block. So, without going to the trouble of creating a function, you can still isolate one bit of code from another.
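To see a block's value actually being used, here is a minimal sketch with map and do, both of which accept a block. The page counts are made-up sample data, and treating two pages as one sheet is only an assumption for the illustration. In each case the value of the last statement executed inside the braces becomes the value of the block, and the my() variables disappear when the block ends.

```perl
use strict;
use warnings;

my @pages = (100, 1300, 450);    # sample data, not from the text

# map runs the block once per element, with $_ set to that element;
# the block's last statement supplies the value for the result list.
my @labels = map {
    my $half = $_ / 2;           # temporary, visible only inside the block
    "$_ pages = $half sheets";
} @pages;

# A do block evaluates to the value of its last statement.
my $total = do {
    my $sum = 0;
    $sum += $_ foreach @pages;
    $sum;
};

print("$_\n") foreach @labels;   # first line: "100 pages = 50 sheets"
print("total = $total\n");       # prints "total = 1850"
```

Note that a bare block like the one in the next example also has a value, but you need a construct such as do or map to capture it.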
Here is how I frequently use a statement block:

    $firstVar = 10;
    {
        $secondVar >>= 2;
        $secondVar++;
    }
    $thirdVar = 20;

The statement block serves to emphasize that the inner code is set apart from the rest of the program. In this case, the initialization of $secondVar is a bit more complex than that of the other variables. Using a statement block does not change the program execution in any way; it simply is a visual device to mark sections of code and a way to create local variables.

Statement Blocks and Local Variables

Normally, it's a good idea to place all of your variable initialization at the top of a program or function. However, if you are maintaining some existing code, you may want to use a statement block and local variables to minimize the impact of your changes on the rest of the code, especially if you have just been handed responsibility for a program that someone else has written.

You can use the my() function to create variables whose scope is limited to the statement block. This technique is very useful for temporary variables that won't be needed elsewhere in your program. For example, you might have a complex statement that you'd like to break into smaller ones so that it's more understandable. Or you might want to insert some print statements to help debug a piece of code and need some temporary variables to accommodate the print statements.

Assign ten to $firstVar. Start the statement block. Create a local version of $firstVar with a value of A. Print $firstVar repeated five times. End the statement block. Print the global $firstVar.

    $firstVar = 10;
    {
        my($firstVar) = "A";
        print $firstVar x 5 . "\n";
    }
    print("firstVar = $firstVar\n");

This program displays:

    AAAAA
    firstVar = 10

You can see that the value of $firstVar is unchanged by the statement block even though a variable called $firstVar is used inside it. This shows that the variable used inside the statement block does indeed have a local scope.
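The my() function is not the only way to confine a change to a block. In the minimal sketch below (the @queue data is made up), local() temporarily changes one of Perl's special variables, $", which is the separator Perl places between array elements interpolated into a string, while some debugging output is built. The old value comes back automatically when the block ends.

```perl
use strict;
use warnings;

my @queue = (3, 1, 2);    # sample data
my $debug;

{
    # Within this block only, interpolated arrays are joined with ", ".
    local $" = ", ";
    $debug = "queue = (@queue)";
    print("$debug\n");            # prints "queue = (3, 1, 2)"
}

# Outside the block, $" is back to its default single space.
my $after = "@queue";
print("after = $after\n");        # prints "after = 3 1 2"
```

Because the whole change lives inside one pair of braces, deleting the block later removes the debugging code without touching the rest of the program.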
Tip
Statement blocks are also good to use when you temporarily need to send debugging output to a file. When all the bugs have been found and the need for debugging is over, you can remove the statement block quickly and easily, because all the code is in one spot.

Statement Types

Just as there are several types of expressions, there are also several types of statements. Table 6.5 lists seven different types of statements.

Table 6.5 Perl Statement Types

Statement Type          Description
No-action statements    These statements evaluate a value but perform no actions.
Action statements       These statements perform some action.
Assignment statements   These statements assign a value to one or more variables. They are discussed, along with the assignment operator, in Chapter 4, "Operators."
Decision statements     These statements allow you to test a condition and choose among one or more actions. Decision statements are discussed in Chapter 7, "Control Statements."
Jump statements         These statements let you unconditionally change the program flow to another point in your code. For instance, you could use the redo keyword to send your program flow back to the beginning of a statement block. Jump statements are discussed in Chapter 7, "Control Statements."
Loop statements         These statements let you perform a series of statements repeatedly while some condition is true or until some condition is true. Loop statements are discussed in Chapter 7, "Control Statements."
Modified statements     These statements let you use the if, unless, until, and while keywords to change the behavior of a statement.

Note
A keyword is a word that is reserved for use by Perl. These words (if, elsif, else, while, unless, until, for, foreach, last, next, redo, and continue) are integral to the language and let you control program flow.

No-action statements are evaluated by Perl and have a value, but they perform no actions.
For instance, the Perl statement 10 + 20; has a value of 30. However, because no variables were changed, no work was done. The value of 30 is not stored anywhere, and it is quickly forgotten when the next statement is seen.

What good is a no-action statement if no work is done? A lot of Perl programmers use these simple statements as return values in functions. For instance (doSomething() and $condition stand for your own function call and test):

    sub firstSub {
        doSomething();
        $condition ? "Success" : "Failure";
    }

Perl returns the value of the last evaluated statement when leaving a function. Because of this, you can use a no-action statement to tell Perl which value should be returned to the main program. Notice that even though the ternary operator is used, the last statement does no work, because it contains no function calls or unary operators.

Note
I still like to use the return() function to explicitly identify return values. The previous example looks like this when using the return() function:

    sub firstSub {
        doSomething();
        return($condition ? "Success" : "Failure");
    }

Action statements use expressions to perform some task. They can increment or decrement a variable or call a function.

Modified statements use expressions in conjunction with a modifying keyword to perform some action. There are four modifying keywords: if, unless, until, and while. The basic syntax of a modified statement is

    EXPRESSION modifier (CONDITION);

Let's look at some examples of modified statements.

Example: Using the if Modifier

The if modifier tells Perl that the expression should be evaluated only if a given condition is true. The basic syntax of a modified statement with the if modifier is

    EXPRESSION if (CONDITION);

This is a compact way of saying

    if (CONDITION) {
        EXPRESSION;
    }

Let's prove that the if modifier works. Here's an example showing that the if modifier can prevent the evaluation of an expression.

Initialize the $firstVar and $secondVar variables to 20. Increment $firstVar if and only if $secondVar is equal to 10.
Print the values of $firstVar and $secondVar.

    $firstVar = 20;
    $secondVar = 20;
    $firstVar++ if ($secondVar == 10);
    print("firstVar = $firstVar\n");
    print("secondVar = $secondVar\n");

This program prints:

    firstVar = 20
    secondVar = 20

The program doesn't increment $firstVar because the value of $secondVar is 20 at the time the condition is evaluated. If you changed the 10 to a 20 in the condition, Perl would increment $firstVar. You can find out about the if statement (as opposed to the if modifier) in Chapter 7, "Control Statements."

Note
The condition expression can be as complex as you'd like. However, I believe that one of the goals of statement modifiers is to make programs easier to read and understand. Therefore, I use modifiers only with simple conditions. If complex conditions need to be met before an expression should be evaluated, an if statement is probably a better choice.

Example: Using the unless Modifier

The unless modifier is the opposite of the if modifier. This modifier evaluates an expression unless a condition is true. The basic syntax of a modified statement with the unless modifier is

    EXPRESSION unless (CONDITION);

This is a compact way of saying

    if (!(CONDITION)) {
        EXPRESSION;
    }

This modifier helps keep program code clearly understandable, because you don't have to use the logical not operator to invert a condition before evaluating an expression. Let's look back at the example from a moment ago.

Initialize the $firstVar and $secondVar variables to 20. Increment $firstVar unless $secondVar is equal to 10. Print the values of $firstVar and $secondVar.

    $firstVar = 20;
    $secondVar = 20;
    $firstVar++ unless ($secondVar == 10);
    print("firstVar = $firstVar\n");
    print("secondVar = $secondVar\n");

This program prints:

    firstVar = 21
    secondVar = 20

If you were limited to using only the if modifier, the modified statement would read

    $firstVar++ if ($secondVar != 10);

The unless modifier is more direct.
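You can check that the unless modifier and an if modifier with the opposite comparison are interchangeable by running both side by side. In the small sketch below, the loop over sample values is an illustration and does not come from the book. The starting value of 20 matches the examples above.

```perl
use strict;
use warnings;

# Compare "unless ($secondVar == 10)" with "if ($secondVar != 10)"
# for a few sample values of $secondVar.
my @rows;
foreach my $secondVar (5, 10, 20) {
    my $viaUnless = 20;
    my $viaIfNot  = 20;
    $viaUnless++ unless ($secondVar == 10);
    $viaIfNot++  if ($secondVar != 10);
    push(@rows, [$secondVar, $viaUnless, $viaIfNot]);
    print("secondVar = $secondVar: unless gives $viaUnless, if != gives $viaIfNot\n");
}
```

Both columns agree for every value. Only when $secondVar is 10 is the increment skipped.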
All other things being equal, the concept of $secondVar being equal to 10 is easier to grasp than the concept of $secondVar not being equal to 10. Of course, this is a trivial example. Let's look at something more substantial before we move on.

One of the drawbacks of associative arrays is that they quietly redefine the value of any key when that key is assigned a new value, and the old value is lost. If you are reading from a list of key-value pairs, this might not be the behavior you need. You can use the unless modifier to prevent element assignment if the key has already been used. Listing 6.1 shows the unless modifier used in a program.

Call the assignElement() function to create two elements in the %array associative array. Call the printArray() function. Try to redefine the value associated with the "A" key by calling assignElement(). Print the array again to verify that no elements have changed.

Listing 6.1 06LST01.PL: Using the unless Modifier to Control Array Element Assignment

    assignElement("A", "AAAA");
    assignElement("B", "BBBB");
    printArray();
    assignElement("A", "ZZZZ");
    printArray();

    sub assignElement {
        my($key, $value) = @_;
        $array{$key} = $value unless defined($array{$key});
    }

    sub printArray {
        while (($key, $value) = each(%array)) {
            print("$key = $value\n");
        }
        print("\n");
    }

This program displays the following output. The each() function returns key-value pairs in no particular order, so the two lines in each group may appear in either order.

    A = AAAA
    B = BBBB

    A = AAAA
    B = BBBB

These lines of code should look a little familiar to you. The while loop in the printArray() function was used in a Chapter 5 example. The assignElement() function makes an assignment unless the key already has a defined value. In that case, the assignment statement is bypassed.
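Listing 6.1 tests defined(), so a key that was stored with an undefined value can still be overwritten later. If the goal is to never reassign a key that is already present, the exists() function is a closer match, because it checks whether the key is in the hash at all. The sketch below is a variant of the listing that uses exists(). It also declares %array with my under use strict and sorts the keys when printing, so its output order is predictable. These changes are assumptions made for illustration and are not part of the book's listing.

```perl
use strict;
use warnings;

my %array;

# Variant of assignElement() that tests exists() instead of defined().
sub assignElement {
    my ($key, $value) = @_;
    $array{$key} = $value unless exists($array{$key});
}

# Print in sorted key order so the output does not depend on the
# unspecified order in which each() or keys() return hash entries.
sub printArray {
    foreach my $key (sort keys %array) {
        my $value = defined($array{$key}) ? $array{$key} : "(undef)";
        print("$key = $value\n");
    }
    print("\n");
}

assignElement("A", "AAAA");
assignElement("B", undef);     # key "B" now exists, with an undefined value
assignElement("A", "ZZZZ");    # ignored: "A" already exists
assignElement("B", "BBBB");    # also ignored; a defined() test would allow it
printArray();
```

With the defined() test from Listing 6.1, the last assignElement() call would have replaced the undefined value with "BBBB".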
Example: Using the until Modifier

The until modifier is a little more complex than the if or unless modifiers. It repeatedly evaluates the expression until the condition becomes true. The basic syntax of a modified statement with the until modifier is

    EXPRESSION until (CONDITION);

This is a compact way of saying

    until (CONDITION) {
        EXPRESSION;
    }

The expression is evaluated only while the condition is false. If the condition is true when the statement is encountered, the expression will never be evaluated. The following example proves this:

Initialize $firstVar to 10. Repeatedly evaluate $firstVar++ until the condition $firstVar > 2 is true. Print the value of $firstVar.

    $firstVar = 10;
    $firstVar++ until ($firstVar > 2);
    print("firstVar = $firstVar\n");

This program displays:

    firstVar = 10

This shows that the expression $firstVar++ was never executed, because the condition was true the first time it was evaluated. If it had been executed, the value of $firstVar would have been 11 when printed. In this case, the until modifier worked exactly like the unless modifier.

However, when the condition is false for the first evaluation, Perl executes the expression repeatedly until the condition is true. Here is an example:

Initialize $firstVar to 10. Repeatedly evaluate $firstVar++ until the condition $firstVar > 20 is true. Print the value of $firstVar.

    $firstVar = 10;
    $firstVar++ until ($firstVar > 20);
    print("firstVar = $firstVar\n");

This program displays:

    firstVar = 21

In this case, the $firstVar++ expression is executed 11 times. Each execution of the expression increments the value of $firstVar. When $firstVar is equal to 21, the statement ends because 21 is greater than 20, which means that the condition is true.

You can find out about the until loop (as opposed to the until modifier) in Chapter 7, "Control Statements."

Example: Using the while Modifier

The while modifier is the opposite of the until modifier.
It repeatedly evaluates the expression while the condition is true. When the condition becomes false, the statement ends. The basic syntax of a modified statement with the while modifier is

    EXPRESSION while (CONDITION);

This is a compact way of saying

    while (CONDITION) {
        EXPRESSION;
    }

The expression is evaluated only while the condition is true. If the condition is false when the statement is encountered, the expression will never be evaluated. Here is an example using the while modifier.

Initialize $firstVar to 10. Repeatedly evaluate $firstVar++ while the condition $firstVar < 20 is true. Print the value of $firstVar.

    $firstVar = 10;
    $firstVar++ while ($firstVar < 20);
    print("firstVar = $firstVar\n");

This program displays:

    firstVar = 20

You can compare this example directly to the last example given for the until modifier. Because the until modifier is the opposite of the while modifier, the operators in the conditions are also opposite in nature. Note, however, that the exact opposite of $firstVar > 20 is $firstVar <= 20, not $firstVar < 20. That is why this statement stops at 20: once $firstVar reaches 20, the condition is false and the expression is not evaluated again. Changing the condition to ($firstVar <= 20) would make this program print firstVar = 21, just like the until example.

You can find out about the while loop (as opposed to the while modifier) in Chapter 7, "Control Statements."

Summary

This chapter discussed Perl statements and how they are built from expressions. You read about four types of expressions: simple, simple with side effects, simple with operators, and complex.

Next, you read about statement blocks. These program constructs are good for logically isolating one block of statements from the main program flow. You can also use statement blocks and the my() function to create local variables. This is mainly done for debugging or to make small program changes that are guaranteed not to affect other portions of the program.

Then, seven types of statements were mentioned: no-action, action, assignment, decision, jump, loop, and modified. This chapter described no-action, action, and modified statements. Assignment statements were mentioned in Chapter 3, "Variables," and again in Chapter 4, "Operators."
Decision, jump, and loop statements are covered in Chapter 7, "Control Statements."

Modified statements use the if, unless, until, and while keywords to affect the evaluation of an expression. The if keyword evaluates an expression if a given condition is true. The unless keyword does the opposite: the expression is evaluated if a given condition is false. The until keyword repeatedly evaluates an expression until the condition is true. The while keyword is the opposite of until: it repeatedly evaluates an expression while the condition is true, that is, until the condition becomes false.

The next chapter, "Control Statements," explores the decision, jump, and loop statements in detail.

Review Questions

Answers to Review Questions are in Appendix A.

1. What is an expression?
2. What is a statement?
3. What are the four statement modifiers?
4. What are two uses for statement blocks?
5. What can no-action statements be used for?
6. How is the if modifier different from the unless modifier?
7. What will the following code display?

    $firstVar = 10;
    $secondVar = 20;
    $firstVar += $secondVar++ if ($firstVar > 10);
    print("firstVar = $firstVar\n");
    print("firstVar = $secondVar\n");

Review Exercises

1. Write a simple expression that uses the exponentiation operator.
2. Write a complex expression that uses three operators and one function.
3. Write a Perl program that uses a statement block inside a function call.
4. Use the statement block from the previous exercise to create local variables.
5. Write a Perl program that shows whether the expression clause of a while modified statement will be evaluated when the condition is false.
Chapter 7
Control Statements

CONTENTS
* Decision Statements
  + Example: The if Statement
* Loop Statements
  + Example: While Loops
  + Example: Until Loops
  + Example: For Loops
  + Example: Foreach Loops
* Jump Keywords
  + Example: The last Keyword
  + Example: The next Keyword
  + Example: The redo Keyword
  + Example: The goto Keyword
* Summary
* Review Questions
* Review Exercises

The last chapter, "Statements," discussed no-action, action, and modified statements. This chapter discusses three more types of statements: decision statements, loop statements, and jump statements.

You see how to use the if statement to decide on one or more courses of action. Loop statements are used to repeat a series of statements until a given condition is either true or false. Finally, we'll wrap up the chapter by looking at jump statements, which let you control program flow by moving directly to the beginning or the end of a statement block.

Decision Statements

Decision statements use the if keyword in two ways. They can execute a statement block based on the evaluation of an expression, or they can choose between two statement blocks based on the evaluation of an expression. They are used quite often.
For example, a program might need to run one code section if a customer is female and another code section if the customer is male.

Example: The if Statement

The syntax for the if statement is the following:

    if (CONDITION) {
        # Code block executed
        # if condition is true.
    } else {
        # Code block executed
        # if condition is false.
    }

Sometimes you need to choose from multiple statement blocks, such as when you need to execute a different statement block for each month. You use the if...elsif statement for this type of decision. The if...elsif statement has this syntax:

    if (CONDITION_ONE) {
        # Code block executed
        # if condition one is true.
    } elsif (CONDITION_TWO) {
        # Code block executed
        # if condition two is true.
    } else {
        # Code block executed
        # if all other conditions are false.
    }

Conditional expressions can use any of the operators discussed in Chapter 4, "Operators." Even assignment operators can be used, because the value of an assignment expression is the value that is being assigned. That last sentence may be a bit confusing, so let's look at an example.

Assign $firstVar a value of 10. Subtract five from $firstVar and if the resulting value is true (that is, not zero), then execute the statement block.

    $firstVar = 10;
    if ($firstVar -= 5) {
        print("firstVar = $firstVar\n");
    }

This program displays:

    firstVar = 5

Tip
If you're a C or C++ programmer, take heed: The curly braces around the statement block are not optional in Perl. Even one-line statement blocks must be surrounded by curly braces.

This example demonstrates the use of assignment operators inside conditional expressions. It also shows that the else part of the if statement is optional. If an else part were coded, it would be executed only when $firstVar starts out with a value of 5.

Assign $firstVar a value of 5. Subtract five from $firstVar and if the resulting value is true (in other words, not zero), then print $firstVar.
If not, print "firstVar is zero."

    $firstVar = 5;
    if ($firstVar -= 5) {
        print("firstVar = $firstVar\n");
    } else {
        print("firstVar is zero\n");
    }

This program displays:

    firstVar is zero

This example shows the use of the else clause of the if statement. Because the value of $firstVar minus 5 was zero, the statements in the else clause were executed.

You also can use the if statement to select among multiple statement blocks. The if...elsif form of the statement is used for this purpose.

Initialize $month to 2. If the value of $month is 1, then print January. If the value of $month is 2, then print February. If the value of $month is 3, then print March. For every other value of $month, print a message.

    $month = 2;
    if ($month == 1) {
        print("January\n");
    } elsif ($month == 2) {
        print("February\n");
    } elsif ($month == 3) {
        print("March\n");
    } else {
        print("Not one of the first three months\n");
    }

This program displays:

    February

The else clause at the end of the elsif chain serves to catch any unknown or unforeseen values and is a good place to put error messages. Frequently, those error messages should include the errant value and be written to a log file so that the errors can be evaluated. After evaluation, you can decide whether the program needs another elsif clause to handle that unforeseen value.

Loop Statements

A loop is used to repeat the execution of a statement block until a certain condition is reached. A loop can be used to iterate through an array looking for a value. Loops also can be used to count quantities. In fact, the number of uses for loops is practically unlimited. There are three types of loops: while loops, until loops, and for loops.

Example: While Loops

While loops are used to repeat a block of statements while some condition is true.
There are two forms of the loop. In the do...while loop, the condition is checked after the statements are executed. In the while loop, the condition is checked before the statements are executed.

The do...while loop has this syntax:

    do {
        STATEMENTS
    } while (CONDITION);

The while loop has this syntax:

    while (CONDITION) {
        STATEMENTS
    } continue {
        STATEMENTS
    }

The statements in the continue block of the while loop are executed at the end of each iteration, just before the condition is tested again. The continue block is optional and rarely used. However, you can see it demonstrated in the section "Example: Using the -n and -p Options" in Chapter 17, "Using Command-Line Options."

Which type you use for any particular task depends entirely on your needs at the time. The statement block of a do...while loop is always executed at least once, because the condition is checked after the statement block is executed rather than before. Here is an example of the do...while loop.

Initialize $firstVar to 10. Start the do...while loop. Print the value of $firstVar. Increment $firstVar. Check the while condition; if true, jump back to the start of the statement block. Print the value of $firstVar.

    $firstVar = 10;
    do {
        print("inside: firstVar = $firstVar\n");
        $firstVar++;
    } while ($firstVar < 2);
    print("outside: firstVar = $firstVar\n");

This program displays:

    inside: firstVar = 10
    outside: firstVar = 11

This example shows that the statement block is executed even though the condition $firstVar < 2 is false when the loop starts. This ability occasionally comes in handy when counting down, such as when printing pages of a report.

Initialize $numPages to 10. Start the do...while loop. Print a page. Decrement $numPages and then loop if the condition is still true.

    $numPages = 10;
    do {
        printPage();
    } while (--$numPages);

When this loop is done, all of the pages will have been displayed.
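The page-printing loop calls a printPage() function that is not shown. A runnable version needs a stand-in, so the printPage() below is hypothetical. Unlike the book's version, it takes the page number as an argument and only records it, so the order of the pages can be checked.

```perl
use strict;
use warnings;

my @printed;

# Hypothetical stand-in for printPage(): it records which page number
# it was asked to print instead of printing anything.
sub printPage {
    my ($page) = @_;
    push(@printed, $page);
}

my $numPages = 10;
do {
    printPage($numPages);
} while (--$numPages);

print("Printed pages: @printed\n");    # Printed pages: 10 9 8 7 6 5 4 3 2 1
```

This sketch also shows why the loop should be used only when there is at least one page. If $numPages started at 0, the block would still run once, --$numPages would produce -1 (a true value), and the loop would never end.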
You would use this type of loop when you know that there will always be pages to process. Notice that because the predecrement operator is used, the $numPages variable is decremented before the condition expression is evaluated.

If the statement block must not be executed when the condition is already false, you need to use the while loop.

Initialize $firstVar to 10. Start the while loop and test the condition. If false, don't execute the statement block. Print the value of $firstVar. Increment $firstVar. Jump back to the start of the statement block and test the condition again. Print the value of $firstVar.

    $firstVar = 10;
    while ($firstVar < 2) {
        print("inside: firstVar = $firstVar\n");
        $firstVar++;
    }
    print("outside: firstVar = $firstVar\n");

This program displays:

    outside: firstVar = 10

This example shows that the statement block is never evaluated if the condition is false when the while loop starts. Of course, it's more common to use while loops that actually execute the statement block, like the following:

Initialize $firstVar to 10. Start the while loop and test the condition. Print the value of $firstVar. Increment $firstVar. Jump back to the start of the statement block and test the condition again. Print the value of $firstVar.

    $firstVar = 10;
    while ($firstVar < 12) {
        print("inside: firstVar = $firstVar\n");
        $firstVar++;
    }
    print("outside: firstVar = $firstVar\n");

This program displays:

    inside: firstVar = 10
    inside: firstVar = 11
    outside: firstVar = 12

Note that the value of $firstVar ends up as 12, not 11 as you might expect from a casual look at the code. When $firstVar is still 11, the condition is true, so the statement block is executed again, which increments $firstVar to 12. The next time the condition is evaluated, it is false, and the loop ends with $firstVar equal to 12.

Example: Until Loops

Until loops are used to repeat a block of statements while some condition is false.
Like the while loop, the until loop comes in two forms. In the do...until loop, the condition is checked after the statements are executed. In the until loop, the condition is checked before the statements are executed.

The do...until loop has this syntax:

    do {
        STATEMENTS
    } until (CONDITION);

The until loop has this syntax:

    until (CONDITION) {
        STATEMENTS
    }

Again, the loop type you use depends on your needs at the time. Here is an example of the do...until loop.

Initialize $firstVar to 10. Start the do...until loop. Print the value of $firstVar. Increment $firstVar. Check the until condition; if false, jump back to the start of the statement block. Print the value of $firstVar.

    $firstVar = 10;
    do {
        print("inside: firstVar = $firstVar\n");
        $firstVar++;
    } until ($firstVar < 2);
    print("outside: firstVar = $firstVar\n");

This program displays:

    inside: firstVar = 10
    inside: firstVar = 11
    inside: firstVar = 12
    inside: firstVar = 13
    inside: firstVar = 14
    ...

This loop continues forever because the condition can never be true. $firstVar starts out greater than 2 and is incremented inside the loop. Therefore, this is an endless loop.

Tip
If you ever find it hard to understand a conditional expression in a loop statement, try the following: Wrap the entire condition expression inside parentheses and add == 1 to the right-hand side. The above loop then becomes

    do {
        ...
    } until (($firstVar < 2) == 1);

This example also shows that the statement block of a do...until loop is executed before the condition is tested for the first time, so the block always runs at least once. The next example shows the until loop in action. This loop does not execute the statement block if the conditional expression is true when the loop starts.

Initialize $firstVar to 10. Start the until loop and test the condition. If true, don't execute the statement block. Print the value of $firstVar. Increment $firstVar.
Jump back to the start of the statement block and test the condition again. Print the value of $firstVar.

    $firstVar = 10;
    until ($firstVar < 20) {
        print("inside: firstVar = $firstVar\n");
        $firstVar++;
    }
    print("outside: firstVar = $firstVar\n");

This program displays:

    outside: firstVar = 10

This example shows that the statement block is never evaluated if the condition is true when the until loop starts. Here is another example of an until loop that shows the statement block getting executed:

Initialize $firstVar to 10. Start the until loop and test the condition. Print the value of $firstVar. Increment $firstVar. Jump back to the start of the statement block and test the condition again. Print the value of $firstVar.

    $firstVar = 10;
    until ($firstVar > 12) {
        print("inside: firstVar = $firstVar\n");
        $firstVar++;
    }
    print("outside: firstVar = $firstVar\n");

This program displays:

    inside: firstVar = 10
    inside: firstVar = 11
    inside: firstVar = 12
    outside: firstVar = 13

Example: For Loops

One of the most common tasks in programming is looping a specific number of times. Whether you need to execute a certain function for every customer in your database or print a page in a report, you can use the for loop. Its syntax is:

    for (INITIALIZATION; CONDITION; INCREMENT/DECREMENT) {
        STATEMENTS
    }

The initialization expression is executed first, before the looping starts. It can be used to initialize any variables that are used inside the loop. Of course, this could be done on the line before the for loop. However, including the initialization inside the for statement helps identify the loop variables.

When initializing variables, be sure not to confuse the equality operator (==) with the assignment operator (=). The following is an example of what this error could look like:

    for ($index == 0; $index < 0; $index++)

One of the equal signs should be removed. If you think you are having a problem with a for loop, make sure to check the operators.
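To see exactly when each part of the for statement runs, you can rewrite a for loop as a while loop with a continue block. Perl's own perlsyn documentation describes this same correspondence. In the sketch below, the variable names are arbitrary. The extra bare block around the while loop keeps its $i local, much as the for loop's my $i is local to that loop.

```perl
use strict;
use warnings;

# A for loop...
my @fromFor;
for (my $i = 0; $i < 5; $i++) {
    push(@fromFor, $i);
}

# ...and a while loop written to behave the same way.
my @fromWhile;
{
    my $i = 0;                # INITIALIZATION: runs once, before the loop
    while ($i < 5) {          # CONDITION: tested before every pass
        push(@fromWhile, $i);
    } continue {
        $i++;                 # INCREMENT: runs after every pass
    }
}

print("for:   @fromFor\n");
print("while: @fromWhile\n");
```

Both loops collect 0 through 4. The condition is false as soon as $i reaches 5, so 5 is never pushed.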
The condition expression determines whether the loop should continue or end. It is evaluated before each pass through the loop, and when it evaluates to false, the loop ends. The increment/decrement expression modifies the loop variables in some way each time the code block has been executed.

Here is an example of a basic for loop:

Start the for loop by initializing $firstVar to zero. The $firstVar variable will be incremented each time the statement block is executed. The statement block will be executed as long as $firstVar is less than 100. Print the value of $firstVar each time through the loop.

    for ($firstVar = 0; $firstVar < 100; $firstVar++) {
        print("inside: firstVar = $firstVar\n");
    }

This program will display:

    inside: firstVar = 0
    inside: firstVar = 1
    ...
    inside: firstVar = 98
    inside: firstVar = 99

This program will display the numbers 0 through 99. When the loop is over, $firstVar will be equal to 100.

For loops also can be used to count backwards.

Start the for loop by initializing $firstVar to 100. The $firstVar variable will be decremented each time the statement block is executed. The statement block will be executed as long as $firstVar is greater than 0. Print the value of $firstVar each time through the loop.

    for ($firstVar = 100; $firstVar > 0; $firstVar--) {
        print("inside: firstVar = $firstVar\n");
    }

This program will display:

    inside: firstVar = 100
    inside: firstVar = 99
    ...
    inside: firstVar = 2
    inside: firstVar = 1

You can use the comma operator to evaluate two expressions at once in the initialization and the increment/decrement expressions.

Start the for loop by initializing $firstVar to 100 and $secondVar to 0. The $firstVar variable will be decremented and $secondVar will be incremented each time the statement block is executed. The statement block will be executed as long as $firstVar is greater than 0. Print the value of $firstVar and $secondVar each time through the loop.
    for ($firstVar = 100, $secondVar = 0;
         $firstVar > 0;
         $firstVar--, $secondVar++) {
        print("inside: firstVar = $firstVar secondVar = $secondVar\n");
    }

This program will display:

    inside: firstVar = 100 secondVar = 0
    inside: firstVar = 99 secondVar = 1
    ...
    inside: firstVar = 2 secondVar = 98
    inside: firstVar = 1 secondVar = 99

Note
The comma operator lets you use two expressions where Perl would normally let you have only one. The value of the whole expression becomes the value of the last expression evaluated.

A more common use of the comma operator might be to initialize some flag variables that you expect the loop to change. The next example reads the first 50 lines of a file. If the end of the file is reached before the last line is read, the $endOfFile flag variable will be set to 1. Here, readLine() stands for a function, not shown, that returns 0 when the end of the file is reached.

Start the for loop by initializing the end-of-file flag variable to zero to indicate false, then set $firstVar to 0. The $firstVar variable will be incremented each time the statement block is executed. The statement block will be executed as long as $firstVar is less than 50. Each time through the loop, read a line, and set $endOfFile to 1 if readLine() returns zero.

    for ($endOfFile = 0, $firstVar = 0; $firstVar < 50; $firstVar++) {
        if (readLine() == 0) {
            $endOfFile = 1;
        }
    }

If the $endOfFile variable is 1 when the loop ends, then you know the file has fewer than 50 lines.

Example: Foreach Loops

Arrays are so useful that Perl provides a special form of the for statement just for them. The foreach statement is used to iterate over the elements of an array (or any other list). It is very handy for finding the largest element, printing the elements, or simply seeing if a given value is a member of an array.

    foreach LOOP_VAR (ARRAY) {
        STATEMENTS
    }

The loop variable is assigned the value of each array element in turn, until the end of the array is reached. Let's see how to use the foreach statement to find the largest array element.
Call the max() function twice with different parameters each time. Define the max() function. Create a local variable, $max, then get the first element from the parameter array. Loop through the parameter array, comparing each element to $max and replacing $max if the current element is greater. Return the value of $max.

    print max(45..121, 12..23) . "\n";
    print max(23..34, 356..564) . "\n";

    sub max {
        my($max) = shift(@_);
        foreach $temp (@_) {
            $max = $temp if $temp > $max;
        }
        return($max);
    }

This program displays:

    121
    564

There are several important things buried in this example. The first is the use of the shift() function to initialize a local variable and remove the first element of the parameter array at the same time. If you use shift() all by itself, the value of the first element is lost.

The second is the use of $temp inside the foreach loop. Some Perl programmers dislike using temporary variables in this manner. Perl has an internal variable, $_, that can be used instead. If no loop variable is specified, $_ will be assigned the value of each array element as the loop iterates.

Call the max() function twice and print each return value. Define the max() function. Create a local variable, $max, then get the first element from the parameter array. Loop through the parameter array, comparing each element to $max and replacing $max if the current element is greater. Return the value of $max.

    print max(45..121, 12..23) . "\n";
    print max(23..34, 356..564) . "\n";

    sub max {
        my($max) = shift(@_);
        foreach (@_) {
            $max = $_ if $_ > $max;
        }
        return($max);
    }

The third item has nothing to do with the foreach loop, at least not directly. But this seems like a good time to mention it. The statement inside the loop also could be written in the following way, with the sense of the operator reversed:

    $max = $_ if $max < $_;

However, notice that it takes more effort to understand what the statement, as a whole, is doing.
The reader of your program knows that the function is looking for the greatest value in a list. If the less-than operator is used, it contradicts the stated purpose of your function, at least until the reader figures out the program logic. Whenever possible, structure your program logic to agree with the main premise of the function.

Now for the fourth, and final, item regarding this small program. Notice that the function name and the local variable name are the same except for the beginning dollar sign. This shows that function names and variable names use different namespaces. Remember namespaces? They were mentioned in Chapter 3 "Variables."

Using the foreach statement requires a little caution, because the loop variable (either $_ or the one you specify) accesses the array elements using the call by reference scheme: the loop variable is an alias for the current element, not a copy of it. When call by reference is used, changing the value in one place (such as inside the loop) also changes the value in the main program.

Create an array holding the numbers 1 to 10 with 5 repeated. Print the array. Loop through the array, replacing any elements equal to 5 with "**". Print the array.

    @array = (1..5, 5..10);
    print("@array\n");

    foreach (@array) {
        $_ = "**" if ($_ == 5);
    }
    print("@array\n");

This program displays:

    1 2 3 4 5 5 6 7 8 9 10
    1 2 3 4 ** ** 6 7 8 9 10

Caution: If you use the foreach loop to change the value of the array elements, be sure to comment your code to explain the situation and why this method was used.

Jump Keywords

Perl has four keywords that let you change the flow of your programs. Table 7.1 lists the keywords along with a short description.

Table 7.1 Perl's Jump Keywords

    Keyword   Description
    -------   ------------------------------------------------------
    last      Jumps out of the current statement block.
    next      Skips the rest of the statement block and continues
              with the next iteration of the loop.
    redo      Restarts the statement block.
    goto      Jumps to a specified label.

Each of these keywords is described further in its own section, which follows.
Example: The last Keyword

The last keyword is used to exit from a statement block. This ability is useful if you are searching an array for a value: when the value is found, you can stop the loop early.

Create an array holding all 26 letters. Use a for loop to iterate over the array. The index variable will start at zero and increment while it is less than the number of elements in the array. Test the array element to see if it is equal to "T." Notice that the string equality operator is used. If the array element is "T," then exit the loop.

    @array = ("A".."Z");
    for ($index = 0; $index < @array; $index++) {
        if ($array[$index] eq "T") {
            last;
        }
    }
    print("$index\n");

This program displays:

    19

This loop is straightforward except for the way that it calculates the number of elements in the array. Inside the conditional expression, the @array variable is evaluated in a scalar context, and the result is the number of elements in the array.

When the last keyword is executed, the conditional expression and the increment/decrement expression are not reevaluated; the statement block is simply left. Execution begins again immediately after the ending curly brace.

You also can use a label with the last keyword to indicate which loop to exit. A label is a name followed by a colon. Label names usually use all capital letters, but Perl does not insist on it. When you need to exit a nested loop, labels are a big help. Let's look at this situation in two steps. Here is a basic loop:

Loop from 0 to 9 using $index as the loop variable. If $index is equal to 5, then exit the loop. Print the value of $index while inside the loop. Print the value of $index after the loop ends.

    for ($index = 0; $index < 10; $index++) {
        if ($index == 5) {
            last;
        }
        print("loop: index = $index\n");
    }
    print("index = $index\n");

This program displays:

    loop: index = 0
    loop: index = 1
    loop: index = 2
    loop: index = 3
    loop: index = 4
    index = 5

So far, pretty simple.
The print statement inside the loop lets us know that the $index variable is being incremented. Now, let's add an inner loop to complicate things.

Specify a label called OUTER_LOOP. Loop from 0 to 9 using $index as the loop variable. If $index is equal to 5, then exit the loop. Start an inner loop that repeats while $index is less than 10. If $index is 4, then exit out of both the inner and outer loops. Print the value of $index, then increment it. After the inner loop, print the value of $index again. After both loops end, print the value of $index.

    OUTER_LOOP:
    for ($index = 0; $index < 10; $index++) {
        if ($index == 5) {
            last;
        }
        while ($index < 10) {
            if ($index == 4) {
                last OUTER_LOOP;
            }
            print("inner: index = $index\n");
            $index++;
        }
        print("outer: index = $index\n");
    }
    print("index = $index\n");

This program displays:

    inner: index = 0
    inner: index = 1
    inner: index = 2
    inner: index = 3
    index = 4

The inner while loop increments $index while it is less than 10. However, before it can reach 10 it must pass 4, which triggers the if statement and exits both loops. You can tell that the outer loop also was exited because the outer print statement is never executed.

Example: The next Keyword

The next keyword lets you skip the rest of the statement block and start the next iteration. One use of this behavior is to select specific array elements for processing and ignore the rest. For example:

Create an array of 10 elements. Print the array. Iterate over the array, ignoring the elements at index 3 and index 5. Change every other element to an asterisk. Print the array to verify that it has been changed.

    @array = (0..9);
    print("@array\n");
    for ($index = 0; $index < @array; $index++) {
        if ($index == 3 || $index == 5) {
            next;
        }
        $array[$index] = "*";
    }
    print("@array\n");

This program displays:

    0 1 2 3 4 5 6 7 8 9
    * * * 3 * 5 * * * *

This example changes every array element, except the ones at index 3 and index 5 (the fourth and sixth elements), to asterisks regardless of their former values.
The next keyword forces Perl to skip over the assignment statement and go directly to the increment/decrement expression.

You also can use the next keyword in nested loops.

Define a label called OUTER_LOOP. Start a for loop that iterates from 0 to 2 using $row as the loop variable. Start a for loop that iterates from 0 to 2 using $col as the loop variable. Display the values of $row and $col and mention that the code is inside the inner loop. If $col is equal to 1, start the next iteration of the loop labeled OUTER_LOOP. Display the values of $row and $col and mention that the code is inside the outer loop.

    OUTER_LOOP:
    for ($row = 0; $row < 3; $row++) {
        for ($col = 0; $col < 3; $col++) {
            print("inner: $row,$col\n");
            if ($col == 1) {
                next OUTER_LOOP;
            }
        }
        print("outer: $row,$col\n\n");
    }

This program displays:

    inner: 0,0
    inner: 0,1
    inner: 1,0
    inner: 1,1
    inner: 2,0
    inner: 2,1

You can see that the next statement in the inner loop causes Perl to skip the print statement in the outer loop whenever $col is equal to 1.

Example: The redo Keyword

The redo keyword causes Perl to restart the current statement block. Neither the increment/decrement expression nor the conditional expression is evaluated before restarting the block. This keyword is usually used when getting input from outside the program, either from the keyboard or from a file. It is essential that the conditions that caused the redo statement to execute can change; otherwise, an endless loop occurs.

This example demonstrates the redo keyword with some keyboard input:

Start a statement block. Print a prompt asking for a name. Read a string from the keyboard; control is returned to the program when the user presses the Enter key. Remove the newline character from the end of the string. If the string has zero length, the user simply pressed the Enter key without entering a name, so display an error message and redo the statement block.
Print a thank-you message with the name in uppercase characters.

    {
        print("What is your name? ");
        $name = <STDIN>;
        chomp($name);

        if (! length($name)) {
            print("Msg: Zero length input. Please try again\n");
            redo;
        }

        print("Thank you, " . uc($name) . "\n");
    }

Tip: The statement block in this example acts like a loop that runs only once. You can use any of the jump keywords inside such a statement block.

The redo statement helps you to have more straightforward program flow. Without it, you would need to use a do...until loop. For example:

Start a do...until statement. Print a prompt asking for a name. Read a string from the keyboard; control is returned to the program when the user presses the Enter key. Remove the newline character from the end of the string. If the string has zero length, the user simply pressed the Enter key without entering a name, so display an error message. Evaluate the conditional expression. If it is true, the user entered a name and the loop can end. Print a thank-you message with the name in uppercase characters.

    do {
        print("What is your name? ");
        $name = <STDIN>;
        chomp($name);

        if (! length($name)) {
            print("Msg: Zero length input. Please try again\n");
        }
    } until (length($name));

    print("Thank you, " . uc($name) . "\n");

The do...until loop is less efficient because the length of $name needs to be tested twice. Because Perl has so many ways to do any given task, it pays to think about which method is more efficient before implementing your ideas.

Example: The goto Keyword

The goto statement lets your program jump directly to any label. However, because Perl also provides the loop statements and the other jump keywords, most programmers look down on its use. Using goto frequently makes your program logic convoluted. If you write a program that you feel needs a goto in order to run, then use it, but first try to restructure the program to avoid it.
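As a small illustration of that advice, a retry that might tempt you to write "RETRY: ... goto RETRY;" can usually be expressed with a bare statement block plus the redo and last keywords instead. This is a minimal sketch, not a definitive pattern: the nextInput() function and the @inputs list are hypothetical stand-ins for reading <STDIN>, used so the example runs without a keyboard, and the sketch uses use strict and my variables, which the listings in this chapter do not.

```perl
use strict;
use warnings;

# Hypothetical stand-in for keyboard input: each call to nextInput()
# returns the next simulated line, or undef once the list runs out.
my @inputs = ("\n", "abc\n", "42\n");
sub nextInput { return shift(@inputs); }

# A retry written with redo and last instead of goto. The bare block
# acts as a loop that runs once unless redo restarts it.
sub askForNumber {
    my $answer;
    {
        $answer = nextInput();
        last unless defined($answer);      # no more input: give up
        chomp($answer);
        redo unless $answer =~ /^\d+$/;    # not a whole number: ask again
    }
    return $answer;
}

my $number = askForNumber();
print(defined($number) ? "Got $number\n" : "No valid input\n");
```

The empty line and "abc" are rejected and retried, so the sketch prints "Got 42". Because the retry is confined to one block, a reader can see at a glance where control can jump, which is the main argument against goto.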
Summary

This chapter was devoted to learning about three types of statements: decision, loop, and jump. Decision statements use the if keyword to execute a statement block depending on the evaluation of conditional expressions. Loop statements also execute a statement block based on a given condition, but they repeatedly execute the block until the condition is true or while the condition is true. Jump statements are used to restart statement blocks, skip to the next iteration in a loop, and exit loops prematurely.

The if statement can be used with an else clause to choose one of two statement blocks to execute. Or, you can use the elsif clause to choose from among more than two statement blocks.

Both the while and until loop statements have two forms. One form (the do... form) executes a statement block and then tests a conditional expression, and the other form tests the condition before executing the statement block.

The for loops are the most complicated type of loop because they involve three expressions in addition to a statement block: an initialization expression, a conditional expression, and an increment/decrement expression. The initialization expression is evaluated first, then the conditional expression. If the conditional expression is true, the statement block is executed. Next, the increment/decrement expression is evaluated and the loop starts again with the conditional expression.

Foreach loops are used to iterate through an array. Each element in the array is assigned to a local variable as the loop progresses through the array. If you don't specify a local variable, Perl uses the $_ special variable. You need to be careful when changing the value of the local variable because it uses the call by reference scheme. Therefore, any change to the local variable is reflected in the value of the array element outside the foreach loop.

The last keyword is used to jump out of the current statement block.
The next keyword is used to skip the rest of the statement block and continue to the next iteration of the loop. The redo keyword is used to restart the statement block. And finally, the goto keyword should be avoided because the other jump keywords are more descriptive. All of the jump keywords can be used with labels, so they can be used inside nested loops.

Review Questions

Answers to Review Questions are in Appendix A.

 1. What are the four loop keywords?
 2. What are the four jump keywords?
 3. Which form of the until statement is used when the statement block needs to be executed at least once?
 4. What will be displayed when this program executes?

        $firstVar = 5;
        {
            if ($firstVar > 10) {
                last;
            }
            $firstVar++;
            redo;
        }
        print("$firstVar\n");

 5. What is the default name of the local variable in the foreach loop?
 6. How is the next keyword different from the redo keyword?
 7. Why is the comma operator useful in the initialization expression of a for loop?
 8. What is the shift() function used for?

Review Exercises

 1. Use the while loop in a program to count from 1 to 100 in steps of 5.
 2. Use the for loop in a program to print each number from 55 to 1.
 3. Use an until loop, the next statement, and the modulus operator to loop from 0 to 100 and print out "AAA" every sixteenth iteration.
 4. Use the foreach loop to determine the smallest element in an array.
 5. Use a for loop to iterate over an array and multiply each element by 3.
 6. Use a do...until loop and the each() function to iterate over an associative array looking for a value equal to "AAA." When the element is found, the loop should be ended.
Chapter 8 References

CONTENTS
 * Reference Types
    + Example: Passing Parameters to Functions
    + Example: The ref() Function
    + Example: Creating a Data Record
    + Example: Interpolating Functions Inside Double-Quoted Strings
 * Summary
 * Review Questions
 * Review Exercises

A reference is a scalar value that points to a memory location that holds some type of data. Everything in your Perl program is stored inside your computer's memory; therefore, all of your variables and functions are located at some memory location. References are used to hold these memory addresses. When a reference is dereferenced, you retrieve the information referred to by the reference.

Reference Types

There are six types of references. A reference can point to a scalar, an array, a hash, a glob, a function, or another reference. Table 8.1 shows how the different types are valued with the assignment operator and how to dereference them using curly braces.

Note: I briefly mentioned hashes in Chapter 3 "Variables." Just to refresh your memory, hashes are another name for associative arrays. Because "hash" is shorter than "associative array," I'll be using both terms in this chapter.

Table 8.1 The Six Types of References

    Reference Assignment          How to Dereference
    ----------------------------  ---------------------------------------------
    $refScalar = \$scalar;        ${$refScalar} is a scalar value.
    $refArray = \@array;          @{$refArray} is an array value.
    $refHash = \%hash;            %{$refHash} is a hash value.
    $refglob = \*file;            Glob references are beyond the scope of this
                                  book, but a short example can be found at
                                  http://www.mtolive.com/pbc/ch08.htm#Josh Purinton.
    $refFunction = \&function;    &{$refFunction} is a function location.
    $refRef = \$refScalar;        ${${$refRef}} is a scalar value.

Essentially, all you need to do in order to create a reference is to add the backslash to the front of a value or variable.

Example: Passing Parameters to Functions

Back in Chapter 5 "Functions," we talked about passing parameters to functions. At the time, we were not able to pass more than one array to a function. This was because functions see only one array (the @_ array) when looking for parameters. References can be used to overcome this limitation.

Let's start off by passing two arrays into a function to show that the function sees only one array.

Call firstSub() with two arrays as parameters. Define the firstSub() function. Create local variables and assign elements from the parameter array to them. Print the local arrays.

    firstSub( (1..5), ("A".."E") );

    sub firstSub {
        my(@firstArray, @secondArray) = @_ ;

        print("The first array is @firstArray.\n");
        print("The second array is @secondArray.\n");
    }

This program displays:

    The first array is 1 2 3 4 5 A B C D E.
    The second array is .

Inside the firstSub() function, the @firstArray variable was assigned the entire parameter array, leaving nothing for the @secondArray variable. By passing references to the arrays instead of the arrays themselves, we can preserve the arrays for use inside the function. Very few changes are needed to enable the above example to use references. Take a look.

Call firstSub() using the backslash operator to pass a reference to each array. Define the firstSub() function. Create two local scalar variables to hold the array references. Print the local variables, dereferencing them to look like arrays.
This is done using the @{} notation.

    @arrayOne = (1..5);
    @arrayTwo = ("A".."E");

    firstSub( \@arrayOne, \@arrayTwo );                        # One

    sub firstSub {
        my($ref_firstArray, $ref_secondArray) = @_ ;           # Two

        print("The first array is @{$ref_firstArray}.\n");     # Three
        print("The second array is @{$ref_secondArray}.\n");   # Three
    }

This program displays:

    The first array is 1 2 3 4 5.
    The second array is A B C D E.

Three things were done to make this example use references:

 1. In the line marked "One," backslashes were added to indicate that a reference to each array should be passed. (The arrays are stored in named variables first; putting a backslash in front of a parenthesized list such as \(1..5) would instead produce a list of references to the individual elements.)
 2. In the line marked "Two," the references were taken from the parameter array and assigned to scalar variables.
 3. In the lines marked "Three," the scalar values were dereferenced. Dereferencing means that Perl will use the reference as if it were a normal data type; in this case, an array variable.

Example: The ref() Function

Using references to pass arrays into a function worked well and it was easy, wasn't it? However, what happens if you pass a scalar reference to the firstSub() function instead of an array reference? Listing 8.1 shows how passing a scalar reference when the function demands an array reference causes problems.

Call firstSub() and pass a reference to a scalar and a reference to an array. Define the firstSub() function. Create two local scalar variables to hold the array references. Print the local variables, dereferencing them to look like arrays.
Listing 8.1 08LST01.PL-Passing a Scalar Reference When the Function Demands an Array Reference Causes Problems

    # Pass a scalar reference where an array reference is expected.
    @array = ("A".."E");

    firstSub( \10, \@array );

    sub firstSub {
        my($ref_firstArray, $ref_secondArray) = @_ ;

        print("The first array is @{$ref_firstArray}.\n");
        print("The second array is @{$ref_secondArray}.\n");
    }

This program displays:

    Not an ARRAY reference at 08lst01.pl line 9.

Perl provides the ref() function so that you can check the reference type before dereferencing a reference. The next example shows how to trap the mistake of passing a scalar reference instead of an array reference.

Call firstSub() and pass a reference to each variable. Define the firstSub() function. Create two local scalar variables to hold the array references. Print each local variable only if it is a reference to an array; otherwise, print nothing.

Listing 8.2 shows how to test for an array reference passed as a parameter.

Listing 8.2 08LST02.PL-How to Test for an Array Reference Passed as a Parameter

    @array = ("A".."E");

    firstSub( \10, \@array );

    sub firstSub {
        my($ref_firstArray, $ref_secondArray) = @_ ;

        print("The first array is @{$ref_firstArray}.\n")
            if (ref($ref_firstArray) eq "ARRAY");             # One

        print("The second array is @{$ref_secondArray}.\n")
            if (ref($ref_secondArray) eq "ARRAY");            # Two
    }

This program displays:

    The second array is A B C D E.

Only the second parameter is printed because the first parameter, the scalar reference, failed the test on the line marked "One."
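Rather than silently skipping a bad parameter, a function can use ref() to report exactly what it received. The sketch below is one possible approach, not the book's method: describeArrayRef() is an illustrative name rather than a built-in, and it returns an error message instead of dying so that the caller decides what to do. It relies only on ref() returning an empty string for a non-reference and a type name such as SCALAR, ARRAY, or HASH otherwise.

```perl
use strict;
use warnings;

# describeArrayRef() is an illustrative name, not a built-in. It checks
# the reference type with ref() before dereferencing, and returns an
# error message instead of dying or silently printing nothing.
sub describeArrayRef {
    my ($label, $ref) = @_;
    my $type = ref($ref);    # "" (false) when $ref is not a reference

    if ($type ne "ARRAY") {
        my $got = ($type eq "") ? "a non-reference" : "a $type reference";
        return "Error: $label expected an ARRAY reference, got $got.";
    }
    return "$label is @{$ref}.";
}

my @letters = ("A".."E");
print(describeArrayRef("The first array", \10), "\n");
print(describeArrayRef("The second array", \@letters), "\n");
```

Run as shown, the first call reports that it received a SCALAR reference and the second prints "The second array is A B C D E."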
The statement modifiers on the lines marked "One" and "Two" in Listing 8.2 ensure that we are dereferencing an array reference. This prevents the error message that appeared earlier. Of course, in your own programs you might want to set an error flag or print a warning. For more information about statement modifiers, see Chapter 6 "Statements."

Table 8.2 shows some values that the ref() function can return.

Table 8.2 Using the ref() Function

    Function Call            Return Value
    -----------------------  ------------------------------------
    ref( 10 );               "" (an empty string, meaning false)
    ref( \10 );              SCALAR
    ref( {1 => "Joe"} );     HASH
    ref( \&firstSub );       CODE
    ref( \\10 );             REF

Listing 8.3 shows another example of the ref() function in action.

Initialize scalar, array, and hash variables. Pass the variables to the printRef() function. These are non-references, so ref() should return an empty string (a false value) for each of them. Pass variable references to the printRef() function; this is accomplished by prefixing the variable names with a backslash. Pass a function reference and a reference to a reference to the printRef() function. Define the printRef() function. Iterate over the parameter array. Assign the reference type to $refType. If the current parameter is a reference, then print its reference type; otherwise, print that it's a non-reference.

Listing 8.3 08LST03.PL-Using the ref() Function to Determine the Reference Type of a Parameter

    $scalar = 10;
    @array  = (1, 2);
    %hash   = ( "1" => "Davy Jones" );

    # I added extra spaces around the parameter list
    # so that the backslashes are easier to see.
    printRef( $scalar, @array, %hash );
    printRef( \$scalar, \@array, \%hash );
    printRef( \&printRef, \\$scalar );

    # print the reference type of every parameter.
    sub printRef {
        foreach (@_) {
            $refType = ref($_);
            print($refType ? "$refType " : "Non-reference ");
        }
        print("\n");
    }

This program displays:

    Non-reference Non-reference Non-reference Non-reference Non-reference
    SCALAR ARRAY HASH
    CODE REF

The first line shows five non-references rather than three because @array and %hash are flattened into their individual elements when they are passed to printRef().

By using the ref() function you can protect program code that dereferences variables from producing errors when the wrong type of reference is used.

Example: Creating a Data Record

Perl's associative arrays (hashes) are extremely useful when it comes to storing information in a way that facilitates easy retrieval. For example, you could store customer information like this:

    %record = ( "Name"    => "Jane Hathaway",
                "Address" => "123 Anylane Rd.",
                "Town"    => "AnyTown",
                "State"   => "AnyState",
                "Zip"     => "12345-1234"
    );

The %record associative array also can be considered a data record with five members. Each member is a single item of information. The data record is a group of members that relates to a single topic; in this case, that topic is a customer address. A database is one or more data records.

Each member is accessed in the record by using its name as the key. For example, you can access the state member by saying $record{"State"}. In a similar manner, all of the members can be accessed.

Of course, a database with only one record is not very useful. By using references, you can build a multiple-record array. Listing 8.4 shows two records and how to initialize a database array.

Declare a data record called %recordOne as an associative array. Declare a data record called %recordTwo as an associative array. Declare an array called @database with references to the associative arrays as elements.
Listing 8.4 08LST04.PL-A Database with Two Records

    %recordOne = ( "Name"    => "Jane Hathaway",
                   "Address" => "123 Anylane Rd.",
                   "Town"    => "AnyTown",
                   "State"   => "AnyState",
                   "Zip"     => "12345-1234"
    );

    %recordTwo = ( "Name"    => "Kevin Hughes",
                   "Address" => "123 Allways Dr.",
                   "Town"    => "AnyTown",
                   "State"   => "AnyState",
                   "Zip"     => "12345-1234"
    );

    @database = ( \%recordOne, \%recordTwo );

You can print the address member of the first record like this:

    print( $database[0]->{"Address"} . "\n");

which displays:

    123 Anylane Rd.

(Older Perl code often wrote this expression as %{$database[0]}->{"Address"}. That form was deprecated and is a fatal error in Perl 5.22 and later, so the arrow form is used here.)

Let's dissect the dereferencing expression in this print statement. Remember to work left to right and always evaluate brackets and parentheses first. Ignoring the print() function and the newline, you can evaluate this line of code in the following way:

 * The innermost bracket is [0], which means that we'll be looking at the first element of an array.
 * The square bracket operators have a left to right associativity, so we look left for the name of the array. The name of the array is database. Its first element, $database[0], holds a reference to the %recordOne hash.
 * The -> is the infix dereference operator. It tells Perl that the reference on the left ($database[0] in this case) is to be dereferenced using the subscript on the right.
 * The 'thing' on the right is the key value, "Address." Notice that it is inside curly braces exactly as if a regular hash key were being used; the curly braces tell Perl that the reference is a hash reference.

The variable declaration in the above example uses three variables to define the data's structure. We can condense the declaration down to one variable as shown in Listing 8.5.
Declare an array called @database with two associative arrays as elements. Because the associative arrays are not being assigned directly to a variable, they are considered anonymous. Print the value associated with the "Name" key for the first element of the @database array. Print the value associated with the "Name" key for the second element of the @database array.

Listing 8.5 08LST05.PL-Declaring the Database Structure in One Shot

    @database = (
        { "Name"    => "Jane Hathaway",
          "Address" => "123 Anylane Rd.",
          "Town"    => "AnyTown",
          "State"   => "AnyState",
          "Zip"     => "12345-1234"
        },
        { "Name"    => "Kevin Hughes",
          "Address" => "123 Allways Dr.",
          "Town"    => "AnyTown",
          "State"   => "AnyState",
          "Zip"     => "12345-1234"
        }
    );

    print($database[0]->{"Name"} . "\n");
    print($database[1]->{"Name"} . "\n");

This program displays:

    Jane Hathaway
    Kevin Hughes

Let's analyze the dereferencing code in the first print line.

 * The innermost bracket is [0], which means that we'll be looking at the first element of an array.
 * The square bracket operators have a left to right associativity, so we look left for the name of the array. The name of the array is database, and its first element holds a reference to an anonymous hash.
 * The -> is the infix dereference operator. It tells Perl that the reference on the left ($database[0] in this case) is to be dereferenced using the subscript on the right.
 * The 'thing' on the right is the key value, "Name." Notice that it is inside curly braces exactly as if a regular hash key were being used.
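Because every element of @database is a reference to an anonymous hash, a loop can visit each record in turn. This is a minimal sketch, assuming records trimmed to two members each for brevity; unlike the listings, it uses use strict and lexical my variables.

```perl
use strict;
use warnings;

# Two records trimmed to two members each, for brevity.
my @database = (
    { "Name" => "Jane Hathaway", "Town" => "AnyTown" },
    { "Name" => "Kevin Hughes",  "Town" => "AnyTown" },
);

# Each element of @database is a hash reference, so the loop
# variable is dereferenced with the -> operator.
foreach my $record (@database) {
    print("$record->{'Name'} lives in $record->{'Town'}\n");
}

# map() collects the Name member of every record into a plain list.
my @names = map { $_->{"Name"} } @database;
print("Names: @names\n");
```

The loop variable $record holds the same reference that is stored in the array, so the expression $record->{'Name'} is equivalent to $database[0]->{"Name"} on the first pass.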
Even though the structure declarations in Listings 8.4 and 8.5 look different, they are equivalent. You can confirm this because the structures are dereferenced the same way. What's happening here? Perl is creating anonymous associative array references that become elements of the @database array. In Listing 8.4, each hash had a name: %recordOne and %recordTwo. In Listing 8.5, there is no variable name directly associated with the hashes. When you create an anonymous hash with curly braces, Perl automatically provides a reference to it.

We can explore the concepts of data records a bit further using this basic example. So far, we've used hash references as elements of an array. When one data type is stored inside of another data type, this is called nesting data types. You can nest data types as often and as deeply as you would like.

At this stage of the example, $database[0]->{"Name"} was used to dereference the "Name" member of the first record. This type of dereferencing uses an array subscript to tell Perl which record to look at. However, you could use an associative array to hold the records. With an associative array, you could look up the records using a customer number or other ID value. Listing 8.6 shows how this can be done.

Declare a hash called %database with two keys, MRD-100 and MRD-250. Each key has a reference to an anonymous hash as its value. Find the reference to the hash associated with the key "MRD-100," then print the value associated with the key "Name" inside that hash. Find the reference to the hash associated with the key "MRD-250," then print the value associated with the key "Name" inside that hash.
Listing 8.6 08LST06.PL-Using an Associative Array to Hold the Records

    %database = (
        "MRD-100" => { "Name"    => "Jane Hathaway",
                       "Address" => "123 Anylane Rd.",
                       "Town"    => "AnyTown",
                       "State"   => "AnyState",
                       "Zip"     => "12345-1234"
                     },
        "MRD-250" => { "Name"    => "Kevin Hughes",
                       "Address" => "123 Allways Dr.",
                       "Town"    => "AnyTown",
                       "State"   => "AnyState",
                       "Zip"     => "12345-1234"
                     }
    );

    print($database{"MRD-100"}->{"Name"} . "\n");
    print($database{"MRD-250"}->{"Name"} . "\n");

This program displays:

    Jane Hathaway
    Kevin Hughes

You should be able to follow the same steps that we used previously to decipher the print statements in this listing. The key difference is that the associative array index is surrounded by curly brackets instead of the square brackets used previously.

There is one more twist that I would like to show you using this data structure. Let's see how to dynamically add information. First, we'll look at adding an entire data record, and then we'll look at adding new members to an existing data record. Listing 8.7 shows how you can use a standard hash assignment to dynamically create a data record.

Assign a reference to a hash to the "MRD-300" key in the %database associative array. Assign the reference to the hash associated with the key "MRD-300" to the $refCustomer variable. Print the value associated with the key "Name" inside the hash referenced by $refCustomer. Print the value associated with the key "Address" inside the hash referenced by $refCustomer.
Listing 8.7 08LST07.PL-Creating a Record Using Hash Assignment

    $database{"MRD-300"} = {
        "Name"    => "Nathan Hale",
        "Address" => "999 Centennial Ave.",
        "Town"    => "AnyTown",
        "State"   => "AnyState",
        "Zip"     => "12345-1234"
    };

    $refCustomer = $database{"MRD-300"};
    print($refCustomer->{"Name"} . "\n");
    print($refCustomer->{"Address"} . "\n");

This program displays:

    Nathan Hale
    999 Centennial Ave.

Notice that by using a temporary variable ($refCustomer), the program code is more readable. The alternative would be this:

    print(${$database{"MRD-300"}}{"Name"} . "\n");

Most programmers would agree that using the temporary variable aids in the understanding of the program.

Our last data structure example shows how to add members to an existing customer record. Listing 8.8 shows how to add two phone number members to customer record MRD-300.

Assign a reference to an anonymous function to $codeRef. This function will print the elements of the %database hash. Because each value in the %database hash is a reference to another hash, the function has an inner loop to dereference the sub-hash. Assign a reference to a hash to the "MRD-300" key in the %database associative array. Call the anonymous routine by dereferencing $codeRef to print the contents of %database. This is done by surrounding the code reference variable with curly braces and prefixing it with a & to indicate that it should be dereferenced as a function. Assign the reference to the hash associated with the key "MRD-300" to the $refCustomer variable. Add "Home Phone" as a key to the hash associated with the "MRD-300" key. Add "Business Phone" as a key to the hash associated with the "MRD-300" key.
Call the anonymous routine by dereferencing $codeRef to print the contents of %database.
____________________________________________________________________________________________________________________
Listing 8.8 08LST08.PL-How to Dynamically Add Members to a Data Structure

$codeRef = sub {
    while (($key, $value) = each(%database)) {
        print("$key = {\n");
        while (($innerKey, $innerValue) = each(%{$value})) {
            print("\t$innerKey => $innerValue\n");
        }
        print("};\n\n");
    }
};

$database{"MRD-300"} = {
    "Name"    => "Nathan Hale",
    "Address" => "999 Centennial Ave.",
    "Town"    => "AnyTown",
    "State"   => "AnyState",
    "Zip"     => "12345-1234"
};

# print database before dynamic changes.
&{$codeRef};

$refCustomer = $database{"MRD-300"};
${$refCustomer}{"Home Phone"}     = "(111) 511-1322";
${$refCustomer}{"Business Phone"} = "(111) 513-4556";

# print database after dynamic changes.
&{$codeRef};
____________________________________________________________________________________________________________________

This program displays output like the following. Perl does not keep hash keys in any particular order, so the members may be listed in a different order on your system.

MRD-300 = {
        Town => AnyTown
        State => AnyState
        Name => Nathan Hale
        Zip => 12345-1234
        Address => 999 Centennial Ave.
};

MRD-300 = {
        Town => AnyTown
        State => AnyState
        Name => Nathan Hale
        Home Phone => (111) 511-1322
        Zip => 12345-1234
        Business Phone => (111) 513-4556
        Address => 999 Centennial Ave.
};

This example does two new things. First, it uses an anonymous function referenced by $codeRef. This is done only for illustration purposes; there is no need for an anonymous function here, and there are good reasons not to use them in normal programs. I think that anonymous functions make programs much harder to understand.

Note When helper functions are small and easily understood, I like to place them at the beginning of code files. This helps me quickly refresh my memory when I come back to the program code after spending time on other things.

Second, a regular hash assignment statement is used to add the new values.
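Because Perl keeps hash keys in no particular order, the report printed by $codeRef can list the members differently from one run to the next. The following is a minimal sketch of one way to get repeatable output, assuming Perl 5 with the strict and warnings pragmas. It replaces the anonymous function with a named subroutine (formatDatabase is a hypothetical name, not one of the chapter's listings) that sorts both the record ids and the member names. It also adds the phone numbers using arrow notation ($refCustomer->{"Home Phone"}), which Perl treats the same as ${$refCustomer}{"Home Phone"}.

```perl
use strict;
use warnings;

# One customer record, copied from Listing 8.8.
my %database = (
    "MRD-300" => {
        "Name"    => "Nathan Hale",
        "Address" => "999 Centennial Ave.",
        "Town"    => "AnyTown",
        "State"   => "AnyState",
        "Zip"     => "12345-1234",
    },
);

# Add two members to the existing record, as Listing 8.8 does, but with
# arrow notation: $refCustomer->{"Home Phone"} means the same thing as
# ${$refCustomer}{"Home Phone"}.
my $refCustomer = $database{"MRD-300"};
$refCustomer->{"Home Phone"}     = "(111) 511-1322";
$refCustomer->{"Business Phone"} = "(111) 513-4556";

# Hypothetical helper: builds the same report as $codeRef, but sorts the
# record ids and the member names so the output order never changes.
sub formatDatabase {
    my ($dbRef) = @_;                   # reference to a hash of records
    my $text = "";
    foreach my $key (sort keys %{$dbRef}) {
        my $record = $dbRef->{$key};    # reference to one customer record
        $text .= "$key = {\n";
        foreach my $member (sort keys %{$record}) {
            $text .= "\t$member => $record->{$member}\n";
        }
        $text .= "};\n\n";
    }
    return $text;
}

print(formatDatabase(\%database));
```

Sorting costs a little time on every call; for very large hashes you might sort only when the report is meant for people to read.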
You can use any of the array functions with these nested data structures.

Example: Interpolating Functions Inside Double-Quoted Strings

You can use references to force Perl to interpolate the return value of a function call inside double-quoted strings. This helps to reduce the number of temporary variables needed by your program.

Call the makeLine() function from inside a double-quoted string.
Define the makeLine() function.
Return the dash character repeated a specified number of times. The first element in the parameter array is the number of times to repeat the dash.

print("Here are 5 dashes ${\makeLine(5)}.\n");
print("Here are 10 dashes ${\makeLine(10)}.\n");

sub makeLine {
    return("-" x $_[0]);
}

This program displays:

Here are 5 dashes -----.
Here are 10 dashes ----------.

The trick in this example is that the backslash turns the scalar return value into a reference, and then the dollar sign and curly braces turn the reference back into a scalar value that the print() function can interpret correctly. If the backslash is left out, the ${} dereferencing operation has no reference to dereference. Perl then treats the returned string as the name of a variable; because no such variable has a value, you get a "Use of uninitialized value" warning when warnings are enabled (and a fatal error if the strict pragma is in effect).

Summary

In this chapter you learned about references. References are scalar variables that hold memory locations. When references are dereferenced, the actual value is returned. For example, after the assignment $refScalar = \10, the expression ${$refScalar} dereferences $refScalar and evaluates to 10. You can always create a reference to a value or variable by preceding it with a backslash. Dereferencing is accomplished by surrounding the reference variable in curly braces and preceding the left curly brace with a character denoting the type of reference. For example, use @ for arrays and & for functions.
There are five types of references that you can use in Perl.
You can have a reference to scalars, arrays, hashes, functions, and other references. If you need to determine what type of reference is passed to a function, use the ref() function. The ref() function returns a string that indicates which type of reference was passed to it. If the parameter is not a reference, ref() returns a false value (the empty string).

You discovered that it is always a good idea to check reference types to prevent errors caused by passing the wrong type of reference. An example was given that caused an error by passing a scalar reference when the function expected an array reference.

A lot of time was spent discussing data records and how to access information stored in them. You learned how to step through dissecting a dereferencing expression, how to dynamically add new data records to an associative array, and how to add new data members to an existing record.

The last thing covered in this chapter was how to interpolate function calls inside double-quoted strings. You'll sometimes use this technique to avoid using temporary variables when printing or concatenating the output of functions to other strings.

Chapter 9, "Using Files," introduces you to opening, reading, and writing files. You find out how to store the data records you've constructed in this chapter to a file for long-term storage.

Review Questions

Answers to Review Questions are in Appendix A.

1. What is a reference?
2. How many types of references are there?
3. What does the ref() function return if passed a non-reference as a parameter?
4. What notation is used to dereference a reference value?
5. What is an anonymous array?
6. What is a nested data structure?
7. What will the following line of code display?
   print("${\ref(\(1..5))}");
8. Using the %database array in Listing 8.6, what will the following line of code display?
   print(${$database{"MRD-100"}}{"Zip"} . "\n");

Review Exercises

1. Write a program that will print the dereferenced value of $ref in the following line of code:
   $ref = \\\45;
2. Write a function that removes the first element from each array passed to it. The return value of the function should be the number of elements removed from all arrays.
3. Add error-checking to the function written in Exercise 2 so that the undef value is returned if one of the parameters is not an array reference.
4. Write a program based on Listing 8.7 that adds a data member indicating which weekdays a salesman may call the customer with an id of MRD-300. Use the following as an example:
   "Best days to call" => ["Monday", "Thursday"]
____________________________________________________________________________________________________________________

Chapter 9 Using Files
____________________________________________________________________________________________________________________
CONTENTS
* Some Files Are Standard
  + Example: Using STDIN
  + Example: Using Redirection to Change STDIN and STDOUT
  + Example: Using the Diamond Operator (<>)
* File Test Operators
  + Example: Using File Tests
* File Functions
  + Example: Opening Files
  + Example: Binary Files
  + Example: Getting File Statistics
  + Example: Using the Directory Functions
  + Example: Printing Revisited
* Globbing
  + Example: Assigning a Glob to an Array
* Using Data Structures with Files
  + Example: Splitting a Record into Fields
* Summary
* Review Questions
* Review Exercises
____________________________________________________________________________________________________________________

If you've read the previous chapters and have executed some of the
programs, then you already know that a file is a series of bytes stored on a disk instead of inside the computer's memory. A file is good for long-term storage of information. Information in the computer's memory is lost when the computer is turned off. Information on a disk, however, is persistent; it will be there when the computer is turned back on.

Back in Chapter 1, "Getting Your Feet Wet," you saw how to create a file using the edit program that comes with Windows 95 and Windows NT. In this chapter, you'll see how to manipulate files with Perl.

There are four basic operations that you can perform on files: you can open them, read from them, write to them, and close them. Opening a file creates a connection between your program and the location on the disk where the file is stored. Closing a file shuts down that connection.

Every file has a unique fully qualified name so that it can't be confused with other files. The fully qualified name includes the name of the disk, the directory, and the file name. Files in different directories can have the same name because the operating system considers the directory name to be a part of the file name. Here are some fully qualified file names:

c:/windows/win95.txt
c:/windows/command/scandisk.ini
c:/a_long_directory_name/a_long_subdirectory_name/a_long_file_name.doc

Caution You may be curious to know if spaces can be used inside file names. Yes, they can. But if you use spaces, you need to surround the file name with quotes when referring to it from a DOS or UNIX command line.

Note It is very important that you check for errors when dealing with files. To simplify the examples in this chapter, little error checking is used in them. Error checking is discussed in Chapter 13, "Handling Errors and Signals."

Some Files Are Standard

In an effort to make programs more uniform, there are three connections that always exist when your program starts. These are STDIN, STDOUT, and STDERR.
Actually, these names are file handles. File handles are variables used to manipulate files. Just as you need to grab the handle of a hot pot before you can pick it up, you need a file handle before you can use a file. Table 9.1 describes the three file handles.

Table 9.1 The Standard File Handles

Name     Description
STDIN    Reads program input. Typically this is the computer's keyboard.
STDOUT   Displays program output. This is usually the computer's monitor.
STDERR   Displays program errors. Most of the time, it is equivalent to STDOUT, which means the error messages will be displayed on the computer's monitor.

You've been using the STDOUT file handle, without knowing it, in every print() statement in this book. The print() function uses STDOUT as the default if no other file handle is specified. Later in this chapter, in the "Example: Printing Revisited" section, you will see how to send output to a file instead of to the monitor.

Example: Using STDIN

Reading a line of input from the standard input, STDIN, is one of the easiest things that you can do in Perl. The following three-line program reads a line from the keyboard and then displays it. It keeps doing this until you press Ctrl+Z on DOS systems or Ctrl+D on UNIX systems.
____________________________________________________________________________________________________________________
Listing 9.1 09LST01.PL-Read from Standard Input Until an End-of-File Character Is Found

while (<STDIN>) {
    print();
}
____________________________________________________________________________________________________________________

The <> characters, when used together, are called the diamond operator. The diamond operator tells Perl to read a line of input from the file handle inside it, in this case STDIN. Later, you'll use the diamond operator to read from other file handles. In this example, the diamond operator assigned the value of the input string to $_.
Then, the print() function was called with no parameters, which tells print() to use $_ as the default parameter. Using the $_ variable can save a lot of typing, but I'll let you decide which is more readable. Here is the same program without using $_:

while ($inputLine = <STDIN>) {
    print($inputLine);
}

When you pressed Ctrl+Z or Ctrl+D, you told Perl that the input was finished. This caused the diamond operator to return the undefined value, which Perl treats as false, and the while loop ended.

In DOS and Windows, the character with the value 26 (Ctrl+Z) serves as the end-of-file indicator for keyboard input and, traditionally, for text files. UNIX handles Ctrl+D (value 4) differently: the terminal interprets Ctrl+D typed at the start of a line as end-of-file for keyboard input, but a byte with the value 4 stored inside a file has no special meaning.

Tip When a file is read using the diamond operator, the newline character that ends the line is kept as part of the input string. Frequently, you'll see the chop() function used to remove the newline. For instance, the statement chop($inputLine = <STDIN>); reads a line from the input file, assigns it to $inputLine, and then removes the last character from $inputLine, which is almost always a newline character. If the last character might not be a newline (the last line of a file, for example, may not end with one), use the chomp() function instead; chomp() removes a trailing newline only when one is present.

Example: Using Redirection to Change STDIN and STDOUT

DOS and UNIX let you change the standard input from the keyboard to a file by changing the command line that you use to execute Perl programs. Until now, you probably used a command line similar to:

perl -w 09lst01.pl

In the previous example, Perl read the keyboard to get the standard input. But if there were a way to tell Perl to use the file 09LST01.PL as the standard input, you could have the program print itself. Pretty neat, huh? Well, it turns out that you can change the standard input.
It's done this way:

perl -w 09lst01.pl < 09lst01.pl

The < character is used to redirect the standard input to the 09LST01.PL file. You now have a program that duplicates the functionality of the DOS type command. And it only took three lines of Perl code!

You can redirect standard output to a file using the > character. So, if you wanted a copy of 09LST01.PL to be sent to OUTPUT.LOG, you could use this command line:

perl -w 09lst01.pl <09lst01.pl >output.log

Keep this use of the < and > characters in mind. You'll be using them again shortly when we talk about the open() function. The < character will signify that a file should be opened for input, and the > character will signify an output file. But first, let's continue talking about accessing files listed on the command line.

Example: Using the Diamond Operator (<>)

If no file handle is used with the diamond operator, Perl examines the @ARGV special variable. If @ARGV has no elements, the diamond operator reads from STDIN, either from the keyboard or from a redirected file. So, if you wanted to display the contents of more than one file, you could use the program shown in Listing 9.2.
____________________________________________________________________________________________________________________
Listing 9.2 09LST02.PL-Read from Multiple Files or from STDIN

while (<>) {
    print();
}
____________________________________________________________________________________________________________________

The command line to run the program might look like this:

perl -w 09lst02.pl 09lst01.pl 09lst02.pl

And the output would be:

while (<STDIN>) {
    print();
}
while (<>) {
    print();
}

Perl creates the @ARGV array from the command line. Each file name on the command line, after the program name, is added to the @ARGV array as an element. When the program runs, the diamond operator starts reading from the file named in the first element of the array.
When that entire file has been read, the next file is read, and so on, until all of the elements have been used. When the last file has been read, the while loop ends.

Using the diamond operator to iterate over a list of file names is very handy. You can use it in the middle of your program by explicitly assigning a list of file names to the @ARGV array. Listing 9.3 shows what this might look like in a program.
____________________________________________________________________________________________________________________
Listing 9.3 09LST03.PL-Read from Multiple Files Using the @ARGV Array

@ARGV = ("09lst01.pl", "09lst02.pl");
while (<>) {
    print();
}
____________________________________________________________________________________________________________________

This program displays:

while (<STDIN>) {
    print();
}
while (<>) {
    print();
}

Next, we will take a look at the ways that Perl lets you test files and, following that, the functions that can be used with files.

File Test Operators

Perl has many operators that you can use to test different aspects of a file. For example, you can use the -e operator to ensure that a file exists before deleting it. Or, you can check that a file can be written to before appending to it. By checking the feasibility of the impending file operation, you can reduce the number of errors that your program will encounter. Table 9.2 shows a complete list of the operators used to test files.

Table 9.2 Perl's File Test Operators

Operator     Description
-A OPERAND   Returns the time, in days, between the last access of OPERAND and the moment the program started.
-b OPERAND   Tests if OPERAND is a block device.
-B OPERAND   Tests if OPERAND is a binary file. If OPERAND is a file handle, then the current buffer is examined instead of the file itself.
-c OPERAND   Tests if OPERAND is a character device.
-C OPERAND   Returns the time, in days, between the last inode change of OPERAND and the moment the program started.
-d OPERAND   Tests if OPERAND is a directory.
-e OPERAND   Tests if OPERAND exists.
-f OPERAND   Tests if OPERAND is a regular file as opposed to a directory, device, or other special file.
-g OPERAND   Tests if OPERAND has the setgid bit set.
-k OPERAND   Tests if OPERAND has the sticky bit set.
-l OPERAND   Tests if OPERAND is a symbolic link. Under DOS, this operator always returns false.
-M OPERAND   Returns the time, in days, between the last modification of OPERAND and the moment the program started.
-o OPERAND   Tests if OPERAND is owned by the effective uid. Under DOS, it always returns true.
-O OPERAND   Tests if OPERAND is owned by the real uid. Under DOS, it always returns true.
-p OPERAND   Tests if OPERAND is a named pipe.
-r OPERAND   Tests if OPERAND can be read from.
-R OPERAND   Tests if OPERAND can be read from by the real uid/gid. Under DOS, it is identical to -r.
-s OPERAND   Returns the size of OPERAND in bytes, so it is true only if OPERAND has a non-zero size.
-S OPERAND   Tests if OPERAND is a socket.
-t OPERAND   Tests if OPERAND is opened to a tty (a terminal).
-T OPERAND   Tests if OPERAND is a text file. If OPERAND is a file handle, then the current buffer is examined instead of the file itself.
-u OPERAND   Tests if OPERAND has the setuid bit set.
-w OPERAND   Tests if OPERAND can be written to.
-W OPERAND   Tests if OPERAND can be written to by the real uid/gid. Under DOS, it is identical to -w.
-x OPERAND   Tests if OPERAND can be executed.
-X OPERAND   Tests if OPERAND can be executed by the real uid/gid. Under DOS, it is identical to -x.
-z OPERAND   Tests if OPERAND has a size of zero.

Note If OPERAND is not specified in a file test, the $_ variable is used instead (except for -t, which tests STDIN).

The operand used by the file tests can be either a file handle or a file name. The file tests work by internally calling the operating system to determine information about the file in question. The operators evaluate to true if the test succeeds and false if it does not. If you need to perform two or more tests on the same file, you can use the special underscore (_) file handle.
This tells Perl to use the file information from the last system query, which saves time. However, the underscore file handle does have some caveats. It does not work with the -t operator. In addition, the lstat() function and the -l test leave the system buffer filled with information about a symbolic link, not a real file.

The -T and -B file tests examine the first block or so of the file. If more than about a third of the bytes are unusual characters, such as control codes or characters with the high bit set, or if a null byte is encountered, the file is considered a binary file. Binary files are normally data files, as opposed to text or human-readable files. If you need to work with binary files, be sure to use the binmode() file function, which is described in the section, "Example: Binary Files," later in this chapter.

Example: Using File Tests

For our first example with file tests, let's examine a list of files from the command line and determine if each is a regular file or a special file.

Start a foreach loop that looks at the command line array. Each element in the array is assigned to the default loop variable $_.
Print the file name contained in $_.
Print a message indicating the type of file by checking the evaluation of the -f operator.
____________________________________________________________________________________________________________________
Listing 9.4 09LST04.PL-Using the -f Operator to Find Regular Files Inside a foreach Loop

foreach (@ARGV) {
    print;
    print((-f) ? " -REGULAR\n" : " -SPECIAL\n");
}
____________________________________________________________________________________________________________________

When this program is run using the following command line:

perl -w 09lst04.pl 09lst01.pl \perl5 perl.exe \windows

the following is displayed:

09lst01.pl -REGULAR
\perl5 -SPECIAL
perl.exe -REGULAR
\windows -SPECIAL

Each of the directories listed on the command line was recognized as a special file.
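The underscore file handle described above can be combined with the tests in Table 9.2. The following is a minimal sketch, assuming a reasonably recent Perl 5 with the standard File::Temp module, which is used only to create a scratch directory and file so that the example does not depend on files on your disk. The describe() subroutine is a hypothetical helper written for this illustration. Only the -e test asks the operating system for information; the -d, -f, and -s tests that follow reuse that information through _.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile tempdir);

# Scratch directory and file so the sketch does not depend on your disk.
# CLEANUP => 1 removes the directory when the program ends.
my $dir = tempdir(CLEANUP => 1);
my ($fh, $fileName) = tempfile(DIR => $dir);
binmode($fh);                          # keep "\n" as a single byte on DOS/Windows
print $fh "line one\nline two\n";      # 18 bytes
close($fh);

# Hypothetical helper: describe a path using a single system query.
sub describe {
    my ($path) = @_;
    # -e asks the operating system about $path and fills the stat buffer.
    return "$path does not exist" unless -e $path;
    # The tests below reuse that buffer through the _ file handle.
    return "$path is a directory"    if -d _;
    return "$path is a special file" unless -f _;
    my $size = (-s _) || 0;            # -s is false for an empty file
    return "$path is a regular file of $size bytes";
}

foreach my $path ($dir, $fileName, "$dir/no-such-file") {
    print(describe($path), "\n");
}
```

Because describe() returns as soon as -e fails, the tests on _ run only after the operating system has returned information about an existing file.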
If you want to ignore all special files in the command line, you do so like this:

Start a foreach loop that looks at the command line array.
If the current file is special, then skip it and go on to the next iteration of the foreach loop.
Print the current file name that is contained in $_.
Print a message indicating the type of file.
____________________________________________________________________________________________________________________
Listing 9.5 09LST05.PL-Using the -f Operator to Find Regular Files Inside a foreach Loop

foreach (@ARGV) {
    next unless -f;    # ignore all non-normal files.
    print;
    print((-f) ? " -REGULAR\n" : " -SPECIAL\n");
}
____________________________________________________________________________________________________________________

When this program is run using the following command line:

perl -w 09lst05.pl 09lst01.pl \perl perl.exe \windows

the following is displayed:

09lst01.pl -REGULAR
perl.exe -REGULAR

Notice that only the regular file names are displayed. The two directories on the command line were ignored.

As mentioned above, you can use the underscore file handle to make two tests in a row on the same file so that your program can execute faster and use fewer system resources. This could be important if your application is time critical or makes many repeated tests on a large number of files.

Start a foreach loop that looks at the command line array.
If the current file is special, then skip it and go on to the next iteration of the foreach loop.
Determine the number of bytes in the file with the -s operator, using the underscore file handle so that a second operating system call is not needed.
Print a message indicating the name and size of the file.
____________________________________________________________________________________________________________________
Listing 9.6 09LST06.PL-Finding the Size in Bytes of Regular Files Listed on the Command Line

foreach (@ARGV) {
    next unless -f;
    $fileSize = -s _;
    print("$_ is $fileSize bytes long.\n");
}
____________________________________________________________________________________________________________________

When this program is run using the following command line:

perl -w 09lst06.pl \perl5 09lst01.pl \windows perl.exe

the following is displayed:

09lst01.pl is 36 bytes long.
perl.exe is 61952 bytes long.

Tip Don't confuse the underscore file handle with the $_ special variable. The underscore file handle tells Perl to use the file information from the last system call, and the $_ variable is used as the default parameter for a variety of functions.

File Functions

Table 9.3 Perl's File Functions

binmode(FILE_HANDLE)
    Puts FILE_HANDLE into binary mode. For more information, see the section, "Example: Binary Files," later in this chapter.
chdir(DIR_NAME)
    Causes your program to use DIR_NAME as the current directory. It returns true if the change was successful, false if not.
chmod(MODE, FILE_LIST)
    This UNIX-based function changes the permissions for a list of files. A count of the number of files whose permissions were changed is returned. There is no DOS equivalent for this function.
chown(UID, GID, FILE_LIST)
    This UNIX-based function changes the owner and group for a list of files. A count of the number of files whose ownership was changed is returned. There is no DOS equivalent for this function.
close(FILE_HANDLE)
    Closes the connection between your program and the file opened with FILE_HANDLE.
closedir(DIR_HANDLE)
    Closes the connection between your program and the directory opened with DIR_HANDLE.
eof(FILE_HANDLE)
    Returns true if the next read on FILE_HANDLE will hit the end of the file or if the file is not open. If FILE_HANDLE is not specified, the status of the last file read is returned. All input functions return the undefined value when the end of the file is reached, so you'll almost never need to use eof().
fcntl(FILE_HANDLE, FUNCTION, SCALAR)
    Implements the fcntl() function, which lets you perform various file control operations. Its use is beyond the scope of this book.
fileno(FILE_HANDLE)
    Returns the file descriptor for the specified FILE_HANDLE.
flock(FILE_HANDLE, OPERATION)
    Places a lock on a file so that multiple users or programs can't use it simultaneously. The flock() function is beyond the scope of this book.
getc(FILE_HANDLE)
    Reads the next character from FILE_HANDLE. If FILE_HANDLE is not specified, a character is read from STDIN.
glob(EXPRESSION)
    Returns a list of files that match the specification of EXPRESSION, which can contain wildcards. For instance, glob("*.pl") returns a list of all Perl program files in the current directory.
ioctl(FILE_HANDLE, FUNCTION, SCALAR)
    Implements the ioctl() function, which lets you perform various I/O control operations. Its use is beyond the scope of this book. For a more in-depth discussion of this function, see Que's Special Edition Using Perl for Web Programming.
link(OLD_FILE_NAME, NEW_FILE_NAME)
    This UNIX-based function creates a new file name that is linked to the old file name. It returns true for success and false for failure. There is no DOS equivalent for this function.
lstat(FILE_HANDLE_OR_FILE_NAME)
    Returns file statistics in a 13-element array. lstat() is identical to stat() except that, when given a symbolic link, it returns information about the link itself rather than the file it points to. See the section, "Example: Getting File Statistics," for more information.
mkdir(DIR_NAME, MODE)
    Creates a directory named DIR_NAME. If you try to create a subdirectory, the parent must already exist. This function returns false if the directory can't be created, and the special variable $! is assigned the error message.
open(FILE_HANDLE, EXPRESSION)
    Creates a link between FILE_HANDLE and a file specified by EXPRESSION. See the section, "Example: Opening Files," for more information.
opendir(DIR_HANDLE, DIR_NAME)
    Creates a link between DIR_HANDLE and the directory specified by DIR_NAME. opendir() returns true if successful, false otherwise.
pipe(READ_HANDLE, WRITE_HANDLE)
    Opens a pair of connected pipes like the corresponding system call. Its use is beyond the scope of this book. For more on this function, see Que's Special Edition Using Perl for Web Programming.
print FILE_HANDLE LIST
    Sends a list of strings to FILE_HANDLE. If FILE_HANDLE is not specified, the currently selected file handle (normally STDOUT) is used. See the section, "Example: Printing Revisited," for more information.
printf FILE_HANDLE FORMAT, LIST
    Sends a list of strings in a format specified by FORMAT to FILE_HANDLE. If FILE_HANDLE is not specified, the currently selected file handle (normally STDOUT) is used. See the section, "Example: Printing Revisited," for more information.
read(FILE_HANDLE, BUFFER, LENGTH, OFFSET)
    Reads LENGTH bytes from FILE_HANDLE into the scalar variable BUFFER. The optional OFFSET gives the position within BUFFER where the data is placed. It returns the number of bytes read, 0 at the end of the file, or the undefined value if an error occurs.
readdir(DIR_HANDLE)
    Returns the next directory entry from DIR_HANDLE when used in a scalar context. If used in a list context, all of the remaining entries in DIR_HANDLE are returned in a list. If there are no more entries to return, the undefined value or an empty list is returned, depending on the context.
readlink(EXPRESSION)
    This UNIX-based function returns the value of a symbolic link. If an error occurs, the undefined value is returned and the special variable $! is assigned the error message. The $_ special variable is used if EXPRESSION is not specified.
rename(OLD_FILE_NAME, NEW_FILE_NAME)
    Changes the name of a file. You can use this function to change the directory where a file resides, but not the disk drive or volume.
rewinddir(DIR_HANDLE)
    Resets DIR_HANDLE so that the next readdir() starts at the beginning of the directory.
rmdir(DIR_NAME)
    Deletes an empty directory. It returns true if the directory was deleted; otherwise, it returns false and $! is assigned the error message. The $_ special variable is used if DIR_NAME is not specified.
seek(FILE_HANDLE, POSITION, WHENCE)
    Moves to POSITION in the file connected to FILE_HANDLE. The WHENCE parameter determines if POSITION is an offset from the beginning of the file (WHENCE=0), the current position in the file (WHENCE=1), or the end of the file (WHENCE=2).
seekdir(DIR_HANDLE, POSITION)
    Sets the current position for readdir(). POSITION must be a value returned by the telldir() function.
select(FILE_HANDLE)
    Sets the default FILE_HANDLE for the write() and print() functions. It returns the currently selected file handle so that you may restore it if needed. See the section, "Example: Printing Revisited," to see this function in action.
sprintf(FORMAT, LIST)
    Returns a string whose format is specified by FORMAT.
stat(FILE_HANDLE_OR_FILE_NAME)
    Returns file statistics in a 13-element array. See the section, "Example: Getting File Statistics," for more information.
symlink(OLD_FILE_NAME, NEW_FILE_NAME)
    This UNIX-based function creates a new file name symbolically linked to the old file name. It returns false if NEW_FILE_NAME cannot be created.
sysread(FILE_HANDLE, BUFFER, LENGTH, OFFSET)
    Reads LENGTH bytes from FILE_HANDLE into the scalar variable BUFFER, bypassing Perl's buffered input. The optional OFFSET gives the position within BUFFER where the data is placed. It returns the number of bytes read or the undefined value if an error occurs.
syswrite(FILE_HANDLE, BUFFER, LENGTH, OFFSET)
    Writes LENGTH bytes from the scalar variable BUFFER, starting OFFSET bytes into BUFFER, to FILE_HANDLE. It returns the number of bytes written or the undefined value if an error occurs.
tell(FILE_HANDLE)
    Returns the current file position for FILE_HANDLE. If FILE_HANDLE is not specified, the file position for the last file read is returned.
telldir(DIR_HANDLE)
    Returns the current position for DIR_HANDLE. The return value may be passed to seekdir() to access a particular location in a directory.
truncate(FILE_HANDLE, LENGTH)
    Truncates the file opened on FILE_HANDLE to be LENGTH bytes long.
unlink(FILE_LIST)
    Deletes a list of files. If FILE_LIST is not specified, then $_ is used. It returns the number of files successfully deleted, so it returns 0 (false) if no files were deleted.
utime(ACCESS_TIME, MODIFICATION_TIME, FILE_LIST)
    This UNIX-based function changes the access and modification times of each file in FILE_LIST.
write(FILE_HANDLE)
    Writes a formatted record to FILE_HANDLE. See Chapter 11, "Creating Reports," for more information.

Note The UNIX-based functions will be discussed further in Chapter 18, "Using Internet Protocols."

UNIX-based implementations of Perl have several database functions available to them, for example, dbmopen() and dbmclose(). These functions are beyond the scope of this book.

Example: Opening Files

The open() function is used to open a file and create a connection to it called a file handle. The basic open() function call looks like this:

open(FILE_HANDLE);

The FILE_HANDLE parameter in this version of open() is the name for the new file handle. It is also the name of the scalar variable that holds the name of the file that you would like to open for input. For example:

Assign the file name, FIXED.DAT, to the $INPUT_FILE variable. All capital letters are used for the variable name to indicate that it is also the name of the file handle.
Open the file for reading.
Read the entire file into @array. Each line of the file becomes a single element of the array.
Close the file.
Use a foreach loop to look at each element of @array.
Print $_, the loop variable, which contains one of the elements of @array.
____________________________________________________________________________________________________________________
Listing 9.7 09LST07.PL-How to Open a File for Input

$INPUT_FILE = "fixed.dat";
open(INPUT_FILE);
@array = <INPUT_FILE>;
close(INPUT_FILE);
foreach (@array) {
    print();
}
____________________________________________________________________________________________________________________

This program displays:

1212Jan Jaspree Painter
3453Kelly Horton Jockey

It is considered good programming practice to close any connection made with the open() function as soon as possible. Closing is not strictly required, but it flushes any output that Perl is still holding in its buffers to the file, so less data is lost if a power failure or other catastrophic failure occurs.

Note: DOS (and, by extension, Windows) limits the number of files that you can have open at any given time. Typically, you can have from 20 to 50 files open. Normally, this is plenty. If you need to open more files, see your DOS documentation.

The open() function has many variations to let you access files in different ways. Table 9.4 shows all of the different methods used to open a file.

Table 9.4 The Different Ways to Open a File

Open Statement                          Description
open(FILE_HANDLE);                      Opens the file named in $FILE_HANDLE and connects to it using FILE_HANDLE as the file handle. The file is opened for input only.
open(FILE_HANDLE, "FILENAME.EXT");      Opens the file called FILENAME.EXT for input using FILE_HANDLE as the file handle.
open(FILE_HANDLE, ">FILENAME.EXT");     Opens FILENAME.EXT for output using FILE_HANDLE as the file handle.
open(FILE_HANDLE, "-");                 Opens standard input.
open(FILE_HANDLE, ">-");                Opens standard output.
open(FILE_HANDLE, ">>FILENAME.EXT");    Opens FILENAME.EXT for appending using FILE_HANDLE as the file handle.
open(FILE_HANDLE, "+<FILENAME.EXT");    Opens FILENAME.EXT for both input and output using FILE_HANDLE as the file handle.
open(FILE_HANDLE, "+>FILENAME.EXT");    Opens FILENAME.EXT for both input and output using FILE_HANDLE as the file handle.
open(FILE_HANDLE, "+>>FILENAME.EXT");   Opens FILENAME.EXT for both input and output using FILE_HANDLE as the file handle.
open(FILE_HANDLE, "| PROGRAM");         Sends the output printed to FILE_HANDLE to another program.
open(FILE_HANDLE, "PROGRAM |");         Reads the output from another program using FILE_HANDLE.

Note: I am currently researching the differences between +<, +>, and +>>. The research should be available by 12/1/97 as a link from http://www.mtolive.com/pbe/index.html.

For information about handling failures while opening files, see Chapter 13, "Handling Errors and Signals."

By prefixing the file name with a > character, you open the file for output. If the file already exists, its previous contents are erased. This next example opens a file that will hold a log of messages.

Call the open() function to open the MESSAGE.LOG file for writing with LOGFILE as the file handle. If the open was successful, a true value will be returned and the statement block will be executed. Send the first message to the MESSAGE.LOG file using the print() function. Notice that an alternate method is being used to call print(). Send the second message to the MESSAGE.LOG file. Close the file.

if (open(LOGFILE, ">message.log")) {
    print LOGFILE ("This is message number 1.\n");
    print LOGFILE ("This is message number 2.\n");
    close(LOGFILE);
}

This program displays nothing. Instead, the output from the print() function is sent directly to the MESSAGE.LOG file using the connection established by the open() function. In this example, print() treats LOGFILE, which is followed by a space rather than a comma, as the file handle, and treats the parenthesized list as the things to print. You can find more information about printing in the section "Example: Printing Revisited," later in this chapter.

If you need to add something to the end of the MESSAGE.LOG file, use >> as the file name prefix when opening the file. For example:

Call the open() function to open the MESSAGE.LOG file for appending with LOGFILE as the file handle.
If the file does not exist, it will be created; otherwise, anything printed to LOGFILE will be added to the end of the file. Send a message to the MESSAGE.LOG file. Send a message to the MESSAGE.LOG file. Close the file.

if (open(LOGFILE, ">>message.log")) {
    print LOGFILE ("This is message number 3.\n");
    print LOGFILE ("This is message number 4.\n");
    close(LOGFILE);
}

Now, when MESSAGE.LOG is viewed, it contains the following lines:

This is message number 1.
This is message number 2.
This is message number 3.
This is message number 4.

Example: Binary Files

When you need to work with data files, you will need to know what binary mode is. There are two major differences between binary mode and text mode:

* In DOS and Windows, line endings are indicated by two characters: the carriage return followed by the newline (line feed). In text mode, this pair is read as a single character, the newline character. In binary mode, your program reads both characters. UNIX systems use only one character, the newline, to indicate line endings.
* In DOS and Windows, the end-of-file character has the value 26. When a byte with this value is read in text mode, the file is considered ended and your program cannot read any more information from the file. In binary mode, this byte is treated as a regular character. UNIX files have no end-of-file character (the value 4, Ctrl+D, signals end of input only when typed at a terminal), so on UNIX, text mode and binary mode behave the same.

Note: The examples in this section relate to the DOS operating system.

In order to demonstrate these differences, we'll use a data file called BINARY.DAT with the following contents:

01
02
03

First, we'll read the file in the default text mode.

Initialize a buffer variable. Both read() and sysread() need their buffer variables to be initialized before the function call is executed. Open the BINARY.DAT file for reading. Read the first 20 characters of the file using the read() function. Close the file.
Create an array out of the characters in the $buffer variable and iterate over that array using a foreach loop. Print the value of the current array element in hexadecimal format. Print a newline character if the current array element is a newline character.

____________________________________________________________________________________________________________________
Listing 9.8 09LST08.PL-Reading a File to Show Text Mode Line Endings

$buffer = "";
open(FILE, "<binary.dat");
read(FILE, $buffer, 20, 0);
close(FILE);
foreach (split(//, $buffer)) {
    printf("%02x ", ord($_));
    print "\n" if $_ eq "\n";
}
____________________________________________________________________________________________________________________

This program displays:

30 31 0a
30 32 0a
30 33 0a

This example does a couple of things that haven't been seen yet in this book. The read() function is used as an alternative to the line-by-line input done with the diamond operator. It reads a specified number of bytes from the input file and assigns them to a buffer variable. Reading starts at the current position in the file, which here is the beginning of the file. The fourth parameter is an offset into the buffer variable, not into the file: it tells read() where in $buffer to store the incoming bytes, and 0 places them at the start of the buffer.

The split() function in the foreach loop breaks a string into pieces and places those pieces into an array. The empty pattern between the two slashes indicates that each character in the string should become an element of the new array. For more information about the split() function, see Chapter 5, "Functions," and Chapter 10, "Regular Expressions."

Once the array of characters has been created, the foreach loop iterates over the array. The printf() statement converts the ordinal value of the character into hexadecimal before displaying it. The ordinal value of a character is the numeric value of its ASCII representation. For example, the ordinal value of '0' is 0x30, or 48. The next line, the print statement, forces the output onto a new line if the current character is a newline character.
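The fourth argument of read() is easy to misread. The following minimal sketch is not one of the book's listings. It assumes that a scratch file with the hypothetical name offset_demo.txt can be created in the current directory, and it uses modern lexical file handles and three-argument open(). It shows two things: the offset positions the incoming bytes inside the buffer variable, and each read continues from the current position in the file.

```perl
use strict;
use warnings;

# Hypothetical scratch file, created and deleted by this sketch.
my $file = "offset_demo.txt";

open(my $out, ">", $file) or die "Can't create $file: $!";
binmode($out);
print $out "ABCDEF";
close($out);

open(my $in, "<", $file) or die "Can't open $file: $!";
binmode($in);

my $buffer = "xy";                       # buffer already holds two characters
my $first  = read($in, $buffer, 3, 2);   # store 3 bytes starting at buffer position 2
my $after_first = $buffer;               # "xyABC"

# The file position has moved past "ABC", so this read gets "DEF".
# With offset 0, the buffer ends up holding just "DEF".
my $second = read($in, $buffer, 3, 0);
close($in);
unlink($file);

print "first:  $first bytes, buffer was '$after_first'\n";
print "second: $second bytes, buffer is '$buffer'\n";
```

The second result shows that read() does more than overwrite the start of the buffer. It also shortens the buffer so that it ends with the last byte read.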
Breaking the output after each newline character is done only to make the display look a little like the input file. For more information about the printf() function, see the section "Example: Printing Revisited," later in this chapter.

Now, let's read the file in binary mode and see how the output is changed.

Initialize a buffer variable. Open the BINARY.DAT file for reading. Change the mode to binary. Read the first 20 characters of the file using the read() function. Close the file. Create an array out of the characters in the $buffer variable and iterate over that array using a foreach loop. Print the value of the current array element in hexadecimal format. Print a newline character if the current array element is a newline character.

____________________________________________________________________________________________________________________
Listing 9.9 09LST09.PL-Reading a File to Show Binary Mode Line Endings

$buffer = "";
open(FILE, "<binary.dat");
binmode(FILE);
read(FILE, $buffer, 20, 0);
close(FILE);
foreach (split(//, $buffer)) {
    printf("%02x ", ord($_));
    print "\n" if $_ eq "\n";
}
____________________________________________________________________________________________________________________

…

03

Since the end-of-file character is a non-printing character, it can't be shown directly. In the file contents above, the byte just before 03 really has the value 26.

Here is the program that you saw reading the BINARY.DAT file earlier; this time, it reads EOF.DAT.

Initialize a buffer variable. Open the EOF.DAT file for reading. Read the first 20 characters of the file using the read() function. Close the file. Create an array out of the characters in the $buffer variable and iterate over that array using a foreach loop. Print the value of the current array element in hexadecimal format. Print a newline character if the current array element is a newline character.

____________________________________________________________________________________________________________________
Listing 9.10 09LST10.PL-Reading a File to Show the Text Mode End-of-File Character

$buffer = "";
open(FILE, "<eof.dat");
read(FILE, $buffer, 20, 0);
close(FILE);
foreach (split(//, $buffer)) {
    printf("%02x ", ord($_));
    print "\n" if $_ eq "\n";
}
____________________________________________________________________________________________________________________

…

Unfortunately, there is no similar way to read an entire file into a hash, but it's still pretty easy to do.
The following example uses the line number as the hash key for each line of a file.

Open the FIXED.DAT file for reading. For each line of FIXED.DAT, create a hash element using the record number special variable ($.) as the key and the line of input ($_) as the value. Close the file. Iterate over the keys of the hash in numeric order. Print each key, value pair.

____________________________________________________________________________________________________________________
Listing 9.12 09LST12.PL-Reading a Fixed Length Record with Fixed Length Fields into a Hash

open(FILE, "<fixed.dat");
while (<FILE>) {
    $hash{$.} = $_;
}
close(FILE);
foreach (sort { $a <=> $b } keys %hash) {
    print("$_: $hash{$_}");
}
____________________________________________________________________________________________________________________

This program displays:

1: 1212Jan Jaspree Painter
2: 3453Kelly Horton Jockey

Example: Getting File Statistics

The file test operators can tell you a lot about a file, but sometimes you need more. In those cases, you use the stat() or lstat() function. The stat() function returns file information in a 13-element array. You can pass either a file handle or a file name as the parameter. If the file can't be found or another error occurs, an empty list is returned. Listing 9.13 shows how to use the stat() function to find out information about the EOF.DAT file used earlier in the chapter.

Assign the return list from the stat() function to 13 scalar variables. Print the scalar values.
____________________________________________________________________________________________________________________
Listing 9.13 09LST13.PL-Using the stat() Function

($dev, $ino, $mode, $nlink, $uid, $gid, $rdev, $size,
 $atime, $mtime, $ctime, $blksize, $blocks) = stat("eof.dat");
print("dev     = $dev\n");
print("ino     = $ino\n");
print("mode    = $mode\n");
print("nlink   = $nlink\n");
print("uid     = $uid\n");
print("gid     = $gid\n");
print("rdev    = $rdev\n");
print("size    = $size\n");
print("atime   = $atime\n");
print("mtime   = $mtime\n");
print("ctime   = $ctime\n");
print("blksize = $blksize\n");
print("blocks  = $blocks\n");
____________________________________________________________________________________________________________________

In the DOS environment, this program displays:

dev     = 2
ino     = 0
mode    = 33206
nlink   = 1
uid     = 0
gid     = 0
rdev    = 2
size    = 13
atime   = 833137200
mtime   = 833195316
ctime   = 833194411
blksize =
blocks  =

Some of this information is specific to the UNIX environment and is beyond the scope of this book. For more information on this topic, see Que's 1994 edition of Using Unix. One interesting piece of information is the $mtime value: the date and time of the last modification made to the file, expressed as the number of seconds since the start of 1970. You can interpret this value by using the following line of code:

($sec, $min, $hr, $day, $month, $year, $day_Of_Week, $julianDate, $dst) = localtime($mtime);

If you are only interested in the modification date, you can use the array slice notation to grab just that value from the 13-element array returned by stat(). For example:

$mtime = (stat("eof.dat"))[9];

Notice that the stat() function call is surrounded by parentheses so that the return value is evaluated in a list context. Then the tenth element (index 9) is assigned to $mtime. You can use this technique whenever a function returns a list.

Example: Using the Directory Functions

Perl has several functions that let you work with directories. You can make a directory with the mkdir() function.
You can delete a directory with the rmdir() function. Switching from the current directory to another is done using the chdir() function. Finding out which files are in a directory is done with the opendir(), readdir(), and closedir() functions.

The next example shows you how to create a list of all Perl programs in the current directory, or at least those files that end with the .pl extension.

Open the current directory using DIR as the directory handle. Read a list of file names using the readdir() function, extract only those that end in .pl, and sort the list. The sorted list is assigned to the @files array variable. Close the directory. Print the file names from the @files array unless the file is a directory.

____________________________________________________________________________________________________________________
Listing 9.14 09LST14.PL-Print All Files in the Current Directory Whose Name Ends in PL

opendir(DIR, ".");
@files = sort(grep(/\.pl$/i, readdir(DIR)));
closedir(DIR);
foreach (@files) {
    print("$_\n") unless -d;
}
____________________________________________________________________________________________________________________

For more information about the grep() function, see Chapter 10, "Regular Expressions."

This program displays each file name that ends in .pl on a separate line. If you need to know the number of Perl programs, evaluate the @files array in a scalar context. For example:

$num_Perl_Programs = @files;

Tip: For this example, I modified the naming convention used for the variables. I feel that $num_Perl_Programs is easier to read than $numPerlPrograms. No naming convention should be inflexible. Use it as a guideline and break the rules when it seems wise.

Example: Printing Revisited

We've been using the print() function throughout this book without really looking at how it works. Let's remedy that now. The print() function is used to send output to a file handle.
Most of the time, we've been using STDOUT as the file handle. Because STDOUT is the default, we did not need to specify it. The syntax for the print() function is:

print FILE_HANDLE (LIST)

Notice that there is no comma between FILE_HANDLE and LIST. You can see from the syntax that print() is a list operator because it's looking for a list of values to print. If you don't specify a list, then $_ is used. You can change the default file handle by using the select() function. Let's take a look at this:

Open TESTFILE.DAT for output. Change the default file handle for write and print statements. Notice that the old default handle is returned and saved in the $oldHandle variable. This line prints to the default handle, which is now the TESTFILE.DAT file. Change the default file handle back to STDOUT. This line prints to STDOUT.

open(OUTPUT_FILE, ">testfile.dat");
$oldHandle = select(OUTPUT_FILE);
print("This is line 1.\n");
select($oldHandle);
print("This is line 2.\n");

This program displays:

This is line 2.

and creates the TESTFILE.DAT file with a single line in it:

This is line 1.

Perl also has the printf() function, which lets you be more precise in how things are printed. The syntax for printf() looks like this:

printf FILE_HANDLE (FORMAT_STRING, LIST)

Like print(), the default file handle is STDOUT. The FORMAT_STRING parameter controls what is printed and how it looks. In simple cases, the format string can be the only parameter, with the values to print interpolated directly into it, just as you would write a print() call. For example:

Create two variables to hold costs for January and February. Print the cost variables using variable interpolation. Notice that the dollar sign needs to be preceded by a backslash to avoid unwanted interpolation.

$januaryCost = 123.34;
$februaryCost = 23345.45;
printf("January  = \$$januaryCost\n");
printf("February = \$$februaryCost\n");

This program displays:

January  = $123.34
February = $23345.45

In this example, only one parameter is passed to the printf() function: the formatting string.
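There is one caution about this single-argument style. If an interpolated value contained a percent sign, printf() would treat it as the start of a format directive. The following minimal sketch shows the safer habit: keep the format string constant and pass values as separate arguments. It uses sprintf(), which returns the string that printf() would print, and the $discount value is hypothetical.

```perl
use strict;
use warnings;

# Hypothetical value that happens to contain a percent sign.
my $discount = "50% off";

# Keep the format constant and let %s insert the value unchanged.
my $safe = sprintf("Sale: %s\n", $discount);

# A literal percent sign inside a format string is written as %%.
my $rate = sprintf("Rate: %d%%\n", 7);

print $safe;   # prints "Sale: 50% off"
print $rate;   # prints "Rate: 7%"
```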
Because the formatting string is enclosed in double quotes, variable interpolation takes place just as it does for the print() function. This display is not good enough for a report because the decimal points of the numbers do not line up. You can use the format specifiers shown in Table 9.5 together with the modifiers shown in Table 9.6 to solve this problem.

Table 9.5 Format Specifiers for the printf() Function

Specifier   Description
c           Indicates that a single character should be printed.
s           Indicates that a string should be printed.
d           Indicates that a decimal integer should be printed.
u           Indicates that an unsigned decimal integer should be printed.
x           Indicates that a hexadecimal number should be printed.
o           Indicates that an octal number should be printed.
e           Indicates that a floating point number should be printed in scientific notation.
f           Indicates that a floating point number should be printed.
g           Indicates that a floating point number should be printed using the more compact of the e and f formats.

Table 9.6 Format Modifiers for the printf() Function

Modifier    Description
-           Indicates that the value should be printed left-justified.
#           Forces octal numbers to be printed with a leading zero. Hexadecimal numbers will be printed with a leading 0x.
+           Forces signed numbers to be printed with a leading + or - sign.
0           Pads the displayed number with zeros instead of spaces.
.           Separates the field width from the precision. The number before the period is the minimum width of the printed value. For floating point numbers, the number after the period is the number of digits printed to the right of the decimal point; for strings, it is the maximum number of characters printed. For example, %10.3f means that the value will be at least 10 positions wide, with 3 digits to the right of the decimal point. %.10s will print a string at most 10 characters long.

Create two variables to hold costs for January and February. Print the cost variables using format specifiers.
$januaryCost = 123.34;
$februaryCost = 23345.45;
printf("January  = \$%8.2f\n", $januaryCost);
printf("February = \$%8.2f\n", $februaryCost);

This program displays:

January  = $  123.34
February = $23345.45

This example uses the f format specifier to print a floating point number. The %8.2f specifier pads each number on the left with spaces to a width of 8 positions and prints 2 digits after the decimal point. $februaryCost is exactly 8 positions wide, so it is printed right next to the dollar sign. The January amount is padded with spaces to the same width, so the decimal points line up.

If you do not know in advance the width of the numbers that you need to print, you can use the following technique.

Create two variables to hold costs for January and February. Find the length of the largest number. Print the cost variables using variable interpolation to determine the width of the numbers to print. Define the max() function. You can look in the "Example: Foreach Loops" section of Chapter 7, "Control Statements," for more information about the max() function.

____________________________________________________________________________________________________________________
Listing 9.15 09LST15.PL-Using Variable Interpolation to Align Numbers When Printing

$januaryCost = 123.34;
$februaryCost = 23345.45;
$maxLength = length(max($januaryCost, $februaryCost));
printf("January  = \$%$maxLength.2f\n", $januaryCost);
printf("February = \$%$maxLength.2f\n", $februaryCost);

sub max {
    my($max) = shift(@_);
    foreach $temp (@_) {
        $max = $temp if $temp > $max;
    }
    return($max);
}
____________________________________________________________________________________________________________________

This program displays:

January  = $  123.34
February = $23345.45

Finding the longest number takes more work, but I think you'll agree that the result is worth it.

Tip: In the next chapter, "Regular Expressions," you see how to add commas to numbers for even more readability when printing them.

So far, we've only looked at printing numbers. You also can use printf() to control the printing of strings.
Like the printing of numbers above, printf() is best used for controlling the alignment and length of strings. Here is an example:

Assign "John O'Mally" to $name. Print using format specifiers to make the value 10 characters wide but only print the first 5 characters from the string.

$name = "John O'Mally";
printf("The name is %10.5s.\n", $name);

This program displays:

The name is      John .

The number to the left of the period controls the width of the printed value, also called the print field. If the length of the string to be printed is less than the width of the print field, the string is right-justified and padded with spaces. Here, the first 5 characters, "John " (including the trailing space), are padded on the left to fill 10 positions. You can left-justify the string by using the dash modifier. For example:

Assign "John O'Mally" to $name. Print using format specifiers to left-justify the value.

$name = "John O'Mally";
printf("The name is %-10.5s.\n", $name);

This program displays:

The name is John      .

The period way off to the right shows that the string was left-justified and padded with spaces until it was 10 positions wide.

Globbing

Perl supports a feature called globbing, which lets you use wildcard characters to find file names. A wildcard character is like the wild card in poker: it can stand for other characters. Let's look at some of the simpler examples.

Example: Assigning a Glob to an Array

One common chore for computer administrators is the removal of backup files. You can use the globbing technique with the unlink() function to perform this chore.

unlink(<*.bak>);

The file specification, *.bak, is placed between the angle brackets of the diamond operator. When evaluated, it returns a list of files that match the specification. An asterisk matches zero or more of any character, so this unlink() call will delete all files with a BAK extension. You can use the following statement to get a list of all files that start with the letter f:

@array = <f*>;

The next chapter, "Regular Expressions," will show you more ways to specify file names.
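The following sketch shows globbing without touching your real files. It is a minimal example under stated assumptions: it uses the core File::Temp and Cwd modules to work in a throwaway directory, the file names are hypothetical, and it calls glob(), the function form of the <*.bak> syntax.

```perl
use strict;
use warnings;
use Cwd qw(getcwd);
use File::Temp qw(tempdir);

my $home = getcwd();
my $dir  = tempdir(CLEANUP => 1);   # removed automatically at program exit
chdir($dir) or die "Can't change to $dir: $!";

# Create some empty files with hypothetical names.
foreach my $name (qw(report.txt report.bak notes.bak fig.dat)) {
    open(my $fh, ">", $name) or die "Can't create $name: $!";
    close($fh);
}

# glob("*.bak") returns the same list as <*.bak>.
my @backups = sort { $a cmp $b } glob("*.bak");
my @f_files = sort { $a cmp $b } glob("f*");

# unlink() returns the number of files it deleted.
my $deleted = unlink(@backups);
my @left    = sort { $a cmp $b } glob("*");

chdir($home) or die "Can't change back to $home: $!";
print "deleted $deleted: @backups\n";
print "remaining: @left\n";
```

Working in a temporary directory keeps the unlink() call from deleting any real backup files while you experiment.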
Keep in mind that glob patterns are not regular expressions. A glob understands * (any run of characters), ? (any single character), and [...] (any one character from a set), but characters such as ., +, and ^ have no special meaning inside a glob.

Using Data Structures with Files

In the last chapter, you saw how to create complex data structures. Creating a program to read and write those structures is beyond the scope of this book. However, the following examples will show you how to use simpler data structures. The same techniques can be applied to the more complicated data structures as well.

Example: Splitting a Record into Fields

This example shows you how to read a file line by line and break the input records into fields based on a separator string. The file FIELDS.DAT will be used, with the following contents:

1212:Jan:Jaspree:Painter
3453:Kelly:Horton:Jockey

The individual fields, or values, are separated from each other by the colon (:) character. The split() function will be used to create an array of fields. Then a foreach loop will print the fields. Listing 9.16 shows how to input lines from a file and split them into fields.

Use the qw() notation to create an array of words. Open the FIELDS.DAT file for input. Loop while there are lines to read in the file. Use the split function to create an array of fields, using the colon as the field separator. The scalar value of @fieldList is passed to split to indicate how many fields to expect. Each element in the new array is then added to the %data hash with a key of the field name. Loop through the @fieldList array. Print each element and its value in the %data hash.
____________________________________________________________________________________________________________________
Listing 9.16 09LST16.PL-Reading Records from a File

@fieldList = qw(fName lName job age);
open(FILE, "<fields.dat");
while (<FILE>) {
    chomp;
    @data{@fieldList} = split(/:/, $_, scalar @fieldList);
    foreach (@fieldList) {
        printf("%10.10s = %s\n", $_, $data{$_});
    }
}
close(FILE);
____________________________________________________________________________________________________________________

This program will display:

     fName = 1212
     lName = Jan
       job = Jaspree
       age = Painter
     fName = 3453
     lName = Kelly
       job = Horton
       age = Jockey

The first line of this program uses the qw() notation to create an array of words. It is identical to

@fieldList = ("fName", "lName", "job", "age");

but without the distracting quotes and commas.

The split statement might require a little explanation. It is duplicated here so that you can focus on it.

@data{@fieldList} = split(/:/, $_, scalar @fieldList);

Let's use the first line of the input file as an example. The first line looks like this:

1212:Jan:Jaspree:Painter

The first thing that happens is that split() breaks the line apart using the colon as the separator, creating a list that looks like this:

("1212", "Jan", "Jaspree", "Painter")

You can substitute this list in place of the split() function in the statement.

@data{@fieldList} = ("1212", "Jan", "Jaspree", "Painter");

And you already know that @fieldList is a list of field names. So, the statement can be further simplified to:

@data{"fName", "lName", "job", "age"} = ("1212", "Jan", "Jaspree", "Painter");

This assignment statement shows that each array element on the right is paired with a key value on the left, so four separate hash assignments take place in this statement.

Summary

This was a rather long chapter, and we've really only talked about the basics of using files. You have enough information now to explore the rest of the file functions.
You also could create functions to read more complicated data structures with what you've learned so far.

Let's review what you know about files. You read that files are a series of bytes stored somewhere outside the computer's memory. Most of the time, a file will be on a hard disk in a directory. But the file also could be on a floppy disk or on a networked computer. The physical location is not important as long as you know the fully qualified file name. This name includes any computer name, drive name, and directory name that is needed to uniquely identify the file.

There are three files (actually file handles) that are always opened before your program starts: STDIN, STDOUT, and STDERR. The STDIN file handle is used to connect to the standard input, usually the keyboard. You can use the < character to override the standard input on the command line so that input comes from a file instead of the keyboard. The STDOUT file handle is used to connect to the standard output, usually the monitor. The > character is used to override the standard output. And finally, the STDERR file handle is used when you want to output error messages. STDERR usually points to the computer's monitor.

The diamond operator (<>) is used to read an entire line of text from a file. It stops reading when the end-of-line character, the newline, is read. The returned string includes the newline character, unless it is the last line of a file that does not end with one. If no file handle is used with the diamond operator, it will attempt to read from files listed in the @ARGV array. If that array is empty, it will read from STDIN.

Next, you read about Perl's file test operators. There are too many to recap here, but some of the more useful ones are -d, used to test for a directory name; -e, used to see if a file exists; and -w, used to see if a file can be written to. The special file handle, _, can be used to prevent Perl from making a second system call if you need to make two tests on the same file one right after another.
Table 9.3 listed many functions that deal with opening files, reading and writing information, and closing files. Some functions were specific to UNIX, although not many.

You learned how to open a file and that files can be opened for input, for output, or for appending. When you read a file, you can use text mode (the default) or binary mode. In binary mode on DOS systems, line endings are read as two characters: the carriage return and the line feed. On DOS systems, binary mode also lets you read the end-of-file character as a regular character with no special meaning; on UNIX systems, the two modes behave the same.

Reading file information directly from the directory was shown to be very easy by using the opendir(), readdir(), and closedir() functions. An example was given that showed how to find all files with an extension of PL by using the grep() function in conjunction with readdir().

Then, we looked closely at the print() and printf() functions. Both can be used to send output to a file handle. The select() function was used to change the default handle from STDOUT to another file. In addition, some examples were given of the formatting options available with the printf() function.

The topic of globbing was briefly touched on. Globs let you specify a file name using wildcards. A list of file names is returned that can be processed like any other array.

And finally, you read about how to split a record into fields based on a separator character.

This chapter covered a lot of ground, and some of the examples did not relate to each other. Instead, I tried to give you a feel for the many ways that files can be used. An entire book could be written on the different ways to use files. But you now know enough to create any kind of file that you might need.

Chapter 10, "Regular Expressions," will cover this difficult topic. In fact, Perl's regular expressions are one of the main reasons to learn the language.
Few other languages will give you equivalent functionality.

Review Questions

Answers to Review Questions are in Appendix A.

1. What is a file handle?
2. What is binary mode?
3. What is a fully qualified file name?
4. Are variables in the computer's memory considered persistent storage?
5. What is the <> operator used for?
6. What is the default file handle for the printf() function?
7. What is the difference between the following two open statements?
   open(FILE_ONE, ">FILE_ONE.DAT");
   open(FILE_TWO, ">>FILE_TWO.DAT");
8. What value will the following expression return?
   (stat("09lst01.pl"))[7];
9. What is globbing?
10. What will the following statement display?
   printf("%x", 16);

Review Exercises

1. Write a program to open a file and display each line along with its line number.
2. Write a program that prints to four files at once.
3. Write a program that gets the file statistics for PERL.EXE and displays its size in bytes.
4. Write a program that uses the sysread() function. The program should first test the file for existence and determine the file size. Then the file size should be passed to the sysread() function as one of its parameters.
5. Write a program that reads from the file handle in the following line of code. Read all of the input into an array and then sort and print the array.
   open(FILE, "dir *.pl |");
6. Using binary mode, write a program that reads PERL.EXE and prints any characters that are greater than or equal to "A" and less than or equal to "Z."
7. Write a program that reads a file with two fields. The first field is a customer ID and the second field is the customer name. Use the ! character as a separator between the fields. Store the information in a hash with the customer ID as the key and the customer name as the value.
8. Write a program that reads a file into an array, then displays 20 lines at a time.
Chapter 10 Regular Expressions

CONTENTS
* Pattern Delimiters
* The Matching Operator (m//)
  + The Matching Options
* The Substitution Operator (s///)
  + The Substitution Options
* The Translation Operator (tr///)
  + The Translation Options
* The Binding Operators (=~ and !~)
* How to Create Patterns
  + Example: Character Classes
  + Example: Quantifiers
  + Example: Pattern Memory
  + Example: Pattern Precedence
  + Example: Extension Syntax
* Pattern Examples
  + Example: Using the Match Operator
  + Example: Using the Substitution Operator
  + Example: Using the Translation Operator
  + Example: Using the Split() Function
* Summary
* Review Questions
* Review Exercises

You can use a regular expression to find patterns in strings. For example, you can look for a specific name in a phone list or for all of the names that start with the letter a. Pattern matching is one of Perl's most powerful and probably least understood features. But after you read this chapter, you'll be able to handle regular expressions almost as well as a Perl guru. With a little practice, you'll be able to do some incredibly handy things.

There are three main uses for regular expressions in Perl: matching, substitution, and translation.
The matching operation uses the m// operator, which evaluates to a true or false value. The substitution operation substitutes one expression for another and uses the s/// operator. The translation operation translates one set of characters to another and uses the tr/// operator. These operators are summarized in Table 10.1.

Table 10.1 Perl's Regular Expression Operators

Operator                      Description
m/PATTERN/                    This operator returns true if PATTERN is found in $_.
s/PATTERN/REPLACEMENT/        This operator replaces the substring matched by PATTERN with REPLACEMENT.
tr/CHARACTERS/REPLACEMENTS/   This operator replaces characters specified by CHARACTERS with the characters in REPLACEMENTS.

All three regular expression operators work with $_ as the string to search. You can use the binding operators (see the section "The Binding Operators" later in this chapter) to search a variable other than $_.

Both the matching (m//) and the substitution (s///) operators perform variable interpolation on the PATTERN and REPLACEMENT strings. This comes in handy if you need to read the pattern from the keyboard or a file.

If the match pattern evaluates to the empty string, the last successfully matched pattern is used. So, if you see a statement like

    print if //;

in a Perl program, look for the previous regular expression operator to see what the pattern really is. The substitution operator also uses this interpretation of the empty pattern.

In this chapter, you first learn about pattern delimiters and then about each type of regular expression operator. After that, you learn how to create patterns in the section "How to Create Patterns." Then, the "Pattern Examples" section shows how regular expressions can be used to handle some common situations.

Pattern Delimiters

Every regular expression operator allows the use of alternative pattern delimiters. A delimiter marks the beginning and end of a given pattern. In the statement

    m//;

the two slashes (//) are the standard delimiters.
However, you can use any character as the delimiter. This feature is useful if you want to use the slash character inside your pattern. For instance, to match a file name you would normally use:

    m/\/root\/home\/random.dat/

This match statement is hard to read because all of the slashes seem to run together (some programmers say they look like teepees). If you use an alternate delimiter, it might look like this:

    m!/root/home/random.dat!

or

    m{/root/home/random.dat}

You can see that these examples are a little clearer. The last example also shows that if an opening bracket such as { is used as the starting delimiter, the ending delimiter must be the matching closing bracket.

Both the match and substitution operators let you use variable interpolation. You can take advantage of this by storing the pattern in a single-quoted string, where the slashes do not need to be escaped. For instance:

    $file = '/root/home/random.dat';
    m/$file/;

You might find that this technique yields clearer code than simply changing the delimiters.

If you choose the single quote as your delimiter character, then no variable interpolation is performed on the pattern. However, you still need to use the backslash character to escape any of the meta-characters discussed in the "How to Create Patterns" section later in this chapter.

Tip
I tend to avoid delimiters that might be confused with characters in the pattern. For example, using the plus sign as a delimiter (m+abc+) does not help program readability. A casual reader might think that you intend to add two expressions instead of matching them.

Caution
The ? has a special meaning when used as a match pattern delimiter. It works like the / delimiter except that it matches only once between calls to the reset() function. This feature may be removed in future versions of Perl, so avoid using it.

The next few sections look at the matching, substitution, and translation operators in more detail.

The Matching Operator (m//)

The matching operator (m//) is used to find patterns in strings.
One of its more common uses is to look for a specific string inside a data file. For instance, you might look for all customers whose last name is "Johnson," or you might need a list of all names starting with the letter s.

By default, the matching operator searches the $_ variable. This makes the match statement shorter because you don't need to specify where to search. Here is a quick example:

    $_ = "AAA bbb AAA";
    print "Found bbb\n" if m/bbb/;

The print statement is executed only if the bbb character sequence is found in the $_ variable. In this particular case, bbb will be found, so the program will display the following:

    Found bbb

The matching operator allows you to use variable interpolation in order to create the pattern. For example:

    $needToFind = "bbb";
    $_ = "AAA bbb AAA";
    print "Found bbb\n" if m/$needToFind/;

Using the matching operator is so commonplace that Perl allows you to leave off the m from the matching operator as long as slashes are used as delimiters:

    $_ = "AAA bbb AAA";
    print "Found bbb\n" if /bbb/;

Using the matching operator to find a string inside a file is very easy because the defaults are designed to facilitate this activity. For example:

    $target = "M";
    # input.dat is a placeholder file name
    open(INPUT, "<input.dat") or die "Can't open input.dat: $!";
    while (<INPUT>) {
        if (/$target/) {
            print "Found $target on line $.\n";
        }
    }
    close(INPUT);

Note
The $. special variable keeps track of the record number. Every time the diamond operator reads a line, this variable is incremented.

This example reads every line in an input file and searches each one for the letter M. When an M is found, the print statement is executed. The print statement prints the letter that was found and the line number it was found on.

The Matching Options

The matching operator has several options that enhance its utility. The most useful options are probably the ability to ignore case and the ability to create an array of all matches in a string. Table 10.2 shows the options you can use with the matching operator.
Table 10.2 Options for the Matching Operator

Option   Description
g        This option finds all occurrences of the pattern in the string. A list of matches is returned, or you can iterate over the matches using a loop statement.
i        This option ignores the case of characters in the string.
m        This option treats the string as multiple lines. The ^ and $ anchors then match at the beginning and end of each line instead of only at the beginning and end of the string.
o        This option compiles the pattern only once. You can achieve some small performance gains with this option. It should be used with variable interpolation only when the value of the variable will not change during the lifetime of the program.
s        This option treats the string as a single line. The . meta-character then also matches the newline character.
x        This option lets you use extended regular expressions. Basically, this means that Perl will ignore white space that's not escaped with a backslash or within a character class. I highly recommend this option so you can use spaces to make your regular expressions more readable. See the section "Example: Extension Syntax," later in this chapter for more information.

All options are specified after the last pattern delimiter. For instance, if you want the match to ignore the case of the characters in the string, you can do this:

    $_ = "AAA BBB AAA";
    print "Found bbb\n" if m/bbb/i;

This program finds a match even though the pattern uses lowercase and the string uses uppercase, because the /i option tells Perl to ignore case.

The result from a global pattern match can be assigned to an array variable or used inside a loop. This feature comes in handy after you learn about meta-characters in the section "How to Create Patterns" later in this chapter. For more information about the matching options, see the section "Pattern Examples" later in this chapter.

The Substitution Operator (s///)

The substitution operator (s///) is used to change strings.
It requires two operands, like this:

    s/a/z/;

This statement changes the first a in $_ into a z. Not too complicated, huh? Things won't get complicated until we start talking about regular expressions in earnest in the section "How to Create Patterns" later in the chapter.

You can use variable interpolation with the substitution operator just as you can with the matching operator. For instance:

    $needToReplace = "bbb";
    $replacementText = "1234567890";
    $_ = "AAA bbb AAA";
    $result = s/$needToReplace/$replacementText/;

Note
You can use variable interpolation in the replacement text as shown here. However, the replacement is treated like a double-quoted string and not as a pattern, so meta-characters such as . and * have no special meaning there.

This program changes the $_ variable to hold "AAA 1234567890 AAA" instead of its original value. The $result variable will be equal to 1, which is the number of substitutions made.

Frequently, the substitution operator is used to remove substrings. For instance, if you want to remove the "bbb" sequence of characters from the $_ variable, you could do this:

    s/bbb//;

By replacing the matched string with nothing, you have effectively deleted it.

If brackets of any type are used as delimiters for the search pattern, you need to use a second set of brackets to enclose the replacement pattern. For instance:

    $_ = "AAA bbb AAA";
    $result = s{bbb}{1234567890};

The Substitution Options

Like the matching operator, the substitution operator has several options. One interesting option is the ability to evaluate the replacement pattern as an expression instead of a string. You could use this to find all numbers in a file and multiply them by a given percentage, for instance. Or, you could repeat matched strings by using the string repetition operator. Table 10.3 shows all of the options you can use with the substitution operator.

Table 10.3 Options for the Substitution Operator

Option   Description
e        This option forces Perl to evaluate the replacement pattern as an expression.
g        This option replaces all occurrences of the pattern in the string.
i        This option ignores the case of characters in the string.
m        This option treats the string as multiple lines. The ^ and $ anchors then match at the beginning and end of each line instead of only at the beginning and end of the string.
o        This option compiles the pattern only once. You can achieve some small performance gains with this option. It should be used with variable interpolation only when the value of the variable will not change during the lifetime of the program.
s        This option treats the string as a single line. The . meta-character then also matches the newline character.
x        This option lets you use extended regular expressions. Basically, this means that Perl ignores white space that is not escaped with a backslash or within a character class. I highly recommend this option so you can use spaces to make your regular expressions more readable. See the section "Example: Extension Syntax," later in this chapter for more information.

The /e option changes how the replacement text is interpreted. If it is used, variable interpolation is active even if single quotes are used as delimiters. Separately, if back quotes are used as delimiters, the replacement pattern is executed as a DOS or UNIX command, and the output of the command is used as the replacement text.

The Translation Operator (tr///)

The translation operator (tr///) is used to change individual characters in the $_ variable. It requires two operands, like this:

    tr/a/z/;

This statement translates all occurrences of a into z. If you specify more than one character in the match character list, you can translate multiple characters at a time. For instance,

    tr/ab/z/;

translates all a and all b characters into the z character. If the replacement list of characters is shorter than the target list of characters, the last character in the replacement list is repeated as often as needed.
However, if more than one replacement character is given for a matched character, only the first is used. For instance,

    tr/WWW/ABC/;

results in all W characters being converted to an A character. The rest of the replacement list is ignored.

Unlike the matching and substitution operators, the translation operator doesn't perform variable interpolation.

Note
The tr operator gets its name from the UNIX tr utility. If you are familiar with the tr utility, then you already know how to use the tr operator. The UNIX sed utility uses a y to indicate translations. To make learning Perl easier for sed users, y is supported as a synonym for tr.

The Translation Options

The translation operator has options different from the matching and substitution operators. You can delete matched characters, replace repeated characters with a single character, and translate only characters that don't match the character list. Table 10.4 shows the translation options.

Table 10.4 Options for the Translation Operator

Option   Description
c        This option complements the match character list. In other words, the translation is done for every character that does not match the character list.
d        This option deletes any character in the match list that does not have a corresponding character in the replacement list.
s        This option reduces repeated instances of matched characters to a single instance of that character.

Normally, if the match list is longer than the replacement list, the last character in the replacement list is used as the replacement for the extra characters. However, when the d option is used, the matched characters are simply deleted.

If the replacement list is empty, then no translation is done. The operator still returns the number of characters that matched, though. This is useful when you need to know how often a given letter appears in a string. This feature can also compress repeated characters using the s option.
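The short program below is a minimal sketch of the translation behavior described above. The sample string "Mississippi" and the variable names are chosen only for illustration. The program shows the count returned when the replacement list is empty, and then the effect of each of the d, s, and c options. Each translation is applied to a fresh copy through the (my $copy = $text) =~ tr/.../.../ idiom, so the original string is never modified.

```perl
use strict;
use warnings;

my $text = "Mississippi";    # sample string chosen for illustration

# Empty replacement list: nothing changes, but tr returns the number
# of matched characters, so this counts the letter s.
my $sCount = ($text =~ tr/s//);

# d option: matched characters that have no replacement are deleted.
(my $noVowels = $text) =~ tr/aeiou//d;

# s option: runs of the same translated character collapse to one.
(my $squeezed = $text) =~ tr/a-z//s;

# c option: every character NOT in a-z is translated to an underscore.
(my $masked = "a1b2c3") =~ tr/a-z/_/c;

print "$sCount $noVowels $squeezed $masked\n";
```

Run as is, this prints 4 Msssspp Misisipi a_b_c_. The uppercase M survives the d and s examples because their match lists contain only lowercase letters.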
Tip
UNIX programmers may be familiar with using the tr utility to convert lowercase characters to uppercase characters, or vice versa. Perl now has the lc() and uc() functions, which do this more directly.

The Binding Operators (=~ and !~)

The matching, substitution, and translation operations work on the $_ variable by default. What if the string to be searched is in some other variable? That's where the binding operators come into play. They let you bind the regular expression operators to a variable other than $_. There are two forms of the binding operator: the regular =~ and its complement !~. The following small program shows the syntax of the =~ operator:

    $scalar = "The root has many leaves";
    $match = $scalar =~ m/root/;
    $substitution = $scalar =~ s/root/tree/;
    $translate = $scalar =~ tr/h/H/;

    print("\$match = $match\n");
    print("\$substitution = $substitution\n");
    print("\$translate = $translate\n");
    print("\$scalar = $scalar\n");

This program displays the following:

    $match = 1
    $substitution = 1
    $translate = 2
    $scalar = THe tree Has many leaves

This example uses all three of the regular expression operators with the regular binding operator. Each of the regular expression operators was bound to the $scalar variable instead of $_. This example also shows the return values of the regular expression operators. Notice that tr/h/H/ converted both lowercase h characters, including the one in The. If you don't need the return values, you could do this:

    $scalar = "The root has many leaves";
    print("String has root.\n") if $scalar =~ m/root/;
    $scalar =~ s/root/tree/;
    $scalar =~ tr/h/H/;
    print("\$scalar = $scalar\n");

This program displays the following:

    String has root.
    $scalar = THe tree Has many leaves

The left operand of the binding operator is the string to be searched, modified, or transformed. The right operand is the regular expression operator to be evaluated. The complementary binding operator is valid only when used with the matching regular expression operator.
If you use it with the substitution or translation operator, you get the following message if you're using the -w command-line option to run Perl:

    Useless use of not in void context at test.pl line 4.

You can see that !~ is the opposite of =~ by replacing the =~ in the previous example:

    $scalar = "The root has many leaves";
    print("String has root.\n") if $scalar !~ m/root/;
    $scalar =~ s/root/tree/;
    $scalar =~ tr/h/H/;
    print("\$scalar = $scalar\n");

This program displays the following:

    $scalar = THe tree Has many leaves

The first print line does not get executed because the complementary binding operator returns false.

How to Create Patterns

So far in this chapter, you've read about the different operators used with regular expressions, and you've seen how to match simple sequences of characters. Now we'll look at the wide array of meta-characters that are used to harness the full power of regular expressions. Meta-characters are characters that have an additional meaning above and beyond their literal meaning. For example, the period character can have two meanings in a pattern. First, it can be used to match a period character in the searched string; this is its literal meaning. Second, it can be used to match any character in the searched string except for the newline character; this is its meta-meaning.

When creating patterns, the meta-meaning is always the default. If you really intend to match the literal character, you need to prefix the meta-character with a backslash. You might recall that the backslash is used to create an escape sequence. For more information about escape sequences, see the section "Example: Double Quoted Strings" in Chapter 2.

Patterns can have many different components. These components all combine to provide you with the power to match any type of string. The following list of components will give you a good idea of the variety of ways that patterns can be created.
The section "Pattern Examples" later in this chapter shows many examples of these rules in action.

Variable Interpolation: Any variable is interpolated, and the resulting pattern is then evaluated as a regular expression. Remember that only one level of interpolation is done. This means that if the value of the variable includes, for example, $scalar as a string value, then $scalar will not be interpolated. In addition, back-quotes do not interpolate within double-quotes, and single-quotes do not stop interpolation of variables when used within double-quotes.

Self-Matching Characters: Any character will match itself unless it is a meta-character or one of $, @, and &. The meta-characters are listed in Table 10.5, and the other characters are used to begin variable names and function calls. You can use the backslash character to force Perl to match the literal meaning of any character. For example, m/a/ will return true if the letter a is in the $_ variable. And m/\$/ will return true if the character $ is in the $_ variable.

Table 10.5 Regular Expression Meta-Characters, Meta-Brackets, and Meta-Sequences

Meta-Character   Description
^                This meta-character, the caret, will match the beginning of a string or, if the /m option is used, the beginning of a line. It is one of two pattern anchors; the other anchor is the $.
.                This meta-character will match any character except for the newline unless the /s option is specified. If the /s option is specified, then the newline will also be matched.
$                This meta-character will match the end of a string or, if the /m option is used, the end of a line. It is one of two pattern anchors; the other anchor is the ^.
|                This meta-character, called alternation, lets you specify two values that can cause the match to succeed. For instance, m/a|b/ means that the $_ variable must contain the "a" or "b" character for the match to succeed.
*                This meta-character indicates that the "thing" immediately to the left should be matched 0 or more times in order to be evaluated as true.
+                This meta-character indicates that the "thing" immediately to the left should be matched 1 or more times in order to be evaluated as true.
?                This meta-character indicates that the "thing" immediately to the left should be matched 0 or 1 times in order to be evaluated as true. When it follows the *, +, ?, or {n,m} quantifiers, it means that the regular expression should be non-greedy and match the smallest possible string.

Meta-Brackets    Description
()               The parentheses let you affect the order of pattern evaluation and act as a form of pattern memory. See the section "Example: Pattern Memory" later in this chapter for more information.
(?...)           If a question mark immediately follows the left parenthesis, it indicates that an extended mode component is being specified. See the section "Example: Extension Syntax" later in this chapter for more information.
{n,m}            The curly braces specify how many times the "thing" immediately to the left should be matched. {n} means that it should be matched exactly n times. {n,} means it must be matched at least n times. {n,m} means that it must be matched at least n times and not more than m times.
[]               The square brackets let you create a character class. For instance, m/[abc]/ will evaluate to true if any of "a", "b", or "c" is contained in $_. The square brackets are a more readable alternative to the alternation meta-character.

Meta-Sequences   Description
\                This meta-character "escapes" the following character. This means that any special meaning normally attached to that character is ignored. For instance, if you need to include a dollar sign in a pattern, you must use \$ to avoid Perl's variable interpolation. Use \\ to specify the backslash character in your pattern.
\0nnn            Any octal byte.
\a               Alarm.
\A               This meta-sequence represents the beginning of the string. Its meaning is not affected by the /m option.
\b               This meta-sequence represents the backspace character inside a character class; otherwise, it represents a word boundary. A word boundary is the spot between word (\w) and non-word (\W) characters. Perl thinks that the \W meta-sequence matches the imaginary characters off the ends of the string.
\B               Match a non-word boundary.
\cn              Any control character.
\d               Match a single digit character.
\D               Match a single non-digit character.
\e               Escape.
\E               Terminate the \L, \U, or \Q sequence.
\f               Form feed.
\G               Match only where the previous m//g left off.
\l               Change the next character to lowercase.
\L               Change the following characters to lowercase until a \E sequence is encountered.
\n               Newline.
\Q               Quote regular expression meta-characters literally until the \E sequence is encountered.
\r               Carriage return.
\s               Match a single whitespace character.
\S               Match a single non-whitespace character.
\t               Tab.
\u               Change the next character to uppercase.
\U               Change the following characters to uppercase until a \E sequence is encountered.
\v               Vertical tab.
\w               Match a single word character. Word characters are the alphanumeric and underscore characters.
\W               Match a single non-word character.
\xnn             Any hexadecimal byte.
\Z               This meta-sequence represents the end of the string. Its meaning is not affected by the /m option.
\$               Dollar sign.
\@               At sign.

Character Sequences: A sequence of characters will match the identical sequence in the searched string. The characters need to be in the same order in both the pattern and the searched string for the match to be true. For example, m/abc/; will match "abc" but not "cab" or "bca". If any character in the sequence is a meta-character, you need to use the backslash to match its literal value.

Alternation: The alternation meta-character (|) lets you match more than one possible string. For example, m/a|b/; will match if either the "a" character or the "b" character is in the searched string. You can use sequences of more than one character with alternation.
For example, m/dog|cat/; will match if either of the strings "dog" or "cat" is in the searched string.

Tip
Some programmers like to enclose the alternation sequence inside parentheses to help indicate where the sequence begins and ends.

    m/(dog|cat)/;

However, this will affect something called pattern memory, which you'll learn about in the section "Example: Pattern Memory," later in the chapter.

Character Classes: The square brackets are used to create character classes. A character class is used to match a specific type of character. For example, you can match any decimal digit using m/[0123456789]/;. This will match a single character in the range of zero to nine. You can find more information about character classes in the section "Example: Character Classes," later in this chapter.

Symbolic Character Classes: Several character classes are used so frequently that they have a symbolic representation. The period meta-character stands for a special character class that matches all characters except for the newline. The rest are \d, \D, \s, \S, \w, and \W. These are mentioned in Table 10.5 earlier and are discussed in the section "Example: Character Classes," later in this chapter.

Anchors: The caret (^) and the dollar sign ($) meta-characters are used to anchor a pattern to the beginning and the end of the searched string. When used as an anchor, the caret is the first character in the pattern. For example, m/^one/; will match only if the searched string starts with the character sequence one. When used as an anchor, the dollar sign is the last character in the pattern. For example, m/(last|end)$/; will match only if the searched string ends with either the character sequence last or the character sequence end. The \A and \Z meta-sequences are also used as pattern anchors for the beginning and end of strings.

Quantifiers: Several meta-characters are devoted to controlling how many characters are matched.
For example, m/a{5}/; means that five consecutive a characters must be found before a true result can be returned. The *, +, and ? meta-characters and the curly braces are all used as quantifiers. See the section "Example: Quantifiers," later in this chapter for more information.

Pattern Memory: Parentheses are used to store matched values into buffers for later recall. I like to think of this as a form of pattern memory. Some programmers call them back-references. After you use m/(fish|fowl)/; to match a string and a match is found, the variable $1 will hold either fish or fowl, depending on which sequence was matched. See the section "Example: Pattern Memory," later in this chapter for more information.

Word Boundaries: The \b meta-sequence will match the spot between a non-word character, such as a space, and the first character of a word, or between the last character of a word and a non-word character. The \b will also match at the beginning or end of a string if there are no leading or trailing spaces. For example, m/\bfoo/; will match foo even without spaces surrounding the word. It will also match $foo because the dollar sign is not considered a word character. The statement m/foo\b/; will match foo but not foobar, and the statement m/\bwiz/; will match wizard but not geewiz. See the section "Example: Character Classes," later in this chapter for more information about word boundaries. The \B meta-sequence will match everywhere except at a word boundary.

Quoting Meta-Characters: You can match meta-characters literally by enclosing them in a \Q..\E sequence. This lets you avoid using the backslash character to escape all meta-characters, and your code will be easier to read.

Extended Syntax: The (?...) sequence lets you use an extended version of the regular expression syntax. The different options are discussed in the section "Example: Extension Syntax," later in this chapter.

Combinations: Any of the preceding components can be combined with any other to create simple or complex patterns.
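To see several of these components working together, here is a minimal sketch. The sample strings and variable names are made up for illustration. It combines alternation, an escaped period, the $ anchor, and pattern memory in one pattern. It then checks a word boundary and a \Q..\E quoted sequence, whose meta-characters (+ and .) are treated literally.

```perl
use strict;
use warnings;

my $line = "The dog chased the cat.";

# Alternation + escaped period + $ anchor: true only if the string
# ends with "dog." or "cat."; the parentheses also store the match.
my $endsWithPet = ($line =~ m/(dog|cat)\.$/) ? 1 : 0;

# Pattern memory: $1 holds whichever alternative matched.
my $pet = $1;

# Word boundaries: \bcat\b finds "cat" only as a whole word,
# so it does not match inside "concatenate".
my $wholeWord = ("concatenate" =~ m/\bcat\b/) ? 1 : 0;

# \Q..\E quotes the interpolated value, so + and . match literally.
my $literal = "1+1.5";
my $quoted = ("Compute 1+1.5 now" =~ m/\Q$literal\E/) ? 1 : 0;

print "$endsWithPet $pet $wholeWord $quoted\n";
```

This should print 1 cat 0 1. Note that $1 is copied right after the match that sets it, because a later successful match with capturing parentheses would replace it.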
The power of patterns is that you don't always know in advance the value of the string that you will be searching. If you need to match the first word in a string that was read in from a file, you probably have no idea how long it might be. Therefore, you need to build a pattern. You might start with the \w symbolic character class, which will match any single alphanumeric or underscore character. So, assuming that the string is in the $_ variable, you can match a one-character word like this:

    m/\w/;

If you need to match both a one-character word and a two-character word, you can do this:

    m/\w|\w\w/;

This pattern says to match a single word character or two consecutive word characters. You could continue to add alternation components to match the different lengths of words that you might expect to see, but there is a better way. You can use the + quantifier to say that the match should succeed only if the component is matched one or more times. It is used this way:

    m/\w+/;

If the value of $_ were "AAA BBB", then m/\w+/; would match the "AAA" in the string. If $_ were blank, full of white space, or full of other non-word characters, the match would fail and return false.

The preceding pattern lets you determine whether $_ contains a word, but it does not tell you what the word is. To find that out, you need to enclose the matching components inside parentheses. For example:

    m/(\w+)/;

By doing this, you force Perl to store the matched string in the $1 variable. The $1 variable can be considered as pattern memory.

This introduction to pattern components describes most of the details you need to know in order to create your own patterns or regular expressions. However, some of the components deserve a bit more study. The next few sections look at character classes, quantifiers, pattern memory, pattern precedence, and the extension syntax. Then the rest of the chapter is devoted to showing specific examples of when to use the different components.
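The following minimal sketch applies the \w+ and (\w+) patterns from the walkthrough above. The sample strings are assumptions made for illustration. It shows that the match skips leading spaces to find the first word, that a list assignment pulls the captured word out of pattern memory, and that a string with no word characters makes the match fail.

```perl
use strict;
use warnings;

$_ = "  AAA BBB";

# \w+ needs one or more word characters; the match skips the leading
# spaces and succeeds on the first run of word characters.
my $hasWord = m/\w+/ ? 1 : 0;

# In list context, a match with parentheses returns what they captured,
# which is the same text that is stored in $1.
my ($first) = m/(\w+)/;

# A string with no word characters at all makes the match fail.
my $blank = "   ";
my $none = ($blank =~ m/\w+/) ? 1 : 0;

print "$hasWord $first $none\n";
```

This should print 1 AAA 0.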
Example: Character Classes

A character class defines a type of character. The character class [0123456789] defines the class of decimal digits, and [0-9a-f] defines the class of hexadecimal digits. Notice that you can use a dash to define a range of consecutive characters. Character classes let you match any of a range of characters when you don't know in advance which character will be matched. This ability to match non-specific characters is what meta-characters are all about.

You can use variable interpolation inside the character class, but you must be careful when doing so. For example,

    $_ = "AAABBBccC";
    $charList = "ADE";
    print "matched" if m/[$charList]/;

will display

    matched

This is because the variable interpolation results in a character class of [ADE]. If you use the variable as one-half of a character range, you need to ensure that you don't mix letters and digits. For example,

    $_ = "AAABBBccC";
    $charList = "ADE";
    print "matched" if m/[$charList-9]/;

will result in the following error message when executed:

    /[ADE-9]/: invalid [] range in regexp at test.pl line 4.

At times, it's necessary to match on any character except for a given character list. This is done by complementing the character class with the caret. For example,

    $_ = "AAABBBccC";
    print "matched" if m/[^ABC]/;

will display matched. This match returns true if any character besides A, B, or C is in the searched string. Character classes are case-sensitive, so the lowercase c characters qualify. Against this string, only a class such as [^ABCc] would fail to match. If you complement a list with just the letter A,

    $_ = "AAABBBccC";
    print "matched" if m/[^A]/;

then the string "matched" will be displayed because B and C are part of the string. In other words, the string contains a character besides the letter A.

Perl has shortcuts for some character classes that are frequently used. Here is a list of what I call symbolic character classes:

\w    This symbol matches any alphanumeric character or the underscore character. It is equivalent to the character class [a-zA-Z0-9_].
\W    This symbol matches every character that the \w symbol does not.
In other words, it is the complement of \w. It is equivalent to [^a-zA-Z0-9_]. \s This symbol matches any space, tab, or newline character. It is equivalent to [\t \n]. \S This symbol matches any non-whitespace character. It is equivalent to [^\t \n]. \d This symbol matches any digit. It is equivalent to [0-9]. \D This symbol matches any non-digit character. It is equivalent to [^0-9]. You can use these symbols inside other character classes, but not as endpoints of a range. For example, you can do the following: $_ = "\tAAA"; print "matched" if m/[\d\s]/; which will display matched because the value of $_ includes the tab character. Tip Meta-characters that appear inside the square brackets that define a character class are used in their literal sense. They lose their meta-meaning. This may be a little confusing at first. In fact, I have a tendency to forget this when evaluating patterns. Note I think that most of the confusion regarding regular expressions lies in the fact that each character of a pattern might have several possible meanings. The caret could be an anchor, it could be a caret, or it could be used to complement a character class. Therefore, it is vital that you decide which context any given pattern character or symbol is in before assigning a meaning to it. Example: Quantifiers Perl provides several different quantifiers that let you specify how many times a given component must be present before the match is true. They are used when you don't know in advance how many characters need to be matched. Table 10.6 lists the different quantifiers that can be used. Table 10.6 The Six Types of Quantifiers Quantifier Description * The component must be present zero or more times. + The component must be present one or more times. ? The component must be present zero or one times. {n} The component must be present n times. {n,} The component must be present at least n times. {n,m} The component must be present at least n times and no more than m times. 
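The following sketch exercises each quantifier in Table 10.6 against a sample string. The patterns and sample strings are illustrative choices rather than examples from the text, and the helper name matches is hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: true (1) if $string contains a match for $pattern.
sub matches {
    my ($pattern, $string) = @_;
    return $string =~ m/$pattern/ ? 1 : 0;
}

# Each row: quantifier being shown, pattern, sample string.
my @cases = (
    [ '*',     'x*',        ''      ],  # zero or more: even "" matches
    [ '+',     'x+',        'xxx'   ],  # one or more
    [ '?',     '^colou?r$', 'color' ],  # zero or one "u"
    [ '{n}',   '^\d{3}$',   '123'   ],  # exactly three digits
    [ '{n,}',  '^\d{3,}$',  '12345' ],  # three or more digits
    [ '{n,m}', '^\d{2,4}$', '12345' ],  # two to four digits: five is too many
);

foreach my $case (@cases) {
    my ($quantifier, $pattern, $string) = @$case;
    printf("%-6s /%s/ %s \"%s\"\n", $quantifier, $pattern,
           matches($pattern, $string) ? "matches" : "does not match",
           $string);
}
```

The ^ and $ anchors are what make the counted quantifiers strict; without them, /\d{2,4}/ would happily find four digits inside "12345".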
If you need to match a word whose length is unknown, you need to use the + quantifier. You can't use an * because a zero-length word makes no sense. So, the match statement might look like this:

    m/\w+/;

This pattern will match "QQQ" and "AAAAA" but not "". Because the pattern is not anchored, it will also match " BBB", finding the "BBB" after the leading space. If you anchor the pattern to the start of the string with m/^\w+/, then " BBB" no longer matches. In order to account for leading white space, which may or may not be at the beginning of a string, you need to use the asterisk (*) quantifier in conjunction with the \s symbolic character class in the following way:

    m/^\s*\w+/;

Tip
Be careful when using the * quantifier because it can match an empty string, which might not be your intention. The pattern /b*/ will match any string, even one without any b characters.

At times, you may need to match an exact number of components. The following match statement will be true only if $_ contains at least five words separated by white space:

    $_ = "AA AB AC AD AE";
    m/(\w+\s+){4}\w+/;

In this example, we are matching at least one word character followed by at least one white space character. The {4} quantifier ensures that this combination of components is present four times, and the final \w+ matches the fifth word, which need not be followed by white space. (Writing m/(\w+\s+){5}/ would fail on this string, because the last word has no white space after it.) To require exactly five words, anchor the pattern: m/^\s*(\w+\s+){4}\w+\s*$/.

The * and + quantifiers are greedy: they match as many characters as possible. This may not always be the behavior that you need. You can create non-greedy components by following the quantifier with a ?. Use the following file specification in order to look at the * and + quantifiers more closely:

    $_ = '/user/Jackie/temp/names.dat';

The regular expression .* will match the entire file specification. This can be seen in the following small program:

    $_ = '/user/Jackie/temp/names.dat';
    m/.*/;
    print $&;

This program displays

    /user/Jackie/temp/names.dat

You can see that the * quantifier is greedy. It matched the whole string. If you add the ? modifier to make the .* component non-greedy, what do you think the program would display?

    $_ = '/user/Jackie/temp/names.dat';
    m/.*?/;
    print $&;

This program displays nothing because the fewest characters that .*? can match is zero.
If we change the * to a +, then the program will display

    /

Next, let's look at the concept of pattern memory, which lets you keep bits of the matched string around after the match is complete.

Example: Pattern Memory

Matching arbitrary numbers of characters is fine, but without the capability to find out what was matched, patterns would not be very useful. Perl lets you enclose pattern components inside parentheses in order to store the string that matched the components into pattern memory. You also might hear pattern memory referred to as pattern buffers. This memory persists after the match statement is finished executing so that you can assign the matched values to other variables.

You saw a simple example of this earlier right after the component descriptions. That example looked for the first word in a string and stored it into the first buffer, $1. The following small program

    $_ = "AAA BBB ccC";
    m/(\w+)/;
    print("$1\n");

will display

    AAA

You can use as many buffers as you need. Each time you add a set of parentheses, another buffer is used. If you want to find all the words in the string, you need to use the /g match option. In order to find all the words, you can use a loop statement that loops until the match operator returns false.

    $_ = "AAA BBB ccC";
    while (m/(\w+)/g) {
        print("$1\n");
    }

The program will display

    AAA
    BBB
    ccC

If looping through the matches is not the right approach for your needs, perhaps you need to create an array consisting of the matches.

    $_ = "AAA BBB ccC";
    @matches = m/(\w+)/g;
    print("@matches\n");

The program will display

    AAA BBB ccC

Perl also has a few special variables to help you know what matched and what did not. These variables occasionally will save you from having to add parentheses to find information.

    $+   This variable is assigned the string matched by the last (highest-numbered) set of parentheses that took part in the match.
    $&   This variable is assigned the value of the entire matched string.
         If the match is not successful, then $& retains its value from the last successful match.
    $`   This variable is assigned everything in the searched string that is before the matched string.
    $'   This variable is assigned everything in the searched string that is after the matched string.

Tip
If you need to save the value of the matched strings stored in the pattern memory, make sure to assign them to other variables. Pattern memory is local to the enclosing block and lasts only until the next successful match.

Example: Pattern Precedence

Pattern components have an order of precedence just as operators do. If you see the following pattern:

    m/a|b+/

it's hard to tell if the pattern should be

    m/(a|b)+/   # match one or more characters, each of which
                # is either "a" or "b" (for example, "abba").

or

    m/a|(b+)/   # match either the "a" character or the "b"
                # character repeated one or more times.

The order of precedence shown in Table 10.7 is designed to solve problems like this. By looking at the table, you can see that quantifiers have a higher precedence than alternation. Therefore, the second interpretation is correct.

Table 10.7 The Pattern Component Order of Precedence

    Precedence Level   Component
    1                  Parentheses
    2                  Quantifiers
    3                  Sequences and Anchors
    4                  Alternation

Tip
You can use parentheses to affect the order in which components are evaluated because they have the highest precedence. However, unless you use the (?:...) extension, you will be affecting the pattern memory.

Example: Extension Syntax

The regular expression extensions are a way to significantly add to the power of patterns without adding a lot of meta-characters to the proliferation that already exists. By using the basic (?...) notation, the regular expression capabilities can be greatly extended.

At this time, Perl recognizes five extensions. These vary widely in functionality, from adding comments to setting options. Table 10.8 lists the extensions and gives a short description of each.
Table 10.8 Five Extension Components

    Extension   Description
    (?# TEXT)   This extension lets you add comments to your regular expression. The TEXT value is ignored.
    (?:...)     This extension lets you add parentheses to your regular expression without causing a pattern memory position to be used.
    (?=...)     This extension lets you match values without including them in the $& variable.
    (?!...)     This extension lets you specify what should not follow your pattern. For instance, /blue(?!bird)/ means that "bluebox" and "bluesy" will be matched but not "bluebird".
    (?sxi)      This extension lets you specify an embedded option in the pattern rather than adding it after the last delimiter. This is useful if you are storing patterns in variables and using variable interpolation to do the matching.

By far the most useful feature of extended mode, in my opinion, is the ability to add comments directly inside your patterns. For example, would you rather see a pattern that looks like this:

    # Match a string with two words. $1 will be the
    # first word. $2 will be the second word.
    m/^\s*(\w+)\W+(\w+)\s*$/;

or one that looks like this:

    m/
        (?# This pattern will match any string with two)
        (?# and only two words in it. The matched words)
        (?# will be available in $1 and $2 if the match)
        (?# is successful.)

        ^       (?# Anchor this match to the beginning)
                (?# of the string)

        \s*     (?# skip over any whitespace characters)
                (?# use the * because there may be none)

        (\w+)   (?# Match the first word, we know it's)
                (?# the first word because of the anchor)
                (?# above. Place the matched word into)
                (?# pattern memory.)

        \W+     (?# Match at least one non-word)
                (?# character, there may be more than one)

        (\w+)   (?# Match another word, put into pattern)
                (?# memory also.)

        \s*     (?# skip over any whitespace characters)
                (?# use the * because there may be none)

        $       (?# Anchor this match to the end of the)
                (?# string. Because both ^ and $ anchors)
                (?# are present, the entire string will)
                (?# need to match the pattern. A)
                (?# sub-string that fits the pattern will)
                (?# not match.)
    /x;

Of course, the commented pattern is much longer, but it takes the same amount of time to execute, because the comments are discarded when the pattern is compiled. In addition, it will be much easier to maintain the commented pattern because each component is explained. When you know what each component is doing in relation to the rest of the pattern, it becomes easy to modify its behavior when the need arises.

Extensions also let you change the order of evaluation without affecting pattern memory. For example,

    m/(?:a|b)+/;

will match one or more characters, each of which is either a or b. The pattern memory will not be affected.

At times, you might like to include a pattern component in your pattern without including it in the $& variable that holds the matched string. The technical term for this is a zero-width positive look-ahead assertion. You can use this to ensure that the string following the matched component is correct without affecting the matched value. For example, if you have some data that looks like this:

    David   Veterinarian   56
    Jackie  Orthopedist    34
    Karen   Veterinarian   28

and you want to find all veterinarians and store the value of the first column, you can use a look-ahead assertion. This will do both tasks in one step. For example:

    while (<>) {
        push(@array, $&) if m/^\w+(?=\s+Vet)/;
    }
    print("@array\n");

This program will display:

    David Karen

Let's look at the pattern with comments added using the extended mode. In this case, it doesn't make sense to add comments directly to the pattern because the pattern is part of the if statement modifier. Adding comments in that location would make the comments hard to format. So let's use a different tactic.
    $pattern = '^\w+     (?# Match the first word in the string)
                (?=\s+   (?# Use a look-ahead assertion to match)
                         (?# one or more whitespace characters)
                 Vet)    (?# In addition to the whitespace, make)
                         (?# sure that the next column starts)
                         (?# with the character sequence "Vet")
               ';
    while (<>) {
        push(@array, $&) if m/$pattern/x;
    }
    print("@array\n");

Here we used a variable to hold the pattern and then used variable interpolation in the pattern with the match operator. You might want to pick a more descriptive variable name than $pattern, however.

Tip
A pattern can contain more than one look-ahead assertion, and an assertion does not have to be the last component. For example, m/^(?=.*\d)(?=.*[a-z])\w+$/ uses two look-ahead assertions at the start of the pattern to require that the string contain both a digit and a lowercase letter.

The last extension that we'll discuss is the zero-width negative look-ahead assertion. This type of component is used to specify values that shouldn't follow the matched string. For example, using the same data as in the previous example, you can look for everyone who is not a veterinarian. Your first inclination might be to simply replace the (?=...) with the (?!...) in the previous example.

    while (<>) {
        push(@array, $&) if m/^\w+(?!\s+Vet)/;
    }
    print("@array\n");

Unfortunately, this program displays

    Davi Jackie Kare

which is not what you need. The problem is backtracking. When \w+ matches all of "David", the assertion fails because white space and "Vet" follow, so Perl gives back one character and tries again. "Davi" is followed by "d", not by white space and "Vet", so the assertion succeeds. In order to correctly match the first word, you need to explicitly tell Perl that the first word ends at a word boundary, like this:

    while (<>) {
        push(@array, $&) if m/^\w+\b(?!\s+Vet)/;
    }
    print("@array\n");

This program displays

    Jackie

which is correct.

Tip
There are many ways of matching any value. If the first method you try doesn't work, try breaking the value into smaller components and match each boundary. If all else fails, you can always ask for help on the comp.lang.perl.misc newsgroup.

Pattern Examples

In order to demonstrate many different patterns, I will depart from the standard example format in this section.
Instead, I will explain a matching situation and then a possible resolution will immediately follow. After the resolution, I'll add some comments to explain how the match is done. In all of these examples, the string to search will be in the $_ variable.

Example: Using the Match Operator

If you need to find repeated characters in a string like the AA in "ABC AA ABC", then do this:

    m/(.)\1/;

This pattern uses pattern memory to store a single character. Then a back-reference (\1) is used to repeat the first character. The back-reference is used to reference the pattern memory while still inside the pattern. Anywhere else in the program, use the $1 variable. After this statement, $1 will hold the repeated character. This pattern will match any non-newline character that appears twice in a row.

If you need to find the first word in a string, then do this:

    m/^\s*(\w+)/;

After this statement, $1 will hold the first word in the string. Any whitespace at the beginning of the string will be skipped by the \s* meta-character sequence. Then the \w+ meta-character sequence will match the next word. Note that the * (which matches zero or more) is used to match the whitespace because there may not be any. The + (which matches one or more) is used for the word.

If you need to find the last word in a string, then do this:

    m/
        (\w+)   (?# Match a word, store its value into pattern memory)

        [.!?]?  (?# Some strings might hold a sentence. If so, this)
                (?# component will match zero or one punctuation)
                (?# characters)

        \s*     (?# Match trailing whitespace using the * because there)
                (?# might not be any)

        $       (?# Anchor the match to the end of the string)
    /x;

After this statement, $1 will hold the last word in the string. You may need to expand the character class, [.!?], by adding more punctuation.

If you need to know that there are only two words in a string, you can do this:

    m/^(\w+)\W+(\w+)$/;

After this statement, $1 will hold the first word and $2 will hold the second word, assuming that the pattern matches.
The pattern starts with a caret and ends with a dollar sign, which means that the entire string must match the pattern. The \w+ meta-character sequence matches one word. The \W+ meta-character sequence matches the non-word characters, such as white space or punctuation, between the words. You can test for additional words by adding one \W+(\w+) meta-character sequence for each additional word to match.

If you need to know that there are only two words in a string while ignoring leading or trailing spaces, you can do this:

    m/^\s*(\w+)\W+(\w+)\s*$/;

After this statement, $1 will hold the first word and $2 will hold the second word, assuming that the pattern matches. The \s* meta-character sequence will match any leading or trailing whitespace.

If you need to assign the first two words in a string to $one and $two and the rest of the string to $rest, you can do this:

    $_ = "This is the way to San Jose.";
    $word   = '\w+';    # match a whole word.
    $space  = '\W+';    # match at least one non-word character,
                        # such as white space.
    $string = '.*';     # match any number of anything except
                        # for the newline character.
    ($one, $two, $rest) = (m/^($word) $space ($word) $space ($string)/x);

After this statement, $one (and $1) will hold the first word, $two (and $2) will hold the second word, and $rest will hold everything else in the $_ variable. This example uses variable interpolation to, hopefully, make the match pattern easier to read. This technique also emphasizes which meta-sequence is used to match words and whitespace. It lets the reader focus on the whole of the pattern rather than the individual pattern components by adding a level of abstraction.
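To compare the two-word patterns above side by side, here is a small sketch. The subroutine names two_words and split_first_two are hypothetical; the patterns themselves are the ones from the text, wrapped so that they work on any string passed in rather than on $_.

```perl
use strict;
use warnings;

# Hypothetical helper: return both words if the string holds exactly
# two words (leading and trailing white space allowed), else ().
sub two_words {
    my ($text) = @_;
    return ($1, $2) if $text =~ m/^\s*(\w+)\W+(\w+)\s*$/;
    return ();
}

# Hypothetical helper: split a string into its first two words and
# the rest, using sub-patterns stored in variables.
sub split_first_two {
    my ($text) = @_;
    my $word   = '\w+';    # a whole word
    my $space  = '\W+';    # at least one non-word character
    my $string = '.*';     # anything except a newline
    # /x makes the spaces between the interpolated pieces insignificant.
    return ($text =~ m/^($word) $space ($word) $space ($string)/x);
}

my @pair = two_words("  Hello world  ");
print("two_words: @pair\n");

my ($one, $two, $rest) = split_first_two("This is the way to San Jose.");
print("one=$one two=$two rest=$rest\n");
```

A match in list context returns the contents of the parenthesized groups, which is why split_first_two can simply return the result of the match.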
If you need to see if $_ contains a legal Perl variable name, you can do this:

    $result = m/
        ^         (?# Anchor the pattern to the start of the string)

        [\$\@\%]  (?# Use a character class to match the first)
                  (?# character of a variable name)

        [a-z]     (?# Use a character class to ensure that the)
                  (?# first character of the name is a letter)

        \w*       (?# Use a character class to ensure that the)
                  (?# rest of the variable name is either an)
                  (?# alphanumeric or an underscore character)

        $         (?# Anchor the pattern to the end of the)
                  (?# string. This means that for the pattern to)
                  (?# match, the variable name must be the only)
                  (?# value in $_.)
    /ix;    # Use the /i option so that the search is
            # case-insensitive and use the /x option so that
            # the white space used to lay out the pattern
            # is ignored.

After this statement, $result will be true if $_ contains a legal variable name and false if it does not.

If you need to see if $_ contains a legal integer literal, you can do this:

    $result = m/
                  (?# First check for just numbers in $_)
        ^         (?# Anchor to the start of the string)
        \d+       (?# Match one or more digits)
        $         (?# Anchor to the end of the string)

        |         (?# or)

                  (?# Now check for hexadecimal numbers)
        ^         (?# Anchor to the start of the string)
        0x        (?# The "0x" sequence starts a hexadecimal number)
        [\da-f]+  (?# Match one or more hexadecimal characters)
        $         (?# Anchor to the end of the string)
    /ix;

After this statement, $result will be true if $_ contains an integer literal and false if it does not. The /x option is needed so that the white space used to lay out the pattern is ignored.

If you need to match all legal integers in $_, you can do this:

    @results = m/\b(?:0x[\da-f]+|\d+)\b/gi;

After this statement, @results will contain a list of all integer literals in $_ that stand alone as words. @results will contain an empty list if no literals are found. The ^ and $ anchors used in the previous example are left out here; with them, the pattern could only match when $_ holds nothing but a single integer.

If you need to match the end of the first word in a string, you can do this:

    m/\w\W/;

After this statement is executed, $& will hold the last character of the first word and the next character that follows it. If you want only the last character, use pattern memory, m/(\w)\W/;.
Then $1 will be equal to the last character of the first word. If you use the global option with pattern memory, @array = m/(\w)\W/g;, then you can create an array that holds the last character of each word that is followed by a non-word character. A word at the very end of the string has nothing after it, so it is skipped; m/(\w)\b/g catches that word as well.

If you need to match the start of the second word in a string, you can do this:

    m/\W\w/;

After this statement, $& will hold the first character of the second word and the non-word character (usually white space) that immediately precedes it. While this pattern is the opposite of the pattern that matches the end of words, it will not match the beginning of the first word! This is because of the \W meta-character. Simply adding a * meta-character to the pattern after the \W does not help, because then it would match on zero non-word characters and therefore match every word character in the string.

If you need to match the file name in a file specification, you can do this:

    $_ = '/user/Jackie/temp/names.dat';
    m!^.*/(.*)!;

After this match statement, $1 will equal names.dat. The match is anchored to the beginning of the string, and the .* component matches everything up to the last slash because the * quantifier is greedy. Then the next (.*) matches the file name and stores it into pattern memory. You can store the file path into pattern memory by placing parentheses around the first .* component.

If you need to match two prefixes and one root word, like "rockfish" and "monkfish," you can do this:

    m/(?:rock|monk)fish/;

The alternation meta-character (|) is used to say that either rock or monk followed by fish needs to be found. If you need to know which alternative was found, then use regular parentheses in the pattern, as in m/(rock|monk)fish/. After the match, $1 will be equal to either rock or monk.

If you want to search a file for a string and print some of the surrounding lines, you can do this:

    # read the whole file into memory. $fileName is a
    # placeholder for the name of the file to search.
    open(FILE, "<$fileName") or die("Can't open $fileName: $!");
    @array = <FILE>;
    close(FILE);

    # specify which string to find.
    $stringToFind = "A";

    # iterate over the array looking for the
    # string.
    for ($index = 0; $index <= $#array; $index++) {
        last if $array[$index] =~ /$stringToFind/;
    }

    # Use $index to print two lines before and two lines
    # after the line that contains the match, without
    # running past either end of the array.
    $first = ($index - 2 < 0) ? 0 : $index - 2;
    $last  = ($index + 2 > $#array) ? $#array : $index + 2;
    foreach $lineNum ($first..$last) {
        print("$lineNum: $array[$lineNum]");
    }

There are many ways to perform this type of search, and this is just one of them. This technique is only good for relatively small files because the entire file is read into memory at once. In addition, the program assumes that the input file always contains the string that you are looking for.

Example: Using the Substitution Operator

If you need to remove white space from the beginning of a string, you can do this:

    s/^\s+//;

This pattern uses the \s predefined character class to match any whitespace character. The plus sign means to match one or more white space characters, and the caret means match only at the beginning of the string.

If you need to remove whitespace from the end of a string, you can do this:

    s/\s+$//;

This pattern uses the \s predefined character class to match any whitespace character. The plus sign means to match one or more white space characters, and the dollar sign means match only at the end of the string.

If you need to add a prefix to a string, you can do this:

    $prefix = "A";
    s/^(.*)/$prefix$1/;

When the substitution is done, the value in the $prefix variable will be added to the beginning of the $_ variable. This is done by using variable interpolation and pattern memory. Of course, you also might consider using the string concatenation operator; for instance, $_ = "A" . $_;, which is probably faster.

If you need to add a suffix to a string, you can do this:

    $suffix = "Z";
    s/^(.*)/$1$suffix/;

When the substitution is done, the value in the $suffix variable will be added to the end of the $_ variable. (Because . does not match a newline, the suffix goes at the end of the first line if $_ holds more than one line.) This is done by using variable interpolation and pattern memory.
Of course, you also might consider using the string concatenation operator; for instance, $_ .= "Z";, which is probably faster.

If you need to reverse the first two words in a string, you can do this:

    s/^\s*(\w+)\W+(\w+)/$2 $1/;

This substitution statement uses the pattern memory variables $1 and $2 to reverse the first two words in a string. Any leading white space is removed, and whatever separated the two words is replaced by a single space. You can use a similar technique to manipulate columns of information, the last two words, or even to change the order of more than two matches.

If you need to duplicate each character in a string, you can do this:

    s/\w/$& x 2/eg;

When the substitution is done, each word character in $_ will be repeated. If the original string was "123abc", the new string would be "112233aabbcc". The e option is used to force evaluation of the replacement string. The $& special variable is used in the replacement pattern to reference the matched string, which then is repeated by the string repetition operator. To duplicate spaces and punctuation as well, use . instead of \w.

If you need to capitalize all the words in a sentence, you can do this:

    s/(\w+)/\u$1/g;

When the substitution is done, each word in $_ will have its first letter capitalized. The /g option means that each word (the \w+ meta-sequence) will be matched and placed in $1. Then it will be replaced by \u$1. The \u will capitalize the first character of whatever follows it; in this case, it's the matched word.

If you need to insert a string between two repeated characters, you can do this:

    $_ = "!!!!";
    $char = "!";
    $insert = "AAA";

    s{
        ($char)         # look for the specified character.
        (?=$char)       # look for it again, but don't include
                        # it in the matched string, so the next
    }                   # search also will find it.
    {
        $char . $insert # concatenate the specified character
                        # with the string to insert.
    }xeg;               # use extended mode, evaluate the
                        # replacement pattern, and match all
                        # possible strings.

    print("$_\n");

This program displays

    !AAA!AAA!AAA!

This example uses the extended mode to add comments directly inside the regular expression. This makes it easy to relate the comment directly to a specific pattern element.
The match pattern does not directly reflect the originally stated goal of inserting a string between two repeated characters. Instead, the example was quietly restated. The new goal is to substitute all instances of $char with $char . $insert, if $char is followed by $char. As you can see, the end result is the same. Remember that sometimes you need to think outside the box.

If you need to do a second level of variable interpolation in the replacement pattern, you can do this:

    s/(\$\w+)/$1/eeg;

This is a simple example of secondary variable interpolation. If $firstVar = "AAA" and $_ = '$firstVar', then $_ would be equal to "AAA" after the substitution was made. The key is that the replacement pattern is evaluated twice: the first evaluation turns $1 into the string '$firstVar', and the second evaluates that string as Perl code, yielding the variable's value. This technique is very powerful. It can be used to develop error messages used with variable interpolation.

    $errMsg = "File too large";
    $fileName = "DATA.OUT";
    $_ = 'Error: $errMsg for the file named $fileName';
    s/(\$\w+)/$1/eeg;
    print;

When this program is run, it will display

    Error: File too large for the file named DATA.OUT

The values of the $errMsg and $fileName variables were interpolated into the replacement pattern as needed.

Example: Using the Translation Operator

If you need to count the number of times a given letter appears in a string, you can do this:

    $cnt = tr/Aa//;

After this statement executes, $cnt will hold the number of times the letter a, in either case, appears in $_. The tr operator does not have an option to ignore the case of the string, so both upper- and lowercase need to be specified.

If you need to turn the high bit off for every character in $_, you can do this:

    tr [\200-\377] [\000-\177];

This statement uses the square brackets to delimit the character lists. Notice that spaces can be used between the pairs of brackets to enhance readability of the lists. The octal values are used to specify the character ranges. In this instance, the translation operator is more efficient than using logical operators and a loop statement.
This is because the translation can be done by creating a simple lookup table.

Example: Using the split() Function

If you need to split a string into words, you can do this:

    s/^\s+//;
    @array = split;

After this statement executes, @array will be an array of words. When split is called with no arguments, it splits $_ on runs of white space and, as a special case, ignores any leading white space, so the s/^\s+// statement is not strictly necessary here. It does matter if you give split an explicit pattern such as /\s+/: then leading white space produces an empty first element in the array, which is probably not what you want.

If you need to split a string contained in $line instead of $_ into words, you can do this:

    $line =~ s/^\s+//;
    @array = split(/\W+/, $line);

After this statement executes, @array will be an array of words. The + matters here: with just /\W/, two non-word characters in a row (such as a comma followed by a space) would produce an empty element between them.

If you need to split a string into characters, you can do this:

    @array = split(//);

After this statement executes, @array will be an array of characters. split recognizes the empty pattern as a request to make every character into a separate array element.

If you need to split a string into fields based on a delimiter sequence of characters, you can do this:

    @array = split(/:/);

@array will be an array of strings consisting of the values between the delimiters. If there are repeated delimiters (:: in this example), then an empty array element will be created. Use /:+/ as the delimiter to match in order to eliminate the empty array elements.

Summary

This chapter introduced you to regular expressions or patterns, regular expression operators, and the binding operators. There are three regular expression operators (m//, s///, and tr///), which are used to match, substitute, and translate; all three use the $_ variable as the default operand. The binding operators, =~ and !~, are used to bind the regular expression operators to a variable other than $_. While the slash character is the default pattern delimiter, you can use almost any non-alphanumeric, non-whitespace character in its place. This feature is useful if the pattern contains the slash character.
If you use an opening bracket or parenthesis as the beginning delimiter, use the closing bracket or parenthesis as the ending delimiter. Using the single quote as the delimiter will turn off variable interpolation for the pattern.

The matching operator has six options: /g, /i, /m, /o, /s, and /x. These options were described in Table 10.2. I've found that the /x option is very helpful for creating maintainable, commented programs. The /g option, used to find all matches in a string, also is useful. And, of course, the capability to create case-insensitive patterns using the /i option is crucial in many cases.

The substitution operator has the same options as the matching operator and one more, the /e option. The /e option lets you evaluate the replacement pattern and use the new value as the replacement string. If you use back-quotes as delimiters, the replacement pattern will be executed as a DOS or UNIX command, and the resulting output will become the replacement string.

The translation operator has three options: /c, /d, and /s. These options are used to complement the match character list, delete matched characters that have no corresponding character in the replacement list, and squeeze runs of the same translated character down to a single character. The translation operator always returns the number of characters it matched; if the replacement list is empty (and /d is not used), the string is left unchanged. This is handy if you need to know how many times a given character appears in a string.

The binding operators are used to force the matching, substitution, and translation operators to search a variable other than $_. The =~ operator can be used with all three of the regular expression operators. The !~ operator, which negates the result, is mainly useful with the matching operator.

Quite a bit of space was devoted to creating patterns, and the topic deserves even more space. This is easily one of the more involved features of the Perl language. One key concept is that a character can have multiple meanings.
For example, the plus sign can mean a plus sign in one instance (its literal meaning), and in another it means match something one or more times (its meta-meaning).

You learned about regular expression components and that they can be combined in an infinite number of ways. Table 10.5 listed most of the meta-meanings for different characters. You read about character classes, alternation, quantifiers, anchors, pattern memory, word boundaries, and extended components.

The last section of the chapter was devoted to presenting numerous examples of how to use regular expressions to accomplish specific goals. Each situation was described, and a pattern that matched that situation was shown. Some commentary was given for each example.

In the next chapter, you'll read about how to present information by using formats. Formats are used to help relieve some of the programming burden from the task of creating reports.

Review Questions

Answers to Review Questions are in Appendix A.

1. Can you use variable interpolation with the translation operator?
2. What happens if the pattern is empty?
3. What variable does the substitution operator use as its default?
4. Will the following line of code work?
       m{.*];
5. What is the /g option of the substitution operator used for?
6. What does the \d meta-character sequence mean?
7. What is the meaning of the dollar sign in the following pattern?
       /AA[.<]$]ER/
8. What is a word boundary?
9. What will be displayed by the following program?
       $_ = 'AB AB AC';
       print m/c$/i;

Review Exercises

1. Write a pattern that matches either "top" or "topgun".
2. Write a program that accepts input from STDIN and changes all instances of the letter a into the letter b.
3. Write a pattern that stores the first character to follow a tab into pattern memory.
4. Write a pattern that matches the letter g between three and seven times.
5. Write a program that finds repeated words in an input file and prints the repeated word and the line number on which it was found.
6. Create a character class for octal numbers.
7. Write a program that uses the translation operator to remove repeated instances of the tab character and then replace the tab character with a space character.
8. Write a pattern that matches either "top" or "topgun" using a zero-width positive look-ahead assertion.

Chapter 11
Creating Reports

CONTENTS
* What's a Format Statement?
  + Example: Using Field Lines
  + Example: Report Headings
  + Example: Using Functions in the Value Line
  + Example: Changing Formats
  + Example: Using Long Pieces of Text in Reports
  + Example: Writing to a File Instead of STDOUT
* Summary
* Review Questions
* Review Exercises

Perl has a few special features that let you create simple reports. The reports can have a heading area where you can place a title, a page number, and other information that stays the same from one page to the next. Perl tracks how many lines have been used in the report and automatically generates new pages as needed.

Compared to learning about regular expressions, learning how to create reports will be a breeze. There are only a few tricky parts, which I'll be sure to point out.

This chapter starts out by using the print() function to display a CD collection. It then gradually moves from displaying the data to a fully formatted report.
The data file shown in Listing 11.1 is used for all of the examples in this chapter. The format is pretty simple: the CD album's title, the artist's name, and the album's price.

Listing 11.1 FORMAT.DAT-The Data File
The Lion King!
Tumbleweed Connection!Elton John!123.32
Photographs & Memories!Jim Croce!4.95
Heads & Tales!Harry Chapin!12.50

You'll find that Perl is very handy for small text-based data files like this. You can create them in any editor and use any field delimiter you like. In this file, I used an exclamation point to delimit the fields. However, I could just as easily have used a caret, a tilde, or any other character.

Now that we have some data, let's look at Listing 11.2, which is a program that reads the data file and displays the information.

Open the FORMAT.DAT file.
Read all the file's lines and place them in the @lines array. Each line becomes a different element in the array.
Close the file.
Iterate over the @lines array. $_ is set to a different array element each time through the loop.
Remove the linefeed character from the end of the string.
Split the string into three fields using the exclamation point as the delimiter.
Place each field into the $album, $artist, and $price variables.
Print the variables.

Listing 11.2 11LST02.PL-A Program to Read and Display the Data File
open(FILE, "<format.dat");
@lines = <FILE>;
close(FILE);

foreach (@lines) {
    chop;
    ($album, $artist, $price) = (split(/!/));
    print("Album=$album Artist=$artist Price=$price\n");
}

This program displays:

Use of uninitialized value at 11lst02.pl line 8.
Album=The Lion King Artist= Price=
Album=Tumbleweed Connection Artist=Elton John Price=123.32
Album=Photographs & Memories Artist=Jim Croce Price=4.95
Album=Heads & Tales Artist=Harry Chapin Price=12.50

Why is an error being displayed on the first line of the output? You may have answered that the split() function returned fewer than three fields for the first record, which left the extra variables with the undefined value. If so, you were correct.
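A quick way to confirm this diagnosis is to run split() on the first record by itself. The following stand-alone sketch is not one of the chapter's listings. It reuses the variable names from Listing 11.2. At the end it shows a shorter way to supply defaults with the //= (defined-or) operator, which assumes Perl 5.10 or later.

```perl
use strict;
use warnings;

# split() discards trailing empty fields, so the first record,
# "The Lion King!", produces a single field.
my @fields = split(/!/, "The Lion King!");
print scalar(@fields), "\n";          # prints 1

# A negative limit keeps trailing empty fields, but only the ones that
# exist: this record has one "!", so it still yields just two fields.
my @kept = split(/!/, "The Lion King!", -1);
print scalar(@kept), "\n";            # prints 2

# List assignment leaves variables without a matching field undefined.
my ($album, $artist, $price) = @fields;
print defined($artist) ? "defined\n" : "undefined\n";    # prints undefined

# Perl 5.10 and later: //= assigns only when the variable is undefined.
$artist //= "";
$price  //= 0;
print "Album=$album Artist=$artist Price=$price\n";
```

The //= lines do the same job as the "if !defined" statements used in the next listing, but each one fits on a single line.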
The first input line was the following:

The Lion King!

There are no entries for the Artist or Price fields. Therefore, the $artist and $price variables were assigned the undefined value, and Perl complained about uninitialized values. You can avoid this problem by assigning the empty string to any variable that has the undefined value. Listing 11.3 shows a program that does this.

Open the FORMAT.DAT file, read all the lines into @lines, and then close the file.
Iterate over the @lines array.
Remove the linefeed character.
Split the string into three fields.
If any of the three fields is not present in the line, provide a default value of an empty string.
Print the variables.

Listing 11.3 11LST03.PL-How to Avoid the Uninitialized Error When Using the split() Function
open(FILE, "<format.dat");
@lines = <FILE>;
close(FILE);

foreach (@lines) {
    chop;
    ($album, $artist, $price) = (split(/!/));
    # These lines assign empty strings if no info is present in the record.
    $album  = "" if !defined($album);
    $artist = "" if !defined($artist);
    $price  = "" if !defined($price);
    print("Album=$album Artist=$artist Price=$price\n");
}

This program displays the following:

Album=The Lion King Artist= Price=
Album=Tumbleweed Connection Artist=Elton John Price=123.32
Album=Photographs & Memories Artist=Jim Croce Price=4.95
Album=Heads & Tales Artist=Harry Chapin Price=12.50

The error has been eliminated, but the output is still very hard to read because the columns are not aligned. The rest of this chapter is devoted to turning this jumbled output into a report.

Perl reports have headings and detail lines.
A heading identifies the report title, the page number, the date, and any other information that needs to appear at the top of each page. Detail lines show information about each record in the report. In the data file used for the examples in this chapter (refer to Listing 11.1), each CD has its own detail line. Headings and detail lines are defined by using format statements, which are discussed in the next section.

What's a Format Statement?

Perl uses formats as guidelines when writing report information. A format tells Perl what static text is needed and where variable information should be placed. Formats are defined by using the format statement. The syntax for the format statement is

format FORMATNAME =
FIELD_LINE
VALUE_LINE
.

The format ends with a line that contains only a period. The FORMATNAME is usually the same as the name of the file handle that receives the report output. The section "Example: Changing Formats," later in this chapter, covers format statements whose FORMATNAME differs from the file handle. If you don't specify a FORMATNAME, Perl uses STDOUT.

The FIELD_LINE part of the format statement consists of text and field holders. A field holder represents a given line width that Perl will fill with the value of a variable. The VALUE_LINE consists of a comma-delimited list of expressions used to fill the field holders in FIELD_LINE. A format can contain as many field lines as you need. A field line that contains only static text has no value line.

Report headings, which appear at the top of each page, have the following format:

format FORMATNAME_TOP =
FIELD_LINE
VALUE_LINE
.

Yes, the only difference between a detail line and a heading is that _TOP is appended to the FORMATNAME.

Note
The location of format statements is unimportant because they only define a format and are never executed. I feel that they should appear either at the beginning or at the end of a program, rarely in the middle. Placing format statements in the middle of your program might make them hard to find when they need to be changed.
Of course, you should be consistent about where you place them.

A typical format statement might look like this:

format =
The total amount is $@###.##
$total
.

The at sign (@) starts a field holder. In this example, the field holder is seven characters long because the at sign and the decimal point count toward the width, as do the pound signs. The next section, "Example: Using Field Lines," goes into more detail about field lines and field holders.

Format statements are used only when invoked by the write() function. The write() function takes only one parameter: a file handle to send output to. Like many things in Perl, if no parameter is specified, a default is provided. In this case, write() uses the currently selected output file handle, which is normally STDOUT. To use the preceding format, you simply assign a value to $total and then call the write() function. For example:

$total = 243.45;
write();
$total = 50.00;
write();

These lines will display:

The total amount is $ 243.45
The total amount is $  50.00

The output will be sent to STDOUT. Notice that the decimal points are automatically lined up when the lines are displayed.

Example: Using Field Lines

The field lines of a format statement control what is displayed and how. The simplest field line contains only static text. You can use static, or unchanging, text as labels for variable information, dollar signs in front of amounts, a separator character such as a comma between first and last name, or whatever else is needed. However, you'll rarely use just static text in your format statement. Most likely, you'll use a mix of static text and field holders.

You saw a field holder in action in the last section, in which I demonstrated sending the report to STDOUT. I'll repeat the format statement here so you can look at it in more detail:

format =
The total amount is $@###.##
$total
.

The character sequence "The total amount is $" is static text. It stays the same no matter how many times the report is printed.
The character sequence @###.##, however, is a field holder. It reserves seven spaces in the line for a number to be inserted. The third line is the value line; it tells Perl which variable to use with the field holder. Table 11.1 contains a list of the different format characters you can use in field lines.

Table 11.1 Field Holder Formats

Format Character   Description
@                  This character represents the start of a field holder.
<                  This character indicates that the field should be left-justified.
>                  This character indicates that the field should be right-justified.
|                  This character indicates that the field should be centered.
#                  This character indicates that the field will be numeric. If used as the first character in the line, it indicates that the entire line is a comment.
.                  This character indicates that a decimal point should be used with numeric fields.
^                  This character also represents the start of a field holder. Moreover, it tells Perl to turn on word-wrap mode. See the section "Example: Using Long Pieces of Text in Reports" later in this chapter for more information about word-wrapping.
~                  This character indicates that the line should not be written if it is blank.
~~                 This sequence indicates that lines should be written as needed until the value of a variable is completely written to the output file.
@*                 This sequence indicates that a multi-line field will be used.

Let's start using some of these formatting characters by formatting a report to display information about the FORMAT.DAT file we used earlier. The program in Listing 11.4 displays the information in nice, neat columns.

Declare a format for the STDOUT file handle.
Open the FORMAT.DAT file, read all the lines into @lines, and then close the file.
Iterate over the @lines array.
Remove the linefeed character.
Split the string into three fields.
If any of the three fields is not present in the line, provide a default value of an empty string.
Notice that a numeric value must be given to $price instead of the empty string.
Invoke the format statement by using the write() function.

Listing 11.4 11LST04.PL-Using a Format with STDOUT
format =
Album=@<<<<<<<<<<<<< Artist=@>>>>>>>>>>>> Price=$@##.##
$album, $artist, $price
.

open(FILE, "<format.dat");
@lines = <FILE>;
close(FILE);

foreach (@lines) {
    chop;
    ($album, $artist, $price) = (split(/!/));
    $album  = "" if !defined($album);
    $artist = "" if !defined($artist);
    $price  = 0  if !defined($price);
    write();
}

This program displays the following:

Album=The Lion King  Artist=              Price=$  0.00
Album=Tumbleweed Con Artist=   Elton John Price=$123.32
Album=Photographs &  Artist=    Jim Croce Price=$  4.95
Album=Heads & Tales  Artist= Harry Chapin Price=$ 12.50

You can see that the columns are now neatly aligned. This was done with the format statement and the write() function. The format statement in this example uses three field holders. The first field holder, @<<<<<<<<<<<<<, creates a left-justified spot for a 14-character-wide field filled by the value in $album. The second field holder, @>>>>>>>>>>>>, creates a right-justified spot for a 13-character-wide field filled by the value in $artist. The last field holder, @##.##, creates a six-character-wide field filled by the numeric value in $price.

You might think it's wasteful to have the field labels repeated on each line, and I would agree with that. Instead of placing field labels on the line, you can put them in the report heading. The next section discusses how to do this.

Example: Report Headings

Format statements for a report heading use the same format as the detail line format statement, except that _TOP is appended to the file handle. In the case of STDOUT, you must specify STDOUT_TOP.
Simply using _TOP will not work.

To add a heading to the report about the CD collection, you might use the following format statement:

format STDOUT_TOP =
@||||||||||||||||||||||||||||||||||| Pg @<
"CD Collection of David Medinets", $%

Album             Artist           Price
----------------- ---------------- -------
.

Adding this format statement to Listing 11.4 produces this output, provided you also change the detail format. The new detail format drops the repeated field labels and uses wider columns of 17, 16, and 7 characters, like the detail format in Listing 11.6:

  CD Collection of David Medinets    Pg 1

Album             Artist           Price
----------------- ---------------- -------
The Lion King                      $  0.00
Tumbleweed Connec Elton John       $123.32
Photographs & Mem Jim Croce        $  4.95
Heads & Tales     Harry Chapin     $ 12.50

Whenever a new page is generated, the heading format is automatically invoked. Normally, a page is 60 lines long. However, you can change this by setting the $= special variable. Another special variable, $%, holds the current page number. It is initialized to zero when your program starts. Then, just before invoking the heading format, Perl increments it so that its value is one. You can change $% if you need to change the page number for some reason.

You might notice that the | formatting character was used to center the report title over the columns. You also might notice that placing the field labels into the heading allows the columns to be expanded in width.

Unfortunately, Perl has no real facility for adding footer lines. However, you can try a bit of "magic" to fool Perl into creating footers with static text. The $^L variable holds the string that Perl writes before every report page except the first, and the $= variable holds the number of lines per page. You can create primitive footers by changing $^L to hold your footer and reducing the value in $= by the number of lines your footer will need. Listing 11.5 uses this technique to display the CD collection report on two pages.

Declare a format for the STDOUT file handle.
Declare a heading format for the STDOUT file handle.
Open the FORMAT.DAT file, read all the lines into @lines, and then close the file.
Assign a value of 6 to $=. Normally, it has a value of 60. Changing the value to 6 creates very short pages, which are ideal for small example programs.
Assign a string to $^L, which usually is equal to the form-feed character. The form-feed character causes printers to eject a page.
Iterate over the @lines array.
Remove the linefeed character.
Split the string into three fields.
If any of the three fields is not present in the line, provide a default value of an empty string. Notice that a numeric value must be given to $price instead of the empty string.
Invoke the format statement by using the write() function.
Print the footer on the last page. You need to do this explicitly because the last page of the report probably will not be a full page.

Listing 11.5 11LST05.PL-Tricking Perl into Creating Primitive Footers
format =
Album=@<<<<<<<<<<<<< Artist=@>>>>>>>>>>>> Price=$@##.##
$album, $artist, $price
.

format STDOUT_TOP =
@||||||||||||||||||||||||||||||||||| Pg @<
"CD Collection of David Medinets", $%

Album             Artist           Price
----------------- ---------------- -------
.

open(FILE, "<format.dat");
@lines = <FILE>;
close(FILE);

$= = 6;
$^L = '-' x 60 . "\n" .
      "Copyright, 1996, Eclectic Consulting\n" .
      "\n\n";

foreach (@lines) {
    chop();
    ($album, $artist, $price) = (split(/!/));
    $album  = "" if !defined($album);
    $artist = "" if !defined($artist);
    $price  = 0  if !defined($price);
    write();
}

print("$^L");

This program displays the following:

  CD Collection of David Medinets    Pg 1

Album             Artist           Price
----------------- ---------------- -------
Album=The Lion King  Artist=              Price=$  0.00
Album=Tumbleweed Con Artist=   Elton John Price=$123.32
------------------------------------------------------------
Copyright, 1996, Eclectic Consulting


  CD Collection of David Medinets    Pg 2

Album             Artist           Price
----------------- ---------------- -------
Album=Photographs &  Artist=    Jim Croce Price=$  4.95
Album=Heads & Tales  Artist= Harry Chapin Price=$ 12.50
------------------------------------------------------------
Copyright, 1996, Eclectic Consulting

Let me explain the assignment to $^L in more detail. The assignment is duplicated here for your convenience:

$^L = '-' x 60 . "\n" .
      "Copyright, 1996, Eclectic Consulting\n" .
      "\n\n";

The first part of the assignment, '-' x 60, creates a line of 60 dash characters. Then a newline character is concatenated to the line of dashes. Next, the copyright line is appended. Finally, two more linefeeds are appended to separate the two pages of output. Normally, you wouldn't add the ending linefeeds because the form-feed character makes them unnecessary. Here's how the code would look when designed to be sent to a printer:

$^L = '-' x 60 . "\n" .
      "Copyright, 1996, Eclectic Consulting\n" .
      "\014";

The "\014" string is the equivalent of a form-feed character because the ASCII value for a form-feed is 12, which is 14 in octal notation.

Note
I feel that it's important to say that the coding style in this example is not really recommended for "real" programming.
I concatenated each footer element separately so I could discuss what each element did. The last three elements in the footer assignment probably should be placed inside one string literal for efficiency.

Tip
This example is somewhat incomplete. Suppose the last page of the report ends at line 20 and there are 55 lines per page. Simply printing the $^L variable will not place the footer at the bottom of the page. Instead, the footer will appear after line 20, which probably is not the behavior you would like. Try the following statement to fix this problem:

print("\n" x $- . "$^L");

The $- special variable holds the number of lines left on the current page, so this statement places enough linefeeds in front of the footer to push it to the bottom of the page.

Example: Using Functions in the Value Line

You've already seen the value line in action. Most of the time, its use will be very simple: create the field holder in the field line and then put the variable name in the value line. But there are some other value line capabilities you should know about. In addition to simple scalar variables, you can specify array variables and even functions on the value line. Listing 11.6 shows a program that uses a function to add an ellipsis to a string if the string is too wide for its column.

Declare a format for the STDOUT file handle. In this example, the value line calls the dotize() function.
Declare a heading format for the STDOUT file handle.
Declare the dotize() function.
Initialize local variables called $width and $string.
If the length of $string is greater than $width, return a value that consists of $string shortened to $width-3 characters with ... appended to the end; otherwise, return $string.
Open the FORMAT.DAT file, read all the lines into @lines, and then close the file.
Iterate over the @lines array.
Remove the linefeed character.
Split the string into three fields.
If any of the three fields is not present in the line, provide a default value of an empty string.
Notice that a numeric value must be given to $price instead of the empty string.
Invoke the format statement by using the write() function.

Listing 11.6 11LST06.PL-Using a Function with a Value Line
format =
@<<<<<<<<<<<<<<<< @<<<<<<<<<<<<<<< $@##.##
dotize(17, $album), dotize(16, $artist), $price
.

format STDOUT_TOP =
@||||||||||||||||||||||||||||||||||| Pg @<
"CD Collection of David Medinets", $%

Album             Artist           Price
----------------- ---------------- -------
.

sub dotize {
    my($width, $string) = @_;

    if (length($string) > $width) {
        return(substr($string, 0, $width - 3) . "...");
    }
    else {
        return($string);
    }
}

open(FILE, "<format.dat");
@lines = <FILE>;
close(FILE);

foreach (@lines) {
    chop();
    ($album, $artist, $price) = (split(/!/));
    $album  = "" if !defined($album);
    $artist = "" if !defined($artist);
    $price  = 0  if !defined($price);
    write();
}

This program displays the following:

  CD Collection of David Medinets    Pg 1

Album             Artist           Price
----------------- ---------------- -------
The Lion King                      $  0.00
Tumbleweed Con... Elton John       $123.32
Photographs & ... Jim Croce        $  4.95
Heads & Tales     Harry Chapin     $ 12.50

The second and third detail lines have benefited from the dotize() function. You can use a similar technique to invoke any function in the value line. You also can use expressions directly in the value line, but such a format might be harder to maintain because the intent of the expression might not be clear.

Example: Changing Formats

So far, you've seen only how to use a single format statement per report. If Perl could handle only one format per report, it wouldn't have much utility as a reporting tool. Fortunately, the $~ special variable lets you control which format is used for any given write() function call.
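Before the full listing, here is a minimal stand-alone sketch of the mechanism. It makes a few assumptions that are not part of the book's examples. The format names LINE and TOTAL and the sample values are hypothetical. The report goes to an in-memory file handle (an open() feature available since Perl 5.8), so the result can be inspected instead of appearing on the screen. Because $~ always refers to the currently selected output handle, the sketch selects the in-memory handle before changing $~ and restores the previous handle afterward.

```perl
use strict;
use warnings;

# Hypothetical sample data; formats read package variables declared with our.
our ($album, $price) = ('Heads & Tales', 12.50);
our $total = 140.77;

format LINE =
@<<<<<<<<<<<<<<<< $@##.##
$album,           $price
.

format TOTAL =
@<<<<<<<<<<<<<<<< $@##.##
'Total',          $total
.

# Collect the report in a string instead of printing it on the screen.
open(my $out, '>', \my $report) or die "Cannot open in-memory handle: $!";

my $previous = select($out);   # $~ belongs to the selected handle
$~ = 'LINE';
write($out);
$~ = 'TOTAL';                  # use a different format for the next write()
write($out);
select($previous);             # restore the previous default handle
close($out);

print $report;
```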
Listing 11.7 shows a program that tracks the price of the CDs in the collection and displays the total using an alternative format statement.

Declare a format for the STDOUT file handle.
Declare a format for the total price information.
Declare a heading format for the STDOUT file handle.
Declare the dotize() function.
Initialize local variables called $width and $string.
If the length of $string is greater than $width, return a value that consists of $string shortened to $width-3 characters with ... appended to the end; otherwise, return $string.
Open the FORMAT.DAT file, read all the lines into @lines, and then close the file.
Initialize the $total variable to zero.
Iterate over the @lines array.
Remove the linefeed character.
Split the string into three fields.
Provide a default value for any undefined variables.
Invoke the format statement by using the write() function.
Add the CD's price to $total.
Change the current format by assigning a value to the $~ special variable.
Invoke the format statement by using the write() function.

Listing 11.7 11LST07.PL-Using an Alternative format Statement
format =
@<<<<<<<<<<<<<<<< @<<<<<<<<<<<<<<< $@###.##
dotize(17, $album), dotize(16, $artist), $price
.

format STDOUT_TOTAL =
-------------------------------------------
                                   $@###.##
$total
.

format STDOUT_TOP =
@||||||||||||||||||||||||||||||||||| Pg @<
"CD Collection of David Medinets", $%

Album             Artist           Price
----------------- ---------------- --------
.

sub dotize {
    my($width, $string) = @_;

    if (length($string) > $width) {
        return(substr($string, 0, $width - 3) . "...");
    }
    else {
        return($string);
    }
}

open(FILE, "<format.dat");
@lines = <FILE>;
close(FILE);

$total = 0;

foreach (@lines) {
    chop();
    ($album, $artist, $price) = (split(/!/));
    $album  = "" if !defined($album);
    $artist = "" if !defined($artist);
    $price  = 0  if !defined($price);
    write();
    $total += $price;
}

$~ = "STDOUT_TOTAL";
write();

This program displays the following:

  CD Collection of David Medinets    Pg 1

Album             Artist           Price
----------------- ---------------- --------
The Lion King                      $   0.00
Tumbleweed Con... Elton John       $ 123.32
Photographs & ... Jim Croce        $   4.95
Heads & Tales     Harry Chapin     $  12.50
-------------------------------------------
                                   $ 140.77

This example shows you how to keep a running total and how to switch to an alternative detail line format. If you need to switch to an alternative heading format, assign the new heading format name to the $^ special variable.

Example: Using Long Pieces of Text in Reports

By using the ^, ~, and ~~ formatting characters in your format statements, you can include long pieces of text in a report. Examples include the first paragraph of a paper's abstract or some notes associated with a database record. Listing 11.8 shows a program that prints the definition of a word. The definition is too long to fit in one column, so the ^ formatting character is used to split the text onto multiple lines.

Declare a format for the STDOUT file handle. The field and value lines are repeated enough times to print the entire length of the expected output.
Initialize the $word and $definition variables. The $definition variable is initialized by using concatenated strings to avoid line breaks caused by the book printing process.
Print a line of asterisks.
Invoke the format.
Print another line of asterisks.
Listing 11.8 11LST08.PL-Using the ^ Formatting Character to Print Long Text Values
format =
^<<<<<<<< ^<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<
$word,    $definition
^<<<<<<<< ^<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<
$word,    $definition
^<<<<<<<< ^<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<
$word,    $definition
^<<<<<<<< ^<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<
$word,    $definition
^<<<<<<<< ^<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<
$word,    $definition
^<<<<<<<< ^<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<
$word,    $definition
.

$word = "outlier";
$definition = "1. someone sleeping outdoors. " .
              "2. someone whose office is not at home. " .
              "3. an animal who strays from the fold. " .
              "4. something that has been separated from the main body.";

print("****************\n");
write();
print("****************\n");

This program displays the following:

****************
outlier   1. someone sleeping outdoors. 2.
          someone whose office is not at
          home. 3. an animal who strays from
          the fold. 4. something that has
          been separated from the main body.

****************

The ^ formatting character causes Perl to do word-wrapping on the specified variable. Word-wrapping means that Perl accumulates words into a temporary buffer, stopping when the next word would cause the length of the accumulated string to exceed the length of the field. The accumulated string is incorporated into the report, and the accumulated words are removed from the variable. Therefore, the next time Perl looks at the variable, it can start accumulating words that have not been used yet.

Note
Any linefeed characters in the variable are ignored when the ^ formatting character is used in the format statement.
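You can watch this consumption happen without a format statement by calling formline(), the built-in function that formats use internally. It appends its output to the $^A accumulator variable. The sketch below is an illustration rather than part of Listing 11.8. It assumes a 35-character ^ field (a ^ followed by 34 < characters) and formats a copy of the definition one line at a time until the copy is empty.

```perl
use strict;
use warnings;

my $definition = "1. someone sleeping outdoors. " .
                 "2. someone whose office is not at home. " .
                 "3. an animal who strays from the fold. " .
                 "4. something that has been separated from the main body.";

# A ^ followed by 34 < characters is a 35-character word-wrapping field.
my $picture = '^' . ('<' x 34) . "\n";

# ^ fields remove the words they print, so wrap a copy of the text.
my $copy = $definition;
my @lines;
while (length($copy) && @lines < 100) {   # the cap only guards against a runaway loop
    $^A = '';                   # empty the format accumulator
    formline($picture, $copy);  # format one line; chops the used words off $copy
    push(@lines, $^A);
}
$^A = '';

print @lines;
```

Each element of @lines holds one wrapped line and $copy ends up empty. Meanwhile $definition keeps its original value, which is why working on a copy matters.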
Caution
The value of the variable used in the value line changes when word-wrapping is used. Make sure to use only copies of variables in the format statement, so that you still have the original value available for further processing.

The asterisks in the preceding example were printed to show that the format printed a blank line. The blank line appeared because the $definition variable ran out of words before the format ran out of lines. You can eliminate extra blank lines by placing the ~ character anywhere on the field line, usually at the beginning or the end. The ~ character itself is printed as a space. The format statement then would look like this:

format =
^<<<<<<<< ^<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<< ~
$word,    $definition
^<<<<<<<< ^<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<< ~
$word,    $definition
^<<<<<<<< ^<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<< ~
$word,    $definition
^<<<<<<<< ^<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<< ~
$word,    $definition
^<<<<<<<< ^<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<< ~
$word,    $definition
^<<<<<<<< ^<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<< ~
$word,    $definition
.

The new report would not have a blank line:

****************
outlier   1. someone sleeping outdoors. 2.
          someone whose office is not at
          home. 3. an animal who strays from
          the fold. 4. something that has
          been separated from the main body.
****************

It is rather wasteful to repeat the field lines often enough to account for the longest possible length of $definition. In fact, if you are reading the definitions from a file, you might not know ahead of time how long the definitions could be. Perl provides the ~~ character sequence to handle situations like this. When you place ~~ on the field line, Perl repeats the field line as often as needed until a blank line would be printed. Using this technique would change the format statement to this:

format =
^<<<<<<<< ^<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<< ~~
$word,    $definition
.

You might be wondering how Perl decides when a word ends.
This behavior is controlled by the $: variable. The default value for $: is a string consisting of the space, newline, and dash characters.

Example: Writing to a File Instead of STDOUT

Up to this point in the chapter, we've only looked at writing a report to the display, or STDOUT. This was done to simplify and shorten the examples. Writing a report to a file requires that you open a file for output and specify the file handle as a parameter to the write() function. All the functionality you've seen so far can be used with files.

Listing 11.9 shows how easy it is to convert an existing program from using STDOUT to using a file. The program shown is a reworking of the program in Listing 11.4. The conversion needed four changes. First, the format statement was changed to specify a format name identical to the file handle used in the second open() statement. Second, a second open() statement was added. Third, the write() function was changed to specify the file handle to use. Finally, a second close() statement was added.

Declare a format for the CD_REPORT file handle.
Open the FORMAT.DAT file, read all the lines into @lines, and then close the file.
Open the FORMAT.RPT file for output to hold the report.
Iterate over the @lines array.
Remove the linefeed character.
Split the string into three fields.
If any of the three fields is not present in the line, provide a default value of an empty string. Notice that a numeric value must be given to $price instead of the empty string.
Invoke the format statement by using the write() function, specifying the file handle to use.
Close the FORMAT.RPT file.

Listing 11.9 11LST09.PL-Writing a Report to a File
format CD_REPORT =
Album=@<<<<<<<<<<<<< Artist=@>>>>>>>>>>>> Price=$@##.##
$album, $artist, $price
.

open(FILE, "<format.dat");
@lines = <FILE>;
close(FILE);

open(CD_REPORT, ">format.rpt");

foreach (@lines) {
    chomp;                             # remove the trailing linefeed, if any
    ($album, $artist, $price) = (split(/!/));
    $album  = "" if !defined($album);
    $artist = "" if !defined($artist);
    $price  = 0  if !defined($price);
    write(CD_REPORT);
}

close(CD_REPORT);
____________________________________________________________________________________________________________________

This program creates a file called FORMAT.RPT that contains the following:

Album=The Lion King  Artist=              Price=$  0.00
Album=Tumbleweed Con Artist=   Elton John Price=$123.32
Album=Photographs &  Artist=    Jim Croce Price=$  4.95
Album=Heads & Tales  Artist= Harry Chapin Price=$ 12.50

The contents of FORMAT.RPT are identical to the display created by the program in Listing 11.4.

Using more than one format in reports destined for files is slightly more complicated than it was with STDOUT, because you need to make the output file handle the default file handle before setting the $~ or $^ special variables. Listing 11.10 shows how to use an alternative format statement.

Declare a format for the CD_REPORT file handle.
Declare a format for the total price information using CD_REPORT_TOTAL as the format name.
Declare a heading format for the CD_REPORT file handle using CD_REPORT_TOP as the format name.
Declare the dotize() function.
Initialize local variables called $width and $string.
If the length of $string is greater than $width, return $string shortened to $width-3 characters with ... appended to the end; otherwise, return $string.
Open the FORMAT.DAT file, read all the lines into @lines, and then close the file.
Open the FORMAT.RPT file for output to hold the report.
Initialize the $total variable to zero.
Iterate over the @lines array.
Remove the linefeed character.
Split the string into three fields.
Provide a default value for any empty variables.
Invoke the format statement by using the write() function, specifying the CD_REPORT file handle.
Add $price to the $total variable.
Change the current format by assigning a value to the $~ special variable. This statement uses some advanced concepts and is explained further after the listing.
Invoke the format statement by using the write() function.
Close the FORMAT.RPT file.
____________________________________________________________________________________________________________________
Listing 11.10 11LST10.PL-Using an Alternative format Statement

format CD_REPORT =
@<<<<<<<<<<<<<<<< @<<<<<<<<<<<<<<< $@###.##
dotize(17, $album), dotize(16, $artist), $price
.

format CD_REPORT_TOTAL =
---------------------------------------------
                                   $@###.##
                                   $total
.

format CD_REPORT_TOP =
@|||||||||||||||||||||||||||||||||||| Pg @<
"CD Collection of David Medinets", $%
Album             Artist           Price
----------------- ---------------- --------
.

sub dotize {
    my($width, $string) = @_;

    if (length($string) > $width) {
        return(substr($string, 0, $width - 3) . "...");
    }
    else {
        return($string);
    }
}

open(FILE, "<format.dat");
@lines = <FILE>;
close(FILE);

open(CD_REPORT, ">format.rpt");

$total = 0;
foreach (@lines) {
    chomp();
    ($album, $artist, $price) = (split(/!/));
    $album  = "" if !defined($album);
    $artist = "" if !defined($artist);
    $price  = 0  if !defined($price);
    write(CD_REPORT);
    $total += $price;
}

select((select(CD_REPORT), $~ = "CD_REPORT_TOTAL")[0]);
write(CD_REPORT);

close(CD_REPORT);
____________________________________________________________________________________________________________________

This program creates a file called FORMAT.RPT that contains the following:

   CD Collection of David Medinets    Pg 1
Album             Artist           Price
----------------- ---------------- --------
The Lion King                      $   0.00
Tumbleweed Con... Elton John       $ 123.32
Photographs & ... Jim Croce        $   4.95
Heads & Tales     Harry Chapin     $  12.50
---------------------------------------------
                                   $ 140.77

The contents of FORMAT.RPT are identical to the display created by the program in Listing 11.7.
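You can check a picture line without opening a report file by using Perl's built-in formline() function. It formats a list of values with the same picture syntax that format statements use and appends the result to the $^A accumulator variable. The sketch below is not one of the book's listings. It assumes a Perl 5 interpreter, reuses the dotize() function from Listing 11.10, and introduces two hypothetical helpers, render_detail() and render_wrapped(), for this illustration only. The first helper renders one detail line of Listing 11.10. The second shows a ^ field combined with ~~ wrapping a long definition.

```perl
use strict;
use warnings;

# dotize() as in Listing 11.10: shorten $string to $width characters,
# ending with "..." when the original is too long.
sub dotize {
    my ($width, $string) = @_;
    return length($string) > $width
        ? substr($string, 0, $width - 3) . "..."
        : $string;
}

# Hypothetical helper: render one CD_REPORT detail line into a string.
# Single quotes stop Perl from interpolating the @ fields as arrays.
sub render_detail {
    my ($album, $artist, $price) = @_;
    $^A = "";                               # empty the format accumulator
    formline('@<<<<<<<<<<<<<<<< @<<<<<<<<<<<<<<< $@###.##' . "\n",
             dotize(17, $album), dotize(16, $artist), $price);
    my $line = $^A;
    $^A = "";
    return $line;
}

# Hypothetical helper: a ^ field removes the text it prints from its
# variable, and ~~ repeats the picture line until the text is used up.
# Words are broken only at the characters listed in $:.
sub render_wrapped {
    my ($text) = @_;                        # work on a copy, as the Caution advises
    $^A = "";
    formline('^<<<<<<<<<<<<<<<<<<< ~~' . "\n", $text);
    my $lines = $^A;
    $^A = "";
    return $lines;
}

print render_detail("Tumbleweed Connection", "Elton John", 123.32);
print render_wrapped("4. something that has been separated from the main body.");
```

Because formline() follows the same picture rules as format, the wrapped output ends without a blank line: ~~ stops repeating as soon as $text is empty.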
The statement that changes the default file handle and format name is a little complicated. Let's take a closer look at it.

select((select(CD_REPORT), $~ = "CD_REPORT_TOTAL")[0]);

To understand most statements, you need to look at the innermost parentheses first, and this one is no different. The innermost expression to evaluate is

select(CD_REPORT), $~ = "CD_REPORT_TOTAL"

You might recall that the comma operator lets you place two or more expressions where normally you can place only one. That's what is happening here. First, CD_REPORT is selected as the default file handle for the print and write statements. Then the $~ variable is changed to the new format name. The order matters because $~ always applies to the currently selected file handle. The two expressions are enclosed in parentheses, so they form a list whose elements are their return values. The [0] that follows is a list slice that retrieves the first element: the value returned by the inner select() function. Because select() returns the previously selected file handle, the outer select() call restores the default file handle to its previous value. This bit of code could have been written like this:

$oldhandle = select(CD_REPORT);
$~ = "CD_REPORT_TOTAL";
select($oldhandle);

Summary

In this chapter, you learned how to create simple reports that incorporate headers, footers, and detail lines. Headers are used at the top of each page and can consist of both static text and values from variables. Footers are used at the bottom of each page and can consist only of static text. Detail lines make up the body of a report.

Header and detail lines are defined by using format statements that have alternating field and value lines. The field lines hold the static text and field holders, while the value lines hold a comma-delimited list of expressions. You can use several different format characters when creating the field holder to have left-justified, right-justified, or centered fields.
You also can use word-wrapping to display long pieces of text in your reports.

Directing a report to a file instead of to STDOUT requires a few simple steps. The output file needs to be opened. The file handle needs to be specified as the format name in the format statement. The file handle needs to be passed to the write() function. Finally, the output file needs to be closed.

The next chapter focuses on special variables. It discusses all the special variables you have seen so far, and more, along with examples of how to use them.

Review Questions

Answers to Review Questions are in Appendix A.

1. What is the syntax of the format statement?
2. What is a footer?
3. What function is used to invoke the format statement?
4. How can you change a detail format line into a header format line?
5. What is the > format character used for?
6. What is the $^L variable used for?
7. Can associative array variables be used in value lines?
8. What will the following line of code do?
   select((select(ANNUAL_RPT), $^ = "REGIONAL_SALES")[0]);

Review Exercises

1. Modify the program in Listing 11.4 to display the second field as left-justified instead of right-justified.
2. Create a report that has both a price and a tax column. Use a tax rate of seven percent.
3. Modify the program in Listing 11.7 to display an average of the CD prices instead of the total of the prices.
4. Create a program that sends the report in the preceding exercise to a file. Use the select() function to change the default file handle so that a file handle does not need to be passed to the write() function.
5. Modify Listing 11.5 so that each pass through the loop checks the value of $-. When the value of $- is one less than $=, change the value of $^L to emulate a footer with variable text.
6. Create a report that uses a detail line format with more than one line. How would this affect the program written for Exercise 5?
Chapter 12
Using Special Variables

CONTENTS
* What Are the Special Variables?
  + Example: Using the DATA File Handle
  + Example: Using the %ENV Variable
* Summary
* Review Questions
* Review Exercises

Perl uses quite a few special variables to control various behaviors of functions. You can use special variables to hold the results of searches, the values of environment variables, and flags to control debugging. In short, special variables touch almost every aspect of Perl programming.

What Are the Special Variables?

Table 12.1 shows a list of the special variables you can use in your programs. The order of this list is identical to the list in the file PERLVAR.htm, which comes with your Perl distribution. This table lets you quickly find any special variable you may come across in examples or someone else's code.

Table 12.1 Perl's Special Variables

Variable  Description
$_        The default parameter for a lot of functions.
$.        Holds the current record or line number of the file handle that was last read. Assigning to $. changes the counter but does not move the file position. It is reset when the file handle is closed.
$/        Holds the input record separator. The record separator is usually the newline character. However, if $/ is set to an empty string, Perl reads in paragraph mode: two or more consecutive newlines in the input file are treated as a single record separator.
$,        The output separator that the print() function places between its arguments. Normally, this variable is an empty string. However, setting $, to a newline might be useful if you need to print each element in the parameter list on a separate line.
$\        Added as an invisible last element to the parameters passed to the print() function. Normally, it is an empty string, but if you want to add a newline or some other suffix to everything that is printed, you can assign the suffix to $\.
$#        The default format for printed numbers. Normally, it's set to %.20g, but you can use the format specifiers covered in the section "Example: Printing Revisited" in Chapter 9 to specify your own default format.
$%        Holds the current page number for the default file handle. If you use select() to change the default file handle, $% will change to reflect the page number of the newly selected file handle.
$=        Holds the current page length for the default file handle. Changing the default file handle will change $= to reflect the page length of the new file handle.
$-        Holds the number of lines left to print for the default file handle. Changing the default file handle will change $- to reflect the number of lines left to print for the new file handle.
$~        Holds the name of the default line format for the default file handle. Normally, it is equal to the file handle's name.
$^        Holds the name of the default heading format for the default file handle. Normally, it is equal to the file handle's name with _TOP appended to it.
$|        If nonzero, flushes the output buffer of the currently selected file handle after every write() or print() function. Normally, it is set to 0.
$$        This UNIX-based variable holds the process number of the process running the Perl interpreter.
$?        Holds the status of the last pipe close, back-quote string, or system() function. You can find more information about the $? variable in Chapter 13, "Handling Errors and Signals."
$&        Holds the string that was matched by the last successful pattern match.
$`        Holds the string that preceded whatever was matched by the last successful pattern match.
$'        Holds the string that followed whatever was matched by the last successful pattern match.
$+        Holds the string matched by the last bracket (set of parentheses) in the last successful pattern match. For example, the statement /Fieldname: (.*)|Fldname: (.*)/ && ($fName = $+); will find the name of a field even if you don't know which of the two possible spellings will be used.
$*        Changes the interpretation of the ^ and $ pattern anchors. Setting $* to 1 is the same as using the /m option with the regular expression matching and substitution operators. Normally, $* is equal to 0.
$0        Holds the name of the file containing the Perl script being executed.
$<digits> This group of variables ($1, $2, $3, and so on) holds the regular expression pattern memory. Each set of parentheses in a pattern stores the string that matches the components surrounded by the parentheses into one of the $<digits> variables.
$[        Holds the base array index. Normally, it's set to 0. Most Perl authors recommend against changing it without a very good reason.
$]        Holds a string that identifies which version of Perl you are using. When used in a numeric context, it will be equal to the version number plus the patch level divided by 1000.
$"        This is the separator used between list elements when an array variable is interpolated into a double-quoted string. Normally, its value is a space character.
$;        Holds the subscript separator for multidimensional array emulation. Its use is beyond the scope of this book.
$!        When used in a numeric context, holds the current value of errno. If used in a string context, will hold the error string associated with errno. For more information about errno, see Chapter 13, "Handling Errors and Signals."
$@        Holds the syntax error message, if any, from the last eval() function call. For more information about eval(), see Chapter 13, "Handling Errors and Signals."
$<        This UNIX-based variable holds the real uid of the current process.
$>        This UNIX-based variable holds the effective uid of the current process.
$)        This UNIX-based variable holds the effective gid of the current process. If the process belongs to multiple groups, then $) will hold a space-separated list of group IDs, starting with the effective gid.
$:        Holds a string that consists of the characters that can be used to end a word when word-wrapping is performed by the ^ report formatting character. Normally, the string consists of the space, newline, and dash characters.
$^D       Holds the current value of the debugging flags. For more information, see Chapter 16, "Debugging Perl."
$^F       Holds the value of the maximum system file descriptor. Normally, it's set to 2. The use of this variable is beyond the scope of this book.
$^I       Holds the file extension used to create a backup file for the in-place editing specified by the -i command line option. For example, it could be equal to ".bak".
$^L       Holds the string used to eject a page for report printing. Chapter 11, "Creating Reports," shows how to use this variable to create simple footers.
$^P       This variable is an internal flag that the debugger clears so it will not debug itself.
$^T       Holds the time, in seconds since the epoch, at which the script began running.
$^W       Holds the current value of the -w command line option.
$^X       Holds the name used to run the Perl interpreter for the current script. On many systems, this is its full pathname.
$ARGV     Holds the name of the current file being read when using the diamond operator (<>).
@ARGV     This array variable holds a list of the command line arguments. You can use $#ARGV to determine the number of arguments minus one.
@F        This array variable holds the list returned from autosplit mode. Autosplit mode is associated with the -a command line option.
@INC      This array variable holds a list of directories where Perl looks for files to load. The list is mainly used by the require statement. You can find more information about require statements in Chapter 15, "Perl Modules."
%INC      This hash variable has an entry for each filename included by do or require statements. The keys of the hash entries are the filenames, and the values are the paths where the files were found.
%ENV      This hash variable contains entries for your current environment variables. Changing or adding an entry affects only the current process and any child processes it starts afterward, never the parent process. See the section "Example: Using the %ENV Variable" later in this chapter.
%SIG      This hash variable contains entries for signal handlers. For more information about signal handlers, see Chapter 13, "Handling Errors and Signals."
_         This file handle (the underscore) can be used when testing files. It reuses the information saved from the last file test or stat() call, so the new test does not need another system call.
DATA      This file handle refers to any data following __END__.
STDERR    This file handle is used to send output to the standard error file. Normally, this is connected to the display, but it can be redirected if needed.
STDIN     This file handle is used to read input from the standard input file. Normally, this is connected to the keyboard, but it can be redirected.
STDOUT    This file handle is used to send output to the standard output file. Normally, this is the display, but it can be redirected.

Table 12.2 puts the variables into different categories so you can see how they relate to one another. This organization is more useful than Table 12.1 when you are creating your own programs. Some of the categories covered in Table 12.2 have their own chapters; the subheadings in the table point out which chapter to look at for more information.

Table 12.2 Perl's Special Variables

Variable  Description

Variables That Affect Arrays
$"        The separator used between list elements when an array variable is interpolated into a double-quoted string. Normally, its value is a space character.
$[        Holds the base array index. Normally, it's set to 0. Most Perl authors recommend against changing it without a very good reason.
$;        Holds the subscript separator for multidimensional array emulation. Its use is beyond the scope of this book. For a more in-depth look at Perl programming, see Que's Special Edition Using Perl for Web Programming.

Variables Used with Files (See Chapter 9, "Using Files")
$.        This variable holds the current record or line number of the file handle last read. Assigning to $. changes the counter but does not move the file position. It is reset when the file handle is closed.
$/        This variable holds the input record separator. The record separator is usually the newline character. However, if $/ is set to an empty string, Perl reads in paragraph mode: two or more consecutive newlines in the input file are treated as a single record separator.
$|        This variable, if nonzero, flushes the output buffer of the currently selected file handle after every write() or print() function. Normally, it is set to 0.
$^F       This variable holds the value of the maximum system file descriptor. Normally, it's set to 2. The use of this variable is beyond the scope of this book.
$ARGV     This variable holds the name of the current file being read when using the diamond operator (<>).
_         This file handle (the underscore) can be used when testing files. It reuses the information saved from the last file test or stat() call, so the latest test does not need another system call.
DATA      This file handle refers to any data following __END__.
STDERR    This file handle is used to send output to the standard error file. Normally, this is connected to the display, but it can be redirected if needed.
STDIN     This file handle is used to read input from the standard input file. Normally, this is connected to the keyboard, but it can be redirected.
STDOUT    This file handle is used to send output to the standard output file. Normally, this is the display, but it can be redirected.

Variables Used with Patterns (See Chapter 10, "Regular Expressions")
$&        This variable holds the string that was matched by the last successful pattern match.
$`        This variable holds the string that preceded whatever was matched by the last successful pattern match.
$'        This variable holds the string that followed whatever was matched by the last successful pattern match.
$+        This variable holds the string matched by the last bracket (set of parentheses) in the last successful pattern match. For example, the statement /Fieldname: (.*)|Fldname: (.*)/ && ($fName = $+); will find the name of a field even if you don't know which of the two possible spellings will be used.
$*        This variable changes the interpretation of the ^ and $ pattern anchors. Setting $* to 1 is the same as using the /m option with the regular expression matching and substitution operators. Normally, $* is equal to 0.
$<digits> This group of variables ($1, $2, $3, and so on) holds the regular expression pattern memory. Each set of parentheses in a pattern stores the string that matches the components surrounded by the parentheses into one of the $<digits> variables.

Variables Used with Printing
$,        This variable is the output separator that the print() function places between its arguments. Normally, this variable is an empty string. However, setting $, to a newline might be useful if you need to print each element in the parameter list on a separate line.
$\        This variable is added as an invisible last element to the parameter list passed to the print() function. Normally, it's an empty string, but if you want to add a newline or some other suffix to everything that is printed, you can assign the suffix to $\.
$#        This variable is the default format for printed numbers. Normally, it's set to %.20g, but you can use the format specifiers covered in the section "Example: Printing Revisited" in Chapter 9 to specify your own default format.

Variables Used with Processes (See Chapter 13, "Handling Errors and Signals")
$$        This UNIX-based variable holds the process number of the process running the Perl interpreter.
$?        This variable holds the status of the last pipe close, back-quote string, or system() function. More information about the $? variable can be found in Chapter 13, "Handling Errors and Signals."
$0        This variable holds the name of the file containing the Perl script being executed.
$]        This variable holds a string that identifies which version of Perl you are using. When used in a numeric context, it will be equal to the version number plus the patch level divided by 1000.
$!        This variable, when used in a numeric context, holds the current value of errno. If used in a string context, it will hold the error string associated with errno. For more information about errno, see Chapter 13, "Handling Errors and Signals."
$@        This variable holds the syntax error message, if any, from the last eval() function call. For more information about eval(), see Chapter 13, "Handling Errors and Signals."
$<        This UNIX-based variable holds the real uid of the current process.
$>        This UNIX-based variable holds the effective uid of the current process.
$)        This UNIX-based variable holds the effective gid of the current process. If the process belongs to multiple groups, then $) will hold a space-separated list of group IDs, starting with the effective gid.
$^T       This variable holds the time, in seconds since the epoch, at which the script began running.
$^X       This variable holds the name used to run the Perl interpreter for the current script. On many systems, this is its full pathname.
%ENV      This hash variable contains entries for your current environment variables. Changing or adding an entry affects only the current process and any child processes it starts afterward, never the parent process. See the section "Example: Using the %ENV Variable" later in this chapter.
%SIG      This hash variable contains entries for signal handlers. For more information about signal handlers, see Chapter 13, "Handling Errors and Signals."

Variables Used with Reports (See Chapter 11, "Creating Reports")
$%        This variable holds the current page number for the default file handle. If you use select() to change the default file handle, $% will change to reflect the page number of the newly selected file handle.
$=        This variable holds the current page length for the default file handle. Changing the default file handle will change $= to reflect the page length of the new file handle.
$-        This variable holds the number of lines left to print for the default file handle. Changing the default file handle will change $- to reflect the number of lines left to print for the new file handle.
$~        This variable holds the name of the default line format for the default file handle. Normally, it is equal to the file handle's name.
$^        This variable holds the name of the default heading format for the default file handle. Normally, it is equal to the file handle's name with _TOP appended to it.
$:        This variable holds a string that consists of the characters that can be used to end a word when word-wrapping is performed by the ^ report formatting character. Normally, the string consists of the space, newline, and dash characters.
$^L       This variable holds the string used to eject a page for report printing. Chapter 11, "Creating Reports," shows how to use this variable to create simple footers.

Miscellaneous Variables
$_        This variable is used as the default parameter for a lot of functions.
$^D       This variable holds the current value of the debugging flags. For more information, see Chapter 16, "Debugging Perl."
$^I       This variable holds the file extension used to create a backup file for the in-place editing specified by the -i command line option. For example, it could be equal to ".bak".
$^P       This variable is an internal flag that the debugger clears so that it will not debug itself.
$^W       This variable holds the current value of the -w command line option.
@ARGV     This array variable holds a list of the command line arguments. You can use $#ARGV to determine the number of arguments minus one.
@F        This array variable holds the list returned from autosplit mode. Autosplit mode is associated with the -a command line option.
@INC      This array variable holds a list of directories where Perl looks for files to load. The list is used mainly by the require statement. You can find more information about require statements in Chapter 15, "Perl Modules."
%INC      This hash variable has an entry for each filename included by do or require statements. The keys of the hash entries are the filenames, and the values are the paths where the files were found.

Most of these variables are discussed in other chapters of the book, and some are simple enough that you don't need to see examples at this point. However, the DATA file handle and the %ENV hash deserve some additional mention. They are discussed in the following sections.

Example: Using the DATA File Handle

As you no doubt realize by now, Perl has some really odd features, and the DATA file handle is one of them. This file handle lets you store read-only data in the same file as your Perl script, which might come in handy if you need to send both code and data to someone via e-mail.

When using the DATA file handle, you don't need to open or close it; just start reading from the file handle using the diamond operator. The following simple example shows you how to use the DATA file handle.

Read all the lines that follow the line containing __END__.
Loop through the @lines array, printing each element.
Everything above the __END__ line is code; everything below is data.

@lines = <DATA>;

foreach (@lines) {
    print("$_");
}

__END__
Line one
Line two
Line three

This program displays the following:

Line one
Line two
Line three

Example: Using the %ENV Variable

Environment variables are used by the operating system to store bits of information that are needed to run the computer. They are called environment variables because they remain in the background as part of the overall computing environment of your system. You rarely need to use them directly.
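A minimal sketch of reading and changing environment variables from a script follows. It assumes a UNIX-like system and Perl 5.8 or later, which the list form of a piped open() requires. HOME is used only as a common example, and DEMO_GREETING is a hypothetical name made up for this illustration. The script sets a variable in %ENV and then starts a second copy of the interpreter, found through $^X, to show that a child process inherits the change.

```perl
use strict;
use warnings;

# Read a variable, with a fallback when it is not defined.
my $home = defined($ENV{HOME}) ? $ENV{HOME} : "(not set)";
print "HOME is $home\n";

# DEMO_GREETING is a made-up name used only for this illustration.
# The change is visible to this process and to children started after it.
$ENV{DEMO_GREETING} = "hello from the parent";

# Start a child perl ($^X names the running interpreter) and read what it
# prints. The list form of open() runs the command without a shell.
open(my $child, "-|", $^X, "-e", 'print $ENV{DEMO_GREETING}')
    or die "Cannot start child perl: $!";
my $seen_by_child = <$child>;
close($child);

print "Child saw: $seen_by_child\n";
```

When the script exits, DEMO_GREETING disappears with the process. The shell that started the script never sees it.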
When your Perl process starts, it receives a copy of the environment variables to use as needed. You can change the environment variables, but the changes will not persist after the process running Perl ends. The changes will, however, affect the current process and any child processes that it starts afterward.

You can print out the environment variables by using these lines of code:

foreach $key (keys(%ENV)) {
    # Pass the value as an argument so that a % in it is not
    # treated as a printf() format code.
    printf("%-10.10s: %s\n", $key, $ENV{$key});
}

On my Windows 95 machine, this program displays the following:

WINBOOTDIR: C:\WINDOWS
TMP       : C:\WINDOWS\TEMP
PROMPT    : $p$g
CLASSPATH : .\;e:\jdk\classes;
TEMP      : C:\WINDOWS\TEMP
COMSPEC   : C:\WINDOWS\COMMAND.COM
CMDLINE   : perl -w 12lst01.pl
BLASTER   : A220 I10 D3 H7 P330 T6
WINDIR    : C:\WINDOWS
PATH      : C:\WINDOWS;C:\WINDOWS\COMMAND;C:\PERL5\BIN;
TZ        : GMT-05:00

Only a few of these variables are interesting. The TMP and TEMP variables let you know where temporary files should be placed. The PATH variable lets the system know where to look for executable programs; the system searches each directory in the list until it finds the needed file. The TZ variable lets you know which time zone the computer is running in.

The most useful variable is probably PATH. By changing it, you can force the system to search the directories you specify. This might be useful if you suspect that another program of the same name resides in another directory. If you place the current directory at the beginning of PATH, it will be searched first, and you'll get the executable in the current directory rather than another program with the same name. For example:

$ENV{"PATH"} = ".;" . $ENV{"PATH"};

A single period refers to the current directory, and a semicolon delimits the directories in the PATH variable. This is the Windows convention; UNIX systems separate PATH entries with a colon. So this statement forces the operating system to look in the current directory before searching the rest of the directories in PATH.

Environment variables can be useful if you want a quick way to pass information from a parent process to a child process.
The parent can set the variables, and the child can read them.

Summary

This chapter gathered into one location all the special variables used by Perl. Most of the variables have already been discussed in previous chapters, and a few will be discussed in later chapters.

Table 12.1 was organized to follow the PERLVAR.htm document that comes with the Perl distribution, so if you aren't familiar with a variable used in someone else's code, that's the place to look. The punctuation variables come first, followed by the control-character variables such as $^D, the named variables, and finally the file handles.

Table 12.2 was organized according to functionality. Some variables are used with files, some with arrays, and so forth.

You saw an example of how to use the DATA file handle to read information from the same file that holds the Perl script. The %ENV variable was also discussed. This hash holds the environment variables, which are used mostly by the operating system.

In the next chapter, "Handling Errors and Signals," you learn how to handle error conditions, how to use the eval() function, and how to deal with other exceptional conditions that can occur while your program runs.

Review Questions

Answers to Review Questions are in Appendix A.

1. What is the $/ variable used for?
2. What file handle is used to avoid a second system call when doing two or more file tests?
3. What will the following program display?
   $_ = "The big red shoe";
   m/[rs].*\b/;
   print("$`\n");
4. What variable holds the value of the last match string?
5. What will the following program display?
   @array = (1..5);
   $" = "+";
   print("@array\n");
6. What does the following program display?
   @array = ('A'..'E');
   foreach (@array) {
       print();
   }
   $\ = "\n";
   foreach (@array) {
       print();
   }

Review Exercises

1. Write a program that changes the array element separator used in interpolation of arrays inside double-quoted strings to be a comma instead of a space.
2. Write a program that displays which version of the Perl interpreter you are running.
3. Create a file in your temporary directory.
   (Hint: Use the %ENV special variable.)
4. Write a program that uses the $\ variable to end each printed element with an ":END" string.
5. Write a program that prints the last record in a file. The records should be variable-length, but each record starts with the string "START:". (Hint: Look at the $/ variable.)

Chapter 13
Handling Errors and Signals

CONTENTS
* Checking for Errors
* Example: Using the errno Variable
  + Example: Using the or Logical Operator
  + Example: Using the die() Function
  + Example: Using the warn() Function
* Trapping Fatal Errors
  + Example: Using the eval() Function
* What Is a Signal?
  + Example: How to Handle a Signal
* Summary
* Review Questions
* Review Exercises

Most of the examples in this book have ignored the fact that errors can, and probably will, occur. An error can occur because the directory you are trying to use does not exist, the disk is full, or any of a thousand other reasons. Quite often, you won't be able to do anything to recover from an error, and your program should exit. However, exiting after displaying a user-friendly error message is much better than waiting until the operating system or Perl's own error handling takes over.
After looking at errors generated by function calls, we'll look at the eval() function, which prevents certain normally fatal activities, like dividing by zero, from stopping the execution of your script. Then, you'll see what a signal is and how to use the %SIG associative array to create a signal-handling function.

Checking for Errors

The fundamental way to check for errors is to test the return values of the functions that you call. Most Perl functions return zero or false when something goes wrong, so when you use a critical function like open() or sysread(), checking its return value helps to ensure that your program works properly.

Perl has two special variables, $? and $!, that help you find out what happened after an error has occurred. The $? variable holds the status of the last pipe close, back-quoted string, or system() function call. The $! variable can be used in either a numeric or a string context. In a numeric context, it holds the current value of errno. In a string context, it holds the error message associated with errno. The errno variable is set by system calls in the underlying C library and can sometimes be used to determine the last error that took place.

Caution
You can't rely on these variables to check the status of pipes, back-quoted strings, or the system() function when executing scripts under the Windows operating system. My recommendation is to capture the output of the back-quoted string and check it directly for error messages. Of course, if the command writes its errors to STDERR, you can't trap them, and you're out of luck.

Once you detect an error that you can't correct without outside intervention, you need to communicate the problem to the user. This is usually done with the die() and warn() functions.

Example: Using the errno Variable

When an error occurs, it is common practice for UNIX-based functions and programs to set a variable called errno to reflect which error has occurred.
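A minimal sketch of reading errno through $!, assuming the hypothetical file /no/such/dir/file.txt does not exist on your machine: after open() fails, $! yields the error number in a numeric context and the matching message in a string context.

```perl
use strict;
use warnings;

# Hypothetical path, assumed not to exist on the machine running this sketch.
my $path = '/no/such/dir/file.txt';

my ($errNum, $errText);
if (!open(my $fh, '<', $path)) {
    $errNum  = $! + 0;   # numeric context: the raw errno value
    $errText = "$!";     # string context: the message associated with errno
    print("errno $errNum: $errText\n");
}
```

On most systems this reports errno 2 together with the "No such file or directory" message.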
If errno is 2, your script tried to access a directory or file that does not exist. Table 13.1 lists 10 possible values the errno variable can take, but there are hundreds more. If you are interested in seeing all the possible error values, run the program in Listing 13.1.

Table 13.1 Ten Possible Values for errno

Value   Description
1       Operation not permitted
2       No such file or directory
3       No such process
4       Interrupted function call
5       Input/output error
6       No such device or address
7       Arg list too long
8       Exec format error
9       Bad file descriptor
10      No child processes

Loop from 1 to 10,000, assigning each value to $! so that errno is set to it.
Evaluate $! in a string context so that $errText is assigned the error message associated with that value.
Use chomp() to eliminate a possible newline at the end of the error message. Some of the messages end with a newline, and some don't.
Print the error message unless it is an "Unknown error" message. Any error value not used by the system produces a message beginning with "Unknown error" (the exact wording varies by platform), so the unless statement modifier ensures that only valid error messages are displayed.
____________________________________________________________________________________________________________________
Listing 13.1 13LST01.PL-A Program to List All Possible Values for errno

# Try every errno value from 1 to 10,000.
for my $errno (1 .. 10000) {
    $! = $errno;            # assigning a number to $! sets errno
    my $errText = "$!";     # string context: the matching error message
    chomp($errText);        # some messages end with a newline, some don't
    printf("%04d: %s\n", $errno, $errText)
        unless $errText =~ /^Unknown error/i;
}
____________________________________________________________________________________________________________________

Under Windows 95, this program prints 787 error messages. Most of them are totally unrelated to Perl.

Example: Using the or Logical Operator

Perl provides a special logical operator that is ideal for testing the return values from functions. You may recall that the or operator evaluates its right operand only if the left operand is false.
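This short-circuit behavior can be checked with a small sketch. The subroutines left() and right() are hypothetical helpers that exist only to record which operands actually get evaluated.

```perl
use strict;
use warnings;

# Log of every helper call, in the order the calls happen.
my @evaluated;

# Hypothetical helper: logs the call and returns its argument unchanged.
sub left {
    my ($value) = @_;
    push(@evaluated, "left($value)");
    return $value;
}

# Hypothetical helper: logs the call and returns true.
sub right {
    push(@evaluated, 'right');
    return 1;
}

left(1) or right();   # left operand is true: right() is never called
left(0) or right();   # left operand is false: right() is called

print(join(', ', @evaluated), "\n");   # prints "left(1), left(0), right"
```

Because or has lower precedence than the comma operator, everything to its right, including a comma-separated series of expressions, is treated as one right operand.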
Because most functions return false when an error occurs, you can use the or operator to control the display of error messages. For example:

chdir('/user/printer') or print("Can't connect to Printer dir.\n");

This code prints the error message only if the program can't change to the /user/printer directory. Frequently, though, simply telling the user what the problem is isn't good enough; the program must also exit to avoid compounding the problem. You could use the comma operator to add a second statement to the right operand of the or operator. Adding an exit() statement to the previous line of code looks like this:

chdir('/user/printer') or print("failure\n"), exit(1);
print("success\n");

I added the extra print statement to prove that the script really exits. If the printer directory does not exist, the second print statement is not executed.

Note
At the shell or DOS prompt, a zero exit status means that the program ended successfully. Inside a Perl script, however, a zero return value from a function frequently means that an error has occurred. Be careful when dealing with return values; always check the documentation.

Using the comma operator to execute two statements instead of one is awkward and prone to misinterpretation when other programmers look at the script. Fortunately, you can use the die() function to get the same functionality.

Example: Using the die() Function

The die() function is used to quit your script and display a message for the user to read. Its syntax is

die(LIST);

The elements of LIST are printed to STDERR, and then the script exits, setting the script's exit status to the value of $! (errno), or, if $! is zero, to $? >> 8, and failing that to 255. If you were running the Perl script from inside a C program or UNIX shell script, you could then check the exit status to see what went wrong. The simplest way to use the die() function is to place it on the right side of the or operator:

chdir('/user/printer') or die();

which displays

Died at test.pl line 2.

if the /user/printer directory does not exist.
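To see the exit status that die() leaves behind for a calling program, the sketch below runs a one-line child script through system() and then inspects $?. It assumes the hypothetical directory /no/such/dir does not exist, and it uses $^X, the path of the running perl interpreter, to start the child.

```perl
use strict;
use warnings;

# One-line child script: chdir() fails, so die() runs.
# '/no/such/dir' is a hypothetical path assumed not to exist.
my $child = q{chdir('/no/such/dir') or die("$!, stopped");};

# $^X is the path of the perl interpreter running this script.
system($^X, '-e', $child);

# $? is the child's wait status; the exit code is in the high byte.
my $exitCode = $? >> 8;
print("child exited with status $exitCode\n");
```

The child prints its "No such file or directory, stopped" message to STDERR, and on most systems the parent then sees an exit status of 2, the errno value left behind by the failed chdir().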
The default "Died" message is not very informative, so you should always include a message telling the user what happened. If you don't know what the error might be, you can always display the error text associated with errno. For example:

chdir('/user/printer') or die("$!");

This line of code displays

No such file or directory at test.pl line 2.

This error message is a bit more informative. It's even better if you append the text ", stopped" to the error message like this:

chdir('/user/printer') or die("$!, stopped");

which displays

No such file or directory, stopped at test.pl line 2.

Appending the extra string makes the error message look a little more professional. If you are really looking for informative error messages, try this:

$code = "chdir('/user/printer')";
eval($code) or die("PROBLEM WITH LINE: $code\n$! , stopped");

which displays the following:

PROBLEM WITH LINE: chdir('/user/printer')
No such file or directory , stopped at test.pl line 3.

The eval() function is discussed in the section "Example: Using the eval() Function," later in this chapter, so I won't explain this code in detail, other than to say that the eval() function executes its argument as semi-isolated Perl code. First, the Perl code in $code is executed; then, if it fails, the die() function displays that code as part of the error message.

If you don't want die() to add the script name and line number to the error, add a newline to the end of the error message. For example:

chdir('/user/printer') or die("$!\n");

displays the following:

No such file or directory

Example: Using the warn() Function

The warn() function has the same functionality that die() does except the script is not exited. This function is better suited for nonfatal messages like low memor