First Python Application

Now that we are familiar with the syntax, style, variable assignment, and memory allocation, it is time to look at a more complex example of Python programming. Many parts of this program use Python constructs that may be unfamiliar, but we believe that Python is simple and elegant enough that the reader can draw the appropriate conclusions by examining the code.

The source file we will be looking at is fgrepwc.py, named in honor of the two Unix utilities of which this program is a hybrid. fgrep is a simple string-searching command. It reads a text file line by line and outputs every line in which the search string appears. Note that a string may appear more than once on a line. wc is another Unix command; it counts the number of characters, words, and lines in an input text file.

Our version does a little of both. It requires a search string and a filename, outputs all lines with a match, and concludes by displaying the total number of matching lines found. Because a string may appear more than once on a line, we must stress that the count is strictly the number of lines that match, not the total number of times the search string appears in the file. (One of the exercises at the end of the chapter requires the reader to "upgrade" the program so that the output is the total number of matches.)

One other note before we take a look at the code: The normal convention for source code in this text is to leave out all comments and place the annotated version on the CD-ROM. However, we will include comments for this example to aid you as you explore your first longer Python script with features we have yet to introduce.

We now introduce fgrepwc.py, found below as Example 3.1, and provide analysis immediately afterward.

Example 3.1. File Find (fgrepwc.py)

This application looks for a search word in a file and displays each matching line as well as a summary of how many matching lines were found.

     1  #!/usr/bin/env python
     2
     3  "fgrepwc.py -- searches for string in text file"
     4
     5  import sys
     6  import string
     7
     8  # print usage and exit
     9  def usage():
    10      print "usage: fgrepwc [ -i ] string file"
    11      sys.exit(1)
    12
    13  # does all the work
    14  def filefind(word, filename):
    15
    16      # reset word count
    17      count = 0
    18
    19      # can we open file? if so, return file handle
    20      try:
    21          fh = open(filename, 'r')
    22
    23      # if not, exit
    24      except:
    25          print filename, ":",sys.exc_info()[1]
    26          usage()
    27
    28      # read all file lines into list and close
    29      allLines = fh.readlines()
    30      fh.close()
    31
    32      # iterate over all lines of file
    33      for eachLine in allLines:
    34
    35          # search each line for the word
    36          if string.find(eachLine, word) > -1:
    37              count = count + 1
    38              print eachLine,
    39
    40      # when complete, display line count
    41      print count
    42
    43  # validates arguments and calls filefind()
    44  def checkargs():
    45
    46      # check args; 'argv' comes from 'sys' module
    47      argc = len(sys.argv)
    48      if argc != 3:
    49          usage()
    50
    51      # call fgrepwc.filefind() with args
    52      filefind(sys.argv[1], sys.argv[2])
    53
    54  # execute as application
    55  if __name__ == '__main__':
    56      checkargs()

Lines 1–3

The Unix start-up line is followed by the module documentation string. If you import the fgrepwc module from another module, this string can be accessed with fgrepwc.__doc__. This is a key feature because it makes previously static text information available in a dynamic execution environment. We can also point out that what we described is usually the only use of the documentation string.
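As a minimal sketch of this behavior, the snippet below builds a stand-in module in memory and reads its documentation string through __doc__. It is written in Python 3 syntax, unlike the listing. The module name fgrepwc_demo and the use of types.ModuleType with exec() are assumptions made only to keep the example self-contained; a real program would simply import fgrepwc from its file.

```python
import types

# Source text of a tiny stand-in module; only its first statement,
# the documentation string, matters here.
SOURCE = '"fgrepwc.py -- searches for string in text file"\n'

# Hypothetical module name, used for illustration only.
demo = types.ModuleType("fgrepwc_demo")

# Running the source in the module's namespace mimics what import does.
exec(compile(SOURCE, "fgrepwc_demo.py", "exec"), demo.__dict__)

# The documentation string is now an ordinary attribute of the module.
print(demo.__doc__)
```

Because the string is reachable as an attribute, tools that inspect a running program can display it without opening the source file.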
The documentation string serves no other purpose, but it can double as a comment that is conveniently located at the top of the file. (We invite the reader to look at the documentation string at the beginning of the cgi module in the standard library for a serious example of module documentation.)

Lines 5–6

We've already seen the sys and string modules. The sys module contains mostly variables and functions that represent interaction between the Python interpreter and the operating system. You will find items in here such as the command-line arguments, the exit() function, the contents of the Python path environment variable PYTHONPATH, the standard files, and information on errors. The string module contains practically every function you'll need for processing strings, such as integer conversion via atoi() (and related functions), various string variables, and other string manipulation functions.

The main motivation for providing modules to import is to keep the language small, light, fast, and efficient, and to bring in only the software you need to get the job done: plug'n'play with only the modules you need. Perl and Java have a similar setup, importing modules, packages, and the like, and to a certain extent so do C and C++ with the inclusion of header files.

Lines 8–11

We declare a function called usage() here, which takes no arguments. The purpose of this function is simply to display a message showing the user the proper command-line syntax for invoking the script, and then to exit the program with the exit() function, found in the sys module. We also mentioned that in the Python namespace, calling a function from an imported module requires a "fully-qualified" name. All imported variables and functions are referenced in one of the following forms: module.variable or module.function(). Thus we have sys.exit(). An alternative from-import statement allows the import of specific functions or variables from a module, bringing them into the current namespace.
If this method of importing is used, only the attribute name is necessary. For example, if we wanted to import only the exit() function from sys and nothing else, we could use the following replacement:

    from sys import exit

Then in the usage() function, we would call exit(1) and leave off the "sys." prefix.

One final note about exit(): The argument to sys.exit() plays the same role as the argument to the C exit() function: it is the exit status returned to the calling program, usually a command-line shell.

With that said, we point out that this "protocol" of printing usage and exiting applies only to command-line driven applications. In web-based applications, this would not be the preferred way to quit a running program, because the calling web browser is expecting a valid HTML response. For web applications, it is more appropriate to output an error message formatted in HTML so that end-users can correct their input. In short, no web application should terminate with an error. Exiting the program sends a system or browser error to the user, which is incorrect behavior, and preventing it is the responsibility of the web application developer. The same reasoning applies to GUI-based applications, which should not "crash out" of their executing window. The correct way to handle errors in such applications is to bring up an error dialog that notifies the user and perhaps allows a parameter change that may rectify the situation.

Lines 13–41

The core part of our Python program is the filefind() function. filefind() takes two parameters: the word the user is searching for, and the name of the file to search. A counter is kept to track the total number of successful matches (the number of lines that contain the word). The next step is to open the file. The try-except construct is used to "catch" errors which may occur when attempting to open the file.
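The pattern just described can be sketched as follows. This is a Python 3 illustration under stated assumptions, not the listing's code: the helper name open_for_reading is hypothetical, it catches OSError rather than using a bare except, and it returns None instead of calling usage() so that the caller decides how to react.

```python
import sys

def open_for_reading(filename):
    """Return an open file object, or None if the file cannot be opened.

    Hypothetical helper for illustration; the listing calls usage()
    and exits instead of returning None.
    """
    try:
        return open(filename, "r")
    except OSError as err:
        # Report the failure on stderr and let the caller decide what to do.
        print(f"{filename}: {err}", file=sys.stderr)
        return None

# A path that is assumed not to exist, to exercise the except branch.
fh = open_for_reading("no-such-directory/no-such-file.txt")
print(fh)
```

Catching a specific exception class keeps unrelated errors, such as a misspelled variable name, from being silently reported as file problems.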
One of Python's strengths is its ability to let the programmer handle errors and perform appropriate action rather than simply exiting the program. This results in a more robust application and a more acceptable way of programming. Chapter 10 is devoted to errors and exceptions.

Barring any errors, the goal of this section of the function is to open a file, read all of its lines into a buffer that can be processed later, and close the file. We took a sneak peek at files earlier, but to recap, the open() built-in function returns a file object, or file handle, on which all succeeding operations are performed, i.e., readlines() and close().

The final part of the function involves iterating through each line, looking for the target word. Searching is accomplished using the find() function from the string module. find() returns the starting character position (index) if there is a match, or -1 if the string does not appear in the line. All successful matches are tallied, and matching lines are displayed to the user. filefind() concludes by displaying the total number of matching lines that were found.

Lines 43–52

The last function found in our program is checkargs(), which does exactly two things: it checks for the correct number of command-line arguments and calls filefind() to do the real work. The command-line arguments are stored in the sys.argv list. The first argument is the program name, the second is presumably the string we are looking for, and the final argument is the name of the file to search.

Lines 54–56

This is the special code we alluded to earlier: the code that determines (based on __name__) which course of action to take depending on whether this script was imported or executed directly. With the boilerplate if statement, we can be sure that checkargs() will not be executed if this module is imported, nor would we want it to be. If it were executed on import, it would most likely just exit, because the check for the command-line arguments would fail.
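To make the role of __name__ concrete, here is a small Python 3 sketch that simulates both situations rather than performing a real import or a real command-line run. The helper run_as, the result variable, and the use of types.ModuleType with exec() are assumptions introduced only for illustration; checkargs() is reduced to a stub that reports it was called.

```python
import types

# A cut-down stand-in for the script: checkargs() is a stub, and
# 'result' records whether the guarded call actually happened.
SOURCE = """
def checkargs():
    return "checkargs() ran"

result = None
if __name__ == '__main__':
    result = checkargs()
"""

def run_as(name):
    """Execute SOURCE in a fresh namespace whose __name__ is 'name'."""
    module = types.ModuleType(name)
    exec(compile(SOURCE, name + ".py", "exec"), module.__dict__)
    return module.result

print(run_as("__main__"))  # guard is true, so checkargs() is called
print(run_as("fgrepwc"))   # guard is false, as in an import; prints None
```

The only difference between the two runs is the value bound to __name__, which is exactly what the boilerplate if statement tests.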
If the code did not have the if statement and the main body of code consisted of just the single line that calls checkargs(), then checkargs() would be executed whether this module was imported or executed directly.

One final note regarding fgrepwc.py: This script was created to run from the command line. Some work, specifically interface changes, would be required if you wanted to execute it from a GUI or web-based environment.

The example we just looked at was fairly complex, but we hope it was not a complete mystery, given the comments in this section and any previous programming experience you may have brought. In the next chapter, we will take a closer look at Python objects, the standard data types, and how we can classify them.
© 2002, O'Reilly & Associates, Inc.