Channel: Autarchy of the Private CavePython »

Simple substring counting script in Python

About a month ago I decided to use Python as my main shell-scripting language. By then I was already aware of several benefits of using Python for scripting:

  • source-level cross-platform scripting: your script will run anywhere Python compiles; by extension, it will run anywhere there is a C compiler (needed to build Python itself)
  • high-level language: for example, you can iterate over all the lines in a text file with a single ‘for’ statement (see the actual example below)
  • simple/minimalist syntax: no curly braces around blocks of statements, no semicolons after each and every line of code, etc. At a glance, Python looks much more understandable than, for example, Perl
  • the power of C in an interpreted language
  • it is interpreted! This makes debugging easy: modify, execute, see the problem, with no compile/link stages
  • and, despite being interpreted, it is fast!
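
The "single ‘for’ statement" point from the list above can be shown with a minimal sketch, written here for Python 3. The function name numbered_lines and the throwaway demo file are assumptions made for this illustration only; they are not part of the script discussed in this post.

```python
import os
import tempfile


def numbered_lines(path):
    """Return the lines of a text file, each prefixed with its 1-based number."""
    result = []
    with open(path, "r") as f:
        # A file object is iterable: one 'for' statement walks it line by line.
        for number, line in enumerate(f, start=1):
            result.append("%d: %s" % (number, line.rstrip("\n")))
    return result


# Demo on a temporary file (hypothetical content, created just for this example).
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write("first\nsecond\nthird\n")
    demo_path = tmp.name

demo_result = numbered_lines(demo_path)
os.remove(demo_path)
print("\n".join(demo_result))
```

The file is read lazily, one line at a time, so the same loop works for files that do not fit in memory.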

For a comparison (in speed, memory use, and program size) with other programming languages, please see the “Computer Language Shootout Benchmarks”. Here I link only to the comparison of Python with Perl and the comparison of Python with PHP (which can also be used as a shell-scripting language, albeit after some tinkering with its settings).

Below is an example of a 2-minute Python script that counts the number of occurrences of a substring in a file.

    """Read FILE and count the number of occurrences of SUBSTR."""
    import sys
    from optparse import OptionParser

    version = 0.01


    def main():
        # Describe the command line: two positional arguments plus two flags.
        opts = OptionParser(usage="%prog [options] FILE SUBSTR",
                            version="%prog " + str(version),
                            description="Read FILE and count the number of occurrences of SUBSTR.")
        opts.set_defaults(verbose=False, flush=False)
        opts.add_option("-v", "--verbose", action="store_true", dest="verbose",
                        help="Print every line containing SUBSTR [default: %default]")
        opts.add_option("-f", "--flush", action="store_true", dest="flush",
                        help="When verbose, flush every line [default: %default]")
        (options, args) = opts.parse_args()

        if len(args) != 2:
            print("Two arguments required for correct processing")
            opts.print_help()
            sys.exit(2)

        infile = args[0]
        substr = args[1]
        lines_count = 0
        substr_count = 0
        lines_substr_count = 0
        msg = ""  # matching lines are collected here when verbose without flush

        with open(infile, 'r') as f:
            for line in f:
                lines_count += 1
                # str.count() returns the number of non-overlapping matches
                found = line.count(substr)
                substr_count += found
                if found > 0:
                    lines_substr_count += 1
                    if options.verbose and not options.flush:
                        msg += str(found) + ": " + line
                    elif options.verbose and options.flush:
                        # drop the trailing newline, since print adds its own
                        print((str(found) + ": " + line).rstrip("\n"))

        if options.verbose and not options.flush:
            print(msg)
        print("Lines read from file: " + str(lines_count))
        print("Lines with substring found: " + str(lines_substr_count))
        print("Total substrings detected: " + str(substr_count))


    if __name__ == "__main__":
        main()
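
The counting loop in the script above could also be factored into a small function that accepts any iterable of lines, which makes it easy to try without a file or a command line. This is only a sketch for Python 3: the name count_substring and the sample data are hypothetical and do not appear in the original script.

```python
def count_substring(lines, substr):
    """Return (lines_read, lines_with_match, total_matches) for an iterable of lines."""
    lines_read = 0
    lines_with_match = 0
    total_matches = 0
    for line in lines:
        lines_read += 1
        # str.count() counts non-overlapping occurrences only
        found = line.count(substr)
        total_matches += found
        if found > 0:
            lines_with_match += 1
    return lines_read, lines_with_match, total_matches


# Hypothetical sample input standing in for the lines of a file.
sample = ["spam and spam\n", "eggs and spam\n", "aaaa\n"]
print(count_substring(sample, "spam"))  # (3, 2, 3)
print(count_substring(sample, "aa"))    # (3, 1, 2)
```

The second call shows what "Total substrings detected" means in practice: because the count is non-overlapping, "aaaa" contains "aa" twice, not three times. An open file object can be passed as lines directly, since it yields one line per iteration.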
