$ cat Count.py
import sys
from pyspark import SparkContext

if __name__ == "__main__":
    sc = SparkContext()
    logfile = sys.argv[1]
    # Count the log lines that reference a .jpg file
    count = sc.textFile(logfile).filter(lambda line: '.jpg' in line).count()
    print("Number of JPG requests: %d" % count)
    sc.stop()
$
$ spark-submit --master yarn-client Count.py /test/weblogs/*
Number of JPG requests: 10258
$