CS303E Assignment 10

Due: Thursday, July 18

File Name: movies.py (Use the provided starter file to complete your program.)

Submit to Assignment 10 on Gradescope via Canvas

Purpose: To practice working with files, lists, functions, and strings

Limitations: You may only use the Python syntax and language features covered in the book chapters 1 - 8.

Rotten Tomatoes is a website that combines reviews of film and television shows from multiple sources. The Sentiment Analysis Project at Stanford University created data files from reviews of movies drawn from the Rotten Tomatoes website.

Files provided for the assignment:

movies.py - A starter file with some of the code already written and definitions of the other functions you must complete.
movie_50.txt - A smaller file with about 1/50th of all the reviews from the nifty assignment
movie_3.txt - A larger file with about 1/3rd of all the reviews from the nifty assignment
words_1.txt - A file with words to search for.

Each line of the provided review files contains one review.

The first element (or token) in each line is a number 0 through 4. This represents the overall rating of the review based on the following scale:

0 : negative
1 : somewhat negative
2 : neutral
3 : somewhat positive
4 : positive

The rest of the line is the relatively short written review. Note, the reviews have been processed to pull apart some words with punctuation. Here are some examples:

1 An unsophisticated sci-fi drama that takes itself all too seriously .
3 A deeply felt and vividly detailed story about newcomers in a strange new world .
4 Intriguing and stylish .
0 The problem is the needlessly poor quality of its archival prints and film footage .
1 Too silly to take seriously .
4 Amid the new populist comedies that underscore the importance of family tradition and familial community , one would be hard-pressed to find a movie with a bigger , fatter heart than Barbershop .
0 A thoroughly awful movie -- dumb , narratively chaotic , visually sloppy ... a weird amalgam of ' The Thing ' and a geriatric ' Scream . '

Our program shall ask the user for the name of the file with the reviews and then read in the file. The reviews shall be stored as a list of lists. Each element of the outer list is a list that represents a single review: its first element is the overall score for the review, and the remaining elements are the words of the review stored as strings. Recall that the split() method for strings breaks a large string into a list of smaller strings, with all whitespace (spaces, tabs, newline characters) removed. Do not try to split on anything other than whitespace. So, for example, we will treat ... as a word.
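As a rough illustration of the list-of-lists structure described above, here is a minimal sketch. The function name parse_reviews is hypothetical and is not taken from the starter file, and converting the score to an int is an assumption (the handout only says the score is the first element). To keep the sketch self-contained it works on a list of lines; in the real program the lines would come from the file the user names.

```python
def parse_reviews(lines):
    """Turn lines such as '4 Intriguing and stylish .' into a list of lists."""
    reviews = []
    for line in lines:
        # split() with no arguments splits on any whitespace and drops
        # the trailing newline, so '...' and '.' stay whole tokens.
        tokens = line.split()
        if len(tokens) > 0:
            # Assumption: store the score as an int so it can be averaged later.
            tokens[0] = int(tokens[0])
            reviews.append(tokens)
    return reviews


sample_lines = ['4 Intriguing and stylish .\n',
                '1 Too silly to take seriously .\n']
reviews = parse_reviews(sample_lines)
print(reviews)
```

Each inner list starts with the score and is followed by the words of that review, including punctuation tokens such as the final period.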

We then present the user a menu with 3 choices:

1. See average rating for a word. With this choice we ask the user for a word. You may assume the user does not enter any spaces. We then find all the reviews that contain that word, ignoring case. The in operator, a Python keyword, makes this very easy for lists. You can simply write

string in list

where string is the string to look for and list is the list to search. The expression is True if the string is an element of the list. Of course, you will have to do some conversions to ignore case. Call the lower() method on a string to get a version of the string in which every character that has a lower case equivalent is converted to that lower case value. Recall that strings are immutable in Python, so lower() returns a new, lower case version of the string you call it on. It does not alter the original string.

s1 = 'SamPLe12'
s1.lower()       # Fairly pointless. s1 still refers to a string equal to 'SamPLe12'
s1 = s1.lower()  # Now s1 refers to a string equal to 'sample12'

We then display the number of reviews that contained the word and the average rating of those reviews. See the example output file for examples and formatting.
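The search in option 1 could be sketched as follows. The function name word_stats is hypothetical, the scores are assumed to be stored as ints, and returning 0 and 0.0 when no review contains the word is an assumption, since the required behavior and output format are defined only by the example output file. The sketch returns the two values instead of printing them for that reason.

```python
def word_stats(reviews, word):
    """Return (number of reviews containing word, average score of those reviews)."""
    target = word.lower()
    count = 0
    total = 0
    for review in reviews:
        # Build a lower case copy of the words, skipping the score at index 0.
        lowered = []
        for token in review[1:]:
            lowered.append(token.lower())
        if target in lowered:
            count += 1
            total += review[0]
    if count == 0:
        # Assumption: avoid dividing by zero when the word never appears.
        return 0, 0.0
    return count, total / count


sample = [[4, 'Intriguing', 'and', 'stylish', '.'],
          [1, 'Too', 'silly', 'to', 'take', 'seriously', '.'],
          [3, 'A', 'deeply', 'felt', 'and', 'vividly', 'detailed', 'story', '.']]
count, average = word_stats(sample, 'AND')
print(count, average)
```

Because both the search word and every word in the review are lowered before the in test, 'AND', 'And', and 'and' all match the same reviews.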

2. Show average reviews for all words in a file. When the user makes this choice we ask for a file name. We then read from the file and treat each line as a word. For each word we show the same results that option 1 would show if the user had typed in that word. You may assume each line of the file is a single word with no spaces.
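One detail worth noting for option 2 is that a line read from a file still ends with a newline character, so it will not match any word in a review until that character is removed. A minimal sketch, assuming a hypothetical helper named clean_words and assuming blank lines should be skipped (the handout does not say how to treat them):

```python
def clean_words(lines):
    """Return the words from a list of lines, without surrounding whitespace."""
    words = []
    for line in lines:
        # strip() removes the trailing newline and any stray spaces.
        word = line.strip()
        if word != '':
            words.append(word)
    return words


sample_lines = ['good\n', 'Terrible\n', '\n', '...\n']
words = clean_words(sample_lines)
print(words)
```

Each resulting word can then be handled exactly as if the user had typed it for option 1.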

3. Display the longest review. The longest review is the one with the most words, NOT the most characters. Note that every element of the list except the first (the overall rating) is treated as a word, even things like a single quote or a period. If there is a tie for the longest review, display the one that appears first in the file.
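The tie-breaking rule in option 3 can be handled with a strict comparison, as in this minimal sketch. The function name longest_review is hypothetical, and the sketch assumes the list holds at least one review. Since every review list has exactly one score element, comparing list lengths gives the same result as comparing word counts.

```python
def longest_review(reviews):
    """Return the review with the most words; the earliest one wins a tie."""
    longest = reviews[0]
    for review in reviews:
        # Strictly greater than, so a later review of equal length
        # never replaces an earlier one.
        if len(review) > len(longest):
            longest = review
    return longest


sample = [[4, 'Intriguing', 'and', 'stylish', '.'],
          [1, 'Too', 'silly', 'to', 'take', 'seriously', '.'],
          [0, 'Dull', ',', 'slow', ',', 'and', 'flat']]
result = longest_review(sample)
num_words = len(result) - 1  # do not count the score
print(num_words, result)
```

The second and third sample reviews both have 6 words, and the second is returned because it appears first.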

Complete the functions in the provided movies.py file. Add other functions if you think they are necessary to provide structure and/or remove redundancy.

Here is a file with multiple runs of the program. Given the same inputs and the same seed your output shall match this exactly.

I strongly recommend you compare your output with the expected output using a diff program such as https://www.diffchecker.com/.

Recall, you must complete the header in the provided version of movies.py.