Intro. to Unix
Spring 2000

Homework #3, Test #2

Due Date:

Submit to introunix-submit@cs.rpi.edu with the subject line set to "HW3"

HW3/TEST2 FAQ


There are 5 problems listed below - you need to do 3 of them. The grade for the 3 problems you pick will be your grade for HW3 and for TEST #2 (as discussed in class).

We will only grade 3 problems! All questions have equal weight in grading.


Problem #1: Sed script

You need to write an sed script that will extract the title from an HTML document (HTML details shown below). Your script must be submitted as a file (not as a sample command line)! The file should be named gettitle and the contents of the file should be an sed script. This is not a shell script - you cannot use anything except sed! For example, here is a sample sed script:
#!/bin/sed -nf
# Here is my script it prints all lines with the word HI
/HI/p

Note that the complete path to sed may be different on different machines (it might be /usr/bin/sed). Use whatever works for you.
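
As a rough illustration of how a sed script file like the sample above can be run, here is a small sketch. The file names printhi and input.txt and the temporary directory are made up for this example. Passing the file to sed with -f avoids depending on where sed is installed.

```shell
# Work in a throwaway directory (hypothetical file names).
workdir=$(mktemp -d)

# The sample sed script from above, saved as a file.
cat > "$workdir/printhi" <<'EOF'
#!/bin/sed -nf
# Here is my script it prints all lines with the word HI
/HI/p
EOF

# Some sample input: two lines contain HI, one does not.
printf 'HI there\nnothing here\nsay HI\n' > "$workdir/input.txt"

# Run the script by handing the file to sed; the #! line is then just a comment.
sed -n -f "$workdir/printhi" "$workdir/input.txt"

# To run it directly instead, make it executable and check that the #! line
# names the real location of sed on your machine:
#   chmod +x printhi
#   ./printhi input.txt
```

Only the two lines containing HI should be printed.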

  • HTML and HTML TITLE
    To complete this problem you need to understand a little about HTML files. HTML files are text files that contain tags that are used (by a web browser) to control the display of the document. For example you can use tags to tell the browser that a chunk of text should be drawn in boldface.

    Each HTML tag is contained within the characters < and >. Here is some example HTML that contains some tags:

    This is normal text.
    <B>This text should be bold</B>
    <I>This text should be italics</I>
    

    Notice that there is a start tag for bold: <B> and an end tag for bold: </B> - this tells the browser that all the text between these tags should be rendered as bold.

    The title of an HTML document is found between the start and end tags <TITLE>, </TITLE>. Your job is to match this pattern using a regular expression in your sed script, and to print out only the stuff between the start and end tag.

  • Whitespace and case sensitivity
    HTML tags can have whitespace inside them (the tag name itself cannot contain whitespace, but there can be whitespace between the < and the tag name, etc.). HTML tag names can be any mixture of upper and lower case! These are all legal HTML titles:
    <TITLE>My Title</TITLE>
    <  TITLE  >Another document name    <  /TITLE>
    <Title>blah</TITLE>
    <  title>Foo Foo Foo   <  /TiTLE>
    
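
One possible way to cope with the whitespace and mixed case described above is sketched here. This is only an illustration, not the required solution. The function name extract_title is hypothetical, and the sed expression is wrapped in a shell function purely so it can be tried at the prompt; the same expression could be placed in a sed script file. It assumes the start and end tags are on the same line.

```shell
# Sketch: print only the text between the title tags.
#   [[:space:]]*          allows optional whitespace inside the tags
#   [Tt][Ii][Tt][Ll][Ee]  matches the tag name in any mix of case
#   \(.*\) ... \1         captures and prints the text between the tags
extract_title() {
    sed -n 's/.*<[[:space:]]*[Tt][Ii][Tt][Ll][Ee][[:space:]]*>\(.*\)<[[:space:]]*\/[[:space:]]*[Tt][Ii][Tt][Ll][Ee][[:space:]]*>.*/\1/p' "$@"
}

# Try it on two of the legal titles listed above.
printf '%s\n' '<TITLE>My Title</TITLE>' '<Title>blah</TITLE>' | extract_title
```

Whitespace between the title text and the end tag is kept as part of the output in this sketch.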
  • Where to get sample HTML files
    When you visit a web page the HTML title is usually displayed on the titlebar of the browser window (Netscape and IE both do this). You can save the document with the "save as" menu item in the file menu, and then run your sed script giving the script the name of the HTML file as a command line argument. Assuming the file samp.html contains title tags that surround the string This is a sample document, running your script might look like this:

    > ls
    samp.html    foo.html    joe.html
    > gettitle samp.html
    This is a sample document
    > 
    
  • NOTES
    • Not all HTML documents have a title!
    • You can expect a single title in an HTML document.
    • The only HTML tags you need to worry about are the start and end tags for title: <TITLE> and </TITLE>.
    • Some HTML documents have the title split over multiple lines, like this:
      <TITLE>Joes house of pizza, wings and other
      good stuff to eat</TITLE>
      
      You don't need to deal with this possibility to get full credit for this problem (but I challenge you to try!).

    Problem #2: Stack Commands

    This problem involves the creation of 2 new commands using whatever means you want (shell scripts, awk scripts, bash functions, whatever...). The two new commands are push and pop and together they implement a simple stack where you can store simple strings.

    Stack: A stack is a list in which whatever goes in last comes out first. The push command adds something new (in our case a single line of text) to the stack, and the pop command gets the last thing put on the stack and removes it from the stack. Here is what a sample session might look like. Note that in this session the Unix shell prompt is shown as ">":

    > push Hi Dave
    > push The second line
    > pop
    The second line
    > pop
    Hi Dave
    > pop
    error - the stack is empty!
    

    Some Issues:

    • The current contents of the stack need to be stored somewhere! The simple solution is to use a file. You could also think about using an environment variable, but then the contents of the stack will be lost each time the shell exits. This is fine for this assignment, just something you should keep in mind when debugging!
    • I'm not asking for commands other than push and pop, but you might want to use some to help debug (for example, it might be helpful to you to have a command that shows the current stack).
    • The push command should put all its command line arguments onto the stack as a single item (think of the stack as a stack of lines of text - you give push a line of text). The push command doesn't need to print anything out, but feel free to have it do so if you like.
    • The pop command should print out a single line of text (whatever was on the top of the stack) and remove this item from the stack. If the stack is empty when the pop command is run it should print something like "error - the stack is empty".
    • IMPORTANT: Please make sure you understand what I want here: I want 2 new commands that can be entered at the unix prompt, not a fancy program that allows me to play with a stack. I need to be able to do something like this: push $HOME or even this: svar=`pop`.
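
To make the file-based idea above concrete, here is a minimal sketch using shell functions. It is one of many possible approaches, not the expected answer. The STACK_FILE variable and its default location are assumptions made up for this example, and sending the error message to standard error (so that it does not end up in svar when using backquotes) is a design choice, not a requirement.

```shell
# Hypothetical location of the stack file; override STACK_FILE to change it.
STACK_FILE=${STACK_FILE:-$HOME/.stack}

# push: append all command line arguments to the file as one line.
push() {
    printf '%s\n' "$*" >> "$STACK_FILE"
}

# pop: print the last line of the file, then remove that line.
pop() {
    if [ ! -s "$STACK_FILE" ]; then
        # Missing or empty file means the stack is empty.
        echo "error - the stack is empty!" >&2
        return 1
    fi
    tail -n 1 "$STACK_FILE"
    # Rewrite the file without its last line.
    sed '$d' "$STACK_FILE" > "$STACK_FILE.tmp" && mv "$STACK_FILE.tmp" "$STACK_FILE"
}
```

Because the stack lives in a file, a pop run inside backquotes still removes the item, so something like svar=`pop` behaves as required.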

    Problem #3: Shell script that lists all commands in your path

    For this problem you need to write a shell script that prints out a sorted list of all the commands available at the unix prompt. You do not need to list shell internal commands, aliases or functions. You should simply use the PATH environment variable to find all executable files in your path.

    The output of this command (name the shell script "listcommands") must be sorted alphabetically! Here is an example of part of what might be output when you run listcommands:
    > listcommands
    [
    a2p
    a2ps
    a2ps
    aafire
    aafire
    aainfo
    aainfo
    aasavefont
    aasavefont
    aatest
    aatest
    ac
    access
    access
    accton
    aclocal
    aclocal
    acroread
    acroread
    addftinfo
    ...
    
    NOTES:

    • It may be easy to find something that already lists all the commands in your path (an existing command) - you can't do this! You need to write a shell script that looks in all the directories in your $PATH.
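
The idea of walking through the directories in $PATH can be sketched roughly as follows. This is an illustration under some assumptions, not a model answer: the function name list_commands and its optional argument (a colon-separated list used in place of $PATH) are made up for this example, and duplicate names are kept because the sample output above shows them.

```shell
# Sketch: print the names of all executable regular files found in the
# directories of a colon-separated list (defaults to $PATH), sorted.
list_commands() {
    printf '%s\n' "${1:-$PATH}" | tr ':' '\n' |
    while IFS= read -r dir; do
        # Skip entries that are not existing directories.
        [ -d "$dir" ] || continue
        for f in "$dir"/*; do
            # Keep only regular files that are executable.
            if [ -f "$f" ] && [ -x "$f" ]; then
                printf '%s\n' "${f##*/}"   # strip the directory part
            fi
        done
    done | sort
}
```

Calling list_commands with no argument uses the current $PATH. Piping the result through sort -u instead of sort would drop the repeated names.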

    Problem #4: Find and document system-wide bash startup file

    This one is for those who don't want to write code! Your job is to find the system-wide shell startup file for bash on RCS, and to document (explain) all the commands in this file. Finding the file should not be difficult (try "man bash"!), but you will probably need to spend some time figuring out what the commands in the file actually do! I expect a detailed explanation of everything in the startup file - not broad, general statements.

    Problem #5: Awk Script version of itar
    I want the itar command from HW2, but this time you must write it completely in awk!