New Tool at HUBzero Work Space

The HUBzero instance at Indiana University, http://bamboo-hub0.dlib.indiana.edu, has a new tool available for Topic Modeling. I should add that any shortcomings in the description of this tool are due entirely to me, not the developer.

Parsing 2,246 stories from the Associated Press, this tool creates topic groups and then analyzes each story to determine which topic groups it most likely corresponds to.
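MALLET's topic modeler is based on latent Dirichlet allocation (LDA), trained with Gibbs sampling. To give a feel for what "creating topic groups" means, here is a minimal collapsed Gibbs sampler for LDA in plain Python. It is only a sketch, not the tool's code: the function name `lda_gibbs`, the number of topics, the hyperparameters `alpha` and `beta`, and the iteration count are illustrative assumptions, and MALLET's own sampler is considerably more optimized.

```python
import random
from collections import Counter


def lda_gibbs(docs, num_topics, alpha=0.1, beta=0.01, iterations=200, seed=0):
    """Fit LDA to tokenized documents with collapsed Gibbs sampling.

    Illustrative sketch only (hypothetical helper, not MALLET's API).
    Returns (doc_topic, topic_words): per-document topic proportions and
    per-topic word counts.
    """
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    word_id = {w: i for i, w in enumerate(vocab)}
    V, K = len(vocab), num_topics

    # Count tables: document-topic, topic-word, and per-topic totals.
    n_dk = [[0] * K for _ in docs]
    n_kw = [[0] * V for _ in range(K)]
    n_k = [0] * K

    # Give every token a random initial topic and record it in the counts.
    z = []
    for d, doc in enumerate(docs):
        z_d = []
        for w in doc:
            k = rng.randrange(K)
            z_d.append(k)
            n_dk[d][k] += 1
            n_kw[k][word_id[w]] += 1
            n_k[k] += 1
        z.append(z_d)

    for _ in range(iterations):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                v, k = word_id[w], z[d][i]
                # Remove this token's current assignment from the counts.
                n_dk[d][k] -= 1
                n_kw[k][v] -= 1
                n_k[k] -= 1
                # Full conditional p(z = t | all other assignments), up to a constant.
                weights = [
                    (n_dk[d][t] + alpha) * (n_kw[t][v] + beta) / (n_k[t] + V * beta)
                    for t in range(K)
                ]
                k = rng.choices(range(K), weights=weights)[0]
                # Record the newly sampled topic.
                z[d][i] = k
                n_dk[d][k] += 1
                n_kw[k][v] += 1
                n_k[k] += 1

    # Smoothed topic proportions for each document (each row sums to 1).
    doc_topic = [
        [(n_dk[d][t] + alpha) / (len(doc) + K * alpha) for t in range(K)]
        for d, doc in enumerate(docs)
    ]
    topic_words = [Counter({vocab[v]: c for v, c in enumerate(row) if c}) for row in n_kw]
    return doc_topic, topic_words
```

For example, `lda_gibbs([["ball", "game", "team"], ["vote", "senate", "bill"]], num_topics=2)` returns one row of topic proportions per story plus the words counted under each topic; the highest entry in a story's row is the topic group it most likely corresponds to.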

The results are then returned to the results window, along with the list of stories and the topics generated. The results, stories, and topics can all be downloaded. I found that downloading the results and importing them into Excel let me sort them and see which stories corresponded to which topic groups.
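The same sorting can be scripted instead of done in Excel. The Python sketch below finds each story's highest-weight topic and then groups stories by topic. The file layout is an assumption, since the post does not describe the download format: each tab-separated row is taken to hold a document index, a document name, and then one proportion per topic in topic order, with lines starting with `#` treated as comments. Check the actual header of the "Topic Document distribution" file and adjust the column handling if it differs. The document names in the demo are hypothetical.

```python
import csv
import io
from collections import defaultdict


def dominant_topics(lines):
    """Return {doc_name: (topic, proportion)} for each document's top topic.

    ASSUMED row layout (tab-separated): doc index, doc name, then one
    proportion per topic in topic order. '#' lines and short rows are skipped.
    """
    result = {}
    for row in csv.reader(lines, delimiter="\t"):
        if len(row) < 3 or row[0].startswith("#"):
            continue
        name = row[1]
        proportions = [float(x) for x in row[2:]]
        best = max(range(len(proportions)), key=proportions.__getitem__)
        result[name] = (best, proportions[best])
    return result


def group_by_topic(dominant):
    """Invert {doc: (topic, p)} into {topic: [docs, highest p first]}."""
    groups = defaultdict(list)
    for name, (topic, p) in dominant.items():
        groups[topic].append((p, name))
    return {t: [name for _, name in sorted(docs, reverse=True)] for t, docs in groups.items()}


if __name__ == "__main__":
    # Hypothetical two-story sample in the assumed layout.
    sample = "0\tap-001\t0.70\t0.20\t0.10\n1\tap-002\t0.05\t0.15\t0.80\n"
    print(group_by_topic(dominant_topics(io.StringIO(sample))))  # {0: ['ap-001'], 2: ['ap-002']}
```

With a real download you would pass an open file instead, for example `open("doc_topics.txt", newline="")`, where the filename is a placeholder.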

If you have not yet set up an account on Bamboo's HUBzero test environment, you will need to register to access this tool and the others. Click the Register button in the upper right-hand corner of the home page and fill in the form. You will then be sent an email with a link; click the link and log in, and your account is set up.

In addition to the Topic Modeling tool, we have set up a couple of other tools, the METS Page Turner and the Image Viewer, to show how Java AWT applications can run in the Rappture environment. To get to these tools, log in and go to the "my HUB" tab. On the left is a window called "My Tools". Choose the "All Tools" tab and you will see Topic Modeling, METS Page Turner, Image Viewer and Workspace. The Workspace is a general development area for tools; it opens an X Window environment with an xterm window.

For Topic Modeling, launch the Topic Modeling tool, then choose "AP Documents" in the Choose Data File drop-down. (If this tool is of interest, we can add more documents.) Now click the Run button and wait for the results. After the tool runs, you can access the "Documents", the "Topic Document distribution" results, and the "Topics" from the Results drop-down. The green button to the right lets you download each of these result documents.

Image Viewer is a simple image display tool. When you click its icon on the right, or choose its name on the left and then click "Launch Tool", you get the simulation environment that HUBzero calls Rappture. To load the Image Viewer, click the Simulate button; a window opens with a File menu. Open the menu and use the file dialog box to pick a file to view. If you choose a directory, you must click the Open button to open it. If you want to upload your own image files, go to the Workspace and type "filexfer" in the xterm window. This opens a window that lets you upload files from your machine to the HUBzero file system.

METS Page Turner is a simple app on the surface, but how it displays the page images is interesting. First, when you click its icon on the right, or choose its name on the left and then click "Launch Tool", you get the simulation environment that HUBzero calls Rappture. You have some options on the left: at the top is a list of books, sheet music, etc.; in the middle is the scale (type in 1000, for instance); and at the bottom you can choose to display one or two pages. To run, click the Simulate button. When you are finished with the page turner, close its window to return to the Rappture page. To run the simulation again, you first need to change one of the options on the left. What's interesting is that the document you choose is a link to a METS file on one of IU's Fedora Repository servers. The tool parses the XML, finds the image files (which are also on Fedora) and then displays the images in the page turner.
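The parsing step can be sketched with Python's standard library. This is not the page turner's code, which the post does not show. It assumes a common METS layout: image files listed under `fileSec` as `file` elements whose `FLocat` child carries the URL in `xlink:href`, and pages listed as `div` elements in the `structMap` that point at those files through `fptr FILEID`, ordered by an `ORDER` attribute. IU's Fedora METS profile may organize things differently, and the `example.org` URLs are placeholders. The real tool would also first fetch the METS file from Fedora.

```python
import xml.etree.ElementTree as ET

NS = {"mets": "http://www.loc.gov/METS/", "xlink": "http://www.w3.org/1999/xlink"}
XLINK_HREF = "{http://www.w3.org/1999/xlink}href"


def page_image_urls(mets_xml):
    """Return page image URLs in page order from a METS document string."""
    root = ET.fromstring(mets_xml)

    # Map each image file ID in the fileSec to the URL in its FLocat.
    urls = {}
    for f in root.iterfind(".//mets:fileSec//mets:file", NS):
        if not f.get("MIMETYPE", "").startswith("image/"):
            continue
        loc = f.find("mets:FLocat", NS)
        if loc is not None:
            urls[f.get("ID")] = loc.get(XLINK_HREF)

    # Each structMap div with fptr children is treated as a page.
    pages = []
    for div in root.iterfind(".//mets:structMap//mets:div", NS):
        ptrs = div.findall("mets:fptr", NS)
        if ptrs:
            order = int(div.get("ORDER", len(pages)))
            hrefs = [urls[p.get("FILEID")] for p in ptrs if p.get("FILEID") in urls]
            pages.append((order, hrefs))
    pages.sort(key=lambda page: page[0])
    return [url for _, hrefs in pages for url in hrefs]


# Hypothetical two-page document; pages are listed out of order on purpose.
SAMPLE_METS = """<mets:mets xmlns:mets="http://www.loc.gov/METS/" xmlns:xlink="http://www.w3.org/1999/xlink">
  <mets:fileSec>
    <mets:fileGrp USE="reference">
      <mets:file ID="f1" MIMETYPE="image/jpeg">
        <mets:FLocat LOCTYPE="URL" xlink:href="http://example.org/page1.jpg"/>
      </mets:file>
      <mets:file ID="f2" MIMETYPE="image/jpeg">
        <mets:FLocat LOCTYPE="URL" xlink:href="http://example.org/page2.jpg"/>
      </mets:file>
    </mets:fileGrp>
  </mets:fileSec>
  <mets:structMap>
    <mets:div TYPE="book">
      <mets:div ORDER="2"><mets:fptr FILEID="f2"/></mets:div>
      <mets:div ORDER="1"><mets:fptr FILEID="f1"/></mets:div>
    </mets:div>
  </mets:structMap>
</mets:mets>"""

if __name__ == "__main__":
    print(page_image_urls(SAMPLE_METS))
```

Sorting on `ORDER` rather than trusting document order is a defensive choice: a viewer showing one or two pages at a time needs a reliable page sequence even if the METS file lists pages out of order.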

Information on Topic Models can be found at:

Steyvers, Mark; Griffiths, Tom (2007). "Probabilistic Topic Models." In: T. Landauer, D. McNamara, S. Dennis, and W. Kintsch (eds.), Handbook of Latent Semantic Analysis. Psychology Press. ISBN 978-0-8058-5418-3

This implementation uses code from a Java library called MALLET, which can be found at:

McCallum, Andrew Kachites (2002). "MALLET: A Machine Learning for Language Toolkit." http://mallet.cs.umass.edu