This web application is designed to provide a course searching engine for A&M students, especially for the graduate students in the department of CSE.
Normally, the section information for previous semesters, such as instructor, average GPA, etc. can be obtained on A&M registar report. However, it’s inconvenient for us students to use, since it’s organized by semester. If we want to check a course for last 3 years, we would need to download 9 PDF reports and search that course in each of them. It’s not straightforward if we want to compare the results across different semesters. Usually, students spend hours collecting helpful data if they want enough information. Some of them may just give up and go with recent 1 or 2 grade reports since this process is quite tedious and time-consuming.
To solve this problem, this web application collect all course catalog data and section report data and stored them in database. It will show the information organized by each course. It is easy to compare section information across different semesters , thus to help students choose the right course section for them.
Meanwhile, students can search courses not only with the course ID or name, but also based on their degree programs. This application implements some course related information in the graduate brochure for graduate students in the department of CSE. Students can check them using categorical search.
Furthermore, the registar reports don’t have any other information like course full name, credit and description. This web application will provide such information to users for their convenience. Therefore, it is a complete and convenient tool for A&M students for choosing courses.
As mentioned, this web app provides two ways for users to get course information: keyword search and categorical search.
Users can search courses by entering any combination of “Department”, “Number” and “Name”, thus do fuzzy search. For example, if a user enters “CSCE” for “Department” and “608” for “Number”, only one course “CSCE 608 Database Systems” will be presented. If the the searching criterion is “Name: database”, CourseLookr will return a list of courses whose names consist of “database”. “CSCE 608 Database Systems” will be in this list. Students can open the course detail page here for detailed information. The course information can be updated by users on this detail page.
For graduate students in the department of CSE, this web app provides categorical search. Students can click their degree program on the home page. It will list the required course packages and course packages based on interest. To fulfill their degree plan, students must pick a specified number of courses from each required package. Meanwhile, CourseLookr also provides course packages based on interest. Students can click their interested packages and check courses inside. Students will see a list of courses in the package page. The course detail page can be accessed here. Users can also add or delete courses from packages to help this application better.
The data used in this application was retrieved from three sources as below.
- A&M course catalog
- course ID, name, credit, description
- URL: http://catalog.tamu.edu/
- A&M registar report
- previous section information
- URL: https://web-as.tamu.edu/gradereport/
- A&M CSE department graduate brochure
- graduate degree program information
- URL: https://engineering.tamu.edu/cse/academics/graduate-program
The catalog information is relatively easy to retrieve since it’s web-based and well structured. A web crawler built with python is used to get all information and save them in
csv format. 5309 rows was retrieved. The source code can be found at https://github.com/melonskin/scrapeCourse.
First, A web crawler is used to download those reports for 3-year period, which is in PDF-format. Then, a OCR API developed by google is implemented to convert those PDF into txt files. Those data is processed, cleaned and saved in
csv format. 48491 rows were retrieved. The code is at https://github.com/melonskin/scrapeCourse.
CSE graduate brochure
The degree program and course package information are input manually, since it is relatively short and need some understanding.
E-R diagram of DB
The E-R diagram is plotted as Fig. 1.
This database design has five entities:
Interest. The relations between them are:
Courseone to many
Programone to many
Programmany to many
Packagemany to many
Interestmany to many
With the database design mentioned above, the E-R diagram can be generated.
For the application,
Section are used to store the course information crawled online. User do keyword search in
Course table based on its attributes. For every course, we could query its sections to check.
Interest consist of degree plan information in the graduate brochure. They basically support the categorical search service. For example,
Course have a relationship between them, it represents a course package consists of several courses. Therefore, when user queries for the courses of a package, the application will return a list of courses needed.
Based on the E-R diagram, the database schema is constructed.
There are 8 tables in total. Five of them store five entities. Other 3 are tables consisting of two foreign keys columns handling many-to-many relations between those entities. They are
The structures of tables are shown as below.
Note that an auto-incremented
id attribute is introduced in this table to be the primary key (PRI). The reason is that if we use
number as primary keys, we basically cannot change those values, otherwise, it will cause update anomalies to other tables with this attribute as a foreign key (FOR). This update is totally possible, such as from
CSCE as I know. So to be safe, this unique
id in introduced.
Similarly for other “ship” tables.
Table normalization is unnecessary for our current database design to satisfy BCNF. The definition of BCNF is:
A relation is in BCNF if and only if: whenever there is a nontrivial FD for , it is the case that is a superkey for
For those tables with a unique
id attribute, we can observe that there are no FD within other attributes, so any non-trivial FD must consist of
id attribute on the left side, thus every tuples on the left side will be unique, in other words, superkey. So these tables are in BCNF. Similarly, for other tables with multiple primary keys, any non-trivial FD must have those primary keys on the left side. Meanwhile, the combination of values of primary keys for a record is unique. Therefore, tuples on the left side of FD are unique, superkey. These tables are in BCNF.
The web-based user interface is built with Java Spring/SpringMVC for framework, MySQL for DBMS, Tomcat for web server, maven for dependency management, Bootstrap for front-end beautify and git for version control. Finally, it was deployed on a ubuntu server rent from DigitalOcean.
There are 7 pages in total. Introduction of those pages is given as below.
The home page consists of two blocks: course search and program category, shown in Fig. 2.
Users can search courses by entering any combination of three searching keywords.
Six degree programs in the department of CSE are shown here. Students can click on “View” button of those programs to get into program page.
Search result page
The search result page will show a list of courses matching the searching criteria as Fig. 3. Users can click “View” button to go to the course page they are interested in.
The course page shown in Fig. 4 will show course detail including name, credit, description and previous section information. User can click “Edit course” button if they want to update course information.
Course update page
Users can update course information on this page as shown in Fig. 5. For safety purpose, only name, credit and description are allowed to be updated. Validations will be executed while processing an update request.
The program page will show related required course packages and interested course packages in Fig. 6. Users can click the “View” button to check the courses inside packages.
The package page will show courses in this package shown in Fig. 7. Course detail page can be accessed here. It also present how many courses are required in this package. Also, users can add or delete courses in this package.
The interest page functions similarly as the package page, as shown in Fig. 8.
Project source code
- Application source code can be found at https://github.com/melonskin/courseLookr
- Data retrieval source code can be found at https://github.com/melonskin/scrapeCourse
Initially, developing a good database schema is challenging to me. I tried to understand it better by reading textbook and looking through online discussion, both piazza and other forums. Professor and TA answered my questions completely. Based on my understanding, I designed my database schema and optimized it several times to make it more reasonable.
I developed the user interface with Java Spring/SpringMVC framework, which I haven’t touched before. By reading books and tutorials, I achieved my project from scratch. I learned a lot about what I have used and some basic concepts related to web and network.
The deployment is really a painful but last part of this project. First I developed this application locally on my PC. It works well. For online deployment, I compared those server providers and give up Heroku. Because it does not support MySQL. I have used Heroku before and really like deploying app with only several command in it. Finally, I chose DigitalOcean and configured a blank ubuntu server, installing MySQL, Tomcat, php and other dependencies. A lot of weird problems came out. I struggled to figure all of them out. Especially, I chose a server with 512MB memory initially. It caused me tons of trouble. It took me hours to find that most of problems were related to low memory. Once I upgraded to a server with larger memory, the world is peaceful again. However, during this process, I got a better understanding of linux system, managing database and jobs on server.