Some Database Issues

Introduction. Another grouping of major issues we need to consider has to do with the nature of the databases or data sources that are being used. For some quick examples of some issues consider the following. You may have a data source

on your computer,

on another computer on a LAN that you can consider to be on a local network drive

on a LAN but your access is more limited

not on a LAN but elsewhere on your node of the web

not really web accessible

someplace else on the web

someplace else not on the web

This is not intended to be an exhaustive list, though it may be exhausting. These few possibilities are intended to motivate developing a taxonomy for what are some of the differences between databases that influence how one should develop for the web.

I am more inclined to uses axes to represent important dimensions and positions on these axes rather than try to construct a set of little boxes that will certainly fail to classify. All this babbling is leading someplace and should hopefully make more sense after some more reading and thinking.

So what are these important dimensions or axes? While there are all kinds of possibilities I will focus on the following.

How local is the information?

How accessible is the information?

How much control do you have over the data source?

How many different sources are there?

What is the volume of data?

How diverse are the DBMSs?

Degree of Localness. When you learned about databases in VB or Access you were inevitably working with data sources on your own machine. The things that you learned can easily transfer to data sources that are on local network drives that can be accessed via a LAN.

Maybe you do have a small company with just a few employees and your LAN is largely all you need. In fact maybe it's all you want because it gives you added security. I have done work for a medical testing facility, a small hedge fund and an outdoors club that all fit this situation, in addition to many others. But things are often not this simple.

Maybe your data source is on another server on another part of the network. Consider a situation where you are trying to create a spiffy report generator for the Datatel system here at Quinnipiac. Maybe your data sources are even more remote. It could be the case that your Transaction Processing System that keeps track of actual sales is on one computer system with its own DBMS. It may also be the case that your credit card information is on another network using another DBMS. This can obviously go on and on.

Degree of Accessibility. Do you have direct access to the data? Can you only get copies of what exists? Can you modify the data? Can you read the data source? Can you write to the data source? Can you cull information using some sort of software?

Obviously accessibility is going to be different for different situations. If you are submitting an order on-line then you are likely to at least be sending e-mail to some host. It is even reasonable to believe that you are writing data to some host. It is also possible that the data source server is separate from the web server to add some additional protection and more privacy.

Consider what would be involved for you to register for courses on-line. There has to be considerable trust for the user to input things correctly. On the other hand, the data entry isn't likely to result in as many errors as if someone was transcribing handwritten information into a data source. What sort of accessibility would you need to see your grades and on-line records of your grades? What sorts of measures would help ensure privacy?

Well, this is a start to a very involved discussion.

Degree of Data Control. How much can you influence what data is gathered? Can you modify the data?

Quantity of Data Sources. Sometimes you have a single well defined and well structured data sources. Well maybe this happens in classes in school.

At my last job we were trying to collect data in order to use statistics to study what variables were really influencing whether students decided to matriculate at the university. Even though all the data sources were local, Admissions had its own data set, financial aid had its own data set, the registrar had its own data set. I think you can start to see why they were willing to pay $1.5 million to get an antique version of Datatel!

One of the cases I develop in my MBA generalist class has to do with the EPA. The EPA turns out to have a different data source at a different location for each of seven different major pieces of environmental legislation!

These sorts of things are not uncommon since firms have tended to collect data in isolation for particular needs.

Volume of Data. I once worked on a little hedge fund back office system with just three traders and about six other staff. The volume of information they had just from the trades they had made in the last couple years resulted in source files in the multi-megabytes. Think what it must be like for large investment firms. Think what it must be like for banks.

This usually doesn't have as much impact as some of these other dimensions. If you were trying to develop web interactions based in Access you would likely see unrelenting collapses of the system. SQL Server as the manipulation language is far more robust. But even SQL Server is insufficiently scalable for some applications!

Diversity of the DBMSs. This can be a real issue in larger firms.

For example, consider a firm where years ago they were using IBMs with DB2 to store data. They continue to use this because the computers still run. But now they've gotten into issuing credit cards and they are using Oracle as the DBMS for this data. Let's assume there is some sort of transaction processing system that is UNIX based using another DBMS. This can easily go on and on.

This is one of the biggest issues in the EPA case where the diverse data sets are each maintained using different DBMSs.

For This Course. For this course we will assume that you are ultimately working towards creating a functioning on-line store with all of the usual features. Usually this would mean at least one web server and a different database server for added security. We will have to operate off one machine due to resource constraints. We will be using a particular set of Microsoft tools collecting current data. We will not have to access legacy data from other sources, that will happen in a data mart/data warehouse course.