Clustering and Time Series Prediction for Spatio-Temporal Geographic Dataset
by Kedar Prasad Agrawal
Material type: Visual material
Publisher: Ahmedabad: Nirma Institute of Technology, 2016
Description: 86p. Ph. D. Thesis with Synopsis and CD.
DDC classification: TT000036
Online resources: Institute Repository (Campus Access) | Shodhganga
| Item type | Current location | Collection | Call number | Status | Date due | Barcode | Item holds |
| Thesis | Institute of Technology | Reference | TT000036 AGR | Not For Loan | | TT000036 | |
| CD/DVD | Institute of Technology | Reference | TT000036 AGR | Not For Loan | | TT000036-1 | |
| Synopsis | Institute of Technology | Reference | TT000036 AGR | Not For Loan | | TT000036-2 | |
Guided by: Dr. Sanjay Garg. With Synopsis and CD. 11EXTPHDE71
Petabytes of data (classical, spatial, temporal, or hybrid) are generated daily from many sources, and this volume can be used meaningfully only through appropriate data mining tasks. With spatio-temporal datasets these tasks become more challenging, especially obtaining good-quality clusters of arbitrary shape and producing reliable forecasts. Reliable forecasts allow anticipatory action that benefits the general public, for example in land use, in anticipating good and healthy crops or crop failure, good rains, or floods, and in detecting drought areas. In clustering, the issues to be addressed include detection of arbitrarily shaped clusters, handling of high-dimensional data, independence from the order of data input, interpretability, the ability to deal with nested clusters, and scalability. In forecasting, they include handling non-stationarity of the time series, non-linear domains, and the selection and tuning of parameters of existing or newly developed techniques. All of these need to be addressed with utmost care.
Spatio-Temporal Data Mining (STDM) is the extraction of implicit knowledge, spatial and temporal relationships, or other patterns not explicitly stored in spatio-temporal databases. Data is not only growing in volume; it also evolves spatially and temporally, and this dynamic nature is why the field is becoming an important area of research. In addition, Spatio-Temporal (ST) data tends to be highly auto-correlated, so models based on the Gaussian distribution fail: they assume independent observations, which is not the case with ST data. The vital issues in spatio-temporal clustering of Earth observation data are obtaining good-quality, arbitrarily shaped clusters and validating them. The presented research addresses these issues and presents solutions. To this end, a clustering algorithm named “Spatio-Temporal - Ordering Points to Identify Clustering Structure (ST-OPTICS)” has been developed as a modified version of the existing density-based technique OPTICS.
Analysis of the experimental work shows that both the quality of the clusters obtained and the run-time efficiency are much better than those of the existing technique, ST-DBSCAN. The results generated by ST-OPTICS have also been hybridized with an agglomerative approach to improve the visualization and interpretation of the clusters obtained. The results have been validated by visualization and by various performance indices, and they show a performance improvement for the ST-OPTICS clustering technique.
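The abstract does not spell out how ST-OPTICS defines its neighbourhoods, so the following is only a minimal sketch of one common way to extend the standard OPTICS notions of core distance and reachability distance to spatio-temporal points. It assumes separate spatial and temporal radii; the names `eps_space`, `eps_time`, `min_pts`, and all function names are hypothetical and are not taken from the thesis.

```python
from math import hypot, inf

# Each observation is (x, y, t): two spatial coordinates and a timestamp.
# Assumption: a point is a neighbour only if it is close in BOTH space and time.

def st_neighbors(points, i, eps_space, eps_time):
    """Indices of points within eps_space (Euclidean) and eps_time of points[i]."""
    xi, yi, ti = points[i]
    return [
        j for j, (x, y, t) in enumerate(points)
        if j != i and hypot(x - xi, y - yi) <= eps_space and abs(t - ti) <= eps_time
    ]

def core_distance(points, i, eps_space, eps_time, min_pts):
    """Spatial distance to the min_pts-th nearest ST-neighbour (the point itself
    is not counted), or inf when points[i] has too few neighbours to be a core point."""
    xi, yi, _ = points[i]
    dists = sorted(
        hypot(points[j][0] - xi, points[j][1] - yi)
        for j in st_neighbors(points, i, eps_space, eps_time)
    )
    return dists[min_pts - 1] if len(dists) >= min_pts else inf

def reachability_distance(points, o, p, eps_space, eps_time, min_pts):
    """OPTICS reachability of o from p: max(core-distance(p), dist(p, o))."""
    cd = core_distance(points, p, eps_space, eps_time, min_pts)
    d = hypot(points[o][0] - points[p][0], points[o][1] - points[p][1])
    return max(cd, d)

# Small illustrative dataset (made up for this sketch).
points = [(0, 0, 0), (1, 0, 0), (0, 1, 1), (1, 1, 1), (0.5, 0.5, 10), (10, 10, 0)]
neighbours_of_first = st_neighbors(points, 0, eps_space=1.5, eps_time=1)
core_of_first = core_distance(points, 0, eps_space=1.5, eps_time=1, min_pts=2)
```

In this sketch the fifth point is spatially close to the first but far away in time, so it is excluded from the neighbourhood. That is the behaviour a purely spatial density-based method would not give.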
To improve prediction accuracy, statistical and machine learning models have been fused. A statistical model that integrates Auto Regressive (AR) and Moving Average (MA) components can handle non-stationary time series, but it can deal with only a single time series. A machine learning approach, Support Vector Regression (SVR), can handle dependency among different time series as well as non-linearly separable domains; however, it cannot incorporate the past behavior of a time series. This motivated combining the two approaches to improve the accuracy of time series prediction, with the focus on minimizing forecast error using residuals, which helps in taking appropriate action for the near future. With this objective, Auto Regressive Integrated Moving Average (ARIMA) models have been hybridized with SVR models. To reduce the number of area-wise models and the time needed to tune the various parameters, emphasis has been placed on scalability by taking suitable representative samples from each sub-area. The results show that the proposed hybrid model performs better than the individual models.
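The abstract does not give the exact formulation of the hybrid. As a hedged illustration, a widely used additive residual scheme for combining a linear model with a non-linear one can be written as follows. Here $\hat{L}_t$ is assumed to be the ARIMA forecast, $f$ an SVR fitted on lagged residuals, and $n$ a hypothetical number of lags; none of these symbols come from the thesis.

```latex
\begin{align}
y_t       &= L_t + N_t                              && \text{series = linear part + non-linear part} \\
e_t       &= y_t - \hat{L}_t                        && \text{residual left by the ARIMA forecast} \\
\hat{N}_t &= f(e_{t-1}, e_{t-2}, \dots, e_{t-n})    && \text{SVR fitted on lagged residuals} \\
\hat{y}_t &= \hat{L}_t + \hat{N}_t                  && \text{hybrid forecast}
\end{align}
```

Under this scheme the forecast error is reduced when the SVR captures structure in the residuals $e_t$ that the linear model missed, which matches the abstract's emphasis on minimizing forecast error using residuals.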