AI for Historical Structure Classification & Tourism Recommendation

Problem Statement

Historical structures preserve cultural heritage and attract tourism. A government agency wants to use machine learning to:

Monitor conditions of historical structures automatically (image-based classification).
Understand tourists and recommend places to improve marketing and engagement.

Part 1 — Image Classification (TensorFlow / ResNet50)

Objective

Predict the category (one of 10 classes) of a structure from an image to support automated monitoring.

Dataset

Training images plus a separate test set, covering 10 categories.
The training set is further split into training and validation subsets.
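
The split ratio and method are not stated here. As a minimal sketch, assuming an 80/20 stratified split over hypothetical (path, label) pairs, the training data could be divided as follows so that every category appears in both subsets. The function name, file names, fraction, and seed are all illustrative.

```python
import random
from collections import defaultdict


def stratified_split(samples, val_fraction=0.2, seed=42):
    """Split (path, label) pairs into train/validation lists, class by class.

    val_fraction and seed are illustrative assumptions, not project values.
    """
    rng = random.Random(seed)
    by_label = defaultdict(list)
    for path, label in samples:
        by_label[label].append((path, label))

    train, val = [], []
    for label in sorted(by_label):
        items = by_label[label]
        rng.shuffle(items)
        # Keep at least one validation image per class.
        n_val = max(1, round(len(items) * val_fraction))
        val.extend(items[:n_val])
        train.extend(items[n_val:])
    return train, val


# Hypothetical file list: 10 classes with 20 images each.
samples = [(f"class_{c}/img_{i}.jpg", c) for c in range(10) for i in range(20)]
train_set, val_set = stratified_split(samples)
print(len(train_set), len(val_set))  # 160 40
```

Splitting per class keeps the validation set representative when some categories have fewer images than others.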

Model & Training

Base model: ResNet50 (pretrained), with custom dense + dropout layers.
Loss: sparse_categorical_crossentropy.
Regularization: early stopping + dropout.
Training epochs: 50 planned, early stopped at 26 (non-augmented) and 15 (augmented).
Also trained on augmented images for improved robustness.
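
The early-stopping rule can be illustrated without any framework. This is a minimal sketch, assuming validation loss is the monitored metric and a hypothetical patience of 5 epochs; the project's actual callback settings are not given here, and the loss values below are made up for illustration.

```python
def early_stopping_epoch(val_losses, patience=5, max_epochs=50):
    """Return (stop_epoch, best_epoch), both 1-indexed.

    Training stops once val_loss has not improved for `patience`
    consecutive epochs, or when max_epochs is reached.
    """
    best_loss = float("inf")
    best_epoch = 0
    for epoch, loss in enumerate(val_losses[:max_epochs], start=1):
        if loss < best_loss:
            best_loss, best_epoch = loss, epoch
        elif epoch - best_epoch >= patience:
            return epoch, best_epoch
    return min(len(val_losses), max_epochs), best_epoch


# Made-up validation losses: improve for 4 epochs, then plateau.
history = [2.1, 1.8, 1.6, 1.5, 1.55, 1.52, 1.6, 1.58, 1.57, 1.7]
stop, best = early_stopping_epoch(history, patience=5)
print(stop, best)  # 9 4
```

In Keras the same behavior is available through the tf.keras.callbacks.EarlyStopping callback, whose monitor, patience, and restore_best_weights arguments control which metric is watched, how long to wait, and whether the best epoch's weights are restored.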

Model Snippet

# CNN architecture: ResNet50 backbone with a small classification head
import tensorflow as tf
from tensorflow.keras import layers, models

base_model = tf.keras.applications.ResNet50(
    input_shape=(224, 224, 3),
    include_top=False,   # drop the original ImageNet classifier
    weights='imagenet'   # initialize with weights pretrained on ImageNet
)

# Add the classification layers
model = models.Sequential([
    base_model,
    layers.GlobalAveragePooling2D(),  # pool each feature map to one value (used here instead of Flatten)
    layers.Dense(128, activation='relu'),
    layers.Dropout(0.2),              # dropout for regularization
    layers.Dense(10, activation='softmax')  # one output per class (10 classes)
])
model.compile(optimizer='adam', loss='sparse_categorical_crossentropy', metrics=['accuracy'])
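
For reference, sparse_categorical_crossentropy takes integer class labels (not one-hot vectors) and, for each sample, returns the negative log of the softmax probability assigned to the true class. A minimal sketch in plain Python, using made-up probabilities, also shows the chance-level baseline for 10 classes:

```python
import math


def sparse_categorical_crossentropy(y_true, y_prob):
    """Mean of -log(p[true_class]); y_true holds integer class indices."""
    losses = [-math.log(probs[label]) for label, probs in zip(y_true, y_prob)]
    return sum(losses) / len(losses)


# A model that guesses uniformly over 10 classes scores ln(10),
# regardless of what the true labels are.
uniform = [[0.1] * 10 for _ in range(3)]
chance_loss = sparse_categorical_crossentropy([0, 4, 9], uniform)
print(round(chance_loss, 4))  # 2.3026

# Illustrative (made-up) prediction that puts 0.7 on the correct class 2.
confident = [[0.05, 0.05, 0.7, 0.05, 0.05, 0.02, 0.02, 0.02, 0.02, 0.02]]
confident_loss = sparse_categorical_crossentropy([2], confident)
print(round(confident_loss, 4))  # 0.3567
```

Against the chance-level baseline of about 2.30, the validation losses reported in this document (1.3712 and 1.4583) are better than random guessing but leave room for improvement.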

Training Summary

Non-augmented Best

Epoch 21 — val_acc 0.5700, val_loss 1.3712

Training and validation accuracy/loss curves for non-augmented training.

Augmented Best

Epoch 11 — val_acc 0.5083, val_loss 1.4583

Training and validation accuracy/loss curves for augmented training.

Early Stopping

26 epochs (non-aug.), 15 epochs (aug.)

Conclusion (Part 1)

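Validation accuracy exceeded training accuracy and validation loss was below training loss, which suggests the model is not overfitting. Part of this gap is expected, because dropout is active only during training and augmentation is typically applied only to training images. Augmentation is intended to increase robustness, but it lowered training accuracy because of the higher data variability, and its best validation accuracy (0.5083) was below that of the non-augmented run (0.5700). Additional data or longer training could further improve performance.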

Part 2 — Tourism Analytics & Recommendation Engine

Objective

Perform EDA and build recommenders to help tourists discover places of interest and guide tourism marketing.

Data Preparation

Merged three datasets into a single DataFrame.
Cleaned data and removed irrelevant columns.
Translated Indonesian text to English (Google Translate) for analysis.

Key Analytics Findings

Insight	Summary
Top rating age group	Users aged 25–35 provided most ratings
Top tourist origin	Bekasi, Jawa Barat
Popular cities	Bandung, Jakarta, Yogyakarta City
Top category by visits	Amusement Parks
Highest-rated category	Nature preserves

Recommendations Built

Cosine similarity — collaborative approach on user-place rating matrix (low similarity scores; mostly < 0.4).
GenAI recommender — generative/contextual recommendations using combined dataset; produced category-aware suggestions and often diverged from cosine-based results.
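
To illustrate why sparse ratings yield low scores, here is a minimal sketch of user-to-user cosine similarity on a tiny, made-up user-place rating matrix. The user names and ratings are hypothetical, and treating a missing rating as 0 is an assumption about how the matrix was built.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two rating vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# Rows: users, columns: places (hypothetical data; 0 = not rated).
ratings = {
    "user_a": [5, 0, 0, 4, 0, 0],
    "user_b": [4, 0, 0, 5, 0, 0],  # rated the same places as user_a
    "user_c": [0, 3, 0, 4, 5, 2],  # shares only one place with user_a
    "user_d": [0, 0, 4, 0, 0, 0],  # shares nothing with user_a
}

for other in ("user_b", "user_c", "user_d"):
    score = cosine_similarity(ratings["user_a"], ratings[other])
    print(other, round(score, 3))
```

When most user pairs share few or no rated places, as with user_c and user_d here, the scores fall toward zero, which is consistent with the low similarities observed.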

Takeaways

Cosine similarity struggled due to sparse overlaps in user ratings.
Generative AI gave more thematic recommendations.

Conclusion (Part 2)

The analytics uncovered clear demographic and location patterns (top cities and categories). The two recommenders can be combined into a hybrid: collaborative filtering where users' ratings overlap, and GenAI for contextual, category-based suggestions.
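
One way to picture such a combination is a simple routing rule. The sketch below is an assumption, not the project's implementation: the function names are hypothetical, the two recommenders are stand-in stubs, and the 0.4 threshold is only an illustrative cut-off borrowed from the similarity range reported earlier.

```python
def hybrid_recommend(neighbor_scores, collaborative_fn, fallback_fn, min_similarity=0.4):
    """Use collaborative filtering when some neighbor is similar enough, else fall back."""
    if neighbor_scores and max(neighbor_scores) >= min_similarity:
        return "collaborative", collaborative_fn()
    return "fallback", fallback_fn()


# Stand-in recommenders (hypothetical). The fallback represents the
# GenAI, category-based path.
def collaborative_stub():
    return ["place liked by similar users"]


def category_stub():
    return ["place from the user's preferred category"]


print(hybrid_recommend([0.12, 0.55], collaborative_stub, category_stub))
print(hybrid_recommend([0.05, 0.20], collaborative_stub, category_stub))
```

The first call routes to the collaborative path because one neighbor scores above the threshold; the second falls back because no neighbor is similar enough.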

Charts & Diagrams

Popular Tourist Spots

Tourist Spots by City

Recommendation Cosine Similarity

Recommendation GenAI

Tech Stack & Future Improvements

Tech Stack

Deep Learning: TensorFlow, Keras, ResNet50
Data Analysis: Pandas, NumPy
Visualization: Matplotlib, Seaborn
Recommendation: Cosine Similarity, GenAI
Translation: Google Translate API

Future Improvements

Increase and diversify the image dataset; fine-tune deeper layers of ResNet50.
Combine collaborative and content-based recommenders into a hybrid system.
Deploy a web dashboard for real-time monitoring and recommendations.
Collect more explicit user feedback to improve collaborative filtering.