AI for Historical Structure Classification & Tourism Recommendation

Image classification for heritage structures (TensorFlow) + exploratory analytics and recommendation engine for tourism.

Problem Statement

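Historical structures preserve cultural heritage and attract tourism. A government agency wants to use machine learning to classify heritage structures from images for automated monitoring (Part 1) and to analyze tourism data and recommend places of interest to visitors (Part 2).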

Part 1 — Image Classification (TensorFlow / ResNet50)

Objective

Predict the category (one of 10 classes) of a structure from an image to support automated monitoring.

Dataset

  • Training images + separate test set across 10 categories.
  • Training split further into training / validation.
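
The training/validation split described above can be sketched without any framework. This is a minimal illustration, assuming a stratified split (each class contributes the same share to validation), a 20% validation fraction, and made-up file names; none of these details are stated in the project. In Keras, `tf.keras.utils.image_dataset_from_directory` offers `validation_split` and `subset` arguments for a similar purpose.

```python
import random
from collections import defaultdict

def stratified_split(samples, val_fraction=0.2, seed=42):
    """Split (path, label) pairs into train/validation lists, class by class."""
    by_class = defaultdict(list)
    for path, label in samples:
        by_class[label].append(path)

    rng = random.Random(seed)  # fixed seed so the split is reproducible
    train, val = [], []
    for label in sorted(by_class):
        paths = sorted(by_class[label])
        rng.shuffle(paths)
        # Hold out at least one image per class for validation.
        n_val = max(1, round(len(paths) * val_fraction))
        val += [(p, label) for p in paths[:n_val]]
        train += [(p, label) for p in paths[n_val:]]
    return train, val

# Hypothetical file names: 10 classes with 20 images each.
samples = [(f"class_{c}/img_{i:03d}.jpg", c) for c in range(10) for i in range(20)]
train, val = stratified_split(samples)
print(len(train), len(val))  # 160 40
```

Splitting per class keeps rare categories represented in the validation set, which matters when the dataset is small.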

Model & Training

  • Base model: ResNet50 (pretrained), with custom dense + dropout layers.
  • Loss: sparse_categorical_crossentropy.
  • Regularization: early stopping + dropout.
  • Training epochs: 50 planned, early stopped at 26 (non-augmented) and 15 (augmented).
  • Also trained on augmented images for improved robustness.
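
To make the early-stopping behavior concrete, here is a small sketch of a patience rule of the kind `tf.keras.callbacks.EarlyStopping` applies. The patience value and the loss values below are illustrative assumptions; the project does not state the settings it used.

```python
def early_stopping_trace(val_losses, patience):
    """Return (best_epoch, stop_epoch), both 1-indexed, for a patience rule.

    Training stops once `patience` consecutive epochs fail to improve on
    the best validation loss seen so far.
    """
    best_loss = float("inf")
    best_epoch = 0
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best_loss:
            best_loss, best_epoch = loss, epoch
        elif epoch - best_epoch >= patience:
            return best_epoch, epoch
    # Patience never ran out: training used every epoch.
    return best_epoch, len(val_losses)

# Synthetic validation losses (not the project's training log).
losses = [2.0, 1.8, 1.6, 1.7, 1.65, 1.9, 1.75, 1.8]
print(early_stopping_trace(losses, patience=3))  # (3, 6)
```

This is why the best epoch and the final epoch differ: the run continues for a few epochs after its best validation result before stopping.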

Model Snippet

import tensorflow as tf
from tensorflow.keras import layers, models

# CNN architecture: ResNet50 backbone pretrained on ImageNet
base_model = tf.keras.applications.ResNet50(
    input_shape=(224, 224, 3),
    include_top=False,   # drop the original 1000-class ImageNet head
    weights='imagenet'   # initialize with weights pretrained on ImageNet
)

# Add the classification head
model = models.Sequential([
    base_model,
    layers.GlobalAveragePooling2D(),  # pools each feature map to one value; used here instead of Flatten
    layers.Dense(128, activation='relu'),
    layers.Dropout(0.2),  # dropout for regularization
    layers.Dense(10, activation='softmax')  # 10 classes
])
# The sparse loss expects integer class labels (0-9), not one-hot vectors
model.compile(optimizer='adam', loss='sparse_categorical_crossentropy', metrics=['accuracy'])
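
As a reference point for reading the loss values, the sketch below computes sparse categorical cross-entropy by hand for a model that guesses uniformly across 10 classes. The helper functions are illustrative, not part of the project code; the loss is simply the negative log of the probability assigned to the true class.

```python
import math

def softmax(logits):
    """Convert raw scores to probabilities (shifted for numerical stability)."""
    shift = max(logits)
    exps = [math.exp(x - shift) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def sparse_categorical_crossentropy(probs, label):
    """Loss for one sample: -log(probability of the integer-encoded true class)."""
    return -math.log(probs[label])

# Equal logits give a uniform prediction over the 10 classes.
uniform = softmax([0.0] * 10)
chance_loss = sparse_categorical_crossentropy(uniform, label=3)
print(round(chance_loss, 4))  # 2.3026, i.e. ln(10)
```

A validation loss below about 2.30 therefore indicates the classifier is doing better than uniform guessing over 10 classes.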
          

Training Summary

  • Non-augmented best: epoch 21, val_acc 0.5700, val_loss 1.3712
  • Augmented best: epoch 11, val_acc 0.5083, val_loss 1.4583
  • Early stopping: 26 epochs (non-augmented), 15 epochs (augmented)

Figure: Training and validation accuracy/loss curves for non-augmented training.
Figure: Training and validation accuracy/loss curves for augmented training.

Conclusion (Part 1)

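Validation accuracy exceeded training accuracy and validation loss stayed below training loss, so the model shows no sign of overfitting. This pattern is common when dropout (and, in the augmented run, augmentation) is applied only during training. Augmentation, added for robustness, lowered training accuracy because of the higher data variability, and its best validation accuracy (0.5083) was below that of the non-augmented run (0.5700). Additional data or longer training could further improve performance.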

Part 2 — Tourism Analytics & Recommendation Engine

Objective

Perform EDA and build recommenders to help tourists discover places of interest and guide tourism marketing.

Data Preparation

  • Merged three datasets into a single DataFrame.
  • Cleaned data and removed irrelevant columns.
  • Translated Indonesian text to English (Google Translate) for analysis.
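
The merge step can be pictured as two key-based joins. The sketch below uses plain dictionaries rather than pandas so it runs without dependencies; with pandas the equivalent would be two `DataFrame.merge` calls on the same keys. The table layout and column names (`user_id`, `place_id`, and so on) and all records are hypothetical, since the project does not list its schema.

```python
# Hypothetical records standing in for the three source tables.
users = [
    {"user_id": 1, "location": "Bekasi, Jawa Barat", "age": 28},
    {"user_id": 2, "location": "Bandung, Jawa Barat", "age": 34},
]
places = [
    {"place_id": 10, "place_name": "Place A", "category": "Amusement Parks", "city": "Jakarta"},
    {"place_id": 11, "place_name": "Place B", "category": "Nature preserves", "city": "Bandung"},
]
ratings = [
    {"user_id": 1, "place_id": 10, "rating": 4},
    {"user_id": 1, "place_id": 11, "rating": 5},
    {"user_id": 2, "place_id": 11, "rating": 3},
]

# Index the lookup tables by their keys.
users_by_id = {u["user_id"]: u for u in users}
places_by_id = {p["place_id"]: p for p in places}

# Inner join: keep a rating only if both its user and its place are known.
merged = [
    {**r, **users_by_id[r["user_id"]], **places_by_id[r["place_id"]]}
    for r in ratings
    if r["user_id"] in users_by_id and r["place_id"] in places_by_id
]
print(len(merged), sorted(merged[0]))
```

Each merged row carries the rating together with the user's attributes and the place's attributes, which is the shape the later analytics and recommenders work from.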

Key Analytics Findings

Insight                  Summary
Top rating age group     Users aged 25–35 provided the most ratings
Top tourist origin       Bekasi, Jawa Barat
Popular cities           Bandung, Jakarta, Yogyakarta City
Top category by visits   Amusement Parks
Highest-rated category   Nature preserves
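
The last two rows answer different questions: one counts visits, the other averages ratings. The sketch below shows how the two aggregations can disagree. The rows are toy data invented for illustration (including the "Culture" category), not the project's dataset.

```python
from collections import Counter, defaultdict
from statistics import mean

# Toy (category, rating) pairs; one row per recorded visit.
visits = [
    ("Amusement Parks", 3), ("Amusement Parks", 4), ("Amusement Parks", 3),
    ("Nature preserves", 5), ("Nature preserves", 4),
    ("Culture", 4),
]

# Most visited category: count rows per category.
visit_counts = Counter(category for category, _ in visits)
top_by_visits = visit_counts.most_common(1)[0][0]

# Highest-rated category: average the ratings per category.
ratings_by_category = defaultdict(list)
for category, rating in visits:
    ratings_by_category[category].append(rating)
mean_rating = {c: mean(r) for c, r in ratings_by_category.items()}
highest_rated = max(mean_rating, key=mean_rating.get)

print(top_by_visits, "|", highest_rated)  # Amusement Parks | Nature preserves
```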

Recommendations Built

  • Cosine similarity: collaborative approach on the user-place rating matrix (low similarity scores, mostly < 0.4).
  • GenAI recommender: generative, contextual recommendations using the combined dataset; it produced category-aware suggestions that often diverged from the cosine-based results.
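
The low similarity scores follow directly from how cosine similarity treats sparse rating rows. The sketch below assumes user-to-user similarity with unrated places counted as zero; the project does not say whether it compared users or places, and the ratings shown are made up.

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity of two sparse rating vectors stored as dicts."""
    # Only places rated by both users contribute to the dot product.
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

# Hypothetical user -> {place_id: rating} rows of the rating matrix.
ratings = {
    "u1": {"p1": 5, "p2": 3, "p3": 4},
    "u2": {"p3": 4, "p4": 5, "p5": 2},
    "u3": {"p6": 4, "p7": 5},
}
print(round(cosine_similarity(ratings["u1"], ratings["u2"]), 3))  # 0.337
print(cosine_similarity(ratings["u1"], ratings["u3"]))            # 0.0
```

Users u1 and u2 agree exactly on the one place they both rated, yet the score is only about 0.34 because every place rated by just one of them enlarges the norms without adding to the dot product.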

Takeaways

  • Cosine similarity struggled due to sparse overlaps in user ratings.
  • Generative AI gave more thematic recommendations.

Conclusion (Part 2)

The analytics uncovered clear demographic and location patterns (top age group, tourist origin, cities, and categories). The two recommenders can be combined into a hybrid: collaborative filtering where users' ratings overlap, and GenAI for contextual, category-based suggestions.
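
One way to picture such a hybrid is a simple router that checks rating overlap before choosing a recommender. Everything here is a hypothetical sketch: the `recommend` function, the `min_overlap` threshold, and the two stub recommenders are illustrative names, not the project's implementation.

```python
def recommend(user_id, ratings, collaborative, generative, min_overlap=2):
    """Route to collaborative filtering only when rating overlap exists.

    `collaborative` and `generative` are callables taking a user id and
    returning a list of suggestions.
    """
    mine = set(ratings.get(user_id, {}))
    # Largest number of places this user has in common with any other user.
    best_overlap = max(
        (len(mine & set(other)) for uid, other in ratings.items() if uid != user_id),
        default=0,
    )
    if best_overlap >= min_overlap:
        return "collaborative", collaborative(user_id)
    return "generative", generative(user_id)

# Hypothetical ratings and stub recommenders for demonstration.
ratings = {
    "u1": {"p1": 5, "p2": 3},
    "u2": {"p1": 4, "p2": 4, "p3": 2},
    "u3": {"p9": 5},
}
collaborative = lambda user_id: ["p3"]
generative = lambda user_id: ["a nature preserve near the user's city"]

print(recommend("u1", ratings, collaborative, generative)[0])  # collaborative
print(recommend("u3", ratings, collaborative, generative)[0])  # generative
```

The threshold is a tuning choice: a higher value sends more users to the generative path, which suits a rating matrix as sparse as the one described.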

Charts & Diagrams

  • Popular Tourist Spots
  • Tourist Spots by City
  • Recommendation: Cosine Similarity
  • Recommendation: GenAI

Tech Stack & Future Improvements

Tech Stack

  • Deep Learning: TensorFlow, Keras, ResNet50
  • Data Analysis: Pandas, NumPy
  • Visualization: Matplotlib, Seaborn
  • Recommendation: Cosine Similarity, GenAI
  • Translation: Google Translator API

Future Improvements

  • Increase and diversify the image dataset; fine-tune deeper layers of ResNet50.
  • Combine collaborative and content-based recommenders into a hybrid system.
  • Deploy a web dashboard for real-time monitoring and recommendations.
  • Collect more explicit user feedback to improve collaborative filtering.