class: top, left, inverse, title-slide

.title[
# Indexing, Searching & Similarity of data using Multimodal Models
]
.author[
### Lampros Sp. Mouselimis
]
.institute[
### Monopteryx
]
.date[
### 2023-01-30
monopteryx-dashboard/
mlampros.github.io/
]

---

# NLP & Deep Learning

<br>

Recent advancements in Natural Language Processing (NLP)

- *static* & *contextualized* word embeddings
- *deep contextual language models*

in combination with image Deep Learning Models

- *transformers*
- *zero-shot transfer*

allow the *indexing*, *categorization* and *similarity* of multimodal datasets with high accuracy compared to previous supervised & unsupervised methods
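
To make the idea of zero-shot transfer concrete, here is a minimal sketch. It assumes a CLIP-style multimodal model that maps images and label texts into one shared embedding space; the slides do not name the model used. The 3-dimensional vectors and the helper names `cosine` and `zero_shot_classify` are hypothetical and only illustrate the mechanics: an image is assigned the label whose text embedding is most similar to it, with no classifier trained.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def zero_shot_classify(image_vec, label_vecs):
    """Return the label whose text embedding is closest to the image embedding."""
    scores = {label: cosine(image_vec, vec) for label, vec in label_vecs.items()}
    best = max(scores, key=scores.get)
    return best, scores


# Hypothetical toy embeddings (assumption): a real model would produce
# vectors with hundreds of dimensions from the label text and the image.
label_vecs = {
    "Airport": [0.9, 0.1, 0.0],
    "Beach": [0.1, 0.9, 0.1],
    "Forest": [0.0, 0.2, 0.9],
}
image_vec = [0.2, 0.8, 0.1]

predicted, scores = zero_shot_classify(image_vec, label_vecs)
print(predicted)  # prints "Beach"
```

Because the labels enter only as text, adding a category means adding one more text embedding rather than retraining a model.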
---

# Use Cases

<br>

We verify the *zero-shot-transfer* results using the following earth observation datasets,

- [AID](https://captain-whu.github.io/AID/) (Aerial Scene Classification)
- [UC Merced Land Use](http://weegee.vision.ucmerced.edu/datasets/landuse.html) (Land Use Imagery)

The same approach can be used in other scientific areas as well, for instance

- Fault detection
- Biomedical image categorization

where pairs of (text, image) are available

---
class:hide_logo

# AID Dataset

The [AID](https://captain-whu.github.io/AID/) Dataset consists of the following *30 aerial scene types*,

.pull-left[

- Airport
- Bare_Land
- Baseball_Field
- Beach
- Bridge
- Center
- Church
- Commercial
- Dense_Residential
- Desert
- Farmland
- Forest
- Industrial
- Meadow
- Medium_Residential

]

.pull-right[

- Mountain
- Park
- Parking
- Playground
- Pond
- Port
- Railway_Station
- Resort
- River
- School
- Sparse_Residential
- Square
- Stadium
- Storage_Tanks
- Viaduct

]

<img src="images/AID_sample.png" width="80%" style="display: block; margin: auto auto auto 0;" />

---
class:hide_logo

# AID zero-shot-transfer

Each scene type of the AID data has the following number of images,

<img src="images/number_images_per_category_AID.png" width="105%" style="display: block; margin: auto auto auto 0;" />

The next *confusion matrix* shows the zero-shot-transfer results for the *10000* images (of all 30 categories), which gives an overall accuracy of 74.3 %

<img src="images/confusion_matrix_AID.png" width="100%" style="display: block; margin: auto auto auto 0;" />

---
class:hide_logo

# AID Random Forest Classifier

We can compare the zero-shot-transfer with a *Random Forest Classifier*, which takes labels & image representations as input using default parameters and 5-fold cross-validation (CV).
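
The Random Forest baseline described above can be sketched as follows. This is an illustration using scikit-learn, not the code behind the reported numbers: the slides do not state which library was used, and the feature matrix `X` here is synthetic data standing in for the image representations (an assumption).

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, confusion_matrix
from sklearn.model_selection import cross_val_predict

# Synthetic stand-in for image representations (assumption): 3 classes,
# 60 samples each, 16-dimensional vectors clustered around a class centre.
rng = np.random.default_rng(0)
n_classes, n_per_class, dim = 3, 60, 16
centres = rng.normal(size=(n_classes, dim)) * 3.0
X = np.vstack([c + rng.normal(size=(n_per_class, dim)) for c in centres])
y = np.repeat(np.arange(n_classes), n_per_class)

# Random Forest with default parameters (random_state only fixes the seed).
clf = RandomForestClassifier(random_state=0)

# Out-of-fold predictions: each sample is predicted by a model that was
# trained on the other four folds of the 5-fold split.
y_pred = cross_val_predict(clf, X, y, cv=5)

cv_accuracy = accuracy_score(y, y_pred)
cm = confusion_matrix(y, y_pred)  # rows: true class, columns: predicted class
print(f"overall CV accuracy: {cv_accuracy:.3f}")
print(cm)
```

The diagonal of the confusion matrix holds the correctly classified samples, so the overall CV accuracy equals the diagonal sum divided by the total number of samples.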
The overall CV-accuracy is 96.9 % as the following confusion matrix shows,

<img src="images/confusion_matrix_AID_rf.png" width="105%" style="display: block; margin: auto auto auto 0;" />

---

# AID comparison results

<br>

The confusion matrices of the zero-shot-transfer & Random Forest Classifier show that we can get almost the same accuracy for *specific classes*

- Airport
- Beach
- Desert
- Farmland
- Parking
- Pond
- Port
- Viaduct

without even training a classifier, which means a *pre-trained multimodal Model*, <br />
adjusted (*fine-tuned*) on a specific scientific area might give similar results <br />
as a classifier.

---
class:hide_logo

# UCMerced Dataset

The [UC Merced Land Use](http://weegee.vision.ucmerced.edu/datasets/landuse.html) dataset includes *21 categories* and, in contrast to the AID dataset, has *in total 2100 images* where *each class includes 100 images*,

.pull-left[

- agricultural
- airplane
- baseball_diamond
- beach
- buildings
- chaparral
- dense_residential
- forest
- freeway
- golf_course

]

.pull-right[

- harbor
- intersection
- medium_residential
- mobile_homepark
- overpass
- parking_lot
- river
- runway
- sparse_residential
- storage_tanks
- tennis_court

]

<img src="images/ucmerced_sample.png" width="60%" style="display: block; margin: auto auto auto 0;" />

---
class:hide_logo

# UCMerced zero-shot-transfer

The *confusion matrix* shows the zero-shot-transfer results for the *2100* images (of all 21 categories), which gives an overall accuracy of 82.2 %

<img src="images/confusion_matrix_UC.png" width="100%" style="display: block; margin: auto auto auto 0;" />

---
class:hide_logo

# UCMerced Random Forest

The *Random Forest Classifier* is used again (as for the AID dataset) for comparison purposes.
The overall CV-accuracy is 96.9 % as the confusion matrix confirms,

<img src="images/confusion_matrix_UC_rf.png" width="105%" style="display: block; margin: auto auto auto 0;" />

The accuracy of specific categories (such as airplane, baseball diamond or beach) is approximately the same for the two approaches.

---
class:hide_logo

# Text & Image Similarity

Zero-shot-transfer can also be used for text & document similarity. Once a database is created, it can be incrementally updated so that a user can search, categorize and find similar text or images, as the following diagram shows,

<img src="images/similarity_diagram.png" width="70%" style="display: block; margin: auto auto auto 0;" />

The database can scale to millions of texts & images and return results within seconds using approximate k-nearest-neighbor algorithms.

---

# Pricing

<br>

Feel free to reach out and request a quote by filling out the [Inquiry Form](https://monopteryx.netlify.app/contact/) <br />
at the following link: [https://monopteryx.netlify.app/contact/](https://monopteryx.netlify.app/contact/)