Indexing, Searching & Similarity of data using Multimodal Models

class: top, left, inverse, title-slide

.title[
# <b>Indexing, Searching & Similarity of data using Multimodal Models</b>
]
.author[
### <img src="images/initial_diagram_cropped.png" width="385" height="245"><br><b>Lampros Sp. Mouselimis</b>
]
.institute[
### <b>Monopteryx</b>
]
.date[
### 2023-01-30<br><svg aria-hidden="true" role="img" viewBox="0 0 640 512" style="height:1em;width:1.25em;vertical-align:-0.125em;margin-left:auto;margin-right:auto;font-size:inherit;fill:currentColor;overflow:visible;position:relative;"><path d="M172.5 131.1C228.1 75.51 320.5 75.51 376.1 131.1C426.1 181.1 433.5 260.8 392.4 318.3L391.3 319.9C381 334.2 361 337.6 346.7 327.3C332.3 317 328.9 297 339.2 282.7L340.3 281.1C363.2 249 359.6 205.1 331.7 177.2C300.3 145.8 249.2 145.8 217.7 177.2L105.5 289.5C73.99 320.1 73.99 372 105.5 403.5C133.3 431.4 177.3 435 209.3 412.1L210.9 410.1C225.3 400.7 245.3 404 255.5 418.4C265.8 432.8 262.5 452.8 248.1 463.1L246.5 464.2C188.1 505.3 110.2 498.7 60.21 448.8C3.741 392.3 3.741 300.7 60.21 244.3L172.5 131.1zM467.5 380C411 436.5 319.5 436.5 263 380C213 330 206.5 251.2 247.6 193.7L248.7 192.1C258.1 177.8 278.1 174.4 293.3 184.7C307.7 194.1 311.1 214.1 300.8 229.3L299.7 230.9C276.8 262.1 280.4 306.9 308.3 334.8C339.7 366.2 390.8 366.2 422.3 334.8L534.5 222.5C566 191 566 139.1 534.5 108.5C506.7 80.63 462.7 76.99 430.7 99.9L429.1 101C414.7 111.3 394.7 107.1 384.5 93.58C374.2 79.2 377.5 59.21 391.9 48.94L393.5 47.82C451 6.731 529.8 13.25 579.8 63.24C636.3 119.7 636.3 211.3 579.8 267.7L467.5 380z"/></svg> <a href="https://monopteryx.netlify.app/portfolio/">monopteryx-dashboard/</a><br><svg aria-hidden="true" role="img" viewBox="0 0 640 512" style="height:1em;width:1.25em;vertical-align:-0.125em;margin-left:auto;margin-right:auto;font-size:inherit;fill:currentColor;overflow:visible;position:relative;"><path d="M172.5 131.1C228.1 75.51 320.5 75.51 376.1 131.1C426.1 181.1 433.5 260.8 392.4 318.3L391.3 319.9C381 334.2 361 337.6 346.7 327.3C332.3 317 328.9 297 339.2 282.7L340.3 281.1C363.2 249 359.6 205.1 331.7 177.2C300.3 145.8 249.2 145.8 217.7 177.2L105.5 289.5C73.99 320.1 73.99 372 105.5 403.5C133.3 431.4 177.3 435 209.3 412.1L210.9 410.1C225.3 400.7 245.3 404 255.5 418.4C265.8 432.8 262.5 452.8 248.1 463.1L246.5 464.2C188.1 505.3 110.2 498.7 
60.21 448.8C3.741 392.3 3.741 300.7 60.21 244.3L172.5 131.1zM467.5 380C411 436.5 319.5 436.5 263 380C213 330 206.5 251.2 247.6 193.7L248.7 192.1C258.1 177.8 278.1 174.4 293.3 184.7C307.7 194.1 311.1 214.1 300.8 229.3L299.7 230.9C276.8 262.1 280.4 306.9 308.3 334.8C339.7 366.2 390.8 366.2 422.3 334.8L534.5 222.5C566 191 566 139.1 534.5 108.5C506.7 80.63 462.7 76.99 430.7 99.9L429.1 101C414.7 111.3 394.7 107.1 384.5 93.58C374.2 79.2 377.5 59.21 391.9 48.94L393.5 47.82C451 6.731 529.8 13.25 579.8 63.24C636.3 119.7 636.3 211.3 579.8 267.7L467.5 380z"/></svg> <a href="http://mlampros.github.io/">mlampros.github.io/</a>
]

---

# NLP & Deep Learning

<br>

Recent advancements in Natural Language Processing (NLP)

- *static* & *contextualized* word embeddings
- *deep contextual language models*

in combination with image Deep Learning Models

- *transformers*
- *zero-shot transfer*

allow the *indexing*, *categorization* and *similarity* search of multimodal datasets with high accuracy compared to previous supervised & unsupervised methods

<div>
<style type="text/css">.xaringan-extra-logo {
width: 110px;
height: 128px;
z-index: 0;
background-image: url(images/monopteryx.png);
background-size: contain;
background-repeat: no-repeat;
position: absolute;
top:26em;right:1em;
}
</style>
<script>(function () {
  let tries = 0
  function addLogo () {
    if (typeof slideshow === 'undefined') {
      tries += 1
      if (tries < 10) {
        setTimeout(addLogo, 100)
      }
    } else {
      document.querySelectorAll('.remark-slide-content:not(.title-slide):not(.inverse):not(.hide_logo)')
        .forEach(function (slide) {
          const logo = document.createElement('div')
          logo.classList = 'xaringan-extra-logo'
          logo.href = null
          slide.appendChild(logo)
        })
    }
  }
  document.addEventListener('DOMContentLoaded', addLogo)
})()</script>
</div>

---

# Use Cases

<br>

We verify the *zero-shot-transfer* results using the following Earth observation datasets,

- [AID](https://captain-whu.github.io/AID/) (Aerial Scene Classification)
- [UC Merced Land Use](http://weegee.vision.ucmerced.edu/datasets/landuse.html) (Land Use Imagery)

The same approach can be used in other scientific areas as well, for instance

- Fault detection
- Biomedical image categorization

where pairs of (text, image) are available

---
class:hide_logo

# AID Dataset

The [AID](https://captain-whu.github.io/AID/) Dataset consists of the following *30 aerial scene types*,

.pull-left[
- Airport 
- Bare_Land 
- Baseball_Field 
- Beach 
- Bridge 
- Center 
- Church 
- Commercial 
- Dense_Residential 
- Desert 
- Farmland 
- Forest 
- Industrial 
- Meadow 
- Medium_Residential 
]

.pull-right[
- Mountain 
- Park 
- Parking 
- Playground 
- Pond 
- Port 
- Railway_Station 
- Resort 
- River 
- School 
- Sparse_Residential 
- Square 
- Stadium 
- Storage_Tanks 
- Viaduct
]

---
class:hide_logo

# AID zero-shot-transfer

Each type of the AID data has the following number of images,

The following *confusion matrix* shows the zero-shot-transfer results for the *10000* images (across all 30 categories), with an overall accuracy of 74.3 %
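
As a minimal sketch of how such a zero-shot result can be scored: assume a multimodal model has already produced one embedding per image and one per class label (the 3-dimensional vectors below are made up for illustration, and the helper names are hypothetical). Each image is assigned the label with the highest cosine similarity, and the overall accuracy is the diagonal share of the confusion matrix.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def zero_shot_predict(image_vec, label_vecs):
    """Return the label whose text embedding is closest to the image embedding."""
    return max(label_vecs, key=lambda label: cosine(image_vec, label_vecs[label]))

def confusion_matrix(true_labels, predicted_labels, classes):
    """Rows are true classes, columns are predicted classes."""
    index = {c: i for i, c in enumerate(classes)}
    matrix = [[0] * len(classes) for _ in classes]
    for t, p in zip(true_labels, predicted_labels):
        matrix[index[t]][index[p]] += 1
    return matrix

def overall_accuracy(matrix):
    """Share of images that fall on the diagonal of the confusion matrix."""
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    total = sum(sum(row) for row in matrix)
    return correct / total

# Toy embeddings (assumption): a real model would supply these vectors.
label_vecs = {
    "Beach": [1.0, 0.1, 0.0],
    "Forest": [0.0, 1.0, 0.1],
    "Desert": [0.1, 0.0, 1.0],
}
images = [
    ("Beach", [0.9, 0.2, 0.1]),
    ("Forest", [0.1, 0.8, 0.2]),
    ("Desert", [0.7, 0.1, 0.6]),  # ambiguous image, ends up closer to "Beach"
    ("Desert", [0.0, 0.1, 0.9]),
]

classes = list(label_vecs)
true_labels = [label for label, _ in images]
predicted = [zero_shot_predict(vec, label_vecs) for _, vec in images]
matrix = confusion_matrix(true_labels, predicted, classes)
accuracy = overall_accuracy(matrix)
print(matrix, accuracy)
```

No classifier is trained at any point here; the only inputs are the embeddings and the class names.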

---
class:hide_logo

# AID Random Forest Classifier

We can compare the zero-shot-transfer results with a *Random Forest Classifier* trained on the labels & image representations, using default parameters and 5-fold cross-validation (CV). The overall CV-accuracy is 96.9 %, as the following confusion matrix shows,

---

# AID comparison results

<br>

The confusion matrices of the zero-shot-transfer & Random Forest Classifier show that we can get almost the same accuracy for *specific classes*

- Airport
- Beach
- Desert
- Farmland
- Parking
- Pond
- Port
- Viaduct

without even training a classifier, which means that a *pre-trained multimodal model*, <br /> adjusted (*fine-tuned*) to a specific scientific area, might give results similar <br /> to those of a classifier.
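
A per-class comparison of this kind can be read directly off two confusion matrices. The sketch below assumes rows are true classes and columns are predictions. The counts are invented for illustration and are not the AID results, and the 0.05 tolerance is an arbitrary choice.

```python
def per_class_accuracy(matrix, classes):
    """Diagonal count divided by the row total (rows = true classes)."""
    return {c: matrix[i][i] / sum(matrix[i]) for i, c in enumerate(classes)}

def comparable_classes(acc_a, acc_b, tolerance=0.05):
    """Classes whose two accuracies differ by at most `tolerance`."""
    return [c for c in acc_a if abs(acc_a[c] - acc_b[c]) <= tolerance]

classes = ["Beach", "Desert", "School"]

# Hypothetical counts, 100 images per class (not taken from the slides).
zero_shot = [[96, 3, 1], [4, 95, 1], [20, 30, 50]]
random_forest = [[98, 1, 1], [2, 97, 1], [2, 3, 95]]

zs = per_class_accuracy(zero_shot, classes)
rf = per_class_accuracy(random_forest, classes)
similar = comparable_classes(zs, rf)
print(similar)
```

In this toy data "Beach" and "Desert" are within tolerance, while "School" is recognised far less often by the zero-shot approach.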

---
class:hide_logo

# UCMerced Dataset

The [UC Merced Land Use](http://weegee.vision.ucmerced.edu/datasets/landuse.html) dataset includes *21 categories*. Compared to the AID dataset it is smaller, with *2100 images in total* and *100 images per class*,

.pull-left[
- agricultural 
- airplane 
- baseball_diamond 
- beach 
- buildings 
- chaparral 
- dense_residential 
- forest 
- freeway 
- golf_course 
]

.pull-right[
- harbor 
- intersection 
- medium_residential 
- mobile_homepark 
- overpass 
- parking_lot 
- river 
- runway 
- sparse_residential 
- storage_tanks 
- tennis_court
]

---
class:hide_logo

# UCMerced zero-shot-transfer

The *confusion matrix* shows the zero-shot-transfer results for the *2100* images (across all 21 categories), with an overall accuracy of 82.2 %

---
class:hide_logo

# UCMerced Random Forest

The *Random Forest Classifier* is used again for comparison purposes. The overall CV-accuracy is 96.9 %, as the confusion matrix confirms,

The accuracy of specific categories (such as airplane, baseball diamond or beach) is approximately the same for the two approaches.

---
class:hide_logo

# Text & Image Similarity

Zero-shot-transfer can also be used for text & document similarity. Once a database is created, it can be incrementally updated so that a user can search, categorize and find similar texts or images, as the following diagram shows,

The database can scale to millions of texts & images and return results within seconds using approximate k-nearest-neighbor algorithms.
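
The search step can be sketched as follows, assuming texts and images have already been embedded into the same vector space. The class name `EmbeddingIndex`, the item identifiers and the vectors are all hypothetical. This sketch uses an exact brute-force scan, not an approximate algorithm; at the scale of millions of items an approximate k-nearest-neighbor index would replace the scan.

```python
import heapq
import math

class EmbeddingIndex:
    """Tiny in-memory index: exact brute-force search, not approximate."""

    def __init__(self):
        self._ids = []
        self._vectors = []

    def add(self, item_id, vector):
        """Store a unit-normalised copy so search reduces to a dot product."""
        norm = math.sqrt(sum(x * x for x in vector))
        self._ids.append(item_id)
        self._vectors.append([x / norm for x in vector])

    def search(self, query, k=3):
        """Return the k most similar (item_id, cosine similarity) pairs."""
        norm = math.sqrt(sum(x * x for x in query))
        q = [x / norm for x in query]
        scored = (
            (sum(a * b for a, b in zip(q, v)), item_id)
            for item_id, v in zip(self._ids, self._vectors)
        )
        top = heapq.nlargest(k, scored)
        return [(item_id, score) for score, item_id in top]

index = EmbeddingIndex()
index.add("harbor.png", [0.9, 0.1, 0.0])
index.add("text: a photo of a harbor", [0.8, 0.2, 0.1])
index.add("forest.png", [0.0, 0.9, 0.3])

# Incremental update: new items are appended without rebuilding the index.
index.add("port.png", [0.85, 0.15, 0.05])

results = index.search([1.0, 0.1, 0.0], k=2)
print(results)
```

Because images and texts share one embedding space, the same `search` call serves text-to-image, image-to-image and text-to-text queries.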

---

# Pricing

<br>

Feel free to reach out and request a quote by filling out the [Inquiry Form](https://monopteryx.netlify.app/contact/) <br /> at the following link:
[https://monopteryx.netlify.app/contact/](https://monopteryx.netlify.app/contact/)