BRIAN MUHIA


[Μ] PERSONAL THOUGHTS ABOUT THIS WEIRD GENERATIVE PROCESS WE ALL SEEM TO BE IN.


ALIGNMENT JAM #2

On November 11-13, I was in a hackathon! I learned so much that weekend,
including how to speedrun a tutorial and how to quickly execute on a research
question. I'll be more keen to form or join a team next time, which should help
get more done. Here are the links to my results:

- https://poppingtonic.itch.io/ob

- https://github.com/poppingtonic/transformer-visualization



Posted 2 years ago


VISUALISING MULTI-SENSOR PREDICTIONS FROM A RICE DISEASE CLASSIFIER


CROSS-POSTED FROM https://zindi.africa/discussions/14258






INTRODUCTION


The Microsoft Rice Disease Classification Challenge introduced a dataset
comprising RGB and RGNIR (Red-Green-Near-InfraRed) images. This second image
type increased the difficulty of the challenge such that all of the winning
models worked with RGB only. In this challenge we applied a res2next50 encoder,
first pre-trained with self-supervised learning through the SwAV algorithm, to
represent each RGB image and its corresponding RGNIR image with the same
weights. The encoder was then fine-tuned and self-distilled to classify the
images, which produced a public test-set score of 0.228678639 and a private
score of 0.183386940. K-fold cross-validation was not used for this challenge
result. To better understand the impact of self-supervised pre-training on the
problem of classifying each image type, we apply t-distributed Stochastic
Neighbour Embedding (t-SNE) to the logits (the predictions before applying
softmax). We show how this method graphically provides some of the value of a
confusion matrix, by locating some incorrect predictions. We then render the
visualisation by overlaying the raw image on each data point, and note that, to
this model, the RGNIR images do not appear to be inherently more difficult to
categorise. We make no comparisons through sweeps, RGB-only models or
RGNIR-only models; this is left to future work.


GOAL OF THIS REPORT

This report explains a simple method for visualising the distribution of raw
predictions from a vision classifier on a random sample of data in the
validation set.

We do this to, at a glance:

 1. explain the model in ways that can help us improve it, and

 2. understand the data itself, asking whether the model struggled to classify
    RGNIR images more than RGB images.





DATA

Combining data from multiple sensors seems to be a good way to increase the
number of training set examples, which has a known positive effect on train/test
performance, among other measures of generalisation. Additional sensors are
often deployed to capture different features from the baseline sensors, which
may help to resolve their deficiencies. Less well studied is whether, when the
additional sensor(s) add noise or demand more representational capacity from
the model, this reduces the model's ability to perform the task even on the
baseline sensor data.


METHODS & ANALYSIS

This work is an example of post-hoc interpretability, which addresses the
black-box nature of our models: we either do not have access to their internal
representations, or we ignore the structure of the model whose behaviour we are
trying to explain. This means that we only use the raw predictions and labels
(0.0 = blast, 1.0 = brown, 2.0 = healthy) for each data point, ignoring the
model's layer structure, learned features, dimensionality, weights and biases.
This lets us use general methods for clustering data, such as t-SNE. To produce
a 2D plot, we initialise t-SNE with PCA to reduce the logits to 2 components,
and set perplexity=50. Note the overlaps, i.e. the presence of false positives
in each class, indicating the need for k-fold cross-validation.
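
As a concrete illustration, here is a minimal sketch of this step, assuming the
validation logits and labels have been exported as NumPy arrays (the file names
and plot labels below are placeholders, not the exact notebook code):

    # Minimal sketch: t-SNE on classifier logits with PCA initialisation.
    import numpy as np
    import matplotlib.pyplot as plt
    from sklearn.manifold import TSNE

    logits = np.load("val_logits.npy")   # raw predictions, one row per image
    labels = np.load("val_labels.npy")   # 0.0 = blast, 1.0 = brown, 2.0 = healthy

    tsne = TSNE(n_components=2, init="pca", perplexity=50, random_state=42)
    embedding = tsne.fit_transform(logits)  # (N, 2) map of the logits

    # Colour each point by its ground-truth label to expose overlapping regions.
    for cls, name in [(0, "blast"), (1, "brown"), (2, "healthy")]:
        mask = labels == cls
        plt.scatter(embedding[mask, 0], embedding[mask, 1], s=8, label=name)
    plt.legend()
    plt.title("t-SNE of validation logits (perplexity=50, PCA init)")
    plt.savefig("tsne_logits.png", dpi=150)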


To show the effect that the image type had on classification, we overlay each
data point with the raw image it represents. This follows related work by
Karpathy and Iwana et al., which uses this methodology to produce informative
visualisations with some explanatory value, although in this case the effect is
more salient due to the two image types. We can see where the RGNIR images tend
to cluster in relation to their location in the global cluster regions in the
chart above. Note the density of RGNIR images in the “tip” of the “blast”
cluster (the blue region in the first plot) and in the bottom middle,
indicating that while some RGNIR images were easy to correctly classify as
“blast”, others were more easily confused with “brown” than with “healthy”.
Qualitatively, there appear to be more false-positive RGNIR images than not,
which might indicate higher uncertainty or noise in the predictions due to
conflicting sensor data. This might be an artefact of the data augmentation
methods used to train SwAV and the classifier. The heavier region overlap near
the centre of the plot, together with the presence of both image types there,
indicates some confusion in the classification task.
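
A minimal sketch of this overlay step, assuming the "embedding" array from the
t-SNE sketch above and an aligned list "image_paths" of image file paths (both
names are placeholders):

    # Render each t-SNE point as a small thumbnail of the image it represents.
    import numpy as np
    import matplotlib.pyplot as plt
    from matplotlib.offsetbox import OffsetImage, AnnotationBbox
    from PIL import Image

    fig, ax = plt.subplots(figsize=(16, 16))
    ax.scatter(embedding[:, 0], embedding[:, 1], s=0)  # sets the axis limits only

    for (x, y), path in zip(embedding, image_paths):
        thumb = np.asarray(Image.open(path).convert("RGB").resize((32, 32)))
        ab = AnnotationBbox(OffsetImage(thumb, zoom=1.0), (x, y), frameon=False)
        ax.add_artist(ab)

    ax.set_title("t-SNE of logits, each point rendered as its raw image")
    fig.savefig("tsne_image_overlay.png", dpi=150)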


There are many reasons not to put much weight on the analysis above. t-SNE is
valuable only after multiple runs have been observed. We might also want to
include comparisons with weights from different epochs, early in training. More
generally, statistical grounding improves the quality of interpretability
methods. In conclusion, the separation could be improved by applying readily
available methods, and there is no a priori reason to expect the pre-training
strategy to contribute to better separation of classes. It helps with
representing the images more fairly, but not decisively for the classification
problem. All of this work can be reproduced with the notebooks available here.
The repository also has links to model weights: Rice Disease Classification
through Self-Supervised Pre-training.



CONCLUSION

We show that, when correctly applied, t-SNE (or potentially other
dimensionality reduction methods) can produce plots that help us understand
which of our training strategies could be changed to improve the model’s
test-set scores. In this case, we identify cross-validation as a potential
intervention. We also learn more about our data using a method that is
reproducible and reusable in other domains.


ACKNOWLEDGMENTS

Kerem Turgutlu for self-supervised:
https://keremturgutlu.github.io/self_supervised

Zachary Mueller: https://walkwithfastai.com

Jeremy Howard: https://fast.ai

Daniel Gitu, Ben Mainye and Alfred Ongere for helping proofread a draft of this
document.

Open Philanthropy for funding part of this work.



APPENDIX


1: SELF-DISTILLATION

When training a classifier, we eventually find predictions that are made with
high confidence. Naively applied, self-distillation in this case meant
assigning predicted labels to high-confidence test-set examples. We collect
these new labels and create a new “train.csv”, which is used to fine-tune the
best checkpoint on the dataset updated to include the resampled test-set
examples with their predicted labels. The final private test-set predictions
were produced after 2 rounds of self-distillation.
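
A minimal sketch of one such round, assuming a fastai-style learner ("learn"),
a test dataloader ("test_dl"), a list of test image ids ("test_items"), a 0.95
confidence threshold and the column names shown; all of these are placeholders
rather than the exact setup used for the submission:

    # One round of pseudo-labelling: keep confident test predictions as new labels.
    import pandas as pd
    import torch

    probs, _ = learn.get_preds(dl=test_dl)   # per-image class probabilities
    conf, pred = probs.max(dim=1)            # confidence and predicted class
    keep = conf > 0.95                       # keep only high-confidence predictions

    pseudo = pd.DataFrame({
        "image_id": [test_items[i] for i in torch.where(keep)[0].tolist()],
        "label": pred[keep].tolist(),
    })

    # Append the pseudo-labelled test examples to the original training labels
    # and write the new "train.csv" used to fine-tune the best checkpoint.
    train = pd.read_csv("train.csv")
    pd.concat([train, pseudo], ignore_index=True).to_csv("train_round2.csv", index=False)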


2: T-SNE

t-SNE is a dimensionality reduction method useful for producing beautiful
visualisations of high-dimensional data. It gives each high-dimensional data
point a location on a 2D or 3D map; this relies on the parameter n_components,
which we set to 2 for a 2-dimensional map. t-SNE is a non-linear, adaptive
transformation that places each data point based on a balance between its
neighbours (local information) and the whole sampled dataset (global
information). This balance is controlled by the ‘perplexity’ hyperparameter,
which we set to 50 in the presented plot after sampling values below that
(2, 10, 30) and above it (100) and observing the different plots that are
generated.
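
A minimal sketch of that perplexity sweep, reusing the "logits" and "labels"
arrays from the sketch in the Methods section above:

    # Embed the same logits at several perplexities and compare side by side.
    import matplotlib.pyplot as plt
    from sklearn.manifold import TSNE

    perplexities = [2, 10, 30, 50, 100]
    fig, axes = plt.subplots(1, len(perplexities), figsize=(4 * len(perplexities), 4))

    for ax, p in zip(axes, perplexities):
        emb = TSNE(n_components=2, init="pca", perplexity=p,
                   random_state=42).fit_transform(logits)
        ax.scatter(emb[:, 0], emb[:, 1], c=labels, s=6, cmap="viridis")
        ax.set_title(f"perplexity={p}")

    fig.savefig("tsne_perplexity_sweep.png", dpi=150)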






Posted 2 years ago


BOO "PAPERCLIP MAXIMIZERS" AS A TERM

This is an analogy used in informal arguments related to AI's potential for
catastrophic risk. The value of the analogy in this name was, in my view, that
it pointed out the idea of "a random outcome that nobody asked for". Paperclips
are what you'd call a niche interest for humans nearly everywhere in the past
or future. So an incredibly powerful computer that somehow managed to maximize
the number of paperclips on earth over everything else, against the wishes of
its controllers, would produce a random outcome that nobody asked for,
especially for those who don't care one bit about paperclips.



Posted 2 years ago


DIFFICULT VISION CHALLENGES: UCHIDA LAB'S BOOK DATASET

As someone who wants to interpret and explain decisions from deep learning
models, I like to highlight difficult datasets as a subject for study. My
current challenge is Iwana et al.’s “Judging a Book By its Cover”, which
introduced a 200k+ image multi-feature dataset of book covers from Amazon. The
paper posed a genre classification challenge, and the original tasks are very
challenging for a convolutional neural network to tackle. My own attempts with
a resnet-50 fine-tuned from ImageNet only slightly beat the published top
result, with 0.306 top-1 accuracy. The training procedure was SGD with warm
restarts, discriminative fine-tuning, a cyclical learning-rate decay schedule,
progressive image resizing, and data augmentation at test time.
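
A rough sketch of a comparable recipe using today's fastai API; the paths,
batch size, image sizes and epoch counts are placeholders, and fastai's
fine_tune uses the one-cycle policy with Adam rather than the exact
SGD-with-warm-restarts schedule used in the original run:

    # Discriminative fine-tuning, cyclical LR, progressive resizing and TTA.
    from fastai.vision.all import *

    def dls_at(size):
        return ImageDataLoaders.from_csv(
            "book-covers", "labels.csv",
            item_tfms=Resize(size), batch_tfms=aug_transforms(), bs=64)

    learn = vision_learner(dls_at(224), resnet50, metrics=accuracy)
    learn.fine_tune(5, base_lr=3e-3)      # frozen warm-up, then discriminative LRs

    # Progressive resizing: continue training at a larger image size.
    learn.dls = dls_at(299)
    learn.fit_one_cycle(3, lr_max=slice(1e-5, 1e-3))

    # Data augmentation at test time.
    preds, targs = learn.tta()
    print(accuracy(preds, targs))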

This is a t-SNE visualisation of my model’s test set performance on a sample of
the dataset:

[Figure: t-SNE visualisation of test-set predictions on a sample of book covers]

The spikes represent 30% of the images, with 70% in the centroid,
underrepresented for their label. t-SNE ran with perplexity=15.


The visualisation suggests that relying on the convolutional inductive bias
only works for a small number of naturalistic covers represented in the dataset
(the spikes in the image), but fails to find any genre-unique similarity across
most varieties of plain human-designed text, fonts and graphic design. It might
also be due to different forms of imbalance in the dataset. This visualisation
and theory are worth more exploration. Testing multimodal models with a
Text+Vision inductive bias on this dataset might shed some light on this, for
example by evaluating and visualising contrastive language-image pretraining
(CLIP) in inference mode. The CLIP paper claims that it can do OCR. Can it
classify the recognised text as well? Here we would evaluate CLIP’s zero-shot
performance on the task of classifying the text in the book cover image by
genre. The task would be: each image would have a question asking "Is this a
picture of a <genre>?"
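
A minimal sketch of that zero-shot evaluation with OpenAI's CLIP package
(github.com/openai/CLIP); the genre subset, prompt wording applied per genre,
and the image path are placeholders:

    # Zero-shot genre classification of a single book cover with CLIP.
    import clip
    import torch
    from PIL import Image

    device = "cuda" if torch.cuda.is_available() else "cpu"
    model, preprocess = clip.load("ViT-B/32", device=device)

    genres = ["Romance", "Science Fiction & Fantasy", "Cookbooks, Food & Wine"]
    prompts = clip.tokenize([f"Is this a picture of a {g} book?" for g in genres]).to(device)
    image = preprocess(Image.open("cover.jpg")).unsqueeze(0).to(device)

    with torch.no_grad():
        logits_per_image, _ = model(image, prompts)  # image-text similarity scores
        probs = logits_per_image.softmax(dim=-1)

    print(genres[probs.argmax().item()])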

There's an interpretability project here, which would be to visualise multimodal
embeddings activated by this task and use that to explain why it works better,
if it does. This would be one entry point for work on visualising and explaining
large language models because it often feels like visualisation is simpler with
multimodal tasks.







Posted 2 years ago


MACHINE LEARNING UPDATES AND LINKS (MAY 2019)

1. I recently taught AI Saturdays Nairobi about DEViSE (Deep Visual-Semantic
Embedding) methods, which can be useful in visual image search, dataset
curation, semantic image search, and [possibly] blocking movie spoilers you'd
rather not see...? The notebook is available here: devise-food101-v2.ipynb.

2. In April, I participated in the inaugural AI4D (AI for Development) network
of excellence in Artificial Intelligence for Sub-Saharan Africa. See updates
from the event here: AI4D-SSA.

3. Sign up to participate in the Omdena AI Challenge! See the details here:
Omdena AI Challenge.

4. Nairobi Women in Machine Learning and Data Science is holding an event in
June to encourage people to contribute to critical ML infrastructure. This time
it's scikit-learn: Scikit-Learn Sprint (contribute to open source).



Posted 5 years ago



BRIAN MUHIA

I love to make, learn and teach. I program computers, read, write, and think
about science, technology, the future of life, and how to live healthily.

Browse the Archive »


TAGS

 * crowdsourcing 1
 * open source 1
 * dataset 1
 * image captioning 1
 * machine translation 1
 * See all 8 tags »