mexicanamerican/segment-anything-with-clip

 
 

Segment Anything with Clip

[HuggingFace Space] | [COLAB] | [Demo Video]

Meta released the Segment Anything Model (SAM), a new foundation model for segmentation tasks. It aims to solve downstream segmentation tasks through prompt engineering, using prompts such as foreground/background points, bounding boxes, masks, and free-form text. However, the text-prompt capability has not been released yet.

As a workaround, I took the following steps:

  1. Get all object proposals generated by SAM (Segment Anything Model).
  2. Crop the object regions by bounding boxes.
  3. Get cropped images' features and a query feature from CLIP.
  4. Calculate the similarity between image features and the query feature.
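# How to get the similarity (model and preprocess come from clip.load).
import clip
import torch

preprocessed_img = preprocess(crop).unsqueeze(0)  # add a batch dimension
tokens = clip.tokenize(texts)  # texts: list of query strings
with torch.no_grad():
    logits_per_image, _ = model(preprocessed_img, tokens)
# Softmax over the text queries: one probability per query for this crop.
similarity = logits_per_image.softmax(-1)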
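
The glue around steps 2 and 4 can be sketched in plain Python. This is a minimal sketch, not the repository's code: the function names, the `threshold` value, and the example logits are all hypothetical. It assumes the proposals carry `bbox` in (x, y, width, height) order, as SAM's automatic mask generator reports them, and that crops are taken with a (left, upper, right, lower) box, as PIL's `Image.crop` expects.

```python
import math

def xywh_to_crop_box(bbox):
    """Convert an (x, y, width, height) box to (left, upper, right, lower)."""
    x, y, w, h = bbox
    return (int(x), int(y), int(x + w), int(y + h))

def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

def select_proposals(logits_per_crop, query_index=0, threshold=0.5):
    """Return indices of crops whose query probability exceeds the threshold.

    logits_per_crop: one list of image-text logits per crop, ordered like `texts`.
    query_index: position of the query of interest inside `texts`.
    """
    selected = []
    for i, logits in enumerate(logits_per_crop):
        probs = softmax(logits)
        if probs[query_index] > threshold:
            selected.append(i)
    return selected

# Made-up logits for three crops against texts = ["a dog", "background"].
example_logits = [[25.0, 20.0], [18.0, 24.0], [22.0, 21.5]]
print(xywh_to_crop_box((10, 20, 30, 40)))               # (10, 20, 40, 60)
print(select_proposals(example_logits, threshold=0.9))  # [0]
```

Because the softmax runs over the text queries, a single query always scores 1.0 for every crop. The sketch therefore assumes `texts` holds at least two entries, for example the query plus a contrasting label.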

How to run locally

Anaconda must be installed before you start the setup.

make env
conda activate segment-anything-with-clip
make setup
# This starts the Gradio server.
make run

Then open http://localhost:7860/ in your browser.

Follow-up Work

  • Fast Segment Everything: Re-implements the Everything algorithm in an iterative manner that is better suited to CPU-only environments. It shows results comparable to the original Everything with about 1/5 the number of inferences (e.g. 1024 vs. 200), and it takes under 10 seconds to search for masks on a CPU upgrade instance (8 vCPU, 32 GB RAM) of a Hugging Face Space.
  • Fast Segment Everything with Text Prompt: This example, based on Fast-Segment-Everything, takes a text prompt and generates an attention map for the area you want to focus on.
  • Fast Segment Everything with Image Prompt: This example, based on Fast-Segment-Everything, takes an image prompt and generates an attention map for the area you want to focus on.
  • Fast Segment Everything with Drawing Prompt: This example, based on Fast-Segment-Everything, takes a drawing prompt and generates an attention map for the area you want to focus on.
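
The list above does not spell out how the iterative Everything algorithm works. The sketch below shows one plausible way iteration can cut the number of inferences: prompt the model one grid point at a time and skip points that an earlier mask already covers. It is an illustration under that assumption, not the Fast Segment Everything implementation. `iterative_everything`, `predict_mask`, `toy_predict_mask`, and the `max_inferences` budget are hypothetical names and values.

```python
def iterative_everything(grid_points, predict_mask, max_inferences=200):
    """Prompt point by point, skipping points that an earlier mask covers.

    grid_points: iterable of (x, y) prompt locations.
    predict_mask: hypothetical callable mapping a point to a set of (x, y) pixels.
    max_inferences: assumed budget on model calls.
    """
    covered = set()
    masks = []
    inferences = 0
    for point in grid_points:
        if inferences >= max_inferences:
            break
        if point in covered:
            continue  # an earlier mask already explains this point
        mask = predict_mask(point)
        inferences += 1
        if mask:
            masks.append(mask)
            covered |= mask
    return masks, inferences

# Toy stand-in for a segmenter: an 8x8 image split into four 4x4 objects.
def toy_predict_mask(point):
    x0, y0 = (point[0] // 4) * 4, (point[1] // 4) * 4
    return {(x, y) for x in range(x0, x0 + 4) for y in range(y0, y0 + 4)}

grid = [(x, y) for y in range(8) for x in range(8)]
masks, inferences = iterative_everything(grid, toy_predict_mask)
print(len(masks), inferences)  # 4 4
```

In this toy setup, 64 candidate prompts collapse to 4 model calls because each mask removes the remaining points of its object from the queue. A real segmenter returns imperfect masks, so the saving would be smaller.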
