This tool takes an MP3 as input via a web form and generates a number of images via the OpenAI API in this sequence:
- Audio File -> Transcription
- Transcription -> n Summarized Prompts
- Summarized Prompts -> Short Global Summary
- Individual Prompt + Global Summary -> Image
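The sequence above can be sketched as a chain of four stages. This is only an illustration of the data flow, not the code in app.rb: the method name `run_pipeline`, its keyword arguments, and the stub stages are all hypothetical, and each stage is passed in as a callable so the flow can be followed without calling the OpenAI API.

```ruby
# Hypothetical sketch of the four-stage flow; none of these names come from app.rb.
def run_pipeline(audio_path, transcribe:, summarize:, globalize:, render:)
  transcription = transcribe.call(audio_path)   # Audio File -> Transcription
  prompts       = summarize.call(transcription) # Transcription -> n Summarized Prompts
  global        = globalize.call(prompts)       # Summarized Prompts -> Short Global Summary
  # Individual Prompt + Global Summary -> Image (one image per prompt)
  prompts.map { |prompt| render.call(prompt, global) }
end

# Stand-in stages for illustration only; a real version would call the API here.
def stub_stages
  {
    transcribe: ->(_path) { "first verse. second verse." },
    summarize:  ->(text) { text.split(".").map(&:strip).reject(&:empty?) },
    globalize:  ->(prompts) { prompts.join(" / ") },
    render:     ->(prompt, global) { "image(#{prompt} | #{global})" }
  }
end

images = run_pipeline("song.mp3", **stub_stages)
```

The point of the last stage is that every image prompt is paired with the same global summary, which is presumably what keeps the generated images consistent with one another.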
After generation is complete, an "Optimize" option will perform
a handful of ImageMagick operations on the generated content.
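As a rough idea of what such a step might look like, the sketch below builds an ImageMagick command line for one image. The specific operations (`-strip`, `-resize`, `-quality`) and the method name `optimize_command` are assumptions chosen for illustration; the operations the tool actually runs may differ. It also assumes ImageMagick 7, where the entry point is `magick` (on ImageMagick 6 it is `convert`).

```ruby
# Build (but do not run) an ImageMagick command for one image.
# The operations here are assumed examples, not the tool's actual "Optimize" steps.
def optimize_command(input, output, max_size: 1024, quality: 85)
  [
    "magick", input,
    "-strip",                               # drop embedded metadata
    "-resize", "#{max_size}x#{max_size}>",  # shrink only if larger than max_size
    "-quality", quality.to_s,               # compression quality for the output
    output
  ]
end

cmd = optimize_command("generated/001.png", "optimized/001.jpg")
```

Passing the array to `system(*cmd)` runs it without a shell, so the `>` in the resize geometry does not need quoting.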
You'll need Ruby and ImageMagick, and possibly wget to download the upscaler.
- Run bundle install
- Ensure OPENAI_API_KEY is set to your particular API key
- Add the upscaler:
    wget https://github.com/xinntao/Real-ESRGAN/releases/download/v0.2.5.0/realesrgan-ncnn-vulkan-20220424-macos.zip
    unzip realesrgan-ncnn-vulkan-20220424-macos.zip
    chmod u+x realesrgan-ncnn-vulkan
    rm realesrgan-ncnn-vulkan-20220424-macos.zip
- Run ruby app.rb (or DEBUG=1 ruby app.rb for more logging)
- Visit http://127.0.0.1:4567 and upload your audio
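Before starting the app, the setup steps above can be sanity-checked with a small script. This is a sketch and not part of the project: the method names `executable_on_path?` and `preflight_problems` are hypothetical, and it assumes the upscaler was unzipped into the current directory as in the steps above.

```ruby
# True if an executable called `name` exists in any directory listed in path_var.
def executable_on_path?(name, path_var)
  path_var.to_s.split(File::PATH_SEPARATOR).any? do |dir|
    candidate = File.join(dir, name)
    File.file?(candidate) && File.executable?(candidate)
  end
end

# Returns a list of human-readable problems; an empty list means setup looks complete.
def preflight_problems(env = ENV, upscaler: "./realesrgan-ncnn-vulkan")
  problems = []
  problems << "OPENAI_API_KEY is not set" if env["OPENAI_API_KEY"].to_s.strip.empty?
  unless File.executable?(upscaler)
    problems << "upscaler not found or not executable at #{upscaler}"
  end
  # ImageMagick 7 installs `magick`; ImageMagick 6 installs `convert`.
  unless %w[magick convert].any? { |bin| executable_on_path?(bin, env["PATH"]) }
    problems << "ImageMagick not found on PATH"
  end
  problems
end

preflight_problems.each { |problem| warn "setup: #{problem}" }
```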
Example projects are included from:
The first example was a music recording with only a style provided. The second example used the narration from this YouTube video, with both a style and context provided. However, the additional context of "There's always bats." was overlooked during prompt generation, so some fine-tuning of the prompting is still needed.
If you want to do some specific post-processing, you can add
your own actions under ./custom_actions. See colorize.rb for a
full example, then copy the boilerplate starter.rb into your own
file as a starting point.
