Audio to Images

This tool takes an MP3 uploaded through a web form and generates a series of images via the OpenAI API in this sequence:

Audio File -> Transcription
Transcription -> n Summarized Prompts
Summarized Prompts -> Short Global Summary
Individual Prompt + Global Summary -> Image
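The four stages above can be sketched as a small pipeline. This is a hypothetical illustration, not the code in app.rb: each stage is passed in as a callable (`transcribe`, `summarize`, `globalize`, `render` are made-up names) so the data flow can be run with stubs instead of real OpenAI calls.

```ruby
# Hypothetical sketch of the four-stage flow; the real app wires each stage
# to an OpenAI API call. Stages are injected so this runs without network access.
PipelineResult = Struct.new(:transcript, :prompts, :summary, :images, keyword_init: true)

def run_pipeline(audio_path, n:, transcribe:, summarize:, globalize:, render:)
  transcript = transcribe.call(audio_path)   # 1. audio file -> transcription
  prompts    = summarize.call(transcript, n) # 2. transcription -> n summarized prompts
  summary    = globalize.call(prompts)       # 3. prompts -> short global summary
  # 4. each individual prompt + the global summary -> one image
  images = prompts.map { |prompt| render.call("#{prompt}\n\nOverall context: #{summary}") }
  PipelineResult.new(transcript: transcript, prompts: prompts, summary: summary, images: images)
end

# Usage with stub stages standing in for the API calls.
result = run_pipeline(
  "talk.mp3", n: 2,
  transcribe: ->(path) { "transcript of #{path}" },
  summarize:  ->(_text, count) { Array.new(count) { |i| "scene #{i + 1}" } },
  globalize:  ->(prompts) { "a story in #{prompts.size} scenes" },
  render:     ->(prompt) { "image for: #{prompt.lines.first.strip}" }
)
p result.images
```

Injecting the stages keeps the ordering (summary generated after the per-scene prompts, then combined with each one) visible and testable on its own.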

After generation is complete, an "Optimize" option runs a handful of ImageMagick operations on the generated images.
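The README does not list which ImageMagick operations "Optimize" performs, so the flags below are assumptions chosen for illustration. This sketch only builds the command (for ImageMagick 7's `magick`; ImageMagick 6 installs `convert` instead) so it can be inspected before being run.

```ruby
require "shellwords"

# Hypothetical "optimize" command; the actual operations used by the app are not documented.
def optimize_command(input, output, max_width: 1024, quality: 85)
  [
    "magick", input,
    "-strip",                    # drop EXIF and color-profile metadata
    "-resize", "#{max_width}x>", # shrink only if wider than max_width
    "-quality", quality.to_s,    # compression quality for JPEG/WebP output
    output
  ]
end

cmd = optimize_command("in.png", "out.jpg")
puts cmd.shelljoin
# To run it: system(*cmd) or raise "magick failed"
```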

Setup

You'll need Ruby and ImageMagick, plus wget if you want to fetch the upscaler from the command line.

Run bundle install
Set the OPENAI_API_KEY environment variable to your OpenAI API key
Add the Real-ESRGAN upscaler (the release below is the macOS build):
    wget https://github.com/xinntao/Real-ESRGAN/releases/download/v0.2.5.0/realesrgan-ncnn-vulkan-20220424-macos.zip
    unzip realesrgan-ncnn-vulkan-20220424-macos.zip
    chmod u+x realesrgan-ncnn-vulkan
    rm realesrgan-ncnn-vulkan-20220424-macos.zip
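A quick way to confirm the setup steps took effect is a preflight check like the one below. It is a hypothetical helper, not part of the repository; it assumes ImageMagick is reachable as `magick` or `convert` on PATH and that the upscaler binary sits in the current directory as `./realesrgan-ncnn-vulkan`, matching the commands above.

```ruby
# Hypothetical preflight check for the setup steps; not part of the project.
def executable_on_path?(name, path = ENV.fetch("PATH", ""))
  path.split(File::PATH_SEPARATOR).any? do |dir|
    candidate = File.join(dir, name)
    File.file?(candidate) && File.executable?(candidate)
  end
end

def preflight_problems(env: ENV, upscaler: "./realesrgan-ncnn-vulkan")
  problems = []
  problems << "OPENAI_API_KEY is not set" if env["OPENAI_API_KEY"].to_s.strip.empty?

  search_path = env.fetch("PATH", "")
  unless executable_on_path?("magick", search_path) || executable_on_path?("convert", search_path)
    problems << "ImageMagick (magick or convert) not found on PATH"
  end

  unless File.file?(upscaler) && File.executable?(upscaler)
    problems << "#{upscaler} is missing or not executable"
  end
  problems
end

problems = preflight_problems
if problems.empty?
  puts "Setup looks good"
else
  problems.each { |msg| warn "setup: #{msg}" }
end
```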

Up and Running

Run ruby app.rb (or DEBUG=1 ruby app.rb for more logging)
Visit http://127.0.0.1:4567 and upload your audio
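The README only says DEBUG=1 gives more logging; it does not say how app.rb implements that. As an assumption, one common pattern with Ruby's standard `Logger` looks like this (`build_logger` is a hypothetical name):

```ruby
require "logger"

# Hypothetical: map DEBUG=1 to a more verbose log level.
def build_logger(env = ENV, io: $stdout)
  logger = Logger.new(io)
  logger.level = env["DEBUG"] == "1" ? Logger::DEBUG : Logger::INFO
  logger
end

logger = build_logger
logger.debug("detailed step-by-step output, shown only with DEBUG=1")
logger.info("listening on http://127.0.0.1:4567")
```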

Example Output

Example projects are included from:

The first example was a music recording with only a style provided. The second example used the narration from this YouTube video and was given both a style and context. However, the additional context "There's always bats." was dropped during prompt generation, so the prompting still needs some fine-tuning.
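One possible mitigation for dropped context, offered here as a suggestion and not as what the app currently does, is to append the user's style and context verbatim to every image prompt instead of relying on the summarization step to carry them through. `build_image_prompt` and its argument names are hypothetical.

```ruby
# Hypothetical prompt builder: style and context bypass summarization entirely.
def build_image_prompt(scene:, global_summary:, style: nil, context: nil)
  parts = [scene.strip, "Overall story: #{global_summary.strip}"]
  parts << "Style: #{style.strip}" unless style.to_s.strip.empty?
  parts << "Always include: #{context.strip}" unless context.to_s.strip.empty?
  parts.join("\n")
end

puts build_image_prompt(
  scene: "A forest clearing at dusk.",
  global_summary: "A walk through the woods at night.",
  style: "watercolor",
  context: "There's always bats."
)
```

The trade-off is longer prompts, but details the user typed can no longer be lost by an earlier model call.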

UI Example

Custom Actions

If you want specific post-processing, you can write your own actions in ./custom_actions. See colorize.rb for a full example, then copy the boilerplate starter.rb into your own file and fill it in.
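The actual action interface is defined by starter.rb, which is not reproduced here, so everything below is an assumption: the class name, the `command`/`call` methods, and the choice of ImageMagick's `-sepia-tone` operation. It only illustrates the general shape of a post-processing action that edits an image in place.

```ruby
# Hypothetical custom action; check custom_actions/starter.rb for the real interface.
class SepiaAction
  NAME = "Sepia"

  def initialize(strength: 80)
    @strength = strength
  end

  # Command that tints image_path in place using ImageMagick 7's `magick`.
  def command(image_path)
    ["magick", image_path, "-sepia-tone", "#{@strength}%", image_path]
  end

  # Runs the command; raises if ImageMagick is missing or exits non-zero.
  def call(image_path)
    system(*command(image_path)) or raise "#{NAME} failed for #{image_path}"
  end
end
```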

unRARed/audio-images

Folders and files

Latest commit

History

Repository files navigation

Audio to Images

Setup

Up and Running

Example Output

UI Example

Custom Actions

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Languages

Packages