Developing and deploying converters
A converter is a service that converts each input file into one or more output files, along with metadata.
For instance, an HTML converter converts input HTML into an output PDF, text and thumbnail. A zip converter converts an input Zip file into output files with unknown content types.
The Ingest-Pipeline document describes how Overview juggles the input and output files.
Converters share zero code with Overview. You can write them in any language.
A converter polls Overview's worker for tasks, via HTTP. The converter then streams its output as a multipart/form-data HTTP POST.
Follow instructions at overview-convert-framework to build a converter. In the end, you'll have it pushed to Docker Hub. In this example, let's call that image overview/overview-convert-thing:1.2.3.
You'll need to change overview-server to handle the new file type. (We can't make plugins auto-register their supported file types, because Overview needs to know their file types before they start up.)
- Edit converter_versions.env: add CONVERT_THING_IMAGE=overview/overview-convert-thing:1.2.3. This will be used in docker-compose files.
- Edit docker-compose.yml and add a clause for convert-thing. Give it a POLL_URL of http://overview-worker:9032/Thing.
- Edit integration-test/docker-compose.yml and add a clause for overview-convert-thing. Also add overview-convert-thing to integration-test's depends_on array.
- Add a test file to integration-test/files/file-upload-spec/XXX.thing. Test that it produces the desired files in integration-test/spec/file_upload_spec.rb. (Overview's integration tests just prove that Overview invokes the converter and that the converter runs. The converter itself should test that it handles all possible inputs.)
- Prepare to deploy to Kubernetes: add sed -e "s@CONVERT_THING_IMAGE@$CONVERT_THING_IMAGE@" to kubernetes/common. Add apply_template convert-thing.yml to kubernetes/deploy. Create a convert-thing.yml config file, probably by copying convert-email.yml and replacing Email with Thing, email with thing and EMAIL with THING. Set appropriate limits, minReplicas and maxReplicas.
- Add an HttpStep for the converter to worker/src/main/scala/com/overviewdocs/ingest/process/Step.scala. For instance, an HttpStep of "Thing" -> 0.2 means: when the converter finishes outputting data, we are 20% closer to producing documents than we were before the converter ran. (If your converter outputs PDF+thumbnail+text with wantOcr:false and wantSplitByPage:false, then use "Thing" -> 1.0.)
- Alter worker/src/main/scala/com/overviewdocs/ingest/process/Decider.scala: add a NextStep.Thing and make some MIME types point to it.
- Alter worker/src/test/scala/com/overviewdocs/ingest/process/DeciderSpec.scala: add "Thing" to steps and write a test to convince yourself Overview chooses it.
- Run ./dev and test uploading a file manually.
- docker/build && integration-test/run-in-docker-compose
- Commit and push. Jenkins will deploy it to Kubernetes when integration tests pass.
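The Kubernetes templating step above can be sketched as follows. This is only an illustration of what the sed expression does to a CONVERT_THING_IMAGE placeholder: the one-line template is hypothetical (a real convert-thing.yml is a full Kubernetes config), and the actual kubernetes/common script may chain the substitution with others.

```shell
#!/bin/sh
# Value that converter_versions.env would provide.
CONVERT_THING_IMAGE=overview/overview-convert-thing:1.2.3

# Hypothetical template fragment, standing in for a line of convert-thing.yml.
template='image: CONVERT_THING_IMAGE'

# "@" is used as the sed delimiter so the "/" in the image name needs no escaping.
rendered=$(printf '%s\n' "$template" | sed -e "s@CONVERT_THING_IMAGE@$CONVERT_THING_IMAGE@")

echo "$rendered"
```

The double quotes around the sed expression matter: they let the shell expand $CONVERT_THING_IMAGE before sed sees it, so the placeholder is replaced by the image name rather than by the literal variable name.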
Once Jenkins deploys to production, it will have pushed images to Docker Hub. Now you can use them in overview-local:
- Edit config/overview.defaults.env: add a CONVERT_THING_IMAGE line, and change OVERVIEW_VERSION to the version you committed in step 11 (the final "Commit and push" step above).
- Edit config/overview.yml: add the exact clause you added to overview-server's integration-test/docker-compose.yml.
- Run ./update && ./start-after-git-pull to test.
- Commit and push. Users will get your new code when they ./update.
- Release the new version of the converter. The instructions are converter-specific, but they all end with a new Docker image on Docker Hub. Let's say it's overview/overview-convert-thing:1.2.4.
- Update overview-server:
  - Alter converter_versions.env: CONVERT_THING_IMAGE=overview/overview-convert-thing:1.2.4
  - integration-test/run-in-docker-compose
  - Commit and push. Users will get your new code when they ./update.
- Update overview-local:
  - Alter config/overview.defaults.env: CONVERT_THING_IMAGE=overview/overview-convert-thing:1.2.4. You don't need to edit overview-local's OVERVIEW_VERSION if you're only updating a converter; but it's good practice. Wait for Jenkins to finish with the overview-server commit you just pushed, and then update OVERVIEW_VERSION.
  - Run ./update && ./start-after-git-pull to test.
  - Commit and push. Users will get your new code when they ./update.
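The version bump in the update steps above can be scripted. The sketch below is a hedged illustration, not part of Overview's tooling: it works on a throwaway file instead of the real converter_versions.env, and the CONVERT_EMAIL_IMAGE line and its version number are made up to show that other entries are left alone.

```shell
#!/bin/sh
# Throwaway stand-in for converter_versions.env, so the sketch is safe to run anywhere.
env_file=$(mktemp)
printf '%s\n' \
  'CONVERT_EMAIL_IMAGE=overview/overview-convert-email:1.0.0' \
  'CONVERT_THING_IMAGE=overview/overview-convert-thing:1.2.3' > "$env_file"

new_image=overview/overview-convert-thing:1.2.4

# Rewrite only the CONVERT_THING_IMAGE line. Writing to a second file and moving it
# avoids "sed -i", whose syntax differs between GNU and BSD sed.
sed -e "s@^CONVERT_THING_IMAGE=.*@CONVERT_THING_IMAGE=$new_image@" "$env_file" > "$env_file.new"
mv "$env_file.new" "$env_file"

updated=$(grep '^CONVERT_THING_IMAGE=' "$env_file")
untouched=$(grep '^CONVERT_EMAIL_IMAGE=' "$env_file")
echo "$updated"

rm -f "$env_file"
```

Pointing env_file at converter_versions.env in overview-server, or at config/overview.defaults.env in overview-local, would apply the same edit to the real files; review the diff before committing.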