In today's competitive landscape, organizations are often hampered by a portfolio of disconnected legacy and modern applications. This creates information silos, manual process inefficiencies, and significant barriers to innovation. The Unified Application Integration Framework project is a strategic initiative designed to address these challenges head-on.
The project's core goal is to "integrate diverse existing / legacy applications or API services" by creating an intelligent middleware layer. This framework will automatically analyze data from the various systems, understand the underlying business processes, and expose the combined functionality and use cases through a single, modern, unified interface, keeping these interactions in sync with the backends of the underlying integrated applications.
I'd like to share some thoughts on what this Semantics / AI-enabled Business Integration / Enterprise Application Integration (EAI) platform with a reactive microservices backend could look like; I intend to implement it, and I'm currently looking for advice on the frameworks, techniques and patterns to use for the implementation.
In particular, I need help with a custom way to encode embeddings, one that enables custom GenAI / MCP interactions (not just similarity, but also relationship inference). I'd welcome help or suggestions about this proposal and the overall architecture, and about choosing the right components and tool layout, before moving into the implementation phase. This includes evaluating which GenAI models and tools could leverage this approach.
The idea is to build a layered set of semantic (RDF4J) models, each with its own level of abstraction, backing a set of (Spring) microservices: from data ingestion out of the integrated business / legacy applications, their datasources, files and APIs, up to an Activation layer that exposes a unified interface to the integrated applications' use cases while keeping the integrated backends in sync with this layer's interactions.
I'll be using RDF / FCA (Formal Concept Analysis) for inference in an Aggregation layer, an FCA-based embeddings model for an Alignment layer, and DDD (Domain-Driven Design) / DOM (Dynamic Object Model) / DCI (Data, Context and Interaction) plus the Actor / Role pattern for the mentioned Activation layer.
https://github.com/sebxama/sebxama/raw/refs/heads/main/ApplicationService.pdf
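To make the embeddings question concrete, here is a minimal sketch of what an "FCA-based embedding" could look like (this is entirely my own assumption; all names are placeholders): each entity is encoded as a binary incidence vector over the attribute set of a formal context, so that comparing vectors yields shared concepts (relationships), not just a similarity distance.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class FcaEmbedding {
    public static void main(String[] args) {
        // Global attribute list of the formal context (columns of the incidence matrix).
        List<String> attributes = List.of("name", "birthDate", "taxId");

        // Each entity with the set of attributes it carries in the graph.
        Map<String, Set<String>> entities = Map.of(
                "Peter", Set.of("name", "birthDate"),
                "ACME",  Set.of("name", "taxId"));

        // Encode each entity as a 0/1 incidence vector over the attribute list.
        entities.forEach((entity, attrs) -> {
            int[] vec = attributes.stream()
                    .mapToInt(a -> attrs.contains(a) ? 1 : 0)
                    .toArray();
            System.out.println(entity + " -> " + Arrays.toString(vec));
        });
        // The component-wise AND of two such vectors is the set of shared
        // attributes: the meet in the FCA concept lattice. Comparing vectors
        // therefore yields relationships (common concepts), not just distances.
    }
}
```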
Implementation Roadmap (Work In Progress, needs cleanup):
https://github.com/sebxama/sebxama/raw/refs/heads/main/RoadmapDetail1.4.pdf
https://github.com/sebxama/sebxama/raw/refs/heads/main/RoadmapDetail3.0.pdf
https://github.com/sebxama/sebxama/raw/refs/heads/main/RoadmapDetail3.1.pdf
https://github.com/sebxama/sebxama/raw/refs/heads/main/RoadmapDetail3.2.pdf
I'm currently just starting the specification and design phases, and I need advice at this initial stage about tools and implementation choices.
Sebastian Samaruga. https://sebxama.blogspot.com
The concept is to take raw data inputs (SPO triples) through layers that perform Aggregation (Type, State and Order inference), Alignment (Attribute / Value relationship inference, linking and matching) and Activation (Role / Actor, Context and Interaction inference).
Once Aggregation, Alignment and Activation have been performed, one should be able to manipulate data at the Context (use case) level, with Actors (data) playing Roles (Types) in Interactions (Context execution instances).
One should be able to ask for Context Interactions with a desired outcome, with inference determining which Actors should play which Roles.
One should be able to navigate previous Interactions (Context executions) or to create new ones (Context invocations).
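As a rough sketch of how these layers could compose (names and signatures are my own placeholders, not a settled design), each layer is a transformation over the triple set that adds its own inference metadata:

```java
import java.util.List;
import java.util.function.UnaryOperator;

public class Pipeline {
    // Minimal triple representation (subject, predicate, object).
    record Triple(String subject, String predicate, String object) {}

    // Each layer transforms the triple set, enriching it with its metadata.
    interface Layer extends UnaryOperator<List<Triple>> {}

    public static void main(String[] args) {
        Layer aggregation = triples -> triples; // placeholder: Type / State / Order inference
        Layer alignment   = triples -> triples; // placeholder: relationship / matching inference
        Layer activation  = triples -> triples; // placeholder: Role / Context / Interaction inference

        List<Triple> raw = List.of(new Triple(":Peter", ":sonOf", ":John"));
        List<Triple> activated =
                activation.apply(alignment.apply(aggregation.apply(raw)));
        System.out.println(activated);
    }
}
```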
For example, in the 'Family' context:
A cousin is the son of the brother of one's father.
An interaction could be:
Peter is the son of John. John is the brother of Arthur. Arthur is the father of Charlie. Peter is the cousin of Charlie.
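Since RDF4J is the intended store, this rule could be expressed, for instance, as a SPARQL update over an in-memory repository (a hedged sketch; the ex: namespace and property names are illustrative only):

```java
import java.io.StringReader;

import org.eclipse.rdf4j.model.util.Values;
import org.eclipse.rdf4j.repository.Repository;
import org.eclipse.rdf4j.repository.RepositoryConnection;
import org.eclipse.rdf4j.repository.sail.SailRepository;
import org.eclipse.rdf4j.rio.RDFFormat;
import org.eclipse.rdf4j.sail.memory.MemoryStore;

public class CousinInference {
    public static void main(String[] args) throws Exception {
        Repository repo = new SailRepository(new MemoryStore());
        try (RepositoryConnection con = repo.getConnection()) {
            // The interaction's facts as Turtle triples.
            String facts = """
                    @prefix ex: <http://example.org/> .
                    ex:Peter  ex:sonOf     ex:John .
                    ex:John   ex:brotherOf ex:Arthur .
                    ex:Arthur ex:fatherOf  ex:Charlie .
                    """;
            con.add(new StringReader(facts), "", RDFFormat.TURTLE);

            // The 'Family' context rule: sonOf, then brotherOf, then fatherOf => cousinOf.
            con.prepareUpdate("""
                    PREFIX ex: <http://example.org/>
                    INSERT { ?x ex:cousinOf ?w }
                    WHERE  { ?x ex:sonOf ?y .
                             ?y ex:brotherOf ?z .
                             ?z ex:fatherOf ?w . }
                    """).execute();

            // Prints the inferred statement: Peter cousinOf Charlie.
            con.getStatements(null, Values.iri("http://example.org/cousinOf"), null)
                    .forEach(System.out::println);
        }
    }
}
```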
Aggregation. Inputs: SPO triples. Type inference (i.e., subjects with the same attributes). State inference (attributes with the same values). Order inference (via Type / State hierarchies).
Alignment. Inputs: Aggregation triples augmented with Type, State and Order metadata. Information inference: Attribute / Value relationships. Linking inference: entity relationships. Equivalence inference: entity, attribute and value matching.
Activation. Inputs: Aggregation- and Alignment-augmented triples. Role / Actor, Context and Interaction inference: Actors are activated into the Roles of Context Interactions according to the Aggregation and Alignment metadata.
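A minimal sketch of the Aggregation step's type inference as described above (grouping subjects by identical attribute sets; all names are illustrative):

```java
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.stream.Collectors;

public class TypeInference {
    public static void main(String[] args) {
        // Subject -> the set of attributes (predicates) it carries in the raw triples.
        Map<String, Set<String>> attrs = Map.of(
                "Peter",   Set.of("name", "birthDate"),
                "Charlie", Set.of("name", "birthDate"),
                "ACME",    Set.of("name", "taxId"));

        // Subjects with identical attribute sets fall into one inferred type.
        Map<Set<String>, List<String>> types = attrs.entrySet().stream()
                .collect(Collectors.groupingBy(Map.Entry::getValue,
                        Collectors.mapping(Map.Entry::getKey, Collectors.toList())));

        // Subset / superset relations between the attribute sets would then
        // induce the type hierarchy (and the same grouping over values, states).
        types.forEach((attributeSet, subjects) ->
                System.out.println("type " + attributeSet + " <- " + subjects));
    }
}
```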
What follows is a first draft (feedback needed) of a document on what could become a BI (Business Integration) and EAI (Enterprise Application Integration) framework / toolkit built on the Semantic Web:
Integrate the domains of various applications into a unified frontend or interface. Extract all data sources from the applications to be integrated and represent them in a unified way.
Find relationships and equivalences between the data of the applications to be unified, along with their possible interactions: the use cases of and between applications.
Expose through an API the possible interactions to be invoked, the roles of their contexts and the actors of their transactions' interactions, and synchronize the transaction data with the original applications.
The idea is that by doing an "ETL" of all the tables / schemas / APIs / documents of your domain and applications, translating the sources into triples (nodes and arcs: a knowledge graph), the framework can infer your entity types, your relationships and the contexts ("use cases") possible between your applications. It then generates a generic overlay (APIs, frontend) that integrates the inferred contexts and their interactions in and between the source applications into a unified, conversational and "discoverable" interface (API, web assistant, chatbot).
To unify and integrate diverse data sources, I transform all the information from each source into triples (Entity, Attribute, Value), together with their context, into a graph in the "Datasources" component. The other components deal with inference (Aggregation), Alignment and "Activation" (exposing descriptions of the possible contexts and their interactions in and between the integrated applications). The last component could be a generic frontend or an API endpoint for interacting according to the metadata of each context (use case).
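For illustration, one possible shape of that ETL step (a sketch under my own naming assumptions: URIs minted per table, row and column), flattening a tabular row into contextualized triples:

```java
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

public class RowToTriples {
    // (Entity, Attribute, Value) plus the source context, as in the text.
    record Quad(String entity, String attribute, String value, String context) {}

    // Flatten one tabular row into triples under a per-source context.
    static List<Quad> toTriples(String table, String id, Map<String, String> row) {
        String entity = "urn:" + table + ":" + id; // URI minted for the row entity
        return row.entrySet().stream()
                .map(col -> new Quad(entity,
                        "urn:" + table + "#" + col.getKey(), // attribute URI per column
                        col.getValue(),
                        "urn:source:" + table))              // context = source graph
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        toTriples("customers", "42", Map.of("name", "Peter", "city", "Rosario"))
                .forEach(System.out::println);
    }
}
```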
The architecture would be microservices-based, with five components that for now are "black boxes": interfaces for reactive microservices that implement their algorithms with functional / streams programming.
The components are:
- Datasources Service: ETL (tabular data, APIs and documents into a triples knowledge graph). Populates the initial graph and synchronizes with the backends of the integrated source applications according to the interactions of the contexts (the APIs generated by Activation inference).
- Aggregation Service: Context / Relationship, Type and State inference. From the "raw" data, infers the types and meta-types (states) of the entities of the datasources to be integrated, through their attributes and their values in a given context (relationships). I consider entities with the same attributes to be of the same type (supersets / subsets of attributes yield a type hierarchy), and attributes with the same values to share a state (supersets / subsets of values / states yield order inference).
- Alignment Service: Ontology matching. Finds equivalent contexts / types / states / entities / relationships. Missing link / attribute inference. Upper ontology* alignment.
- Activation Service: Use case Types (Contexts / Roles) and Instances (Interactions / Actors) inference, plus API description metadata. DCI design pattern*. Possible / past interactions (transactions) for each context and its actors' roles.
- API Service / Generic Frontend: Generic, discoverable / browsable use case (Contexts / Interactions) APIs built from the Activation Service metadata.
All services would have an administration interface for each step of the workflow, with a graph-oriented backend (RDF4J or Neo4j), leveraging graph neural networks and LLMs / NLP through functional / reactive programming of the component microservices and their tasks.
The graph "schema" would be defined for each input / output of each component. Through functional and "reactive" programming, algorithms would incrementally "parse" the graphs and their respective inferences in each service, so that the system is dynamic and iterative (incremental integration).
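A hedged sketch of that incremental, reactive style using Project Reactor (the Flux API that Spring WebFlux builds on); the stages shown are placeholders for the real inference algorithms:

```java
import reactor.core.publisher.Flux;
import reactor.core.publisher.Sinks;

public class IncrementalPipeline {
    record Triple(String s, String p, String o) {}

    public static void main(String[] args) {
        // New triples arrive incrementally from the Datasources service.
        Sinks.Many<Triple> ingest = Sinks.many().unicast().onBackpressureBuffer();

        // Each service is a reactive stage over the stream; dedupe stands in
        // for the real Aggregation / Alignment / Activation inferences.
        Flux<Triple> aggregated = ingest.asFlux().distinct();
        aggregated.subscribe(t -> System.out.println("inferred over: " + t));

        // Pushing a triple re-runs the downstream inferences incrementally.
        ingest.tryEmitNext(new Triple(":Peter", ":sonOf", ":John"));
        ingest.tryEmitNext(new Triple(":Peter", ":sonOf", ":John")); // duplicate, filtered out
        ingest.tryEmitComplete();
    }
}
```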
A simple example (use cases): I have fruits and vegetables, so I can open a greengrocer's; I want to open a greengrocer's, so I need fruits and vegetables. Actors: supplier, greengrocer, customer. Contexts / Interactions: supply, sale, etc.
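A minimal DCI-flavored sketch of the greengrocer example (my own placeholder names; the Role-to-Actor binding is what the Activation layer would infer):

```java
import java.util.Map;

public class SaleContext {
    // Data: what the graph knows about an entity (the Actor's payload).
    record Actor(String uri, Map<String, String> attributes) {}

    // Roles: behavior an Actor takes on only inside this Context.
    interface Greengrocer { Actor self(); default String sell() { return self().uri() + " sells produce"; } }
    interface Customer    { Actor self(); default String buy()  { return self().uri() + " buys produce"; } }

    // Interaction: one execution instance of the 'sale' Context,
    // binding Actors to Roles.
    static String interact(Actor seller, Actor buyer) {
        Greengrocer g = () -> seller;
        Customer c = () -> buyer;
        return g.sell() + "; " + c.buy();
    }

    public static void main(String[] args) {
        Actor shop   = new Actor("urn:actor:greengrocer", Map.of());
        Actor person = new Actor("urn:actor:customer", Map.of());
        System.out.println(interact(shop, person));
    }
}
```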
Another example: I have these indicators that I inferred from the ETL; what reports can I put together? I want a report about these aspects of this topic; what indicators (roles) do I need to add?
Ultimately, it is about creating a "generator" of unified interfaces for the integration of current or legacy applications and data sources (DBs, APIs, documents, etc.), exposing diverse sources in a unified way: a web frontend (generic use case wizards), chatbots, API endpoints, etc.
The nodes and arcs of the graph triples are URIs and have a "retrievable" internal representation with metadata that each service / layer populates through "helper" services shared by every layer: a Registry service, a Naming (NLP) service and an Index service.
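The helper services could have contracts along these lines (a sketch; these interfaces and the in-memory stand-ins are assumptions, not a fixed API):

```java
import java.util.List;
import java.util.Map;
import java.util.Optional;

public class Helpers {
    // Registry: per-URI internal representation / metadata, populated by each layer.
    interface Registry { Map<String, String> metadata(String uri); }

    // Naming: NLP-derived human-readable labels for URIs.
    interface Naming { Optional<String> label(String uri); }

    // Index: term lookup back to URIs, shared by every layer.
    interface Index { Iterable<String> lookup(String term); }

    public static void main(String[] args) {
        // Trivial in-memory stand-ins, just to show the shape of the contracts.
        Registry registry = uri -> Map.of("layer", "aggregation");
        Naming naming = uri -> Optional.of(uri.substring(uri.lastIndexOf(':') + 1));
        Index index = term -> List.of("urn:customers:42");

        System.out.println(naming.label("urn:customers:42").orElse("?")
                + " " + registry.metadata("urn:customers:42")
                + " " + index.lookup("Peter"));
    }
}
```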
There is something called "Web3" that uses a decentralized blockchain for the management of identifiers (URIs as DIDs: W3C Decentralized Identifiers*) and of their interactions and semantics (smart contracts, for example). Since the nodes and arcs of the graphs are URIs, it would not be unreasonable to use the Java DID APIs available on GitHub to facilitate the interaction of different instances or deployments of this framework between different organizations.
https://github.com/sebxama/sebxama
https://github.com/sebxama/scrapbook
https://github.com/sebxama/scrapbook/raw/refs/heads/master/SemanticWebAlignmentTheory.pdf
- Upper ontology: https://en.wikipedia.org/wiki/Upper_ontology
- DCI: https://en.wikipedia.org/wiki/Data,_context_and_interaction
- W3C DIDs: https://en.wikipedia.org/wiki/Decentralized_identifier
This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.