Add class ReferenceData #1193
Would you really want to say that you can have Organizations as reference data? What about importing a list of Organizations from IndustryKG, DnB, or maybe someday from an official organization licensing entity? |
I'm not sure reference data is a real-world thing. It's more of a way we think about and use a set of data. |
This does not make any sense to me. What is the rationale? What are the pros and cons? One con is that it is not intuitive. |
Organizations are not reference data in this model; categories are a type of reference data, and they are allocated by organizations (or people or IP). This is the same thing we do now with IDs. |
Jamie: Reference data is a feature of the data and how it's used, not inherent to the data. Organizations could be reference data if you use a controlled vocabulary. Dave: I want a covering concept for things we make up and are not real-world things. A lot of things refer to them, but they don't refer to other things (much). Originally thought of this as immutable, whereas master data is highly mutable. Michael: These things are similar at the metadata level (information about classes), not about the instances of the classes. Rebecca: take language out - you can say a lot of things about languages. Michael/David: What is the use case? Is it just to tidy up the top level of the class hierarchy?
Rebecca: we need to pin down exactly what this concept means. We've mentioned various criteria but haven't agreed on which are defining:
|
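Dave's criterion in the notes above (a lot of things refer to these items, but they don't refer to other things much) can be turned into a rough link-count test. This is a minimal sketch, not a gist definition: it assumes the data is available as (subject, predicate, object) triples, and the example IRIs, the thresholds `min_in` and `max_out`, and the function name are all hypothetical.

```python
from collections import Counter

# Hypothetical example graph as (subject, predicate, object) triples.
TRIPLES = [
    ("ex:_Person_1", "ex:hasGender", "ex:_Gender_F"),
    ("ex:_Person_2", "ex:hasGender", "ex:_Gender_F"),
    ("ex:_Person_3", "ex:hasGender", "ex:_Gender_M"),
    ("ex:_Person_1", "ex:worksFor", "ex:_Org_Acme"),
    ("ex:_Person_2", "ex:worksFor", "ex:_Org_Acme"),
    ("ex:_Person_3", "ex:citizenOf", "ex:_Country_US"),
    ("ex:_Org_Acme", "ex:locatedIn", "ex:_Country_US"),
    ("ex:_Org_Acme", "ex:hasAddress", "ex:_Address_1"),
]


def reference_data_candidates(triples, min_in=2, max_out=0):
    """Nodes with at least min_in incoming and at most max_out outgoing links."""
    incoming = Counter(o for _, _, o in triples)
    outgoing = Counter(s for s, _, _ in triples)
    nodes = set(incoming) | set(outgoing)
    # Counter returns 0 for missing keys, so pure sinks have outgoing[n] == 0.
    return sorted(
        n for n in nodes if incoming[n] >= min_in and outgoing[n] <= max_out
    )


print(reference_data_candidates(TRIPLES))  # ['ex:_Country_US', 'ex:_Gender_F']
```

The result depends on the graph, not on the class: in another dataset the same organization could be a pure sink. That fits Jamie's point that reference data is a feature of how data is used rather than inherent to it.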
On a somewhat related note, we are considering adding a new extension for our concept of a Person, namely that of Persona. Reason? An online presence is NOT the same thing as the Person that "manages" that online presence. And individuals (and organizations too) typically have many personas. To wit: my LinkedIn persona is not "me". Neither is my Facebook persona. One is focused on and tailored to business purposes, the other to social/family. In addition to those, I also have political personas, art-related personas, automobile-related personas, motorcycle-related personas, etc., with very little overlap. So while there can be persona-to-persona relationships (e.g., someone "follows" one of your personas), these may or may not correspond to actual person-to-person relationships. Anyway, I'd like to suggest considering such a refinement. |
@MichaelSullivanArchitect I think the concept of Persona has value, but I don't see it as a subclass of |
Agree on not subclassing it. Thanks for the feedback. |
SA dev team meeting: Phil: Just proposed in order to make the periodic table work? Rebecca: I feel like there's something there, but not sure how to define it. Mark: Feels like it's proposed as a covering class, when not necessary (as Dan said last meeting); similar to Cheryl: Is an example something like tax tables (look-up data)? Mark: If those are rules, that sounds like a spec. Reference data could be an aspect of something. Doug: Likes Dan's comment. Categories are stubs for things we don't want to model out in detail, but they could be with different use cases. So not inherently reference data. We asked if anyone wants the class and is willing to defend it; no one raised their hand. Pel: Not useful in enough contexts to warrant it being included in gist. DECISION: Do not define this class. @mkumba please note. Can keep as column header in gist 14 periodic table. |
Most large companies have "reference data." The finance industry calls it ref data. (For the longest time at Goldman Sachs I thought people were saying "rough data," which I supposed was related to "rough computing" (https://www.sciencedirect.com/science/article/abs/pii/S0952197620302529), but it turned out to have nothing to do with that, because it was ref data.) Most other industries have different names for it, but everyone has it. It is mostly static, simple lookup tables; many companies refer to it as their lookup tables. About every decade or so they have a project to round up all the enums and lookup tables in all their systems and put them in one place. Then they go back to sprawl for a while. I think it's a real thing (ask Mike Atkin, by the way).
The real problem is its scope. Most firms include country codes in their ref data. Clearly we would not, or maybe we should: we could have the ISO codes in ref data but the real country in geospatial. That's not a bad idea, given the ambiguity between the land mass and the government. We could have both the Ukrainian land mass (constantly changing) and the Ukrainian government (hopefully prevailing) point to UKR and .ua. ISO seems to favor land mass over government (Antarctica and Western Sahara) but includes Palestine, and in most uses it seems not to distinguish. So maybe the way to maintain the ambiguity that most people seem to like is to keep the ambiguity in the ref data.
Most companies include currency codes in ref data, and we will if we have units of measure in our ref data. Some include units of measure, although it's curious that not all do. Most of what is in most ref data is what we call categories; the classic ref data is gender. I think what most firms have in mind is a single large table where the columns are "group" (what we would call the subcategory), "code", "short label", "description", and "definition". 
People often like to have codes in their ref data to provide multiple language labels. Note: master data is not ref data. Specifications (which are really product master data) are not ref data. I suggested putting "language" in here and you guys talked me out of it, but the more I think about it, the more I think language belongs here as well. The main use of language as reference data is the short codes that go in language-coded strings, and references to which languages are spoken in which countries.
Most reference data has a very simple structure (it probably should be nearly identical across sets, like the table above), changes very slowly (most companies update their ref data annually, so it is almost immutable), and consists of what Katariina Kari calls high page rank items (lots of links in, few links out).
Tibco's definition is OK, except that they want to make it part of the domain of master data, and then turn around and include transaction codes, which clearly aren't master data. Wikipedia was pretty good, although I didn't follow their calendar example. (Thinking about it again, though, financial services have a lot of codes for whether to use 360, 365, or 365.25 days in the calculation of interest, and things like that all have codes.) Again, check with Mike Atkin; he's working on this stuff as we speak. Collibra is also pretty good and very similar, though they lumped in product codes (and pricing!), which I think is wrong. Atlan picked up one I'd missed: time zones. Starburst makes a few interesting distinctions. They correctly distinguish ref data from master, transaction, and analytic data. They point out that changing master data typically doesn't affect workflow, whereas changing ref data might. They distinguish external ref data (country and currency codes) from internal ref data (business units and transaction types) and point out the value of data virtualization. Even IBM likes reference data. But if you guys really want to ditch it and make the periodic table look like crap, OK. |
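The "single large table" with "group", "code", "short label", "description", and "definition" columns can be sketched as follows. This is a minimal sketch under stated assumptions: one row per code, codes unique within their group, and an extra per-language label mapping to reflect the multiple-language-labels point. The names `RefDataRow`, `RefDataTable`, `lookup`, and `label`, and all example rows, are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class RefDataRow:
    """One row of a single, uniformly shaped reference data table."""
    group: str        # the code list, i.e. what gist would call the subcategory
    code: str         # short code, assumed unique within its group
    short_label: str
    description: str
    definition: str
    # Hypothetical extension: labels keyed by language code (excluded from hashing).
    labels: dict = field(default_factory=dict, hash=False)


class RefDataTable:
    """All code lists in one place, indexed by (group, code)."""

    def __init__(self, rows):
        self._rows = {}
        for row in rows:
            key = (row.group, row.code)
            if key in self._rows:
                raise ValueError(f"duplicate code {row.code!r} in group {row.group!r}")
            self._rows[key] = row

    def lookup(self, group, code):
        return self._rows[(group, code)]

    def label(self, group, code, lang=None):
        """Label in the requested language, falling back to the short label."""
        row = self.lookup(group, code)
        return row.labels.get(lang, row.short_label)

    def groups(self):
        return sorted({group for group, _ in self._rows})


# Hypothetical example rows.
TABLE = RefDataTable([
    RefDataRow("Gender", "F", "Female", "Female gender", "Example definition.", {"fr": "Féminin"}),
    RefDataRow("Gender", "M", "Male", "Male gender", "Example definition.", {"fr": "Masculin"}),
    RefDataRow("Currency", "USD", "US dollar", "Currency of the United States",
               "ISO 4217 code for the US dollar."),
])

print(TABLE.label("Gender", "F", "fr"))  # Féminin
print(TABLE.label("Gender", "F", "de"))  # Female (no German label, so it falls back)
print(TABLE.groups())                    # ['Currency', 'Gender']
```

Keying on (group, code) is one design choice among several; a firm that keeps codes globally unique could index on the code alone.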
Also, I don't know if you caught it, but even TopQuadrant, in Steve Hedden's architecture, had a prominent place for Reference Data. |
@mkumba
Let's imagine a class called data; reference data would be a subclass. Let's call an instance a datum. Points against include:
On the penultimate point, my first thought is that an individual reference datum is a piece of information, an assertion represented as a triple.
Or no, maybe it's just an individual? We don't want to be reifying triples.
|
Just to add to this: we describe any data that directly supports the ontology as "curated" data. Do with that as you will. |
We can do the same thing here that you propose with countries and country codes: language codes are reference data, but the languages themselves are not. Languages have internal properties, relationships to the people and countries that speak them, etc. (In fact, the Tibco and Atlan references list language codes, but not languages, as reference data.) Starburst includes customer segments and pricing as reference data; it's not clear that all uses would consider these unstructured categories. Looking at the examples that recur in all the sources, we have things like postal codes, country codes, language codes, industry codes, etc. These can be modeled as identifiers. Not all categories that we define are reference data. So the concept still doesn't seem well-defined enough to constitute a class. |
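Keeping codes (reference data) separate from the things they identify can be sketched like this. It is a minimal sketch that assumes codes are modeled as identifiers tied to a code list, as suggested above; the class and function names are hypothetical, not gist terms, and countries appear only as codes to keep the example short.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Code:
    """Reference data: a value drawn from a published code list."""
    scheme: str  # the code list, e.g. "ISO 639-1" or "ISO 3166-1 alpha-3"
    value: str


@dataclass
class Language:
    """Not reference data: a real-world thing with its own properties."""
    name: str
    codes: set = field(default_factory=set)      # codes that identify this language
    spoken_in: set = field(default_factory=set)  # countries, referenced by their codes


LANGUAGES = [
    Language("Ukrainian", {Code("ISO 639-1", "uk")}, {Code("ISO 3166-1 alpha-3", "UKR")}),
    Language("English", {Code("ISO 639-1", "en")},
             {Code("ISO 3166-1 alpha-3", "GBR"), Code("ISO 3166-1 alpha-3", "USA")}),
]


def find_by_code(languages, scheme, value):
    """Resolve a code to the language it identifies, or None if unknown."""
    wanted = Code(scheme, value)
    return next((lang for lang in languages if wanted in lang.codes), None)


def spoken_in(languages, country_code):
    """Names of languages recorded as spoken in the country with this code."""
    wanted = Code("ISO 3166-1 alpha-3", country_code)
    return sorted(lang.name for lang in languages if wanted in lang.spoken_in)
```

Because the language is its own object, facts such as which countries speak it hang off the language rather than the code, and one language can carry codes from several schemes.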
Reopening to discuss additional comments from Dave. |
gist dev team meeting 4/24: Rebecca: Distinguish codes from the things they identify - e.g., language codes vs languages, country codes vs countries, etc. Pel: Data is reference data relative to how it's used, not in and of itself. Relativizing things seems like the role of a predicate rather than a class. Dave: The idea that some people's reference data is someone else's data is interesting but not widely accepted. Master data is vendors, customers, products, etc. - different from reference data. Rebecca: A client could define a class as a subclass of Jamie: Schema metadata argument not relevant to reference data, because it's not reference data. Rebecca: Units of measure are definitely reference data. Dan: Reference data feels like an artifact - a class that we add as an organizing, covering class, and then decide that's not the way we want to organize things. Michael: Would call Aspect SchemaMetadata. Straw poll: Define |
Definition from Dave:
We may want to add something to the definition about the fact that reference data is frequently (though not always) a standard across an industry or domain and shared across organizations. |
gist:Aspect is a subclass of gist:ReferenceData.
gist:UnitOfMeasure is a subclass of gist:ReferenceData.
gist:Language is a subclass of gist:ReferenceData.
gist:Category is a subclass of gist:ReferenceData.
gist:Organization is disjoint with gist:UnitOfMeasure. This should be changed to gist:ReferenceData, presumably.
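The "presumably" can be checked against the standard OWL meaning of disjointness: if two classes are disjoint, every subclass of one is disjoint with every subclass of the other. The sketch below is plain Python, not an OWL reasoner; it assumes the four subclass axioms listed above and replaces the existing disjointness with a single axiom between gist:Organization and gist:ReferenceData. The helper names are hypothetical.

```python
# Proposed axioms from the comment above, as plain strings.
SUBCLASS_OF = {
    "gist:Aspect": "gist:ReferenceData",
    "gist:UnitOfMeasure": "gist:ReferenceData",
    "gist:Language": "gist:ReferenceData",
    "gist:Category": "gist:ReferenceData",
}
DISJOINT = {("gist:Organization", "gist:ReferenceData")}


def superclasses(cls):
    """cls plus all of its ancestors (assumes single inheritance, no cycles)."""
    result = [cls]
    while cls in SUBCLASS_OF:
        cls = SUBCLASS_OF[cls]
        result.append(cls)
    return result


def entailed_disjoint(a, b):
    """True if some ancestor of a is declared disjoint with some ancestor of b."""
    return any(
        (x, y) in DISJOINT or (y, x) in DISJOINT
        for x in superclasses(a)
        for y in superclasses(b)
    )


print(entailed_disjoint("gist:Organization", "gist:UnitOfMeasure"))  # True
print(entailed_disjoint("gist:Aspect", "gist:Language"))             # False
```

So the single axiom would keep the existing Organization/UnitOfMeasure disjointness as an entailment and extend it to Aspect, Language, and Category.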