Data Model
Introduction
Creating a data model is the first step in customizing Alfresco.
Alfresco ships with predefined, general-purpose data models. For details, see contentModel.xml in the Alfresco source code.
A simple extension may therefore not require a new data model. Even so, understanding data models helps you understand Alfresco's concepts and underlying design, which is very useful for custom development.
Business-specific extensions are different. Take archive management as an example: if, after uploading archive files, you want to manage additional attributes such as archive number, archiving personnel, and validity period, you need to create a new data model that defines them.
Modeling Basics
Data models define the data stored in the repository. Data models are critical. Without them, Alfresco is just a file system. The following is a list of key information that data models provide for Alfresco:
- Fundamental data types: the basic data types, such as "String" and "Date", and how they are saved to the database.
- Higher-order data types: types such as "content" and "folder".
- Aspects: groups of properties, such as "auditable" and "classifiable".
- Properties: the properties (or metadata) specific to each data type.
- Constraints: restrictions on property values (for example, a value must match a specific pattern or come from a specific list of possible values).
- Index: how content is indexed for searching.
- Associations: relationships between nodes.
Data models are built using a small set of building blocks: Types, Properties, Property types, Constraints, Associations, Aspects.
Types
Types are like types or classes in object-oriented programming. They can be used to model business objects. They have properties and can inherit from parent types. "Content", "Person", and "Folder" are three important existing types.
You can customize types according to business needs. Examples include things like "expense report", "medical record", "movie", "song", and "review".
Note that types, properties, constraints, associations, and aspects all have names. Through the use of namespaces specific to the model, names are unique throughout the repository. Namespaces have abbreviations. This tutorial assumes that we are implementing for a fictional company named SomeCo. Therefore, SomeCo might define a custom model that declares a namespace with URI "http://www.someco.com/model/content/1.0" and prefix "sc". The name of any type defined as part of that model will be prefixed with "sc:". Data models are actually defined using XML with specific namespaces and prefixes. When multiple data models are defined, using namespaces in this way helps prevent name conflicts.
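As a rough illustration of the namespace declaration described above, a model file might start as sketched below. Only the namespace URI and the "sc" prefix come from the text; the model name sc:somecomodel and the type sc:doc are hypothetical names chosen for this example.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Hypothetical model name; the URI and "sc" prefix are the ones used in this tutorial -->
<model name="sc:somecomodel" xmlns="http://www.alfresco.org/model/dictionary/1.0">
    <description>SomeCo example model</description>
    <version>1.0</version>

    <imports>
        <!-- Built-in property types such as d:text and d:date -->
        <import uri="http://www.alfresco.org/model/dictionary/1.0" prefix="d"/>
        <!-- Built-in content types such as cm:content and cm:folder -->
        <import uri="http://www.alfresco.org/model/content/1.0" prefix="cm"/>
    </imports>

    <namespaces>
        <!-- Every name defined in this model is prefixed with "sc:" -->
        <namespace uri="http://www.someco.com/model/content/1.0" prefix="sc"/>
    </namespaces>

    <types>
        <!-- Hypothetical type, shown only to illustrate the "sc:" prefix -->
        <type name="sc:doc">
            <title>SomeCo Document</title>
            <parent>cm:content</parent>
        </type>
    </types>
</model>
```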
Properties
Properties are pieces of metadata associated with a specific type. For example, properties of an expense report might include things like "employee name", "submission date", "project", "customer", "expense report number", "total amount", and "currency". The expense report may also contain a "content" property to hold the actual expense report file (for example, it might be a PDF or Excel spreadsheet).
Property Types
Property types (or data types) describe the basic data types that the repository will use to store properties. The basic types of properties mainly include:
- d:text Text
- d:mltext Multilingual text
- d:content Content (file)
- d:int Integer
- d:long Long integer
- d:float Floating point
- d:double Double precision floating point
- d:date Date
- d:datetime Date and time
- d:boolean Boolean
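To show how properties and property types fit together, here is a minimal sketch of the expense report example as a type definition. The type and property names (sc:expenseReport, sc:employeeName, and so on) are hypothetical, and the "sc" prefix is assumed to be declared as described in the Types section. The fragment belongs inside the model element of a model file.

```xml
<types>
    <!-- Hypothetical type; inherits the file content and standard metadata from cm:content -->
    <type name="sc:expenseReport">
        <title>Expense Report</title>
        <parent>cm:content</parent>
        <properties>
            <property name="sc:employeeName">
                <type>d:text</type>
                <mandatory>true</mandatory>
            </property>
            <property name="sc:submissionDate">
                <type>d:date</type>
            </property>
            <property name="sc:totalAmount">
                <type>d:double</type>
            </property>
            <property name="sc:reimbursed">
                <type>d:boolean</type>
                <default>false</default>
            </property>
        </properties>
    </type>
</types>
```

Because the type inherits from cm:content, it does not need to declare its own d:content property to hold the PDF or spreadsheet.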
Constraints
You can optionally use constraints to restrict values in properties. There are four commonly used constraint types: REGEX, LIST, MINMAX, and LENGTH.
- REGEX is used to ensure property values match a regular expression pattern
- LIST is used to define a list of possible values for a property
- MINMAX provides a numerical range for property values
- LENGTH sets limits on the length of strings
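The sketch below shows how LIST, LENGTH, and MINMAX constraints might be declared and then applied to a property by reference. The constraint names, the property name, and the allowed values are hypothetical; the "sc" prefix is assumed to be declared by the model.

```xml
<!-- Declared once, at model level -->
<constraints>
    <constraint name="sc:currencyList" type="LIST">
        <parameter name="allowedValues">
            <list>
                <value>USD</value>
                <value>EUR</value>
                <value>CNY</value>
            </list>
        </parameter>
    </constraint>
    <constraint name="sc:reportNumberLength" type="LENGTH">
        <parameter name="minLength"><value>1</value></parameter>
        <parameter name="maxLength"><value>20</value></parameter>
    </constraint>
    <constraint name="sc:amountRange" type="MINMAX">
        <parameter name="minValue"><value>0</value></parameter>
        <parameter name="maxValue"><value>100000</value></parameter>
    </constraint>
</constraints>

<!-- Applied inside a type or aspect definition by referencing the constraint name -->
<property name="sc:currency">
    <type>d:text</type>
    <constraints>
        <constraint ref="sc:currencyList"/>
    </constraints>
</property>
```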
Constraints can be defined once and reused throughout the model. For example, contentModel.xml in the Alfresco source code defines a constraint named cm:filename, which rejects filenames that contain special characters, end with a period, or end with a space. If a property in a custom type must follow the same filename rules, the custom model does not need to define the constraint again. It only needs to reference cm:filename.
```xml
<!-- contentModel.xml -->
<constraints>
    <constraint name="cm:filename" type="REGEX">
        <parameter name="expression"><value><![CDATA[(.*[\"\*\\\>\<\?\/\:\|]+.*)|(.*[\.]?.*[\.]+$)|(.*[ ]+$)]]></value></parameter>
        <parameter name="requiresMatch"><value>false</value></parameter>
    </constraint>
    <constraint name="cm:userNameConstraint" type="org.alfresco.repo.dictionary.constraint.UserNameConstraint" />
    <constraint name="cm:authorityNameConstraint" type="org.alfresco.repo.dictionary.constraint.AuthorityNameConstraint" />
    <constraint name="cm:storeSelectorConstraint" type="REGISTERED">
        <parameter name="registeredName"><value>defaultStoreSelector</value></parameter>
    </constraint>
</constraints>
```
Associations
Associations define relationships between types. For example, for a contract, the contract may be associated with other types (such as bidding documents, qualification materials, invoices, etc.).
Associations come in two forms:
- Parent-child association
- Peer (source-target) association
An existing association is cm:contains. This association defines the parent-child association relationship between folders and objects inside the folder (subfolders and files).
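The two forms can be sketched in a type definition as follows. The names sc:contract, sc:invoice, sc:relatedInvoices, and sc:attachments are hypothetical, and sc:invoice is assumed to be defined elsewhere in the same model.

```xml
<type name="sc:contract">
    <parent>cm:content</parent>
    <associations>
        <!-- Peer association: a contract may point at any number of invoices -->
        <association name="sc:relatedInvoices">
            <source>
                <mandatory>false</mandatory>
                <many>true</many>
            </source>
            <target>
                <class>sc:invoice</class>
                <mandatory>false</mandatory>
                <many>true</many>
            </target>
        </association>
        <!-- Parent-child association: attachments belong to exactly one contract -->
        <child-association name="sc:attachments">
            <source>
                <mandatory>false</mandatory>
                <many>false</many>
            </source>
            <target>
                <class>cm:content</class>
                <mandatory>false</mandatory>
                <many>true</many>
            </target>
        </child-association>
    </associations>
</type>
```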
Aspects
- Properties and Associations can be defined in Aspects
- Aspects can be bound to Types, so that types have the Properties and Associations defined in Aspects
- Aspects can be reused, that is, they can be bound to different Types at the same time
The following example illustrates the concept and role of Aspects.
To manage contract documents, we add a new type, cus:contract. It inherits from cm:content and therefore has the attributes of a general document (name, title, content, creator, creation time, etc.).
This is not enough, however. For contracts, we also need to manage two new attributes: contract number and valid date.
Without using Aspects, there are two approaches:
- Add the properties (contract number and valid date) to the parent type (cm:content). Every cm:content node, and every type that inherits from cm:content, then carries a contract number and a valid date. This is a poor choice because it produces redundant attributes: not all types need them.
- Add the properties (contract number and valid date) to the contract type (cus:contract). This is slightly better, but still not the best choice. Contract number is unique to contracts, but valid date is not. Other types, such as archives and system files, may also need a valid date.
Therefore, we need to use Aspects (property groups):
- Define the unique attribute contract number on the contract type (cus:contract).
- Define the common attribute valid date on a new Aspect, such as: com:valid.
- Aspects can be reused, so com:valid can be bound to needed types, such as: contracts, archives, system files, etc.
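A minimal sketch of this design is shown below. The prefixes cus and com and the names cus:contract and com:valid come from the example above; the namespace URIs and the property names cus:contractNumber and com:validDate are hypothetical. The fragment belongs inside the model element of a model file that imports the d and cm namespaces.

```xml
<namespaces>
    <!-- Hypothetical URIs for the "cus" and "com" prefixes -->
    <namespace uri="http://www.example.com/model/contract/1.0" prefix="cus"/>
    <namespace uri="http://www.example.com/model/common/1.0" prefix="com"/>
</namespaces>

<types>
    <type name="cus:contract">
        <title>Contract</title>
        <parent>cm:content</parent>
        <properties>
            <!-- Unique to contracts, so it lives on the type -->
            <property name="cus:contractNumber">
                <type>d:text</type>
            </property>
        </properties>
        <!-- Every contract automatically carries the com:valid aspect -->
        <mandatory-aspects>
            <aspect>com:valid</aspect>
        </mandatory-aspects>
    </type>
</types>

<aspects>
    <!-- Shared by contracts, archives, system files, and so on -->
    <aspect name="com:valid">
        <title>Valid</title>
        <properties>
            <property name="com:validDate">
                <type>d:date</type>
            </property>
        </properties>
    </aspect>
</aspects>
```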
Content Modeling Best Practices
The following are some best practices to consider when modeling content:
Do not modify existing data models
Try not to change existing data models, that is, do not directly modify the data model definition files in the source code.
Instead, extend them with your own custom data models. If you need to manage different kinds of content, create a type for each kind. Each type can inherit from cm:content or from a custom root content type.
Be conservative and only add known and confirmed properties
The set of properties in a data model should grow gradually, from early business research through to final implementation, and then stabilize. Do not define properties you are unsure about at the outset.
Once a data model is active, adding properties is easy but removing them is not. You may get "integrity errors", because a property you no longer want may already be in use on existing content. When this happens, the options are:
- Keep the old model
- Export the content, modify the ACP XML file, and re-import it
- Drop the Alfresco database tables, clear the data directory, and start over
During development this is not a big problem, as long as everyone on the team is aware of it. Once in production, be sure to establish rules and a standard process for handling data model changes.
Avoid unnecessary data model depth
A deep type hierarchy is likely to make the model harder to maintain later.
Make full use of Aspects
Besides potential performance and overhead savings, Aspects promote reuse across the model, the business logic, and the presentation layer.
When two or more content types have properties in common, consider extracting those properties into an Aspect.
Define types without their own properties and associations where appropriate
There are at least two situations in which it makes sense to define a type that inherits everything it needs from its parent type and adds no properties or associations of its own:
- To distinguish the subtype from its parent type in searches
- To attach specialized behavior to instances of the subtype
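Such a type can be as small as the sketch below. The name sc:whitepaper is hypothetical, and the "sc" prefix is assumed to be declared by the model.

```xml
<types>
    <!-- Adds no properties or associations; exists only so instances can be told apart from plain cm:content -->
    <type name="sc:whitepaper">
        <title>SomeCo Whitepaper</title>
        <parent>cm:content</parent>
    </type>
</types>
```

A search could then restrict results to this subtype with a type clause such as TYPE:"sc:whitepaper", assuming the model is deployed and the content has been indexed.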
Remember that folders are also a type
Like everything else in the model, folders are types, which means they can be extended.
When instances of a type may contain other nodes, you can model that type as a subtype of the folder type (cm:folder).
Don't be afraid to define multiple data model XML files
It makes sense to split data models across multiple namespaces and multiple XML files, for example by defining the contract and archive data models in separate XML files.
Give the XML files meaningful names. Avoid generic names such as "customModel.xml" or "myModel.xml".
Use existing data models
Before customizing models, it is best to first understand Alfresco's existing data models. This is not only helpful for understanding data models, but also avoids duplicate definitions.
Existing Data Models
The Alfresco source code is an indispensable reference tool. The following table describes several existing data model files.
| File | Namespace (relative to http://www.alfresco.org/) | Prefix | Description |
|---|---|---|---|
| dictionaryModel.xml | model/dictionary/1.0 | d | Defines basic data types |
| systemModel.xml | model/system/1.0 | sys | Defines system root types |
| systemModel.xml | system/registry/1.0 | reg | |
| systemModel.xml | system/modules/1.0 | module | |
| contentModel.xml | model/content/1.0 | cm | The most commonly used data model, defining basic types such as files and folders |
| contentModel.xml | model/exif/1.0 | exif | |
| contentModel.xml | model/audio/1.0 | audio | |
| contentModel.xml | model/webdav/1.0 | webdav | |
| bpmModel.xml | model/bpm/1.0 | bpm | Defines workflow-related data types |
| forumModel.xml | model/forum/1.0 | fm | Defines forum-related data types |
For brevity, approximately 25 other model files are omitted. For details, please refer to the source code.
In addition to the model files, the modelSchema.xsd file is also a good reference. As the name implies, it defines the XML vocabulary that Alfresco content model XML files must comply with.
