Know Your Data

In our blog post dated 12th May 2021, we gave an overview of the Microsoft Information Protection solution and noted that the first step is to discover sensitive data and classify it.

In this post, we look at some methods of data classification.

Identifying and classifying sensitive items that are under your organization's control is the first step in the Information Protection process. Microsoft 365 provides three ways of identifying items so that they can be classified:

1) Manually by users

This method requires human judgement and action. An admin can either use the pre-existing labels and sensitive information types or create new ones, and then publish them. Users and admins then apply them to content as they encounter or create it. Once the content is labeled, you can protect and manage it.

2) Automated pattern recognition, like sensitive information types

This category of classification mechanisms includes finding content by:

  • Keywords or metadata values (keyword query language).
  • Recognizing an item because it’s a variation on a template (document fingerprinting). Currently, this detection method is available in Exchange Online only.
  • Using the presence of exact strings (exact data match, or EDM). Please note that EDM is available only with the following licenses: Office 365 E5; Microsoft 365 E5; Microsoft 365 E5 Compliance; Microsoft 365 E5/A5 Information Protection and Governance.
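To make the exact data match idea more concrete, here is a minimal Python sketch. It is not Microsoft's implementation. It assumes that known sensitive values are kept only as salted hashes and that content is scanned token by token. The salt, the sample employee IDs, and the function names are all hypothetical.

```python
import hashlib
import re

# Hypothetical salt, for illustration only.
SALT = b"example-salt"

def fingerprint(value: str) -> str:
    """Normalize a value (lowercase, letters and digits only) and hash it."""
    normalized = re.sub(r"[^0-9a-z]", "", value.lower())
    return hashlib.sha256(SALT + normalized.encode("utf-8")).hexdigest()

def build_index(sensitive_values):
    """Store only the hashes of the known sensitive values."""
    return {fingerprint(v) for v in sensitive_values}

def find_exact_matches(text, index):
    """Return the tokens in text whose hash appears in the index."""
    tokens = re.findall(r"[0-9A-Za-z][0-9A-Za-z\-]*", text)
    return [t for t in tokens if fingerprint(t) in index]

# Hypothetical employee IDs standing in for a table of sensitive records.
index = build_index(["EMP-004217", "EMP-009981"])
print(find_exact_matches("Please update the record for emp-004217 today.", index))
# prints ['emp-004217']
```

Because only hashes are stored, the index can be compared against content without holding the sensitive values themselves in readable form.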

Sensitivity and retention labels can then be automatically applied to make the content available for use in data loss prevention (DLP) and in auto-apply policies for retention labels.
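As a rough illustration of how a pattern-based detection can drive automatic labeling, the Python sketch below looks for card-number-like digit runs, confirms them with the Luhn checksum, and suggests a label. The regular expression, the function names, and the label names "Confidential" and "General" are assumptions made for this example. Real sensitive information types in Microsoft 365 also weigh supporting evidence such as nearby keywords and confidence levels.

```python
import re

# 13 to 19 digits, optionally separated by single spaces or hyphens.
CARD_PATTERN = re.compile(r"\b\d(?:[ -]?\d){12,18}\b")

def luhn_valid(digits: str) -> bool:
    """Return True if the digit string passes the Luhn checksum."""
    total = 0
    for i, ch in enumerate(reversed(digits)):
        d = int(ch)
        if i % 2 == 1:      # double every second digit from the right
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0

def find_card_numbers(text):
    """Return candidate card numbers (digits only) that pass the checksum."""
    found = []
    for match in CARD_PATTERN.finditer(text):
        digits = re.sub(r"\D", "", match.group())
        if luhn_valid(digits):
            found.append(digits)
    return found

def suggest_label(text):
    """Hypothetical rule: any valid card number makes the item Confidential."""
    return "Confidential" if find_card_numbers(text) else "General"

print(suggest_label("Card 4111 1111 1111 1111 on file"))  # prints Confidential
print(suggest_label("Order 12345 has shipped"))           # prints General
```

The checksum step is what keeps an arbitrary 16-digit number, such as an order reference, from being treated as a card number on the pattern alone.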

3) Machine learning or Classifiers

This method uses a trainable classifier, which learns to recognize a type of content from example items rather than from fixed patterns. When you publish the classifier, it sorts through items in locations like SharePoint Online, Exchange, and OneDrive, and classifies the content. After you publish the classifier, you can continue to train it using a feedback process that is similar to the initial training process.


Classifiers only work with items that are not encrypted and are in English.
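The train, publish, and feedback cycle can be sketched with a very small text classifier. The Python example below is a plain naive Bayes model written for illustration. It is not the algorithm Microsoft 365 uses, and the class name, category names, and sample texts are all hypothetical. It shows how the same training step can serve both the initial examples and later feedback.

```python
import math
import re
from collections import Counter, defaultdict

class TinyTextClassifier:
    """Multinomial naive Bayes over word counts, with add-one smoothing."""

    def __init__(self):
        self.word_counts = defaultdict(Counter)  # label -> word -> count
        self.doc_counts = Counter()              # label -> number of examples
        self.vocab = set()

    @staticmethod
    def tokenize(text):
        return re.findall(r"[a-z']+", text.lower())

    def train(self, text, label):
        """Add one labeled example: used for seeding and for later feedback."""
        words = self.tokenize(text)
        self.word_counts[label].update(words)
        self.doc_counts[label] += 1
        self.vocab.update(words)

    def predict(self, text):
        """Return the label with the highest log-probability for the text."""
        words = self.tokenize(text)
        total_docs = sum(self.doc_counts.values())
        best_label, best_score = None, -math.inf
        for label in self.doc_counts:
            score = math.log(self.doc_counts[label] / total_docs)
            total_words = sum(self.word_counts[label].values())
            for word in words:
                count = self.word_counts[label][word]
                score += math.log((count + 1) / (total_words + len(self.vocab)))
            if score > best_score:
                best_label, best_score = label, score
        return best_label

# Initial training with a few seed examples.
clf = TinyTextClassifier()
clf.train("invoice total amount due payment terms", "pricing")
clf.train("price quote for work order and bidding", "pricing")
clf.train("team lunch on friday bring snacks", "other")
clf.train("meeting notes and weekly agenda", "other")
print(clf.predict("please send the price quote and invoice"))  # prints pricing

# Feedback: a reviewer corrects a wrong prediction, and the model learns from it.
item = "agenda for bidding review meeting"
print(clf.predict(item))   # prints other
clf.train(item, "pricing")
print(clf.predict(item))   # prints pricing
```

A real trainable classifier needs far more examples than this, but the loop is the same: predict, review, and feed corrections back in.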

Pre-trained classifiers

Microsoft 365 comes with the following pre-trained classifiers:

  • Offensive language
  • Resumes
  • Source Code
  • Harassment
  • Profanity
  • Threat

Custom classifiers

When the pre-trained classifiers do not meet your needs, you can create and train your own. Creating your own involves significantly more work, but the result is much better tailored to your organization’s needs.

For example, you could create trainable Classifiers for:

  • Legal documents – such as attorney-client privilege, closing sets, statements of work.

  • Strategic business documents – like press releases, mergers and acquisitions, deals, business or marketing plans, intellectual property, patents, design docs.

  • Pricing information – like invoices, price quotes, work orders, bidding documents.

  • Financial information – such as organizational investments, quarterly or annual results.



Our Solutions