Identifying Personal Data Elements

How Privado identifies and classifies personal data elements

Written by Nikhil Kukade
Updated this week

Privado identifies over 200 personal data elements out of the box to support compliance with regulations such as GDPR, CPRA, and HIPAA, and with frameworks such as PCI-DSS. Most modern privacy laws define personal data as any data that can be linked, directly or indirectly, to a user. Common types of data elements that Privado identifies include:

  • Personally Identifiable Information (PII): Names, email addresses, national identification numbers (SSN in the US, Aadhaar number in India, etc.)

  • Online Identifiers: Cookies, IP addresses, MAC addresses, device IDs (IDFA, Android IDs)

  • Sensitive Data: Health data, credit card data, sexual preferences, etc.

Data Elements in Privado

Data elements in Privado have the following attributes:

  1. Data Element: Name of the data element identified by Privado

  2. Category: A grouping of related data elements in Privado. Categories are helpful when you need to inform end users or regulators about the types of personal data your products and apps process

  3. Sensitivity: Determines the risk level for the data element as high, medium or low. Sensitivity allows companies to reflect the business risk of processing personal data across code repositories.

These values are configurable for each account and can be changed based on your internal data dictionary as well.
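
The three attributes above, and the per-account overrides, can be pictured as a small record type. This is only an illustrative sketch: the class names, the `apply_overrides` helper, and the sample elements and categories are hypothetical and do not come from Privado's actual catalogue or API.

```python
from dataclasses import dataclass, replace
from enum import Enum


class Sensitivity(Enum):
    HIGH = "high"
    MEDIUM = "medium"
    LOW = "low"


@dataclass(frozen=True)
class DataElement:
    name: str                 # "Data Element" attribute
    category: str             # grouping shown to end users or regulators
    sensitivity: Sensitivity  # business risk level


# Hypothetical defaults for illustration; not Privado's real values.
DEFAULTS = {
    "Email Address": DataElement("Email Address", "Contact Data", Sensitivity.MEDIUM),
    "Credit Card Number": DataElement("Credit Card Number", "Financial Data", Sensitivity.HIGH),
}


def apply_overrides(defaults, overrides):
    """Return a new inventory with per-account attribute overrides applied."""
    return {
        name: replace(element, **overrides.get(name, {}))
        for name, element in defaults.items()
    }


# An account whose internal data dictionary treats email addresses as high risk.
account = apply_overrides(DEFAULTS, {"Email Address": {"sensitivity": Sensitivity.HIGH}})
```

Keeping the defaults immutable and producing a new inventory per account means one account's changes never leak into another's.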

How Privado Builds a Personal Data Inventory

Privado scans the code and tags variables, classes, objects & functions processing personal data. To tag variables as personal data, Privado runs the following heuristics:

  1. Regex: In the first pass, Privado uses a library of regex rules to tag and classify personal data. The tagger takes code semantics into account: it tags variables, classes, and objects, but not enums. The base regex rules are open-source and maintained here

  2. ML Models: Next, Privado applies machine learning models trained on millions of lines of open-source code. The models remove some of the variables tagged in the regex pass and tag additional variables that did not match any rule. Think of this as fuzzy matching: developers do not write perfect code, but Privado can still tag these variables.

  3. Dynamic Rules: As Privado is rolled out, developers give feedback on findings by marking them Confirmed, False Positive, or Non-Personal. Based on this feedback, Privado automatically adjusts its base rules.
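
To make the first pass more concrete, here is a minimal sketch of regex-based tagging that skips enums. The rule patterns, the `TAGGABLE_KINDS` set, and the `tag_identifiers` function are assumptions made for illustration; they are not taken from Privado's open-source rule library, and the ML and dynamic-rule passes are not modeled.

```python
import re

# Hypothetical rules for illustration only.
RULES = {
    "Email Address": re.compile(r"e[-_]?mail", re.IGNORECASE),
    "Phone Number": re.compile(r"phone|mobile[-_]?(no|num)", re.IGNORECASE),
    "IP Address": re.compile(r"ip[-_]?addr", re.IGNORECASE),
}

# Kinds of identifiers the tagger considers; enums are deliberately excluded.
TAGGABLE_KINDS = {"variable", "class", "object", "function"}


def tag_identifiers(identifiers):
    """Map each taggable identifier name to the first data element whose rule matches.

    identifiers: iterable of (name, kind) pairs extracted from source code.
    """
    tagged = {}
    for name, kind in identifiers:
        if kind not in TAGGABLE_KINDS:
            continue
        for element, pattern in RULES.items():
            if pattern.search(name):
                tagged[name] = element
                break
    return tagged


found = tag_identifiers([
    ("userEmail", "variable"),
    ("EMAIL_TYPE", "enum"),          # skipped: enums are not tagged
    ("client_ip_addr", "variable"),
    ("retryCount", "variable"),      # no rule matches
])
```

A name-only pass like this produces both misses and false hits, which is why the later ML and feedback steps exist.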

Controls for Managing Data Element Identification

Privado offers the following controls for identified data elements:

  • Confirmed: Correct finding

  • False Positive: Incorrect finding by Privado; it is removed from the dashboard

  • Non-Personal: Not an individual's personal data. For example, a company's phone number detected by Privado is Non-Personal

Based on the feedback provided through these controls, Privado improves with each scan, reducing false positives over time and improving the accuracy of privacy code scanning.
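
As a rough illustration of how these controls might affect what reviewers see, the sketch below filters scan findings by their feedback status. The `Feedback` enum, the `dashboard_findings` function, and the sample findings are hypothetical. It also assumes that Non-Personal findings stay visible, which the description above does not specify.

```python
from enum import Enum


class Feedback(Enum):
    CONFIRMED = "Confirmed"
    FALSE_POSITIVE = "False Positive"
    NON_PERSONAL = "Non-Personal"


def dashboard_findings(findings, feedback):
    """Drop findings that reviewers marked as False Positive.

    findings: list of (identifier, data_element) tuples from a scan.
    feedback: dict mapping a finding tuple to a Feedback value.
    Findings without feedback, and Non-Personal ones (an assumption), are kept.
    """
    return [f for f in findings if feedback.get(f) is not Feedback.FALSE_POSITIVE]


scan = [
    ("userEmail", "Email Address"),
    ("emailTemplate", "Email Address"),
    ("supportPhone", "Phone Number"),
]
review = {
    ("emailTemplate", "Email Address"): Feedback.FALSE_POSITIVE,
    ("supportPhone", "Phone Number"): Feedback.NON_PERSONAL,
}
visible = dashboard_findings(scan, review)
```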

Custom Data Elements

Beyond the out-of-the-box data elements offered by Privado, companies can add data elements specific to their business. Privado allows you to define the name, category, sensitivity, and rules for these custom data elements. Get in touch with your Account Manager to enable custom data elements for your account.
