Data Discovery with Code scanning

How does Privado discover data elements?

Written by Nikhil Kukade
Updated over a week ago

Out of the box, Privado discovers over 200 data elements, covering regulations such as GDPR, CPRA, and HIPAA and frameworks such as PCI-DSS. Most modern privacy laws define personal data as any data that can be linked to an individual, directly or indirectly. Common types of data elements that Privado discovers include:

  • Identifiers or PII: Names, Email Addresses, National Identification Numbers (SSN in the US, Aadhaar Number in India, etc.)

  • Online Identifiers: Cookies, IP Addresses, MAC Addresses, Device IDs (IDFA, Android IDs)

  • Sensitive Data: Health Data, Card Data, Sexual Preferences, etc.

Data Elements in Privado

Data elements in Privado have the following attributes:

  1. Data Element: Name of the data element discovered by Privado

  2. Category: The group a data element belongs to. Categories are especially useful when you need to inform end users or regulators about the types of personal data your products or apps process.

  3. Sensitivity: The risk level of the data element: High, Medium, or Low. Sensitivity lets companies reflect the business risk of processing personal data across code repositories.

These values are configurable for each account, so you can change them to match your internal data dictionary.
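To make the three attributes and the per-account configuration concrete, here is a minimal Python sketch of how a data element and an account-level override might be modeled. The class names, the category labels ("Contact Data", "Financial Data"), the default sensitivities, and the `apply_account_overrides` helper are all hypothetical illustrations, not Privado's actual schema or API.

```python
from dataclasses import dataclass, replace
from enum import Enum


class Sensitivity(Enum):
    """The three sensitivity levels described above."""
    HIGH = "High"
    MEDIUM = "Medium"
    LOW = "Low"


@dataclass(frozen=True)
class DataElement:
    name: str                 # e.g. "Email Address"
    category: str             # grouping used when reporting to users or regulators
    sensitivity: Sensitivity  # business risk of processing this element


# Hypothetical defaults; the real catalogue and its values are defined by Privado.
DEFAULTS = {
    "Email Address": DataElement("Email Address", "Contact Data", Sensitivity.MEDIUM),
    "Card Number": DataElement("Card Number", "Financial Data", Sensitivity.HIGH),
}


def apply_account_overrides(defaults, overrides):
    """Return a per-account catalogue in which overrides replace default attributes.

    `overrides` maps a data element name to a dict of attribute changes,
    standing in for an internal data dictionary (hypothetical format).
    """
    catalogue = dict(defaults)
    for name, changes in overrides.items():
        if name in catalogue:
            # dataclasses.replace builds a new frozen instance; defaults stay untouched.
            catalogue[name] = replace(catalogue[name], **changes)
    return catalogue


account_catalogue = apply_account_overrides(
    DEFAULTS, {"Email Address": {"sensitivity": Sensitivity.HIGH}}
)
print(account_catalogue["Email Address"].sensitivity.value)  # High
```

Using a frozen dataclass with `dataclasses.replace` keeps the shared defaults unchanged, so each account gets its own modified copy of the catalogue.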

How Data Discovery Works in Privado

Privado scans your code and tags the variables, classes, objects, and functions that process personal data. To decide what to tag as personal data, Privado runs the following heuristics in sequence:

  1. Regex: In the first pass, Privado uses a library of regex rules to tag and classify personal data. This tagger takes code semantics into account: it tags variables, classes, and objects, but not enums. The base regex rules are open source and maintained here.

  2. ML Models: Next, Privado applies machine learning models trained on millions of lines of open-source code. These models remove some of the variables tagged in the regex pass and tag additional variables that did not match any rule. Think of this as fuzzy matching: developers do not write perfect code, but Privado can still tag these variables.

  3. Dynamic Rules: As Privado is rolled out, developers give feedback on findings by marking them Confirmed, False Positive, or Non-Personal. Based on this feedback, Privado automatically adjusts its base rules.
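The first two passes can be sketched in Python as follows. This is a simplified illustration under stated assumptions: the rule patterns, the `Symbol` type, and the vocabulary are hypothetical, and the second pass uses plain fuzzy string matching from the standard library's `difflib` as a stand-in for the ML models described above. It illustrates only the "tag additional variables" half of that stage, not the removal of regex false positives.

```python
import difflib
import re
from dataclasses import dataclass


@dataclass
class Symbol:
    name: str  # identifier as written in code
    kind: str  # "variable", "field", "class", "object", or "enum"


# Hypothetical rules; Privado's real regex library is larger and structured differently.
RULES = {
    "Email Address": re.compile(r"e[-_]?mail", re.IGNORECASE),
    "Phone Number": re.compile(r"phone|mobile[-_]?(no|number)", re.IGNORECASE),
    "IP Address": re.compile(r"ip[-_]?addr", re.IGNORECASE),
}

# Hypothetical vocabulary for the fuzzy pass: normalized tokens hinting at a data element.
VOCABULARY = {"email": "Email Address", "phone": "Phone Number"}

TAGGABLE_KINDS = {"variable", "field", "class", "object"}  # enums are skipped


def regex_pass(symbols):
    """First pass: tag symbols whose identifier matches a regex rule."""
    tags = {}
    for sym in symbols:
        if sym.kind not in TAGGABLE_KINDS:
            continue
        for element, pattern in RULES.items():
            if pattern.search(sym.name):
                tags[sym.name] = element
                break
    return tags


def split_identifier(name):
    """Split camelCase and snake_case identifiers into lowercase tokens."""
    spaced = re.sub(r"([a-z0-9])([A-Z])", r"\1 \2", name)
    return [t.lower() for t in re.split(r"[\s_\-]+", spaced) if t]


def fuzzy_pass(symbols, tags, cutoff=0.8):
    """Stand-in for the ML stage: tag misspelled identifiers the regex missed."""
    for sym in symbols:
        if sym.kind not in TAGGABLE_KINDS or sym.name in tags:
            continue
        for token in split_identifier(sym.name):
            match = difflib.get_close_matches(token, list(VOCABULARY), n=1, cutoff=cutoff)
            if match:
                tags[sym.name] = VOCABULARY[match[0]]
                break
    return tags


symbols = [
    Symbol("userEmail", "variable"),
    Symbol("customerPhne", "field"),      # typo the regex does not catch
    Symbol("EmailTemplateType", "enum"),  # enum, never tagged
    Symbol("retryCount", "variable"),     # not personal data
]
tags = fuzzy_pass(symbols, regex_pass(symbols))
print(tags)  # {'userEmail': 'Email Address', 'customerPhne': 'Phone Number'}
```

Splitting identifiers into tokens before fuzzy matching keeps the comparison focused on individual words, so a typo such as `Phne` can still score close to `phone` even when the full identifier is long.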

Controls for Managing Data Discovery

Privado offers the following controls for discovered data elements:

  • Confirmed: A correct finding.

  • False Positive: An incorrect discovery by Privado; it is removed from the dashboard.

  • Non-Personal: Data that is not an individual's personal data. For example, a company's phone number detected by Privado is Non-Personal.

Using the feedback provided through these controls, Privado improves with each scan, reducing false positives over time and improving the accuracy of the privacy code scanner.
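Here is a minimal sketch of how the three feedback states could be applied to scan results, assuming each finding is keyed by an (identifier, data element) pair. The `Feedback` enum, `apply_feedback`, and the threshold-based `suppression_list` are hypothetical. The suppression rule shows just one possible way accumulated feedback could adjust later scans; it is not Privado's implementation.

```python
from collections import Counter
from enum import Enum


class Feedback(Enum):
    CONFIRMED = "Confirmed"
    FALSE_POSITIVE = "False Positive"
    NON_PERSONAL = "Non-Personal"


def apply_feedback(findings, feedback):
    """Split one scan's findings into dashboard rows and non-personal rows.

    findings: list of (identifier, data_element) tuples from a scan.
    feedback: dict mapping (identifier, data_element) to a Feedback value.
    False positives are dropped; confirmed and unreviewed findings stay on the dashboard.
    """
    dashboard, non_personal = [], []
    for finding in findings:
        status = feedback.get(finding)
        if status is Feedback.FALSE_POSITIVE:
            continue
        if status is Feedback.NON_PERSONAL:
            non_personal.append(finding)
        else:
            dashboard.append(finding)
    return dashboard, non_personal


def suppression_list(feedback_history, threshold=2):
    """Hypothetical rule adjustment: suppress an (identifier, element) pair in
    future scans once it has been marked False Positive `threshold` times."""
    counts = Counter(
        key for key, status in feedback_history if status is Feedback.FALSE_POSITIVE
    )
    return {key for key, n in counts.items() if n >= threshold}


findings = [
    ("userEmail", "Email Address"),
    ("supportPhone", "Phone Number"),
    ("emailTemplate", "Email Address"),
]
feedback = {
    ("supportPhone", "Phone Number"): Feedback.NON_PERSONAL,  # company's phone number
    ("emailTemplate", "Email Address"): Feedback.FALSE_POSITIVE,
}
dashboard, non_personal = apply_feedback(findings, feedback)
print(dashboard)     # [('userEmail', 'Email Address')]
print(non_personal)  # [('supportPhone', 'Phone Number')]

history = [(("emailTemplate", "Email Address"), Feedback.FALSE_POSITIVE)] * 2
print(suppression_list(history))  # {('emailTemplate', 'Email Address')}
```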

Custom Data Elements

Beyond the out-of-the-box data elements offered by Privado, companies can add custom data elements specific to their business. Privado lets you define the name, category, sensitivity, and rules for these custom data elements. Contact your Account Manager to enable custom data elements for your account.
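The four properties of a custom data element (name, category, sensitivity, and rules) can be sketched as a small Python class. The `CustomDataElement` class, the regex-based rule format, and the example "Loyalty Member ID" element in the "Account Data" category are hypothetical; the format Privado actually uses for custom rules may differ.

```python
import re
from dataclasses import dataclass, field

SENSITIVITY_LEVELS = ("High", "Medium", "Low")


@dataclass
class CustomDataElement:
    name: str
    category: str
    sensitivity: str
    rules: list = field(default_factory=list)  # regex patterns matched against identifiers

    def __post_init__(self):
        if self.sensitivity not in SENSITIVITY_LEVELS:
            raise ValueError(f"sensitivity must be one of {SENSITIVITY_LEVELS}")
        # Compile once so an invalid pattern fails when the element is defined.
        self._compiled = [re.compile(p, re.IGNORECASE) for p in self.rules]

    def matches(self, identifier):
        """Return True if any rule matches the identifier."""
        return any(p.search(identifier) for p in self._compiled)


loyalty_id = CustomDataElement(
    name="Loyalty Member ID",
    category="Account Data",
    sensitivity="Medium",
    rules=[r"loyalty[-_]?(member)?[-_]?id"],
)
print(loyalty_id.matches("loyaltyMemberId"))  # True
print(loyalty_id.matches("orderId"))          # False
```

Validating the sensitivity value and compiling the patterns up front surfaces configuration mistakes when the element is defined rather than during a scan.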
