
Definition
The PII test detects and validates the presence of personally identifiable information (PII) in your data. It supports a broad range of PII types, including financial information, government identifiers, contact details, and location data, across multiple countries and regions. You can check for one or more PII types and set a threshold on either the absolute count or the percentage of rows that contain PII.
Taxonomy
- Task types: LLM, tabular classification, tabular regression, text classification.
- Availability: and .
Why it matters
- Data privacy compliance: Helps you verify that your data meets privacy regulations such as GDPR, CCPA, and other data protection laws.
- Security: Helps prevent accidental exposure of sensitive personal information.
- Model safety: LLMs can memorize PII from their training data and leak it in their outputs.
- Audit trail: Documents PII detection results for compliance reporting.
Supported PII types
General PII types
Type | Description |
---|---|
CREDIT_CARD | Credit card numbers (various formats) |
EMAIL_ADDRESS | Email addresses |
PHONE_NUMBER | Phone numbers (various formats) |
IP_ADDRESS | IP addresses |
URL | Web URLs |
DATE_TIME | Date and time information |
LOCATION | Geographic locations |
PERSON | Person names |
CRYPTO | Cryptocurrency addresses |
MEDICAL_LICENSE | Medical license numbers |
NRP | Nationality, religious, or political group |
IBAN_CODE | International Bank Account Numbers |
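Structured identifiers such as CREDIT_CARD are usually validated, not only pattern-matched, to reduce false positives. This page does not say which checks the test runs, so the following is an illustration only: a minimal Python sketch of the Luhn checksum that payment card numbers carry. The function name `luhn_valid` is hypothetical, and the sketch covers only the checksum step (no length or issuer-prefix checks).

```python
def luhn_valid(number: str) -> bool:
    """Return True if the digits in `number` pass the Luhn checksum.

    Spaces and hyphens are ignored so that common card formats such as
    "4111 1111 1111 1111" or "4111-1111-1111-1111" are accepted.
    Any other non-digit character makes the input invalid.
    """
    digits = [c for c in number if c not in " -"]
    if not digits or any(c not in "0123456789" for c in digits):
        return False
    total = 0
    # Walk from the rightmost digit and double every second digit.
    for i, ch in enumerate(reversed(digits)):
        d = int(ch)
        if i % 2 == 1:
            d *= 2
            if d > 9:
                d -= 9  # same as summing the two digits of d
        total += d
    return total % 10 == 0


print(luhn_valid("4111 1111 1111 1111"))  # True
print(luhn_valid("4111 1111 1111 1112"))  # False
```

A checksum like this lets a detector reject random 16-digit strings that merely look like card numbers.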
United States
Type | Description |
---|---|
US_SSN | Social Security Numbers |
US_BANK_NUMBER | US bank account numbers |
US_DRIVER_LICENSE | US driver’s license numbers |
US_ITIN | Individual Taxpayer Identification Numbers |
US_PASSPORT | US passport numbers |
United Kingdom
Type | Description |
---|---|
UK_NHS | National Health Service numbers |
UK_NINO | National Insurance numbers |
European Union
Type | Description |
---|---|
ES_NIF | Spanish tax identification numbers |
ES_NIE | Spanish foreigner identification numbers |
IT_FISCAL_CODE | Italian tax codes |
IT_DRIVER_LICENSE | Italian driver’s licenses |
IT_VAT_CODE | Italian VAT codes |
IT_PASSPORT | Italian passport numbers |
IT_IDENTITY_CARD | Italian identity cards |
FI_PERSONAL_IDENTITY_CODE | Finnish personal identity codes |
PL_PESEL | Polish personal identification numbers |
Asia-Pacific
Type | Description |
---|---|
SG_NRIC_FIN | Singapore NRIC/FIN numbers |
SG_UEN | Singapore Unique Entity Numbers |
AU_ABN | Australian Business Numbers |
AU_ACN | Australian Company Numbers |
AU_TFN | Australian Tax File Numbers |
AU_MEDICARE | Australian Medicare numbers |
IN_PAN | Indian Permanent Account Numbers |
IN_AADHAAR | Indian Aadhaar numbers |
IN_VEHICLE_REGISTRATION | Indian vehicle registration numbers |
IN_VOTER | Indian voter ID numbers |
IN_PASSPORT | Indian passport numbers |
South America
Type | Description |
---|---|
BR_CPF | Brazilian individual taxpayer registry (CPF) numbers |
BR_CNPJ | Brazilian national registry of legal entities (CNPJ) numbers |
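To make the two threshold modes from the definition concrete, here is a minimal Python sketch. It assumes that each row has already been scanned and tagged with the set of PII type names found in it (using the type names from the tables above), that a row counts as containing PII if any selected type was found, and that the test passes when the observed value is at or below the threshold. The function names `rows_with_pii`, `pii_percentage`, and `passes_threshold` are hypothetical and are not part of the product's API or the tests.json schema.

```python
from typing import Iterable, Optional, Sequence, Set


def rows_with_pii(detections: Sequence[Set[str]], selected_types: Iterable[str]) -> int:
    """Count rows in which at least one of the selected PII types was detected."""
    wanted = set(selected_types)
    return sum(1 for found in detections if found & wanted)


def pii_percentage(detections: Sequence[Set[str]], selected_types: Iterable[str]) -> float:
    """Return the percentage (0-100) of rows containing a selected PII type."""
    if not detections:
        return 0.0  # an empty dataset has no rows with PII
    return 100.0 * rows_with_pii(detections, selected_types) / len(detections)


def passes_threshold(
    detections: Sequence[Set[str]],
    selected_types: Iterable[str],
    *,
    max_count: Optional[int] = None,
    max_percentage: Optional[float] = None,
) -> bool:
    """Hypothetical pass rule: the observed value must not exceed the threshold.

    Exactly one of max_count (absolute rows) or max_percentage must be given.
    """
    if (max_count is None) == (max_percentage is None):
        raise ValueError("set exactly one of max_count or max_percentage")
    if max_count is not None:
        return rows_with_pii(detections, selected_types) <= max_count
    return pii_percentage(detections, selected_types) <= max_percentage


# Example: four rows, each tagged with the PII types found in it.
rows = [{"EMAIL_ADDRESS"}, set(), {"PERSON", "PHONE_NUMBER"}, {"US_SSN"}]
selected = ["EMAIL_ADDRESS", "US_SSN"]
print(rows_with_pii(rows, selected))                          # 2
print(pii_percentage(rows, selected))                         # 50.0
print(passes_threshold(rows, selected, max_count=0))          # False
print(passes_threshold(rows, selected, max_percentage=50.0))  # True
```

Under these assumptions, a count threshold of 0 forbids any row with the selected types, while a percentage threshold tolerates a fixed share of affected rows regardless of dataset size.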
Test configuration examples
If you are writing a tests.json file, here are a few valid configurations for the PII test: