User PII Cleanup

Overview

The user deletion requirement in inQuiry originates from the following documents:

PRD: [PRD] Delete Account functionality

BE Design Lern - [Design] Delete Account Functionality

FE Design Lern - [Design] [Front-end]Delete User Functionality

What is changing?

A user can request deletion of their account in Sunbird. This requires two primary actions:

  1. The user's Personally Identifiable Information (PII) needs to be removed.

  2. The assets created by this user (such as questions, questionSets, content, etc.) need to be transferred to a designated user.

Changes for Learn:

  1. The Learn BB provides a user delete API, which produces a Kafka event on the <env>.delete.user topic.

  2. For more details on the delete user API, please visit

Changes for inQuiry:

  1. The PII data cleanup feature was released in inQuiry 7.0.0.

  2. inQuiry provides a Flink job, user-pii-data-updater, for the user PII data cleanup.

  3. The Flink job listens to the <env>.delete.user Kafka topic and processes each event it receives.

  4. The job works for all object types configured for it (including Question, QuestionSet, Content, Collection, and Assets).

  5. The Flink job searches for all objects (of the configured types) owned by the deleted user and updates the PII fields configured in each object schema (e.g. creator) to the replacement value configured for the job (e.g. "Deleted User" or "Anonymous user").

  6. The PII field configuration is part of each object schema (config.json) because each object type can have different PII fields.

  7. A sample PII configuration is shown below:

    Ref: https://github.com/Sunbird-inQuiry/inquiry-api-service/blob/a0352eb2dfa6ccc4433dc15c44610db286deb12e/schemas/question/1.0/config.json#L58

    "PII_Fields": {
      "user": { "createdBy": ["creator"] },
      "org": { }
    }

  8. The job sends a notification email to the org admin listing the identifiers of all objects affected by the user's deletion.
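The replacement step described in items 5 to 7 can be sketched in Python as below. This is a minimal, hypothetical sketch, not the job's actual implementation: it assumes each key under "user" in PII_Fields (here createdBy) names the field that holds the owner's user ID, and that the listed fields (here creator) are overwritten with the replacement value when that owner matches the deleted user. The function name scrub_user_pii and the sample identifiers are illustrative only.

```python
from typing import Any, Dict, List

# Sample PII_Fields config, copied from the schema snippet above.
PII_FIELDS: Dict[str, Dict[str, List[str]]] = {
    "user": {"createdBy": ["creator"]},
    "org": {},
}


def scrub_user_pii(
    metadata: Dict[str, Any],
    deleted_user_id: str,
    replacement: str,
    pii_fields: Dict[str, Dict[str, List[str]]] = PII_FIELDS,
) -> Dict[str, Any]:
    """Return a copy of `metadata` with the configured user PII fields replaced.

    Assumption (hypothetical, not the real job's logic): each key under "user"
    names an ownership field storing a user ID, and its list names the PII
    fields to overwrite when that ownership field matches the deleted user.
    """
    updated = dict(metadata)  # leave the caller's object untouched
    for owner_field, fields_to_replace in pii_fields.get("user", {}).items():
        if metadata.get(owner_field) != deleted_user_id:
            continue  # this object is not owned by the deleted user
        for field in fields_to_replace:
            if field in updated:
                updated[field] = replacement
    return updated


question = {
    "identifier": "do_123",  # hypothetical identifier
    "createdBy": "user-42",
    "creator": "Jane Doe",
}
print(scrub_user_pii(question, "user-42", "Deleted User"))
# -> {'identifier': 'do_123', 'createdBy': 'user-42', 'creator': 'Deleted User'}
```

Keeping the field mapping in data rather than code mirrors the reason given in item 6: each object type can declare its own PII fields in its schema without changing the job.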

Release Tags:

Variables added to the user-pii-data-updater Flink job for the PII cleanup use case:

Code And Configuration Changes:

If you want to adopt this feature by making code changes in your fork of the Sunbird repositories, review the code and configuration changes below.

Code Changes: For the user-pii-data-updater Flink job code, see the link below:

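Make the following changes in ansible/roles/inquiry-setup-kafka/defaults/main.yml to create the Kafka topic needed by the user-pii-data-updater job. The first entry sets the topic's partition count and replication factor; the second sets its retention time: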

- name: delete.user
  num_of_partitions: 1
  replication_factor: 1

- name: delete.user
  retention_time: 172800000  # 48 hours, in milliseconds
  replication_factor: 1

If the above topic already exists, you can skip this change.

Configuration Changes: Add the following configuration to the kubernetes/helm_charts/datapipeline_jobs/values.j2 file:

For the variables used in this configuration, refer to the Variables section above.

user-pii-data-updater:
  user-pii-data-updater: |+
    include file("/data/flink/conf/base-config.conf")
    kafka {
      input.topic = "{{ user_pii_updater_kafka_topic_name }}"
      groupId = "{{ user_pii_updater_group }}"
    }
    task {
      consumer.parallelism = 1
      parallelism = 1
      router.parallelism = 1
    }
    target_object_types={{ user_pii_target_object_types }}
    user_pii_replacement_value="{{ user_pii_replacement_value }}"
    admin_email_notification_enable={{ enable_admin_email_notification | default('true') }}
    userorg_service_base_url="{{ user_org_service_base_url }}"
    notification {
      email {
        subject: "{{ email_notification_subject }}",
        regards: "{{ email_notification_regards }}"
      }
    }
  flink-conf: |+
    jobmanager.memory.flink.size: {{ flink_job_names['user-pii-data-updater'].jobmanager_memory }}
    taskmanager.memory.flink.size: {{ flink_job_names['user-pii-data-updater'].taskmanager_memory }}
    taskmanager.numberOfTaskSlots: {{ flink_job_names['user-pii-data-updater'].taskslots }}
    parallelism.default: 1
    jobmanager.execution.failover-strategy: region
    taskmanager.memory.network.fraction: 0.1
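The notification step (item 8 under "Changes for inQuiry") together with the admin_email_notification_enable, subject, and regards settings above can be illustrated with a small Python sketch. It is hypothetical: the function name build_admin_notification, the payload shape, and the email wording are assumptions, and the real job's email template may differ. The enabled flag stands in for admin_email_notification_enable, which defaults to true in this configuration.

```python
from typing import Dict, List, Optional


def build_admin_notification(
    affected_ids: List[str],
    subject: str,
    regards: str,
    enabled: bool = True,
) -> Optional[Dict[str, str]]:
    """Assemble an illustrative admin notification email.

    Assumptions (not the job's real template): the email lists every affected
    identifier, and nothing is produced when notifications are disabled or
    when no objects were affected.
    """
    if not enabled or not affected_ids:
        return None
    body_lines = [
        "Hello,",
        "",
        "The following objects were updated because their creator's account was deleted:",
        *(f"  - {identifier}" for identifier in affected_ids),
        "",
        "Regards,",
        regards,
    ]
    return {"subject": subject, "body": "\n".join(body_lines)}


message = build_admin_notification(
    ["do_123", "do_456"],  # hypothetical identifiers
    subject="User Account Deletion Notification",
    regards="Team",
)
print(message["subject"])  # -> User Account Deletion Notification
print(message["body"])
```

Passing the subject and regards in as parameters keeps them configurable per deployment, in the same way the values.j2 block reads them from email_notification_subject and email_notification_regards.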

Add the following configuration to the kubernetes/ansible/roles/flink-jobs-deploy/defaults/main.yml file:

flink_job_names:
  user-pii-data-updater:
    job_class_name: 'org.sunbird.job.user.pii.updater.task.UserPiiUpdaterStreamTask'
    replica: 1
    jobmanager_memory: 2048m
    taskmanager_memory: 2048m
    taskslots: 1
    cpu_requests: 0.3


### user-pii-data-updater config
user_pii_updater_kafka_topic_name: "{{ env_name }}.delete.user"
user_pii_updater_group: "{{ env_name }}-user-pii-updater-group"
user_pii_target_object_types: '{
  "Question": ["1.0", "1.1"],
  "QuestionSet": ["1.0", "1.1"]
}'
user_pii_replacement_value: "Deleted User"
user_org_service_base_url: "http://{{private_ingressgateway_ip}}/userorg"
email_notification_subject: "User Account Deletion Notification"
email_notification_regards: "Team"
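The user_pii_target_object_types value is a JSON map from object type to a list of versions. Assuming those versions refer to object schema versions (as in the schemas/question/1.0/config.json path referenced earlier), the Python sketch below shows one way to expand the map into the (object type, version) pairs whose schemas might be consulted for PII fields. The function name expand_targets is hypothetical and this is not the job's actual parsing code.

```python
import json
from typing import List, Tuple

# Value of user_pii_target_object_types from the variables above. YAML folds
# the single-quoted multi-line string onto one line, which is still valid JSON.
TARGET_OBJECT_TYPES = '{ "Question": ["1.0", "1.1"], "QuestionSet": ["1.0", "1.1"] }'


def expand_targets(raw: str) -> List[Tuple[str, str]]:
    """Expand an {objectType: [versions]} JSON map into (objectType, version) pairs."""
    mapping = json.loads(raw)
    return [
        (object_type, version)
        for object_type, versions in mapping.items()
        for version in versions
    ]


print(expand_targets(TARGET_OBJECT_TYPES))
# -> [('Question', '1.0'), ('Question', '1.1'), ('QuestionSet', '1.0'), ('QuestionSet', '1.1')]
```

Under this reading, adding another object type as a new key in the map, with its schema versions, would simply yield additional pairs.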
