Matching App Processing


Preprocessing rules

  1. Stopword elimination, e.g. Govt, YYY
  2. Strip punctuation and leading/trailing whitespace
  3. Convert to lowercase
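A minimal sketch of these rules as a single normalization function. The stopword set is a hypothetical placeholder seeded only with "govt" from the example. Stopword elimination is applied last here, after punctuation stripping and lowercasing, so that variants like "Govt." and "GOVT" are also caught. That ordering is a design choice of this sketch, not something the rules above prescribe.

```python
import string

# Hypothetical stopword list: "govt" comes from the example above; YYY stands
# for further entries that still need to be decided.
STOPWORDS = frozenset({"govt"})

# Translation table that deletes every ASCII punctuation character.
_STRIP_PUNCT = str.maketrans("", "", string.punctuation)


def normalize_name(name, stopwords=STOPWORDS):
    """Normalize a school name before matching."""
    # Rule 2: strip punctuation ("Govt." -> "Govt")
    text = name.translate(_STRIP_PUNCT)
    # Rule 3: convert to lowercase
    text = text.lower()
    # Rule 2 (whitespace): split() drops leading/trailing whitespace
    # and collapses internal runs of spaces
    tokens = text.split()
    # Rule 1: stopword elimination, done last so "Govt." and "GOVT" both match
    return " ".join(t for t in tokens if t not in stopwords)


print(normalize_name("  Govt. Higher Primary School, Jayanagar "))
# -> higher primary school jayanagar
```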

Notes from HipChat meeting

Constraints and observations about partner data by Gautam

  • Currently, without a KLP ID, we cannot show a school on KLP WWW or capture it in KLP EMS.
  • Without a KLP ID, we cannot capture child data in the KLP EMS or show it on KLP WWW.
  • All of our data is linked to an academic year.
  • Currently, on KLP WWW, we show only one academic year for infrastructure etc., but program data is shown across academic years.
  • Partner data, be it DISE, Sikshana or Akshaya Patra, does not have KLP IDs, either for schools or for children.
  • Partner data is also linked to academic years.
  • We need to match Partner data, schools and children, to existing KLP IDs (and DISE IDs) before we can show it on KLP WWW.
  • Matching at the child level is a huge pain. I think we should drop that idea.
  • If partner data is kept in independent DBs, then all we need are school KLP IDs, but we still need to match those schools to existing KLP IDs.
  • The academic years of each data set need to be in sync.
  • KLP has many historical KLP IDs whose associated DISE IDs have not kept pace with DISE changes, which makes matching hard.
  • Also, there are schools in DISE that are not in KLP, so we cannot show them on KLP WWW until we create those schools in KLP WWW and give them KLP IDs.
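Keeping academic years in sync is easier if every data set's year labels are first converted to one canonical form before joining. A minimal sketch, assuming partner files use labels such as "2013-14", "2013-2014" or "2013/14". These input formats, the "YYYY-YY" output form and the function name canonical_academic_year are assumptions, not taken from any partner data.

```python
import re

# Assumed input formats: "2013-14", "2013-2014", "2013/14", " 2013 - 14 ".
_AY_RE = re.compile(r"^\s*(\d{4})\s*[-/]\s*(\d{2}|\d{4})\s*$")


def canonical_academic_year(label):
    """Return an academic year label in the canonical 'YYYY-YY' form."""
    m = _AY_RE.match(label)
    if not m:
        raise ValueError(f"unrecognised academic year: {label!r}")
    start = int(m.group(1))
    end_raw = m.group(2)
    expected_end = start + 1
    # Accept either the full end year (2014) or its last two digits (14)
    if len(end_raw) == 4:
        ok = int(end_raw) == expected_end
    else:
        ok = int(end_raw) == expected_end % 100
    if not ok:
        raise ValueError(f"academic year must span consecutive years: {label!r}")
    return f"{start}-{expected_end % 100:02d}"
```

Once every data set carries the canonical label, partner records and KLP records can be joined on it directly instead of on free-form strings.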

Important takeaways

  • DISE IDs change from year to year for the same school. The KLP ID is the only constant and should be the identifier for all our data (and partner data). It is the primary key.
  • KLP ID can be mapped to DISE ID + Academic year (AC_YEAR) so as to enable comparison of same school data over years.
  • If a school gets promoted or demoted, it stays the same school.
  • We need a workflow to map partner school IDs / school names / DISE IDs + AC_YEAR to KLP IDs for every partner.
  • We need different approaches for matching different partners.
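The mapping of KLP ID to DISE ID + AC_YEAR can be pictured as a small index keyed both ways. A minimal in-memory sketch: the names DiseMapping and SchoolIdIndex, integer KLP IDs and the example DISE codes are all hypothetical. Because a DISE code is only meaningful together with its academic year, the lookup key is the (DISE ID, AC_YEAR) pair, and the KLP ID is what links one school's rows across years.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Optional, Tuple


@dataclass(frozen=True)
class DiseMapping:
    klp_id: int   # stable primary key for the school
    dise_id: str  # DISE code as reported in that academic year
    ac_year: str  # academic year, e.g. "2013-14"


class SchoolIdIndex:
    """Hypothetical in-memory index over KLP ID <-> (DISE ID, AC_YEAR) rows."""

    def __init__(self, rows: Iterable[DiseMapping]):
        self._by_dise_year: Dict[Tuple[str, str], int] = {}
        self._by_klp: Dict[int, Dict[str, str]] = {}
        for row in rows:
            key = (row.dise_id, row.ac_year)
            existing = self._by_dise_year.get(key)
            if existing is not None and existing != row.klp_id:
                # One DISE code in one year must not point at two schools
                raise ValueError(f"{key} is mapped to KLP IDs {existing} and {row.klp_id}")
            self._by_dise_year[key] = row.klp_id
            self._by_klp.setdefault(row.klp_id, {})[row.ac_year] = row.dise_id

    def klp_id_for(self, dise_id: str, ac_year: str) -> Optional[int]:
        """Resolve a DISE code from a given year to the school's KLP ID."""
        return self._by_dise_year.get((dise_id, ac_year))

    def dise_history(self, klp_id: int) -> Dict[str, str]:
        """DISE codes used by one school, keyed by academic year (sorted)."""
        return dict(sorted(self._by_klp.get(klp_id, {}).items()))


# Made-up example: the same school reported under two DISE codes.
index = SchoolIdIndex([
    DiseMapping(33001, "29200100101", "2012-13"),
    DiseMapping(33001, "29200100199", "2013-14"),
])
print(index.klp_id_for("29200100199", "2013-14"))  # -> 33001
```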


  1. All partner data will be kept in independent DBs.
  2. At the child level, if need be.
  3. All partner schools must have an associated KLP ID. There will be a mapping table/column for this in every DB/table.
  4. Without a KLP ID, nothing shows up on KLP WWW.
  5. All partner data is associated with an academic year.
  6. TBD - Need a good answer for this. Child-level data and the promotion of children need to be thought about. One approach would be to keep child-level data but let the researcher do the mapping, since mapping it ourselves would be tedious. We keep school-class-academic year data separate for each partner.
  7. TBD - Look at Sikshana data and apmdm data.
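Points 3 to 5 amount to a simple gate on every partner table: each row carries an academic year and a KLP ID mapping column, and only rows whose mapping is filled in are eligible for KLP WWW. A minimal sketch; PartnerSchoolRow, split_publishable and the field names are hypothetical, not an existing schema.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class PartnerSchoolRow:
    partner: str            # e.g. "sikshana"
    partner_school_id: str  # the partner's own school identifier
    ac_year: str            # every partner record belongs to an academic year
    klp_id: Optional[int]   # the mapping column; None until the school is matched


def split_publishable(
    rows: List[PartnerSchoolRow],
) -> Tuple[List[PartnerSchoolRow], List[PartnerSchoolRow]]:
    """Return (publishable, unmatched); only rows with a KLP ID may appear on KLP WWW."""
    publishable, unmatched = [], []
    for row in rows:
        if not row.ac_year:
            raise ValueError(f"{row.partner}/{row.partner_school_id} has no academic year")
        (publishable if row.klp_id is not None else unmatched).append(row)
    return publishable, unmatched
```

The unmatched list is the work queue for the matching workflow; nothing in it is shown until a KLP ID is assigned.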

Next Steps

  • DISE to DISE matching across years.
  • DISE to KLP matching.
  • Partner to KLP matching.
  • The TBDs above.
  • An API to generate a KLP ID for a school that does not exist in our system.
  • An API to identify a pre-existing school in our system: given a set of params, check whether the school already exists, and if not, create it using the API above.
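The two proposed APIs could share a single lookup-then-create flow. A minimal in-memory sketch: the class SchoolRegistry, its method names, the district parameter and the use of sequential integers as KLP IDs are all hypothetical, and name_key is assumed to be a school name already normalized with the preprocessing rules. Lookup tries (DISE ID, AC_YEAR) first and falls back to an exact match on normalized name within a district.

```python
import itertools
from typing import Dict, Optional, Tuple


class SchoolRegistry:
    """Hypothetical in-memory stand-in for the two proposed APIs."""

    def __init__(self, start_id: int = 1):
        self._ids = itertools.count(start_id)
        self._by_dise_year: Dict[Tuple[str, str], int] = {}
        self._by_name: Dict[Tuple[str, str], int] = {}

    def create_school(self, name_key: str, district: str,
                      dise_id: Optional[str] = None, ac_year: Optional[str] = None) -> int:
        """Generate a new KLP ID for a school not yet in the system."""
        klp_id = next(self._ids)
        self._by_name[(district, name_key)] = klp_id
        if dise_id and ac_year:
            self._by_dise_year[(dise_id, ac_year)] = klp_id
        return klp_id

    def find_school(self, name_key: str, district: str,
                    dise_id: Optional[str] = None, ac_year: Optional[str] = None) -> Optional[int]:
        """Return the KLP ID of a pre-existing school, or None."""
        if dise_id and ac_year and (dise_id, ac_year) in self._by_dise_year:
            return self._by_dise_year[(dise_id, ac_year)]
        return self._by_name.get((district, name_key))

    def get_or_create(self, name_key: str, district: str,
                      dise_id: Optional[str] = None,
                      ac_year: Optional[str] = None) -> Tuple[int, bool]:
        """Return (klp_id, created): look the school up, create it only if missing."""
        klp_id = self.find_school(name_key, district, dise_id, ac_year)
        if klp_id is not None:
            return klp_id, False
        return self.create_school(name_key, district, dise_id, ac_year), True
```

In a real service the exact name fallback would likely be replaced by the name-plus-human matching described below, so that near misses are reviewed rather than silently turned into new schools.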

We'll use a mix of name matching and human matching, and see how it goes.
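One way to combine the two is to let an automatic name score decide the clear cases and route the uncertain middle band to a person. A minimal sketch using the standard library's difflib.SequenceMatcher; the thresholds, function names and the klp_names dict (KLP ID to normalized name) are hypothetical, and both inputs are assumed to be normalized with the preprocessing rules first.

```python
from difflib import SequenceMatcher

# Hypothetical thresholds; tune them against manually verified matches.
AUTO_ACCEPT = 0.92
NEEDS_REVIEW = 0.75


def best_match(partner_name, klp_names):
    """Return (klp_id, score) of the closest KLP school name, or (None, 0.0)."""
    best_id, best_score = None, 0.0
    for klp_id, name in klp_names.items():
        score = SequenceMatcher(None, partner_name, name).ratio()
        if score > best_score:
            best_id, best_score = klp_id, score
    return best_id, best_score


def triage(partner_name, klp_names):
    """Route a partner school to 'auto', 'review' (human check) or 'no_match'."""
    klp_id, score = best_match(partner_name, klp_names)
    if score >= AUTO_ACCEPT:
        return "auto", klp_id, score
    if score >= NEEDS_REVIEW:
        return "review", klp_id, score
    return "no_match", None, score
```

Rows landing in "review" go to a person; "no_match" rows are candidates for a new KLP ID.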

Last modified on 04/01/14 01:02:30
