NLP (Natural Language Processing) Model to Extract & Classify Interesting Topics

Client Location

Berlin, Germany

Tech Stack

Python, scikit-learn, TensorFlow, Google Cloud Services

Team

2 Data Science Engineers

We developed two AI models: one that uses natural language processing to extract topics from ideas and classify them into categories, and another that finds related ideas within each cluster and recommends them to social network users.

Client & Challenges

Our client is a startup developing a social network for exchanging ideas extracted from articles and books. To keep users engaged, they needed a solution that extracts and classifies topics so that ideas can be recommended to each user based on a personalized topic-interest profile.

Solution & Result

The project focused on developing a K-Nearest Neighbors (KNN) model for classification and topic extraction, with the data embedded and grouped into clusters. As a result, the system can categorize ideas found in articles and books and, at a later stage, recommend them daily to users interested in a particular subject. For recommendations, we used behavior modeling with a custom recommendation strategy that evolved as the project grew.
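To illustrate the general idea of KNN classification over text embeddings, here is a minimal sketch in Python with scikit-learn. It is not the production model: the TF-IDF vectors stand in for whatever embedding was actually used, and the example ideas, category labels, and the choice of a single neighbor with cosine distance are all assumptions made for illustration.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline

# Toy, made-up ideas and category labels (illustrative only).
train_ideas = [
    "Compound interest grows savings over time",
    "Diversify investments to reduce portfolio risk",
    "Regular exercise improves heart health",
    "Sleep and exercise support long term health",
]
train_labels = ["finance", "finance", "health", "health"]

# TF-IDF is an assumed stand-in for the embedding step.
# Each new idea receives the label of its nearest training idea
# under cosine distance (k = 1 because the toy set is tiny).
model = make_pipeline(
    TfidfVectorizer(),
    KNeighborsClassifier(n_neighbors=1, metric="cosine"),
)
model.fit(train_ideas, train_labels)

new_ideas = [
    "Reduce risk by spreading investments",
    "Exercise and sleep improve health",
]
predicted = model.predict(new_ideas)
print(list(predicted))
```

In a real system, a larger k and a learned sentence embedding would likely be more robust than TF-IDF with a single neighbor.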

Some categories assigned by the initial classifier were very large. Optimizing category size was crucial for recommending ideas that interest the user and for ensuring that the platform can suggest new and engaging content to every user daily. To get the best results in content personalization, we optimized category sizes with a modified KNN model that also accounts for the final category size, and we preprocessed the data so that all variables are centered and similarly scaled.
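The two ideas in this paragraph can be sketched as follows. The first part centers and scales features with scikit-learn's StandardScaler. The second part is a hypothetical greedy rule that caps category size; the details of the modified KNN model are not described here, so the function assign_with_size_cap, its max_size parameter, and the score matrix are assumptions chosen only to show how a size constraint can change assignments.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Part 1: center and scale features so no variable dominates distances.
features = np.array([[1.0, 100.0], [2.0, 200.0], [3.0, 300.0]])
scaled = StandardScaler().fit_transform(features)


# Part 2 (hypothetical): cap how many items a category may receive.
def assign_with_size_cap(scores, max_size):
    """Give each item its best-scoring category that still has room."""
    n_items, n_categories = scores.shape
    if n_categories * max_size < n_items:
        raise ValueError("max_size is too small to place every item")
    counts = np.zeros(n_categories, dtype=int)
    assignment = np.full(n_items, -1)
    # Most confident items are placed first and keep their preference.
    order = np.argsort(-scores.max(axis=1), kind="stable")
    for item in order:
        for category in np.argsort(-scores[item], kind="stable"):
            if counts[category] < max_size:
                assignment[item] = category
                counts[category] += 1
                break
    return assignment


# Rows are items, columns are categories; values could be KNN vote shares.
scores = np.array([[0.9, 0.1], [0.8, 0.2], [0.6, 0.4], [0.3, 0.7]])
assignment = assign_with_size_cap(scores, max_size=2)
print(assignment)  # -> [0 0 1 1]
```

Without the cap, the third item would join category 0 and make it three items large; with the cap, it falls back to its second choice.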

The solution also included a transition matrix between topics, enabling users to switch between topics and discover new, exciting categories. Furthermore, combined with topical user profiling, this transition matrix distinguishes the stream of ideas recommended to a particular user from the interests and recommendations of others, and has the potential to predict future user interests.
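One common way to obtain a topic transition matrix is to count how often one topic follows another in reading sequences and then normalize each row. The sketch below assumes that approach; how the matrix was actually estimated is not stated here, and the topic names, sessions, and the transition_matrix helper are hypothetical.

```python
import numpy as np


def transition_matrix(sessions, topics):
    """Row-normalized counts of topic-to-topic moves in the sessions."""
    index = {topic: i for i, topic in enumerate(topics)}
    counts = np.zeros((len(topics), len(topics)))
    for session in sessions:
        # Count each consecutive pair (current topic, next topic).
        for src, dst in zip(session, session[1:]):
            counts[index[src], index[dst]] += 1
    row_sums = counts.sum(axis=1, keepdims=True)
    # Topics with no outgoing moves keep an all-zero row.
    return np.divide(
        counts, row_sums, out=np.zeros_like(counts), where=row_sums > 0
    )


topics = ["finance", "health", "science"]
# Made-up topic sequences, one list per user session.
sessions = [
    ["finance", "health", "finance", "science"],
    ["health", "finance"],
]
matrix = transition_matrix(sessions, topics)
print(matrix)
```

Each row can be read as the estimated probability of moving from that topic to every other topic, which is one way such a matrix could suggest a next category to explore.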