Open Source Python Natural Language Processing (NLP) Tools

Browse free open source Python Natural Language Processing (NLP) Tools and projects below. Use the toggles on the left to filter open source Python Natural Language Processing (NLP) Tools by OS, license, language, programming language, and project status.

  • 1
    Ciphey

    Decrypt encryptions without knowing the key or cipher

    Ciphey is a fully automated decryption/decoding/cracking tool that uses natural language processing and artificial intelligence, along with some common sense. You don't need to know what a piece of text is encrypted with, or even whether it is encrypted at all; Ciphey will figure it out for you, and can solve most things in 3 seconds or less. Ciphey aims to automate a large number of decryptions and decodings, such as multiple base encodings, classical ciphers, hashes, and more advanced cryptography. If you don't know much about cryptography, or you want to quickly check a ciphertext before working on it yourself, Ciphey is for you. On the technical side, Ciphey uses a custom-built artificial intelligence module (AuSearch) with a Cipher Detection Interface to approximate what something is encrypted with, and then a custom-built, customizable natural language processing Language Checker Interface that detects when the given text becomes plaintext.
    Downloads: 7 This Week
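
    A minimal sketch of driving Ciphey from Python by shelling out to its command-line interface. It assumes ciphey is installed and on the PATH and that the -t flag accepts the ciphertext directly, so treat the flag and the output format as assumptions rather than a definitive recipe.

      # Hypothetical invocation of the Ciphey CLI from Python.
      import subprocess

      result = subprocess.run(
          ["ciphey", "-t", "SGVsbG8sIFdvcmxkIQ=="],  # base64-encoded sample ciphertext
          capture_output=True,
          text=True,
      )
      print(result.stdout)  # Ciphey prints the decoding chain and recovered plaintext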
  • 2
    DeepSparse

    Sparsity-aware deep learning inference runtime for CPUs

    A sparsity-aware enterprise inferencing system for AI models on CPUs. Maximize your CPU infrastructure with DeepSparse to run performant computer vision (CV), natural language processing (NLP), and large language models (LLMs).
    Downloads: 7 This Week
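
    A minimal sketch of running an NLP pipeline on CPU with DeepSparse. It assumes the Pipeline.create API and that a default sparse model is resolved for the task when no model_path is supplied; a SparseZoo stub or local ONNX path can be passed via model_path to pin a specific model.

      # Sentiment analysis with DeepSparse's pipeline API (default task model assumed).
      from deepsparse import Pipeline

      sentiment = Pipeline.create(task="sentiment-analysis")
      print(sentiment("DeepSparse makes CPU inference fast"))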
  • 3
    HanLP

    Han Language Processing

    HanLP is a multilingual Natural Language Processing (NLP) library composed of a series of models and algorithms. Built on TensorFlow 2.0, it was designed to advance state-of-the-art deep learning techniques and popularize the application of natural language processing in both academia and industry. HanLP is capable of lexical analysis (Chinese word segmentation, part-of-speech tagging, named entity recognition), syntax analysis, text classification, and sentiment analysis. It comes with pretrained models for numerous languages including Chinese and English. It offers efficient performance, clear structure and customizable features, with plenty more amazing features to look forward to on the roadmap.
    Downloads: 6 This Week
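
    A small sketch of HanLP's load-and-call pattern for Chinese word segmentation. The pretrained model identifier below is an assumption and may differ between HanLP releases; see hanlp.pretrained for the current catalog.

      # Load a pretrained Chinese tokenizer and segment a sentence (model name assumed).
      import hanlp

      tokenizer = hanlp.load(hanlp.pretrained.tok.COARSE_ELECTRA_SMALL_ZH)
      print(tokenizer('商品和服务'))  # -> a list of segmented words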
  • 4
    spaCy

    Industrial-strength Natural Language Processing (NLP)

    spaCy is a library for advanced Natural Language Processing (NLP) in Python and Cython, built on the very latest research. From its inception it was designed for real-world applications: for building real products and gathering real insights. It comes with pretrained statistical models and word vectors, convolutional neural network models, easy deep learning integration, and much more. According to independent benchmarks, spaCy is the fastest syntactic parser in the world, with accuracy within 1% of the best available. It's blazing fast, easy to install, and comes with a simple and productive API.
    Downloads: 6 This Week
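
    A minimal example of the pretrained-pipeline workflow described above; it assumes the small English model has been installed with "python -m spacy download en_core_web_sm".

      import spacy

      nlp = spacy.load("en_core_web_sm")
      doc = nlp("Apple is looking at buying a U.K. startup for $1 billion.")

      for token in doc:
          print(token.text, token.pos_, token.dep_)  # part-of-speech tag and dependency label
      for ent in doc.ents:
          print(ent.text, ent.label_)                # named entities such as ORG and MONEY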
  • 5
    Haystack

    Haystack is an open source NLP framework to interact with your data

    Apply the latest NLP technology to your own data with Haystack's pipeline architecture. Implement production-ready semantic search, question answering, summarization, and document ranking for a wide range of NLP applications. Evaluate components and fine-tune models. Ask questions in natural language and find granular answers in your documents using the latest QA models with the help of Haystack pipelines. Perform semantic search and retrieve ranked documents according to meaning, not just keywords. Make use of and compare the latest pre-trained transformer-based language models like OpenAI’s GPT-3, BERT, RoBERTa, DPR, and more. Pick any Transformer model from Hugging Face's Model Hub, experiment, and find the one that works. Use Haystack NLP components on top of Elasticsearch, OpenSearch, or plain SQL. Boost search performance with Pinecone, Milvus, FAISS, or Weaviate vector databases, and dense passage retrieval.
    Downloads: 5 This Week
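
    A sketch of an extractive question-answering pipeline, assuming the Haystack 1.x API (document stores, BM25Retriever, FARMReader); class names and imports changed in Haystack 2.x, so adapt to your installed version.

      # Extractive QA over an in-memory document store (Haystack 1.x API assumed).
      from haystack.document_stores import InMemoryDocumentStore
      from haystack.nodes import BM25Retriever, FARMReader
      from haystack.pipelines import ExtractiveQAPipeline

      store = InMemoryDocumentStore(use_bm25=True)
      store.write_documents([{"content": "Haystack pipelines combine retrievers and readers for QA."}])

      pipeline = ExtractiveQAPipeline(
          reader=FARMReader(model_name_or_path="deepset/roberta-base-squad2"),
          retriever=BM25Retriever(document_store=store),
      )
      result = pipeline.run(query="What do Haystack pipelines combine?")
      print(result["answers"][0].answer)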
  • 6
    NVIDIA NeMo

    Toolkit for conversational AI

    NVIDIA NeMo, part of the NVIDIA AI platform, is a toolkit for building new state-of-the-art conversational AI models. NeMo has separate collections for Automatic Speech Recognition (ASR), Natural Language Processing (NLP), and Text-to-Speech (TTS) models. Each collection consists of prebuilt modules that include everything needed to train on your data. Every module can easily be customized, extended, and composed to create new conversational AI model architectures. Conversational AI architectures are typically large and require a lot of data and compute for training. NeMo uses PyTorch Lightning for easy and performant multi-GPU/multi-node mixed-precision training. Supported models include Jasper, QuartzNet, CitriNet, Conformer-CTC, Conformer-Transducer, Squeezeformer-CTC, Squeezeformer-Transducer, ContextNet, LSTM-Transducer (RNNT), and LSTM-CTC, and pre-trained speech processing models are available through the NGC collection.
    Downloads: 5 This Week
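
    A short sketch of using a pretrained model from the NeMo ASR collection mentioned above. The model name and the transcribe call follow NeMo's documented usage but should be treated as assumptions for whichever version you have installed.

      # Load a pretrained English ASR model and transcribe a local audio file.
      import nemo.collections.asr as nemo_asr

      asr_model = nemo_asr.models.EncDecCTCModel.from_pretrained("QuartzNet15x5Base-En")
      print(asr_model.transcribe(["sample.wav"]))  # path to a local 16 kHz mono WAV file (hypothetical)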
  • 7
    Datasets

    Hub of ready-to-use datasets for ML models

    Datasets is a library for easily accessing and sharing datasets and evaluation metrics for Natural Language Processing (NLP), computer vision, and audio tasks. Load a dataset in a single line of code, and use the powerful data processing methods to quickly get your dataset ready for training a deep learning model. Backed by the Apache Arrow format, process large datasets with zero-copy reads, without memory constraints, for optimal speed and efficiency. Datasets also features a deep integration with the Hugging Face Hub, allowing you to easily load and share a dataset with the wider NLP community. There are currently over 2,658 datasets and more than 34 metrics available. Datasets frees the user from RAM limitations: all datasets are memory-mapped using an efficient zero-serialization-cost backend (Apache Arrow), and smart caching means you never wait for your data to be processed more than once.
    Downloads: 4 This Week
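
    A minimal example of the one-line loading and memory-mapped processing described above; the dataset name "imdb" is simply a publicly available example.

      from datasets import load_dataset

      dataset = load_dataset("imdb", split="train")                   # load a public dataset in one line
      dataset = dataset.map(lambda ex: {"n_chars": len(ex["text"])})  # Arrow-backed, cached processing
      print(dataset[0]["n_chars"], dataset[0]["label"])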
  • 8
    ModelScope

    Bring the notion of Model-as-a-Service to life

    ModelScope is built upon the notion of “Model-as-a-Service” (MaaS). It seeks to bring together the most advanced machine learning models from the AI community and to streamline the process of leveraging AI models in real-world applications. The core ModelScope library open-sourced in this repository provides the interfaces and implementations that allow developers to perform model inference, training, and evaluation. In particular, with rich layers of API abstraction, the ModelScope library offers a unified experience for exploring state-of-the-art models spanning domains such as CV, NLP, speech, multi-modality, and scientific computation. Model contributors from different areas can integrate models into the ModelScope ecosystem through the layered APIs, allowing easy and unified access to their models. Once integrated, model inference, fine-tuning, and evaluation can be done with only a few lines of code.
    Downloads: 4 This Week
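
    A sketch of ModelScope's unified pipeline API. The task name and model id follow the project's documented Chinese word-segmentation quick-start example, but both are assumptions that may change between releases.

      # Unified pipeline interface: a task name plus a model id from the ModelScope hub.
      from modelscope.pipelines import pipeline

      word_segmentation = pipeline(
          'word-segmentation',
          model='damo/nlp_structbert_word-segmentation_chinese-base',
      )
      print(word_segmentation('今天天气不错，适合出去游玩'))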
  • 9
    Diffgram

    Training data (data labeling, annotation, workflow) for all data types

    From ingesting data to exploring it, annotating it, and managing workflows, Diffgram is a single application that will improve your data labeling and bring all aspects of training data under a single roof. Diffgram is the world’s first truly open source training data platform that focuses on giving its users an unlimited experience. It aims to reduce your data labeling bills and increase your training data quality. Training data is the art of supervising machines through data. This includes the activity of annotation, which produces structured data ready to be consumed by a machine learning model. Annotation is required because raw media is considered unstructured and not usable without it. That’s why training data is required for many modern machine learning use cases, including computer vision, natural language processing, and speech recognition.
    Downloads: 3 This Week
  • 10
    AllenNLP

    An open-source NLP research library, built on PyTorch

    AllenNLP makes it easy to design and evaluate new deep learning models for nearly any NLP problem, along with the infrastructure to easily run them in the cloud or on your laptop. AllenNLP includes reference implementations of high-quality models for both core NLP problems (e.g. semantic role labeling) and NLP applications (e.g. textual entailment). AllenNLP supports loading "plugins" dynamically. A plugin is just a Python package that provides custom registered classes or additional allennlp subcommands. There is an ecosystem of open source plugins, some of which are maintained by the AllenNLP team at AI2 and some of which are maintained by the broader community. AllenNLP will automatically find any official AI2-maintained plugins that you have installed, but for AllenNLP to find personal or third-party plugins you've installed, you have to create a local plugins file named .allennlp_plugins in the directory where you run the allennlp command.
    Downloads: 2 This Week
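
    A sketch of loading a published AllenNLP model through its Predictor API. The archive URL below points at what is believed to be AI2's public semantic role labeling model and should be treated as an assumption.

      # Load a pretrained semantic role labeling model and run it on a sentence (URL assumed).
      from allennlp.predictors.predictor import Predictor

      predictor = Predictor.from_path(
          "https://storage.googleapis.com/allennlp-public-models/structured-prediction-srl-bert.2020.12.15.tar.gz"
      )
      print(predictor.predict(sentence="AllenNLP makes it easy to evaluate new deep learning models."))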
  • 11
    DataDreamer

    DataDreamer: Prompt. Generate Synthetic Data. Train & Align Models

    DataDreamer is a tool designed to assist in the generation and manipulation of synthetic data for various applications, including testing and machine learning.
    Downloads: 2 This Week
  • 12
    Machine Learning PyTorch Scikit-Learn

    Code Repository for Machine Learning with PyTorch and Scikit-Learn

    Initially, this project started as the 4th edition of Python Machine Learning. However, after putting so much passion and hard work into the changes and new topics, we thought it deserved a new title. So, what’s new? There is a lot of new content and many additions, including the switch from TensorFlow to PyTorch, new chapters on graph neural networks and transformers, a new section on gradient boosting, and many more changes that I will detail in a separate blog post. For those who are interested in knowing what this book covers in general, I’d describe it as a comprehensive resource on the fundamental concepts of machine learning and deep learning. The first half of the book introduces readers to machine learning using scikit-learn, the de facto approach for working with tabular datasets. The second half of the book then focuses on deep learning, including applications to natural language processing and computer vision.
    Downloads: 2 This Week
  • 13
    NNCF

    Neural Network Compression Framework for enhanced OpenVINO

    NNCF (Neural Network Compression Framework) is an optimization toolkit for deep learning models, designed to apply quantization, pruning, and other techniques to improve inference efficiency.
    Downloads: 2 This Week
  • 14
    Obsei

    Obsei is a low code AI powered automation tool

    Obsei is an automated no-code/low-code AI-powered text observation and analysis framework, designed for extracting insights from unstructured text data such as social media, reviews, and logs.
    Downloads: 2 This Week
  • 15
    SparseML

    Libraries for applying sparsification recipes to neural networks

    SparseML is an optimization toolkit for training and deploying deep learning models using sparsification techniques like pruning and quantization to improve efficiency.
    Downloads: 2 This Week
  • 16
    Stanza

    Stanford NLP Python library for many human languages

    Stanza is a collection of accurate and efficient tools for the linguistic analysis of many human languages. Starting from raw text to syntactic analysis and entity recognition, Stanza brings state-of-the-art NLP models to languages of your choosing. Stanza is a Python natural language analysis package. It contains tools, which can be used in a pipeline, to convert a string containing human language text into lists of sentences and words, to generate base forms of those words, their parts of speech and morphological features, to give a syntactic structure dependency parse, and to recognize named entities. The toolkit is designed to be parallel among more than 70 languages, using the Universal Dependencies formalism. Stanza is built with highly accurate neural network components that also enable efficient training and evaluation with your own annotated data.
    Downloads: 2 This Week
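
    A minimal example of the raw-text-to-annotations pipeline described above; it downloads the English models on first use.

      import stanza

      stanza.download("en")        # fetch the English models (one-time)
      nlp = stanza.Pipeline("en")  # tokenize, POS, lemma, depparse and NER by default
      doc = nlp("Stanza brings state-of-the-art NLP models to many human languages.")

      for sentence in doc.sentences:
          for word in sentence.words:
              print(word.text, word.lemma, word.upos)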
  • 17
    Text Generation Inference

    Large Language Model Text Generation Inference

    Text Generation Inference is a high-performance inference server for text generation models, optimized for Hugging Face's Transformers. It is designed to serve large language models efficiently with optimizations for performance and scalability.
    Downloads: 2 This Week
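
    A sketch of querying a running Text Generation Inference server over its REST API. It assumes a server has already been launched and is listening on localhost:8080 and that the /generate endpoint accepts the payload shown; check the documentation of your server version for the exact schema.

      # Query a locally running TGI server (endpoint and payload shape assumed).
      import requests

      response = requests.post(
          "http://localhost:8080/generate",
          json={"inputs": "What is natural language processing?",
                "parameters": {"max_new_tokens": 50}},
          timeout=60,
      )
      print(response.json()["generated_text"])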
  • 18
    Transformers4Rec

    Transformers4Rec is a flexible and efficient library

    Transformers4Rec is an advanced recommendation system library that leverages Transformer models for sequential and session-based recommendations. The library works as a bridge between natural language processing (NLP) and recommender systems (RecSys) by integrating with one of the most popular NLP frameworks, Hugging Face Transformers (HF). Transformers4Rec makes state-of-the-art transformer architectures available for RecSys researchers and industry practitioners. Traditional recommendation algorithms usually ignore the temporal dynamics and the sequence of interactions when trying to model user behavior. Generally, the next user interaction is related to the sequence of the user's previous choices. In some cases, it might be a repeated purchase or song play. User interests can also suffer from interest drift because preferences can change over time. Those challenges are addressed by the sequential recommendation task.
    Downloads: 2 This Week
  • 19
    UniEM

    Unified embedding model

    UniEM is a unified embedding model designed to create high-quality text embeddings for various natural language processing tasks.
    Downloads: 2 This Week
  • 20
    txtai

    Build AI-powered semantic search applications

    txtai executes machine-learning workflows to transform data and build AI-powered semantic search applications. Traditional search systems use keywords to find data. Semantic search applications have an understanding of natural language and identify results that have the same meaning, not necessarily the same keywords. Backed by state-of-the-art machine learning models, data is transformed into vector representations for search (also known as embeddings). Innovation is happening at a rapid pace; models can understand concepts in documents, audio, images, and more. txtai provides machine-learning pipelines to run extractive question-answering, zero-shot labeling, transcription, translation, summarization, and text extraction, along with a cloud-native architecture that scales out with container orchestration systems (e.g. Kubernetes). Applications range from similarity search to complex NLP-driven data extraction that generates structured databases.
    Downloads: 2 This Week
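
    A minimal semantic search sketch using txtai's Embeddings API; the sentence-transformers model name is an assumption, and any compatible embeddings model would do.

      # Index a few texts and search by meaning rather than keywords.
      from txtai.embeddings import Embeddings

      data = [
          "US tops 5 million confirmed virus cases",
          "Canada's last fully intact ice shelf has suddenly collapsed",
      ]
      embeddings = Embeddings({"path": "sentence-transformers/all-MiniLM-L6-v2"})
      embeddings.index([(uid, text, None) for uid, text in enumerate(data)])

      print(embeddings.search("public health news", 1))  # -> [(id, score)] of the best match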
  • 21
    AdalFlow

    The library to build & auto-optimize LLM applications

    AdalFlow is a library for building and auto-optimizing LLM applications, enabling users to design and run LLM task pipelines with minimal coding.
    Downloads: 1 This Week
  • 22
    Adapters

    A Unified Library for Parameter-Efficient Learning

    Adapters is an add-on library to HuggingFace's Transformers, integrating 10+ adapter methods into 20+ state-of-the-art Transformer models with minimal coding overhead for training and inference. Adapters provide a unified interface for efficient fine-tuning and modular transfer learning, supporting a myriad of features like full-precision or quantized training (e.g. Q-LoRA, Q-Bottleneck Adapters, or Q-PrefixTuning), adapter merging via task arithmetics or the composition of multiple adapters via composition blocks, allowing advanced research in parameter-efficient transfer learning for NLP tasks.
    Downloads: 1 This Week
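
    A sketch of attaching and training a single adapter on top of a Transformers model, assuming the adapters library's init/add_adapter/train_adapter interface.

      # Wrap a plain Transformers model with adapter support and train only the adapter weights.
      import adapters
      from transformers import AutoModelForSequenceClassification

      model = AutoModelForSequenceClassification.from_pretrained("bert-base-uncased", num_labels=2)
      adapters.init(model)              # enable adapter support on the model (assumed entry point)
      model.add_adapter("sentiment")    # add a new, randomly initialized bottleneck adapter
      model.train_adapter("sentiment")  # freeze the base model; only adapter parameters stay trainable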
  • 23
    Automatic text summarizer

    Module for automatic summarization of text documents and HTML pages

    Sumy is an automatic text summarization library that provides multiple algorithms for extracting key content from documents and articles. It is a simple library and command-line utility for extracting summaries from HTML pages or plain text. The package also contains a simple evaluation framework for text summaries. The implemented summarization methods are described in the documentation, and the author also maintains a list of alternative implementations of the summarizers in various programming languages.
    Downloads: 1 This Week
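
    A small example of sumy's extractive summarization API with the LSA summarizer; note that the English tokenizer relies on NLTK data that may need to be downloaded separately.

      # Summarize a plain-text string down to two sentences using LSA.
      from sumy.parsers.plaintext import PlaintextParser
      from sumy.nlp.tokenizers import Tokenizer
      from sumy.summarizers.lsa import LsaSummarizer

      text = ("Sumy extracts the most important sentences from a document. "
              "It implements several summarization algorithms. "
              "The summary keeps only a configurable number of sentences.")

      parser = PlaintextParser.from_string(text, Tokenizer("english"))
      for sentence in LsaSummarizer()(parser.document, 2):
          print(sentence)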
  • 24
    BEIR

    A Heterogeneous Benchmark for Information Retrieval

    BEIR is a benchmark framework for evaluating information retrieval models across various datasets and tasks, including document ranking and question answering.
    Downloads: 1 This Week
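
    A sketch of loading one BEIR dataset together with its queries and relevance judgments. It assumes the GenericDataLoader interface and the public SciFact download URL from the project's README; both should be treated as assumptions.

      # Download a benchmark dataset and load corpus, queries and qrels (URL and API assumed).
      from beir import util
      from beir.datasets.data_loader import GenericDataLoader

      url = "https://public.ukp.informatik.tu-darmstadt.de/thakur/BEIR/datasets/scifact.zip"
      data_path = util.download_and_unzip(url, "datasets")
      corpus, queries, qrels = GenericDataLoader(data_folder=data_path).load(split="test")
      print(len(corpus), len(queries))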
  • 25
    Chonkie

    The no-nonsense RAG chunking library

    Chonkie is a lightweight, no-nonsense text chunking library for retrieval-augmented generation (RAG) pipelines, providing a range of chunking strategies for splitting documents into pieces suitable for embedding, indexing, and retrieval.
    Downloads: 1 This Week