    Repositories list

    • A JavaScript/Typescript client for the Unstructured Platform API
      TypeScript
      MIT License
      175452 Updated Jun 20, 2025
    • A Python client for the Unstructured Platform API
      Python
      MIT License
      17104116 Updated Jun 20, 2025
    • Python
      Apache License 2.0
      1502 Updated Jun 19, 2025
    • docs

      Public
      Documentation for all Unstructured products and libraries
      MDX
      236015 Updated Jun 18, 2025
    • HTML
      Apache License 2.0
      46935522 Updated Jun 18, 2025
    • Python
      Apache License 2.0
      164756337 Updated Jun 18, 2025
    • Convert documents to structured data effortlessly. Unstructured is an open-source ETL solution for transforming complex documents into clean, structured formats for language models. Visit our website to learn more about our enterprise-grade Platform product for production-grade workflows, partitioning, enrichments, chunking, and embedding.
      HTML
      Apache License 2.0
      96112k16549 Updated Jun 17, 2025
    • Store Dockerfiles and Packer configs for images to use as a base to build upon
      Shell
      Apache License 2.0
      2411 Updated Jun 17, 2025
    • Python
      Apache License 2.0
      621862212 Updated Jun 13, 2025
    • notebooks

      Public
      Jupyter Notebook
      0100 Updated Jun 12, 2025
    • UNS-MCP

      Public
      Jupyter Notebook
      133001 Updated May 26, 2025
    • .github

      Public
      2021 Updated Mar 19, 2025
    • Awesome multilingual OCR toolkits based on PaddlePaddle (a practical, ultra-lightweight OCR system that supports recognition of 80+ languages, provides data annotation and synthesis tools, and supports training and deployment on server, mobile, embedded, and IoT devices)
      Python
      Apache License 2.0
      8.4k3700 Updated Mar 17, 2025
    • A Python wrapper for Google Tesseract
      Python
      Apache License 2.0
      732400 Updated Mar 5, 2025
    • Reference architecture that provides a set of guidelines and best practices for implementing a central AI API gateway, enabling line-of-business units in an organization to leverage Azure AI services
      Bicep
      MIT License
      80100 Updated Nov 22, 2024
    • Script to accompany the AWS blog post on unstructured data ETL with the Unstructured Ingest library
      Python
      Apache License 2.0
      0000 Updated Oct 16, 2024
    • Pairing Technical Challenge
      TypeScript
      0000 Updated Sep 4, 2024
    • FedRAMP formatted model cards
      0100 Updated Aug 29, 2024
    • danswer

      Public
      Gen-AI Chat for Teams - Think ChatGPT if it had access to your team's unique knowledge.
      Python
      Other
      1.7k1001 Updated Aug 23, 2024
    • JS Client Batch Processing
      JavaScript
      0000 Updated Jul 31, 2024
    • Main package repository for production Wolfi images
      C
      Other
      356000 Updated Jul 10, 2024
    • pipeline-sec-filings

      Public archive
      Preprocessing pipeline notebooks and API supporting text extraction from SEC documents
      Jupyter Notebook
      Apache License 2.0
      3214657 Updated Jan 1, 2024
    • Python
      Apache License 2.0
      8804 Updated Oct 2, 2023
    • Pipeline for extracting information from Army OERs
      Jupyter Notebook
      Apache License 2.0
      5816 Updated Oct 1, 2023
    • Pipeline for converting PDFs to raw text with PaddleOCR
      Jupyter Notebook
      Apache License 2.0
      72315 Updated Aug 21, 2023
    • langchain

      Public
      ⚡ Building applications with LLMs through composability ⚡
      Python
      MIT License
      18k800 Updated Aug 18, 2023
    • Python
      Apache License 2.0
      122821 Updated Aug 4, 2023
    • Terraform module that implements a web app on ECS and supports autoscaling, CI/CD, monitoring, ALB integration, and much more.
      HCL
      Apache License 2.0
      156200 Updated Jul 6, 2023
    • Terraform module that implements an ECS service exposing a web service via ALB.
      HCL
      Apache License 2.0
      197000 Updated Jul 6, 2023
    • Pipeline for layout extraction
      Python
      Apache License 2.0
      2111 Updated Jul 3, 2023