<?xml version="1.0" encoding="utf-8" standalone="yes"?><rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom"><channel><title>IA | Daniel Arbelaez Alvarez</title><link>https://portfolio.sprintjudicial.com/en/tags/ia/</link><atom:link href="https://portfolio.sprintjudicial.com/en/tags/ia/index.xml" rel="self" type="application/rss+xml"/><description>IA</description><generator>Hugo Blox Builder (https://hugoblox.com)</generator><language>en-us</language><lastBuildDate>Mon, 01 Dec 2025 00:00:00 +0000</lastBuildDate><image><url>https://portfolio.sprintjudicial.com/media/icon_hu7729264130191091259.png</url><title>IA</title><link>https://portfolio.sprintjudicial.com/en/tags/ia/</link></image><item><title>Sherlock-docs - Intelligent Legal Document Processing</title><link>https://portfolio.sprintjudicial.com/en/project/sherlock-docs/</link><pubDate>Mon, 01 Dec 2025 00:00:00 +0000</pubDate><guid>https://portfolio.sprintjudicial.com/en/project/sherlock-docs/</guid><description>&lt;p>&lt;strong>Sherlock-docs&lt;/strong> is an intelligent processing system for judicial documents (tutelas and habeas corpus) that combines OCR, named entity recognition (NER), and duplicate detection. All processing is 100% local, with no data sent to external APIs.&lt;/p>
&lt;h2 id="key-features">Key Features&lt;/h2>
&lt;ul>
&lt;li>&lt;strong>Hybrid OCR&lt;/strong>: PaddleOCR + Tesseract for scanned and digital documents&lt;/li>
&lt;li>&lt;strong>Legal NER&lt;/strong>: Entity extraction with spaCy, reaching an F1 score of 85.3% on human-validated data&lt;/li>
&lt;li>&lt;strong>Multi-Level Duplicate Detection&lt;/strong>: SHA-256 hashing, LZJD fuzzy hashing, TF-IDF similarity, and Sentence-Transformers embeddings&lt;/li>
&lt;li>&lt;strong>Advanced Search&lt;/strong>: SQLite FTS5 full-text search with filtering&lt;/li>
&lt;li>&lt;strong>Complete REST API&lt;/strong>: 22 FastAPI endpoints with JWT authentication and Swagger documentation&lt;/li>
&lt;li>&lt;strong>Graphical Interface&lt;/strong>: 9 Streamlit pages for interactive document management&lt;/li>
&lt;li>&lt;strong>CLI&lt;/strong>: 20 commands for batch operations&lt;/li>
&lt;li>&lt;strong>Active Learning&lt;/strong>: Interface for human validation of NER entities&lt;/li>
&lt;li>&lt;strong>Export&lt;/strong>: Excel report generation&lt;/li>
&lt;/ul>
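&lt;p>The multi-level duplicate detection above can be pictured as a cascade: a cheap exact check first, then similarity checks for near-duplicates. The following is a minimal, hypothetical sketch using only the Python standard library, not the project's code. It covers the SHA-256 level and substitutes a plain TF-IDF cosine similarity for the fuzzy levels (no LZJD or Sentence-Transformers). The function names and the 0.8 threshold are assumptions.&lt;/p>

```python
import hashlib
import math
import re
from collections import Counter


def sha256_digest(text: str) -> str:
    """Level 1: exact-duplicate fingerprint of whitespace- and case-normalized text."""
    normalized = " ".join(text.split()).lower()
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def tokenize(text: str) -> list[str]:
    return re.findall(r"\w+", text.lower())


def tfidf_vectors(docs: list[str]) -> list[dict[str, float]]:
    """Build sparse TF-IDF vectors with smoothed IDF over the given corpus."""
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)
    # Document frequency: in how many documents each term appears.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        tf = Counter(tokens)
        vectors.append({
            term: count * (math.log((1 + n_docs) / (1 + df[term])) + 1)
            for term, count in tf.items()
        })
    return vectors


def cosine(a: dict[str, float], b: dict[str, float]) -> float:
    dot = sum(weight * b.get(term, 0.0) for term, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def classify_pair(doc_a: str, doc_b: str, threshold: float = 0.8) -> str:
    """Return 'exact', 'near', or 'distinct' (the threshold is a hypothetical value)."""
    # Cheapest check first: identical normalized text gives identical hashes.
    if sha256_digest(doc_a) == sha256_digest(doc_b):
        return "exact"
    # Stand-in for the fuzzy levels: TF-IDF cosine similarity.
    vec_a, vec_b = tfidf_vectors([doc_a, doc_b])
    if cosine(vec_a, vec_b) >= threshold:
        return "near"
    return "distinct"
```

&lt;p>Ordering the checks from cheapest to most expensive means an exact re-filing is caught by a single hash comparison before any vector math runs.&lt;/p>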
&lt;h2 id="technologies-used">Technologies Used&lt;/h2>
&lt;ul>
&lt;li>&lt;strong>Backend&lt;/strong>: Python 3.12.4, FastAPI, Pydantic&lt;/li>
&lt;li>&lt;strong>OCR&lt;/strong>: PaddleOCR + Tesseract&lt;/li>
&lt;li>&lt;strong>NLP/NER&lt;/strong>: spaCy (Spanish legal model)&lt;/li>
&lt;li>&lt;strong>Database&lt;/strong>: SQLite + FTS5 (33 columns, 11 indexes, WAL mode)&lt;/li>
&lt;li>&lt;strong>Frontend&lt;/strong>: Streamlit (9 pages)&lt;/li>
&lt;li>&lt;strong>Architecture&lt;/strong>: Layered Clean Architecture, result-type error handling with the returns library&lt;/li>
&lt;li>&lt;strong>Security&lt;/strong>: JWT, RBAC, rate limiting, security headers&lt;/li>
&lt;li>&lt;strong>Deployment&lt;/strong>: Docker, Easypanel, automated deploys triggered by a GitHub webhook&lt;/li>
&lt;/ul>
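&lt;p>The SQLite + FTS5 and WAL settings listed above can be illustrated with Python's built-in sqlite3 module. This is a minimal sketch under stated assumptions: the single documents_fts table and the helper names are hypothetical and far simpler than the 33-column schema described, and it requires an SQLite build with FTS5 compiled in, which most Python distributions ship.&lt;/p>

```python
import os
import sqlite3
import tempfile
from typing import Optional


def open_database(path: str) -> sqlite3.Connection:
    """Open (or create) a database in WAL mode with a hypothetical FTS5 table."""
    conn = sqlite3.connect(path)
    # WAL lets readers keep working while a writer commits. It cannot be
    # enabled on an in-memory database, so a file path is required.
    conn.execute("PRAGMA journal_mode=WAL")
    # doc_type is stored but not tokenized; only body is full-text indexed.
    conn.execute(
        "CREATE VIRTUAL TABLE IF NOT EXISTS documents_fts "
        "USING fts5(doc_type UNINDEXED, body)"
    )
    return conn


def add_document(conn: sqlite3.Connection, doc_type: str, body: str) -> int:
    with conn:  # commits on success, rolls back on error
        cur = conn.execute(
            "INSERT INTO documents_fts (doc_type, body) VALUES (?, ?)",
            (doc_type, body),
        )
    return cur.lastrowid


def search_documents(
    conn: sqlite3.Connection, query: str, doc_type: Optional[str] = None
) -> list[tuple[int, str]]:
    """Full-text search ordered by FTS5's built-in BM25 rank, with an optional type filter."""
    sql = "SELECT rowid, doc_type FROM documents_fts WHERE documents_fts MATCH ?"
    params = [query]
    if doc_type is not None:
        sql += " AND doc_type = ?"
        params.append(doc_type)
    sql += " ORDER BY rank"
    return conn.execute(sql, params).fetchall()


def demo() -> tuple[str, list[tuple[int, str]], list[tuple[int, str]]]:
    """Build a throwaway database, index two filings, and run two searches."""
    with tempfile.TemporaryDirectory() as tmp:
        conn = open_database(os.path.join(tmp, "sherlock.db"))
        try:
            mode = conn.execute("PRAGMA journal_mode").fetchone()[0]
            add_document(conn, "tutela", "Acción de tutela por negación de servicios de salud")
            add_document(conn, "habeas_corpus", "Habeas corpus por detención ilegal")
            hits = search_documents(conn, "tutela")
            filtered = search_documents(conn, "por", doc_type="habeas_corpus")
        finally:
            conn.close()
    return mode, hits, filtered
```

&lt;p>rank is FTS5's built-in BM25 score, so ORDER BY rank returns the best matches first. A production schema could instead keep metadata in a regular table and mirror only the text into an external-content FTS5 index.&lt;/p>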
&lt;h2 id="architecture">Architecture&lt;/h2>
&lt;p>Five layers, wired together by a ServiceContainer with 14 lazy-loaded properties:&lt;/p>
&lt;ul>
&lt;li>&lt;strong>Core&lt;/strong>: Entities, value objects&lt;/li>
&lt;li>&lt;strong>Application&lt;/strong>: Use cases, DTOs, ServiceContainer&lt;/li>
&lt;li>&lt;strong>Infrastructure&lt;/strong>: OCR, NER, deduplication, logging&lt;/li>
&lt;li>&lt;strong>Persistence&lt;/strong>: SQLite + FTS5 behind ports split per the Interface Segregation Principle (ISP)&lt;/li>
&lt;li>&lt;strong>Interfaces&lt;/strong>: Streamlit GUI, CLI (20 commands), FastAPI REST (22 endpoints)&lt;/li>
&lt;/ul>
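&lt;p>A ServiceContainer with lazy-loaded properties can be sketched with functools.cached_property: each dependency is built on first access and then reused, so expensive components such as an OCR engine are loaded only when a use case actually needs them. All class names below are hypothetical stand-ins for the Infrastructure, Persistence, and Application layers, not the project's actual API.&lt;/p>

```python
from functools import cached_property


class OcrService:
    """Hypothetical Infrastructure component that is expensive to construct."""

    instances = 0  # counts constructions, to show laziness

    def __init__(self) -> None:
        OcrService.instances += 1

    def extract_text(self, path: str) -> str:
        return f"text from {path}"


class DocumentRepository:
    """Hypothetical Persistence adapter (an in-memory list stands in for SQLite)."""

    def __init__(self, db_path: str) -> None:
        self.db_path = db_path
        self.saved: list[str] = []

    def save(self, text: str) -> None:
        self.saved.append(text)


class IngestDocument:
    """Hypothetical Application use case that depends only on injected collaborators."""

    def __init__(self, ocr: OcrService, repo: DocumentRepository) -> None:
        self._ocr = ocr
        self._repo = repo

    def execute(self, path: str) -> str:
        text = self._ocr.extract_text(path)
        self._repo.save(text)
        return text


class ServiceContainer:
    def __init__(self, db_path: str) -> None:
        self._db_path = db_path

    @cached_property
    def ocr(self) -> OcrService:
        # Built on first access, then stored on the instance and reused.
        return OcrService()

    @cached_property
    def repository(self) -> DocumentRepository:
        return DocumentRepository(self._db_path)

    @cached_property
    def ingest_document(self) -> IngestDocument:
        # Resolving the use case pulls in its dependencies lazily.
        return IngestDocument(self.ocr, self.repository)
```

&lt;p>Since Python 3.12, cached_property no longer holds a lock, so two threads racing on first access could each build a service; a container shared across threads would need its own locking.&lt;/p>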
&lt;h2 id="achievements">Achievements&lt;/h2>
&lt;ul>
&lt;li>&lt;strong>1,832 tests&lt;/strong> (1,770 unit + 62 integration) with &lt;strong>87% coverage&lt;/strong>&lt;/li>
&lt;li>&lt;strong>NER F1 85.3%&lt;/strong> with data validated by human operators&lt;/li>
&lt;li>&lt;strong>22 REST endpoints&lt;/strong> documented and functional&lt;/li>
&lt;li>&lt;strong>16/16 SEC security items&lt;/strong> plus 4 SEC-API items implemented&lt;/li>
&lt;li>&lt;strong>9 sprints&lt;/strong> completed (S23-S28) with 30 planning documents&lt;/li>
&lt;li>&lt;strong>0 mypy errors&lt;/strong> in strict mode across 130 files&lt;/li>
&lt;li>&lt;strong>Certified SDD audit&lt;/strong>: 39 items conforming, 0 defects&lt;/li>
&lt;li>&lt;strong>Code quality&lt;/strong>: 9.1/10&lt;/li>
&lt;/ul>
&lt;h2 id="impact">Impact&lt;/h2>
&lt;p>Sherlock-docs processes about 100 documents per day, with performance targets of &amp;lt;2 minutes per 20-page document and &amp;lt;100 ms per digital document. It eliminates manual classification of judicial documents and automatically detects duplicate filings.&lt;/p>
&lt;p>The system grew out of the need to automate the registration and classification of tutelas and habeas corpus petitions in the Colombian judicial system, where detecting duplicates and accurately extracting information about the parties to each proceeding are critical to the efficiency of judicial offices.&lt;/p></description></item><item><title>TYBABot - Virtual Assistant for Justicia XXI Web</title><link>https://portfolio.sprintjudicial.com/en/project/tybabot/</link><pubDate>Fri, 19 Sep 2025 00:00:00 +0000</pubDate><guid>https://portfolio.sprintjudicial.com/en/project/tybabot/</guid><description>&lt;p>&lt;strong>TYBABot&lt;/strong> is a specialized virtual assistant for Colombia&amp;rsquo;s Justicia XXI Web (Tyba) system, built with Streamlit and the OpenAI API. It guides judicial officials and system operators through electronic judicial management.&lt;/p>
&lt;h2 id="key-features">Key Features&lt;/h2>
&lt;ul>
&lt;li>&lt;strong>Document Management&lt;/strong>: Guidance on electronic judicial documents and how to handle them in Tyba&lt;/li>
&lt;li>&lt;strong>Judicial Proceedings&lt;/strong>: Guidance on registering and querying procedural actions&lt;/li>
&lt;li>&lt;strong>Petitions&lt;/strong>: Assistance in managing electronic petitions (memoriales)&lt;/li>
&lt;li>&lt;strong>Notifications&lt;/strong>: Guidance on the digital notification system&lt;/li>
&lt;li>&lt;strong>Electronic Signatures&lt;/strong>: Support for judicial electronic signature processes&lt;/li>
&lt;/ul>
&lt;h2 id="technologies-used">Technologies Used&lt;/h2>
&lt;ul>
&lt;li>&lt;strong>Backend&lt;/strong>: Python, OpenAI Assistants API v2&lt;/li>
&lt;li>&lt;strong>Frontend&lt;/strong>: Streamlit&lt;/li>
&lt;li>&lt;strong>AI&lt;/strong>: Language model backed by a specialized knowledge base about Tyba&lt;/li>
&lt;li>&lt;strong>Deployment&lt;/strong>: Streamlit Cloud&lt;/li>
&lt;/ul>
&lt;h2 id="impact">Impact&lt;/h2>
&lt;p>TYBABot shortens the learning curve for judicial officials working with the Tyba system, giving contextualized answers grounded in the official system documentation and in best practices for electronic judicial management.&lt;/p>
&lt;p>The assistant grew out of direct experience with the Justicia XXI Web system in the Judicial Branch, which revealed the difficulties officials most often face when adopting electronic document management tools.&lt;/p></description></item></channel></rss>