Post by ummefatihaayat22 on Feb 13, 2024 5:16:51 GMT -5
More than an event, Databeers is a community built around data and what can be done with it. Speakers describe how they use data in different areas and projects, from researching the environment to finding people on Facebook with generic data or putting data to work in the legal sector. The Institute of Knowledge Engineering (IIC) attended the return of these meetups with a talk by Helena Montoro, a computational linguist. The Natural Language Processing (NLP) expert explained how real texts were processed and prepared in a legal-sector project.
Using the File Map project, developed by the IIC in collaboration with the Garrigues law firm, Helena Montoro focused on working with data, in this case legal texts, and above all on what real data actually looks like. The ultimate goal of the project was to speed up the review of large judicial files by classifying legal documents and detecting legal entities. To do so, the team worked with 79.6 GB of unlabeled data from 6 judicial files. First, the texts had to be reviewed so they could be annotated. At this stage the team ran into several problems: the documents were not digitized, some pages contributed nothing (non-informative pages), and the PDF files contained concatenated legal documents whose boundaries had to be delimited.
Therefore, in addition to the tasks needed to meet the project's main objectives, others were added to process the texts and ensure their quality. For example, optical character recognition (OCR) had to be used to transcribe and digitize the documents. Separate models were also trained to discard non-informative pages and to split each PDF file into the individual documents it contained.
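The two page-level cleanup steps described above (dropping non-informative pages, then splitting a concatenated PDF into documents) can be sketched in a few lines. The project trained models for both steps; this sketch swaps those models for simple rules purely for illustration. It assumes the input is the per-page OCR text of one PDF, treats a page as non-informative if it has fewer than an assumed 20 words, and assumes a new document starts whenever a page opens with one of a few hypothetical Spanish court-document headings (`SENTENCIA`, `AUTO`, `PROVIDENCIA`, `DILIGENCIA`). The names `is_informative`, `split_documents`, `MIN_WORDS` and `DOC_START` are hypothetical, not taken from the project.

```python
import re

# Assumed threshold: pages with fewer alphabetic words than this are treated
# as non-informative (blank scans, separator sheets, stamps). Hypothetical.
MIN_WORDS = 20

# Hypothetical rule standing in for the trained splitter: a page that opens
# with one of these uppercase headings starts a new legal document.
DOC_START = re.compile(r"^\s*(SENTENCIA|AUTO|PROVIDENCIA|DILIGENCIA)\b")


def is_informative(page_text: str, min_words: int = MIN_WORDS) -> bool:
    """Rule-based stand-in for a page classifier: count letter-only words."""
    words = re.findall(r"[^\W\d_]{2,}", page_text)  # Unicode letters, length >= 2
    return len(words) >= min_words


def split_documents(pages: list[str]) -> list[list[str]]:
    """Filter out non-informative pages, then group the rest into documents."""
    documents: list[list[str]] = []
    for page in pages:
        if not is_informative(page):
            continue  # discard the page entirely
        if not documents or DOC_START.match(page):
            documents.append([page])  # first kept page or a detected document start
        else:
            documents[-1].append(page)  # continuation of the current document
    return documents


if __name__ == "__main__":
    filler = "the court considers the arguments presented by the parties in this proceeding " * 2
    pages = [
        "SENTENCIA 12/2020 " + filler,
        "continued reasoning " + filler,
        "   ",  # blank scanned page
        "AUTO " + filler,
    ]
    docs = split_documents(pages)
    print(len(docs), [len(d) for d in docs])  # 2 [2, 1]
```

Filtering before splitting keeps blank or stamp-only pages from being mistaken for document boundaries. Rules like these are easy to inspect but brittle (a short first page would be dropped, and headings vary across courts), which is presumably why trained models were used instead; the filter-then-segment structure stays the same either way.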
With this example, Montoro illustrated some common challenges the IIC's team of computational linguists faces when working with data: real texts rarely look the way one expects. That is why reviewing, cleaning and preprocessing data is essential in NLP projects to obtain the best results.
Practical examples of this transformation of processes and teams were the focus of the final round table, on the day-to-day reality of Artificial Intelligence. Representatives from Amadix, ASISA and the 12 de Octubre University Hospital took part. In the insurer's case, the project with the IIC focused on optimizing the authorization process for a test or treatment. “We have transformed,” said Paloma Ruiz, of ASISA's Central-North Territorial Directorate. “The key is to gain confidence and see how to improve the process. We changed the entire operation and streamlined the decision for the customer,” she explained.
For Juan Luis Cruz, head of the IT Service at the 12 de Octubre University Hospital, much of the work likewise depends on earning the clinician's trust. “You need a system that has been running for a long time, showing a prediction even if it is not yet used… Creating the algorithm is simple; validating it is not,” he stressed.