Note: This is not a duplicate question. I have gone through this answer and made the suggested package downgrade, but it still results in the same error. Details below.

# System Details

  • MacBook Air (M1, 2020)
  • MacOS Monterey 12.3
  • Python 3.10.8 (Miniconda environment)
  • Relevant library versions from pip freeze
importlib-metadata==3.4.0
PyMuPDF==1.21.1
spacy==3.4.4
spacy-alignments==0.9.0
spacy-legacy==3.0.11
spacy-loggers==1.0.4
spacy-transformers==1.2.0
streamlit==1.17.0
flair==0.11.3
catalogue==2.0.8

# Setup

  • I am trying to use spaCy for some text processing over a PDF document uploaded to a Streamlit app.
  • The Streamlit app contains an upload button, a submit button (which calls the preprocessing and spaCy functions), and a text_area to display the processed text.

Here is the working code for uploading a pdf document and extracting its text -

import streamlit as st
import fitz

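def load_file(file):
    # Read from the file object passed in, not from the global uploaded_file
    doc = fitz.open(stream=file.read(), filetype="pdf")
    pages = []
    with doc:
        for page in doc:
            pages.append(page.get_text())
    # Join the per-page text into a single string
    return "\n".join(pages)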

#####################################################################   

st.title("Test app")

col1, col2 = st.columns([1,1], gap='small')

with col1:
    with st.expander("Description -", expanded=True):
        st.write("This is the description of the app.")
    
with col2:
    with st.form(key="my_form"):
        uploaded_file = st.file_uploader("Upload",type='pdf', accept_multiple_files=False, label_visibility="collapsed")
        submit_button = st.form_submit_button(label="Process")        

#####################################################################        
        
col1, col2 = st.columns([1,3], gap='small')

with col1:
    st.header("Metrics")

with col2:
    st.header("Text")
    
    if uploaded_file is not None:
        text = load_file(uploaded_file)
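        # The first argument of st.text_area is the label; the content goes in value
        st.text_area("Extracted text", value=text)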

# Reproduce base code

  • install necessary libraries
  • save above code to a test.py file
  • from terminal navigate to folder and run streamlit run test.py
  • navigate to http://localhost:8501/ in browser
  • download this sample pdf and upload it to the app as an example

This results in a functioning app.

# Issue I am facing

The problem appears when I add import spacy to the Python file and rerun the Streamlit app. This error pops up -

AttributeError: 'PathDistribution' object has no attribute '_normalized_name'
Traceback:
File "/Users/akshay_sehgal/miniconda3/envs/demo_ui/lib/python3.10/site-packages/streamlit/runtime/scriptrunner/script_runner.py", line 565, in _run_script
    exec(code, module.__dict__)
File "/Users/akshay_sehgal/Library/CloudStorage/________/Documents/Code/Demo UI/Streamlit/keyphrase_extraction_template/test.py", line 3, in <module>
    import spacy
File "/Users/akshay_sehgal/miniconda3/envs/demo_ui/lib/python3.10/site-packages/spacy/__init__.py", line 6, in <module>
    from .errors import setup_default_warnings
File "/Users/akshay_sehgal/miniconda3/envs/demo_ui/lib/python3.10/site-packages/spacy/errors.py", line 2, in <module>
    from .compat import Literal
File "/Users/akshay_sehgal/miniconda3/envs/demo_ui/lib/python3.10/site-packages/spacy/compat.py", line 3, in <module>
    from thinc.util import copy_array
File "/Users/akshay_sehgal/miniconda3/envs/demo_ui/lib/python3.10/site-packages/thinc/__init__.py", line 5, in <module>
    from .config import registry
File "/Users/akshay_sehgal/miniconda3/envs/demo_ui/lib/python3.10/site-packages/thinc/config.py", line 1, in <module>
    import catalogue
File "/Users/akshay_sehgal/miniconda3/envs/demo_ui/lib/python3.10/site-packages/catalogue/__init__.py", line 20, in <module>
    AVAILABLE_ENTRY_POINTS = importlib_metadata.entry_points()  # type: ignore
File "/Users/akshay_sehgal/miniconda3/envs/demo_ui/lib/python3.10/importlib/metadata/__init__.py", line 1009, in entry_points
    return SelectableGroups.load(eps).select(**params)
File "/Users/akshay_sehgal/miniconda3/envs/demo_ui/lib/python3.10/importlib/metadata/__init__.py", line 459, in load
    ordered = sorted(eps, key=by_group)
File "/Users/akshay_sehgal/miniconda3/envs/demo_ui/lib/python3.10/importlib/metadata/__init__.py", line 1006, in <genexpr>
    eps = itertools.chain.from_iterable(
File "/Users/akshay_sehgal/miniconda3/envs/demo_ui/lib/python3.10/importlib/metadata/_itertools.py", line 16, in unique_everseen
    k = key(element)

# What have I tried?

  1. First thing I tried was to isolate the spacy code and run it in a notebook in the specific environment, which worked without any issue.
  2. Next, after researching SO (this answer) and the GitHub issues, I found that importlib-metadata could be the culprit, so I downgraded it with the following commands, but that didn't fix anything.
pip uninstall importlib-metadata
pip install importlib-metadata==3.4.0
  3. I removed the complete environment and set the whole thing up again from scratch, following the same steps I used the first time (in case I had made a mistake during setup). Still the same error.

  4. The final option I am left with is to containerize the spaCy processing as an API and then call it from the Streamlit app using requests.
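
A minimal sketch of that last option, assuming a hypothetical /process endpoint and a placeholder process_text function standing in for the spaCy pipeline. To keep it self-contained it uses only the standard library (http.server and urllib) rather than requests or a web framework, and it starts the server in a thread purely for demonstration. In a real setup the server would run in its own process or container, so that import spacy never executes inside the Streamlit script.

```python
import json
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.request import Request, urlopen


def process_text(text):
    """Placeholder for the spaCy pipeline (assumption): counts whitespace-separated tokens."""
    return {"n_tokens": len(text.split())}


class ProcessHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        # Only the hypothetical /process route is served
        if self.path != "/process":
            self.send_error(404)
            return
        length = int(self.headers.get("Content-Length", 0))
        payload = json.loads(self.rfile.read(length))
        body = json.dumps(process_text(payload["text"])).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        # Silence the default per-request logging
        pass


def start_server(port=0):
    """Start the API on localhost; port 0 lets the OS pick a free port."""
    server = HTTPServer(("127.0.0.1", port), ProcessHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


def call_api(base_url, text):
    """Client side: what the Streamlit app would call instead of importing spacy."""
    request = Request(
        base_url + "/process",
        data=json.dumps({"text": text}).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urlopen(request, timeout=10) as response:
        return json.loads(response.read())


if __name__ == "__main__":
    server = start_server()
    url = f"http://127.0.0.1:{server.server_port}"
    print(call_api(url, "text extracted from the uploaded pdf"))
    server.shutdown()
```

In the Streamlit script, only call_api would be needed: the text returned by load_file is sent to the service and the response is displayed.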

I would be happy to share the requirements.txt if needed, but I will have to figure out how to upload it somewhere via my office pc. Do let me know if that is required and I will find a way.

Would appreciate any help in solving this issue!

Akshay Sehgal

1 Answer

Upgrade importlib-metadata to importlib-metadata>=4.3.0 to avoid this particular error.

There can be complicated interactions between the built-in importlib.metadata and the additional importlib_metadata package, and you need a newer version of importlib-metadata to get some of the updates/fixes related to this.
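
One way to inspect this interaction is to list the entries on sys.meta_path that expose a find_distributions method, since that is the hook importlib.metadata uses to discover installed distributions. The helper name metadata_finders below is made up for illustration. On an affected environment, an extra finder from importlib_metadata would be expected to appear after the backport is imported, though that is an assumption and was not checked against the setup in the question.

```python
import sys


def metadata_finders():
    """Return (module, class name) for each meta-path finder that can discover distributions."""
    found = []
    for finder in sys.meta_path:
        if hasattr(finder, "find_distributions"):
            # Finders may be registered as classes or as instances
            cls = finder if isinstance(finder, type) else type(finder)
            found.append((cls.__module__, cls.__qualname__))
    return found


if __name__ == "__main__":
    for module, name in metadata_finders():
        print(f"{module}.{name}")
```

Running it once in a fresh interpreter and once after import importlib_metadata shows whether the backport registers a finder of its own.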

With Python 3.10 and importlib-metadata==3.4.0, you can reproduce this error with the following example (spacy and streamlit are not required):

import importlib_metadata
import importlib.metadata
importlib.metadata.entry_points()
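
To check whether an environment is affected before or after upgrading, a small sketch like the following could help. It reads the installed importlib-metadata version through the standard library and compares it with the 4.3.0 minimum mentioned above. The helper names are hypothetical, and the version parsing is deliberately simple (it looks only at the leading digits of the first three components).

```python
import re
from importlib import metadata as stdlib_metadata

MINIMUM = (4, 3, 0)  # minimum importlib-metadata version recommended above


def parse_version(text):
    """Turn a version string such as '3.4.0' into a tuple like (3, 4, 0)."""
    parts = []
    for piece in text.split(".")[:3]:
        match = re.match(r"\d+", piece)
        parts.append(int(match.group()) if match else 0)
    while len(parts) < 3:
        parts.append(0)
    return tuple(parts)


def needs_upgrade(installed, minimum=MINIMUM):
    """True if the backport is installed and older than the minimum."""
    if installed is None:
        return False  # backport not installed; only the built-in module is in use
    return parse_version(installed) < minimum


def installed_backport_version():
    """Return the installed importlib-metadata version, or None if it is absent."""
    try:
        return stdlib_metadata.version("importlib-metadata")
    except stdlib_metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    version = installed_backport_version()
    print("importlib-metadata:", version)
    print("needs upgrade:", needs_upgrade(version))
```

The script does not import importlib_metadata itself, so it should run even in an environment where the failing example above raises.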
aab
  • Let me try upgrading it. For now I have set up a separate API for the spaCy component, which I call from the Streamlit app, and that works as intended. – Akshay Sehgal Jan 17 '23 at 21:59