Note: This is not a duplicate question. I have gone through this answer and made the suggested package downgrade, but it still results in the same error. Details below.
# System Details
- MacBook Air (M1, 2020)
- MacOS Monterey 12.3
- Python 3.10.8 (Miniconda environment)
- Relevant library versions from `pip freeze`:

```
importlib-metadata==3.4.0
PyMuPDF==1.21.1
spacy==3.4.4
spacy-alignments==0.9.0
spacy-legacy==3.0.11
spacy-loggers==1.0.4
spacy-transformers==1.2.0
streamlit==1.17.0
flair==0.11.3
catalogue==2.0.8
```
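To confirm that the interpreter Streamlit actually runs sees these same versions, a small check like the one below can be run in the environment (or pasted at the top of `test.py`, in which case the output appears in the terminal). It is only a sketch using the standard library's `importlib.metadata.version`; `installed_versions` is a hypothetical helper name for illustration.

```python
import importlib.metadata as md

# Distribution names from the pip freeze listing above.
# Note: "importlib-metadata" is the PyPI backport, not the stdlib module.
PACKAGES = [
    "importlib-metadata", "PyMuPDF", "spacy", "spacy-alignments",
    "spacy-legacy", "spacy-loggers", "spacy-transformers",
    "streamlit", "flair", "catalogue",
]


def installed_versions(names):
    """Hypothetical helper: map each distribution name to its version, or None if not installed."""
    versions = {}
    for name in names:
        try:
            versions[name] = md.version(name)
        except md.PackageNotFoundError:
            versions[name] = None
    return versions


if __name__ == "__main__":
    for name, version in installed_versions(PACKAGES).items():
        print(f"{name}=={version}")
```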
# Setup
- I am trying to use `spacy` for some text processing on a PDF document uploaded to a `Streamlit` app.
- The `Streamlit` app contains an upload button, a submit button (which calls the preprocessing and spacy functions), and a `text_area` to display the processed text.
Here is the working code for uploading a PDF document and extracting its text:

```python
import streamlit as st
import fitz  # PyMuPDF


def load_file(file):
    # Read the uploaded file's bytes and open them as an in-memory PDF
    doc = fitz.open(stream=file.read(), filetype="pdf")
    text = []
    with doc:
        for page in doc:
            text.append(page.get_text())
    text = "\n".join(text)
    return text

#####################################################################

st.title("Test app")
col1, col2 = st.columns([1, 1], gap="small")

with col1:
    with st.expander("Description -", expanded=True):
        st.write("This is the description of the app.")

with col2:
    with st.form(key="my_form"):
        uploaded_file = st.file_uploader("Upload", type="pdf", accept_multiple_files=False, label_visibility="collapsed")
        submit_button = st.form_submit_button(label="Process")

#####################################################################

col1, col2 = st.columns([1, 3], gap="small")

with col1:
    st.header("Metrics")

with col2:
    st.header("Text")
    if uploaded_file is not None:
        text = load_file(uploaded_file)
        # text_area's first argument is the label; the extracted text goes in `value`
        st.text_area("Extracted text", value=text, label_visibility="collapsed")
```
# Reproduce base code
- Install the necessary libraries.
- Save the above code to a `test.py` file.
- From the terminal, navigate to the folder and run `streamlit run test.py`.
- Navigate to `http://localhost:8501/` in the browser.
- Download this sample PDF and upload it to the app as an example.
This results in a functioning app.
# Issue I am facing
Now, the issue appears when I add `spacy` to the Python file using `import spacy` and rerun the Streamlit app. This error pops up:
```
AttributeError: 'PathDistribution' object has no attribute '_normalized_name'
Traceback:
File "/Users/akshay_sehgal/miniconda3/envs/demo_ui/lib/python3.10/site-packages/streamlit/runtime/scriptrunner/script_runner.py", line 565, in _run_script
    exec(code, module.__dict__)
File "/Users/akshay_sehgal/Library/CloudStorage/________/Documents/Code/Demo UI/Streamlit/keyphrase_extraction_template/test.py", line 3, in <module>
    import spacy
File "/Users/akshay_sehgal/miniconda3/envs/demo_ui/lib/python3.10/site-packages/spacy/__init__.py", line 6, in <module>
    from .errors import setup_default_warnings
File "/Users/akshay_sehgal/miniconda3/envs/demo_ui/lib/python3.10/site-packages/spacy/errors.py", line 2, in <module>
    from .compat import Literal
File "/Users/akshay_sehgal/miniconda3/envs/demo_ui/lib/python3.10/site-packages/spacy/compat.py", line 3, in <module>
    from thinc.util import copy_array
File "/Users/akshay_sehgal/miniconda3/envs/demo_ui/lib/python3.10/site-packages/thinc/__init__.py", line 5, in <module>
    from .config import registry
File "/Users/akshay_sehgal/miniconda3/envs/demo_ui/lib/python3.10/site-packages/thinc/config.py", line 1, in <module>
    import catalogue
File "/Users/akshay_sehgal/miniconda3/envs/demo_ui/lib/python3.10/site-packages/catalogue/__init__.py", line 20, in <module>
    AVAILABLE_ENTRY_POINTS = importlib_metadata.entry_points()  # type: ignore
File "/Users/akshay_sehgal/miniconda3/envs/demo_ui/lib/python3.10/importlib/metadata/__init__.py", line 1009, in entry_points
    return SelectableGroups.load(eps).select(**params)
File "/Users/akshay_sehgal/miniconda3/envs/demo_ui/lib/python3.10/importlib/metadata/__init__.py", line 459, in load
    ordered = sorted(eps, key=by_group)
File "/Users/akshay_sehgal/miniconda3/envs/demo_ui/lib/python3.10/importlib/metadata/__init__.py", line 1006, in <genexpr>
    eps = itertools.chain.from_iterable(
File "/Users/akshay_sehgal/miniconda3/envs/demo_ui/lib/python3.10/importlib/metadata/_itertools.py", line 16, in unique_everseen
    k = key(element)
```
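The last frames show the stdlib `importlib.metadata.entry_points()` de-duplicating installed distributions by their `_normalized_name` attribute, and one of the distribution objects lacking it. To narrow down which one, that step can be reproduced without importing spacy. This is only a diagnostic sketch, assuming Python 3.10 as in the setup above: `_normalized_name` and `_path` are private attributes of `importlib.metadata` that may change between Python versions, and `find_suspect_distributions` is a hypothetical helper name.

```python
import importlib.metadata as md


def find_suspect_distributions(dists=None):
    """Hypothetical helper: list distribution objects that lack `_normalized_name`.

    On Python 3.10, entry_points() de-duplicates distributions by this private
    attribute, which is the step that fails in the traceback.
    Returns a list of (class path, location) tuples.
    """
    if dists is None:
        dists = md.distributions()
    suspects = []
    for dist in dists:
        try:
            _ = dist._normalized_name
        except AttributeError:
            cls = type(dist)
            # `_path` is also private; it is None if this object does not have it
            location = getattr(dist, "_path", None)
            suspects.append((f"{cls.__module__}.{cls.__qualname__}", str(location)))
        except Exception:
            # Other metadata problems (e.g. a missing Name field) are out of scope here
            continue
    return suspects


if __name__ == "__main__":
    for cls_path, location in find_suspect_distributions():
        print(cls_path, location)
```

If this prints anything, the class path shows which module produced the offending distribution object, which may point to the package that needs upgrading or removing.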
# What have I tried?
- The first thing I tried was to isolate the spacy code and run it in a notebook in the same environment, which worked without any issue.
- Next, after researching SO (this answer) and the GitHub issues, I found that `importlib.metadata` could be the culprit, so I downgraded the `importlib-metadata` package using the following commands, but it didn't fix anything.

  ```
  pip uninstall importlib-metadata
  pip install importlib-metadata==3.4.0
  ```
- I removed the complete environment and set the whole thing up again from scratch, following the same steps I used the first time (in case I had made a mistake during setup). I still get the same error.
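Since the same import works in a notebook but fails under `streamlit run`, one way to compare the two runtimes is to run the snippet below in both places (in the notebook, and at the top of `test.py`) and diff the output. This is a sketch using only the standard library; `runtime_report` is a hypothetical helper, and which of these fields actually differ is something to check, not a known cause.

```python
import importlib.metadata
import importlib.util
import sys


def runtime_report():
    """Hypothetical helper: collect details that may differ between a notebook and `streamlit run`."""
    backport = importlib.util.find_spec("importlib_metadata")
    return {
        "executable": sys.executable,
        "python": sys.version.split()[0],
        # The stdlib module that catalogue ends up calling in the traceback
        "stdlib_metadata": importlib.metadata.__file__,
        # Location of the PyPI backport (importlib-metadata), or None if not importable
        "backport_location": backport.origin if backport else None,
        # importlib.metadata.distributions() queries the finders on sys.meta_path
        "meta_path": [repr(finder) for finder in sys.meta_path],
    }


if __name__ == "__main__":
    for key, value in runtime_report().items():
        print(f"{key}: {value}")
```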
The final option I would be left with is to containerize the spacy processing as an API and then call it from the Streamlit app using `requests`.
I would be happy to share the `requirements.txt` if needed, but I will have to figure out how to upload it somewhere from my office PC. Do let me know if it is required and I will find a way.
Would appreciate any help in solving this issue!