Cannot import NLTK stopwords after install #685
Closed · ksednew opened this issue Apr 19, 2018 · 22 comments
On a Mac using Python 3.6 and Anaconda. Have installed NLTK and used both command line and manual download of stop words. I see the stop word folder in NLTK folder, but cannot get it to load in my Jupyter notebook:
from nltk.corpus import stopwords
LookupError                               Traceback (most recent call last)
/anaconda3/lib/python3.6/site-packages/nltk/corpus/util.py in __load(self)
     79         except LookupError as e:
---> 80             try: root = nltk.data.find('{}/{}'.format(self.subdir, zip_name))
     81             except LookupError: raise e

/anaconda3/lib/python3.6/site-packages/nltk/data.py in find(resource_name, paths)
    672     resource_not_found = '\n%s\n%s\n%s\n' % (sep, msg, sep)
--> 673     raise LookupError(resource_not_found)
    674

LookupError:
Resource stopwords not found.
Please use the NLTK Downloader to obtain the resource:

>>> import nltk
>>> nltk.download('stopwords')

Searched in:
    - '/Users/ksednew/nltk_data'
    - '/usr/share/nltk_data'
    - '/usr/local/share/nltk_data'
    - '/usr/lib/nltk_data'
    - '/usr/local/lib/nltk_data'
    - '/anaconda3/nltk_data'
    - '/anaconda3/lib/nltk_data'
During handling of the above exception, another exception occurred:

LookupError                               Traceback (most recent call last)
<ipython-input> in <module>()
      1 from nltk.corpus import stopwords
----> 2 stop = stopwords.words("english")
      3 def stopwords(x):
      4     x = re.sub("[^a-z\s]", " ", x.lower())
      5     x = [w for w in x.split()

/anaconda3/lib/python3.6/site-packages/nltk/corpus/util.py in __getattr__(self, attr)
    114             raise AttributeError("LazyCorpusLoader object has no attribute '__bases__'")
    115
--> 116         self.__load()
    117         # This looks circular, but its not, since __load() changes our
    118         # class to something new:

/anaconda3/lib/python3.6/site-packages/nltk/corpus/util.py in __load(self)
     79         except LookupError as e:
     80             try: root = nltk.data.find('{}/{}'.format(self.subdir, zip_name))
---> 81             except LookupError: raise e
     82
     83         # Load the corpus.

/anaconda3/lib/python3.6/site-packages/nltk/corpus/util.py in __load(self)
     76         else:
     77             try:
---> 78                 root = nltk.data.find('{}/{}'.format(self.subdir, self.__name))
     79             except LookupError as e:
     80                 try: root = nltk.data.find('{}/{}'.format(self.subdir, zip_name))

/anaconda3/lib/python3.6/site-packages/nltk/data.py in find(resource_name, paths)
    671     sep = '*' * 70
    672     resource_not_found = '\n%s\n%s\n%s\n' % (sep, msg, sep)
--> 673     raise LookupError(resource_not_found)
    674
    675

LookupError:
Resource stopwords not found.
Please use the NLTK Downloader to obtain the resource:

>>> import nltk
>>> nltk.download('stopwords')

Searched in:
    - '/Users/ksednew/nltk_data'
    - '/usr/share/nltk_data'
    - '/usr/local/share/nltk_data'
    - '/usr/lib/nltk_data'
    - '/usr/local/lib/nltk_data'
    - '/anaconda3/nltk_data'
    - '/anaconda3/lib/nltk_data'
I have tried placing copies of the stopwords folder in various places (where it says it searched) as well as in the corpus folder, and still no luck. Any ideas?
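One common cause when copying the data manually (an assumption about this setup, but it matches the search behavior shown in the traceback) is dropping the stopwords folder directly into nltk_data instead of under its corpora subdirectory. A stdlib sketch of the layout the loader expects:

```python
import os

# NLTK resolves the resource as <search_path>/corpora/stopwords, so a
# manually downloaded folder must keep the intermediate "corpora" level.
home = os.path.expanduser("~")
expected = os.path.join(home, "nltk_data", "corpora", "stopwords", "english")
print(expected)  # the file the loader ultimately opens for English
```

If the folder sits at `~/nltk_data/stopwords` instead, the lookup fails exactly as shown above.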
We have official support for corpora, but I believe it does not function properly on Python 3.6. I will have to investigate.
Interesting, I thought on my other Mac running the same versions, it had worked, but I may be wrong.
I will add that I was not able to download the Stopwords corpora because of issues involving my company’s proxy:
>>> nltk.download('stopwords')
[nltk_data] Error loading stopwords: <urlopen error [SSL:
[nltk_data]     CERTIFICATE_VERIFY_FAILED] certificate verify failed
[nltk_data]     (_ssl.c:833)>
If you have ideas for that, maybe that would solve it. I’m wondering if I’m just dropping the manually downloaded version in the wrong place.
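When CERTIFICATE_VERIFY_FAILED is caused by a corporate proxy injecting its own certificate, a commonly used workaround is to swap in an unverified HTTPS context before downloading. This is a sketch, not an endorsement, since it disables certificate verification entirely:

```python
import ssl

# Replace the default HTTPS context with an unverified one so that
# urllib-based downloads (which nltk.download uses) skip certificate
# checks. Security tradeoff: only do this on a network you trust.
try:
    _unverified = ssl._create_unverified_context
except AttributeError:
    pass  # very old Pythons lack this attribute and verify by default
else:
    ssl._create_default_https_context = _unverified

# Then, in the same session:
# import nltk
# nltk.download('stopwords')
```

The cleaner fix is to install the proxy's CA certificate into the system or Python certificate store.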
Are there any updates to this? It seems no one ever commented as to whether this is a problem with the third party's support of Python 3.6 or something else.
Further, it sounds like this is a problem on your local machine @ksednew and I’m not certain how this is relevant to the buildpack.
Hemants-MacBook-Pro:TextSummarizer khemant$ python2
Python 2.7.15 (v2.7.15:ca079a3ea3, Apr 29 2018, 20:59:26)
[GCC 4.2.1 Compatible Apple LLVM 6.0 (clang-600.0.57)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import nltk
>>> from nltk.tokenize import sent_tokenize, word_tokenize
>>> from nltk.corpus import stopwords
>>> stop = set(stopwords.words('english'))
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/nltk/corpus/util.py", line 116, in __getattr__
    self.__load()
  File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/nltk/corpus/util.py", line 81, in __load
    except LookupError: raise e
LookupError:
Resource stopwords not found.
Please use the NLTK Downloader to obtain the resource:

>>> import nltk
>>> nltk.download('stopwords')

Searched in:
    - '/Users/khemant/nltk_data'
    - '/usr/share/nltk_data'
    - '/usr/local/share/nltk_data'
    - '/usr/lib/nltk_data'
    - '/usr/local/lib/nltk_data'
    - '/Library/Frameworks/Python.framework/Versions/2.7/nltk_data'
    - '/Library/Frameworks/Python.framework/Versions/2.7/share/nltk_data'
    - '/Library/Frameworks/Python.framework/Versions/2.7/lib/nltk_data'
>>> nltk.download('stopwords')
[nltk_data] Error loading stopwords: <urlopen error [SSL:
[nltk_data]     CERTIFICATE_VERIFY_FAILED] certificate verify failed
[nltk_data]     (_ssl.c:726)>
False
This is really surprising.
@khemanta this is not running on Heroku. It seems this is an issue with your local installations. I cannot help you with this, but I suspect folks on StackOverflow can. Cheers
@ksednew Try downloading the averaged_perceptron_tagger module via nltk.download().
Also, restart the IDE after unzipping completes, then run your code.
From the comments, this sounds like a problem in local instances, which the buildpack can't address. Closing the issue.
I have the exact same error with Python 3.6; I downloaded the resources with the nltk.download method. Then when I deploy, I get the following errors:
2019-01-15T05:23:23.773231+00:00 app[web.1]: **********************************************************************
2019-01-15T05:23:23.773232+00:00 app[web.1]: Resource stopwords not found.
2019-01-15T05:23:23.773234+00:00 app[web.1]: Please use the NLTK Downloader to obtain the resource:
2019-01-15T05:23:23.773236+00:00 app[web.1]:
2019-01-15T05:23:23.773237+00:00 app[web.1]: >>> import nltk
2019-01-15T05:23:23.773239+00:00 app[web.1]: >>> nltk.download('stopwords')
2019-01-15T05:23:23.773241+00:00 app[web.1]:
2019-01-15T05:23:23.773242+00:00 app[web.1]: Searched in:
2019-01-15T05:23:23.773248+00:00 app[web.1]: - '/code/nltk_data'
2019-01-15T05:23:23.773249+00:00 app[web.1]: - '/usr/share/nltk_data'
2019-01-15T05:23:23.773251+00:00 app[web.1]: - '/usr/local/share/nltk_data'
2019-01-15T05:23:23.773252+00:00 app[web.1]: - '/usr/lib/nltk_data'
2019-01-15T05:23:23.773254+00:00 app[web.1]: - '/usr/local/lib/nltk_data'
2019-01-15T05:23:23.773255+00:00 app[web.1]: - '/usr/local/nltk_data'
2019-01-15T05:23:23.773257+00:00 app[web.1]: - '/usr/local/share/nltk_data'
2019-01-15T05:23:23.773259+00:00 app[web.1]: - '/usr/local/lib/nltk_data'
2019-01-15T05:23:23.773260+00:00 app[web.1]: **********************************************************************
2019-01-15T05:23:23.773261+00:00 app[web.1]:
@ksednew did you get any help with this issue? I'm having the same problem on my Linux OS.
Is anyone experiencing this when run on Heroku, or only locally?
I'm experiencing this when running on Heroku only. It works fine locally.
@raheebashraf Hi! Please could you open a new issue with steps to reproduce?
Experienced the same error while working on Google Colab.
@Jheel-patel Hi! Could you open a new issue with steps to reproduce?
I am using the code below to use stopwords through JupyterHub, which I have hosted on an AWS DLAMI Linux server.

python3 -m nltk.downloader stopwords
python3 -m nltk.downloader words
python3 -m nltk.downloader punkt

$ python3
>>> from nltk.corpus import words
>>> from nltk.corpus import stopwords
>>> stop_words = set(stopwords.words("english"))
>>> print(stop_words)

This works fine while running in the Python terminal. But when I try the same in a Jupyter notebook, it fails with the error "Resource stopwords not found. Please use the NLTK Downloader to obtain the resource:", even though the download reports success:

$ python3
>>> import nltk
>>> nltk.download('stopwords')
[nltk_data] Downloading package stopwords to /root/nltk_data...
[nltk_data] Package stopwords is already up-to-date!
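A likely explanation here (a guess, given the /root/nltk_data path in the output) is that the notebook kernel and the terminal run as different users or interpreters, so they search different nltk_data locations. A minimal stdlib check to run inside the notebook:

```python
import getpass
import sys

# The notebook kernel may use a different interpreter and user than the
# terminal, so nltk's default download location (~/nltk_data) can differ
# between the two environments.
print(sys.executable)     # interpreter used by this kernel
print(getpass.getuser())  # user whose home directory nltk will search
# import nltk; print(nltk.data.path)  # run where nltk is installed
```

If the two environments disagree, download the data with the notebook's own interpreter (or set the NLTK_DATA environment variable to a shared location).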
Well, I still have the same problem, but one way to work around it was to download the data to my local machine and then copy it into my container via the Dockerfile.
This solved the problem on Heroku.
Regarding the problem, I suspect that during the push process the content is downloaded and saved to a different path. That's the reason why the container couldn't find the nltk_data.
I will investigate this, and if I find a better solution I will update here.
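A minimal sketch of that Dockerfile workaround, assuming the corpora were downloaded locally into ./nltk_data and that the entry point is app.py (both names are hypothetical for this project):

```dockerfile
FROM python:3.6-slim

WORKDIR /app

# Install Python dependencies first to keep this layer cacheable
COPY requirements.txt .
RUN pip install -r requirements.txt

# Copy the locally downloaded corpora instead of downloading at build
# time, and point NLTK at them explicitly
COPY nltk_data /app/nltk_data
ENV NLTK_DATA=/app/nltk_data

COPY . .
CMD ["python", "app.py"]
```

Setting NLTK_DATA removes any dependency on where the loader would otherwise search inside the container.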
Experienced the same error while working on Google Colab.

Run nltk.download('stopwords') in a separate cell just above the cell in which the error occurred. I was facing the same issue in Colab, but it's running now after doing this.
Experienced the same error while working on Google Colab.

Run nltk.download('stopwords') in a separate cell just above the cell in which the error occurred. I was facing the same issue in Colab, but it's running now after doing this.

It is working, thank you.
I am also facing the same issue. If anyone knows the solution, please let me know.
LookupError                               Traceback (most recent call last)
File ~\anaconda3\lib\site-packages\nltk\corpus\util.py:84, in LazyCorpusLoader.__load(self)
     83 try:
---> 84     root = nltk.data.find(f"{self.subdir}/{zip_name}")
     85 except LookupError:

File ~\anaconda3\lib\site-packages\nltk\data.py:583, in find(resource_name, paths)
    582 resource_not_found = f"\n{sep}\n{msg}\n{sep}\n"
--> 583 raise LookupError(resource_not_found)

LookupError:
Resource stopwords not found.
Please use the NLTK Downloader to obtain the resource:

>>> import nltk
>>> nltk.download('stopwords')

For more information see: https://www.nltk.org/data.html

Attempted to load corpora/stopwords.zip/stopwords/

Searched in:
    - 'C:\Users\Abhishek Pandey/nltk_data'
    - 'C:\Users\Abhishek Pandey\anaconda3\nltk_data'
    - 'C:\Users\Abhishek Pandey\anaconda3\share\nltk_data'
    - 'C:\Users\Abhishek Pandey\anaconda3\lib\nltk_data'
    - 'C:\Users\Abhishek Pandey\AppData\Roaming\nltk_data'
    - 'C:\nltk_data'
    - 'D:\nltk_data'
    - 'E:\nltk_data'
    - 'path_to_nltk_data'

During handling of the above exception, another exception occurred:

LookupError                               Traceback (most recent call last)
Input In [20], in <cell line: 1>()
----> 1 sw = stopwords.words('english')

File ~\anaconda3\lib\site-packages\nltk\corpus\util.py:121, in LazyCorpusLoader.__getattr__(self, attr)
    118 if attr == "__bases__":
    119     raise AttributeError("LazyCorpusLoader object has no attribute '__bases__'")
--> 121 self.__load()
    122 # This looks circular, but its not, since __load() changes our
    123 # class to something new:
    124 return getattr(self, attr)

File ~\anaconda3\lib\site-packages\nltk\corpus\util.py:86, in LazyCorpusLoader.__load(self)
     84     root = nltk.data.find(f"{self.subdir}/{zip_name}")
     85 except LookupError:
---> 86     raise e
     88 # Load the corpus.
     89 corpus = self.__reader_cls(root, *self.__args, **self.__kwargs)

File ~\anaconda3\lib\site-packages\nltk\corpus\util.py:81, in LazyCorpusLoader.__load(self)
     79 else:
     80     try:
---> 81         root = nltk.data.find(f"{self.subdir}/{self.__name}")
     82     except LookupError as e:
     83         try:

File ~\anaconda3\lib\site-packages\nltk\data.py:583, in find(resource_name, paths)
    581 sep = "*" * 70
    582 resource_not_found = f"\n{sep}\n{msg}\n{sep}\n"
--> 583 raise LookupError(resource_not_found)

LookupError:
Resource stopwords not found.
Please use the NLTK Downloader to obtain the resource:

>>> import nltk
>>> nltk.download('stopwords')

For more information see: https://www.nltk.org/data.html

Attempted to load corpora/stopwords

Searched in:
    - 'C:\Users\Abhishek Pandey/nltk_data'
    - 'C:\Users\Abhishek Pandey\anaconda3\nltk_data'
    - 'C:\Users\Abhishek Pandey\anaconda3\share\nltk_data'
    - 'C:\Users\Abhishek Pandey\anaconda3\lib\nltk_data'
    - 'C:\Users\Abhishek Pandey\AppData\Roaming\nltk_data'
    - 'C:\nltk_data'
    - 'D:\nltk_data'
    - 'E:\nltk_data'
    - 'path_to_nltk_data'
This buildpack does not use Anaconda and does not run on Windows, so it has nothing to do with the more recent issues posted to this thread.
If you’re having issues with NLTK or Anaconda, please first read their docs and failing that, follow any support/issue reporting processes they document instead:
https://www.nltk.org/
https://www.anaconda.com/
If someone has an issue with using NLTK on Heroku during a Heroku build (not locally), please open a support ticket (https://help.heroku.com).
I’m locking this issue now, since otherwise people finding this thread via search engines are just going to keep commenting here even though it has nothing to do with their problem.
heroku locked as spam and limited conversation to collaborators on Jan 27, 2023
Update
As Kenneth Reitz pointed out, a much simpler solution has been added to the heroku-python-buildpack: add an nltk.txt file to your root directory and list your corpora inside. See https://devcenter.heroku.com/articles/python-nltk for details.
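For example, an nltk.txt covering the corpora used in this thread might look like this (the names are the standard NLTK package identifiers; include only what your app actually loads):

```
stopwords
punkt
```

The buildpack downloads each listed package at build time, so no post_compile hook is needed.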
Original Answer
Here’s a cleaner solution that allows you to install the NLTK data directly on Heroku without adding it to your git repo.
I used similar steps to install Textblob on Heroku, which uses NLTK as a dependency. I've made some minor adjustments to my original code in steps 3 and 4 that should work for an NLTK-only installation.
The default heroku buildpack includes a post_compile step that runs after all of the default build steps have been completed:

# post_compile
#!/usr/bin/env bash
if [ -f bin/post_compile ]; then
    echo "-----> Running post-compile hook"
    chmod +x bin/post_compile
    sub-env bin/post_compile
fi
As you can see, it looks in your project directory for your own post_compile file in the bin directory, and runs it if it exists. You can use this hook to install the nltk data.
1. Create the bin directory in the root of your local project.

2. Add your own post_compile file to the bin directory:

# bin/post_compile
#!/usr/bin/env bash
if [ -f bin/install_nltk_data ]; then
    echo "-----> Running install_nltk_data"
    chmod +x bin/install_nltk_data
    bin/install_nltk_data
fi
echo "-----> Post-compile done"

3. Add your own install_nltk_data file to the bin directory:

# bin/install_nltk_data
#!/usr/bin/env bash
source $BIN_DIR/utils

echo "-----> Starting nltk data installation"

# Assumes NLTK_DATA environment variable is already set
# $ heroku config:set NLTK_DATA='/app/nltk_data'

# Install the nltk data
# NOTE: The following command installs the stopwords corpora,
# so you may want to change it for your specific needs.
# See http://www.nltk.org/data.html
python -m nltk.downloader stopwords

# If using Textblob, use this instead:
# python -m textblob.download_corpora lite

# Open the NLTK_DATA directory
cd ${NLTK_DATA}

# Delete all of the zip files
find . -name "*.zip" -type f -delete

echo "-----> Finished nltk data installation"

4. Add nltk to your requirements.txt file (or textblob if you are using Textblob).

5. Commit all of these changes to your repo.

6. Set the NLTK_DATA environment variable on your heroku app:

$ heroku config:set NLTK_DATA='/app/nltk_data'

7. Deploy to Heroku. You will see the post_compile step trigger at the end of the deployment, followed by the nltk download.
I hope you found this helpful! Enjoy!
I am trying to run a webapp on Heroku using Flask. The webapp is programmed in Python with the NLTK (Natural Language Toolkit library).
One of the file has the following header:
import nltk, json, operator
from nltk.corpus import stopwords
from nltk.tokenize import RegexpTokenizer
When the webpage with the stopwords code is called, it produces the following error:
LookupError:
**********************************************************************
Resource 'corpora/stopwords' not found. Please use the NLTK
Downloader to obtain the resource: >>> nltk.download()
Searched in:
- '/app/nltk_data'
- '/usr/share/nltk_data'
- '/usr/local/share/nltk_data'
- '/usr/lib/nltk_data'
- '/usr/local/lib/nltk_data'
**********************************************************************
The exact code used:
#remove punctuation
toker = RegexpTokenizer(r'((?<=[^\w\s])\w(?=[^\w\s])|(\W))+', gaps=True)
data = toker.tokenize(data)
#remove stop words and digits
stopword = stopwords.words('english')
data = [w for w in data if w not in stopword and not w.isdigit()]
The webapp on Heroku doesn’t produce the Lookup error when stopword = stopwords.words('english')
is commented out.
The code runs without a glitch on my local computer. I have installed the required libraries on my computer using
pip install -r requirements.txt
The virtual environment provided by Heroku was running when I tested the code on my computer.
I have also tried the NLTK provided by two different sources, but the LookupError
is still there. The two sources I used are:
http://pypi.python.org/packages/source/n/nltk/nltk-2.0.1rc4.zip
https://github.com/nltk/nltk.git
The problem is that the corpus (‘stopwords’ in this case) doesn’t get uploaded to Heroku. Your code works on your local machine because it already has the NLTK corpus. Please follow these steps to solve the issue.
- Create a new directory in your project (let's call it 'nltk_data').
- Download the NLTK corpus into that directory. You will have to configure that during the download.
- Tell nltk to look for this particular path: just add nltk.data.path.append('path_to_nltk_data') to the Python file that's actually using nltk.
- Now push the app to Heroku.
Hope that solves the problem. Worked for me!
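The append step above can be sketched like this, assuming a project-level nltk_data directory committed to the repo (the nltk calls are shown commented since they require the package to be installed):

```python
import os

# 'nltk_data' is the hypothetical directory created inside the project
# and filled by the downloader before pushing to Heroku.
project_root = os.getcwd()
local_data = os.path.join(project_root, "nltk_data")

# In the code that actually uses nltk:
# import nltk
# nltk.data.path.append(local_data)
# from nltk.corpus import stopwords
print(local_data)  # path the loader will now search in addition to the defaults
```

Using a path built from the project root (rather than a hard-coded absolute path) keeps the code working both locally and in the /app directory on Heroku.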