[Python Error-Handling Notes 1] Data analysis with Python: the MovieLens dataset

This content is licensed under CC 4.0 BY-SA.

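While doing data analysis with Python I ran into a number of common problems: installing the pandas module, downgrading numpy, handling Windows paths, coping with methods that changed after a library update, and configuring multiple Python kernels in Jupyter. I also resolved a NameError, a ParserError, and a delimiter problem when reading and writing CSV files. Setting the bad-line handling parameter and adjusting the delimiter made the data processing work.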

1. Fix: install the pandas module

ImportError                               Traceback (most recent call last)
<ipython-input-1-7dd3504c366f> in <module>()
----> 1 import pandas as pd

ImportError: No module named pandas
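
The usual fix is to run pip install pandas with the same interpreter that the notebook uses. As a minimal sketch (assuming Python 3.4 or later, whereas the traceback above comes from Python 2.7), the check below reports whether a module is visible to the current interpreter and builds the matching pip command. The helper names is_installed and pip_install_command are made up for this illustration.

```python
import importlib.util
import sys


def is_installed(module_name):
    """Return True if the current interpreter can find the module."""
    return importlib.util.find_spec(module_name) is not None


def pip_install_command(package):
    """Build a pip command bound to the interpreter running this code."""
    return [sys.executable, "-m", "pip", "install", package]


if not is_installed("pandas"):
    # Only prints the command; run it yourself in a terminal.
    print("pandas is missing; run:", " ".join(pip_install_command("pandas")))
```

Using sys.executable -m pip rather than a bare pip helps when several Python installations coexist, because the package then lands in the environment that actually raised the ImportError.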

2. Downgrade the numpy package (after a recent update)

c:\python27\lib\site-packages\pandas\_libs\__init__.py:4: RuntimeWarning: numpy.dtype size changed, may indicate binary incompatibility. Expected zd, got zd
  from .tslib import iNaT, NaT, Timestamp, Timedelta, OutOfBoundsDatetime
c:\python27\lib\site-packages\pandas\__init__.py:26: RuntimeWarning: numpy.dtype size changed, may indicate binary incompatibility. Expected zd, got zd
  from pandas._libs import (hashtable as _hashtable,
c:\python27\lib\site-packages\pandas\core\dtypes\common.py:6: RuntimeWarning: numpy.dtype size changed, may indicate binary incompatibility. Expected zd, got zd
  from pandas._libs import algos, lib
c:\python27\lib\site-packages\pandas\core\util\hashing.py:7: RuntimeWarning: numpy.dtype size changed, may indicate binary incompatibility. Expected zd, got zd
  from pandas._libs import hashing, tslib
c:\python27\lib\site-packages\pandas\core\indexes\base.py:7: RuntimeWarning: numpy.dtype size changed, may indicate binary incompatibility. Expected zd, got zd
  from pandas._libs import (lib, index as libindex, tslib as libts,
c:\python27\lib\site-packages\pandas\tseries\offsets.py:21: RuntimeWarning: numpy.dtype size changed, may indicate binary incompatibility. Expected zd, got zd
  import pandas._libs.tslibs.offsets as liboffsets
c:\python27\lib\site-packages\pandas\core\ops.py:16: RuntimeWarning: numpy.dtype size changed, may indicate binary incompatibility. Expected zd, got zd
  from pandas._libs import algos as libalgos, ops as libops
c:\python27\lib\site-packages\pandas\core\indexes\interval.py:32: RuntimeWarning: numpy.dtype size changed, may indicate binary incompatibility. Expected zd, got zd
  from pandas._libs.interval import (
c:\python27\lib\site-packages\pandas\core\internals.py:14: RuntimeWarning: numpy.dtype size changed, may indicate binary incompatibility. Expected zd, got zd
  from pandas._libs import internals as libinternals
c:\python27\lib\site-packages\pandas\core\sparse\array.py:33: RuntimeWarning: numpy.dtype size changed, may indicate binary incompatibility. Expected zd, got zd
  import pandas._libs.sparse as splib
c:\python27\lib\site-packages\pandas\core\window.py:36: RuntimeWarning: numpy.dtype size changed, may indicate binary incompatibility. Expected zd, got zd
  import pandas._libs.window as _window
c:\python27\lib\site-packages\pandas\core\groupby\groupby.py:68: RuntimeWarning: numpy.dtype size changed, may indicate binary incompatibility. Expected zd, got zd
  from pandas._libs import (lib, reduction,
c:\python27\lib\site-packages\pandas\core\reshape\reshape.py:30: RuntimeWarning: numpy.dtype size changed, may indicate binary incompatibility. Expected zd, got zd
  from pandas._libs import algos as _algos, reshape as _reshape
c:\python27\lib\site-packages\pandas\io\parsers.py:45: RuntimeWarning: numpy.dtype size changed, may indicate binary incompatibility. Expected zd, got zd
  import pandas._libs.parsers as parsers
c:\python27\lib\site-packages\pandas\io\pytables.py:50: RuntimeWarning: numpy.dtype size changed, may indicate binary incompatibility. Expected zd, got zd
  from pandas._libs import algos, lib, writers as libwriters

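pip show numpy: shows the installed numpy version.

pip install -U numpy==1.12.0: pins numpy to an older version. The warning typically appears when pandas' compiled extensions were built against a different numpy version than the one installed, so the two need to be brought back in line.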
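
If changing the numpy version is not an option, a different technique (not the one used in this post) is to filter out just this warning before importing pandas. This is only a sketch and assumes the warning is harmless in your setup. The function name silence_numpy_abi_warnings is hypothetical.

```python
import warnings


def silence_numpy_abi_warnings():
    """Ignore only the 'size changed' binary-compatibility warnings."""
    # The message argument is a regex matched against the start of the warning text.
    warnings.filterwarnings("ignore", message="numpy.dtype size changed")
    warnings.filterwarnings("ignore", message="numpy.ufunc size changed")


# Call this before the first `import pandas`.
silence_numpy_abi_warnings()
```

Other warnings are left untouched, so real problems still surface.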

3. On Windows, write paths as absolute paths with unambiguous escaping: separate the components with / (the system default is \) or with \\.

IOError                                   Traceback (most recent call last)
<ipython-input-2-d37f255501ae> in <module>()
----> 1 ratings=pd.read_csv("C:\data\datasets\movielens\ml-20m\ratings.csv",header=0)

c:\python27\lib\site-packages\pandas\io\parsers.pyc in parser_f(filepath_or_buffer, sep, delimiter, header, names, index_col, usecols, squeeze, prefix, mangle_dupe_cols, dtype, engine, converters, true_values, false_values, skipinitialspace, skiprows, nrows, na_values, keep_default_na, na_filter, verbose, skip_blank_lines, parse_dates, infer_datetime_format, keep_date_col, date_parser, dayfirst, iterator, chunksize, compression, thousands, decimal, lineterminator, quotechar, quoting, escapechar, comment, encoding, dialect, tupleize_cols, error_bad_lines, warn_bad_lines, skipfooter, doublequote, delim_whitespace, low_memory, memory_map, float_precision)
    676                     skip_blank_lines=skip_blank_lines)
    677 
--> 678         return _read(filepath_or_buffer, kwds)
    679 
    680     parser_f.__name__ = name

c:\python27\lib\site-packages\pandas\io\parsers.pyc in _read(filepath_or_buffer, kwds)
    438 
    439     # Create the parser.
--> 440     parser = TextFileReader(filepath_or_buffer, **kwds)
    441 
    442     if chunksize or iterator:

c:\python27\lib\site-packages\pandas\io\parsers.pyc in __init__(self, f, engine, **kwds)
    785             self.options['has_index_names'] = kwds['has_index_names']
    786 
--> 787         self._make_engine(self.engine)
    788 
    789     def close(self):

c:\python27\lib\site-packages\pandas\io\parsers.pyc in _make_engine(self, engine)
   1012     def _make_engine(self, engine='c'):
   1013         if engine == 'c':
-> 1014             self._engine = CParserWrapper(self.f, **self.options)
   1015         else:
   1016             if engine == 'python':

c:\python27\lib\site-packages\pandas\io\parsers.pyc in __init__(self, src, **kwds)
   1706         kwds['usecols'] = self.usecols
   1707 
-> 1708         self._reader = parsers.TextReader(src, **kwds)
   1709 
   1710         passed_names = self.names is None

pandas\_libs\parsers.pyx in pandas._libs.parsers.TextReader.__cinit__()

pandas\_libs\parsers.pyx in pandas._libs.parsers.TextReader._setup_parser_source()

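atings.csv does not existC:\data\datasets\movielens\ml-20m

The message looks garbled because the \r in "\ratings.csv" was interpreted as a carriage return, so the text after it was printed from the start of the line.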
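
The sketch below shows the three safe spellings of the same path: a raw string, doubled backslashes, and forward slashes. It uses the standard-library ntpath module so the comparison behaves the same on any operating system. The variable names are illustrative only.

```python
import ntpath

# In a normal string literal "\r" is a carriage return, which breaks the path.
broken = "ml-20m\ratings.csv"

# Three safe ways to write the same Windows path.
raw = r"C:\data\datasets\movielens\ml-20m\ratings.csv"
escaped = "C:\\data\\datasets\\movielens\\ml-20m\\ratings.csv"
forward = "C:/data/datasets/movielens/ml-20m/ratings.csv"

print("\r" in broken)                    # the broken literal contains a control character
print(raw == escaped)                    # raw and escaped strings are identical
print(ntpath.normpath(forward) == raw)   # forward slashes normalize to the same path
```

Any of the three safe forms can be passed to pd.read_csv in place of the original string.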

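4. After a library update some methods stop working and their replacements must be used.

For example, a Series no longer has an order method; use sort_values instead.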

AttributeError                            Traceback (most recent call last)
<ipython-input-12-dec372faf98a> in <module>()
      1 #降序排列
----> 2 rating_by_title.order(ascending=False)[:10]

c:\python27\lib\site-packages\pandas\core\generic.pyc in __getattr__(self, name)
   4374             if self._info_axis._can_hold_identifiers_and_holds_name(name):
   4375                 return self[name]
-> 4376             return object.__getattribute__(self, name)
   4377 
   4378     def __setattr__(self, name, value):

AttributeError: 'Series' object has no attribute 'order'
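
A minimal sketch of the replacement call, assuming a pandas version that provides Series.sort_values. The ratings below are made-up sample data, not values from the MovieLens dataset.

```python
import pandas as pd

# Hypothetical stand-in for the rating_by_title Series from the traceback.
rating_by_title = pd.Series({"Movie A": 120, "Movie B": 340, "Movie C": 75})

# Sort in descending order and keep the first ten entries.
top = rating_by_title.sort_values(ascending=False)[:10]
print(top)
```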

5. Install both a python2 and a python3 kernel in Jupyter

Adding multiple Python kernels to Jupyter Notebook

List the Jupyter Notebook kernels

jupyter kernelspec list

Install or remove other kernels

ipython kernel install --name python2 # install python2

jupyter kernelspec uninstall python2 # remove python2
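
After switching kernels it can help to confirm which interpreter a notebook is actually running. The following is a small sketch to paste into a notebook cell. The function name kernel_info is made up for this example.

```python
import sys


def kernel_info():
    """Report the major Python version and interpreter path of this kernel."""
    return {"major": sys.version_info[0], "executable": sys.executable}


print(kernel_info())
```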

6. NameError: name 'raw_input' is not defined

raw_input() was renamed to input(): the raw_input function from Python 2 is called input() in Python 3.
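
For code that has to run under both versions, one common pattern is a small compatibility shim. This is a sketch, and the name read_line is chosen here only for illustration.

```python
try:
    read_line = raw_input  # Python 2: raw_input returns the typed text as a string
except NameError:
    read_line = input      # Python 3: input has the same behavior

# Example use (commented out so the snippet does not block waiting for input):
# name = read_line("Enter a title: ")
```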

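7. Scrapy is hard to install directly; install it with Anaconda.

8. Downloading files with Chrome is slow (usually for resources hosted abroad); look for a domestic mirror or a Baidu Cloud copy.

9. pandas raises an error while reading a CSV: ParserError: Error tokenizing data. C error: Expected 1 fields in line 29, saw 2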

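CSV files use a comma as the default delimiter, but commas are very common in Chinese text, so scraped Chinese data is easily split in the wrong places. When writing a CSV with pandas you can pass sep='\t' to use a tab as the delimiter, since tabs are rarely used in Chinese text.
When you later read that CSV for processing, remember to pass the delimiter parameter: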

delimiter="\t"
# read the file like this:
df = pd.read_csv('path', delimiter="\t")
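
The write and read steps can be sketched together as a round trip. An in-memory buffer stands in for the file on disk, and the sample rows are invented for the example.

```python
import io

import pandas as pd

# Made-up sample data containing both Chinese and ASCII commas.
df = pd.DataFrame({"title": ["你好，世界", "a, b"], "score": [1, 2]})

buffer = io.StringIO()  # stands in for a real file path
df.to_csv(buffer, sep="\t", index=False)  # write with tab as the delimiter

buffer.seek(0)
restored = pd.read_csv(buffer, delimiter="\t")  # read with the same delimiter
print(restored)
```

The delimiter used for reading has to match the one used for writing; otherwise each row ends up in a single column.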

Otherwise the printed DataFrame is squashed into a single column with nothing split, and later processing of the CSV may raise the error from the heading:

ParserError: Error tokenizing data. C error: Expected 1 fields in line 29, saw 2

This approach may not always parse the file successfully; the parameters below tend to work better:

df_status0_invertory = pd.read_csv(inventory_dir + inventory_status0_file_name, delimiter=',', header=None,
                                   error_bad_lines=False)

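Fix:

Add the parameter error_bad_lines=False, which skips malformed lines instead of raising an error. In pandas 1.3 and later the equivalent is on_bad_lines='skip'; error_bad_lines was removed in pandas 2.0.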
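
A minimal sketch of skipping malformed rows, assuming pandas 1.3 or later where on_bad_lines is available. The CSV text is invented for the example: its third line has one field too many.

```python
import io

import pandas as pd

# Line 3 ("3,4,5") has three fields while the header declares two.
raw = "a,b\n1,2\n3,4,5\n6,7\n"

# on_bad_lines="skip" drops lines with too many fields instead of raising ParserError.
df = pd.read_csv(io.StringIO(raw), on_bad_lines="skip")
print(df)
```

Skipped rows are lost silently, so it is worth comparing the row count with the source file when the data matters.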

10. Convenient Anaconda installation

https://mirrors.tuna.tsinghua.edu.cn/help/anaconda/