[Python Error-Handling Notes 1] Data analysis with Python: the MovieLens dataset

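While doing data analysis in Python I ran into several common problems: installing the pandas module, downgrading numpy, writing Windows paths, adapting to methods that changed after a package update, and configuring multiple Python kernels in Jupyter. I also resolved a NameError, a ParserError, and separator problems when reading and writing CSV files. Setting the error-handling parameter and adjusting the separator made the data processing succeed.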

1. Fix: install the pandas module

ImportError                               Traceback (most recent call last)
<ipython-input-1-7dd3504c366f> in <module>()
----> 1 import pandas as pd

ImportError: No module named pandas
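
The usual fix is `pip install pandas`. A common reason the error persists is that pip installed the package into a different Python installation than the one running the notebook. The sketch below checks whether a module is visible to the current interpreter and prints an install command bound to that interpreter. It assumes Python 3.4 or later (`importlib.util` is not available on the Python 2.7 shown in the traceback), and `is_importable` is a hypothetical helper name.

```python
import importlib.util
import sys


def is_importable(module_name):
    """Return True if the current interpreter can find a top-level module."""
    return importlib.util.find_spec(module_name) is not None


if not is_importable("pandas"):
    # Calling pip through sys.executable installs into the interpreter
    # that is running this code, not into some other Python on the PATH.
    print("pandas is missing; run: %s -m pip install pandas" % sys.executable)
```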

2. Downgrade the numpy package (recent update)

c:\python27\lib\site-packages\pandas\_libs\__init__.py:4: RuntimeWarning: numpy.dtype size changed, may indicate binary incompatibility. Expected zd, got zd
  from .tslib import iNaT, NaT, Timestamp, Timedelta, OutOfBoundsDatetime
c:\python27\lib\site-packages\pandas\__init__.py:26: RuntimeWarning: numpy.dtype size changed, may indicate binary incompatibility. Expected zd, got zd
  from pandas._libs import (hashtable as _hashtable,
c:\python27\lib\site-packages\pandas\core\dtypes\common.py:6: RuntimeWarning: numpy.dtype size changed, may indicate binary incompatibility. Expected zd, got zd
  from pandas._libs import algos, lib
c:\python27\lib\site-packages\pandas\core\util\hashing.py:7: RuntimeWarning: numpy.dtype size changed, may indicate binary incompatibility. Expected zd, got zd
  from pandas._libs import hashing, tslib
c:\python27\lib\site-packages\pandas\core\indexes\base.py:7: RuntimeWarning: numpy.dtype size changed, may indicate binary incompatibility. Expected zd, got zd
  from pandas._libs import (lib, index as libindex, tslib as libts,
c:\python27\lib\site-packages\pandas\tseries\offsets.py:21: RuntimeWarning: numpy.dtype size changed, may indicate binary incompatibility. Expected zd, got zd
  import pandas._libs.tslibs.offsets as liboffsets
c:\python27\lib\site-packages\pandas\core\ops.py:16: RuntimeWarning: numpy.dtype size changed, may indicate binary incompatibility. Expected zd, got zd
  from pandas._libs import algos as libalgos, ops as libops
c:\python27\lib\site-packages\pandas\core\indexes\interval.py:32: RuntimeWarning: numpy.dtype size changed, may indicate binary incompatibility. Expected zd, got zd
  from pandas._libs.interval import (
c:\python27\lib\site-packages\pandas\core\internals.py:14: RuntimeWarning: numpy.dtype size changed, may indicate binary incompatibility. Expected zd, got zd
  from pandas._libs import internals as libinternals
c:\python27\lib\site-packages\pandas\core\sparse\array.py:33: RuntimeWarning: numpy.dtype size changed, may indicate binary incompatibility. Expected zd, got zd
  import pandas._libs.sparse as splib
c:\python27\lib\site-packages\pandas\core\window.py:36: RuntimeWarning: numpy.dtype size changed, may indicate binary incompatibility. Expected zd, got zd
  import pandas._libs.window as _window
c:\python27\lib\site-packages\pandas\core\groupby\groupby.py:68: RuntimeWarning: numpy.dtype size changed, may indicate binary incompatibility. Expected zd, got zd
  from pandas._libs import (lib, reduction,
c:\python27\lib\site-packages\pandas\core\reshape\reshape.py:30: RuntimeWarning: numpy.dtype size changed, may indicate binary incompatibility. Expected zd, got zd
  from pandas._libs import algos as _algos, reshape as _reshape
c:\python27\lib\site-packages\pandas\io\parsers.py:45: RuntimeWarning: numpy.dtype size changed, may indicate binary incompatibility. Expected zd, got zd
  import pandas._libs.parsers as parsers
c:\python27\lib\site-packages\pandas\io\pytables.py:50: RuntimeWarning: numpy.dtype size changed, may indicate binary incompatibility. Expected zd, got zd
  from pandas._libs import algos, lib, writers as libwriters

Run pip show numpy to check the installed numpy version.

Run pip install -U numpy==1.12.0 to downgrade numpy.
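
The warning above generally means that pandas' compiled extensions were built against a different numpy than the one currently installed, so it helps to record both versions before and after the downgrade. A minimal sketch, assuming Python 3.8 or later for `importlib.metadata`; `installed_version` is a hypothetical helper name.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package_name):
    """Return the installed version string, or None if the package is absent."""
    try:
        return version(package_name)
    except PackageNotFoundError:
        return None


# Print the versions of the two packages involved in the warning.
for name in ("numpy", "pandas"):
    print(name, installed_version(name))
```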

3. On Windows, write the path as an absolute path with unambiguous escaping: separate the components with / (the system default is \) or with \\.

IOError                                   Traceback (most recent call last)
<ipython-input-2-d37f255501ae> in <module>()
----> 1 ratings=pd.read_csv("C:\data\datasets\movielens\ml-20m\ratings.csv",header=0)

c:\python27\lib\site-packages\pandas\io\parsers.pyc in parser_f(filepath_or_buffer, sep, delimiter, header, names, index_col, usecols, squeeze, prefix, mangle_dupe_cols, dtype, engine, converters, true_values, false_values, skipinitialspace, skiprows, nrows, na_values, keep_default_na, na_filter, verbose, skip_blank_lines, parse_dates, infer_datetime_format, keep_date_col, date_parser, dayfirst, iterator, chunksize, compression, thousands, decimal, lineterminator, quotechar, quoting, escapechar, comment, encoding, dialect, tupleize_cols, error_bad_lines, warn_bad_lines, skipfooter, doublequote, delim_whitespace, low_memory, memory_map, float_precision)
    676                     skip_blank_lines=skip_blank_lines)
    677 
--> 678         return _read(filepath_or_buffer, kwds)
    679 
    680     parser_f.__name__ = name

c:\python27\lib\site-packages\pandas\io\parsers.pyc in _read(filepath_or_buffer, kwds)
    438 
    439     # Create the parser.
--> 440     parser = TextFileReader(filepath_or_buffer, **kwds)
    441 
    442     if chunksize or iterator:

c:\python27\lib\site-packages\pandas\io\parsers.pyc in __init__(self, f, engine, **kwds)
    785             self.options['has_index_names'] = kwds['has_index_names']
    786 
--> 787         self._make_engine(self.engine)
    788 
    789     def close(self):

c:\python27\lib\site-packages\pandas\io\parsers.pyc in _make_engine(self, engine)
   1012     def _make_engine(self, engine='c'):
   1013         if engine == 'c':
-> 1014             self._engine = CParserWrapper(self.f, **self.options)
   1015         else:
   1016             if engine == 'python':

c:\python27\lib\site-packages\pandas\io\parsers.pyc in __init__(self, src, **kwds)
   1706         kwds['usecols'] = self.usecols
   1707 
-> 1708         self._reader = parsers.TextReader(src, **kwds)
   1709 
   1710         passed_names = self.names is None

pandas\_libs\parsers.pyx in pandas._libs.parsers.TextReader.__cinit__()

pandas\_libs\parsers.pyx in pandas._libs.parsers.TextReader._setup_parser_source()

atings.csv does not existC:\data\datasets\movielens\ml-20m
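
The last line of the traceback is scrambled because the `\r` in `\ratings.csv` was interpreted as a carriage return, which splits the path and moves the cursor back to the start of the line when the message is printed. The sketch below shows the problem on the last path segment and three spellings that avoid it; the variable names are only for illustration.

```python
# "\r" inside a normal string literal is a carriage return,
# not a backslash followed by the letter r.
broken = "ml-20m\ratings.csv"
print(repr(broken))

# Three ways to write the same Windows path safely:
raw = r"C:\data\datasets\movielens\ml-20m\ratings.csv"         # raw string
doubled = "C:\\data\\datasets\\movielens\\ml-20m\\ratings.csv"  # escaped backslashes
forward = "C:/data/datasets/movielens/ml-20m/ratings.csv"       # forward slashes

print(raw == doubled)

# Any of the three can then be passed to pandas, for example:
# ratings = pd.read_csv(forward, header=0)
```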

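4. After a package update, some methods stop working and must be replaced by their newer equivalents.

For example, Series no longer has an order method; use the sort_values method instead.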

AttributeError                            Traceback (most recent call last)
<ipython-input-12-dec372faf98a> in <module>()
      1 #降序排列
----> 2 rating_by_title.order(ascending=False)[:10]

c:\python27\lib\site-packages\pandas\core\generic.pyc in __getattr__(self, name)
   4374             if self._info_axis._can_hold_identifiers_and_holds_name(name):
   4375                 return self[name]
-> 4376             return object.__getattribute__(self, name)
   4377 
   4378     def __setattr__(self, name, value):

AttributeError: 'Series' object has no attribute 'order'
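
The failing line can be rewritten with `sort_values`, which accepts the same `ascending` argument. A minimal sketch; the ratings in `rating_by_title` below are made-up placeholder values, not data from MovieLens.

```python
import pandas as pd

# Placeholder data standing in for the real per-title ratings.
rating_by_title = pd.Series({"Movie A": 3.5, "Movie B": 4.8, "Movie C": 4.1})

# Sort in descending order and keep at most the first ten entries.
top = rating_by_title.sort_values(ascending=False)[:10]
print(top)
```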

5. Install both Python 2 and Python 3 kernels in Jupyter

Adding multiple Python kernels to Jupyter Notebook

  • List the Jupyter Notebook kernels

jupyter kernelspec list

  • Install or remove other kernels

ipython kernel install --name python2   # install the python2 kernel

jupyter kernelspec uninstall python2   # remove the python2 kernel
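
As far as I know, the install command registers whichever interpreter runs it, so it should be run from the Python 2 environment. After switching kernels in a notebook, a quick check like the one below confirms which interpreter is actually executing the cells. `describe_interpreter` is a hypothetical helper name.

```python
import sys


def describe_interpreter():
    """Return the major.minor version and path of the running interpreter."""
    return "Python %d.%d at %s" % (
        sys.version_info[0],
        sys.version_info[1],
        sys.executable,
    )


print(describe_interpreter())
```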

6. NameError: name 'raw_input' is not defined

raw_input() was renamed to input(): the raw_input() function from Python 2 is called input() in Python 3.
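
If the same script has to run under both versions, one common pattern is to pick the right function once at the top. A minimal sketch; `read_line` is a hypothetical name chosen for illustration.

```python
try:
    read_line = raw_input  # Python 2: raw_input returns the typed text as a string
except NameError:
    read_line = input      # Python 3: input behaves like the old raw_input

# Later code calls read_line("prompt: ") regardless of the Python version.
```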

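7. Scrapy is difficult to install; install it through Anaconda

8. Downloads in Chrome are slow (usually for files hosted outside China); look for a domestic mirror or a Baidu Cloud copy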

9. pandas raises an error when reading a CSV: ParserError: Error tokenizing data. C error: Expected 1 fields in line 29, saw 2

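CSV files use a comma as the separator by default, but commas are very common in Chinese text, so scraped Chinese data is easily confused with the separators. When writing a CSV with pandas you can therefore pass sep='\t' to write it tab-separated, since tabs are rarely used in Chinese writing.
When you later read that CSV for processing, remember to pass the delimiter parameter: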

import pandas as pd

# Read the tab-separated file by passing delimiter="\t":
df = pd.read_csv('path', delimiter="\t")

Otherwise, printing the DataFrame shows every row squeezed into a single column, and later processing of the CSV may raise the error from the heading:

ParserError: Error tokenizing data. C error: Expected 1 fields in line 29, saw 2
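
The effect of the separator can be checked in memory without touching a file. The sketch below writes a small DataFrame with tabs, then reads it back once with the matching separator and once with the default comma. The sample rows are made up, and `sep` is used on the read side because `delimiter` is an alias for it.

```python
import io

import pandas as pd

# Made-up sample rows for illustration.
df = pd.DataFrame(
    {"title": ["Toy Story", "Heat"], "comment": ["great fun", "tense"]}
)

# Write the data tab-separated into an in-memory buffer.
buffer = io.StringIO()
df.to_csv(buffer, sep="\t", index=False)

# Reading with the matching separator restores both columns.
buffer.seek(0)
df_tab = pd.read_csv(buffer, sep="\t")

# Reading with the default comma leaves each row as one unsplit column.
buffer.seek(0)
df_default = pd.read_csv(buffer)

print(df_tab.shape)
print(df_default.shape)
```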

This approach may not always parse the file into columns successfully; the following parameters work better:

df_status0_invertory = pd.read_csv(inventory_dir + inventory_status0_file_name, delimiter=',', header=None,
                                   error_bad_lines=False)

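Fix:

Add the parameter error_bad_lines=False, which skips malformed lines instead of raising an error. In pandas 1.3 and later the equivalent is on_bad_lines='skip'; error_bad_lines was removed in pandas 2.0.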
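
The following sketch reproduces the error on a tiny in-memory CSV and then skips the offending row. It assumes pandas 1.3 or later, where `on_bad_lines="skip"` takes the place of `error_bad_lines=False`, and the sample text is made up. Skipped rows are dropped silently, so it is worth comparing row counts afterwards.

```python
import io

import pandas as pd

# Line 3 has three fields while the header declares two.
raw = "id,title\n1,A\n2,B,extra\n3,C\n"

# Without extra parameters, the malformed line raises ParserError.
try:
    pd.read_csv(io.StringIO(raw))
    error_message = None
except pd.errors.ParserError as exc:
    error_message = str(exc)

# With on_bad_lines="skip", the malformed line is dropped and parsing continues.
df = pd.read_csv(io.StringIO(raw), on_bad_lines="skip")

print(error_message)
print(df)
```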

10. Convenient Anaconda installation

https://mirrors.tuna.tsinghua.edu.cn/help/anaconda/

 
