Python - word frequency using pandas and matplotlib


How can I plot a word frequency histogram (for the author column) using pandas and matplotlib from a CSV file? My CSV looks like: id, author, title, language. Some rows have more than one author in the author column, separated by spaces.

    import pandas as pd

    file = 'c:/books.csv'
    df = pd.read_csv(file)  # read_csv accepts the path directly
    print(df['author'])

Use collections.Counter to create the histogram data, and follow the example given here, i.e.:

    from collections import Counter
    import numpy as np
    import matplotlib.pyplot as plt
    import pandas as pd

    # Read the CSV file and get the author names and counts.
    df = pd.read_csv("books.csv", index_col="id")
    counter = Counter(df['author'])
    author_names = list(counter.keys())
    author_counts = list(counter.values())

    # Plot the histogram using matplotlib's bar().
    indexes = np.arange(len(author_names))
    width = 0.7
    plt.bar(indexes, author_counts, width)
    # Bars are centered on their x position by default (matplotlib 2.0+),
    # so the tick labels go directly at the indexes.
    plt.xticks(indexes, author_names)
    plt.show()

With this test file:

    $ cat books.csv
    id,author,title,language
    1,peter,t1,de
    2,peter,t2,de
    3,bob,t3,en
    4,bob,t4,de
    5,peter,t5,en
    6,marianne,t6,jp

The code above creates the following graph:

[Figure: bar chart of the number of books per author]
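The bars in the plot follow the Counter's key order. If they should be sorted by frequency instead, Counter.most_common() can supply the names and counts. Here is a minimal sketch, assuming a hard-coded list stands in for df['author'] (the list is only illustrative test data matching books.csv):

```python
from collections import Counter

# Assumption: an in-memory stand-in for the df['author'] column of books.csv.
authors = ["peter", "peter", "bob", "bob", "peter", "marianne"]

counter = Counter(authors)

# most_common() returns (name, count) pairs sorted by descending count.
pairs = counter.most_common()
author_names = [name for name, _ in pairs]
author_counts = [count for _, count in pairs]

print(author_names)   # ['peter', 'bob', 'marianne']
print(author_counts)  # [3, 2, 1]
```

These two lists can be passed to plt.bar() and plt.xticks() in place of counter.keys() and counter.values().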

Edit:

You added a secondary condition: the author column might contain multiple space-separated names. The following code handles this:

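    from collections import Counter
    from itertools import chain
    import pandas as pd

    # Read the CSV file and split each author cell into a list of names.
    df = pd.read_csv("books2.csv", index_col="id")
    authors_notflat = [a.split() for a in df['author']]
    # Flatten the list of lists and count each name.
    counter = Counter(chain.from_iterable(authors_notflat))
    print(counter)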

For example, with this file:

    $ cat books2.csv
    id,author,title,language
    1,peter harald,t1,de
    2,peter harald,t2,de
    3,bob,t3,en
    4,bob,t4,de
    5,peter,t5,en
    6,marianne,t6,jp

it prints:

    $ python test.py
    Counter({'peter': 3, 'bob': 2, 'harald': 2, 'marianne': 1})

Note that this code works because strings are iterable.
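To see the split-and-flatten step on its own, here is a small sketch without pandas. It assumes a plain Python list in place of df['author'] (the list mirrors the books2.csv test data):

```python
from collections import Counter
from itertools import chain

# Assumption: an in-memory stand-in for the df['author'] column of books2.csv.
authors = ["peter harald", "peter harald", "bob", "bob", "peter", "marianne"]

# str.split() with no argument splits on any run of whitespace,
# so each cell becomes a list of one or more names.
authors_notflat = [a.split() for a in authors]

# chain.from_iterable() concatenates the inner lists into one flat sequence.
flat = list(chain.from_iterable(authors_notflat))
counter = Counter(flat)

# Sorting the items gives an output that does not depend on dict ordering.
print(sorted(counter.items()))
# [('bob', 2), ('harald', 2), ('marianne', 1), ('peter', 3)]
```

A cell with a single name still yields a one-element list, so rows with one author and rows with several are handled the same way.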

This code is essentially free of pandas, except for the CSV-parsing part that produces the DataFrame df. If you need the default plot styling of pandas, there is also a suggestion in the mentioned thread.
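For comparison, the counting can also be done with pandas alone. The sketch below is one possible variant, not necessarily what the mentioned thread suggests. It assumes pandas 0.25 or later (for Series.explode), embeds the books2.csv test data inline so it runs without a file, and uses a hypothetical helper name, plot_counts:

```python
import io

import pandas as pd

# Inline copy of the books2.csv test data so the sketch runs without a file.
csv_text = """id,author,title,language
1,peter harald,t1,de
2,peter harald,t2,de
3,bob,t3,en
4,bob,t4,de
5,peter,t5,en
6,marianne,t6,jp
"""

df = pd.read_csv(io.StringIO(csv_text), index_col="id")

# Split each cell on whitespace, give every name its own row, then count.
author_counts = df["author"].str.split().explode().value_counts()


def plot_counts(counts):
    """Hypothetical helper: draw the counts as a bar chart in pandas styling."""
    # Imported here so the counting part works without a display.
    import matplotlib.pyplot as plt

    ax = counts.plot(kind="bar")
    ax.set_ylabel("number of books")
    plt.tight_layout()
    plt.show()
```

Calling plot_counts(author_counts) should draw one bar per author, sorted by descending count, since value_counts() returns its result in that order.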

