python - Filtering a dataframe based on a regex
Say I have a dataframe my_df with a column 'brand', and I would like to drop any rows where brand is either toyota or bmw. I thought the following would do it:

    my_regex = re.compile('^(bmw$|toyota$).*$')
    my_function = lambda x: my_regex.match(x.lower())
    my_df[~my_df['brand'].apply(my_function)]

But I get the error:

    ValueError: cannot index with vector containing NA / NaN values

Why? How can I filter my dataframe using a regex?

I think re.match returns None when there is no match, and that breaks the indexing. Below is an alternative solution using the pandas vectorized string methods. Note that the pandas string methods can handle null values:

    >>> df = pd.DataFrame({'brand': ['bmw', 'ford', np.nan, None, 'toyota', 'audi']})
    >>> df
        brand
    0     bmw
    1    ford
    2     NaN
    3    None
    4  toyota
    5    audi

    [6 rows x 1 columns]

    >>> idx = df.brand.str.contains('^bmw$|^toyota$', flags=re.IGNORECASE, regex=True, na=False)
    >>> idx
    0     True
    1    False
    2 ...
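As a minimal sketch of how such a boolean mask could be used to actually drop the rows, assuming the same sample data as above: the mask is negated with `~` and passed to the dataframe as an indexer. The names `is_excluded`, `is_excluded_no_regex` and `filtered` are illustrative and not from the original post; the second mask is a regex-free alternative that is only assumed to be acceptable when an exact, case-insensitive match is all that is needed.

```python
import re

import numpy as np
import pandas as pd

# Same sample data as in the answer, including two missing values.
df = pd.DataFrame({'brand': ['bmw', 'ford', np.nan, None, 'toyota', 'audi']})

# True where brand is exactly 'bmw' or 'toyota', ignoring case.
# na=False makes missing values count as "no match" instead of NaN,
# so the result is a clean boolean mask that is safe to index with.
is_excluded = df['brand'].str.contains(
    r'^(?:bmw|toyota)$', flags=re.IGNORECASE, regex=True, na=False
)

# Regex-free alternative (assumption: only exact matches matter).
# isin() treats missing values as False, so no NaN ends up in the mask.
is_excluded_no_regex = df['brand'].str.lower().isin(['bmw', 'toyota'])

# Keep every row that is NOT excluded; rows with a missing brand are kept.
filtered = df[~is_excluded]
print(filtered)
```

Both masks select the same rows here. Whether rows with a missing brand should be kept or dropped depends on the use case; passing `na=True` instead would drop them along with the matches.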