How to use regex to parse an S3 bucket list of files - Python
I have the following method:
import re
import subprocess
import sys

def scan_s3dir(dirname):
    try:
        # List every object under the given bucket/prefix recursively
        cmd = "s3cmd ls {s3bucket} --recursive".format(s3bucket=dirname)
        output = subprocess.check_output([cmd], stdin=None, shell=True)
        # s3://dgsecure/test_data/
        regex = r"dgsecure/test_data/[^/]*/(\S+)*"
        installers = re.findall(regex, output)
        print installers
    except Exception, e:
        print e
        sys.exit(2)
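For reference, the method is called with the bucket path as its only argument; the bucket below is just a placeholder:

# Example invocation -- the bucket/prefix is a placeholder
scan_s3dir("s3://path/to/bucket/")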
When I execute s3cmd ls /path/to/bucket --recursive I get:
2014-02-14 02:21         0   s3://path/to/bucket/
2014-02-14 17:32       236   s3://path/to/bucket/foo.txt
2014-02-26 23:31      6035   s3://path/to/bucket/bar.txt
2014-02-14 22:17      2960   s3://path/to/bucket/baz.txt
From the regular expression I want to produce a list of the files, including those in subdirectories present in s3://path/to/bucket/, for example:
s3://path/to/bucket/hello/world.txt
The output I get back is:
['s3://path/to/bucket/foo.txt', 's3://path/to/bucket/bar.txt', 's3://path/to/bucket/baz.txt']
What am I missing in my regular expression?
Try running the command:
s3cmd ls {s3bucket} --recursive | tr -s ' ' | cut -d " " -f 4
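If you want to stay in Python, a minimal sketch of the same idea is to split each line of the s3cmd output on whitespace and take the fourth field (the s3:// URL); a regex along the lines of s3://\S+ would also pull out whole keys, subdirectories included. The sample listing below is copied from the question:

import re

# Sample listing copied from the question
output = """2014-02-14 02:21         0   s3://path/to/bucket/
2014-02-14 17:32       236   s3://path/to/bucket/foo.txt
2014-02-26 23:31      6035   s3://path/to/bucket/bar.txt
2014-02-14 22:17      2960   s3://path/to/bucket/baz.txt"""

# Equivalent of: tr -s ' ' | cut -d " " -f 4
# Split each line on runs of whitespace and keep the 4th field (the URL).
keys = [line.split()[3] for line in output.splitlines() if line.strip()]

# Regex alternative: grab every whitespace-free token that starts with s3://,
# so keys inside subdirectories (e.g. s3://path/to/bucket/hello/world.txt)
# come back whole instead of being cut short at a slash.
keys_re = re.findall(r"s3://\S+", output)

print keys     # ['s3://path/to/bucket/', 's3://path/to/bucket/foo.txt', ...]
print keys_re  # same four URLs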