This function finds BAM files and their corresponding reference files in a given directory, pairs each BAM file with its reference by a common sample ID, and groups the samples by individual ID.

phsc.find.bam.and.references(data.dir,
  regex.person = "^([A-Z0-9]+-[A-Z0-9]+)-.*$", regex.bam = "^(.*)\\.bam$",
  regex.ref = "^(.*)_ref\\.fasta$", verbose = 1)

Arguments

data.dir

Full path of the data directory.

regex.person

Regular expression with one set of round brackets (a capture group) that extracts the person ID from the file names of BAM and reference files.

regex.bam

Regular expression that identifies BAM files, with one set of round brackets that captures the sample ID.

regex.ref

Regular expression that identifies reference files, with one set of round brackets that captures the sample ID.
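To illustrate how the default patterns read file names, the base R snippet below applies each one to a few made-up file names (the names and the helper extract_id are hypothetical, chosen only to match the defaults). Each pattern first selects the matching names and then keeps only the text inside its round brackets.

```r
# Hypothetical file names chosen to match the default patterns.
files <- c("PG01-00001-S1.bam", "PG01-00001-S1_ref.fasta", "notes.txt")

# Default values from the usage line above.
regex.person <- "^([A-Z0-9]+-[A-Z0-9]+)-.*$"
regex.bam    <- "^(.*)\\.bam$"
regex.ref    <- "^(.*)_ref\\.fasta$"

# Hypothetical helper: keep names matching the pattern, then replace each
# name with the text captured by the round brackets (\\1).
extract_id <- function(x, pattern) sub(pattern, "\\1", x[grepl(pattern, x)])

extract_id(files, regex.bam)     # sample ID from the BAM: "PG01-00001-S1"
extract_id(files, regex.ref)     # sample ID from the reference: "PG01-00001-S1"
extract_id(files, regex.person)  # person ID from both files: "PG01-00001" twice
```

Because the BAM and reference patterns capture the same sample ID, the two files can be matched; "notes.txt" matches none of the patterns and is ignored.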

Value

A data.table with columns 'IND' (individual identifier), 'SAMPLE' (sample identifier), 'BAM' (BAM file), and 'REF' (reference file).
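The following is a minimal base R sketch of how a table with this shape could be assembled. It is not the package's implementation: the function name pair_bams_and_refs is hypothetical, it takes a vector of file names (for example from list.files(data.dir)) rather than a directory, it returns a data.frame instead of a data.table, and it assumes that samples lacking either a BAM or a reference file are dropped.

```r
# Hypothetical sketch, not the package's code: pair BAM and reference files
# by sample ID and label each pair with the individual (person) ID.
pair_bams_and_refs <- function(files,
                               regex.person = "^([A-Z0-9]+-[A-Z0-9]+)-.*$",
                               regex.bam = "^(.*)\\.bam$",
                               regex.ref = "^(.*)_ref\\.fasta$") {
  bams <- files[grepl(regex.bam, files)]
  refs <- files[grepl(regex.ref, files)]
  # Sample ID = text captured by the round brackets of each pattern.
  bam.df <- data.frame(SAMPLE = sub(regex.bam, "\\1", bams), BAM = bams,
                       stringsAsFactors = FALSE)
  ref.df <- data.frame(SAMPLE = sub(regex.ref, "\\1", refs), REF = refs,
                       stringsAsFactors = FALSE)
  # Inner join (assumption): keep only samples with both a BAM and a reference.
  out <- merge(bam.df, ref.df, by = "SAMPLE")
  # Drop pairs whose BAM name does not contain a recognisable person ID.
  out <- out[grepl(regex.person, out$BAM), , drop = FALSE]
  out$IND <- sub(regex.person, "\\1", out$BAM)
  out <- out[order(out$IND, out$SAMPLE), c("IND", "SAMPLE", "BAM", "REF")]
  rownames(out) <- NULL
  out
}

# Example with hypothetical names: PG01-00002-S1 has no reference file.
files <- c("PG01-00001-S1.bam", "PG01-00001-S1_ref.fasta",
           "PG01-00001-S2.bam", "PG01-00001-S2_ref.fasta",
           "PG01-00002-S1.bam")
pair_bams_and_refs(files)  # two rows, both with IND "PG01-00001"
```

Grouping by IND then amounts to splitting or aggregating this table on its first column.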