file_io

Submodules

Classes

CSVDataReader

A class to read in object data files stored as CSV or whitespace separated values.

HDF5DataReader

A class to read in object data files stored as HDF5 files.

Obs80DataReader

A class to read in object data files stored in the MPC's obs80 format.

Package Contents

class CSVDataReader(filename, sep='csv', **kwargs)[source]

Bases: layup.utilities.file_io.ObjectDataReader.ObjectDataReader

A class to read in object data files stored as CSV or whitespace separated values.

Note that we require the header line to be the first line of the file.

filename
sep = 'csv'
data_separator = ','
num_pre_header_lines = 0
header_row_index = 0
obj_id_table = None
obj_id_counts
get_reader_info()[source]

Return a string identifying the current reader name and input information (for logging and output).

Returns:

name – The reader information.

Return type:

string

get_row_count()[source]

Return the total number of rows in the [C|P|W]SV file.

Returns:

Total rows in the input [C|P|W]SV file.

Return type:

int

_validate_header_line()[source]

Read and validate the header line (the first line of the file).

_check_header_line(header_line)[source]

Check that a given header line is valid and exit if it is invalid.

Parameters:

header_line (str) – The proposed header line.
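
A minimal sketch of what a header check of this kind might look like. The function name `check_header_line`, the `required_columns` parameter, the column name `ObjID`, and raising `ValueError` instead of exiting are all assumptions made for illustration; this is not the library's implementation.

```python
def check_header_line(header_line, separator=",", required_columns=("ObjID",)):
    """Hypothetical header check: split the line and confirm required columns exist."""
    # An empty or whitespace-only first line cannot be a header.
    if not header_line.strip():
        raise ValueError("Header line is empty.")
    # separator=None makes str.split break on any run of whitespace.
    columns = [col.strip() for col in header_line.strip().split(separator)]
    missing = [col for col in required_columns if col not in columns]
    if missing:
        raise ValueError(f"Header is missing required columns: {missing}")
    return columns
```

Passing `separator=None` covers the whitespace-separated case, since `str.split` then treats any run of whitespace as one delimiter.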

_get_fixed_dtypes()[source]

Get a dictionary of the fixed dtypes for the columns in the CSV file.

Returns:

fixed_dtypes – A dictionary of the fixed dtypes for the columns in the CSV file. The keys are the column names and the values are the assigned dtypes.

Return type:

dict

_read_rows_internal(block_start=0, block_size=None, **kwargs)[source]

Reads in a set number of rows from the input.

Parameters:
  • block_start (integer, optional) – The 0-indexed row number from which to start reading the data. For example, in a CSV file block_start=2 would skip the first two lines after the header and return data starting on row=2. Default = 0.

  • block_size (integer, optional) – The number of rows to read in. Use block_size=None to read in all available data. Default = None.

  • **kwargs (dictionary, optional) – Extra arguments

Returns:

res – The data read in from the file.

Return type:

numpy structured array
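
The `block_start` and `block_size` semantics described above can be illustrated with a small standard-library sketch. The helper `read_rows` and the sample columns are hypothetical, and the sketch returns a list of dictionaries rather than the numpy structured array the reader itself returns.

```python
import csv
import io
from itertools import islice


def read_rows(text, block_start=0, block_size=None):
    """Hypothetical helper: return data rows [block_start, block_start + block_size)."""
    reader = csv.DictReader(io.StringIO(text))  # the first line is the header
    stop = None if block_size is None else block_start + block_size
    # islice skips block_start data rows, then yields at most block_size rows.
    return list(islice(reader, block_start, stop))


sample = "ObjID,ra\na,1.0\nb,2.0\nc,3.0\nd,4.0\n"
rows = read_rows(sample, block_start=2, block_size=1)
```

Here `block_start=2` skips rows `a` and `b`, so the single row returned is the one for `c`. With `block_size=None` the slice has no upper bound and all remaining rows are read.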

_build_id_map()[source]

Builds a table of just the object IDs.
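
As a rough illustration of the idea, the sketch below builds an ID table and per-ID counts from a sequence of object IDs. The names `obj_id_table` and `obj_id_counts` mirror the attributes listed for this class, but the helper `build_id_map` and the data structures chosen here (a list and a `Counter`) are assumptions, not the library's internals.

```python
from collections import Counter


def build_id_map(obj_ids):
    """Hypothetical helper: record each row's object ID and how often each ID occurs."""
    obj_id_table = list(obj_ids)            # one entry per data row, in file order
    obj_id_counts = Counter(obj_id_table)   # object ID -> number of rows with that ID
    return obj_id_table, obj_id_counts


table, counts = build_id_map(["a", "b", "a", "c", "a"])
```

Keeping only the IDs in memory means later lookups by object do not require holding the full data table.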

_read_objects_internal(obj_ids, **kwargs)[source]

Read in a chunk of data for given object IDs.

Parameters:
  • obj_ids (list) – A list of object IDs to use.

  • **kwargs (dictionary, optional) – Extra arguments

Returns:

res – The data read in from the file.

Return type:

numpy structured array
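
One plausible way to use an ID table when reading by object is to find the row indices whose ID is requested and load only those rows. The helper `select_rows_for_objects` below is hypothetical and shows only the selection step, under the assumption that the ID table holds one ID per data row.

```python
def select_rows_for_objects(obj_id_table, obj_ids):
    """Hypothetical helper: indices of the rows whose object ID is in obj_ids."""
    wanted = set(obj_ids)  # set membership keeps the per-row check cheap
    return [idx for idx, obj_id in enumerate(obj_id_table) if obj_id in wanted]


obj_id_table = ["a", "b", "a", "c", "a"]
row_indices = select_rows_for_objects(obj_id_table, ["a", "c"])
```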

_process_and_validate_input_table(input_table, **kwargs)[source]

Perform any input-specific processing and validation on the input table. Modifies the input table in place.

Notes

The base implementation includes filtering that is common to most input types. Subclasses should call super()._process_and_validate_input_table() to ensure that the ancestor’s validation is also applied.

Parameters:
  • input_table (numpy structured array) – A loaded table.

  • **kwargs (dictionary, optional) – Extra arguments

Returns:

input_table – Returns the input table modified in-place.

Return type:

numpy structured array
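
The override pattern described in the note can be sketched as follows. Both classes, the `ObjID` column, and the filtering rule are hypothetical stand-ins, and a list of dictionaries is used in place of a numpy structured array to keep the example dependency-free.

```python
class BaseReader:
    """Hypothetical stand-in for a base reader class."""

    def _process_and_validate_input_table(self, input_table, **kwargs):
        # Shared filtering (assumed rule): drop rows that have no object ID.
        # Slice assignment modifies the caller's list in place.
        input_table[:] = [row for row in input_table if row.get("ObjID")]
        return input_table


class StrippingReader(BaseReader):
    """Hypothetical subclass that adds format-specific cleanup."""

    def _process_and_validate_input_table(self, input_table, **kwargs):
        # Run the ancestor's validation first so the shared filtering is applied.
        input_table = super()._process_and_validate_input_table(input_table, **kwargs)
        for row in input_table:
            row["ObjID"] = row["ObjID"].strip()
        return input_table


table = [{"ObjID": " a "}, {"ObjID": ""}, {"ObjID": "b"}]
result = StrippingReader()._process_and_validate_input_table(table)
```

Because the table is modified in place, `result` and `table` refer to the same object after the call.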

class HDF5DataReader(filename, **kwargs)[source]

Bases: layup.utilities.file_io.ObjectDataReader.ObjectDataReader

A class to read in object data files stored as HDF5 files.

filename
obj_id_table = None
obj_id_counts
get_reader_info()[source]

Return a string identifying the current reader name and input information (for logging and output).

Returns:

name – The reader information.

Return type:

string

get_row_count()[source]

Return the total number of rows in the first key of the input HDF5 file.

Returns:

Total rows in the first key of the input HDF5 file.

Return type:

int

_read_rows_internal(block_start=0, block_size=None, **kwargs)[source]

Reads in a set number of rows from the input.

Parameters:
  • block_start (integer, optional) – The 0-indexed row number from which to start reading the data. For example, in a CSV file block_start=2 would skip the first two lines after the header and return data starting on row=2. Default = 0.

  • block_size (integer, optional) – The number of rows to read in. Use block_size=None to read in all available data. Default = None.

  • **kwargs (dictionary, optional) – Extra arguments

Returns:

res_df – Dataframe of the object data.

Return type:

pandas dataframe

_build_id_map()[source]

Builds a table of just the object IDs.

_read_objects_internal(obj_ids, **kwargs)[source]

Read in a chunk of data for given object IDs.

Parameters:
  • obj_ids (list) – A list of object IDs to use.

  • **kwargs (dictionary, optional) – Extra arguments

Returns:

res_df – The dataframe for the object data.

Return type:

pandas dataframe

_process_and_validate_input_table(input_table, **kwargs)[source]

Perform any input-specific processing and validation on the input table. Modifies the input dataframe in place.

Notes

The base implementation includes filtering that is common to most input types. Subclasses should call super()._process_and_validate_input_table() to ensure that the ancestor’s validation is also applied.

Parameters:
  • input_table (pandas dataframe) – A loaded table.

  • **kwargs (dictionary, optional) – Extra arguments

Returns:

input_table – Returns the input dataframe modified in-place.

Return type:

pandas dataframe

class Obs80DataReader(filename, **kwargs)[source]

Bases: layup.utilities.file_io.ObjectDataReader.ObjectDataReader

A class to read in object data files stored in the MPC’s obs80 format.

Note that we will ignore the header lines that might accompany the file.

filename
output_dtype
col_names
obj_id_table = None
obj_id_counts
_is_header_row(line)[source]

Check if the line is a header row.

Parameters:

line (str) – The line to check.

Returns:

True if the line is a header row, False otherwise.

Return type:

bool

get_reader_info()[source]

Return a string identifying the current reader name and input information (for logging and output).

Returns:

name – The reader information.

Return type:

string

get_row_count()[source]

Return the total number of rows in the file.

Note that the obs80 format allows for two-line rows, so the number of lines used to store the data is not the same as the number of rows.

Returns:

Total rows in the file.

Return type:

int
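
The distinction between lines and rows can be sketched with a toy example. The helper `count_rows` and the `is_second_line` predicate are hypothetical, and the leading `+` marker used below is purely illustrative: it is not how obs80 flags the second line of a two-line row. A real count would also skip any header lines.

```python
def count_rows(lines, is_second_line):
    """Hypothetical helper: count logical rows when some rows span two lines."""
    # Every line that is not the second half of a two-line row starts a new row.
    return sum(1 for line in lines if not is_second_line(line))


# Toy data: "+..." marks the second line of the preceding row (illustrative only).
lines = ["obs1", "obs2", "+obs2 extra", "obs3"]
n_rows = count_rows(lines, is_second_line=lambda line: line.startswith("+"))
```

Four lines are stored, but only three rows are counted, because one row occupies two lines.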

_read_rows_internal(block_start=0, block_size=None, **kwargs)[source]

Reads in a set number of rows from the input.

Parameters:
  • block_start (integer, optional) – The 0-indexed row number from which to start reading the data. For example, in a CSV file block_start=2 would skip the first two lines after the header and return data starting on row=2. Default = 0.

  • block_size (integer, optional) – The number of rows to read in. Use block_size=None to read in all available data. Default = None.

  • **kwargs (dictionary, optional) – Extra arguments

Returns:

res – The data read in from the file.

Return type:

numpy structured array

_build_id_map()[source]

Builds a table of just the object IDs.

_read_objects_internal(obj_ids, **kwargs)[source]

Read in a chunk of data for given object IDs.

Parameters:
  • obj_ids (list) – A list of object IDs to use.

  • **kwargs (dictionary, optional) – Extra arguments

Returns:

res – The data read in from the file.

Return type:

numpy structured array

_process_and_validate_input_table(input_table, **kwargs)[source]

Perform any input-specific processing and validation on the input table. Modifies the input table in place.

Notes

The base implementation includes filtering that is common to most input types. Subclasses should call super()._process_and_validate_input_table() to ensure that the ancestor’s validation is also applied.

Parameters:
  • input_table (numpy structured array) – A loaded table.

  • **kwargs (dictionary, optional) – Extra arguments

Returns:

input_table – Returns the input table modified in-place.

Return type:

numpy structured array

get_obs80_id(line)[source]

Get the object ID from the Obs80 line. Note that we have already confirmed that self.primary_id_column_name is in self.col_names.

Parameters:

line (str) – The line of obs80 data to extract the object ID from.

Returns:

The object ID extracted from the line.

Return type:

str

convert_obs80(line, second_line=None)[source]

Converts a row of obs80 data to a tuple of values. The second line is optional and may contain the observatory position.

Parameters:
  • line (str) – The line of obs80 data to convert.

  • second_line (str, optional) – The optional second line of obs80 data to convert. Default is None.

Returns:

A tuple of values containing the object ID, ISO time, RA in degrees, Dec in degrees, magnitude, filter, observatory code, catalog, program, and observatory position (x, y, z).

Return type:

tuple
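
Part of this conversion is turning sexagesimal coordinates into decimal degrees. The sketch below shows that arithmetic only, assuming the RA and Dec fields have already been sliced out of the line as three space-separated values each. The helper names `ra_to_degrees` and `dec_to_degrees` are hypothetical and are not taken from the library.

```python
def ra_to_degrees(ra_str):
    """Convert an 'HH MM SS.sss' right ascension string to decimal degrees."""
    hours, minutes, seconds = (float(part) for part in ra_str.split())
    # 24 hours of RA span 360 degrees, so one hour corresponds to 15 degrees.
    return 15.0 * (hours + minutes / 60.0 + seconds / 3600.0)


def dec_to_degrees(dec_str):
    """Convert a 'sDD MM SS.ss' declination string to decimal degrees."""
    text = dec_str.strip()
    # Take the sign from the string itself so that '-00 30 00' stays negative.
    sign = -1.0 if text.startswith("-") else 1.0
    degrees, minutes, seconds = (float(part) for part in text.lstrip("+-").split())
    return sign * (degrees + minutes / 60.0 + seconds / 3600.0)
```

Reading the sign separately matters for declinations between 0 and -1 degrees, where the degrees field parses to zero and would otherwise lose its sign.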