How do I select a subset of a table?

One of the most common operations in data analysis is selecting a subset of your data. TinyFrameJS provides several methods for filtering rows and selecting columns.

Filtering Methods in TinyFrameJS

The methods below cover several programming styles: predicate functions, tagged template literals, SQL-like query strings, and column-wise conditions. The library is designed to be as flexible as JavaScript itself, while offering syntax familiar to users of Pandas, SQL, and other data analysis tools.

Selecting Columns

Using select(columns, options)

The select() method allows you to choose specific columns from a DataFrame:

import { DataFrame } from 'tinyframejs';

const df = new DataFrame([
{name: 'Alice', age: 25, city: 'New York', salary: 70000},
{name: 'Bob', age: 30, city: 'San Francisco', salary: 85000},
{name: 'Charlie', age: 35, city: 'Chicago', salary: 90000}
]);

// Select specific columns
const nameAndAge = df.select(['name', 'age']);
nameAndAge.print();

// Select columns without automatic output
df.select(['name', 'age'], { print: false });

Output:

┌───────┬─────────┬─────┐
│ index │ name    │ age │
├───────┼─────────┼─────┤
│ 0     │ Alice   │ 25  │
│ 1     │ Bob     │ 30  │
│ 2     │ Charlie │ 35  │
└───────┴─────────┴─────┘

Using drop(columns, options)

The drop() method allows you to remove specific columns:

// Drop specific columns
const withoutCityAndSalary = df.drop(['city', 'salary']);
withoutCityAndSalary.print();

// Drop columns without automatic output
df.drop(['city'], { print: false });

Output:

┌───────┬─────────┬─────┐
│ index │ name    │ age │
├───────┼─────────┼─────┤
│ 0     │ Alice   │ 25  │
│ 1     │ Bob     │ 30  │
│ 2     │ Charlie │ 35  │
└───────┴─────────┴─────┘
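
To make the relationship between select() and drop() concrete, here is a plain-JavaScript sketch of the same two projections on an array of row objects. The helper names pickColumns and omitColumns are hypothetical and only illustrate the semantics; they are not part of TinyFrameJS and say nothing about how the library stores data internally.

```javascript
// Plain-JavaScript illustration of column selection; not TinyFrameJS code.
const rows = [
  { name: 'Alice', age: 25, city: 'New York', salary: 70000 },
  { name: 'Bob', age: 30, city: 'San Francisco', salary: 85000 },
  { name: 'Charlie', age: 35, city: 'Chicago', salary: 90000 },
];

// Keep only the listed columns, in the order given (hypothetical helper).
function pickColumns(data, columns) {
  return data.map((row) =>
    Object.fromEntries(columns.map((col) => [col, row[col]]))
  );
}

// Remove the listed columns and keep everything else (hypothetical helper).
function omitColumns(data, columns) {
  const dropped = new Set(columns);
  return data.map((row) =>
    Object.fromEntries(Object.entries(row).filter(([col]) => !dropped.has(col)))
  );
}

const picked = pickColumns(rows, ['name', 'age']);
const omitted = omitColumns(rows, ['city', 'salary']);

// Both calls leave the same two columns for every row.
console.log(JSON.stringify(picked) === JSON.stringify(omitted)); // true
```

Which of the two reads better depends on whether the list of columns to keep or the list to remove is shorter.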

Using selectByPattern(pattern, options)

You can also select columns using patterns:

// Select columns that start with 'a'
const aColumns = df.selectByPattern('^a');
aColumns.print();

// Select columns containing 'a' without automatic output
df.selectByPattern('a', { print: false });

Output:

┌───────┬─────┐
│ index │ age │
├───────┼─────┤
│ 0     │ 25  │
│ 1     │ 30  │
│ 2     │ 35  │
└───────┴─────┘
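
The two patterns above behave differently, which is easy to check in plain JavaScript. The sketch below assumes the pattern string is interpreted like a JavaScript regular expression source; matchColumns is a hypothetical helper, not a TinyFrameJS function.

```javascript
// Assumption: the pattern string behaves like a JavaScript regular expression source.
const columnNames = ['name', 'age', 'city', 'salary'];

// Return the column names that match the pattern (hypothetical helper).
function matchColumns(names, pattern) {
  const regex = new RegExp(pattern);
  return names.filter((name) => regex.test(name));
}

// '^a' is anchored to the start of the name, so only 'age' matches.
console.log(matchColumns(columnNames, '^a')); // [ 'age' ]

// 'a' matches anywhere in the name.
console.log(matchColumns(columnNames, 'a')); // [ 'name', 'age', 'salary' ]
```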

Filtering Rows

TinyFrameJS offers several ways to filter rows to accommodate different user preferences.

For JavaScript Lovers: filter()

The filter() method uses the standard JavaScript approach with a predicate function, making it familiar to JavaScript developers:

// Filter rows where age is greater than 25
const olderThan25 = df.filter(row => row.age > 25);
olderThan25.print();

// Complex conditions
const olderAndHighPaid = df.filter(row => row.age > 25 && row.salary > 85000);
olderAndHighPaid.print();

// Without automatic output
df.filter(row => row.city === 'New York', { print: false });

Output:

┌───────┬─────────┬─────┬───────────────┬────────┐
│ index │ name    │ age │ city          │ salary │
├───────┼─────────┼─────┼───────────────┼────────┤
│ 1     │ Bob     │ 30  │ San Francisco │ 85000  │
│ 2     │ Charlie │ 35  │ Chicago       │ 90000  │
└───────┴─────────┴─────┴───────────────┴────────┘

For Modern JavaScript Lovers: expr$()

The expr$() method uses tagged template literals for an intuitive and expressive syntax:

// Filter rows where age is greater than 30
df.expr$`age > 30`;

// Complex conditions
df.expr$`age > 25 && salary > 80000`;

// String operations
df.expr$`city_includes("Francisco")`;
df.expr$`name_startsWith("A")`;

// Using variables
const minAge = 30;
df.expr$`age >= ${minAge}`;
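
For readers new to tagged template literals, the following sketch shows what any tag function receives from the JavaScript engine, and one way a tag could turn that into a row predicate. Both inspectTag and simpleExpr are hypothetical names invented for illustration; the sketch supports only the form `column operator ${value}` and is not how expr$() is implemented.

```javascript
// A tag function receives the literal string pieces and the interpolated values separately.
function inspectTag(strings, ...values) {
  return { strings: [...strings], values };
}

const minAge = 30;
const parts = inspectTag`age >= ${minAge}`;

console.log(JSON.stringify(parts.strings)); // ["age >= ",""]
console.log(JSON.stringify(parts.values)); // [30]

// Comparison functions keyed by operator text.
const comparators = new Map([
  ['>', (a, b) => a > b],
  ['>=', (a, b) => a >= b],
  ['<', (a, b) => a < b],
  ['<=', (a, b) => a <= b],
]);

// Hypothetical tag that builds a predicate from `column operator ${value}`.
// The interpolated value is compared as a value, never spliced into source text.
function simpleExpr(strings, value) {
  const [column, operator] = strings[0].trim().split(/\s+/);
  const compare = comparators.get(operator);
  if (!compare) throw new Error(`Unsupported operator: ${operator}`);
  return (row) => compare(row[column], value);
}

const isOldEnough = simpleExpr`age >= ${minAge}`;
console.log(isOldEnough({ age: 35 })); // true
console.log(isOldEnough({ age: 25 })); // false
```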

For SQL Lovers: query()

The query() method allows you to filter rows using a SQL-like syntax:

// Filter using a SQL-like query
const fromNewYork = df.query("city == 'New York'");
fromNewYork.print();

// Complex SQL-like conditions
df.query("age > 30 AND salary > 80000");

Output:

┌───────┬───────┬─────┬──────────┬────────┐
│ index │ name  │ age │ city     │ salary │
├───────┼───────┼─────┼──────────┼────────┤
│ 0     │ Alice │ 25  │ New York │ 70000  │
└───────┴───────┴─────┴──────────┴────────┘

For Point Filtering: where()

The where() method allows you to filter rows using column-wise conditions:

// Filter using column conditions
const highSalary = df.where('salary', '>', 80000);
highSalary.print();

// Filter rows with a specific city
df.where('city', '==', 'Chicago');

Output:

┌───────┬─────────┬─────┬───────────────┬────────┐
│ index │ name    │ age │ city          │ salary │
├───────┼─────────┼─────┼───────────────┼────────┤
│ 1     │ Bob     │ 30  │ San Francisco │ 85000  │
│ 2     │ Charlie │ 35  │ Chicago       │ 90000  │
└───────┴─────────┴─────┴───────────────┴────────┘

You can chain multiple conditions:

// Chain multiple conditions
const filtered = df
.where('age', '>=', 30)
.where('city', '!=', 'Chicago');

filtered.print();

Output:

┌───────┬──────┬─────┬───────────────┬────────┐
│ index │ name │ age │ city          │ salary │
├───────┼──────┼─────┼───────────────┼────────┤
│ 1     │ Bob  │ 30  │ San Francisco │ 85000  │
└───────┴──────┴─────┴───────────────┴────────┘
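
Chained where() calls act as a logical AND: each call narrows the rows left by the previous one. The plain-JavaScript sketch below illustrates this with a hypothetical whereRows helper. Treating '==' and '!=' as strict comparisons is an assumption of the sketch, not a documented TinyFrameJS behavior.

```javascript
// Plain-JavaScript illustration of column-wise conditions; not TinyFrameJS code.
const rows = [
  { name: 'Alice', age: 25, city: 'New York', salary: 70000 },
  { name: 'Bob', age: 30, city: 'San Francisco', salary: 85000 },
  { name: 'Charlie', age: 35, city: 'Chicago', salary: 90000 },
];

// Map each operator string to a comparison function.
const comparators = new Map([
  ['>', (a, b) => a > b],
  ['>=', (a, b) => a >= b],
  ['<', (a, b) => a < b],
  ['<=', (a, b) => a <= b],
  ['==', (a, b) => a === b], // assumption: strict equality
  ['!=', (a, b) => a !== b],
]);

// Keep the rows whose column value satisfies the condition (hypothetical helper).
function whereRows(data, column, operator, value) {
  const compare = comparators.get(operator);
  if (!compare) throw new Error(`Unsupported operator: ${operator}`);
  return data.filter((row) => compare(row[column], value));
}

// Two chained conditions...
const chained = whereRows(whereRows(rows, 'age', '>=', 30), 'city', '!=', 'Chicago');
// ...select the same rows as one predicate joined with &&.
const combined = rows.filter((row) => row.age >= 30 && row.city !== 'Chicago');

console.log(chained.map((row) => row.name)); // [ 'Bob' ]
console.log(combined.map((row) => row.name)); // [ 'Bob' ]
```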

Selecting by Index

Using at()

The at() method allows you to select a row by its index:

// Get a single row by index
const firstRow = df.at(0);
console.log(firstRow);
// Output: {name: 'Alice', age: 25, city: 'New York', salary: 70000}

Using iloc(rowIndices, columnIndices, options)

The iloc() method allows you to select rows and columns by their integer positions:

// Select rows 0 and 2, columns 1 and 3
const subset = df.iloc([0, 2], [1, 3]);
subset.print();

// Select first three rows
df.iloc([0, 1, 2]);

// Select without automatic output
df.iloc([0, 1], null, { print: false });

Output:

┌───────┬─────┬────────┐
│ index │ age │ salary │
├───────┼─────┼────────┤
│ 0     │ 25  │ 70000  │
│ 2     │ 35  │ 90000  │
└───────┴─────┴────────┘
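
Positions are zero-based, so in the four-column example columns 1 and 3 are age and salary. The sketch below reproduces that positional lookup in plain JavaScript; selectByPosition is a hypothetical helper, and treating a missing column list as "all columns" mirrors the `null` argument shown above only by assumption.

```javascript
// Plain-JavaScript illustration of positional selection; not TinyFrameJS code.
const rows = [
  { name: 'Alice', age: 25, city: 'New York', salary: 70000 },
  { name: 'Bob', age: 30, city: 'San Francisco', salary: 85000 },
  { name: 'Charlie', age: 35, city: 'Chicago', salary: 90000 },
];

// Select rows and columns by integer position (hypothetical helper).
function selectByPosition(data, rowIndices, columnIndices = null) {
  const allColumns = data.length > 0 ? Object.keys(data[0]) : [];
  // Assumption: no column list means "keep every column".
  const columns = columnIndices
    ? columnIndices.map((i) => allColumns[i])
    : allColumns;
  return rowIndices.map((r) =>
    Object.fromEntries(columns.map((col) => [col, data[r][col]]))
  );
}

const positional = selectByPosition(rows, [0, 2], [1, 3]);
console.log(JSON.stringify(positional));
// [{"age":25,"salary":70000},{"age":35,"salary":90000}]
```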

Using loc(rowLabels, columnLabels, options)

The loc() method allows you to select rows and columns by their labels:

// Select rows with index 0 and 2, columns 'age' and 'salary'
const labelSubset = df.loc([0, 2], ['age', 'salary']);
labelSubset.print();

// Select rows with specific index values
// (assumes a DataFrame whose index contains these labels)
df.loc([1, 3, 5]);

// Select rows and specific columns
df.loc([1, 3, 5], ['name', 'salary']);

Output:

┌───────┬─────┬────────┐
│ index │ age │ salary │
├───────┼─────┼────────┤
│ 0     │ 25  │ 70000  │
│ 2     │ 35  │ 90000  │
└───────┴─────┴────────┘

Sampling Data

Using head(n, options)

The head() method allows you to get the first N rows of a DataFrame:

// Get the first 5 rows (default)
const firstRows = df.head();
firstRows.print();

// Get the first 3 rows
const firstThreeRows = df.head(3);
firstThreeRows.print();

// Without automatic output
df.head(5, { print: false });

Output (head(3) on a larger sample dataset than the three-row df defined earlier):

┌───────┬─────────┬─────┬──────────┬────────┐
│ index │ name    │ age │ city     │ salary │
├───────┼─────────┼─────┼──────────┼────────┤
│ 0     │ Alice   │ 25  │ New York │ 70000  │
│ 1     │ Bob     │ 30  │ Boston   │ 85000  │
│ 2     │ Charlie │ 35  │ Chicago  │ 92000  │
└───────┴─────────┴─────┴──────────┴────────┘

Using tail(n, options)

The tail() method allows you to get the last N rows of a DataFrame:

// Get the last 5 rows (default)
const lastRows = df.tail();
lastRows.print();

// Get the last 2 rows
const lastTwoRows = df.tail(2);
lastTwoRows.print();

Output (tail(2) on a 10-row sample dataset):

┌───────┬──────┬─────┬─────────┬────────┐
│ index │ name │ age │ city    │ salary │
├───────┼──────┼─────┼─────────┼────────┤
│ 8     │ Ivan │ 65  │ Miami   │ 88000  │
│ 9     │ Judy │ 70  │ Atlanta │ 82000  │
└───────┴──────┴─────┴─────────┴────────┘
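
In plain-array terms, head() and tail() correspond to slicing from the start and from the end. The sketch below uses hypothetical firstN and lastN helpers with the default of 5 mentioned above; it illustrates the semantics only and is not TinyFrameJS code.

```javascript
// Plain-JavaScript illustration of head/tail semantics; not TinyFrameJS code.
const indexValues = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9];

// First n items (hypothetical helper, default n = 5).
function firstN(data, n = 5) {
  return data.slice(0, n);
}

// Last n items (hypothetical helper, default n = 5).
// slice(-0) would return the whole array, so n = 0 is handled explicitly.
function lastN(data, n = 5) {
  return n > 0 ? data.slice(-n) : [];
}

console.log(firstN(indexValues, 3)); // [ 0, 1, 2 ]
console.log(lastN(indexValues, 2)); // [ 8, 9 ]
console.log(firstN(indexValues).length); // 5
```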

Using sample(n, options)

You can select a random sample of rows:

// Get a random sample of 2 rows
const sample = df.sample(2);
sample.print();

// With seed for reproducibility
df.sample(5, { seed: 123 });

Output (will vary):

┌───────┬─────────┬─────┬───────────────┬────────┐
│ index │ name    │ age │ city          │ salary │
├───────┼─────────┼─────┼───────────────┼────────┤
│ 0     │ Alice   │ 25  │ New York      │ 70000  │
│ 2     │ Charlie │ 35  │ Chicago       │ 90000  │
└───────┴─────────┴─────┴───────────────┴────────┘
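
A seed makes a random sample repeatable because the random number generator then produces the same sequence on every run. The sketch below shows the idea with a small seeded generator (mulberry32) and a partial Fisher-Yates shuffle. Both are choices made for this illustration; the source does not say which generator or sampling algorithm TinyFrameJS uses, and sampleRows is a hypothetical helper.

```javascript
// Plain-JavaScript illustration of seeded sampling; not TinyFrameJS code.
const rows = [
  { name: 'Alice', age: 25 },
  { name: 'Bob', age: 30 },
  { name: 'Charlie', age: 35 },
];

// mulberry32: a small seeded generator returning numbers in [0, 1).
function mulberry32(seed) {
  let state = seed >>> 0;
  return function next() {
    state = (state + 0x6d2b79f5) >>> 0;
    let t = state;
    t = Math.imul(t ^ (t >>> 15), t | 1);
    t ^= t + Math.imul(t ^ (t >>> 7), t | 61);
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296;
  };
}

// Draw n rows without replacement (hypothetical helper).
function sampleRows(data, n, seed) {
  const random = mulberry32(seed);
  const pool = [...data]; // copy so the input is not reordered
  const count = Math.min(n, pool.length);
  for (let i = 0; i < count; i++) {
    // Swap a randomly chosen remaining item into position i.
    const j = i + Math.floor(random() * (pool.length - i));
    [pool[i], pool[j]] = [pool[j], pool[i]];
  }
  return pool.slice(0, count);
}

const first = sampleRows(rows, 2, 123);
const second = sampleRows(rows, 2, 123);

// The same seed yields the same sample.
console.log(JSON.stringify(first) === JSON.stringify(second)); // true
```

In this sketch, asking for more rows than exist returns every row once; how the library treats that case is not stated in the source.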

Using stratifiedSample(column, n, options)

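You can also perform stratified sampling, which maintains the proportion of values in a specific column. In the example below the second argument is 0.5, which reads as a fraction of the rows rather than a row count: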

// Get a stratified sample based on the 'city' column
const stratifiedSample = df.stratifiedSample('city', 0.5);
stratifiedSample.print();
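
The idea behind stratified sampling can be sketched in plain JavaScript: group the rows by the chosen column, then draw the same fraction from every group. The data, the stratifiedSampleRows helper, and the rounding rule (Math.round) are all assumptions made for this illustration, not TinyFrameJS behavior.

```javascript
// Plain-JavaScript illustration of stratified sampling; not TinyFrameJS code.
// Hypothetical data: four rows from Boston, two from Chicago.
const people = [
  { name: 'A', city: 'Boston' },
  { name: 'B', city: 'Boston' },
  { name: 'C', city: 'Boston' },
  { name: 'D', city: 'Boston' },
  { name: 'E', city: 'Chicago' },
  { name: 'F', city: 'Chicago' },
];

// Draw the given fraction of rows from each group of `column` (hypothetical helper).
function stratifiedSampleRows(data, column, fraction) {
  // Group rows by the value of the stratification column.
  const groups = new Map();
  for (const row of data) {
    const key = row[column];
    if (!groups.has(key)) groups.set(key, []);
    groups.get(key).push(row);
  }

  const result = [];
  for (const group of groups.values()) {
    // Assumption: the per-group size is rounded to the nearest integer.
    const take = Math.round(group.length * fraction);
    // Shuffle a copy of the group (Fisher-Yates), then keep the first `take` rows.
    const shuffled = [...group];
    for (let i = shuffled.length - 1; i > 0; i--) {
      const j = Math.floor(Math.random() * (i + 1));
      [shuffled[i], shuffled[j]] = [shuffled[j], shuffled[i]];
    }
    result.push(...shuffled.slice(0, take));
  }
  return result;
}

const halfSample = stratifiedSampleRows(people, 'city', 0.5);

// Count the sampled rows per city: the 2:1 ratio of the input is kept.
const counts = {};
for (const row of halfSample) counts[row.city] = (counts[row.city] || 0) + 1;
console.log(JSON.stringify(counts)); // {"Boston":2,"Chicago":1}
```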

Additional Options

Most filtering methods accept an optional options parameter that allows you to customize the behavior of the method:

// Disable automatic output of results
const filteredData = df.filter(row => row.age > 30, { print: false });

// Later you can manually print the result
filteredData.print();

Available options:

  • print: if false, the result will not be automatically printed to the console (default is true)

Preserving Typed Arrays

TinyFrameJS automatically preserves typed arrays (such as Float64Array and Int32Array) when creating filtered DataFrames, which keeps operations on numerical data efficient:

// Create a DataFrame with typed arrays
const typedDf = new DataFrame({
values: new Float64Array([1.1, 2.2, 3.3, 4.4, 5.5]),
indices: new Int32Array([10, 20, 30, 40, 50])
});

// Filter the data
const filteredTyped = typedDf.filter(row => row.values > 3);

// Result preserves typed arrays
console.log(filteredTyped.columns.values instanceof Float64Array); // true
console.log(filteredTyped.columns.indices instanceof Int32Array); // true
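
Type preservation of this kind is possible with standard JavaScript alone, because TypedArray.prototype.filter returns a new typed array of the same type. The sketch below applies one shared mask to two typed columns; it illustrates the principle only and is not a description of TinyFrameJS internals.

```javascript
// Plain-JavaScript illustration of filtering typed columns; not TinyFrameJS code.
const values = new Float64Array([1.1, 2.2, 3.3, 4.4, 5.5]);
const indices = new Int32Array([10, 20, 30, 40, 50]);

// Evaluate the row condition (values > 3) once to get a boolean mask.
const keep = Array.from(values, (v) => v > 3);

// Apply the same mask to every column; each result keeps its array type.
const filteredValues = values.filter((_, i) => keep[i]);
const filteredIndices = indices.filter((_, i) => keep[i]);

console.log(filteredValues instanceof Float64Array); // true
console.log(filteredIndices instanceof Int32Array); // true
console.log(Array.from(filteredIndices)); // [ 30, 40, 50 ]
```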

Metadata in Filtering Results

All filtering methods preserve important metadata:

const filtered = df.filter(row => row.age > 30);

// Access metadata
console.log(`Row count: ${filtered.rowCount}`);
console.log(`Column names: ${filtered.columnNames.join(', ')}`);
console.log(`Data types:`, filtered.dtypes);

Method Chaining

TinyFrameJS supports method chaining for convenient data processing:

// Chain operations
const result = df
.filter(row => row.age > 25)
.select(['name', 'salary'])
.sort('salary')
.head(3);

result.print();

Output (on a larger sample dataset than the three-row df defined earlier):

┌───────┬─────────┬────────┐
│ index │ name    │ salary │
├───────┼─────────┼────────┤
│ 0     │ Bob     │ 85000  │
│ 1     │ Charlie │ 92000  │
│ 2     │ David   │ 105000 │
└───────┴─────────┴────────┘
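
The chain reads much like the equivalent pipeline on a plain array of objects, which may help when porting existing code. The sketch below runs the same four steps on the three-row data from the start of this page. The ascending sort order is an assumption, since the source does not state the default direction of sort().

```javascript
// Plain-JavaScript equivalent of the chain above; not TinyFrameJS code.
const rows = [
  { name: 'Alice', age: 25, city: 'New York', salary: 70000 },
  { name: 'Bob', age: 30, city: 'San Francisco', salary: 85000 },
  { name: 'Charlie', age: 35, city: 'Chicago', salary: 90000 },
];

const pipeline = rows
  .filter((row) => row.age > 25) // filter(row => row.age > 25)
  .map(({ name, salary }) => ({ name, salary })) // select(['name', 'salary'])
  .sort((a, b) => a.salary - b.salary) // sort('salary'), assumed ascending
  .slice(0, 3); // head(3)

console.log(JSON.stringify(pipeline));
// [{"name":"Bob","salary":85000},{"name":"Charlie","salary":90000}]
```

Only two rows pass the age filter here, so the final head(3) step returns both of them.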

Choosing the Right Filtering Method

TinyFrameJS offers different filtering methods to accommodate various preferences:

  • filter(): For those who prefer standard JavaScript and functional programming

  • expr$(): For those who value modern and expressive JavaScript syntax

  • query(): For those who prefer SQL-like syntax

  • where(): For simple filtering conditions on a single column

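The choice of method depends on your preferences and the task at hand. All four methods filter rows; they differ in syntax and in how naturally they express complex conditions. Because where() tests one column per call, combine several conditions by chaining calls.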

Next Steps

Now that you know how to select subsets of your data, you can: